Quickstart

Installation

pip install pandas-cat

Requires Python 3.10 or later.

Minimal example

import pandas as pd
import pandas_cat

df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip',
                 encoding='cp1250', sep='\t')
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]

pandas_cat.profile(df=df, dataset_name="Accidents")

This writes report/report.html. Open it in any browser.

See the pre-generated output here

Built-in templates

pandas-cat ships three templates — pick one via the template= parameter.

default — static HTML with embedded SVG charts (used when template is omitted):

pandas_cat.profile(df=df, dataset_name="Accidents")

modern — same static output with a refreshed visual style:

pandas_cat.profile(df=df, dataset_name="Accidents", template="modern")

interactive — data-driven report with three correlation metrics (Cramér’s V, Spearman Rank, Theil’s U) and per-category crosstabs:

pandas_cat.profile(df=df, dataset_name="Accidents", template="interactive")

Automatic data preparation

By default (auto_prepare=True), pandas-cat calls prepare() before profiling. prepare() converts columns to ordered pandas.Categorical where possible: numeric-string ranges ("0-10", "Over 75", "60+") are sorted by the first number they contain, plain text columns are sorted alphabetically, and columns with no meaningful order are left as-is.

The effect is easiest to see directly:

import pandas as pd
import pandas_cat

df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip',
                 encoding='cp1250', sep='\t')
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]

print("Before:")
print(df['Driver_Age_Band'].dtype)           # str
print(sorted(df['Driver_Age_Band'].unique())) # alphabetical

df = pandas_cat.prepare(df)

print("After:")
print(df['Driver_Age_Band'].dtype)                   # category (ordered=True)
print(df['Driver_Age_Band'].cat.categories.tolist())  # natural numeric order

Output:

Before:
str
['16 - 20', '21 - 25', '26 - 35', '36 - 45',
 '46 - 55', '56 - 65', '66 - 75', 'Over 75']

After:
category
['16 - 20', '21 - 25', '26 - 35', '36 - 45',
 '46 - 55', '56 - 65', '66 - 75', 'Over 75']

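For the labels listed above, alphabetical and numeric order happen to coincide. They diverge as soon as labels differ in digit count: without prepare(), a band such as "6 - 10" sorts after "56 - 65" alphabetically. After prepare(), every downstream operation (bar charts, frequency tables, correlation matrices) uses the correct semantic order.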
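The gap between alphabetical and numeric-token ordering can be reproduced with the standard library alone. The sketch below is not pandas-cat's implementation: first_number is a hypothetical helper that assumes every label contains at least one number, and the labels are made up for illustration rather than taken from the accidents dataset.

```python
import re


def first_number(label: str) -> int:
    """Return the first run of digits in a label as an int (hypothetical helper)."""
    match = re.search(r"\d+", label)
    if match is None:
        raise ValueError(f"no number found in {label!r}")
    return int(match.group())


# Illustrative labels only, not read from the accidents dataset.
labels = ["56 - 65", "Over 75", "6 - 10", "16 - 20"]

alphabetical = sorted(labels)               # plain string comparison
natural = sorted(labels, key=first_number)  # compare by first numeric token

print(alphabetical)  # ['16 - 20', '56 - 65', '6 - 10', 'Over 75']
print(natural)       # ['6 - 10', '16 - 20', '56 - 65', 'Over 75']
```

Passing the naturally ordered list as the categories of an ordered categorical dtype would then make that order stick for sorting and plotting.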

To skip preparation:

pandas_cat.profile(df=df, dataset_name="Accidents",
                   opts={"auto_prepare": False})

To run preparation without generating a report:

df = pandas_cat.prepare(df)

Note

pandas-cat is a categorical profiling package. The default preparation engine reflects this: plain numeric columns with cat_limit or fewer distinct values (default 20) are converted to ordered CategoricalDtype and shown as frequency bar charts rather than histograms. Numeric columns with more distinct values are left as continuous and shown with histograms.

No analytical information is lost: a bar chart shows every value and its exact frequency. A KDE histogram of low-cardinality discrete data (e.g. a 0/1 flag or a 1–5 rating scale) is less informative and can be visually misleading, because the smoothed curve suggests density between values that never occur.
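The cardinality rule described in this note can be approximated in plain pandas. This is a rough sketch under stated assumptions, not the library's actual code: to_ordered_if_low_cardinality is a hypothetical helper, and the two series are invented sample data.

```python
import pandas as pd


def to_ordered_if_low_cardinality(s: pd.Series, cat_limit: int = 20) -> pd.Series:
    """Hypothetical helper mimicking the rule described in the note above."""
    if pd.api.types.is_numeric_dtype(s) and s.nunique() <= cat_limit:
        # Few distinct values: treat as ordered categories, sorted numerically.
        categories = sorted(s.dropna().unique())
        return s.astype(pd.CategoricalDtype(categories=categories, ordered=True))
    # Many distinct values (or non-numeric): leave the column unchanged.
    return s


rating = pd.Series([3, 1, 5, 3, 2], name="rating")  # 4 distinct values
speed = pd.Series(range(100), name="speed")         # 100 distinct values

rating_out = to_ordered_if_low_cardinality(rating)
speed_out = to_ordered_if_low_cardinality(speed)

print(rating_out.dtype.ordered)             # True
print(rating_out.cat.categories.tolist())   # [1, 2, 3, 5]
print(pd.api.types.is_numeric_dtype(speed_out))  # True
```

The low-cardinality column becomes an ordered categorical, which suits a frequency bar chart, while the high-cardinality column keeps its numeric dtype and remains a histogram candidate.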

To force a numeric column to stay continuous regardless of cardinality, cast it explicitly and disable automatic preparation when calling profile():

df['rating'] = df['rating'].astype(float)
pandas_cat.profile(df=df, opts={"auto_prepare": False})

Options reference

All options are passed as a dict to the opts parameter:

Option	Default	Description
`auto_prepare`	`True`	Call `prepare()` before profiling.
`cat_limit`	`20`	Categorical columns with more distinct values than this are excluded. Does not apply to continuous columns.
`na_values`	`None`	Additional strings to treat as missing (on top of the built-in list).
`na_ignore`	`None`	Strings from the built-in missing-value list that should not be treated as missing.
`keep_default_na`	`True`	When `False`, only `na_values` are used; the built-in list is ignored.
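
One plausible reading of how the three missing-value options combine can be written as set arithmetic. This sketch follows the wording of the table only and is not pandas-cat's code: effective_na_strings is a hypothetical helper, BUILTIN_NA is a placeholder because the real built-in list is not given here, and passing the options as lists of strings is an assumption.

```python
# Placeholder only: the real built-in list is defined by pandas-cat.
BUILTIN_NA = frozenset({"", "NA", "N/A", "null"})


def effective_na_strings(na_values=None, na_ignore=None, keep_default_na=True):
    """Hypothetical helper: strings treated as missing under the given options."""
    extra = set(na_values or [])
    if not keep_default_na:
        # Built-in list is ignored entirely; only na_values apply.
        return extra
    # Built-in list minus na_ignore, plus any additional strings.
    return (set(BUILTIN_NA) - set(na_ignore or [])) | extra


print(sorted(effective_na_strings(na_values=["unknown"], na_ignore=["NA"])))
# ['', 'N/A', 'null', 'unknown']
print(sorted(effective_na_strings(na_values=["unknown"], keep_default_na=False)))
# ['unknown']
```

Under this reading, na_ignore only narrows the built-in list, so it has no effect once keep_default_na is False.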