pandas-cat

pandas-cat (PANDAS-CATegorical profiling) is a library for profiling categorical datasets and preparing them for analysis. It generates HTML reports with category distributions, correlations, and missing-value summaries, and automatically reorders numeric-like categories into their natural order.

Pass any DataFrame and get a self-contained HTML report in one call:

```
import pandas_cat
pandas_cat.profile(df, dataset_name="Road accidents")
```

The report gives you:

- Bar charts: frequency counts and percentages for every categorical column.
- Histograms: distribution for every numeric column.
- Correlations: between all variables and between categorical values.
- Missing-value summary: sentinel detection and gap counts per column.
- Memory breakdown: usage by column.

Two preparation helpers keep the data clean before profiling:

prepare(df) detects numeric-like categories and converts them to ordered CategoricalDtype so charts and correlations respect natural order.

Without prepare(), pandas sorts string categories alphabetically, comparing them character by character rather than by numeric value, which is a common trap:
```
# Alphabetical (wrong) — pandas default
16–20, 21–25, 26–35, 36–45, 46–55, 56–65, 66–75, 6–10, 76+, Under 6
```
After prepare(), the natural numeric order is restored:
```
# Natural order (correct) — after prepare()
Under 6, 6–10, 16–20, 21–25, 26–35, 36–45, 46–55, 56–65, 66–75, 76+
```
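The reordering can be illustrated with plain pandas. The following is a minimal sketch of the idea, not pandas-cat's actual implementation: the helper `natural_key` and its rule (sort by the first integer in the label, with "Under N" placed just below N) are assumptions made for this example.

```python
import re

import pandas as pd


def natural_key(label: str) -> float:
    """Hypothetical sort key: the first integer found in the label.

    Labels starting with "Under" are nudged just below their number,
    and labels without any digits are pushed to the end.
    """
    match = re.search(r"\d+", label)
    if match is None:
        return float("inf")
    value = float(match.group())
    if label.lower().startswith("under"):
        value -= 0.5
    return value


ages = pd.Series(["16–20", "Under 6", "76+", "6–10", "21–25"])

# Default behaviour: categories are sorted as plain strings.
default_order = list(ages.astype("category").cat.categories)
# ['16–20', '21–25', '6–10', '76+', 'Under 6']

# Ordered dtype whose categories follow the numeric key instead.
natural_dtype = pd.CategoricalDtype(
    sorted(ages.unique(), key=natural_key), ordered=True
)
natural_ages = ages.astype(natural_dtype)
natural_order = list(natural_ages.cat.categories)
# ['Under 6', '6–10', '16–20', '21–25', '76+']
```

Because the dtype is ordered, comparisons and sorting on `natural_ages` follow the age bands rather than the string values.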
handle_missing_values(df) replaces 75+ sentinel strings ("Unknown", "N/A", "–", "Missing", …) with pd.NA so they are counted as missing rather than treated as valid categories.
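
The same effect can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's code: `SENTINELS` is a small made-up subset of the real list, `mask_sentinels` is a hypothetical helper, and the case-insensitive, whitespace-trimmed matching is a choice made for this example.

```python
import pandas as pd

# Illustrative subset only; the library's list is much longer.
SENTINELS = {"unknown", "n/a", "–", "missing"}


def mask_sentinels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with sentinel strings replaced by pd.NA."""
    out = df.copy()
    for name in out.columns:
        column = out[name]
        is_text = column.dtype == object or pd.api.types.is_string_dtype(column.dtype)
        if not is_text:
            continue  # leave numeric and other non-text columns untouched
        # Normalise a string view of the column for matching only.
        normalized = column.astype("string").str.strip().str.lower()
        out[name] = column.mask(normalized.isin(SENTINELS), pd.NA)
    return out


accidents = pd.DataFrame(
    {
        "severity": ["Slight", "Unknown", " N/A ", "Fatal"],
        "vehicles": [1, 2, 3, 4],
    }
)
cleaned = mask_sentinels(accidents)
missing_per_column = cleaned.isna().sum()  # severity: 2, vehicles: 0
```

Once the sentinels are real missing values, `isna()` and `value_counts()` stop reporting "Unknown" as if it were a legitimate category.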