Releases · leelesemann-sys/rfm-customer-segmentation

v2.0: Multi-Algorithm Comparison (K-Means vs GMM)

Systematic comparison of clustering algorithms and preprocessing methods.

What's new

GMM clustering (cluster_gmm()): Gaussian Mixture Model with BIC/AIC model selection
Yeo-Johnson transform: power transform offered as an alternative to log1p
Hopkins statistic: checks clustering tendency before any algorithm runs (0.956 on this dataset)
compare_algorithms(): runs all 4 algorithm/transform combinations in one call
Algorithm comparison dashboard: new 7th visualization
61 unit tests (up from 36 in v1.0)
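
As a rough illustration of the clustering-tendency check listed above, the sketch below computes a Hopkins statistic with NumPy only. It is not the repository's implementation: the function name `hopkins_statistic`, the 10% default sample size, and the bounding-box sampling are assumptions. It uses the convention in which values near 1 suggest cluster structure and values near 0.5 suggest uniformly scattered data (some libraries report the complement instead).

```python
import numpy as np


def hopkins_statistic(X, sample_size=None, seed=0):
    """Hopkins statistic; near 1 suggests clusters, near 0.5 suggests no structure.

    Hypothetical helper for illustration, not the project's own function.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = sample_size or max(1, n // 10)  # assumption: sample 10% of the rows

    # w: distance from m sampled real points to their nearest *other* real point
    idx = rng.choice(n, size=m, replace=False)
    dist_real = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dist_real[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dist_real.min(axis=1)

    # u: distance from m uniform points in the bounding box to the nearest real point
    uniform_points = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    dist_uniform = np.linalg.norm(uniform_points[:, None, :] - X[None, :, :], axis=2)
    u = dist_uniform.min(axis=1)

    return u.sum() / (u.sum() + w.sum())


# Synthetic demo data (assumption): two tight blobs versus uniform noise
demo_rng = np.random.default_rng(1)
blobs = np.vstack([
    demo_rng.normal(0.0, 0.05, size=(100, 2)),
    demo_rng.normal(10.0, 0.05, size=(100, 2)),
])
noise = demo_rng.uniform(0.0, 1.0, size=(500, 2))

print("blobs:", round(hopkins_statistic(blobs), 3))
print("noise:", round(hopkins_statistic(noise), 3))
```

The features passed to such a check would normally be the same transformed and scaled RFM matrix that is later clustered, so the statistic describes the space the algorithms actually see.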

Algorithm Comparison Results

Algorithm	Transform	Silhouette (higher is better)	Davies-Bouldin (lower is better)
K-Means	log	0.380	0.857
K-Means	Yeo-Johnson	0.338	1.019
GMM	log	0.112	1.851
GMM	Yeo-Johnson	0.197	1.768
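
A comparison of this shape can be sketched with scikit-learn. The following is a minimal sketch, not the project's `compare_algorithms()`: the function name `compare_sketch`, the synthetic log-normal data, and the fixed `k=4` for both algorithms are assumptions made for illustration, so the printed scores will not match the table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import PowerTransformer, StandardScaler


def compare_sketch(rfm, k=4, seed=42):
    """Score 2 algorithms x 2 transforms; returns (algorithm, transform, sil, db) rows.

    Hypothetical helper for illustration only.
    """
    transformed = {
        # log1p needs non-negative inputs; scale afterwards so features are comparable
        "log": StandardScaler().fit_transform(np.log1p(rfm)),
        # PowerTransformer standardizes its output by default
        "Yeo-Johnson": PowerTransformer(method="yeo-johnson").fit_transform(rfm),
    }
    rows = []
    for algo_name in ("K-Means", "GMM"):
        for transform_name, X in transformed.items():
            if algo_name == "K-Means":
                model = KMeans(n_clusters=k, n_init=10, random_state=seed)
            else:
                model = GaussianMixture(n_components=k, random_state=seed)
            labels = model.fit_predict(X)
            rows.append((
                algo_name,
                transform_name,
                silhouette_score(X, labels),      # higher is better, range [-1, 1]
                davies_bouldin_score(X, labels),  # lower is better, >= 0
            ))
    return rows


# Synthetic stand-in for an RFM matrix (assumption): skewed, strictly positive values
demo_rfm = np.random.default_rng(0).lognormal(mean=2.0, sigma=1.0, size=(300, 3))
rows = compare_sketch(demo_rfm)
for algo, transform, sil, db in rows:
    print(f"{algo}\t{transform}\t{sil:.3f}\t{db:.3f}")
```

Each combination is scored in its own transformed feature space, so Silhouette and Davies-Bouldin values are only loosely comparable across transforms; comparisons within one transform are the safer reading.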

Key Finding

K-Means with a log transform remains the best approach for this dataset, with the highest Silhouette (0.380) and the lowest Davies-Bouldin index (0.857) of the four combinations. Unlike some published results (Shobayo et al., 2023), GMM does not improve cluster quality here, likely because log-transformed RFM features already favor spherical clusters, which is where K-Means performs best.
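
For anyone re-checking the GMM side of this finding on their own data, one thing worth ruling out is a poorly chosen component count. A minimal sketch of BIC-based selection with scikit-learn's `GaussianMixture` follows; the helper name `select_gmm_by_bic`, the candidate range, and the synthetic blobs are assumptions, and the repository's `cluster_gmm()` may work differently.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, candidates=range(1, 7), seed=0):
    """Fit one GMM per candidate component count and keep the lowest-BIC model.

    Hypothetical helper for illustration only.
    """
    candidates = list(candidates)
    fits = [
        GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(X)
        for k in candidates
    ]
    bics = [model.bic(X) for model in fits]  # lower BIC is better
    best_index = int(np.argmin(bics))
    return fits[best_index], dict(zip(candidates, bics))


# Synthetic demo data (assumption): three well-separated Gaussian blobs
demo_rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
gmm_demo = np.vstack([demo_rng.normal(c, 0.5, size=(150, 2)) for c in centers])

best_model, bic_by_k = select_gmm_by_bic(gmm_demo)
print("selected components:", best_model.n_components)
```

Swapping `model.bic(X)` for `model.aic(X)` gives the AIC variant; AIC penalizes extra components less heavily, so it tends to select the same or a larger count.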

Changelog vs v1.0

src/rfm_pipeline.py: +258 lines (GMM, Yeo-Johnson, Hopkins, compare_algorithms)
tests/test_pipeline.py: +189 lines (25 new tests)
run_pipeline.py: +136 lines (comparison pipeline + visualization)
New: visualizations/7_algorithm_comparison.png

v1.0: Rule-Based RFM + K-Means Baseline

First complete version of the customer segmentation pipeline.

What's included

10 RFM segments via rule-based quintile scoring (R/F/M 1-5 scale)

K-Means clustering (K=4, Silhouette Score: 0.380) with log-transform + StandardScaler

Reusable RFMPipeline class (src/rfm_pipeline.py) with full type hints and docstrings

36 unit tests with pytest, GitHub Actions CI across Python 3.10-3.12

82% test coverage with auto-generated badge

CLI entrypoint (run_pipeline.py) for reproducible pipeline execution

6 publication-ready visualizations
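
Quintile scoring of the kind listed above can be sketched with pandas. The column names `recency_days`, `frequency`, and `monetary`, the function name `score_rfm`, and the rank-based tie-breaking are assumptions for illustration; the segment rules used by `RFMPipeline` are not reproduced here.

```python
import pandas as pd


def score_rfm(df):
    """Add 1-5 quintile scores R, F, M (5 = best) to a per-customer table.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    # rank(method="first") breaks ties so qcut always finds 5 distinct bin edges
    recency_rank = out["recency_days"].rank(method="first")
    frequency_rank = out["frequency"].rank(method="first")
    monetary_rank = out["monetary"].rank(method="first")

    # Recency is inverted: the most recent buyers (smallest values) score 5
    out["R"] = pd.qcut(recency_rank, 5, labels=[5, 4, 3, 2, 1]).astype(int)
    out["F"] = pd.qcut(frequency_rank, 5, labels=[1, 2, 3, 4, 5]).astype(int)
    out["M"] = pd.qcut(monetary_rank, 5, labels=[1, 2, 3, 4, 5]).astype(int)
    return out


# Made-up customers (assumption): row 0 is the best on every dimension, row 9 the worst
demo_customers = pd.DataFrame({
    "recency_days": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "frequency": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "monetary": [1000, 900, 800, 700, 600, 500, 400, 300, 200, 100],
})
scored = score_rfm(demo_customers)
print(scored[["R", "F", "M"]])
```

Rule-based segments are then typically defined as conditions on the three scores, while the clustering path works on the raw recency, frequency, and monetary values after transformation and scaling.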

Key findings

Metric	Value
Total revenue	£6.7M
Champions	1,127 customers (£4.4M, 65.4%)
At-risk revenue	£575k (512 customers)
K-Means Silhouette	0.380

Known limitations (addressed in v2.0)

Only K-Means clustering (no algorithm comparison)

Simple log-transform (no Yeo-Johnson/Box-Cox)

No clustering tendency test (Hopkins Statistic)

No outlier handling strategy for clustering
