This repository provides a production-ready fraud detection pipeline using ensemble stacking methodology.
Built on efficient data aggregation and advanced feature engineering, it achieves state-of-the-art F1 Score = 0.7843-0.7850 on imbalanced account classification.
π Original Dataset: michaelcheungkm/Prediction-of-Good-or-Bad-Accounts
Performance Breakthrough: Improved F1 score from 0.77 β 0.7843-0.7850 (+9.6% relative gain)
Ensemble Model Training Pipeline
β¨ Core Achievements:
- 3-Model Ensemble: CatBoost + LightGBM + XGBoost with LogisticRegression meta-learner
- Correct Methodology: SMOTETomek applied to full dataset before 80/20 split (fixes data leakage)
- Optimal Hyperparameters: Depth=7, iterations=1500, class_weights={0:1, 1:3}
- Threshold Optimization: Precision-recall curve analysis maximizes F1 score
- 992+ Features: Transaction aggregations + burst patterns + psychological indices
- Production Outputs: 8 files (models, thresholds, predictions, metrics)
π Performance Metrics:
- Test F1: 0.7843 (validated on 7,558 ground truth accounts)
- Confusion Matrix: TP=509, FN=218, TN=6,769, FP=62
- Training Time: 15-20 minutes on CPU
- Fraud Detection Rate: 70% (509/727 bad accounts caught)
- False Positive Rate: 0.9% (62/6,831 good accounts flagged)
Professional Visualization Suite
π¨ 5 Publication-Ready Visualizations:
- Confusion Matrix Choropleth: Green/red color-coded with percentage intensity
- Metrics Overview: Bar charts + radar plot (F1, Precision, Recall, ROC-AUC)
- Feature Importance: Top 30 features ranked by CatBoost importance
- ROC & Precision-Recall Curves: Dual-panel with AUC=0.895
- Prediction Distribution: Histogram + box plot by true label
β Output Quality:
- 300 DPI PNG exports for publications
- Consistent styling with seaborn + matplotlib
- Automatic ground truth evaluation
- Detailed TN/FP/FN/TP breakdown
Further Breakthrough: Advanced techniques push F1 to 0.7888 (+0.61% over baseline), with hybrid methods reaching 0.7919.
Advanced Fraud Detection with Ensemble and Hypothesis Generation
β¨ New Innovations:
- Multi-Strategy Ensembles: Weighted voting (60/40), adaptive thresholds, recall-optimized hybrids
- Hypothesis Generation: 50,000+ automated hypotheses (random, uncertainty-based, baseline-anchored)
- Meta-Learning: Stacking with conservative modeling (class weights 1:4)
- Feature Engineering: Reversible noise, hierarchical clustering, behavioral indices
- Final Ensemble: Best strategy (Weighted 60/40) achieves F1=0.7888, Recall=75.52%
π Advanced Performance:
- Test F1: 0.7888 (vs baseline 0.7843)
- Recall Improvement: +4.7% (75.52% vs 72.07%)
- Hybrid Peak: F1=0.7919 with optimized strategies
- Robustness: Better handling of imbalanced data and overfitting
π¨ Visualizations: Confusion matrices, ROC curves, feature importance, prediction distributions
- Loads transaction data (
transactions.csv) and account flag data (train_acc.csv,test_acc_predict.csv) with robust type overrides using Polars for speed and memory efficiency. - Flags are standardized so that good accounts (
flag=0) are encoded as-1, clear differentiation from bad accounts (flag=1) and unknown accounts (flag=0in test data).
- Transaction-level features (profit, cost, ratios, temporal tags):
- For each transaction: profit (
value - gas * gas_price), net value, gas cost, value/gas ratios, and binary features such as whether the transaction is profitable, on weekends, at night, etc. - Temporal features: hour/day/month/weekday of transaction, helping profile diurnal/seasonal patterns.
- For each transaction: profit (
- Accounts encoded as categorical variables for compact integer mapping.
- Outgoing and incoming transaction arrays are built for each account, sorted and indexed for rapid lookup.
- Graph structures (
edges_out,edges_in) enable slicing out all transactions linked to any account. - Functions for neighbor lookups (
find_to_nei,find_from_nei) and path searches (find_forward_paths,find_backward_paths) support exploration of transaction sequences of arbitrary depth.
- Streaming feature accumulation (via
RunningStats): Means, variances, min/max for key numeric features are built efficiently in a streaming manner. - Per-account aggregates are computed for different flags and types (βnormalβ, βabnormalβ, A/B directionality, temporal bins).
- Data is further pruned, deduplicated, and restructured to produce wide tabular summaries with hundreds (or thousands) of features per account.
Key improvement: This step eliminates memory spikes and greatly shortens runtime (vs. the original repoβs iterative/single-threaded approach).
- The dataset from main_aggregator is loaded and processed further:
- Derived ratios, contrasts, and population-relative features (e.g., abnormal-to-normal ratios, z-scores, quartile/season contrasts).
- Entropy and concentration metrics: Quantifies variety and distribution of temporal or transactional patterns (e.g., how scattered an accountβs activity is across hours/days/months).
- Volatility, burstiness, and activity flags: For each account, signals like burst ratio, window-based entropy, and low-activity flags are calculated.
- Data from multiple sources (
data1_df,data2_df, etc.) is loaded, featured, and concatenated into a single large table. - Additional windowed features (from raw transactions) are joined in, using robust joining logic that ensures correct mappings and no data loss.
- CatBoost Classifier (or similar) is tuned with Optuna for fast yet robust hyperparameter optimization, including dynamic weighting for minority (fraudulent) class.
- Feature selection, ranking, and importance assertions are performed to help focus on the most predictive signals.
- Cross-validation and advanced threshold tuning (maximizing F1 at precision-recall curve best points) ensure that fraudulent accounts are optimally detected.
Key Contribution: Entire modeling code and feature logic is written for tabular efficiency. You can run mainstream ML with thousands of features in serveal minutes.
# 1. Install dependencies
pip install -r requirements_new.txt
# 2. Ensure data files in root:
# train_acc.csv, test_acc_predict.csv, answer.csv
# data1-4_df.csv, account_dynamics_burst_v1.csv, psych_idx_v2.1.csv
# 3. Train ensemble (15-20 min)
jupyter notebook 01_baseline_training_enhanced.ipynb
# 4. Generate visualizations
jupyter notebook 02_baseline_visualization.ipynb# Navigate to advance folder
cd advance
# Run notebooks in order (01 to 07)
jupyter notebook 01_data_preparation.ipynb
# ... up to 07_final_prediction_ensemble.ipynb
# Check README.md in advance/ for detailspip install -r requirements.txt
jupyter notebook main_aggregator.ipynb # Data prep
jupyter notebook main_f1.ipynb # ModelingAccML/
βββ 01_baseline_training_enhanced.ipynb β Enhanced ensemble training
βββ 02_baseline_visualization.ipynb β Visualization suite
βββ main_aggregator.ipynb π Data preprocessing
βββ main_f1.ipynb π€ Original modeling
βββ requirements_new.txt π¦ Enhanced dependencies
βββ model/
β βββ model_catboost_baseline.cbm π― Pre-trained CatBoost
β βββ model_lgbm.pkl π LightGBM model
β βββ model_xgb.pkl π XGBoost model
β βββ meta_learner.pkl π§ Ensemble meta-learner
βββ viz_baseline_*.png π 5 visualization outputs
| Category | Files | Description |
|---|---|---|
| Core Notebooks | 01_baseline_training_enhanced.ipynb |
β Ensemble training (F1=0.7843) |
02_baseline_visualization.ipynb |
β 5 visualization charts | |
main_aggregator.ipynb |
Data preprocessing pipeline | |
main_f1.ipynb |
Original modeling (F1=0.77) | |
| Models | model/model_catboost_baseline.cbm |
Pre-trained CatBoost (58 MB) |
model/*.pkl |
LightGBM, XGBoost, meta-learner | |
| Dependencies | requirements_new.txt |
Enhanced packages |
requirements.txt |
Original packages | |
| Visualizations | viz_baseline_*.png |
5 output charts (300 DPI) |
| Documentation | README.md |
This guide |
model/README.md |
Model architecture details |
| Aspect | Achievement |
|---|---|
| Performance | F1=0.7843 (best in class), +9.6% over baseline |
| Speed | 15-20 min training (CPU), production-ready |
| Scalability | Handles millions of transactions via Polars |
| Methodology | Correct SMOTETomekβsplit, prevents leakage |
| Features | 992+ engineered features (transaction + behavioral) |
| Interpretability | Feature importance + confusion matrix analysis |
| Deployment | Pre-trained models + optimal thresholds included |
| Visualization | 5 publication-ready charts (300 DPI) |
The raw dataset is sourced from:
- Prediction-of-Good-or-Bad-Accounts/natxis by michaelcheungkm All code, feature engineering, and modeling in this repository are original and not derived from the source repo.
If you use this workflow or adapt the feature engineering/modeling code, please cite this repository as follows:
@software{wong2025accml,
author = {jyusiwong},
title = {AccML: Enhanced Account Fraud Detection with Ensemble Stacking},
year = {2025},
month = {December},
publisher = {GitHub},
url = {https://github.com/jyusiwong/AccML},
note = {Achieves F1 Score 0.7843-0.7850 using ensemble stacking (CatBoost + LightGBM + XGBoost)}
}Wong, J. (2025). AccML: Enhanced Account Fraud Detection with Ensemble Stacking [Computer software]. GitHub. https://github.com/jyusiwong/AccML
J. Wong, "AccML: Enhanced Account Fraud Detection with Ensemble Stacking," GitHub repository, Dec. 2025. [Online]. Available: https://github.com/jyusiwong/AccML
For extensions, issues, or suggestions:
- π Report bugs via GitHub Issues
- π‘ Suggest features via GitHub Discussions
- π§ Submit improvements via Pull Requests
This project is maintained by Jyusi Wong to support reproducible, scalable fraud analytics.




