Build a machine learning model to detect fraudulent financial transactions and provide business insights.
- 6.3M transactions
- Highly imbalanced dataset
- Columns include transaction type, amount, balances
Note: The dataset is not included in this repository because of its size.
- Data cleaning & feature engineering
- Label encoding for categorical variables
- Time-based train-test split
- XGBoost model for fraud detection
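
The label-encoding and time-based split steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the field names (`step`, `type`, `amount`, `is_fraud`), the toy records, and the 80/20 cut are all hypothetical. The category mapping is fitted on the training rows only, so nothing from the later test period leaks into preprocessing.

```python
# Hypothetical toy records; the field names and values are assumptions
# for illustration, not the project's real schema or data.
transactions = [
    {"step": 3, "type": "TRANSFER", "amount": 900.0, "is_fraud": 1},
    {"step": 1, "type": "PAYMENT", "amount": 50.0, "is_fraud": 0},
    {"step": 2, "type": "CASH_OUT", "amount": 300.0, "is_fraud": 0},
    {"step": 5, "type": "CASH_OUT", "amount": 1200.0, "is_fraud": 1},
    {"step": 4, "type": "PAYMENT", "amount": 20.0, "is_fraud": 0},
]


def time_based_split(rows, time_key, train_fraction=0.8):
    """Sort by time and cut once, so no test row precedes a training row."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_label_encoding(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}


def encode(rows, mapping, unseen=-1):
    """Replace the 'type' string with its code; unseen categories get -1."""
    return [dict(r, type=mapping.get(r["type"], unseen)) for r in rows]


train, test = time_based_split(transactions, "step", train_fraction=0.8)

# Fit the mapping on training rows only, then apply it to both splits.
type_codes = fit_label_encoding(r["type"] for r in train)
train_encoded = encode(train, type_codes)
test_encoded = encode(test, type_codes)

print([r["step"] for r in train], [r["step"] for r in test])
print(type_codes)
```

A random split would mix later transactions into training, which can make the evaluation look better than it would be on genuinely future data. Cutting on time avoids that.
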
- Precision (Fraud): 0.93
- Recall (Fraud): 0.89
- F1-score (Fraud): 0.91
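
For reference, the three metrics above relate as follows. This is a minimal sketch in plain Python: the toy label lists are hypothetical and are not the project's predictions. The last lines check that the reported precision and recall are consistent with the reported F1-score.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = fraud), only to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(y_true, y_pred)

# Consistency check on the reported figures: P = 0.93, R = 0.89.
reported_f1 = round(f1_from(0.93, 0.89), 2)
print(reported_f1)
```
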
- Python
- Pandas, NumPy
- XGBoost
- Scikit-learn
- Matplotlib
- Fraud detection requires imbalance-aware metrics
- Tree-based models do not require feature scaling
- Business understanding is critical in feature engineering
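
The point about feature scaling can be illustrated with a single decision stump. This is a simplified sketch, not how XGBoost is implemented internally. A threshold split depends only on the ordering of feature values, so a strictly increasing transform (here `log1p` and a division by 1000) leaves the chosen partition of samples unchanged. The amounts and labels are hypothetical.

```python
import math


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_partition(values, labels):
    """Return the indices sent left by the best single-threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, len(order)):
        # A threshold can only sit between two distinct values.
        if values[order[k - 1]] == values[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


# Hypothetical amounts and fraud labels, for illustration only.
amounts = [20.0, 50.0, 300.0, 900.0, 1200.0, 75.0]
labels = [0, 0, 0, 1, 1, 0]

raw = best_split_partition(amounts, labels)
logged = best_split_partition([math.log1p(a) for a in amounts], labels)
scaled = best_split_partition([a / 1000.0 for a in amounts], labels)

print(raw == logged == scaled)
```

The threshold value itself changes with the transform, but the samples on each side of it do not, which is why tree-based models are typically trained on unscaled features.
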