This project is an end-to-end machine learning pipeline that detects fraudulent credit card transactions. I built it to practice data preprocessing, handling imbalanced data with SMOTE, and comparing the performance of different classification models: Logistic Regression, Random Forest, and XGBoost.
The project uses the "Credit Card Fraud Detection" dataset from Kaggle.
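Once the CSV is downloaded, a quick look at the label distribution shows how imbalanced the data is. The sketch below is only an illustration: it assumes the label column is named `Class` with `1` marking fraud (as in the commonly used Kaggle file), and the helper name `class_balance` and the tiny in-memory sample are made up for the example.

```python
import csv
import io
from collections import Counter


def class_balance(rows, label_column="Class"):
    """Count rows per label and report the share of the fraud class.

    Assumes each row is a mapping and that fraud is labelled "1".
    """
    counts = Counter(row[label_column].strip() for row in rows)
    total = sum(counts.values())
    fraud = counts.get("1", 0)
    return {
        "total": total,
        "fraud": fraud,
        "fraud_share": fraud / total if total else 0.0,
    }


# Tiny made-up sample standing in for the real CSV (illustration only).
sample_csv = io.StringIO("Amount,Class\n12.50,0\n3.99,0\n250.00,1\n48.10,0\n")
summary = class_balance(csv.DictReader(sample_csv))
print(summary)
```

For the real data, the same function can be fed a `csv.DictReader` opened on the downloaded file. The exact file name inside data/ depends on how the download was saved.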
- Complete machine learning pipeline for fraud detection
- Handles extreme class imbalance using SMOTE and imbalance-aware evaluation metrics (precision, recall, F1-score)
- Compares multiple algorithms: Logistic Regression, Random Forest, and XGBoost
- Includes preprocessing, model training, and evaluation
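To make the SMOTE step less abstract, here is a minimal, standard-library sketch of its core idea: a synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. This is a simplified illustration, not the code from the notebook and not a library implementation. The function name `smote_like_oversample` and the sample points are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up 2-D minority (fraud) samples, for illustration only.
fraud_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote_like_oversample(fraud_points, n_new=6, k=2)
print(len(new_points))
```

Oversampling of this kind is normally applied to the training split only, so that no synthetic points end up in the test set used for evaluation. In practice a library implementation such as `SMOTE` from imbalanced-learn is the usual choice.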
credit-card-fraud-detection/
├── images/            # Visualizations
├── models/            # Trained models (.pkl files)
├── notebooks/         # Jupyter notebook with full implementation
├── .gitignore
├── README.md
└── requirements.txt   # Dependencies
- Clone the repository:

      git clone https://github.com/arshadmurtaza03/credit-card-fraud-detection.git
      cd credit-card-fraud-detection

- Create and activate a virtual environment:

      # Linux/macOS
      python3 -m venv venv
      source venv/bin/activate

      # Windows
      python -m venv venv
      venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Download the dataset from Kaggle and place it in the data/ directory.

- Run the Jupyter notebook:

      jupyter notebook notebooks/fraud_detection.ipynb
All the code and analysis can be found in the Jupyter Notebook inside the notebooks/ directory.
The models were evaluated on their recall for the fraud class (the share of fraudulent transactions correctly identified) while maintaining reasonable precision (the share of flagged transactions that are actually fraudulent).
| Model | Precision (Fraud) | Recall (Fraud) | F1-Score (Fraud) |
|---|---|---|---|
| Logistic Regression | 0.058 | 0.918 | 0.109 |
| Random Forest | 0.871 | 0.827 | 0.848 |
| XGBoost | 0.728 | 0.847 | 0.783 |
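The F1-score column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short sketch below does that for the reported values. The helper name `f1_score` is defined here for illustration and is not taken from the notebook.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the fraud class, copied from the table.
reported = {
    "Logistic Regression": (0.058, 0.918, 0.109),
    "Random Forest": (0.871, 0.827, 0.848),
    "XGBoost": (0.728, 0.847, 0.783),
}

recomputed = {}
for model, (precision, recall, f1_reported) in reported.items():
    recomputed[model] = f1_score(precision, recall)
    print(f"{model}: F1 = {recomputed[model]:.3f} (reported {f1_reported:.3f})")
# Logistic Regression: F1 = 0.109 (reported 0.109)
# Random Forest: F1 = 0.848 (reported 0.848)
# XGBoost: F1 = 0.783 (reported 0.783)
```

The numbers also show the trade-off: Logistic Regression catches the most fraud (recall 0.918), but with a precision of 0.058 roughly 94% of the transactions it flags are false alarms, which is why its F1-score is far below the two tree-based models.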
- Author: Arshad Murtaza
- GitHub: https://github.com/arshadmurtaza03/credit-card-fraud-detection.git
- Email: arshadmurtaza2016@gmail.com