This project focuses on detecting fraudulent credit card transactions using machine learning techniques.
The primary objective is to identify fraud cases in a highly imbalanced dataset while minimizing false negatives, a critical requirement in real-world financial systems.
- Source: Credit Card Fraud Detection Dataset (Kaggle): https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- Description:
  - Transactions made by European cardholders in September 2013
  - Highly imbalanced dataset
  - Features `V1`–`V28` are PCA-transformed for confidentiality
- Target variable: `Class`
  - `0` = Non-Fraud
  - `1` = Fraud
- Performed Exploratory Data Analysis (EDA) to understand:
  - Class imbalance
  - Feature distributions
- Applied feature scaling to `Time` and `Amount`
- Used stratified train-test split to preserve class distribution
- Focused on Recall and ROC-AUC as primary evaluation metrics
- Performed threshold tuning to balance recall and precision for fraud detection
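The scaling, stratified split, and threshold tuning steps above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code. The data is synthetic (two columns standing in for `Time` and `Amount`), and the `target_recall` value and the "best precision at a minimum recall" rule are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): roughly 2% positives,
# with two unscaled columns playing the role of Time and Amount.
n = 5000
y = (rng.random(n) < 0.02).astype(int)
time = rng.uniform(0, 172_800, size=n)           # arbitrary range in seconds
amount = rng.exponential(50, size=n) + 200 * y   # positives skew larger by construction
X = np.column_stack([time, amount])

# Stratified split: the fraud ratio stays nearly identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train_s, y_train)
scores = model.predict_proba(X_test_s)[:, 1]

# Threshold tuning: among thresholds that reach the target recall,
# pick the one with the highest precision.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
target_recall = 0.9  # hypothetical requirement, not taken from the project
meets_target = recall[:-1] >= target_recall  # last curve point has no threshold
best = np.argmax(np.where(meets_target, precision[:-1], -1.0))
threshold = thresholds[best]
y_pred = (scores >= threshold).astype(int)

print(f"train fraud rate={y_train.mean():.4f}  test fraud rate={y_test.mean():.4f}")
print(f"threshold={threshold:.3f}  precision={precision[best]:.3f}  recall={recall[best]:.3f}")
```

For brevity the threshold is chosen on the test split here; in practice a separate validation split would usually be used for that step, so the reported test metrics stay unbiased.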
- Logistic Regression (baseline)
- Random Forest
- XGBoost (primary model)
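As a rough sketch of how such a model comparison might look, the snippet below fits the first two model families on synthetic imbalanced data and reports ROC-AUC and fraud-class recall. The data, hyperparameters, and the use of `class_weight="balanced"` are assumptions for illustration. XGBoost is left out to keep the example free of extra dependencies; an `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way, with `scale_pos_weight` as its usual way of handling imbalance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): about 3% positives.
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, weights=[0.97, 0.03],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights the rare class during training.
models = {
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "roc_auc": roc_auc_score(y_test, scores),
        "recall": recall_score(y_test, model.predict(X_test)),
    }

for name, metrics in results.items():
    print(f"{name}: ROC-AUC={metrics['roc_auc']:.3f}  recall={metrics['recall']:.3f}")
```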
Due to the imbalanced nature of the dataset, the following metrics were prioritized:
- ROC-AUC Score
- Recall (Fraud Class)
- Precision
- F1-Score
- Confusion Matrix
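Taken together, these metrics show how well the model identifies fraudulent transactions while keeping false positives under control. Plain accuracy is misleading here, because a model that labels every transaction as non-fraud would still score very high on it.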
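To make the label-based metrics concrete, here is a small dependency-free sketch that derives recall, precision, and F1 from the confusion-matrix counts. The function names and the ten-transaction toy example are hypothetical. ROC-AUC is not included because it is computed from predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def fraud_metrics(y_true, y_pred):
    """Recall, precision, and F1 for the fraud class (label 1)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # share of frauds caught
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # share of alerts that are fraud
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy example: 4 frauds, of which 3 are caught, plus 2 false alarms.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = fraud_metrics(y_true, y_pred)
print(counts)   # (4, 2, 1, 3)
print(metrics)  # recall 0.75, precision 0.6, F1 about 0.667
```

The single missed fraud (the false negative) lowers recall to 0.75, which is the quantity this project cares about most.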