
Fraud detection project for CS506 Midterm – Top 20 on Kaggle | Feature engineering, decision trees, and imbalanced data handling.


Mohitsai/credit-card-fraud-detection-kaggle-competition

CS506 Midterm – Fraud Detection

Project Overview

This project was part of the CS506 Spring 2024 Midterm, focusing on detecting fraudulent credit card transactions using machine learning. The dataset was highly imbalanced (~0.4% fraud), requiring careful feature engineering and model selection. This project was hosted on Kaggle as a private competition.

Kaggle Competition: CS506 Midterm 2024

Achievement: Placed in the Top 20 among all participants.


Key Features & Approach

  1. Exploratory Data Analysis (EDA)

    • Identified extreme class imbalance.
    • Visualized geographical clusters of fraud using geopandas.
    • Found that fraud correlates strongly with high transaction amounts.
  2. Feature Engineering

    • Added age and distance (Haversine) features.
    • Created average recent spend and fraudulent_day indicators.
    • Applied k-means clustering to user and merchant locations.
    • Performed label encoding and feature pruning using correlation analysis.
  3. Modeling & Evaluation

    • Tested Decision Tree, XGBoost, and KNN classifiers.
    • Decision Tree performed best of the three; it was also the easiest to interpret and held up well on the imbalanced data.
    • Used GridSearchCV for hyperparameter tuning.
    • Verified model stability across multiple validation splits.
  4. Results

    • Achieved a Top 20 rank on the Kaggle leaderboard.
    • Performance stayed consistent across held-out validation sets.
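
The Haversine distance feature mentioned under Feature Engineering could be computed roughly as follows. This is a minimal sketch, not the code from the notebooks: the column names `lat`, `long`, `merch_lat`, and `merch_long` are hypothetical placeholders for the cardholder and merchant coordinates, and the Earth radius constant is an assumed value.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def add_distance_feature(rows):
    """Return copies of the input dicts with a 'distance_km' key added.

    The coordinate keys used here are hypothetical column names.
    """
    out = []
    for row in rows:
        enriched = dict(row)
        enriched["distance_km"] = haversine_km(
            row["lat"], row["long"], row["merch_lat"], row["merch_long"]
        )
        out.append(enriched)
    return out


# Made-up example transaction, for illustration only.
transactions = [
    {"lat": 42.35, "long": -71.10, "merch_lat": 42.36, "merch_long": -71.06},
]
with_distance = add_distance_feature(transactions)
```

With a pandas DataFrame, the same function could be applied row-wise or rewritten with NumPy for speed; the plain-Python version is kept here so the formula stays easy to read.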

Project Files

  • explore.ipynb – Exploratory Data Analysis and visualizations.
  • starter_code.ipynb – Initial setup and model experimentation.
  • U48519832_Midterm_Report.pdf – Detailed report with methodology and findings.
  • README.md – This file summarizing the project.

Insights & Learnings

  • Handling highly imbalanced datasets is challenging and requires thoughtful feature engineering.
  • Geographical features can add predictive power if transformed meaningfully.
  • Decision Trees provided interpretable and robust performance for this task.
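
To illustrate why the imbalance is challenging, the sketch below scores a trivial "always legitimate" predictor on synthetic labels with the same ~0.4% fraud rate. The labels are made up for illustration, and the metrics shown are not necessarily the ones the competition used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the fraud class; 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Synthetic labels: 1000 transactions, 4 of them fraud (0.4%).
y_true = [1] * 4 + [0] * 996
y_pred_all_legit = [0] * 1000  # baseline that never flags fraud

baseline_accuracy = accuracy(y_true, y_pred_all_legit)
baseline_prf = precision_recall_f1(y_true, y_pred_all_legit)
```

The baseline is right on 996 of 1000 transactions yet catches no fraud at all, so accuracy alone says little about a model on data like this.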

Contact

  • Author: Mohit Sai Gutha
  • Email

© 2024 Mohit Sai Gutha | CS506 Midterm Project
