Welcome to the IEEE Fraud Detection Project! This project leverages machine learning techniques to detect fraudulent e-commerce transactions using real-world data from the IEEE-CIS Fraud Detection competition.
- Overview
- Features
- Project Files & Notebooks
- Installation
- Local Development
- API Usage
- Dockerization & Deployment
- Testing
- Future Improvements
## Overview

This project implements a full-stack fraud detection solution using Python. It includes:

- **Data Preprocessing & Feature Engineering:** Handling missing values, extracting time features, grouping email domains, processing address and distance information, and aggregating binary flags (see the sketch after this list).
- **Machine Learning Models:** Multiple models were developed and compared (XGBoost, LightGBM, Random Forest, and a Neural Network prototype), with the boosting models (XGBoost/LightGBM) achieving strong AUC (up to 0.949) and balanced precision/recall performance.
- **API Backend:** A FastAPI backend serves predictions. It processes incoming JSON data (transaction and identity tables), applies preprocessing and feature engineering, and returns a fraud probability.
- **Frontend:** A simple Streamlit-based frontend allows users to input data and view predictions, demonstrating an end-to-end solution.
- **Dockerization & Deployment:** The application is containerized using Docker and deployed on Google Cloud Run, making it available online: https://fraud-detection-frontend-x2ugjgse3q-uc.a.run.app
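As a rough illustration of the preprocessing steps listed above, here is a minimal sketch of time-feature extraction and email-domain grouping. The column names `TransactionDT` and `P_emaildomain` come from the IEEE-CIS dataset; the function names and the provider mapping are illustrative assumptions, and the project's actual logic lives in data_processing.py and feature_engineering.py.

```python
import pandas as pd

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive hour-of-day and day-of-week from the TransactionDT offset (seconds)."""
    df = df.copy()
    df["hour"] = (df["TransactionDT"] // 3600) % 24
    df["dayofweek"] = (df["TransactionDT"] // (3600 * 24)) % 7
    return df

def group_email_domain(df: pd.DataFrame, col: str = "P_emaildomain") -> pd.DataFrame:
    """Collapse raw email domains into a small set of provider groups."""
    df = df.copy()
    provider_map = {
        "gmail.com": "google",
        "googlemail.com": "google",
        "yahoo.com": "yahoo",
        "ymail.com": "yahoo",
        "hotmail.com": "microsoft",
        "outlook.com": "microsoft",
    }
    df[col + "_group"] = df[col].map(provider_map).fillna("other")
    return df
```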
## Features

- **Robust Data Processing:** Handles a variety of feature types (numerical, categorical, binary) and performs extensive feature engineering.
- **Modeling:** Implements gradient boosting models (XGBoost and LightGBM) with competitive performance and an initial Random Forest baseline.
- **End-to-End Pipeline:** From data ingestion to API-based inference, ensuring consistency across training and production (a rough sketch of the wiring follows this list).
- **Interactive Frontend:** A Streamlit-based UI for demoing predictions interactively.
- **Production-Ready Deployment:** Dockerized application deployed on Google Cloud Run.
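The end-to-end wiring can be pictured roughly as follows: the /predict endpoint rebuilds the raw tables from the request, reuses the training-time preprocessing, and scores with the trained booster. This is only a sketch; the helper names in the comments and the response key are hypothetical, and the real entry point is app/backend/main.py (see the uvicorn command under Local Development).

```python
import pandas as pd
from fastapi import FastAPI

app = FastAPI()

@app.post("/predict")
def predict(payload: dict) -> dict:
    # Rebuild the two raw tables from the request body.
    transaction = pd.DataFrame([payload["transaction_table"]])
    identity = pd.DataFrame([payload["identity_table"]])

    # In the real service, the same preprocessing/feature engineering used at
    # training time would run here, followed by the trained model, e.g.:
    #   features = engineer_features(merge_tables(transaction, identity))
    #   probability = float(model.predict_proba(features)[:, 1][0])
    probability = 0.0  # placeholder so this sketch runs standalone

    return {"fraud_probability": probability}
```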
## Project Files & Notebooks

The project is organized into several key files and folders to facilitate development and experimentation:

- **data_processing.py**: Responsible for loading, merging, and orchestrating data processing, along with related functions.
- **feature_engineering.py**: Contains methods for encoding, transforming features, and other feature engineering techniques.
- **EDA.ipynb**: A Jupyter notebook for Exploratory Data Analysis (EDA) of the raw data before processing.
- **FeatureEngineering.ipynb**: Explores feature engineering in detail, including close-up analysis of individual features and the feature importances from the applied models.
- **ModelDevelopment.ipynb**: Notebook for training various models and comparing their performance.
- **helpers.py**: A collection of helper functions used throughout the project (a hedged sketch of the model save/load helpers follows this list).
- **models/**: Contains model-related Python files and a config.py file that stores model configurations.
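helpers.py is described later (under Testing) as covering model saving/loading and evaluation. Below is a minimal sketch of what such helpers might look like, assuming joblib for persistence; the function names are hypothetical, not the repository's exact API.

```python
from pathlib import Path

import joblib

def save_model(model, path: str) -> None:
    """Persist a fitted model to disk, creating parent directories as needed."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)

def load_model(path: str):
    """Restore a previously saved model."""
    return joblib.load(path)
```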
## Installation

- **Clone the Repository:**

  ```bash
  git clone https://github.com/elnurisg/ieee-fraud-detection.git
  ```

- **Set Up Virtual Environment:**

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- **Install Dependencies:**

  ```bash
  pip install --upgrade pip
  pip install -r requirements.txt
  ```

## Local Development

**Running the Backend:**
Navigate to the project root and run:

```bash
uvicorn app.backend.main:app --reload
```

The API will be available at http://localhost:8000.

**Running the Frontend:**
Navigate to the app/frontend directory and run:

```bash
cd app/frontend
streamlit run app.py
```

This opens a browser window with the Streamlit app.

**Running Tests:**
From the project root, run:

```bash
pytest
```

## API Usage

### Endpoints
- **GET /**
  Returns a welcome message.
- **GET /health**
  Health check endpoint that returns the status of the API.
- **POST /predict**
  Accepts a JSON payload with two keys: `transaction_table` and `identity_table` (see the example request after this list).

  Example payload:

  ```
  {
      "transaction_table": { ... },
      "identity_table": { ... }
  }
  ```

  Response: Returns the predicted fraud probability.
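An example request from Python, assuming the backend is running locally on port 8000. The two tables are left empty here because their fields follow the IEEE-CIS schema (e.g., TransactionAmt, card1, ...) and are elided in the payload above; fill them in with real values before calling.

```python
import requests

payload = {
    "transaction_table": {},  # populate with transaction fields from the IEEE-CIS schema
    "identity_table": {},     # populate with identity fields
}

response = requests.post("http://localhost:8000/predict", json=payload)
response.raise_for_status()
print(response.json())  # the predicted fraud probability
```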
## Dockerization & Deployment

**Dockerization:**
The project is containerized using a Dockerfile located at the root of the repository. To build and run locally:

```bash
docker build -t fraud-api .
docker run -p 8000:8000 fraud-api
```

**Deployment:**
The application is deployed on Google Cloud Run. Use the provided deploy.sh script to build, push, and deploy your container:

```bash
bash deploy.sh
```

## Testing

Unit tests are written using pytest and are located in the tests/ directory. They cover:
- Data processing and merging
- Feature engineering functions
- Helper utilities for model saving/loading and evaluation
- API endpoints using FastAPI’s TestClient (a minimal example is sketched below)
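Here is a minimal example of the API-test style mentioned in the last bullet, using FastAPI's TestClient. The import path follows the uvicorn command above (app.backend.main:app); the test names and assertions are illustrative, not the repository's actual tests.

```python
# tests/test_api_example.py (illustrative)
from fastapi.testclient import TestClient

from app.backend.main import app

client = TestClient(app)

def test_root_returns_welcome_message():
    response = client.get("/")
    assert response.status_code == 200

def test_health_check():
    response = client.get("/health")
    assert response.status_code == 200
```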
To run the tests, execute:

```bash
pytest
```

## Future Improvements
- **Model Tuning & Ensembling:** Further optimize hyperparameters and possibly ensemble multiple models (e.g., stacking XGBoost and LightGBM).
- **Advanced Feature Engineering:** Explore additional feature interactions, frequency encoding, and domain-specific transformations (a frequency-encoding sketch follows this list).
- **Neural Network Models:** Experiment with MLPs or more advanced neural architectures for tabular data.
- **Enhanced Frontend:** Expand the Streamlit app with more interactive visualizations and a more user-friendly interface.
- **CI/CD & Monitoring:** Implement CI/CD (e.g., with GitHub Actions) and integrate monitoring/logging for production readiness.
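For the frequency-encoding idea above, a minimal pandas sketch (the function is not part of the current codebase, and the column name in the usage comment is just an example from the IEEE-CIS schema):

```python
import pandas as pd

def frequency_encode(train: pd.DataFrame, test: pd.DataFrame, col: str) -> None:
    """Add a column recording how often each category occurs in the training set."""
    freq = train[col].value_counts(normalize=True)
    train[col + "_freq"] = train[col].map(freq)
    test[col + "_freq"] = test[col].map(freq).fillna(0)

# Usage example (column name is illustrative):
# frequency_encode(train_df, test_df, "card1")
```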