FARDEEN-785/credit-risk-prediction

Credit Risk Prediction System (MLOps)

A production-grade end-to-end Machine Learning project for predicting credit default risk, built with modern MLOps practices.
The system covers the full ML lifecycle: data processing, model training, experiment tracking, model registry, and API-based deployment.


📌 Project Overview

This project aims to predict whether a loan applicant is likely to default based on demographic, financial, and credit history features.
Multiple models were trained and evaluated, with the best-performing model deployed as a REST API.


🧠 Key Features

  • End-to-end ML pipeline (data → model → API)
  • Feature engineering and preprocessing
  • Model comparison and evaluation
  • Experiment tracking and model versioning
  • Production-ready inference API

🛠️ Tech Stack & Skills

  • Programming: Python
  • Data Processing: Pandas, NumPy
  • Machine Learning: Scikit-learn, XGBoost
  • MLOps: MLflow (tracking, registry)
  • API: FastAPI, Uvicorn
  • Evaluation: ROC-AUC, Precision, Recall

📊 Models Used

  • Logistic Regression (baseline)
  • XGBoost Classifier (final model)

XGBoost achieved higher recall (~71%) than the Logistic Regression baseline, making it more effective at identifying high-risk borrowers.
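To make the evaluation metrics concrete, here is a minimal, dependency-free sketch of how precision, recall, and ROC-AUC are defined for the positive class (1 = default). The labels and scores are toy values made up for illustration, not results from this project, and the helper names `precision_recall` and `roc_auc` are hypothetical. In practice, scikit-learn's `precision_score`, `recall_score`, and `roc_auc_score` compute the same quantities.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random defaulter outscores a random
    non-defaulter, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted default probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1, 0.7, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} roc_auc={auc:.2f}")
```

Recall here is the share of actual defaulters that the model flags, which is why it is the headline metric when comparing the two models.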


🧪 Experiment Tracking

MLflow is used to:

  • Track experiments and metrics
  • Compare multiple models
  • Register the best-performing model

Note: MLflow artifacts (mlruns/) are generated locally at runtime and are excluded from version control.
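The three steps above can be sketched roughly as follows. This is not the code in `src/models/train.py`; the helper names (`select_best`, `log_run`), the experiment name `credit-risk`, and the artifact name `model` are all assumptions for illustration, and the sketch assumes both models are scikit-learn-compatible estimators. Registering a model also assumes your MLflow tracking backend supports the model registry.

```python
from typing import Any, Mapping, Optional


def select_best(results: Mapping[str, Mapping[str, float]], metric: str = "recall") -> str:
    """Return the name of the model with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])


def log_run(
    name: str,
    model: Any,
    params: Mapping[str, Any],
    metrics: Mapping[str, float],
    register_as: Optional[str] = None,
) -> None:
    """Log one training run to MLflow and optionally register its model."""
    # Imported lazily so select_best can be used without MLflow installed.
    import mlflow
    import mlflow.sklearn

    mlflow.set_experiment("credit-risk")  # hypothetical experiment name
    with mlflow.start_run(run_name=name):
        mlflow.log_params(dict(params))    # hyperparameters for this run
        mlflow.log_metrics(dict(metrics))  # e.g. recall, precision, roc_auc
        # Passing registered_model_name also creates a new registry version.
        mlflow.sklearn.log_model(model, "model", registered_model_name=register_as)
```

One way to use this: evaluate every model first, call `select_best` on the collected metrics, then call `log_run` for each model and pass `register_as` only for the winner. Runs land in the local `mlruns/` directory unless a tracking URI is configured.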


🚀 Project Structure

Credit_risk_2.0/
│
├── src/
│   ├── data/        # Data loading & preprocessing
│   ├── features/    # Feature engineering
│   ├── models/      # Training & evaluation scripts
│   └── api/         # FastAPI inference service
│
├── notebooks/       # Exploration & experiments
├── requirements.txt
├── README.md
└── .gitignore


🔮 Running the Project

Run the commands below from the repository root.

1. Install dependencies

pip install -r requirements.txt

2. Train models & log experiments

python src/models/train.py

3. Start the API server

uvicorn src.api.app:app --reload

4. Test the API

Open the interactive API docs in your browser:

http://127.0.0.1:8000/docs
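Besides the browser, the API can be called from a script. The sketch below uses only the Python standard library. The `/predict` route and the applicant field names are hypothetical placeholders, since this README does not specify them; check the `/docs` page for the real route and request schema before using it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address of the local uvicorn server


def build_predict_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a hypothetical /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",  # hypothetical route; confirm it in /docs
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (server must be running)."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical applicant payload; real field names come from the API schema.
example_applicant = {"age": 35, "income": 52000, "loan_amount": 12000}
req = build_predict_request(example_applicant)
print(req.get_method(), req.full_url)
```

With the server from step 3 running, `predict(example_applicant)` would send the request and return the decoded JSON response.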
🔑 Key Insight

Gradient-boosted models captured non-linear patterns in the credit data better than the linear baseline, improving recall for default prediction (about 71% with XGBoost). Recall is a critical metric in real-world credit risk systems, where a false negative is a defaulter who was approved.

📈 Future Improvements

  • Data drift and model monitoring
  • Automated retraining pipelines
  • CI/CD for ML workflows
  • Dockerized deployment

👤 Author

Fardeen
Aspiring AI/ML Engineer | Interested in applied ML, MLOps, and data-driven systems
