📊 Customer Churn Prediction – MLOps Pipeline
This project implements an end-to-end MLOps workflow for a customer churn prediction model using DVC, MLflow, GitHub Actions, and FastAPI.
It predicts whether a customer will churn (leave the service) based on historical tabular data.
⸻
🚀 Features Implemented
✅ Data validation using Pandera
✅ Model training and evaluation with scikit-learn
✅ Experiment tracking with MLflow
✅ API serving using FastAPI
✅ Version control with Git
✅ Data & model tracking with DVC
✅ Remote storage setup with Google Drive (DVC Remote)
✅ CI/CD automation with GitHub Actions
⸻
🏗 Project Structure
customer_churn_mlops/
│
├── data/                    # Raw & processed datasets (tracked with DVC)
│   ├── raw/
│   └── processed/
│
├── models/                  # Stored models (tracked with DVC)
│
├── src/                     # Source code
│   ├── data_validation.py   # Pandera validation schemas
│   ├── train_model.py       # Training script with MLflow logging
│   └── serve_api.py         # FastAPI app for predictions
│
├── .dvc/                    # DVC configuration
├── dvc.yaml                 # DVC pipeline stages
├── dvc.lock                 # Locked pipeline stages
├── requirements.txt         # Python dependencies
├── README.md                # Project documentation
└── .github/workflows/       # GitHub Actions CI/CD pipelines
⸻
⚙ Installation & Setup
1️⃣ Clone the Repository
git clone https://github.com//customer_churn_mlops.git
cd customer_churn_mlops
2️⃣ Create a Virtual Environment
python -m venv .dvc_env
.\.dvc_env\Scripts\activate    # Windows
source .dvc_env/bin/activate   # Linux/Mac
3️⃣ Install Dependencies
pip install -r requirements.txt
4️⃣ Set Up DVC Remote
dvc push   # Upload data & models to remote
📦 Running the Pipeline
Reproduce the ML pipeline:
dvc repro
Show metrics:
dvc metrics show
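`dvc metrics show` reads metrics files (for example JSON) that are declared as metrics in `dvc.yaml`. The following is a minimal sketch of how a training script could produce such a file. The file name `metrics.json`, the metric name `accuracy`, and both helper functions are assumptions for illustration, not taken from this repository.

```python
import json
from pathlib import Path


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def write_metrics(metrics: dict[str, float], path: str = "metrics.json") -> Path:
    """Write metrics as JSON so a DVC metrics entry can track the file.

    The default path is hypothetical; use whatever dvc.yaml declares.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")
    return out
```

A training script might end with something like `write_metrics({"accuracy": accuracy(y_test, y_pred)})`, where `y_test` and `y_pred` are the held-out labels and model predictions.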
🔎 Model Training & Tracking
Train the model manually:
python src/train_model.py
View experiments in MLflow UI:
mlflow ui
🌐 API Serving
Run FastAPI server:
uvicorn src.serve_api:app --reload
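Then test it in a browser or with cURL. By default, uvicorn listens on http://127.0.0.1:8000 and FastAPI serves interactive API docs at http://127.0.0.1:8000/docs.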
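A request can also be sent from Python using only the standard library. This is a sketch under assumptions: the `/predict` path, the POST method, and the JSON body shape are hypothetical, so check `src/serve_api.py` for the real route and input schema.

```python
import json
import urllib.request

# Assumed endpoint: uvicorn's default host and port plus a hypothetical /predict route.
API_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(features: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one customer's features."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    request = build_predict_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict({...})` would be called with whatever feature names the API expects. The feature names are not listed here because they depend on the dataset.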
⚡ CI/CD with GitHub Actions
This repo includes a GitHub Actions workflow that:
1. Installs dependencies
2. Pulls dataset & models from Google Drive (DVC Remote)
3. Reproduces the pipeline (dvc repro)
4. Shows metrics (dvc metrics show)
Check it under: 👉 GitHub → Actions tab
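The four workflow steps can be mirrored locally to debug a failing run. The script below is only a sketch of that sequence in Python. The real workflow is a YAML file under `.github/workflows/` and is not reproduced here, and the names `PIPELINE_STEPS` and `run_steps` are hypothetical.

```python
import subprocess
import sys

# Same order as the workflow description: install, pull, reproduce, show metrics.
PIPELINE_STEPS = [
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    ["dvc", "pull"],
    ["dvc", "repro"],
    ["dvc", "metrics", "show"],
]


def run_steps(steps: list[list[str]] = PIPELINE_STEPS, dry_run: bool = False) -> list[str]:
    """Run each command in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    executed = []
    for cmd in steps:
        line = " ".join(cmd)
        print(f"$ {line}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed
```

Calling `run_steps(dry_run=True)` prints the commands without touching the environment, which is a quick way to confirm the order before running them for real.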
🛠 Tools & Tech Stack
• Python 3.11
• scikit-learn – Model training
• Pandera – Data validation
• MLflow – Experiment tracking
• FastAPI – Model serving
• DVC – Data & model versioning
• GitHub Actions – CI/CD automation
• Google Drive – DVC remote storage
⸻
👨‍💻 Author
Developed by Moeed Abbasi