A production-grade Machine Learning project that predicts the sentiment of movie reviews (positive or negative) using Logistic Regression and Bag of Words. This project demonstrates a complete MLOps lifecycle — from data ingestion to deployment on AWS EKS with CI/CD, experiment tracking, data versioning, containerization, and monitoring.
| Aspect | Description |
|---|---|
| Problem | Classify IMDb movie reviews as positive or negative |
| Best Model | Logistic Regression with Bag of Words |
| Goal | Deploy a fully automated ML pipeline with continuous integration, versioning, and cloud deployment |
| Deployment Type | End-to-End MLOps workflow on AWS (S3, ECR, EC2, EKS) |
| Monitoring Tools | Prometheus & Grafana |
| Tracking & Versioning | MLflow, DVC, Dagshub |
| Automation | GitHub Actions CI/CD |
| Containerization | Docker |
| Orchestration | AWS EKS (Kubernetes) |
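The core model is intentionally simple. A minimal sketch of the BoW + Logistic Regression idea in scikit-learn (the CSV path, column names, and hyperparameters are illustrative, not the project's actual values):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one text column and a binary sentiment label
df = pd.read_csv("data/imdb_reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42
)

# Bag of Words: sparse token-count vectors over the training vocabulary
vectorizer = CountVectorizer(max_features=5000)
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

# Logistic Regression on the BoW features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_bow, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_bow)))
```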
Tech stack:

- Python (3.10)
- Scikit-learn
- Bag of Words (BoW)
- Logistic Regression
- Git & GitHub
- GitHub Actions — CI/CD Pipeline
- DVC (Data Version Control) — Data & Model tracking
- MLflow + Dagshub — Experiment Tracking & Model Registry
- Docker — Containerization
- AWS S3 — Artifact & Data Storage
- AWS ECR — Docker Image Registry
- AWS EC2 — Compute Instances for Hosting Prometheus & Grafana
- AWS EKS (Elastic Kubernetes Service) — Model Deployment on Kubernetes
- Prometheus & Grafana — Application Monitoring and Visualization
```
               ┌────────────────────────────┐
               │        GitHub Repo         │
               │    (Code + DVC + CI/CD)    │
               └─────────────┬──────────────┘
                             │
                  GitHub Actions (CI/CD)
                             │
                             ▼
┌─────────────┐       ┌─────────────┐       ┌───────────────┐
│  MLflow +   │◄─────►│  DVC Repo   │──────►│ AWS S3 Bucket │
│  Dagshub    │       │  (Data +    │       │ (Remote Store)│
│ (Tracking)  │       │   Models)   │       └───────────────┘
└─────────────┘       └─────────────┘
                             │
                             ▼
                    Docker Image Build
                             │
                             ▼
              Push Image → AWS ECR Repository
                             │
                             ▼
              Deploy Image → AWS EKS Cluster
                             │
                             ▼
             Monitor via Prometheus & Grafana
```
```bash
# Clone repo
git clone <repo_url>
cd <repo>

# Create and activate virtual environment
conda create -n atlas python=3.10
conda activate atlas
```
```bash
# Initialize project template
pip install cookiecutter
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
```

Rename directories: `src.models` → `src.model`.
- Create a Dagshub repo → connect it to GitHub
- Copy the MLflow experiment tracking URL and add it to the code (see the sketch after this list)
- Install dependencies:

  ```bash
  pip install dagshub mlflow
  ```

- Run experiments → push to GitHub
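A minimal sketch of pointing MLflow at the Dagshub-hosted tracking server (the repo owner/name are placeholders; the logged values are illustrative):

```python
import dagshub
import mlflow

# Wire MLflow to the Dagshub-hosted tracking server for this repo
dagshub.init(repo_owner="<dagshub-user>", repo_name="<repo>", mlflow=True)

with mlflow.start_run():
    mlflow.log_param("model", "LogisticRegression")
    mlflow.log_param("vectorizer", "BagOfWords")
    mlflow.log_metric("accuracy", 0.87)  # illustrative value, not a reported result
```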
Initialize DVC with a local remote first:

```bash
dvc init
mkdir local_s3
dvc remote add -d mylocal local_s3
```

Create these components:

```
src/
├── logger.py
├── data_ingestion.py
├── data_preprocessing.py
├── feature_engineering.py
├── model_building.py
├── model_evaluation.py
└── register_model.py
```
Add `dvc.yaml` and `params.yaml` to the project root (a sample `dvc.yaml` is sketched below).
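A sketch of how `dvc.yaml` might wire these stages together (stage granularity, paths, and parameter names are illustrative; `register_model` omitted for brevity):

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    params:
      - data_ingestion.test_size      # read from params.yaml
    outs:
      - data/raw
  data_preprocessing:
    cmd: python src/data_preprocessing.py
    deps:
      - data/raw
    outs:
      - data/interim
  feature_engineering:
    cmd: python src/feature_engineering.py
    deps:
      - data/interim
    params:
      - feature_engineering.max_features
    outs:
      - data/processed
  model_building:
    cmd: python src/model_building.py
    deps:
      - data/processed
    outs:
      - models/model.pkl
  model_evaluation:
    cmd: python src/model_evaluation.py
    deps:
      - models/model.pkl
    metrics:
      - reports/metrics.json
```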
Run the pipeline:

```bash
dvc repro
dvc push
```

Then move the remote to S3:

- Create an IAM user and an S3 bucket
- Install the S3 dependencies:

  ```bash
  pip install "dvc[s3]" awscli
  ```

- Configure the AWS CLI:

  ```bash
  aws configure
  ```

- Add the S3 remote:

  ```bash
  dvc remote add -d myremote s3://<bucket-name>
  ```
Build a simple Flask serving app (a minimal `app.py` sketch follows):

```bash
mkdir flask_app
cd flask_app
pip install flask
python app.py
```
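A minimal sketch of what `flask_app/app.py` could look like. The artifact paths, the request schema, and the `prometheus_client` metrics endpoint (which the Prometheus scrape job further down targets) are assumptions, not the project's exact code:

```python
import pickle

from flask import Flask, jsonify, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# Load the trained BoW vectorizer and classifier (paths are illustrative)
with open("models/vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)

PREDICTIONS = Counter("app_predictions_total", "Total predictions served")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"review": "some movie review text"};
    # assumes labels were encoded as 0/1 during training
    text = request.json["review"]
    features = vectorizer.transform([text])
    label = int(model.predict(features)[0])
    PREDICTIONS.inc()
    return jsonify({"sentiment": "positive" if label == 1 else "negative"})

@app.route("/metrics")
def metrics():
    # Prometheus text-format exposition for the scrape job
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```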
Generate `requirements.txt` for the Docker image:

```bash
pip install pipreqs
pipreqs . --force
```
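Next, containerize the app. A plausible `Dockerfile`, assuming the build context is `flask_app/` with the trained artifacts already copied into a `models/` subdirectory (base image and layout are assumptions):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the app code and model artifacts (assumes models/ exists in the context)
COPY . .

# The app listens on 5000 (mapped to 8888 on the host in the run command below)
EXPOSE 5000
CMD ["python", "app.py"]
```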
Build and run the image locally:

```bash
docker build -t capstone-app:latest .
docker run -p 8888:5000 -e CAPSTONE_TEST=<dagshub_token> capstone-app:latest
```

Then push the image to DockerHub or AWS ECR.
Add `.github/workflows/ci.yaml` and set up the repository secrets:
| Secret | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS Access Key |
| AWS_SECRET_ACCESS_KEY | AWS Secret |
| AWS_REGION | AWS Region (e.g., ap-south-1) |
| ECR_REPOSITORY | Docker repo name |
| AWS_ACCOUNT_ID | Your AWS account number |
| CAPSTONE_TEST | Dagshub authentication token |
The pipeline automatically:
- Builds Docker image
- Pushes it to ECR
- Deploys to EKS
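A condensed sketch of what `.github/workflows/ci.yaml` could contain (action versions, job layout, and the cluster name are illustrative):

```yaml
name: CI/CD
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Log in to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          IMAGE=${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/${{ secrets.ECR_REPOSITORY }}:latest
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Deploy to EKS
        run: |
          aws eks --region ${{ secrets.AWS_REGION }} update-kubeconfig --name flask-app-cluster
          kubectl apply -f deployment.yaml
          kubectl apply -f service.yaml
```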
Create the cluster:

```bash
eksctl create cluster --name flask-app-cluster \
  --region ap-south-1 --nodegroup-name flask-app-nodes \
  --node-type t3.small --nodes 1 --managed
```

Verify and connect:

```bash
aws eks --region ap-south-1 update-kubeconfig --name flask-app-cluster
kubectl get nodes
```

Deploy the Flask app (sample manifests are sketched below):
```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl get svc flask-app-service
```

Access the app via:

```
http://<external-ip>:5000
```
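Sample `deployment.yaml` and `service.yaml` (image URI, replica count, labels, and the Secret holding the Dagshub token are illustrative):

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: flask-app
          image: <aws-account-id>.dkr.ecr.ap-south-1.amazonaws.com/<ecr-repo>:latest
          ports:
            - containerPort: 5000
          env:
            - name: CAPSTONE_TEST
              valueFrom:
                secretKeyRef:
                  # assumes: kubectl create secret generic capstone-secret \
                  #   --from-literal=CAPSTONE_TEST=<dagshub_token>
                  name: capstone-secret
                  key: CAPSTONE_TEST
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-app-service
spec:
  type: LoadBalancer
  selector:
    app: flask-app
  ports:
    - port: 5000
      targetPort: 5000
```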
Install Prometheus on its EC2 instance:

```bash
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.46.0.linux-amd64.tar.gz
# The archive extracts to prometheus-2.46.0.linux-amd64, not prometheus
sudo mv prometheus-2.46.0.linux-amd64 /etc/prometheus
# Put the binary on PATH so the run command below works
sudo cp /etc/prometheus/prometheus /usr/local/bin/
```

Edit `/etc/prometheus/prometheus.yml`:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "flask-app"
    static_configs:
      - targets: ["<external-ip>:5000"]
```

Run Prometheus:
```bash
/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml
```

Visit `http://<prometheus-ec2-ip>:9090`.
Install Grafana on its EC2 instance:

```bash
wget https://dl.grafana.com/oss/release/grafana_10.1.5_amd64.deb
sudo apt install ./grafana_10.1.5_amd64.deb -y
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
```

Visit the Grafana UI at `http://<grafana-ec2-ip>:3000` (default login: admin / admin).
Add Prometheus as a data source and visualize the metrics.
- ✅ End-to-End Automated ML Pipeline
- ✅ Continuous Integration & Deployment using GitHub Actions
- ✅ Data and Model Versioning with DVC & MLflow
- ✅ Containerized & Deployed on AWS EKS
- ✅ Real-time Monitoring via Prometheus & Grafana
- ✅ Scalable, Reproducible, and Cloud-Native ML Workflow
Priyangshu Majumder

- 💼 Machine Learning & MLOps Enthusiast
- 📧 priyangshumajumder9@gmail.com
- 🌐 https://www.linkedin.com/in/priyangshu-majumder-052005236/
This project represents a complete real-world MLOps implementation — from model development to cloud deployment and production monitoring. It’s designed to demonstrate practical industry-level MLOps expertise suitable for any enterprise ML workflow.