🎬 Sentiment Analysis of Movie Reviews — End-to-End MLOps Project

A production-grade Machine Learning project that predicts the sentiment of movie reviews (positive or negative) using Logistic Regression and Bag of Words. This project demonstrates a complete MLOps lifecycle — from data ingestion to deployment on AWS EKS with CI/CD, experiment tracking, data versioning, containerization, and monitoring.


🚀 Project Overview

Aspect                | Description
----------------------|----------------------------------------------------------------
Problem               | Classify IMDb movie reviews as positive or negative
Best Model            | Logistic Regression with Bag of Words
Goal                  | Deploy a fully automated ML pipeline with continuous integration, versioning, and cloud deployment
Deployment Type       | End-to-End MLOps workflow on AWS (S3, ECR, EC2, EKS)
Monitoring Tools      | Prometheus & Grafana
Tracking & Versioning | MLflow, DVC, Dagshub
Automation            | GitHub Actions CI/CD
Containerization      | Docker
Orchestration         | AWS EKS (Kubernetes)

🧠 Tech Stack & Tools

⚙️ Machine Learning

  • Python (3.10)
  • Scikit-learn
  • Bag of Words (BoW)
  • Logistic Regression
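
To make the modeling choice concrete, here is a minimal sketch of the BoW + Logistic Regression approach (the data and hyperparameters are illustrative, not the project's exact code):

# Minimal sketch: Bag of Words + Logistic Regression (illustrative only)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Bag of Words turns raw review text into token-count vectors,
# which Logistic Regression then classifies as positive/negative.
model = Pipeline([
    ("bow", CountVectorizer(max_features=5000)),  # vocabulary size is an assumed hyperparameter
    ("clf", LogisticRegression(max_iter=1000)),
])

reviews = ["A wonderful, moving film.", "Dull plot and terrible acting."]
labels = [1, 0]  # 1 = positive, 0 = negative

model.fit(reviews, labels)
print(model.predict(["A wonderful film."]))  # expected: [1] (positive)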

🧰 MLOps Tools

  • Git & GitHub
  • GitHub Actions — CI/CD Pipeline
  • DVC (Data Version Control) — Data & Model tracking
  • MLflow + Dagshub — Experiment Tracking & Model Registry
  • Docker — Containerization
  • AWS S3 — Artifact & Data Storage
  • AWS ECR — Docker Image Registry
  • AWS EC2 — Compute Instances for Hosting Prometheus & Grafana
  • AWS EKS (Elastic Kubernetes Service) — Model Deployment on Kubernetes
  • Prometheus & Grafana — Application Monitoring and Visualization

🏗️ Project Architecture

               ┌─────────────────────────────┐
               │         GitHub Repo         │
               │     (Code + DVC + CI/CD)    │
               └──────────────┬──────────────┘
                              │
                    GitHub Actions (CI/CD)
                              │
                              ▼
 ┌─────────────┐      ┌─────────────┐       ┌────────────────┐
 │  MLflow +   │◄────►│  DVC Repo   │──────►│ AWS S3 Bucket  │
 │  Dagshub    │      │ (Data +     │       │ (Remote Store) │
 │ (Tracking)  │      │  Models)    │       └────────────────┘
 └─────────────┘      └─────────────┘
                             │
                             ▼
                     Docker Image Build
                             │
                             ▼
               Push Image → AWS ECR Repository
                             │
                             ▼
               Deploy Image → AWS EKS Cluster
                             │
                             ▼
              Monitor via Prometheus & Grafana

📂 Project Setup and Execution

🔧 1. Setting up the Project Structure

# Clone repo
git clone <repo_url>
cd <repo>

# Create and activate virtual environment
conda create -n atlas python=3.10
conda activate atlas

# Initialize project template
pip install cookiecutter
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

Rename directories:

src.models → src.model

🧪 2. Experiment Tracking with MLflow & Dagshub

  1. Create a Dagshub repo and connect it to GitHub
  2. Copy the MLflow experiment tracking URL and add it to the code (a minimal sketch follows this list)
  3. Install the dependencies:
     pip install dagshub mlflow
  4. Run experiments and push the results to GitHub
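
A minimal tracking sketch, assuming the repo owner and name below are placeholders for your own Dagshub repository:

# Minimal MLflow + Dagshub tracking sketch (repo details are placeholders)
import dagshub
import mlflow

# Points MLflow at the Dagshub-hosted tracking server for this repo
dagshub.init(repo_owner="<dagshub-username>", repo_name="<repo-name>", mlflow=True)

mlflow.set_experiment("bow-logistic-regression")  # assumed experiment name
with mlflow.start_run():
    mlflow.log_param("vectorizer", "bow")
    mlflow.log_param("max_features", 5000)
    mlflow.log_metric("accuracy", 0.87)  # illustrative value, not a reported result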

💾 3. Data Versioning using DVC

dvc init
mkdir local_s3
dvc remote add -d mylocal local_s3

Create these components:

src/
├── logger.py
├── data_ingestion.py
├── data_preprocessing.py
├── feature_engineering.py
├── model_building.py
├── model_evaluation.py
└── register_model.py
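
As one example of these components, a minimal logger.py might look like the sketch below (the format and level are assumptions, not the repo's exact code):

# src/logger.py: minimal sketch of a shared console logger
import logging

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

# Usage in the pipeline scripts, e.g. data_ingestion.py:
# logger = get_logger("data_ingestion")
# logger.info("Downloading raw IMDb data...")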

Add:

dvc.yaml
params.yaml

Run the pipeline:

dvc repro
dvc push

☁️ 4. Connecting AWS S3 as Remote Storage

  1. Create an IAM user and an S3 bucket

  2. Install the dependencies:

     pip install "dvc[s3]" awscli

  3. Configure the AWS CLI:

     aws configure

  4. Add the S3 remote:

     dvc remote add -d myremote s3://<bucket-name>

🌐 5. Flask API Setup

mkdir flask_app
pip install flask
python app.py   # run from inside flask_app/ once app.py (sketched below) exists
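
A minimal app.py sketch, assuming the trained pipeline was registered in MLflow as sentiment_model (a placeholder name) and that prometheus_client is installed as an extra dependency; MLflow credentials (e.g. the CAPSTONE_TEST Dagshub token) must already be configured in the environment:

# flask_app/app.py: minimal sketch; the model name and routes are assumptions
import mlflow.sklearn
from flask import Flask, jsonify, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# Assumed registered model name/version; match whatever register_model.py used
model = mlflow.sklearn.load_model("models:/sentiment_model/1")

PREDICTIONS = Counter("app_predictions_total", "Total prediction requests served")

@app.route("/predict", methods=["POST"])
def predict():
    PREDICTIONS.inc()
    review = request.get_json()["review"]
    label = model.predict([review])[0]  # the pipeline handles BoW vectorization
    return jsonify({"sentiment": "positive" if label == 1 else "negative"})

@app.route("/metrics")
def metrics():
    # Endpoint scraped by Prometheus (see the monitoring section)
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)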

🐳 6. Dockerization

cd flask_app
pip install pipreqs
pipreqs . --force

docker build -t capstone-app:latest .
docker run -p 8888:5000 -e CAPSTONE_TEST=<dagshub_token> capstone-app:latest

Push image to DockerHub or AWS ECR.


⚙️ 7. CI/CD with GitHub Actions

Add .github/workflows/ci.yaml and set up the repository secrets:

Secret                | Description
----------------------|------------------------------------
AWS_ACCESS_KEY_ID     | AWS access key
AWS_SECRET_ACCESS_KEY | AWS secret key
AWS_REGION            | AWS region (e.g., ap-south-1)
ECR_REPOSITORY        | Docker repo name in ECR
AWS_ACCOUNT_ID        | Your AWS account number
CAPSTONE_TEST         | Dagshub authentication token

The pipeline automatically:

  1. Builds Docker image
  2. Pushes it to ECR
  3. Deploys to EKS

☸️ 8. AWS EKS Setup & Deployment

eksctl create cluster --name flask-app-cluster \
  --region ap-south-1 --nodegroup-name flask-app-nodes \
  --node-type t3.small --nodes 1 --managed

Verify and connect:

aws eks --region ap-south-1 update-kubeconfig --name flask-app-cluster
kubectl get nodes

Deploy Flask app:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl get svc flask-app-service

Access via:

http://<external-ip>:5000
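
Once the LoadBalancer is reachable, you can exercise the prediction endpoint; the /predict route and JSON payload below match the hypothetical Flask sketch from section 5:

# Hypothetical client call against the deployed service
import requests

resp = requests.post(
    "http://<external-ip>:5000/predict",          # substitute the real external IP
    json={"review": "An absolute masterpiece."},
    timeout=10,
)
print(resp.json())  # e.g. {"sentiment": "positive"}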

📊 9. Monitoring with Prometheus & Grafana

🧭 Prometheus Setup (on EC2)

wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.46.0.linux-amd64.tar.gz
sudo mv prometheus-2.46.0.linux-amd64/prometheus /usr/local/bin/
sudo mkdir -p /etc/prometheus
sudo mv prometheus-2.46.0.linux-amd64/prometheus.yml /etc/prometheus/

Edit /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "flask-app"
    static_configs:
      - targets: ["<external-ip>:5000"]

Run Prometheus:

/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml

Visit: http://<prometheus-ec2-ip>:9090

📈 Grafana Setup (on EC2)

wget https://dl.grafana.com/oss/release/grafana_10.1.5_amd64.deb
sudo apt install ./grafana_10.1.5_amd64.deb -y
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Visit the Grafana UI at http://<grafana-ec2-ip>:3000 (default login: admin/admin), then add Prometheus as a data source and build dashboards to visualize the app metrics.


📸 Sample Architecture Diagram

(You can include an image like architecture.png here in your repo)


🌟 Key Highlights

✅ End-to-End Automated ML Pipeline
✅ Continuous Integration & Deployment using GitHub Actions
✅ Data and Model Versioning with DVC & MLflow
✅ Containerized & Deployed on AWS EKS
✅ Real-time Monitoring via Prometheus & Grafana
✅ Scalable, Reproducible, and Cloud-Native ML Workflow


👨‍💻 Author

Priyangshu Majumder
💼 Machine Learning & MLOps Enthusiast
📧 priyangshumajumder9@gmail.com
🌐 https://www.linkedin.com/in/priyangshu-majumder-052005236/


🏁 Final Note

This project represents a complete, real-world MLOps implementation, from model development to cloud deployment and production monitoring. It is designed to demonstrate practical, industry-level MLOps skills whose patterns carry over to enterprise ML workflows.
