A production-grade Machine Learning project that predicts the sentiment of movie reviews (positive or negative) using Logistic Regression and Bag of Words. This project demonstrates a complete MLOps lifecycle — from data ingestion to deployment on AWS EKS with CI/CD, experiment tracking, data versioning, containerization, and monitoring.
| Aspect | Description |
|---|---|
| Problem | Classify IMDb movie reviews as positive or negative |
| Best Model | Logistic Regression with Bag of Words |
| Goal | Deploy a fully automated ML pipeline with continuous integration, versioning, and cloud deployment |
| Deployment Type | End-to-End MLOps workflow on AWS (S3, ECR, EC2, EKS) |
| Monitoring Tools | Prometheus & Grafana |
| Tracking & Versioning | MLflow, DVC, Dagshub |
| Automation | GitHub Actions CI/CD |
| Containerization | Docker |
| Orchestration | AWS EKS (Kubernetes) |
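The core model is intentionally simple. A minimal sketch of the BoW + Logistic Regression idea in scikit-learn (the CSV path, column names, and hyperparameters are illustrative, not the project's actual values):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one text column and a binary sentiment label
df = pd.read_csv("data/imdb_reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42
)

# Bag of Words: sparse token-count vectors over the training vocabulary
vectorizer = CountVectorizer(max_features=5000)
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)

# Logistic Regression on the BoW features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_bow, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test_bow)))
```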
Tech stack:

- Python (3.10)
- Scikit-learn
- Bag of Words (BoW)
- Logistic Regression
- Git & GitHub
- GitHub Actions — CI/CD Pipeline
- DVC (Data Version Control) — Data & Model tracking
- MLflow + Dagshub — Experiment Tracking & Model Registry
- Docker — Containerization
- AWS S3 — Artifact & Data Storage
- AWS ECR — Docker Image Registry
- AWS EC2 — Compute Instances for Hosting Prometheus & Grafana
- AWS EKS (Elastic Kubernetes Service) — Model Deployment on Kubernetes
- Prometheus & Grafana — Application Monitoring and Visualization
```
               ┌────────────────────────────┐
               │        GitHub Repo         │
               │    (Code + DVC + CI/CD)    │
               └─────────────┬──────────────┘
                             │
                  GitHub Actions (CI/CD)
                             │
                             ▼
┌─────────────┐       ┌─────────────┐       ┌───────────────┐
│  MLflow +   │◄─────►│  DVC Repo   │──────►│ AWS S3 Bucket │
│  Dagshub    │       │  (Data +    │       │ (Remote Store)│
│ (Tracking)  │       │   Models)   │       └───────────────┘
└─────────────┘       └─────────────┘
                             │
                             ▼
                    Docker Image Build
                             │
                             ▼
              Push Image → AWS ECR Repository
                             │
                             ▼
              Deploy Image → AWS EKS Cluster
                             │
                             ▼
             Monitor via Prometheus & Grafana
```
```bash
# Clone repo
git clone <repo_url>
cd <repo>

# Create and activate virtual environment
conda create -n atlas python=3.10
conda activate atlas
```
```bash
# Initialize project template
pip install cookiecutter
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
```

Rename directories: `src.models` → `src.model`.
- Create a Dagshub repo → connect it to GitHub
- Copy the MLflow experiment tracking URL and add it to the code (see the sketch after this list)
- Install dependencies:

  ```bash
  pip install dagshub mlflow
  ```

- Run experiments → push to GitHub
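A minimal sketch of pointing MLflow at the Dagshub-hosted tracking server (the repo owner/name are placeholders; the logged values are illustrative):

```python
import dagshub
import mlflow

# Wire MLflow to the Dagshub-hosted tracking server for this repo
dagshub.init(repo_owner="<dagshub-user>", repo_name="<repo>", mlflow=True)

with mlflow.start_run():
    mlflow.log_param("model", "LogisticRegression")
    mlflow.log_param("vectorizer", "BagOfWords")
    mlflow.log_metric("accuracy", 0.87)  # illustrative value, not a reported result
```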
Initialize DVC with a local remote first:

```bash
dvc init
mkdir local_s3
dvc remote add -d mylocal local_s3
```

Create these components:

```
src/
├── logger.py
├── data_ingestion.py
├── data_preprocessing.py
├── feature_engineering.py
├── model_building.py
├── model_evaluation.py
└── register_model.py
```
Add `dvc.yaml` and `params.yaml` to the project root (a sample `dvc.yaml` is sketched below).
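A sketch of how `dvc.yaml` might wire these stages together (stage granularity, paths, and parameter names are illustrative; `register_model` omitted for brevity):

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    params:
      - data_ingestion.test_size      # read from params.yaml
    outs:
      - data/raw
  data_preprocessing:
    cmd: python src/data_preprocessing.py
    deps:
      - data/raw
    outs:
      - data/interim
  feature_engineering:
    cmd: python src/feature_engineering.py
    deps:
      - data/interim
    params:
      - feature_engineering.max_features
    outs:
      - data/processed
  model_building:
    cmd: python src/model_building.py
    deps:
      - data/processed
    outs:
      - models/model.pkl
  model_evaluation:
    cmd: python src/model_evaluation.py
    deps:
      - models/model.pkl
    metrics:
      - reports/metrics.json
```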
Run the pipeline:

```bash
dvc repro
dvc push
```

Then move the remote to S3:

- Create an IAM user and an S3 bucket
- Install the S3 dependencies:

  ```bash
  pip install "dvc[s3]" awscli
  ```

- Configure the AWS CLI:

  ```bash
  aws configure
  ```

- Add the S3 remote:

  ```bash
  dvc remote add -d myremote s3://<bucket-name>
  ```
Build a simple Flask serving app (a minimal `app.py` sketch follows):

```bash
mkdir flask_app
cd flask_app
pip install flask
python app.py
```
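A minimal sketch of what `flask_app/app.py` could look like. The artifact paths, the request schema, and the `prometheus_client` metrics endpoint (which the Prometheus scrape job further down targets) are assumptions, not the project's exact code:

```python
import pickle

from flask import Flask, jsonify, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# Load the trained BoW vectorizer and classifier (paths are illustrative)
with open("models/vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)

PREDICTIONS = Counter("app_predictions_total", "Total predictions served")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"review": "some movie review text"};
    # assumes labels were encoded as 0/1 during training
    text = request.json["review"]
    features = vectorizer.transform([text])
    label = int(model.predict(features)[0])
    PREDICTIONS.inc()
    return jsonify({"sentiment": "positive" if label == 1 else "negative"})

@app.route("/metrics")
def metrics():
    # Prometheus text-format exposition for the scrape job
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```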
Generate `requirements.txt` for the Docker image:

```bash
pip install pipreqs
pipreqs . --force
```
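Next, containerize the app. A plausible `Dockerfile`, assuming the build context is `flask_app/` with the trained artifacts already copied into a `models/` subdirectory (base image and layout are assumptions):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the app code and model artifacts (assumes models/ exists in the context)
COPY . .

# The app listens on 5000 (mapped to 8888 on the host in the run command below)
EXPOSE 5000
CMD ["python", "app.py"]
```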
Build and run the image locally:

```bash
docker build -t capstone-app:latest .
docker run -p 8888:5000 -e CAPSTONE_TEST=<dagshub_token> capstone-app:latest
```

Then push the image to DockerHub or AWS ECR.
Add `.github/workflows/ci.yaml` and set up the repository secrets:
| Secret | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS Access Key |
| AWS_SECRET_ACCESS_KEY | AWS Secret |
| AWS_REGION | AWS Region (e.g., ap-south-1) |
| ECR_REPOSITORY | Docker repo name |
| AWS_ACCOUNT_ID | Your AWS account number |
| CAPSTONE_TEST | Dagshub authentication token |
The pipeline automatically:
- Builds Docker image
- Pushes it to ECR
- Deploys to EKS
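A condensed sketch of what `.github/workflows/ci.yaml` could contain (action versions, job layout, and the cluster name are illustrative):

```yaml
name: CI/CD
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Log in to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          IMAGE=${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/${{ secrets.ECR_REPOSITORY }}:latest
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Deploy to EKS
        run: |
          aws eks --region ${{ secrets.AWS_REGION }} update-kubeconfig --name flask-app-cluster
          kubectl apply -f deployment.yaml
          kubectl apply -f service.yaml
```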
Create the cluster:

```bash
eksctl create cluster --name flask-app-cluster \
  --region ap-south-1 --nodegroup-name flask-app-nodes \
  --node-type t3.small --nodes 1 --managed
```

Verify and connect:

```bash
aws eks --region ap-south-1 update-kubeconfig --name flask-app-cluster
kubectl get nodes
```

Deploy the Flask app (sample manifests are sketched below):
```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl get svc flask-app-service
```

Access the app via:

```
http://<external-ip>:5000
```
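Sample `deployment.yaml` and `service.yaml` (image URI, replica count, labels, and the Secret holding the Dagshub token are illustrative):

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: flask-app
          image: <aws-account-id>.dkr.ecr.ap-south-1.amazonaws.com/<ecr-repo>:latest
          ports:
            - containerPort: 5000
          env:
            - name: CAPSTONE_TEST
              valueFrom:
                secretKeyRef:
                  # assumes: kubectl create secret generic capstone-secret \
                  #   --from-literal=CAPSTONE_TEST=<dagshub_token>
                  name: capstone-secret
                  key: CAPSTONE_TEST
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-app-service
spec:
  type: LoadBalancer
  selector:
    app: flask-app
  ports:
    - port: 5000
      targetPort: 5000
```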
Install Prometheus on its EC2 instance:

```bash
wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.46.0.linux-amd64.tar.gz
# The archive extracts to prometheus-2.46.0.linux-amd64, not prometheus
sudo mv prometheus-2.46.0.linux-amd64 /etc/prometheus
# Put the binary on PATH so the run command below works
sudo cp /etc/prometheus/prometheus /usr/local/bin/
```

Edit `/etc/prometheus/prometheus.yml`:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "flask-app"
    static_configs:
      - targets: ["<external-ip>:5000"]
```

Run Prometheus:
```bash
/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml
```

Visit `http://<prometheus-ec2-ip>:9090`.
Install Grafana on its EC2 instance:

```bash
wget https://dl.grafana.com/oss/release/grafana_10.1.5_amd64.deb
sudo apt install ./grafana_10.1.5_amd64.deb -y
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
```

Visit the Grafana UI at `http://<grafana-ec2-ip>:3000` (default login: admin / admin).
Add Prometheus as a data source and visualize the metrics.
- ✅ End-to-End Automated ML Pipeline
- ✅ Continuous Integration & Deployment using GitHub Actions
- ✅ Data and Model Versioning with DVC & MLflow
- ✅ Containerized & Deployed on AWS EKS
- ✅ Real-time Monitoring via Prometheus & Grafana
- ✅ Scalable, Reproducible, and Cloud-Native ML Workflow
Priyangshu Majumder

- 💼 Machine Learning & MLOps Enthusiast
- 📧 priyangshumajumder9@gmail.com
- 🌐 https://www.linkedin.com/in/priyangshu-majumder-052005236/
This project represents a complete real-world MLOps implementation — from model development to cloud deployment and production monitoring. It’s designed to demonstrate practical industry-level MLOps expertise suitable for any enterprise ML workflow.