ML-powered web app predicting Netflix content performance (28-day viewing hours) using Gradient Boosting, with a test R² of 0.85+

Serhii2009/netflix-content-viewership-predictor


Netflix Content Viewership Predictor

Production-ready ML application for predicting Netflix content viewership (28-day viewing hours, in millions) using a Gradient Boosting regression model.

Features

  • Interactive Streamlit UI for content viewership prediction
  • Single content and batch prediction support
  • Real-time feature importance visualization
  • Export predictions to CSV
  • Professional insights dashboard
  • High-capacity Gradient Boosting model (R² ≥ 0.85)

Model Information

  • Algorithm: Gradient Boosting Regressor (High Capacity)
  • Performance: Test R² ≥ 0.85 (85%+ of the variance in viewing hours explained)
  • Features: 50+ engineered features
  • Complexity: 300 estimators, max depth 6
  • Test MAE: ~30-35M hours
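For reference, the two headline metrics follow the standard definitions R² = 1 − SS_res / SS_tot and MAE = mean absolute error. The plain-Python sketch below is not the project's evaluation code; the function names simply mirror `sklearn.metrics.r2_score` and `sklearn.metrics.mean_absolute_error`, which compute the same quantities, and the numbers are made up.

```python
from typing import Sequence


def _check_inputs(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R² = 1 - SS_res / SS_tot: share of variance around the mean that the model explains."""
    _check_inputs(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted values, in the target's units."""
    _check_inputs(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with viewing hours in millions (illustrative numbers, not project data).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 320.0]
print(f"R²  = {r2_score(actual, predicted):.3f}")                    # R²  = 0.970
print(f"MAE = {mean_absolute_error(actual, predicted):.2f}M hours")  # MAE = 13.33M hours
```

Because MAE is in the same units as the target, "~30-35M hours" reads directly as the typical absolute miss per title.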

Setup Instructions

1. Clone Repository

git clone https://github.com/Serhii2009/netflix-content-viewership-predictor.git
cd netflix-content-viewership-predictor

2. Create Virtual Environment

Linux/macOS:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Add Model File

Place your trained netflix_model.pkl file in the models/ directory.
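A minimal loader sketch, assuming `netflix_model.pkl` is a standard pickle (if it was saved with `joblib.dump`, load it with `joblib.load` instead). The `load_model` name and default path are hypothetical and may not match `utils/model_loader.py`. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path

# Hypothetical default location, matching the models/ directory from this step.
DEFAULT_MODEL_PATH = Path("models") / "netflix_model.pkl"


def load_model(path=DEFAULT_MODEL_PATH):
    """Load a pickled model, failing early with a clear message if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found at {path}. Place netflix_model.pkl in the models/ directory."
        )
    with path.open("rb") as f:
        return pickle.load(f)


# Usage (from the repository root):
# model = load_model()
```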

5. Run Application

streamlit run app/streamlit_app.py

The app opens automatically in your browser at http://localhost:8501.

Usage

Single Prediction

  1. Navigate to the "Single Content Prediction" tab
  2. Fill in content metadata fields:
    • Content Type (series/movie)
    • Release timing
    • Popularity scores
    • Genres
    • Competition level
  3. Click "Predict viewership"
  4. View predicted hours and insights

Batch Prediction

  1. Navigate to the "Batch Prediction" tab
  2. Upload a CSV file with content features
  3. Download predictions as CSV
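The batch flow (read rows, predict each one, write the rows back with an extra column) can be sketched with the standard `csv` module. This is an illustration only: the app may use pandas instead, `predict_fn` stands in for the trained model plus preprocessing, and the output column name `predicted_hours_millions` is hypothetical.

```python
import csv
import io
from typing import Callable, Dict, TextIO


def predict_csv(
    src: TextIO,
    dst: TextIO,
    predict_fn: Callable[[Dict[str, str]], float],
    out_col: str = "predicted_hours_millions",  # hypothetical output column name
) -> int:
    """Copy every input row to dst with an extra prediction column; return the row count."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None:
        raise ValueError("input CSV has no header row")
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [out_col])
    writer.writeheader()
    count = 0
    for row in reader:
        row[out_col] = f"{predict_fn(row):.2f}"
        writer.writerow(row)
        count += 1
    return count


# Usage with an in-memory CSV and a stand-in predictor (a real app would call the model).
src = io.StringIO("content_type,release_year\nmovie,2021\nseries,2023\n")
dst = io.StringIO()
rows = predict_csv(src, dst, predict_fn=lambda row: 42.0)
print(rows)
print(dst.getvalue())
```

Keeping the original columns next to the prediction makes the downloaded CSV easy to audit row by row.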

Required Input Features

  • Content Type: series or movie
  • Release Year: 2013-2026
  • Release Month: 1-12
  • Genres: Select applicable genres (pipe-separated)
  • Seasons Count: number of seasons for series (0 for movies)
  • Global Popularity Proxy: 0-100
  • Cast Popularity Proxy: 0-100
  • Competition Level: low, medium, high
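The documented ranges can be checked before a row reaches the model. This sketch encodes only the rules listed above; the dictionary keys (`content_type`, `release_year`, and so on) are hypothetical stand-ins for whatever column names the app actually uses.

```python
from typing import Any, Dict, List, Optional

VALID_CONTENT_TYPES = {"series", "movie"}
VALID_COMPETITION = {"low", "medium", "high"}


def validate_input(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the row passes the documented ranges."""
    errors: List[str] = []

    def check_range(key: str, lo: float, hi: float) -> Optional[float]:
        try:
            value = float(row[key])
        except (KeyError, TypeError, ValueError):
            errors.append(f"{key} is missing or not numeric")
            return None
        if not lo <= value <= hi:
            errors.append(f"{key} must be in [{lo}, {hi}], got {value:g}")
        return value

    ctype = str(row.get("content_type", "")).strip().lower()
    if ctype not in VALID_CONTENT_TYPES:
        errors.append(f"content_type must be 'series' or 'movie', got {ctype!r}")

    check_range("release_year", 2013, 2026)
    check_range("release_month", 1, 12)
    check_range("global_popularity", 0, 100)
    check_range("cast_popularity", 0, 100)
    seasons = check_range("seasons_count", 0, float("inf"))
    if ctype == "movie" and seasons not in (None, 0):
        errors.append("seasons_count must be 0 for movies")

    competition = str(row.get("competition_level", "")).strip().lower()
    if competition not in VALID_COMPETITION:
        errors.append(f"competition_level must be low, medium, or high, got {competition!r}")
    return errors


example = {
    "content_type": "series", "release_year": 2024, "release_month": 11,
    "seasons_count": 2, "global_popularity": 80, "cast_popularity": 65,
    "competition_level": "high", "genres": "Drama|Thriller",
}
print(validate_input(example))  # []
```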

Model Architecture

Feature Engineering (50+ Features)

  1. Content Features

    • Content type indicators
    • Season availability and bins
    • Multi-season flags
  2. Temporal Features

    • Release quarter, holiday, summer indicators
    • Years since release
    • Recency flags
  3. Popularity Features

    • Composite scores
    • Min/max popularity
    • Blockbuster potential indicators
  4. Genre Features

    • 22 genre binary indicators
    • Genre count and diversity
  5. Interaction Features

    • Series × Seasons
    • Popularity × Competition
    • Content type × Popularity synergies
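To make the feature groups above concrete, here is a sketch that builds a small subset of each group for one title. The real pipeline in `utils/preprocessor.py` produces 50+ features whose exact definitions are not given here, so every encoding below is an assumption: November/December as holiday months, June to August as summer, competition mapped to 0/1/2, a caller-supplied genre vocabulary in place of the real 22 genres, and a hypothetical `reference_year` for "years since release".

```python
from typing import Any, Dict, Iterable

# Assumed encodings; the README does not define these, so treat them as placeholders.
COMPETITION_LEVELS = {"low": 0, "medium": 1, "high": 2}
HOLIDAY_MONTHS = {11, 12}
SUMMER_MONTHS = {6, 7, 8}


def engineer_features(row: Dict[str, Any], genre_vocab: Iterable[str],
                      reference_year: int) -> Dict[str, float]:
    """Build an illustrative subset of the documented feature groups for one title."""
    f: Dict[str, float] = {}

    # 1. Content features
    is_series = 1.0 if row["content_type"] == "series" else 0.0
    seasons = float(row.get("seasons_count", 0))
    f["is_series"] = is_series
    f["seasons_count"] = seasons
    f["multi_season"] = 1.0 if seasons > 1 else 0.0

    # 2. Temporal features
    month = int(row["release_month"])
    f["release_quarter"] = float((month - 1) // 3 + 1)
    f["is_holiday_release"] = 1.0 if month in HOLIDAY_MONTHS else 0.0
    f["is_summer_release"] = 1.0 if month in SUMMER_MONTHS else 0.0
    f["years_since_release"] = float(reference_year - int(row["release_year"]))

    # 3. Popularity features
    glob = float(row["global_popularity"])
    cast = float(row["cast_popularity"])
    f["popularity_composite"] = (glob + cast) / 2
    f["popularity_min"] = min(glob, cast)
    f["popularity_max"] = max(glob, cast)

    # 4. Genre features: one binary flag per vocabulary genre, parsed from a pipe-separated string
    selected = {g.strip().lower() for g in str(row.get("genres", "")).split("|") if g.strip()}
    vocab = [g.lower() for g in genre_vocab]
    for genre in vocab:
        f[f"genre_{genre}"] = 1.0 if genre in selected else 0.0
    f["genre_count"] = float(sum(1 for g in vocab if g in selected))

    # 5. Interaction features
    competition = float(COMPETITION_LEVELS[row["competition_level"]])
    f["series_x_seasons"] = is_series * seasons
    f["popularity_x_competition"] = f["popularity_composite"] * competition
    return f


title = {
    "content_type": "series", "seasons_count": 3, "release_month": 12,
    "release_year": 2022, "global_popularity": 80, "cast_popularity": 60,
    "genres": "Drama|Thriller", "competition_level": "high",
}
features = engineer_features(title, ["Drama", "Comedy", "Thriller"], reference_year=2025)
print(features["release_quarter"], features["genre_count"], features["popularity_x_competition"])
# 4.0 2.0 140.0
```

Whatever the real definitions are, the same function must run at training and prediction time; otherwise the model sees differently encoded inputs.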

Model Configuration

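from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=300,       # number of boosting stages (trees)
    learning_rate=0.08,     # shrinkage applied to each tree's contribution
    max_depth=6,            # maximum depth of each tree
    min_samples_split=8,    # minimum samples required to split an internal node
    min_samples_leaf=3,     # minimum samples required at a leaf
    subsample=0.85,         # fraction of training rows sampled for each tree
    max_features='sqrt'     # features considered per split: about sqrt(n_features)
)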
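As a reminder of what `n_estimators` and `learning_rate` control: gradient boosting builds an additive model in stages, and since no `loss` is set, scikit-learn's default squared-error loss applies, so each tree h_m is fitted to the current residuals. With M = 300 stages and shrinkage ν = 0.08:

```latex
F_0(x) = \bar{y}, \qquad
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, 300, \quad \nu = 0.08
```

With `subsample=0.85`, each h_m is fitted on a random 85% of the training rows (stochastic gradient boosting), and `max_features='sqrt'` limits each split to roughly the square root of the feature count. A smaller ν needs more stages to reach the same fit but usually generalizes better.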

Team Collaboration

Adding Team Members

  1. Go to repository Settings → Collaborators
  2. Click "Add people"
  3. Enter GitHub username or email
  4. Assign role: Read, Write, or Admin

Branch Workflow

  • Main branch: Production-ready code only
  • Development branch: Active feature development

Create a feature branch, commit your changes, and push it:

git checkout -b feature/ui-enhancement
git add .
git commit -m "Add new visualization feature"
git push origin feature/ui-enhancement

Create a pull request on GitHub for code review before merging.

Project Structure

netflix-content-viewership-predictor/
├── models/              # Trained model file (.pkl)
├── app/                 # Streamlit UI
├── utils/               # Helper functions
│   ├── model_loader.py  # Model loading utilities
│   ├── preprocessor.py  # Feature engineering
│   └── visualizations.py # Plotting functions
├── data/                # Sample data
├── requirements.txt     # Dependencies
├── README.md            # Documentation
└── test_load.py         # Testing

Requirements

  • Python 3.9+ (scikit-learn 1.6.1 does not support Python 3.8)
  • scikit-learn 1.6.1 (model compatibility)
  • streamlit 1.31.0
  • pandas, numpy, matplotlib, seaborn, plotly

See requirements.txt for complete list.

Troubleshooting

Model Loading Issues

If you see scikit-learn version warnings when loading the model, install the version the model was trained with:

pip install scikit-learn==1.6.1 --upgrade
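scikit-learn does not guarantee that pickled models load correctly across versions, so a startup check can surface a mismatch before the model is unpickled. A minimal sketch using only the standard library; `check_sklearn_version` is a hypothetical helper, and the exact-match comparison is deliberately strict.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED_SKLEARN = "1.6.1"  # version listed under Requirements


def check_sklearn_version(expected: str = EXPECTED_SKLEARN,
                          installed: Optional[str] = None) -> Optional[str]:
    """Return a warning message if scikit-learn differs from `expected`, else None.

    `installed` can be passed explicitly (e.g. in tests); otherwise it is read
    from the installed package metadata.
    """
    if installed is None:
        try:
            installed = version("scikit-learn")
        except PackageNotFoundError:
            return f"scikit-learn is not installed; run: pip install scikit-learn=={expected}"
    if installed != expected:
        return (f"scikit-learn {installed} is installed but the model expects {expected}; "
                f"run: pip install scikit-learn=={expected}")
    return None


print(check_sklearn_version() or "scikit-learn version matches")
```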

Feature Mismatch

Ensure preprocessing creates all 50+ features with the same names and in the same order as in the training notebook.
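A quick way to diagnose a mismatch is to diff the expected feature names against the ones preprocessing produced. The helper below is a hypothetical sketch. If the model was fitted on a pandas DataFrame, scikit-learn stores the training column names in `model.feature_names_in_`, which can serve as the expected list.

```python
from typing import Iterable, List, Tuple


def diff_features(expected: Iterable[str],
                  produced: Iterable[str]) -> Tuple[List[str], List[str], bool]:
    """Compare expected feature names with those produced by preprocessing.

    Returns (missing, extra, same_order): names the model expects but preprocessing
    did not produce, names produced but not expected, and whether the shared names
    appear in the same relative order.
    """
    expected = list(expected)
    produced = list(produced)
    expected_set, produced_set = set(expected), set(produced)
    missing = [name for name in expected if name not in produced_set]
    extra = [name for name in produced if name not in expected_set]
    shared_expected = [name for name in expected if name in produced_set]
    shared_produced = [name for name in produced if name in expected_set]
    return missing, extra, shared_expected == shared_produced


missing, extra, same_order = diff_features(
    ["is_series", "release_quarter", "genre_drama"],
    ["release_quarter", "is_series", "genre_comedy"],
)
print(missing, extra, same_order)  # ['genre_drama'] ['genre_comedy'] False
```

Once nothing is missing, reindexing the DataFrame to the expected column list (for example `X[list(model.feature_names_in_)]`) fixes ordering and drops extras.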

Viewership Benchmarks

  • Test R²: ≥ 0.85 (target achieved)
  • Predictions within ±20%: ~65-70% of cases
  • Predictions within ±30%: ~80-85% of cases
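The README does not say exactly how "within ±20%" is measured. This sketch assumes relative error against the actual value, |predicted − actual| / |actual|, and skips titles with zero actual hours. `share_within` is a hypothetical helper, and the numbers are illustrative.

```python
from typing import Sequence


def share_within(y_true: Sequence[float], y_pred: Sequence[float], tolerance: float) -> float:
    """Fraction of predictions whose relative error |pred - true| / |true| is <= tolerance.

    Rows with an actual value of 0 are skipped because relative error is undefined there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    hits = sum(1 for t, p in pairs if abs(p - t) / abs(t) <= tolerance)
    return hits / len(pairs)


actual = [100.0, 50.0, 200.0, 10.0]     # millions of hours (illustrative)
predicted = [115.0, 64.0, 170.0, 10.0]  # relative errors: 0.15, 0.28, 0.15, 0.0
print(share_within(actual, predicted, 0.20))  # 0.75
print(share_within(actual, predicted, 0.30))  # 1.0
```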

License

MIT License
