Netflix Content Viewership Predictor

Production-ready ML application that predicts Netflix content viewership (28-day viewing hours, in millions) with a gradient boosting regressor.

Features

Interactive Streamlit UI for content viewership prediction
Single content and batch prediction support
Real-time feature importance visualization
Export predictions to CSV
Professional insights dashboard
High-capacity Gradient Boosting model (R² ≥ 0.85)

Model Information

Algorithm: Gradient Boosting Regressor (High Capacity)
Test R²: ≈ 0.85 or higher (about 85% or more of the variance explained)
Features: 50+ engineered features
Complexity: 300 estimators, max depth 6
Test MAE: ~30-35M hours

Setup Instructions

1. Clone Repository

git clone https://github.com/Serhii2009/netflix-content-viewership-predictor.git
cd netflix-content-viewership-predictor

2. Create Virtual Environment

Linux/macOS:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Add Model File

Place your trained netflix_model.pkl file in the models/ directory.
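
A minimal sketch of loading the model file, assuming it is a standard pickle readable with `pickle.load` (the repository's utils/model_loader.py may do this differently, for example with joblib). The `load_model` function and `MODEL_PATH` constant are illustrative names, not part of the project.

```python
import pickle
from pathlib import Path

# Assumed location, based on the file name and directory mentioned above.
MODEL_PATH = Path("models") / "netflix_model.pkl"


def load_model(path=MODEL_PATH):
    """Load a pickled model, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found at {path}. Place netflix_model.pkl in models/."
        )
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Calling `load_model()` from the repository root before starting the app is a quick way to confirm the file is in place.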

5. Run Application

streamlit run app/streamlit_app.py

The app will open automatically in your browser at http://localhost:8501

Usage

Single Prediction

1. Navigate to the "Single Content Prediction" tab.
2. Fill in the content metadata fields:
   - Content Type (series/movie)
   - Release timing
   - Popularity scores
   - Genres
   - Competition level
3. Click "Predict viewership".
4. View the predicted hours and insights.

Batch Prediction

1. Navigate to the "Batch Prediction" tab.
2. Upload a CSV file with content features.
3. Download the predictions as CSV.
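
The batch flow above can be sketched roughly as follows. This is an illustration only: the column names, the `predicted_hours_millions` output column, and the `dummy_predict` stand-in are assumptions, and the real app would call the trained model on engineered features instead.

```python
import csv
import io


def predict_batch(csv_text, predict_fn):
    """Read feature rows from CSV text and return CSV text with a prediction column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        raise ValueError("The uploaded CSV contains no data rows.")

    out = io.StringIO()
    # "predicted_hours_millions" is a hypothetical output column name.
    fieldnames = list(reader.fieldnames) + ["predicted_hours_millions"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        row["predicted_hours_millions"] = f"{predict_fn(row):.1f}"
        writer.writerow(row)
    return out.getvalue()


# Stand-in predictor for illustration only; it is not the trained model.
def dummy_predict(row):
    return 2.0 * float(row["global_popularity_proxy"])


sample = "content_type,global_popularity_proxy\nseries,80\nmovie,45\n"
result = predict_batch(sample, dummy_predict)
```

Keeping the input columns in the output makes it easy to match each prediction to the row that produced it.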

Required Input Features

Content Type: series or movie
Release Year: 2013-2026
Release Month: 1-12
Genres: Select applicable genres (pipe-separated)
Seasons Count: For series (0 for movies)
Global Popularity Proxy: 0-100
Cast Popularity Proxy: 0-100
Competition Level: low, medium, high
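
The ranges listed above can be checked before a record reaches the model. The sketch below is one possible validator; the dictionary keys (`content_type`, `release_year`, and so on) are hypothetical names chosen for illustration and may not match the column names the app expects.

```python
ALLOWED_CONTENT_TYPES = {"series", "movie"}
ALLOWED_COMPETITION = {"low", "medium", "high"}


def validate_record(record):
    """Return a list of problems found in one input record (empty if valid)."""
    errors = []
    if record.get("content_type") not in ALLOWED_CONTENT_TYPES:
        errors.append("content_type must be 'series' or 'movie'")
    if not 2013 <= record.get("release_year", 0) <= 2026:
        errors.append("release_year must be between 2013 and 2026")
    if not 1 <= record.get("release_month", 0) <= 12:
        errors.append("release_month must be between 1 and 12")
    for key in ("global_popularity_proxy", "cast_popularity_proxy"):
        if not 0 <= record.get(key, -1) <= 100:
            errors.append(f"{key} must be between 0 and 100")
    if record.get("competition_level") not in ALLOWED_COMPETITION:
        errors.append("competition_level must be low, medium, or high")
    seasons = record.get("seasons_count", -1)
    if seasons < 0:
        errors.append("seasons_count must be 0 or greater")
    elif record.get("content_type") == "movie" and seasons != 0:
        errors.append("seasons_count must be 0 for movies")
    return errors


valid_record = {
    "content_type": "series",
    "release_year": 2024,
    "release_month": 7,
    "seasons_count": 3,
    "global_popularity_proxy": 80,
    "cast_popularity_proxy": 60,
    "competition_level": "high",
}
```

Returning all problems at once, rather than raising on the first, lets a UI show every field that needs correcting.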

Model Architecture

Feature Engineering (50+ Features)

Content Features
- Content type indicators
- Season availability and bins
- Multi-season flags

Temporal Features
- Release quarter, holiday, summer indicators
- Years since release
- Recency flags

Popularity Features
- Composite scores
- Min/max popularity
- Blockbuster potential indicators

Genre Features
- 22 genre binary indicators
- Genre count and diversity

Interaction Features
- Series × Seasons
- Popularity × Competition
- Content type × Popularity synergies
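
To make the feature groups above concrete, here is a small sketch that derives a handful of them from one raw record. All feature names, the summer months, the composite formula, and the competition encoding are assumptions for illustration; the actual definitions live in the training notebook and utils/preprocessor.py.

```python
def engineer_features(record, genre_vocab):
    """Derive a few illustrative features from one raw input record."""
    month = record["release_month"]
    is_series = int(record["content_type"] == "series")
    genres = {g.strip() for g in record["genres"].split("|") if g.strip()}
    # Assumed ordinal encoding of the competition level.
    competition = {"low": 0, "medium": 1, "high": 2}[record["competition_level"]]
    pop_global = record["global_popularity_proxy"]
    pop_cast = record["cast_popularity_proxy"]

    features = {
        "is_series": is_series,
        "release_quarter": (month - 1) // 3 + 1,
        "is_summer_release": int(month in (6, 7, 8)),  # assumed summer months
        "popularity_composite": (pop_global + pop_cast) / 2,  # assumed: simple mean
        "popularity_min": min(pop_global, pop_cast),
        "popularity_max": max(pop_global, pop_cast),
        "genre_count": len(genres),
        "series_x_seasons": is_series * record["seasons_count"],
        "popularity_x_competition": pop_global * competition,
    }
    # One binary indicator per genre in the vocabulary.
    for genre in genre_vocab:
        features[f"genre_{genre}"] = int(genre in genres)
    return features


example = engineer_features(
    {
        "content_type": "series",
        "release_month": 7,
        "genres": "Drama|Thriller",
        "seasons_count": 3,
        "global_popularity_proxy": 80,
        "cast_popularity_proxy": 60,
        "competition_level": "high",
    },
    genre_vocab=["Drama", "Comedy", "Thriller"],
)
```

Passing the genre vocabulary explicitly keeps the set of indicator columns fixed, so every record yields the same feature names regardless of which genres it lists.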

Model Configuration

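from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=300,        # number of boosting stages (trees)
    learning_rate=0.08,      # shrinkage applied to each tree's contribution
    max_depth=6,             # maximum depth of each tree
    min_samples_split=8,     # minimum samples required to split a node
    min_samples_leaf=3,      # minimum samples required in a leaf
    subsample=0.85,          # fraction of rows sampled for each tree
    max_features='sqrt'      # features considered at each split
)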
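
As a rough illustration of how this configuration is used, the sketch below fits it on synthetic data and computes the two metrics quoted in this README. The data set, the split, and `random_state` are assumptions added for the example; the resulting numbers say nothing about the real model's accuracy.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real model is trained on 50+ engineered features.
X, y = make_regression(n_samples=400, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.08,
    max_depth=6,
    min_samples_split=8,
    min_samples_leaf=3,
    subsample=0.85,
    max_features="sqrt",
    random_state=0,  # added for reproducibility; not part of the listed configuration
)
model.fit(X_train, y_train)

# Evaluate on the held-out split with the same metrics the README reports.
pred = model.predict(X_test)
test_r2 = r2_score(y_test, pred)
test_mae = mean_absolute_error(y_test, pred)
```

Because `subsample` is below 1.0 and `max_features` is `"sqrt"`, each tree sees a random subset of rows and features, so fixing `random_state` is what makes repeated runs comparable.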

Team Collaboration

Adding Team Members

1. Go to repository Settings → Collaborators
2. Click "Add people"
3. Enter GitHub username or email
4. Assign role: Read, Write, or Admin

Branch Workflow

Main branch: Production-ready code only
Development branch: Active feature development

Create feature branch:

git checkout -b feature/ui-enhancement
git add .
git commit -m "Add new visualization feature"
git push origin feature/ui-enhancement

Create Pull Request on GitHub for code review before merging.

Project Structure

netflix-content-viewership-predictor/
├── models/              # Trained model file (.pkl)
├── app/                 # Streamlit UI
├── utils/               # Helper functions
│   ├── model_loader.py  # Model loading utilities
│   ├── preprocessor.py  # Feature engineering
│   └── visualizations.py # Plotting functions
├── data/                # Sample data
├── requirements.txt     # Dependencies
├── README.md            # Documentation
└── test_load.py         # Testing

Requirements

Python 3.9+ (scikit-learn 1.6.1 does not support Python 3.8)
scikit-learn 1.6.1 (model compatibility)
streamlit 1.31.0
pandas, numpy, matplotlib, seaborn, plotly

See requirements.txt for complete list.

Troubleshooting

Model Loading Issues

If you see scikit-learn version warnings when the model is loaded, install the version listed under Requirements:

pip install scikit-learn==1.6.1 --upgrade
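
Before reinstalling, it can help to confirm which version is actually present in the active environment. The sketch below uses the standard library's `importlib.metadata`; the helper names `installed_version` and `version_problem` are illustrative and not part of the project.

```python
from importlib import metadata

EXPECTED_SKLEARN = "1.6.1"  # version listed under Requirements


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_problem(package, expected):
    """Describe a version mismatch, or return None when the versions agree."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed (expected {expected})"
    if found != expected:
        return f"{package} {found} is installed, but {expected} is expected"
    return None
```

Running `version_problem("scikit-learn", EXPECTED_SKLEARN)` inside the virtual environment returns `None` when the pinned version is installed and a short description otherwise.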

Feature Mismatch

Ensure that preprocessing produces all 50+ features with the same names and in the same column order as in the training notebook.
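
One way to catch a mismatch early is to compare the engineered feature names against the names the model was trained on. The sketch below assumes you have that list available; scikit-learn estimators fitted on a pandas DataFrame expose it as `feature_names_in_`, but whether this particular model has that attribute depends on how it was trained. The `align_features` helper is a hypothetical name.

```python
def align_features(features, expected_names):
    """Order one feature dict to match training, rejecting missing or extra names."""
    missing = [name for name in expected_names if name not in features]
    extra = sorted(set(features) - set(expected_names))
    if missing or extra:
        raise ValueError(
            f"Feature mismatch: missing={missing}, unexpected={extra}"
        )
    # Emit values in the training column order, regardless of dict order.
    return [features[name] for name in expected_names]
```

For example, `align_features({"b": 2, "a": 1}, ["a", "b"])` returns the values reordered to match the expected names, while a missing or unexpected name raises a `ValueError` that lists the offending columns.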

Viewership Benchmarks

Test R²: ≥ 0.85 (target achieved)
Predictions within ±20%: ~65-70% of cases
Predictions within ±30%: ~80-85% of cases
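
For reference, the benchmark figures above can be computed along these lines. The sketch assumes that "within ±20%" means the absolute error is at most 20% of the actual value, which is one common reading and may differ from the definition used in the training notebook. It also assumes the actual values are non-zero and not all identical.

```python
def benchmark(y_true, y_pred):
    """Compute R², MAE, and the share of predictions within ±20% and ±30%."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Relative error measured against the actual value (assumed definition).
    rel_err = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "within_20pct": sum(e <= 0.20 for e in rel_err) / n,
        "within_30pct": sum(e <= 0.30 for e in rel_err) / n,
    }


# Toy values in millions of hours, for illustration only.
scores = benchmark([100, 200, 50, 400], [110, 150, 80, 400])
```

On the toy values, two of the four predictions fall within ±20% and three within ±30%, with an MAE of 22.5.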

License

MIT License
