Production-ready ML application that predicts Netflix content viewership (28-day viewing hours, in millions) with a Gradient Boosting regressor.
- Interactive Streamlit UI for content viewership prediction
- Single content and batch prediction support
- Real-time feature importance visualization
- Export predictions to CSV
- Professional insights dashboard
- High-capacity Gradient Boosting model (R² ≥ 0.85)
- Algorithm: Gradient Boosting Regressor (High Capacity)
- Viewership: Test R² ≥ 0.85 (at least 85% of variance explained)
- Features: 50+ engineered features
- Complexity: 300 estimators, max depth 6
- Test MAE: ~30-35M hours
git clone https://github.com/Serhii2009/netflix-content-viewership-predictor.git
cd netflix-content-viewership-predictor
Linux/macOS:
python3 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
Place your trained netflix_model.pkl file in the models/ directory.
streamlit run app/streamlit_app.py
The app will open automatically in your browser at http://localhost:8501
- Navigate to "Single Content Prediction" tab
- Fill in content metadata fields:
  - Content Type (series/movie)
  - Release timing
  - Popularity scores
  - Genres
  - Competition level
- Click "Predict viewership"
- View predicted hours and insights
- Navigate to "Batch Prediction" tab
- Upload CSV file with content features
- Download predictions as CSV
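The batch flow above (read a CSV, predict every row, return a CSV) could be sketched roughly as follows. This is an illustration only, not the app's actual code: `predict_batch`, `to_features`, the `predicted_hours_millions` column name, and `StubModel` are all hypothetical, and the stub stands in for the trained model so the example runs without the .pkl file. The only convention assumed about the real model is the scikit-learn style `predict(X)` method.

```python
import csv
import io
from typing import Callable, Dict, List


def predict_batch(csv_text: str,
                  to_features: Callable[[Dict[str, str]], List[float]],
                  model) -> str:
    """Read content rows from CSV text and return CSV text with a prediction column appended."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # One call for the whole batch, following the scikit-learn predict(X) convention.
    predictions = model.predict([to_features(row) for row in rows])

    out = io.StringIO()
    fieldnames = list(rows[0].keys()) + ["predicted_hours_millions"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, pred in zip(rows, predictions):
        writer.writerow({**row, "predicted_hours_millions": round(float(pred), 2)})
    return out.getvalue()


class StubModel:
    """Hypothetical stand-in for the trained model: 'predicts' the sum of the features."""

    def predict(self, X):
        return [sum(x) for x in X]


demo_input = "title,global_popularity_proxy,cast_popularity_proxy\nA,80,65\nB,10,20\n"
demo_output = predict_batch(
    demo_input,
    lambda row: [float(row["global_popularity_proxy"]), float(row["cast_popularity_proxy"])],
    StubModel(),
)
```

In a real run, `to_features` would have to produce the same engineered features the model was trained on, in the same order.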
- Content Type: series or movie
- Release Year: 2013-2026
- Release Month: 1-12
- Genres: Select applicable genres (pipe-separated)
- Seasons Count: For series (0 for movies)
- Global Popularity Proxy: 0-100
- Cast Popularity Proxy: 0-100
- Competition Level: low, medium, high
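A minimal sketch of how the ranges listed above might be checked before a record reaches the model. The function name and the dictionary keys (`content_type`, `release_year`, and so on) are assumptions made for illustration; the app's real field names may differ. Numeric fields are assumed to be parsed to numbers already, and genres are not checked because the allowed genre names are not listed here.

```python
from typing import Any, Dict, List

# Allowed values taken from the field list above.
CONTENT_TYPES = {"series", "movie"}
COMPETITION_LEVELS = {"low", "medium", "high"}


def validate_content(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record is valid."""
    errors: List[str] = []

    if record.get("content_type") not in CONTENT_TYPES:
        errors.append("content_type must be 'series' or 'movie'")
    if not 2013 <= record.get("release_year", 0) <= 2026:
        errors.append("release_year must be between 2013 and 2026")
    if not 1 <= record.get("release_month", 0) <= 12:
        errors.append("release_month must be between 1 and 12")

    seasons = record.get("seasons_count", 0)
    if seasons < 0:
        errors.append("seasons_count must not be negative")
    elif record.get("content_type") == "movie" and seasons != 0:
        errors.append("seasons_count must be 0 for movies")

    for key in ("global_popularity_proxy", "cast_popularity_proxy"):
        if not 0 <= record.get(key, -1) <= 100:
            errors.append(f"{key} must be between 0 and 100")

    if record.get("competition_level") not in COMPETITION_LEVELS:
        errors.append("competition_level must be low, medium, or high")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a UI show every issue at once.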
- Content Features
  - Content type indicators
  - Season availability and bins
  - Multi-season flags
- Temporal Features
  - Release quarter, holiday, summer indicators
  - Years since release
  - Recency flags
- Popularity Features
  - Composite scores
  - Min/max popularity
  - Blockbuster potential indicators
- Genre Features
  - 22 genre binary indicators
  - Genre count and diversity
- Interaction Features
  - Series × Seasons
  - Popularity × Competition
  - Content type × Popularity synergies
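To make the feature groups above concrete, here is a small sketch that derives one or two features from each group. It is not the project's preprocessor: the exact formulas live in the training notebook, so everything here is an assumption, including the feature names, the composite score as a plain mean, summer as June through August, the ordinal encoding of competition level, and the `reference_year` default.

```python
from typing import Any, Dict

# Assumption: competition levels map onto an ordinal scale; the real encoding is not documented here.
COMPETITION_ORDINAL = {"low": 0, "medium": 1, "high": 2}


def engineer_features(record: Dict[str, Any], reference_year: int = 2026) -> Dict[str, float]:
    """Derive a handful of illustrative features from one raw content record."""
    is_series = 1 if record["content_type"] == "series" else 0
    seasons = record["seasons_count"]
    global_pop = record["global_popularity_proxy"]
    cast_pop = record["cast_popularity_proxy"]
    genres = [g for g in record["genres"].split("|") if g]  # pipe-separated input
    competition = COMPETITION_ORDINAL[record["competition_level"]]
    month = record["release_month"]

    return {
        # Content features
        "is_series": is_series,
        "is_multi_season": 1 if seasons > 1 else 0,
        # Temporal features
        "release_quarter": (month - 1) // 3 + 1,
        "is_summer_release": 1 if month in (6, 7, 8) else 0,  # assumed definition of summer
        "years_since_release": reference_year - record["release_year"],
        # Popularity features
        "popularity_composite": (global_pop + cast_pop) / 2,  # assumed: simple mean
        "popularity_min": min(global_pop, cast_pop),
        "popularity_max": max(global_pop, cast_pop),
        # Genre features
        "genre_count": len(genres),
        # Interaction features
        "series_x_seasons": is_series * seasons,
        "popularity_x_competition": global_pop * competition,
    }
```

The real pipeline produces 50+ features, including the 22 genre indicators, so this sketch covers only a small subset.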
GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.08,
    max_depth=6,
    min_samples_split=8,
    min_samples_leaf=3,
    subsample=0.85,
    max_features='sqrt'
)
- Go to repository Settings → Collaborators
- Click "Add people"
- Enter GitHub username or email
- Assign role: Read, Write, or Admin
Main branch: Production-ready code only
Development branch: Active feature development
Create feature branch:
git checkout -b feature/ui-enhancement
git add .
git commit -m "Add new visualization feature"
git push origin feature/ui-enhancement
Create a Pull Request on GitHub for code review before merging.
netflix-content-viewership-predictor/
├── models/ # Trained model file (.pkl)
├── app/ # Streamlit UI
├── utils/ # Helper functions
│ ├── model_loader.py # Model loading utilities
│ ├── preprocessor.py # Feature engineering
│ └── visualizations.py # Plotting functions
├── data/ # Sample data
├── requirements.txt # Dependencies
├── README.md # Documentation
└── test_load.py # Testing
- Python 3.8+
- scikit-learn 1.6.1 (model compatibility)
- streamlit 1.31.0
- pandas, numpy, matplotlib, seaborn, plotly
See requirements.txt for complete list.
If you encounter version warnings:
pip install scikit-learn==1.6.1 --upgrade
Ensure preprocessing creates all 50+ features exactly as in the training notebook.
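One way to catch a mismatch between preprocessing and training is to compare column names before calling the model. The helper below is a hypothetical sketch, not part of this repository. For the expected list, a scikit-learn estimator that was fitted on a pandas DataFrame exposes its training columns as `feature_names_in_`; whether this project's model was fitted that way is an assumption.

```python
from typing import Dict, Sequence


def compare_feature_columns(expected: Sequence[str], actual: Sequence[str]) -> Dict[str, object]:
    """Report columns the model expects but did not get, extra columns, and ordering."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": [name for name in expected if name not in actual_set],
        "unexpected": [name for name in actual if name not in expected_set],
        # Tree models read features by position, so order matters as well as names.
        "same_order": list(expected) == list(actual),
    }
```

A prediction should only go ahead when `missing` and `unexpected` are both empty and `same_order` is true.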
- Test R²: ≥ 0.85 (target achieved)
- Predictions within ±20%: ~65-70% of cases
- Predictions within ±30%: ~80-85% of cases
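For reference, the metrics above can be computed from paired actual and predicted values as in the sketch below. The function names are illustrative, and "within ±20%" is assumed to mean that the absolute error is at most 20% of the actual value.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def share_within(actual: Sequence[float], predicted: Sequence[float], tolerance: float) -> float:
    """Fraction of predictions whose absolute error is at most tolerance * |actual|."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * abs(a))
    return hits / len(actual)


# Made-up viewing hours (in millions), for illustration only.
actual_hours = [100, 200, 50, 400]
predicted_hours = [110, 150, 52, 300]
mae = mean_absolute_error(actual_hours, predicted_hours)
within_20 = share_within(actual_hours, predicted_hours, 0.20)
within_30 = share_within(actual_hours, predicted_hours, 0.30)
```

The sample numbers are invented to exercise the functions and say nothing about the model's real performance.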
MIT License