A prototype real-time recommendation engine for e-commerce that delivers personalized product suggestions using a hybrid approach (collaborative + content-based filtering).
- Hybrid Recommendation Engine: Combines User Behavior (NMF) and Product Similarity (TF-IDF) for accurate suggestions.
- Cold Start Handling: Uses Content-Based filtering for new users/products.
- Real-Time API: Flask-based REST API with <100ms response time.
- Interactive Dashboard: Streamlit UI for visualizing user profiles and recommendations.
- Evaluation Metrics: Built-in script calculates Precision@K and Recall@K.
- Scalable Design: Modular architecture ready for Spark/Kafka integration.
- Language: Python 3.x
- Machine Learning: Scikit-Learn (NMF, TF-IDF), NumPy, Pandas
- API: Flask
- Visualization: Streamlit
- Data Processing: Pandas
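
The hybrid and cold-start behaviour listed above can be illustrated with a minimal sketch. It assumes the collaborative (NMF) and content-based (TF-IDF) components have already produced per-product scores for a user. The function names, the `alpha` weight, the min-max normalization, and the `"Content-Based"` strategy label are illustrative assumptions, not taken from the project's code.

```python
from typing import Dict, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking information, so map everything to 0.
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def hybrid_recommend(
    collab_scores: Optional[Dict[str, float]],
    content_scores: Dict[str, float],
    alpha: float = 0.7,  # hypothetical weight on the collaborative signal
    k: int = 5,
) -> Dict[str, object]:
    """Blend two score dictionaries; fall back to content-only for cold start."""
    content = min_max(content_scores)

    # Cold start: no interaction history means no collaborative scores,
    # so rank by content similarity alone.
    if not collab_scores:
        ranked = sorted(content, key=lambda item: (-content[item], item))
        # "Content-Based" is an assumed label for the fallback strategy.
        return {"strategy": "Content-Based", "recommendations": ranked[:k]}

    collab = min_max(collab_scores)
    items = set(collab) | set(content)
    # Weighted sum; a product missing from one signal contributes 0 there.
    blended = {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }
    # Sort by descending score, breaking ties by product ID for determinism.
    ranked = sorted(blended, key=lambda item: (-blended[item], item))
    return {"strategy": "Hybrid", "recommendations": ranked[:k]}
```

A weighted sum is only one way to combine the two signals; the project may blend or switch between them differently.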
├── data/
│   ├── processed/               # Generated artifacts (interactions, models)
│   └── raw/                     # Original CSV datasets
├── src/
│   ├── models/                  # ML Model implementations (Hybrid, NMF, Content)
│   ├── app.py                   # Flask API
│   ├── dashboard.py             # Streamlit Dashboard
│   ├── data_processing.py       # ETL Pipeline
│   ├── feature_engineering.py   # Feature extraction
│   └── evaluate.py              # Metrics calculation
└── notebooks/                   # Experimental notebooks

pip install pandas numpy scikit-learn flask streamlit joblib requests

Run the pipeline to process data and train the initial models:
# 1. Process Raw Data
python src/data_processing.py
# 2. Extract Features & Create Embeddings
python src/feature_engineering.py
# 3. Train Collaborative Model (NMF)
python src/models/collaborative_sklearn.py

Start the backend server:
python src/app.py
Server runs at http://localhost:5000

Open the interactive UI:
streamlit run src/dashboard.py

To check the model's performance metrics (Precision & Recall):
python src/evaluate.py

Endpoint: GET /recommend/<user_id>
Example:
curl http://localhost:5000/recommend/C1003

Response:
{
  "user_id": "C1003",
  "strategy": "Hybrid",
  "recommendations": ["P2003", "P4012", "P1005", "P3001", "P2022"]
}
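
A response like the one above can be scored against a user's held-out purchases using Precision@K and Recall@K, the metrics the evaluation script is described as reporting. The sketch below uses the standard definitions and is not taken from `src/evaluate.py`. The helper names and the held-out product set are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def parse_recommendations(body: str) -> List[str]:
    """Extract the ranked product IDs from a /recommend JSON response body."""
    return list(json.loads(body)["recommendations"])


def precision_recall_at_k(
    recommended: Iterable[str], relevant: Iterable[str], k: int
) -> Tuple[float, float]:
    """Return (Precision@K, Recall@K) for one user.

    Precision@K = hits in the top K / K
    Recall@K    = hits in the top K / number of relevant items
    Assumes the recommendation list contains no duplicate IDs.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = list(recommended)[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    precision = hits / k
    # With no relevant items, recall is undefined; report 0.0 by convention here.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Response body copied from the example above.
response_body = """
{
  "user_id": "C1003",
  "strategy": "Hybrid",
  "recommendations": ["P2003", "P4012", "P1005", "P3001", "P2022"]
}
"""

# Hypothetical held-out purchases for user C1003 (illustration only).
held_out = {"P4012", "P3001", "P9999"}

recs = parse_recommendations(response_body)
precision, recall = precision_recall_at_k(recs, held_out, k=5)
print(f"Precision@5 = {precision:.2f}, Recall@5 = {recall:.2f}")
```

With these made-up held-out items, two of the five recommendations are hits, giving Precision@5 of 2/5 and Recall@5 of 2/3. An evaluation script would typically average these values over all test users.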