A modular and extensible pipeline for aggregating online course data (Coursera, edX, NPTEL), unifying and enriching it, generating embeddings, and serving personalized course recommendations. The system supports both query-based search and adaptive recommendations driven by user quiz performance.
course-recommendation-system/
├── course-scraper/ # Individual scrapers for Coursera, edX, NPTEL → SQLite
├── unified_catalog/ # ETL pipeline: merge three source DBs → unified SQLite
├── recommender/ # Embeddings (FAISS), recommendation logic, FastAPI server
└── demo-quiz-app/ # Demo app (Flask backend + React/Vite frontend)
- Scraping and Collection
  - Extract course metadata from:
    - Coursera → `course-scraper/coursera_scraper/coursera.db`
    - edX → `course-scraper/edX_scraper/courses.db`
    - NPTEL → `course-scraper/NPTEL_scraper/courses.db`
- Data Integration
  - Normalize and merge into a unified schema using:
    `python unified_catalog/etl.py --sources coursera edx nptel`
  - Output: `unified_catalog/unified_courses.db`
- Feature Construction
  - Generate processed course text for embedding:
    `python unified_catalog/feature_builder.py`
  - Produces a `course_features` table with `title`, `tags`, `level`, and `language`.
- Embedding & Indexing
  - Encode text features using a Sentence-Transformer model (default: `all-mpnet-base-v2`) and build a FAISS index:
    `python -c "from recommender.vector_store import build_optimized_index; build_optimized_index()"`
- Recommendation API
  - Serve recommendations using a FastAPI service:
    `uvicorn recommender.server:app --host 0.0.0.0 --port 8000`
  - Optimized for sub-second responses with cached models, FAISS index, and course data.
- Quiz-Based Personalization
  - A Flask backend in `demo-quiz-app/backend` aggregates user quiz history from `quiz_system.db` and integrates with the recommender API.
  - Personalized recommendations are generated from quiz performance:
    - Courses to strengthen weak areas
    - Courses for advanced exploration based on strong performance
    - Courses aligned with interests inferred from attempted quizzes
- Frontend Integration
  - `demo-quiz-app/frontend-vite/` displays results in three sections:
    - “You may want to improve in...”
    - “You performed well in... try these advanced courses”
    - “You’ve shown interest in... these may interest you”
  - The user can still perform manual course searches from the same interface.
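The three quiz-driven groups described under Quiz-Based Personalization could be derived roughly as follows. This is a minimal sketch, not the backend's actual logic: `TopicStats`, `bucket_topics`, and both accuracy thresholds are hypothetical, and treating mid-range topics as "interest" is an assumption made only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the README does not state the real thresholds.
WEAK_BELOW = 0.5
STRONG_FROM = 0.8


@dataclass
class TopicStats:
    topic: str
    correct: int
    attempted: int

    @property
    def accuracy(self) -> float:
        # Guard against division by zero for topics with no attempts.
        return self.correct / self.attempted if self.attempted else 0.0


def bucket_topics(stats: list[TopicStats]) -> dict[str, list[str]]:
    """Split attempted topics into improve / advanced / interest groups."""
    buckets: dict[str, list[str]] = {"improve": [], "advanced": [], "interest": []}
    for s in stats:
        if s.attempted == 0:
            continue  # no quiz history for this topic
        if s.accuracy < WEAK_BELOW:
            buckets["improve"].append(s.topic)
        elif s.accuracy >= STRONG_FROM:
            buckets["advanced"].append(s.topic)
        else:
            # Assumption: any other attempted topic signals interest.
            buckets["interest"].append(s.topic)
    return buckets


history = [
    TopicStats("linear algebra", correct=2, attempted=10),
    TopicStats("python", correct=9, attempted=10),
    TopicStats("statistics", correct=6, attempted=10),
]
buckets = bucket_topics(history)
```

Each group's topic names could then be sent to the recommender as search queries, one per frontend section.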
- Schema
  - `unified_courses(course_id, source, title, description, url, provider, subject, level, language, tags_json, skills_json, ...)`
  - `course_features(course_id, title, tags, level, language)`
- Processing
  - Text is normalized using NLTK tokenization, stopword removal, and lemmatization.
  - URLs for Coursera are automatically prefixed with `https://www.coursera.org/` when needed.
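The Coursera URL prefixing mentioned above might look like the sketch below. The function name `normalize_coursera_url` is hypothetical, and the rule "prefix only when the value is not already absolute" is an assumption about what "when needed" means.

```python
COURSERA_BASE = "https://www.coursera.org/"


def normalize_coursera_url(url: str) -> str:
    """Return an absolute Coursera URL, prefixing relative paths."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url  # already absolute, leave untouched
    # Drop leading slashes so the base and path join with a single "/".
    return COURSERA_BASE + url.lstrip("/")
```

For example, `normalize_coursera_url("/learn/machine-learning")` yields `https://www.coursera.org/learn/machine-learning`, while an already absolute URL passes through unchanged.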
- Model: `sentence-transformers/all-mpnet-base-v2` (768D)
  - Can be switched to a lighter alternative like `all-MiniLM-L6-v2`.
- Index: FAISS inner-product search on normalized vectors (cosine similarity).
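Inner-product search equals cosine similarity once every vector has unit length, which is why the index operates on normalized vectors. The dependency-free sketch below illustrates that equivalence with tiny hand-made vectors; it is a stand-in for the FAISS index, not the project's code, and the helper names `l2_normalize`, `inner_product`, and `top_k` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(index_vectors, query, k):
    """Return (position, score) pairs for the k highest inner products."""
    scored = [(inner_product(v, query), i) for i, v in enumerate(index_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy 2-D "embeddings"; real ones would be 768-D model outputs.
raw_courses = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
index_vectors = [l2_normalize(v) for v in raw_courses]
query = l2_normalize([3.0, 0.0])

results = top_k(index_vectors, query, k=2)
```

Because all vectors are unit length, each score is the cosine of the angle between the query and a course vector, so vector magnitude has no effect on ranking.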
- API Endpoints
  - `/recommend?query=<text>&top_k=10`
  - Integrated quiz-driven route: `/api/user_recommendations?user_id=<id>`
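A client could assemble the search request for the `/recommend` endpoint as sketched here. The helper `build_recommend_url` and the `http://localhost:8000` base address are assumptions for illustration; the response format is not documented in this README, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode


def build_recommend_url(base_url: str, query: str, top_k: int = 10) -> str:
    """Build a /recommend URL with a properly encoded query string."""
    params = urlencode({"query": query, "top_k": top_k})
    return f"{base_url.rstrip('/')}/recommend?{params}"


url = build_recommend_url("http://localhost:8000", "machine learning")
```

The resulting URL can be passed to any HTTP client, for instance `requests.get(url)`, once the recommender service is running.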
- Acts as a proxy to the recommendation engine.
- New endpoints:
  - `/api/recommendations`: direct search-based recommendations
  - `/api/user_recommendations`: personalized recommendations based on quiz scores and recency
- Displays search-based and personalized recommendations.
- Fetches user data (including `user.id` from local storage) and quiz-based results dynamically.
- Run Quiz Application:
  `cd demo-quiz-app && ./dev.sh`
- Run Recommender API:
  `cd recommender && uvicorn server:app --reload --host 0.0.0.0 --port 8000`
- Run scrapers to collect data.
- Merge sources into a unified catalog.
- Build course features.
- Build the FAISS index.
- Start the FastAPI recommender.
- Run the Flask backend and React frontend for the demo app.
- Python 3.10+
- Core libraries (installed per component):
  - `sentence-transformers`, `faiss-cpu`, `numpy`, `fastapi`, `uvicorn`, `flask`, `requests`
  - `nltk`, `sqlite3`, `pandas`, `beautifulsoup4` (for scrapers)
- Node.js 18+ for the Vite frontend.
- All embeddings and FAISS indexes must be rebuilt if the model or course features change.
- Optimized vector loading and caching minimize memory overhead during repeated queries.
- The personalized recommendation service considers quiz accuracy, quiz recency, and topic relevance to rank courses.
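One way to combine quiz accuracy, recency, and topic relevance into a single ranking score is sketched below. The formula, the 30-day half-life, and the names `recency_weight` and `course_score` are all hypothetical; the README does not describe how the service actually weighs these signals.

```python
from datetime import datetime, timezone

HALF_LIFE_DAYS = 30.0  # hypothetical: a quiz's influence halves every 30 days


def recency_weight(taken_at: datetime, now: datetime,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: 1.0 for a quiz taken now, 0.5 after one half-life."""
    age_days = max((now - taken_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def course_score(accuracy: float, taken_at: datetime,
                 relevance: float, now: datetime) -> float:
    """Score a course for a weak topic; higher means recommend sooner.

    Assumption: lower accuracy means greater need, scaled by how recent
    the quiz was and how relevant the course is to the quiz topic.
    """
    need = 1.0 - accuracy
    return relevance * need * recency_weight(taken_at, now)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
taken = datetime(2024, 1, 1, tzinfo=timezone.utc)
score = course_score(accuracy=0.4, taken_at=taken, relevance=0.8, now=now)
```

Here the quiz is exactly one half-life old, so the score is 0.8 × 0.6 × 0.5 = 0.24. A multiplicative form means a course with zero relevance never ranks, regardless of the other signals.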