A powerful exam preparation assistant that allows students to upload their notes in the form of PDF, DOCX, or PPTX documents, submit multiple previous year questions in a single batch, and receive context-aware, accurate answers and a list of important topics using Retrieval-Augmented Generation (RAG) with LLMs.
- Multi-Document Upload: Supports PDF, DOCX, and PPTX formats.
- Multi-Question Input: Add multiple previous year questions at once; answers are batch-processed for speed and clarity.
- RAG Pipeline: Embeds documents using open-source models (e.g. bge-base-en), stores them in a vector store using langgraph's InMemoryStore, and retrieves context based on the questions.
- LLM Integration: Uses Groq + llama-3.1-8b-instant for grounded answer generation, using the context provided by RAG.
- Important Topics: Extracts the most frequently occurring topics from the answers and displays a concise summary.
- Automatic Cleanup: All uploaded files are removed after processing to keep your workspace clean.
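The retrieval and important-topics steps listed above can be illustrated with a small standard-library sketch. This is not the project's pipeline: the bag-of-words `embed` function is a toy stand-in for a real embedding model such as bge-base-en, a plain Python list stands in for the vector store, and the names `tokenize`, `embed`, `cosine`, `retrieve`, `top_topics`, and `STOP_WORDS` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this example only.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "what", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k document chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def top_topics(answers, n=3):
    """Return the n most frequent non-stop-word terms across all answers."""
    counts = Counter(t for ans in answers for t in tokenize(ans))
    return [term for term, _ in counts.most_common(n)]


# Batch of questions answered against the same set of note chunks.
chunks = [
    "Entropy is a measure of disorder in a thermodynamic system.",
    "The Carnot cycle sets the upper limit on heat engine efficiency.",
    "Bernoulli's equation relates pressure and velocity in a fluid.",
]
questions = ["What is entropy?", "Explain the Carnot cycle efficiency."]
contexts = {q: retrieve(q, chunks, k=1) for q in questions}
```

In a real setup, the retrieved chunks for each question would be passed to the LLM as context, and the topic count would run over the generated answers rather than over raw term frequencies.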
- Frontend - HTML, CSS, JS
- Backend - Flask (Python), RAG Pipeline
- LLM Stack - LangChain, Groq, HuggingFace Embeddings
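On the backend side, the format check and automatic cleanup described in the feature list could be sketched roughly as follows. It uses only the standard library and assumes nothing about the actual Flask routes; `ALLOWED_EXTENSIONS`, `is_allowed`, and `process_and_cleanup` are hypothetical names, and the `handler` callback stands in for whatever the RAG pipeline does with the files.

```python
import tempfile
from pathlib import Path

# The three formats named in the feature list.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".pptx"}


def is_allowed(filename):
    """Accept a file only if its extension is a supported format (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def process_and_cleanup(paths, handler):
    """Run handler on the supported files, then delete every uploaded file."""
    try:
        return handler([p for p in paths if is_allowed(p.name)])
    finally:
        # Cleanup runs even if the handler raises.
        for p in paths:
            p.unlink(missing_ok=True)


# Demo with a throwaway directory standing in for uploads/.
upload_dir = Path(tempfile.mkdtemp())
files = [upload_dir / "notes.pdf", upload_dir / "slides.PPTX", upload_dir / "tool.exe"]
for f in files:
    f.write_bytes(b"demo")

processed = process_and_cleanup(files, lambda ok: sorted(p.name for p in ok))
```

Putting the deletion in a `finally` block is one way to keep the upload folder clean even when processing fails partway through.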
rag-project/
├── app.py                  # Flask app entry point
├── .env                    # API key (not tracked by Git)
├── uploads/                # Temporary upload folder (auto-cleaned)
├── templates/
│   └── index.html          # Frontend HTML
├── static/
│   ├── css/style.css       # Custom dark theme styles
│   └── js/script.js        # AJAX + DOM updates
├── llm_utils/
│   ├── __init__.py
│   └── pipeline.py         # RAG + embedding logic
└── requirements.txt        # Python dependencies

- Clone the repository:
git clone https://github.com/your-username/rag-doc-qa.git
cd rag-doc-qa

- Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # For Windows: venv\Scripts\activate
pip install -r requirements.txt

- Set up your .env file:
GROQ_API_KEY=your_groq_key_here

- Run the app:
flask run

- Generate important questions based on previous year papers
- Add handwritten notes support with OCR
- Add multi-user session support
- Enable document history and downloads
- Summarization refinement via separate LLM
- Deploy to cloud
An example use case (using notes from my Mechanical Engineering course):


