A Streamlit-based AI chatbot that lets students query their textbooks. It uses the DeepSeek-1.5B LLM with vector-based retrieval to answer from uploaded PDFs.
- PDF-Based Question Answering – Upload documents and chat with an AI that retrieves relevant content.
- DeepSeek-1.5B AI Model – Uses DeepSeek-1.5B for efficient and context-aware responses.
- Retrieval-Augmented Generation (RAG) – Uses ChromaDB for vector-based retrieval.
- Streamlit Web Interface – Simple, user-friendly chatbot interface.
- Supports Multiple PDFs – Process and query multiple textbooks at once.
git clone https://github.com/YOUR_GITHUB_USERNAME/ScholarChatAI.git
cd ScholarChatAI
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Ensure you have Ollama installed and the DeepSeek-1.5B model downloaded:

ollama pull deepseek-r1:1.5b

Start the app:

streamlit run main.py

- Click Upload PDF Documents in the sidebar.
- Click Create Knowledge Base to process the documents.
- Type your query in the chat input.
- The chatbot retrieves relevant content and generates AI responses.
- The AI only answers from the uploaded PDFs.
- If content is missing, it responds:
"I cannot find relevant information in the provided documents."
- Converts PDF text into vector embeddings using `nomic-embed-text`.
- Stores them in ChromaDB for fast lookups.
- Uses Maximum Marginal Relevance (MMR) search, which balances relevance to the query against redundancy among the retrieved chunks.
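
To make the MMR step concrete, here is a minimal pure-Python sketch of the selection rule. It is an illustration only, not the code ChromaDB or LangChain runs internally. The function names `cosine` and `mmr`, the parameter `lambda_mult`, and the toy two-dimensional vectors are assumptions made for the example; real embeddings from `nomic-embed-text` have many more dimensions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k docs chosen by Maximum Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values penalize similarity to documents already selected.
    """
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already picked.
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: docs 0 and 1 are near-duplicates, doc 2 points elsewhere.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # diversity-aware selection
print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance-only selection
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the less similar chunk, which is the behavior that helps when several textbook passages repeat the same definition.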
- Uses DeepSeek-1.5B via `ChatOllama`.
- Generates exam-friendly responses with examples.
- Follows structured, educational guidelines.
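
The generation step, together with the rule that the AI only answers from the uploaded PDFs, could be wired up roughly as follows. This is a hedged sketch, not the project's actual code. `SYSTEM_PROMPT`, `build_messages`, and `generate_answer` are hypothetical names, and the import path assumes the `langchain-ollama` package (the project may import `ChatOllama` from elsewhere). Only the model tag and the fallback sentence are taken from this README.

```python
from typing import List, Sequence, Tuple

MODEL_NAME = "deepseek-r1:1.5b"  # model tag from the `ollama pull` step
FALLBACK = "I cannot find relevant information in the provided documents."

# Hypothetical system prompt; the project's real guidelines are not shown here.
SYSTEM_PROMPT = (
    "You are a study assistant. Answer only from the provided context. "
    "Use short, exam-friendly points and include one example. "
    f'If the context does not contain the answer, reply exactly: "{FALLBACK}"'
)


def build_messages(chunks: Sequence[str], question: str) -> List[Tuple[str, str]]:
    """Build (role, content) pairs from retrieved chunks and the user question."""
    context = "\n\n".join(c.strip() for c in chunks if c.strip())
    return [
        ("system", SYSTEM_PROMPT),
        ("human", f"Context:\n{context}\n\nQuestion: {question}"),
    ]


def generate_answer(chunks: Sequence[str], question: str) -> str:
    """Answer from the retrieved chunks, or return the fallback if there are none."""
    # Nothing retrieved: return the fallback without calling the model.
    if not any(c.strip() for c in chunks):
        return FALLBACK

    # Lazy import so build_messages works without LangChain installed.
    # Assumes the `langchain-ollama` package and a running local Ollama server.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model=MODEL_NAME, temperature=0)
    return llm.invoke(build_messages(chunks, question)).content
```

Short-circuiting on empty retrieval keeps the fallback deterministic, while the instruction in the system prompt covers the case where chunks were retrieved but do not contain the answer. A small model may not always follow that instruction exactly.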
| Technology | Purpose |
|---|---|
| Python | Main programming language |
| Streamlit | Web interface for chatbot |
| LangChain | AI model interaction |
| ChromaDB | Vector-based document retrieval |
| Ollama | Local AI model inference |
| DeepSeek-1.5B | AI model for generating responses |
- ✅ Add support for custom LLM models.
- ✅ Improve retrieval accuracy for multi-document queries.
- 🚀 Implement fine-tuning options for specific subjects.
- 🚀 Add support for non-English documents.
- LangChain – For integrating RAG pipelines.
- DeepSeek – For providing high-quality AI models.
- Streamlit – For the intuitive web interface.
This project is licensed under the MIT License.
Pull requests are welcome!
For major changes, open an issue first to discuss your proposal.
- Fork the repo
- Create a feature branch (`git checkout -b feature-branch`)
- Commit changes (`git commit -m "Added new feature"`)
- Push to branch (`git push origin feature-branch`)
- Open a PR on GitHub
For issues or feedback, open an issue or contact:
📧 Email: [kosamkar.r@northeastern.com]
🌐 GitHub: [github.com/rohit180497]

