Multi-PDF AI Question Answering System (Local + Cloud)
DocQuery-AI is an advanced AI-powered document intelligence system that allows users to upload, select, and query multiple PDF documents using semantic search and Retrieval-Augmented Generation (RAG).
It supports both offline (local LLM) and online (cloud LLM) modes, ensuring flexibility for development, privacy, and deployment.
Answers are grounded in the uploaded PDFs and carry page-level and section-level citations, which reduces hallucinations and makes each answer traceable to its source.
🚀 DocQuery-AI is live here:
👉 https://docquery-ai-o9hwz6rjgeah75ytyv6l9y.streamlit.app/
⚡ This live deployment uses the Groq LLM (`llama-3.1-8b-instant`) for fast, free cloud inference.
Cloud mode (online):
- LLM: Groq (`llama-3.1-8b-instant`)
- Inference: Cloud-based
- Cost: Free tier (no billing)
- Best for: Deployment, demos, sharing
Local mode (offline):
- LLM: Ollama (e.g., Phi, Mistral)
- Inference: Fully local
- Internet: Not required
- Best for: Privacy-focused & offline use
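Switching between the two modes could be handled with a small selection helper. The sketch below is only an illustration: the function `choose_backend` and the environment variable `DOCQUERY_LOCAL_MODEL` are hypothetical names, and using the presence of `GROQ_API_KEY` as the switch is an assumption, not necessarily how `app.py` decides.

```python
import os


def choose_backend(env=None):
    """Return a small config dict describing which LLM backend to use.

    Assumption: an API key in GROQ_API_KEY means cloud mode is wanted;
    otherwise the app falls back to a local Ollama model.
    """
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        return {
            "provider": "groq",
            "model": "llama-3.1-8b-instant",
            "needs_internet": True,
        }
    return {
        "provider": "ollama",
        # DOCQUERY_LOCAL_MODEL is a made-up variable name for this sketch.
        "model": env.get("DOCQUERY_LOCAL_MODEL", "phi"),
        "needs_internet": False,
    }


if __name__ == "__main__":
    print(choose_backend({"GROQ_API_KEY": "example-key"}))
    print(choose_backend({}))
```

Passing the environment as a plain dict keeps the helper easy to test without touching real credentials.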
- 📂 Multi-PDF Upload & Selection
- 🔍 Semantic Search using FAISS
- 🧠 LLM-powered Answers (Groq or Ollama)
- 📄 Page-wise & Section-wise Explanations
- 🧾 Source Citations (PDF | Section | Page)
- 📊 Confidence Scoring
- 🛡️ Hallucination Control (PDF-grounded answers only)
- ⚡ Fast & Lightweight UI (Streamlit)
- 💸 No mandatory billing required
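The citation format and the PDF-grounded behaviour listed above can be illustrated with a prompt builder. This is a minimal sketch under stated assumptions: the helper names `format_citation` and `build_grounded_prompt`, the chunk dictionary keys, and the prompt wording are all hypothetical and may differ from what the project actually sends to the LLM.

```python
def format_citation(pdf_name, section, page):
    """Render a source label in the 'PDF | Section | Page' style."""
    return f"{pdf_name} | {section} | Page {page}"


def build_grounded_prompt(question, chunks):
    """Build a prompt that restricts the model to the retrieved chunks.

    chunks: list of dicts with 'pdf', 'section', 'page' and 'text' keys
    (an assumed shape for this sketch).
    """
    context_parts = []
    for number, chunk in enumerate(chunks, start=1):
        label = format_citation(chunk["pdf"], chunk["section"], chunk["page"])
        context_parts.append(f"[{number}] ({label})\n{chunk['text']}")
    context = "\n\n".join(context_parts)
    return (
        "Answer using only the context below and cite sources by number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Example chunk invented for illustration only.
    example_chunks = [
        {
            "pdf": "signals.pdf",
            "section": "Introduction",
            "page": 3,
            "text": "A continuous-time signal is defined for every instant.",
        }
    ]
    print(build_grounded_prompt("What is a continuous-time signal?", example_chunks))
```

Numbering each chunk lets the answer refer back to a specific source line, which is what makes page-level citations possible.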
PDFs
└── Text Extraction (PyMuPDF)
    └── Section Detection (Font-based)
        └── Embeddings (SentenceTransformers)
            └── Vector Store (FAISS)
                └── Query Retrieval
                    └── Local LLM (Ollama)
                        └── Answer + Sources
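The retrieval stage of this pipeline can be sketched in plain Python. To stay dependency-free, the example swaps in a toy bag-of-words vector for the SentenceTransformers embedding and a brute-force cosine search for the FAISS index. The confidence heuristic (mean similarity of the retrieved chunks) is an assumption, since the scoring formula is not described here, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the query.

    A vector index such as FAISS performs this nearest-neighbour
    step efficiently when there are many chunks.
    """
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def confidence(scored):
    """Assumed heuristic: average similarity of the retrieved chunks."""
    return sum(score for score, _ in scored) / len(scored) if scored else 0.0


if __name__ == "__main__":
    # Example chunks invented for illustration only.
    chunks = [
        {"page": 1, "text": "continuous-time signals are defined for every instant"},
        {"page": 2, "text": "matrix multiplication rules"},
    ]
    top = retrieve("continuous-time signals", chunks, k=1)
    print(top[0][1]["page"], round(confidence(top), 2))
```

In a real deployment the chunk embeddings would be computed once at upload time and stored in the index, so only the query is embedded per question.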
| Layer | Technology |
|---|---|
| UI | Streamlit |
| PDF Parsing | PyMuPDF |
| Embeddings | SentenceTransformers |
| Vector Database | FAISS |
| LLM | Groq (llama-3.1-8b-instant) or Ollama (phi / mistral) |
| Language | Python |
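Font-based section detection, as named in the pipeline, usually means treating text set noticeably larger than the body font as a heading. The sketch below assumes spans have already been pulled out of the PDF as `(text, font_size, page)` tuples (PyMuPDF exposes span text and size through `page.get_text("dict")`). The function name `detect_sections` and the 1.15 size ratio are assumptions for illustration, not the project's actual rule.

```python
from collections import Counter


def detect_sections(spans, ratio=1.15):
    """Return (heading_text, page) pairs for spans larger than body text.

    spans: list of (text, font_size, page) tuples.
    The most frequent font size is treated as the body size; any span
    at least `ratio` times larger is treated as a section heading.
    """
    if not spans:
        return []
    size_counts = Counter(round(size, 1) for _, size, _ in spans)
    body_size = size_counts.most_common(1)[0][0]
    return [
        (text.strip(), page)
        for text, size, page in spans
        if text.strip() and size >= body_size * ratio
    ]


if __name__ == "__main__":
    # Example spans invented for illustration only.
    spans = [
        ("1 Introduction", 16.0, 1),
        ("Signals carry information.", 11.0, 1),
        ("They vary over time.", 11.0, 1),
        ("2 Continuous-Time Signals", 16.0, 2),
        ("Defined for all t.", 11.0, 2),
    ]
    print(detect_sections(spans))
```

A size-only rule is simple but can misfire on captions or pull quotes; bold flags or numbering patterns are common extra signals.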
1️⃣ Clone the Repository
git clone https://github.com/Subhajit-Das-1/DocQuery-AI.git
cd DocQuery-AI
2️⃣ Install Dependencies
pip install -r requirements.txt
3️⃣ Install Ollama & Pull Model
ollama pull phi
4️⃣ Run the Application
streamlit run app.py
- Upload one or more PDF files
- Select PDFs from the sidebar
- Ask questions such as:
  - What is inside this PDF?
  - Explain page 10
  - Explain continuous-time signals
- View:
  - ✅ AI-generated answer
  - 📊 Confidence score
  - 📌 Source pages & sections
  - 🔍 Highlighted reference text
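A question like "Explain page 10" targets a specific page, while "Explain continuous-time signals" needs semantic search. One way to tell them apart is a small pattern check before retrieval. This is a sketch only: the helper `requested_page` and its regular expression are hypothetical and may not match how the app routes questions.

```python
import re

# Matches "page 10", "Page 3", and similar, case-insensitively.
PAGE_PATTERN = re.compile(r"\bpage\s+(\d+)\b", re.IGNORECASE)


def requested_page(question):
    """Return the page number named in the question, or None if absent."""
    match = PAGE_PATTERN.search(question)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for q in ["Explain page 10", "Explain continuous-time signals"]:
        page = requested_page(q)
        route = f"page lookup (page {page})" if page is not None else "semantic search"
        print(f"{q!r} -> {route}")
```

When a page number is found, the app can filter chunks by page metadata instead of ranking every chunk by similarity.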
Planned enhancements:
- 🧠 OCR support for scanned PDFs
- 📊 PDF comparison mode
- 📝 Auto-generated notes & summaries
- ❓ MCQ / exam question generator
- 📤 Export answers to PDF / DOCX
This project is licensed under the MIT License.
Author: Subhajit Das