A Retrieval-Augmented Generation (RAG) stack that runs locally with Ollama for language models, Qdrant for vector search, and a Streamlit UI for document ingestion and chat. Use it to curate domain-specific knowledge bases.
- Python 3.10+
- Git
- Docker Desktop (or Docker Engine) with Compose support
- Ollama running locally
- At least 8 GB RAM recommended for the default LLM (llama3.2:1b); adjust models if needed
ℹ️ The repository uses local resources only—no external APIs are required once the models are pulled.
```bash
git clone https://github.com/<your-username>/local-rag-setup.git
cd local-rag-setup

# create virtual environment (Unix/macOS)
python -m venv .venv
source .venv/bin/activate

# on Windows PowerShell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

Install Python dependencies:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Prefer uv? The same steps look like this:

```bash
uv venv --python 3.10
source .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
uv pip install -r requirements.txt
```

You can then launch any command (e.g., `uv run streamlit run app.py`) without manually activating the environment.
The application reads configuration from environment variables (defaults are shown below). Create a `.env` file in the project root if you need to override them:

```
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=
QDRANT_COLLECTION=my_documents
EMBED_MODEL=mxbai-embed-large
GEN_MODEL=llama3.2:1b
RETRIEVE_K=5
INGEST_BATCH_SIZE=64
CHUNK_SIZE_DEFAULT=1000
CHUNK_OVERLAP_DEFAULT=100
DATA_PATH=data
LOG_DIR=conversation_logs
```

- `QDRANT_COLLECTION` is the default collection used when the UI first loads; you can create or switch collections at runtime.
- `DATA_PATH` is where uploaded PDFs are stored (per collection).
- `LOG_DIR` holds chat transcripts generated by the CLI tools.
- `RETRIEVE_K` controls how many chunks are retrieved per query, ranked by relevance.
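For orientation, these variables feed the `RAGSettings` dataclass in `modules/config.py`. Its exact fields are not shown in this README, so the sketch below is only an illustration, assuming plain `os.getenv` lookups and hypothetical field names:

```python
# Illustrative only: the real RAGSettings lives in modules/config.py and may
# use different field names or load .env via python-dotenv.
import os
from dataclasses import dataclass


@dataclass
class RAGSettings:
    qdrant_url: str = os.getenv("QDRANT_URL", "http://localhost:6333")
    qdrant_collection: str = os.getenv("QDRANT_COLLECTION", "my_documents")
    embed_model: str = os.getenv("EMBED_MODEL", "mxbai-embed-large")
    gen_model: str = os.getenv("GEN_MODEL", "llama3.2:1b")
    retrieve_k: int = int(os.getenv("RETRIEVE_K", "5"))
    chunk_size: int = int(os.getenv("CHUNK_SIZE_DEFAULT", "1000"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP_DEFAULT", "100"))


settings = RAGSettings()
print(settings.gen_model)  # llama3.2:1b unless overridden
```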
- Launch Qdrant

  ```bash
  docker compose up -d qdrant
  ```

  Qdrant will listen on http://localhost:6333 by default. You can check container status with `docker compose ps`. The Qdrant UI is accessible at http://localhost:6333/dashboard when using the default URL.
- Pull Ollama models

  ```bash
  ollama pull mxbai-embed-large   # embeddings
  ollama pull llama3.2:1b         # generation
  ```

  Ensure the Ollama service is running in the background (`ollama serve`). You may substitute alternative models; update `EMBED_MODEL`/`GEN_MODEL` accordingly. A quick way to confirm both services are reachable is sketched after this list.
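If you want to confirm from Python that both services are up before ingesting anything, a minimal check could look like the sketch below. It assumes the `qdrant-client` package is installed and that Ollama's HTTP API is on its default port (11434); neither detail is spelled out in this README, so treat it as illustrative:

```python
# Illustrative sanity check; not part of the repository.
import json
import os
import urllib.request

from qdrant_client import QdrantClient  # assumed to be pulled in via requirements.txt

QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
OLLAMA_URL = "http://localhost:11434"  # Ollama's default HTTP port (assumption)
EMBED_MODEL = os.getenv("EMBED_MODEL", "mxbai-embed-large")
GEN_MODEL = os.getenv("GEN_MODEL", "llama3.2:1b")

# 1. Qdrant: listing collections fails fast if the container is not running.
client = QdrantClient(url=QDRANT_URL)
print("Qdrant collections:", [c.name for c in client.get_collections().collections])

# 2. Ollama: /api/tags lists the locally pulled models.
with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
    local = [m["name"] for m in json.load(resp)["models"]]
for name in (EMBED_MODEL, GEN_MODEL):
    ok = any(tag == name or tag.startswith(name + ":") for tag in local)
    print(f"{name}: {'pulled' if ok else 'missing - run: ollama pull ' + name}")
```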
Launch the Streamlit app:

```bash
streamlit run app.py
```

- Open the Ingest Documents page.
- Pick an existing Qdrant collection or create a new one.
- Upload one or more PDF files (they will be saved under `<DATA_PATH>/<collection_name>/`).
- Adjust chunking parameters if necessary and click Process (the chunking step is sketched after this list).
- A progress bar tracks loading, chunking, and upsert status. Once complete, the collection is ready for querying.
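Behind the Process button, ingestion boils down to loading each PDF, splitting it into overlapping chunks, embedding them, and upserting into Qdrant; the real implementation is `DataIngestor` in `modules/ingestion.py`. The sketch below only illustrates the chunking step and assumes LangChain-style loaders and splitters plus a placeholder file path:

```python
# Illustrative chunking sketch; the real pipeline is DataIngestor in modules/ingestion.py.
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = int(os.getenv("CHUNK_SIZE_DEFAULT", "1000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP_DEFAULT", "100"))

# Placeholder path: uploaded files land under <DATA_PATH>/<collection_name>/.
pages = PyPDFLoader("data/my_documents/example.pdf").load()

# Split pages into overlapping chunks; each chunk keeps its source metadata (file, page).
splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
chunks = splitter.split_documents(pages)
print(f"{len(pages)} pages -> {len(chunks)} chunks ready to embed and upsert into Qdrant")
```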
- Switch to the Chat with LLM page.
- Select the collection you want to interrogate. The first query will trigger retrieval using the configured `RETRIEVE_K` value (see the retrieval sketch after this list).
- Ask a question in the chat box. Responses include an expandable view of the chunks retrieved from Qdrant with source metadata.
- Each collection maintains its own conversation state; switching collections resets the visible history.
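Conceptually, that retrieval step looks like the sketch below. The project's own wiring lives in `modules/qdrant_utils.py`; the `langchain-qdrant`/`langchain-ollama` imports and the sample question are assumptions, not confirmed dependencies:

```python
# Illustrative retrieval sketch; the project's own wiring lives in modules/qdrant_utils.py.
import os

from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

store = QdrantVectorStore(
    client=QdrantClient(url=os.getenv("QDRANT_URL", "http://localhost:6333")),
    collection_name=os.getenv("QDRANT_COLLECTION", "my_documents"),
    embedding=OllamaEmbeddings(model=os.getenv("EMBED_MODEL", "mxbai-embed-large")),
)

# Top-k similarity search; k mirrors RETRIEVE_K from the configuration.
retriever = store.as_retriever(search_kwargs={"k": int(os.getenv("RETRIEVE_K", "5"))})
for doc in retriever.invoke("What does chapter 2 say about onboarding?"):  # sample question
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```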
These tools are intended purely for testing and behave like the Streamlit chat, but in the terminal; the main difference is that they save conversation logs.

- `python query_LLM.py` — interactive console assistant using the same retrieval chain as the UI. It writes transcripts to `LOG_DIR` and is handy for quick smoke tests.
- `python rag_query_qdrant.py` — single-shot helper that prints a model answer for a hard-coded prompt. Useful for integration checks or scripting (a minimal sketch of the same pattern follows below).
Both commands respect the same configuration as the Streamlit app; ensure Qdrant and Ollama are running before invoking them.
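As a rough idea of what such a one-shot query can look like when scripted by hand, here is an assumed composition of the same building blocks (retrieve top-k chunks, stuff them into a prompt, ask the generation model); it is not the repository's actual chain or prompt:

```python
# Minimal one-shot RAG query; an assumed composition, not the repository's actual chain.
import os

from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

question = "Summarize the key points of the uploaded documents."  # placeholder prompt

store = QdrantVectorStore(
    client=QdrantClient(url=os.getenv("QDRANT_URL", "http://localhost:6333")),
    collection_name=os.getenv("QDRANT_COLLECTION", "my_documents"),
    embedding=OllamaEmbeddings(model=os.getenv("EMBED_MODEL", "mxbai-embed-large")),
)
docs = store.similarity_search(question, k=int(os.getenv("RETRIEVE_K", "5")))
context = "\n\n".join(d.page_content for d in docs)

llm = ChatOllama(model=os.getenv("GEN_MODEL", "llama3.2:1b"))
reply = llm.invoke(
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
print(reply.content)
```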
```
.
├── app.py                  # Streamlit main UI for KB creation and chat
├── modules/                # Shared package
│   ├── __init__.py
│   ├── config.py           # RAGSettings dataclass + helpers
│   ├── ingestion.py        # DataIngestor with progress callbacks
│   ├── prompts.py          # Prompt templates for all chains
│   └── qdrant_utils.py     # Qdrant client, vector store, retriever
├── query_LLM.py            # KB chat that saves conversation logs
├── rag_query_qdrant.py     # Minimal CLI smoke test
├── docker-compose.yml      # Qdrant service definition
├── requirements.txt
└── conversation_logs/      # Created at runtime
```
- Ollama model not found: Verify that `ollama list` shows the configured model names. Pull them again if needed.
- Qdrant connection errors: Ensure the container is running and that `QDRANT_URL` in `.env` matches the exposed port (`docker compose logs qdrant` can help diagnose issues).
- Large files ingest slowly: Increase `INGEST_BATCH_SIZE` or switch to a more compact embedding model. You can monitor ingestion progress directly in the UI logs.
- Permission errors creating directories: Confirm the process has write access to `DATA_PATH` and `LOG_DIR` (a quick check is sketched below). Both are created automatically as long as their parent locations exist and are writable.
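For the last point, a short snippet like the following can confirm whether the process can actually create and write to those directories; paths default to the values from the configuration section, and the snippet is illustrative rather than part of the repository:

```python
# Illustrative write-access check for DATA_PATH and LOG_DIR.
import os
from pathlib import Path

for var, default in (("DATA_PATH", "data"), ("LOG_DIR", "conversation_logs")):
    path = Path(os.getenv(var, default))
    path.mkdir(parents=True, exist_ok=True)  # raises PermissionError if it cannot be created
    writable = os.access(path, os.W_OK)
    print(f"{var} -> {path.resolve()} ({'writable' if writable else 'NOT writable'})")
```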