LOCAL RAG AND LLM SETUP - Chat with PDFs

A Retrieval-Augmented Generation (RAG) stack that runs locally, with Ollama for language models, Qdrant for vector search, and a Streamlit UI for document ingestion and chat. Use it to curate domain-specific knowledge bases.

Prerequisites

  • Python 3.10+
  • Git
  • Docker Desktop (or Docker Engine) with Compose support
  • Ollama running locally
  • At least 8 GB RAM recommended for the default LLM (llama3.2:1b); adjust models if needed

ℹ️ The repository uses local resources only—no external APIs are required once the models are pulled.

1. Clone & Virtual Environment

git clone https://github.com/<your-username>/local-rag-setup.git
cd local-rag-setup

# create virtual environment (Unix/macOS)
python -m venv .venv
source .venv/bin/activate

# on Windows PowerShell
python -m venv .venv
.venv\Scripts\Activate.ps1

Install Python dependencies:

pip install --upgrade pip
pip install -r requirements.txt

Prefer uv? The same steps look like this:

uv venv --python 3.10
source .venv/bin/activate          # or .venv\Scripts\Activate.ps1 on Windows
uv pip install -r requirements.txt

You can then launch any command (e.g., uv run streamlit run app.py) without manually activating the environment.

2. Environment Configuration

The application reads configuration from environment variables (defaults are shown below). Create a .env file in the project root if you need to override them:

QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=
QDRANT_COLLECTION=my_documents
EMBED_MODEL=mxbai-embed-large
GEN_MODEL=llama3.2:1b
RETRIEVE_K=5
INGEST_BATCH_SIZE=64
CHUNK_SIZE_DEFAULT=1000
CHUNK_OVERLAP_DEFAULT=100
DATA_PATH=data
LOG_DIR=conversation_logs
  • QDRANT_COLLECTION is the default collection used when the UI first loads; you can create or switch collections at runtime.
  • DATA_PATH is where uploaded PDFs are stored (per collection).
  • LOG_DIR holds chat transcripts generated by the CLI tool.
  • RETRIEVE_K sets how many chunks are retrieved per query; retrieved chunks are ranked by relevance before being passed to the model.
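
For orientation, the variables above can be read into a simple settings object like the sketch below. This is illustrative only; the actual RAGSettings dataclass in modules/config.py may use different field names or loading logic.

# settings_sketch.py - illustrative mapping of the env vars above; not the repo's modules/config.py
import os
from dataclasses import dataclass

@dataclass
class Settings:
    # Env vars are read once at import time in this sketch; load your .env beforehand
    qdrant_url: str = os.getenv("QDRANT_URL", "http://localhost:6333")
    qdrant_api_key: str = os.getenv("QDRANT_API_KEY", "")
    collection: str = os.getenv("QDRANT_COLLECTION", "my_documents")
    embed_model: str = os.getenv("EMBED_MODEL", "mxbai-embed-large")
    gen_model: str = os.getenv("GEN_MODEL", "llama3.2:1b")
    retrieve_k: int = int(os.getenv("RETRIEVE_K", "5"))
    ingest_batch_size: int = int(os.getenv("INGEST_BATCH_SIZE", "64"))
    chunk_size: int = int(os.getenv("CHUNK_SIZE_DEFAULT", "1000"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP_DEFAULT", "100"))
    data_path: str = os.getenv("DATA_PATH", "data")
    log_dir: str = os.getenv("LOG_DIR", "conversation_logs")

settings = Settings()
print(settings.gen_model)  # llama3.2:1b unless overridden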

3. Start Services

  1. Launch Qdrant

    docker compose up -d qdrant

    Qdrant will listen on http://localhost:6333 by default. You can check container status with docker compose ps.

    The Qdrant web UI is accessible at http://localhost:6333/dashboard when using the default URL. A quick way to verify that both Qdrant and Ollama are reachable is sketched after this list.

  2. Pull Ollama models

    ollama pull mxbai-embed-large   # embeddings
    ollama pull llama3.2:1b         # generation

    Ensure the Ollama service is running in the background (ollama serve). You may substitute alternative models; update EMBED_MODEL / GEN_MODEL accordingly.
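
Before launching the app, you can optionally sanity-check that both services are up. The script below is a minimal sketch (not part of the repository) that uses only the standard library and the public HTTP endpoints of Qdrant (GET /collections) and Ollama (GET /api/tags):

# preflight_sketch.py - not in the repo; quick reachability check for Qdrant and Ollama
import json
import urllib.request

def get_json(url):
    # Fetch a URL and decode the JSON response
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Qdrant: list existing collections (an empty list is fine on a fresh install)
qdrant = get_json("http://localhost:6333/collections")
print("Qdrant collections:", [c["name"] for c in qdrant["result"]["collections"]])

# Ollama: list locally pulled models; the embed and gen models should appear here
ollama = get_json("http://localhost:11434/api/tags")
print("Ollama models:", [m["name"] for m in ollama.get("models", [])])

If either call fails, revisit the corresponding step above before starting Streamlit.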

4. Run the Streamlit App

streamlit run app.py

Ingest Workflow

  1. Open the Ingest Documents page.
  2. Pick an existing Qdrant collection or create a new one.
  3. Upload one or more PDF files (they will be saved under <DATA_PATH>/<collection_name>/).
  4. Adjust chunking parameters if necessary and click Process.
  5. A progress bar tracks loading, chunking, and upsert status. Once complete, the collection is ready for querying. A simplified sketch of this chunk/embed/upsert pipeline follows this list.
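
The app drives this pipeline for you, but the sketch below shows roughly what chunking, embedding, and upserting into Qdrant involve, using qdrant-client and Ollama's embedding endpoint with the default settings from section 2. It is illustrative only and does not mirror modules/ingestion.py; the input file path is hypothetical, and a real PDF would first be parsed to text.

# ingest_sketch.py - illustrative chunk/embed/upsert loop; not the repo's DataIngestor
import json
import urllib.request
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

CHUNK_SIZE, CHUNK_OVERLAP, BATCH_SIZE = 1000, 100, 64

def embed(text, model="mxbai-embed-large"):
    # Call Ollama's embedding endpoint and return the vector as a list of floats
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def chunk(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    # Naive fixed-size character chunking with overlap; real splitters respect sentence boundaries
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

client = QdrantClient(url="http://localhost:6333")
collection = "my_documents"

# Create the collection on first use, sized to the embedding model's dimensionality
dim = len(embed("dimension probe"))  # mxbai-embed-large produces 1024-dim vectors
if collection not in [c.name for c in client.get_collections().collections]:
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
    )

# Hypothetical input: in the real app this text comes from parsed PDF pages
text = open("data/my_documents/example.txt", encoding="utf-8").read()
points = [
    PointStruct(id=str(uuid.uuid4()), vector=embed(c), payload={"text": c, "source": "example.txt"})
    for c in chunk(text)
]
for i in range(0, len(points), BATCH_SIZE):
    client.upsert(collection_name=collection, points=points[i:i + BATCH_SIZE])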

Chat Workflow

  1. Switch to the Chat with LLM page.
  2. Select the collection you want to interrogate. The first query triggers retrieval using the configured RETRIEVE_K value; a simplified sketch of this retrieve-and-generate step follows this list.
  3. Ask a question in the chat box. Responses include an expandable view of the chunks retrieved from Qdrant with source metadata.
  4. Each collection maintains its own conversation state; switching collections resets the visible history.
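
Under the hood, the retrieve-and-generate step can be approximated as follows. This is a minimal sketch against Qdrant's query API (qdrant-client 1.10+) and Ollama's /api/generate endpoint; the app's actual chain, prompts, and metadata handling live in modules/qdrant_utils.py and modules/prompts.py and are richer than this.

# query_sketch.py - illustrative retrieve-and-generate step; not the repo's chain
import json
import urllib.request

from qdrant_client import QdrantClient

RETRIEVE_K = 5

def ollama(endpoint, payload):
    # Small helper to POST JSON to the local Ollama server
    req = urllib.request.Request(
        f"http://localhost:11434/api/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

question = "What does the document say about warranty terms?"  # example question
query_vec = ollama("embeddings", {"model": "mxbai-embed-large", "prompt": question})["embedding"]

# Retrieve the RETRIEVE_K most similar chunks from the collection
client = QdrantClient(url="http://localhost:6333")
hits = client.query_points("my_documents", query=query_vec, limit=RETRIEVE_K, with_payload=True).points

# Stuff the retrieved chunks into a simple prompt and generate an answer
context = "\n\n".join(h.payload["text"] for h in hits)
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
answer = ollama("generate", {"model": "llama3.2:1b", "prompt": prompt, "stream": False})["response"]
print(answer)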

5. CLI Utilities (Optional)

These utilities are intended purely for testing and behave like the Streamlit chat, but in the terminal; the main difference is that conversation logs are saved to disk.

  • python query_LLM.py — interactive console assistant using the same retrieval chain as the UI. It writes transcripts to LOG_DIR and is handy for quick smoke tests.
  • python rag_query_qdrant.py — single-shot helper that prints a model answer for a hard-coded prompt. Useful for integration checks or scripting.

Both commands respect the same configuration as the Streamlit app; ensure Qdrant and Ollama are running before invoking them.
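
The exact transcript format is defined in query_LLM.py; as a rough idea, writing each turn to a timestamped file under LOG_DIR could look like the hypothetical sketch below (file naming and layout are assumptions, not the repo's actual format).

# log_sketch.py - hypothetical transcript writer; the real logging lives in query_LLM.py
import os
from datetime import datetime

LOG_DIR = os.getenv("LOG_DIR", "conversation_logs")
os.makedirs(LOG_DIR, exist_ok=True)

# One file per session, named by start time (assumed convention)
session_file = os.path.join(LOG_DIR, datetime.now().strftime("session_%Y%m%d_%H%M%S.txt"))

def log_turn(question, answer):
    # Append the user question and the model answer to the session transcript
    with open(session_file, "a", encoding="utf-8") as f:
        f.write(f"USER: {question}\nASSISTANT: {answer}\n\n")

log_turn("What is covered in chapter 2?", "Chapter 2 covers ...")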

6. Project Layout

.
├── app.py                 # Streamlit main UI for KB creation and chat
├── modules/               # Shared package
│   ├── __init__.py
│   ├── config.py          # RAGSettings dataclass + helpers
│   ├── ingestion.py       # DataIngestor with progress callbacks
│   ├── prompts.py         # Prompt templates for all chains
│   └── qdrant_utils.py    # Qdrant client, vector store, retriever
├── query_LLM.py           # KB chat with saving conversation log
├── rag_query_qdrant.py    # Minimal CLI smoke test
├── docker-compose.yml     # Qdrant service definition
├── requirements.txt
└── conversation_logs/     # Created at runtime

7. Troubleshooting

  • Ollama model not found: Verify ollama list shows the configured model names. Pull them again if needed.
  • Qdrant connection errors: Ensure the container is running and the QDRANT_URL in .env matches the exposed port (docker compose logs qdrant can help diagnose issues).
  • Large files ingest slowly: Increase INGEST_BATCH_SIZE or switch to a more compact embedding model. You can monitor ingestion progress directly in the UI logs.
  • Permission errors creating directories: Confirm the process has write access to DATA_PATH and LOG_DIR. Both directories are created automatically as long as their parent locations are writable.
