A production-grade, privacy-first Retrieval-Augmented Generation (RAG) system that runs entirely on your hardware.
LOCO (Local-Only Contextual Orchestration) is designed for organizations and individuals who need the power of RAG without the privacy risks or costs of cloud-based LLMs.
By combining FastAPI, Next.js 15, and LanceDB, LOCO provides a "chat-with-your-docs" experience that is 100% air-gapped ready.
- 🔒 Zero Data Leaks: Your documents and queries never leave your local machine.
- 🧠 Semantic Chunking: Intelligent document splitting that understands topic shifts rather than fixed character counts (see the sketch after this list).
- 🔍 Hybrid Retrieval: Combines vector similarity with keyword matching via LanceDB.
- 📚 Verified Citations: Every response includes clickable references to the exact source text.
- ⚡ Single-Command Startup: A custom `run.py` script manages the backend, frontend, and environment for you.
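To make the semantic-chunking idea concrete, here is a minimal sketch of the general technique, assuming sentence embeddings come from nomic-embed-text via the `ollama` Python package; the function names and threshold are illustrative, not LOCO's actual implementation:

```python
# Illustrative embedding-based semantic chunking; not LOCO's actual code.
# Assumes a local Ollama server with nomic-embed-text pulled.
import ollama  # pip install ollama

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

def semantic_chunks(sentences: list[str], threshold: float = 0.75) -> list[str]:
    """Start a new chunk whenever consecutive sentences drift apart."""
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sent in sentences[1:]:
        vec = embed(sent)
        if cosine(prev_vec, vec) < threshold:  # topic shift detected
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks
```

Splitting on a similarity drop keeps each chunk topically coherent, which tends to improve retrieval precision over fixed-size windows.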
LOCO RAG Engine operates on a decoupled client-server model designed for 100% local execution.
| Layer | Technology | Role |
|---|---|---|
| LLM | Ollama (llama3.2) | Local reasoning engine |
| Embeddings | Ollama (nomic-embed-text) | Text-to-vector transformation |
| Vector Store | LanceDB | High-performance, embedded vector database |
| Backend | FastAPI | High-concurrency REST API |
| Frontend | Next.js 15 (App Router) | Modern, responsive Chat UI with Shadcn/UI |
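As a hedged illustration of how these layers connect at query time, the sketch below embeds a question with Ollama and runs a hybrid (vector + full-text) search in LanceDB; the table name, column names, and index setup are assumptions, so treat the Architecture doc as authoritative:

```python
# Illustrative retrieval path; table and column names are hypothetical.
import lancedb
import ollama

db = lancedb.connect("backend/data")   # LanceDB is embedded: just a local directory
table = db.open_table("documents")     # assumed table created at ingest time

question = "How does semantic chunking work?"
vector = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

# Hybrid search blends vector similarity with a full-text (BM25) index; this
# assumes an FTS index was built at ingest, e.g. table.create_fts_index("text").
hits = (
    table.search(query_type="hybrid")
    .vector(vector)
    .text(question)
    .limit(5)
    .to_list()
)
for hit in hits:
    print(hit["text"][:80])            # assumed "text" column holds the chunk
```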
Ensure you have the following installed:
- Python 3.10+
- Node.js 18+
- Ollama (download from https://ollama.com)
Pull the required models via terminal:
```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```

Clone the repo and run the orchestrator:
```bash
git clone https://github.com/yourusername/loco-rag-engine.git
cd loco-rag-engine
python run.py
```

The script will automatically create a virtual environment, install dependencies, and launch both the API (port 8000) and the UI (port 3000).
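The real orchestration logic lives in `run.py`; as a rough sketch of the pattern (the module path, commands, and ports here are assumptions), such a script typically amounts to coordinated subprocess management:

```python
# Simplified sketch of a process orchestrator; see run.py for the actual
# virtual-environment and dependency handling. "main:app" is an assumed path.
import subprocess
import sys

procs = [
    subprocess.Popen([sys.executable, "-m", "uvicorn", "main:app", "--port", "8000"],
                     cwd="backend"),
    subprocess.Popen(["npm", "run", "dev", "--", "--port", "3000"], cwd="frontend"),
]
try:
    for p in procs:
        p.wait()
except KeyboardInterrupt:
    for p in procs:
        p.terminate()  # stop both services together on Ctrl+C
```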
Once running, the following services are available:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation (Swagger UI): http://localhost:8000/docs
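With the stack running, you can exercise the backend directly; here is a small example against the `/query` endpoint (the request and response fields are assumptions, so check the Swagger UI at `/docs` for the real schema):

```python
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What does the onboarding doc say about VPN access?"},
)
resp.raise_for_status()
print(resp.json())  # expected to contain the answer plus source citations
```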
```
loco-rag-engine/
├── 📂 backend/            # FastAPI application & RAG logic
│   ├── 📂 core/           # Engine, Semantic Processor & Security
│   └── 📂 data/           # Local LanceDB vector storage
├── 📂 frontend/           # Next.js 15 + Tailwind + Shadcn UI
│   ├── 📂 src/app/        # Chat and Admin routes
│   └── 📂 src/components/ # UI components (Alerts, Buttons, etc.)
├── 📂 docs/               # Detailed technical documentation
├── 📄 run.py              # Main entry point / Process orchestrator
└── 📄 README.md           # You are here
```
| Topic | Description |
|---|---|
| Architecture | How the Semantic Chunking and LanceDB integration works. |
| API Reference | Documentation for /query, /ingest, and /admin endpoints. |
| Deployment | Running LOCO beyond the local development setup. |
| Development | How to contribute and extend the LOCO engine. |
| Frontend UI | Overview of the Next.js chat interface. |
| Troubleshooting | Common issues with Ollama connectivity or PDF parsing. |
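For the most common failure mode, the backend failing to reach Ollama, a quick health check against Ollama's standard local API can confirm the server is up and both required models are pulled:

```python
# Verify the local Ollama server is reachable and the required models exist.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
models = {m["name"] for m in tags.get("models", [])}
for required in ("llama3.2", "nomic-embed-text"):
    ok = any(name.startswith(required) for name in models)
    print(f"{required}: {'pulled' if ok else 'MISSING, run: ollama pull ' + required}")
```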
We welcome contributions! Please see our Development Guide for local setup instructions.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Built with ❤️ for the Local-First AI Community