Local Second Mind is a local-first RAG system for personal knowledge management. It ingests local documents, retrieves relevant context, and generates cited answers with configurable LLM providers.
This project is primarily for personal use. Pull requests or issues may not be reviewed unless they overlap with the maintainer's active use cases.
- Build and maintain a living local knowledge base.
- Support incremental ingest as files change.
- Provide semantic retrieval with optional reranking.
- Generate grounded answers with source citations.
- Keep corpus data local while using external LLM APIs only when configured.
```
Local Files
  -> Parse -> Chunk -> Embed
  -> Vector Store (ChromaDB or PostgreSQL/pgvector)
  -> Retrieval + Optional Rerank + Synthesis
  -> Answer + Citations
```
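To make the flow concrete, here is a toy end-to-end illustration in plain Python. The bag-of-words "embedding", the in-memory store, and all names are illustrative stand-ins, not lsm internals; lsm uses real local embeddings and a persistent vector store.

```python
# Toy illustration of the flow above: parse -> chunk -> embed -> store -> retrieve.
# Everything here is a stand-in for illustration; none of it mirrors lsm's code.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag-of-words term count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Vector store": (vector, chunk text, source) records.
store = []
documents = {"notes.md": "ChromaDB persists vectors locally. pgvector runs inside PostgreSQL."}

for source, text in documents.items():               # parse
    for chunk in text.split(". "):                    # deterministic chunking
        store.append((embed(chunk), chunk, source))   # embed + store

question = "Where are vectors persisted?"
qvec = embed(question)
hits = sorted(store, key=lambda rec: cosine(qvec, rec[0]), reverse=True)[:2]
for _, chunk, source in hits:                         # synthesis would cite these chunks
    print(f"{chunk!r}  [source: {source}]")
```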
- Recursive ingest from configured roots
- Parsers for PDF, DOCX, Markdown, HTML, and text
- Deterministic chunking and local embeddings
- Incremental ingest via manifest and file metadata (see the sketch after this list)
- Optional AI tagging for chunks
- Progress callback support in ingest APIs and UI
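As a rough illustration of the incremental-ingest idea, the sketch below re-ingests only files whose fingerprint changed since the last run and reports progress through a callback. The manifest layout, file name, and callback signature are assumptions for illustration, not lsm's actual API.

```python
# Sketch of manifest-based incremental ingest: skip files whose size/mtime/hash
# are unchanged. Manifest path and structure are assumptions, not lsm's format.
import hashlib
import json
from pathlib import Path

MANIFEST = Path(".manifest.json")  # hypothetical location

def fingerprint(path: Path) -> dict:
    stat = path.stat()
    return {
        "size": stat.st_size,
        "mtime": stat.st_mtime,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }

def changed_files(root: Path, on_progress=print) -> list[Path]:
    manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    files = [p for p in root.rglob("*") if p.is_file()]
    pending = []
    for i, path in enumerate(files, 1):
        fp = fingerprint(path)
        if manifest.get(str(path)) != fp:      # new or modified since last run
            pending.append(path)
            manifest[str(path)] = fp           # a real system would update after ingest succeeds
        on_progress(f"scanned {i}/{len(files)} files")  # progress callback hook
    MANIFEST.write_text(json.dumps(manifest, indent=2))
    return pending
```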
- Semantic retrieval from local vector store
- Configurable relevance thresholds and retrieval depth
- Optional lexical/LLM/hybrid rerank strategies (see the sketch after this list)
- Source-policy modes (`grounded`, `insight`, `hybrid`, custom modes)
- Optional remote source blending via provider framework
- Cited synthesis with fallback behavior when provider calls fail
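As a rough illustration of a hybrid rerank, the sketch below blends the vector-store similarity with a simple lexical overlap score before final ordering. The scoring, weighting, and function names are assumptions, not lsm's configured strategies.

```python
# Illustrative hybrid rerank: combine semantic similarity with lexical overlap.
# The 0.7/0.3 blend and the scoring functions are assumptions, not lsm defaults.

def lexical_score(query: str, chunk: str) -> float:
    """Fraction of query terms that also appear in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0

def hybrid_rerank(query: str, hits: list[tuple[str, float]], alpha: float = 0.7):
    """hits: (chunk_text, vector_similarity) pairs from semantic retrieval."""
    rescored = [
        (chunk, alpha * sim + (1 - alpha) * lexical_score(query, chunk))
        for chunk, sim in hits
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Example: a lexically relevant chunk can overtake a slightly higher vector match.
print(hybrid_rerank("pgvector setup", [
    ("ChromaDB persists to a local directory.", 0.64),
    ("Install the pgvector extension in PostgreSQL.", 0.60),
]))
```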
- Textual TUI for interactive query/ingest/settings workflows
- Single-shot CLI commands for ingest automation
- Structured config loading and validation (`config.json` or `config.yaml`)
- Python 3.10+
Install in editable mode:
```
pip install -e .
```
Store secrets in `.env` (not in config files):
```
cp .env.example .env
```
Common variables:
- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `GOOGLE_API_KEY`
- `AZURE_OPENAI_API_KEY`
- `AZURE_OPENAI_ENDPOINT`
- `BRAVE_API_KEY`
See .env.example for the full list.
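A typical `.env` then looks like the following (placeholder values; set only the keys your configured providers need):

```
OPENAI_API_KEY=sk-...
BRAVE_API_KEY=...
```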
Default entrypoint:
```
lsm
```
Equivalent module form:
```
python -m lsm
```
Run `lsm` with no subcommand. Querying is done in the TUI Query tab.
```
lsm ingest build [--dry-run] [--force] [--skip-errors]
lsm ingest tag [--max N]
lsm ingest wipe --confirm
```
Global options:
```
lsm --config path/to/config.json
lsm --verbose
lsm --log-level DEBUG
lsm --log-file logs/lsm.log
```
Configuration is loaded from `config.json` or `config.yaml`.
Minimal working example:
```json
{
"roots": ["C:/Users/You/Documents"],
"vectordb": {
"provider": "chromadb",
"persist_dir": ".chroma",
"collection": "local_kb"
},
"llms": [
{
"provider_name": "openai",
"query": { "model": "gpt-5.2" }
}
]
}
```
Important schema notes:
- Vector DB settings live under `vectordb` (`persist_dir`, `collection`, `provider`).
- LLM config is an ordered `llms` list with feature-level settings (`query`, `tagging`, `ranking`).
- Query behavior is configured under `query`.
- Mode/source-policy behavior is configured under `modes`.
- Remote integrations are configured under `remote_providers`.
- Notes are configured globally under top-level `notes`.
- Optional top-level `global_folder` controls default app data location.
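To show where those sections sit, here is a skeletal `config.yaml`. Only the top-level keys come from the notes above; the empty sections and comments are placeholders rather than real defaults, so consult the configuration reference for the actual fields.

```yaml
roots:
  - C:/Users/You/Documents
vectordb:
  provider: chromadb
  persist_dir: .chroma
  collection: local_kb
llms:
  - provider_name: openai
    query: { model: gpt-5.2 }
query: {}             # retrieval depth, relevance thresholds, rerank strategy
modes: {}             # source-policy modes: grounded, insight, hybrid, custom
remote_providers: {}  # optional remote source blending
notes: {}             # global notes behavior
# global_folder: ...  # optional default app data location
```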
For full configuration reference, see docs/user-guide/CONFIGURATION.md.
- Copy `example_config.json` to `config.json` and adjust paths/models.
- Add API keys to `.env` as needed.
- Build your collection: `lsm ingest build`.
- Launch TUI: `lsm`.
- Query from the Query tab and save notes if desired.
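The same steps as shell commands (paths and copy syntax may need adjusting for your platform):

```
cp example_config.json config.json   # then edit roots, vectordb, and llms
cp .env.example .env                 # then add API keys
lsm ingest build
lsm
```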
If enabling OCR for image-based PDFs, install the Tesseract executable and add
it to PATH. pytesseract is only a Python wrapper.
Further documentation:
- docs/user-guide/GETTING_STARTED.md
- docs/user-guide/CLI_USAGE.md
- docs/user-guide/CONFIGURATION.md
- docs/architecture/OVERVIEW.md
- docs/development/CHANGELOG.md
See LICENSE.