# llm-semantic Improvements

## New Features

### Reranking Support
- Two-stage retrieval with cross-encoder reranking for improved search precision
- Uses a Cohere-compatible `/v1/rerank` API endpoint
- Auto-enabled when the `LLM_SEMANTIC_RERANKER_API_URL` environment variable is set
- New CLI flags: `--rerank`, `--rerank-candidates`, `--rerank-threshold`, `--no-rerank`
- Recommended model: Qwen/Qwen3-Reranker-0.6B (~1GB VRAM)
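The two-stage flow can be sketched as: the vector store returns candidates, a Cohere-style rerank call re-scores them, and only results above the threshold survive. This is a minimal illustration, assuming the request/response shape of Cohere's rerank API (`query`, `documents`, and `results` with `index`/`relevance_score`); the helper names are hypothetical, not llm-semantic internals.

```python
def build_rerank_request(query: str, candidates: list[str], top_n: int) -> dict:
    """Build a Cohere-compatible /v1/rerank request body (assumed shape)."""
    return {"query": query, "documents": candidates, "top_n": top_n}

def apply_rerank(candidates: list[str], results: list[dict], threshold: float) -> list[str]:
    """Keep reranked candidates whose relevance_score clears the threshold,
    ordered best-first. `results` mirrors Cohere's response entries."""
    kept = [r for r in results if r["relevance_score"] >= threshold]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [candidates[r["index"]] for r in kept]

# Example: three candidates from the vector stage; two clear the threshold.
docs = ["auth middleware", "logging setup", "jwt validation"]
resp = [{"index": 0, "relevance_score": 0.92},
        {"index": 1, "relevance_score": 0.11},
        {"index": 2, "relevance_score": 0.78}]
print(apply_rerank(docs, resp, threshold=0.5))  # → ['auth middleware', 'jwt validation']
```

`--rerank-candidates` would control how many first-stage hits are sent to the reranker, and `--rerank-threshold` the score cutoff above.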
### Upload Progress with ETA
- Real-time progress feedback during embedding and upload phases
- TTY-aware: single-line updates on terminals, periodic logging in non-TTY
- Shows batch counts, chunk counts, percentage, and estimated time remaining
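The ETA math behind the progress display amounts to extrapolating from the average throughput observed so far. A minimal sketch (the function name is illustrative, not from llm-semantic):

```python
def eta_seconds(done: int, total: int, elapsed: float) -> float:
    """Estimate remaining seconds, assuming the observed rate holds."""
    if done == 0:
        return float("inf")      # no throughput data yet
    rate = done / elapsed        # chunks per second so far
    return (total - done) / rate # time for the remaining chunks

# 400 of 1000 chunks in 20s → 20 chunks/s → 600 remaining → 30s left
print(eta_seconds(400, 1000, 20.0))  # → 30.0
```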
## Bug Fixes

### Qdrant Large Batch Upload Fix
- Fixed silent upload failures when indexing large codebases (100K+ chunks)
- Added automatic sub-batching in `QdrantStorage.CreateBatch` for batches > 100 points
- Previously, uploading all chunks in a single request could time out silently
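The sub-batching itself is simple: split one oversized upsert into consecutive slices of at most 100 points, so no single Qdrant request grows unbounded. A sketch under that assumption (the helper is a stand-in; the real `QdrantStorage.CreateBatch` is not shown here):

```python
def sub_batches(points: list, max_size: int = 100) -> list[list]:
    """Split `points` into consecutive slices of at most `max_size` items."""
    return [points[i:i + max_size] for i in range(0, len(points), max_size)]

# 250 points become three requests instead of one oversized upload.
batches = sub_batches(list(range(250)))
print([len(b) for b in batches])  # → [100, 100, 50]
```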
### Incremental Commit Indexing
- Redesigned the `--embed-batch-size` flow for crash recovery
- Old: chunk ALL → embed ALL → store ALL → commit (no recovery if interrupted)
- New: for each batch: chunk → embed → store → commit (resumable from any point)
- Memory usage now bounded by batch size instead of the entire codebase
- `--parallel` and `--batch-size` work within each batch for faster uploads
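The per-batch loop above can be sketched with in-memory stand-ins for the storage, chunker, and embedder; the interfaces here are hypothetical, not llm-semantic's. The key point is that the commit lands after every batch, so a restart resumes from the last committed offset:

```python
def index_incremental(files: list[str], batch_size: int,
                      store: list, state: dict) -> None:
    """Process files in batches, committing after each batch so an
    interrupted run can resume where it left off."""
    start = state.get("committed", 0)              # resume point
    for i in range(start, len(files), batch_size):
        batch = files[i:i + batch_size]
        chunks = [f"chunk:{f}" for f in batch]     # stand-in for chunking
        vectors = [hash(c) % 97 for c in chunks]   # stand-in for embedding
        store.extend(zip(chunks, vectors))         # store this batch only
        state["committed"] = i + len(batch)        # commit: durable progress

store, state = [], {}
index_incremental([f"f{i}" for i in range(5)], batch_size=2, store=store, state=state)
print(state["committed"], len(store))  # → 5 5
```

Because memory only ever holds one batch of chunks and vectors, peak usage scales with the batch size rather than the codebase.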
## Usage
```shell
# Full performance setup for large Qdrant indexes
llm-semantic index . --storage qdrant \
  --embed-batch-size 64 \
  --batch-size 100 \
  --parallel 4

# Resume interrupted indexing (just run the same command again)
llm-semantic index . --storage qdrant --embed-batch-size 64

# Enable reranking
export LLM_SEMANTIC_RERANKER_API_URL=http://ai.lan:5000
llm-semantic search "authentication middleware" --top 10
```