v1.7.0

Released by @samestrin on 21 Jan 01:54

llm-semantic Improvements

New Features

Reranking Support

  • Two-stage retrieval with cross-encoder reranking for improved search precision
  • Uses a Cohere-compatible /v1/rerank API endpoint
  • Auto-enabled when LLM_SEMANTIC_RERANKER_API_URL environment variable is set
  • New CLI flags: --rerank, --rerank-candidates, --rerank-threshold, --no-rerank (see the example below)
  • Recommended model: Qwen/Qwen3-Reranker-0.6B (~1GB VRAM)
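
A minimal sketch of how the new flags might be combined, assuming --rerank-candidates controls the size of the first-stage candidate pool and --rerank-threshold filters results by rerank score; the reranker URL and the values shown are illustrative, not documented defaults:

# Point llm-semantic at a Cohere-compatible reranker; reranking is
# auto-enabled once this variable is set (placeholder URL)
export LLM_SEMANTIC_RERANKER_API_URL=http://localhost:5000

# Rerank search results (candidate count and threshold are illustrative)
llm-semantic search "authentication middleware" \
  --rerank \
  --rerank-candidates 50 \
  --rerank-threshold 0.5

# Skip reranking for a single query even when the variable is set
llm-semantic search "authentication middleware" --no-rerank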

Upload Progress with ETA

  • Real-time progress feedback during embedding and upload phases
  • TTY-aware: single-line updates on terminals, periodic log lines in non-TTY environments
  • Shows batch counts, chunk counts, percentage, and estimated time remaining

Bug Fixes

Qdrant Large Batch Upload Fix

  • Fixed silent upload failures when indexing large codebases (100K+ chunks)
  • Added automatic sub-batching in QdrantStorage.CreateBatch for batches > 100 points
  • Previously, uploading all chunks in a single request could time out silently
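
A brief illustration of the practical effect, using only the existing index command: with this fix, any upload batch larger than 100 points is split into sub-batches automatically, so no manual tuning of the request size is required.

# Index a large codebase; batches over 100 points are now sub-batched
# automatically instead of going out as one oversized request
llm-semantic index . --storage qdrant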

Incremental Commit Indexing

  • Redesigned --embed-batch-size flow for crash recovery
  • Old: chunk ALL → embed ALL → store ALL → commit (no recovery if interrupted)
  • New: for each batch: chunk → embed → store → commit (resumable from any point)
  • Memory usage is now bounded by the batch size instead of the entire codebase
  • --parallel and --batch-size work within each batch for faster uploads

Usage

# Full performance setup for large Qdrant indexes
llm-semantic index . --storage qdrant \
  --embed-batch-size 64 \
  --batch-size 100 \
  --parallel 4

# Resume interrupted indexing (just run the same command again)
llm-semantic index . --storage qdrant --embed-batch-size 64

# Enable reranking
export LLM_SEMANTIC_RERANKER_API_URL=http://ai.lan:5000
llm-semantic search "authentication middleware" --top 10