# llm-semantic Improvements

## New Features

### Reranking Support
- Two-stage retrieval with cross-encoder reranking for improved search precision
- Uses a Cohere-compatible `/v1/rerank` API endpoint
- Auto-enabled when the `LLM_SEMANTIC_RERANKER_API_URL` environment variable is set
- New CLI flags: `--rerank`, `--rerank-candidates`, `--rerank-threshold`, `--no-rerank`
- Recommended model: Qwen/Qwen3-Reranker-0.6B (~1GB VRAM)
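The two-stage flow can be sketched as: the vector store returns candidates, a Cohere-style rerank call re-scores them, and only results above the threshold survive. This is a minimal illustration, assuming the request/response shape of Cohere's rerank API (`query`, `documents`, and `results` with `index`/`relevance_score`); the helper names are hypothetical, not llm-semantic internals.

```python
def build_rerank_request(query: str, candidates: list[str], top_n: int) -> dict:
    """Build a Cohere-compatible /v1/rerank request body (assumed shape)."""
    return {"query": query, "documents": candidates, "top_n": top_n}

def apply_rerank(candidates: list[str], results: list[dict], threshold: float) -> list[str]:
    """Keep reranked candidates whose relevance_score clears the threshold,
    ordered best-first. `results` mirrors Cohere's response entries."""
    kept = [r for r in results if r["relevance_score"] >= threshold]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [candidates[r["index"]] for r in kept]

# Example: three candidates from the vector stage; two clear the threshold.
docs = ["auth middleware", "logging setup", "jwt validation"]
resp = [{"index": 0, "relevance_score": 0.92},
        {"index": 1, "relevance_score": 0.11},
        {"index": 2, "relevance_score": 0.78}]
print(apply_rerank(docs, resp, threshold=0.5))  # → ['auth middleware', 'jwt validation']
```

`--rerank-candidates` would control how many first-stage hits are sent to the reranker, and `--rerank-threshold` the score cutoff above.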
### Upload Progress with ETA
- Real-time progress feedback during embedding and upload phases
- TTY-aware: single-line updates on terminals, periodic logging in non-TTY
- Shows batch counts, chunk counts, percentage, and estimated time remaining
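The ETA math behind the progress display amounts to extrapolating from the average throughput observed so far. A minimal sketch (the function name is illustrative, not from llm-semantic):

```python
def eta_seconds(done: int, total: int, elapsed: float) -> float:
    """Estimate remaining seconds, assuming the observed rate holds."""
    if done == 0:
        return float("inf")      # no throughput data yet
    rate = done / elapsed        # chunks per second so far
    return (total - done) / rate # time for the remaining chunks

# 400 of 1000 chunks in 20s → 20 chunks/s → 600 remaining → 30s left
print(eta_seconds(400, 1000, 20.0))  # → 30.0
```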
## Bug Fixes

### Qdrant Large Batch Upload Fix
- Fixed silent upload failures when indexing large codebases (100K+ chunks)
- Added automatic sub-batching in `QdrantStorage.CreateBatch` for batches > 100 points
- Previously, uploading all chunks in a single request could time out silently
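The sub-batching itself is simple: split one oversized upsert into consecutive slices of at most 100 points, so no single Qdrant request grows unbounded. A sketch under that assumption (the helper is a stand-in; the real `QdrantStorage.CreateBatch` is not shown here):

```python
def sub_batches(points: list, max_size: int = 100) -> list[list]:
    """Split `points` into consecutive slices of at most `max_size` items."""
    return [points[i:i + max_size] for i in range(0, len(points), max_size)]

# 250 points become three requests instead of one oversized upload.
batches = sub_batches(list(range(250)))
print([len(b) for b in batches])  # → [100, 100, 50]
```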
### Incremental Commit Indexing
- Redesigned the `--embed-batch-size` flow for crash recovery
- Old: chunk ALL → embed ALL → store ALL → commit (no recovery if interrupted)
- New: for each batch: chunk → embed → store → commit (resumable from any point)
- Memory usage now bounded by batch size instead of the entire codebase
- `--parallel` and `--batch-size` work within each batch for faster uploads
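The per-batch loop above can be sketched with in-memory stand-ins for the storage, chunker, and embedder; the interfaces here are hypothetical, not llm-semantic's. The key point is that the commit lands after every batch, so a restart resumes from the last committed offset:

```python
def index_incremental(files: list[str], batch_size: int,
                      store: list, state: dict) -> None:
    """Process files in batches, committing after each batch so an
    interrupted run can resume where it left off."""
    start = state.get("committed", 0)              # resume point
    for i in range(start, len(files), batch_size):
        batch = files[i:i + batch_size]
        chunks = [f"chunk:{f}" for f in batch]     # stand-in for chunking
        vectors = [hash(c) % 97 for c in chunks]   # stand-in for embedding
        store.extend(zip(chunks, vectors))         # store this batch only
        state["committed"] = i + len(batch)        # commit: durable progress

store, state = [], {}
index_incremental([f"f{i}" for i in range(5)], batch_size=2, store=store, state=state)
print(state["committed"], len(store))  # → 5 5
```

Because memory only ever holds one batch of chunks and vectors, peak usage scales with the batch size rather than the codebase.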
## Usage
```shell
# Full performance setup for large Qdrant indexes
llm-semantic index . --storage qdrant \
  --embed-batch-size 64 \
  --batch-size 100 \
  --parallel 4

# Resume interrupted indexing (just run the same command again)
llm-semantic index . --storage qdrant --embed-batch-size 64

# Enable reranking
export LLM_SEMANTIC_RERANKER_API_URL=http://ai.lan:5000
llm-semantic search "authentication middleware" --top 10
```