## Summary
vecflow is a proposed all-in-one Go CLI tool for local vector embedding generation, indexing, and semantic recall/search. Single binary, no external dependencies, no server, no cloud API required.
## Motivation
Existing vector/embedding solutions require:
- Running servers (Chroma, Weaviate, Qdrant)
- Cloud APIs (OpenAI embeddings, Pinecone)
- Python environments (sentence-transformers, LangChain)
- Database backends (PostgreSQL + pgvector)
This creates friction for lightweight, CLI-native agent runtimes that need semantic search without infrastructure overhead.
## Design Principles
- Single binary: One executable, cross-compiles to Linux/macOS/Windows/ARM
- Zero external dependencies: No Docker, no database, no cloud account
- Filesystem-native state: Index is a directory of files (Git-friendly, Dropbox-friendly)
- Local inference: Embeddings generated on-device via ONNX runtime
- Hybrid retrieval: BM25 lexical + vector semantic search with RRF fusion
- Fail-open: Degrades gracefully when embeddings are unavailable
## Proposed CLI Interface

```bash
# Initialize
vecflow init # Creates ./vecflow.db or memory dir
vecflow config model all-MiniLM-L6-v2 # Set embedding model (auto-downloads ONNX)
# Index content
vecflow add "Quick note or text" # Embed and index inline text
vecflow add ./docs/*.md --recursive # Index files with auto-chunking
vecflow add --chunk-size 512 --overlap 128 --batch 16
# Search
vecflow search "semantic query" --top-k 8 --min-score 0.7
# Output: ranked results with score, text snippet, source, metadata
# Manage
vecflow ls # List indexed items
vecflow stats # Count, dimensions, model info
vecflow rm --older-than 30d # Prune old entries
vecflow export jsonl > backup.jsonl # Export for portability
# Optional: serve OpenAI-compatible endpoint
vecflow serve --port 8080              # /v1/embeddings API
```
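The `--chunk-size`/`--overlap` flags above imply a sliding-window chunker. A minimal sketch of that behavior (the `chunkText` name and rune-based sizing are illustrative assumptions; a real implementation would likely count tokens):

```go
// chunkText splits text into overlapping windows so that context at chunk
// boundaries is not lost. Sizes are in runes here for simplicity.
func chunkText(text string, chunkSize, overlap int) []string {
	runes := []rune(text)
	if chunkSize <= 0 || chunkSize <= overlap {
		return nil // invalid configuration
	}
	var chunks []string
	for start := 0; start < len(runes); start += chunkSize - overlap {
		end := start + chunkSize
		if end >= len(runes) {
			chunks = append(chunks, string(runes[start:]))
			break
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}
```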
## Technical Architecture

### Vector Storage: chromem-go
chromem-go — pure Go, zero dependencies, Chroma-like API
```go
import "github.com/philippgille/chromem-go"

db := chromem.NewDB()
coll, _ := db.CreateCollection("docs", nil, nil)
_ = coll.AddDocument(ctx, chromem.Document{ID: "doc1", Content: "Your text here",
	Embedding: embedding, Metadata: map[string]string{"source": "file.md"}})
results, _ := coll.QueryEmbedding(ctx, queryEmbedding, 5, nil, nil) // top-5 similar
```
Features:
- In-memory with optional file persistence (see the sketch after this list)
- Built-in cosine/Euclidean similarity search
- Stores documents, embeddings ([]float32), metadata
- Fast for <500K items; add HNSW (USearch bindings) if scale grows
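For the file-persistence mode, a sketch using chromem-go's persistent DB (the path is this proposal's layout, not the library's; error handling abbreviated):

```go
// Persistent DB: collections are written under the given directory and
// reloaded on startup, so index state survives process restarts.
db, err := chromem.NewPersistentDB("./memory/index", false) // false = no gzip compression
if err != nil {
	panic(err) // sketch; the real CLI would surface the error
}
coll, _ := db.GetOrCreateCollection("docs", nil, nil)
```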
### Embedding Models
Local ONNX inference, no API calls:
| Model | Params | Dimensions | Speed | Use Case |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 22M | 384 | <5ms/chunk | Default, CPU-friendly |
| nomic-embed-text | 137M | 768 | ~20ms/chunk | Higher quality via Ollama |
| pplx-embed-v1-0.6b | 600M | 1024 | ~50ms/chunk | Best quality, needs more RAM |
Models auto-download on first use; quantized (int8) variants are available for a smaller footprint.
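Whichever runtime is chosen, these models emit one vector per token; the usual post-processing is mean pooling plus L2 normalization, so cosine similarity reduces to a dot product. A sketch of that step, with the ONNX inference call itself omitted:

```go
import "math"

// poolAndNormalize averages per-token vectors from the model's last hidden
// state into one unit-length sentence embedding. A production version would
// also mask padding tokens using the attention mask.
func poolAndNormalize(tokenVecs [][]float32, dim int) []float32 {
	out := make([]float32, dim)
	if len(tokenVecs) == 0 {
		return out
	}
	for _, v := range tokenVecs {
		for i := 0; i < dim; i++ {
			out[i] += v[i]
		}
	}
	var norm float64
	for i := range out {
		out[i] /= float32(len(tokenVecs))
		norm += float64(out[i]) * float64(out[i])
	}
	scale := 1.0 / math.Sqrt(norm)
	for i := range out {
		out[i] = float32(float64(out[i]) * scale)
	}
	return out
}
```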
### Hybrid Retrieval Pipeline

```
Query ─┬─→ BM25 (lexical) ──────→ Lexical Ranks ─┐
       │                                         ├─→ RRF Fusion → Top-K Results
       └─→ Embed → kNN (vector) → Vector Ranks ──┘
```
Reciprocal Rank Fusion, summed over each result list r (a code sketch follows the list below):

    RRF(d) = Σ_r 1/(k + rank_r(d))   where k = 60
- No score normalization needed
- Robust to different score distributions
- Proven in IR literature (Cormack et al. 2009)
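The formula translates directly to code; a sketch, where each input maps document ID to its 1-based rank in one result list:

```go
// rrfFuse combines several rankings into one fused score per document.
// Documents missing from a list simply contribute nothing for that list.
func rrfFuse(rankings []map[string]int, k float64) map[string]float64 {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for doc, rank := range ranking {
			scores[doc] += 1.0 / (k + float64(rank))
		}
	}
	return scores
}
```

Usage: `rrfFuse([]map[string]int{lexicalRanks, vectorRanks}, 60)`, then sort by descending score and keep the top K.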
### Directory Structure

```
./memory/
├── config.toml # Model, chunk size, fusion weights
├── chunks/ # Chunked text segments
│ ├── chunk-001.txt
│ └── chunk-001.vec # Sidecar embedding file
├── index/
│ ├── lexical.idx # BM25 inverted index
│ └── vectors.idx # Vector index (chromem-go format)
└── sources/              # Optional: original source files
```
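A sketch of the Go schema that config.toml could map onto (field names and defaults are assumptions, not a committed format):

```go
// Config mirrors ./memory/config.toml; parse it with a TOML library
// (e.g. github.com/BurntSushi/toml — one possible choice).
type Config struct {
	Model     string `toml:"model"`      // e.g. "all-MiniLM-L6-v2"
	ChunkSize int    `toml:"chunk_size"` // tokens per chunk, e.g. 512
	Overlap   int    `toml:"overlap"`    // chunk overlap, e.g. 128
	RRFK      int    `toml:"rrf_k"`      // fusion constant, default 60
	TopK      int    `toml:"top_k"`      // default results per query, e.g. 8
}
```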
### Implementation Sketch

```go
package main

import (
	"log"
	"os"

	"github.com/philippgille/chromem-go"
	"github.com/urfave/cli/v2"
	// ONNX runtime for embeddings
)

func main() {
	app := &cli.App{
		Name: "vecflow",
		Commands: []*cli.Command{
			{Name: "init", Action: initCmd},
			{Name: "add", Action: addCmd},
			{Name: "search", Action: searchCmd},
			{Name: "config", Action: configCmd},
			// ...
		},
	}
	if err := app.Run(os.Args); err != nil {
		log.Fatal(err)
	}
}

func embedText(text string, model string) []float32 {
	// Load ONNX model, run inference, return the embedding vector.
	// (Implementation elided in this sketch.)
	return nil
}
```

## Why Go?
- Cross-compilation: `GOOS=linux GOARCH=arm64 go build` → ARM binary
- Single binary: No runtime dependencies, no DLLs
- Concurrency: Parallel batch embedding with goroutines (see the sketch after this list)
- ONNX ecosystem: onnxruntime-go, tract for inference
- Matches sciclaw: Same language, easy integration
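What parallel batch embedding could look like with a bounded worker pool (the worker count and the `embedText` helper from the sketch above are illustrative):

```go
import "sync"

// embedBatch embeds chunks concurrently, bounding in-flight inferences so
// memory use stays predictable on small machines.
func embedBatch(chunks []string, model string, workers int) [][]float32 {
	out := make([][]float32, len(chunks))
	sem := make(chan struct{}, workers) // limits concurrent inferences
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			out[i] = embedText(chunk, model)
		}(i, chunk)
	}
	wg.Wait()
	return out
}
```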
## Latency Targets
| Operation | Target |
|---|---|
| Embed single chunk | <50ms |
| Vector kNN (10K chunks) | <20ms |
| BM25 search | <20ms |
| Full hybrid recall | <200ms |
## Use Cases
- Agent memory: Archive + recall for LLM agents (sciClaw, etc.)
- Local RAG: Query your own documents without cloud APIs
- Note search: Semantic search over Markdown/Obsidian vaults
- Code search: Find similar code snippets by meaning
- Research: Search papers/abstracts by concept
## Relationship to sciclaw
vecflow is designed as a general-purpose tool that sciclaw can integrate:
- sciclaw routes Discord channels to workspaces
- Each workspace gets its own vecflow memory directory
- `sciclaw memory` commands wrap vecflow functionality
- Per-channel isolation is automatic via workspace separation
But vecflow itself is standalone — usable by any CLI agent or directly by users.
## Prior Art
| Tool | Approach | Limitation |
|---|---|---|
| Chroma | Python + server | Requires running service |
| LanceDB | Rust + Python bindings | Python-centric |
| Qdrant | Rust + server | Requires running service |
| sqlite-vss | SQLite extension | Requires SQLite, C deps |
| chromem-go | Pure Go library | Library, not CLI (we build on this) |
vecflow fills the gap: CLI-first, single-binary, no server, pure Go.
## Open Questions
- Should vecflow be a separate repo or part of sciclaw?
- Which ONNX runtime? (onnxruntime-go vs tract vs custom)
- Include BM25 in binary or shell out to external tool?
- Support for incremental index updates vs full rebuild?
## References
- chromem-go — pure Go vector DB
- onnxruntime-go — ONNX inference in Go
- all-MiniLM-L6-v2 — embedding model
- RRF paper — Cormack et al. 2009
- Context Rot — Hong et al. 2025
## Acceptance Criteria
- Single binary builds for Linux/macOS/Windows
- `vecflow add` indexes text with local embeddings
- `vecflow search` returns ranked hybrid results
- <200ms end-to-end latency on 10K chunks
- No external runtime dependencies
- Works offline (no network required after model download)