feat(tools): vecflow — single-binary Go CLI for local vector embedding and semantic recall #85

@drpedapati

Description

Summary

vecflow is a proposed all-in-one Go CLI tool for local vector embedding generation, indexing, and semantic recall/search. Single binary, no external dependencies, no server, no cloud API required.

Motivation

Existing vector/embedding solutions require:

  • Running servers (Chroma, Weaviate, Qdrant)
  • Cloud APIs (OpenAI embeddings, Pinecone)
  • Python environments (sentence-transformers, LangChain)
  • Database backends (PostgreSQL + pgvector)

This creates friction for lightweight, CLI-native agent runtimes that need semantic search without infrastructure overhead.

Design Principles

  1. Single binary: One executable, cross-compiles to Linux/macOS/Windows/ARM
  2. Zero external dependencies: No Docker, no database, no cloud account
  3. Filesystem-native state: Index is a directory of files (Git-friendly, Dropbox-friendly)
  4. Local inference: Embeddings generated on-device via ONNX runtime
  5. Hybrid retrieval: BM25 lexical + vector semantic search with RRF fusion
  6. Fail-open: Falls back to BM25 lexical search when embeddings are unavailable

Proposed CLI Interface

# Initialize
vecflow init                              # Creates ./vecflow.db or memory dir
vecflow config model all-MiniLM-L6-v2     # Set embedding model (auto-downloads ONNX)

# Index content
vecflow add "Quick note or text"          # Embed and index inline text
vecflow add ./docs/*.md --recursive       # Index files with auto-chunking
vecflow add --chunk-size 512 --overlap 128 --batch 16

# Search
vecflow search "semantic query" --top-k 8 --min-score 0.7
# Output: ranked results with score, text snippet, source, metadata

# Manage
vecflow ls                                # List indexed items
vecflow stats                             # Count, dimensions, model info
vecflow rm --older-than 30d               # Prune old entries
vecflow export jsonl > backup.jsonl       # Export for portability

# Optional: serve OpenAI-compatible endpoint
vecflow serve --port 8080                 # /v1/embeddings API
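
The `--chunk-size`/`--overlap` flags above imply a sliding-window chunker. A minimal sketch, assuming tokens are approximated by whitespace-separated words (a real implementation would use the embedding model's tokenizer; `chunkText` is an illustrative name, not a committed API):

```go
package main

import (
	"fmt"
	"strings"
)

// chunkText splits text into windows of chunkSize tokens that overlap by
// `overlap` tokens, approximating tokens as whitespace-separated words.
func chunkText(text string, chunkSize, overlap int) []string {
	words := strings.Fields(text)
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil
	}
	var chunks []string
	step := chunkSize - overlap
	for start := 0; start < len(words); start += step {
		end := start + chunkSize
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	text := strings.Repeat("word ", 10) // 10 tokens
	chunks := chunkText(text, 4, 2)     // windows of 4 tokens, overlapping by 2
	fmt.Println(len(chunks))            // 4 chunks: [0:4] [2:6] [4:8] [6:10]
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one window.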

Technical Architecture

Vector Storage: chromem-go

chromem-go — pure Go, zero dependencies, Chroma-like API

import (
    "context"

    "github.com/philippgille/chromem-go"
)

// Collections live in a DB and calls take a context
// (API as of chromem-go v0.x — check current docs):
db := chromem.NewDB()
coll, _ := db.CreateCollection("docs", nil, nil)
_ = coll.AddDocument(context.TODO(), chromem.Document{
    ID:        "doc1",
    Content:   "Your text here",
    Embedding: embedding,
    Metadata:  map[string]string{"source": "file.md"},
})
results, _ := coll.QueryEmbedding(context.TODO(), queryEmbedding, 5, nil, nil)  // top-5 similar

Features:

  • In-memory with optional file persistence
  • Built-in cosine/Euclidean similarity search
  • Stores documents, embeddings ([]float32), metadata
  • Fast for <500K items; add HNSW (USearch bindings) if scale grows
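
The "fast for <500K items" claim rests on a brute-force scan being cheap at this scale. A minimal cosine-similarity kNN over unit-normalized vectors might look like this (a sketch independent of chromem-go's internals; `knn` and `hit` are illustrative names):

```go
package main

import (
	"fmt"
	"sort"
)

// dot returns the dot product of two equal-length vectors; for
// unit-normalized embeddings this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

type hit struct {
	ID    string
	Score float32
}

// knn brute-force scans every stored vector and returns the top-k by
// cosine similarity — O(n·d), comfortably sub-20ms for ~10K chunks on CPU.
func knn(query []float32, ids []string, vecs [][]float32, k int) []hit {
	hits := make([]hit, len(ids))
	for i, v := range vecs {
		hits[i] = hit{ids[i], dot(query, v)}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if k > len(hits) {
		k = len(hits)
	}
	return hits[:k]
}

func main() {
	ids := []string{"a", "b", "c"}
	vecs := [][]float32{{1, 0}, {0, 1}, {0.6, 0.8}}
	top := knn([]float32{1, 0}, ids, vecs, 2)
	fmt.Println(top[0].ID, top[1].ID) // a then c
}
```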

Embedding Models

Local ONNX inference, no API calls:

Model               Params  Dimensions  Speed        Use Case
all-MiniLM-L6-v2    22M     384         <5ms/chunk   Default, CPU-friendly
nomic-embed-text    137M    768         ~20ms/chunk  Higher quality via Ollama
pplx-embed-v1-0.6b  600M    1024        ~50ms/chunk  Best quality, needs more RAM

Models auto-download on first use; quantized (int8) variants reduce the disk and memory footprint.
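
For the Ollama path, embeddings come back as JSON from a local HTTP endpoint. The sketch below shows the assumed request/response wire shapes for Ollama's `/api/embeddings` (field names taken from Ollama's published API — verify against the version in use; the encode/decode is exercised offline here rather than against a live server):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embedRequest/embedResponse mirror the assumed JSON shapes of
// Ollama's /api/embeddings endpoint.
type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float64 `json:"embedding"`
}

// buildRequest encodes the body vecflow would POST to
// http://localhost:11434/api/embeddings.
func buildRequest(model, text string) []byte {
	b, _ := json.Marshal(embedRequest{Model: model, Prompt: text})
	return b
}

// parseEmbedding decodes the embedding vector from a response body.
func parseEmbedding(body []byte) []float64 {
	var r embedResponse
	_ = json.Unmarshal(body, &r)
	return r.Embedding
}

func main() {
	fmt.Println(string(buildRequest("nomic-embed-text", "Your text here")))
	vec := parseEmbedding([]byte(`{"embedding":[0.12,-0.03,0.98]}`)) // truncated sample
	fmt.Println(len(vec))                                            // 3
}
```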

Hybrid Retrieval Pipeline

Query ─┬─→ BM25 (lexical) ──────→ Lexical Ranks ─┐
       │                                         ├─→ RRF Fusion → Top-K Results
       └─→ Embed → kNN (vector) → Vector Ranks ──┘
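
The BM25 leg scores a document per query term by damping term frequency against document length. A minimal single-term scorer, using the standard Okapi defaults k1=1.2, b=0.75 (parameter names are the textbook ones, not a fixed vecflow API):

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term scores one query term for one document with the classic
// Okapi BM25 formula: idf * tf*(k1+1) / (tf + k1*(1 - b + b*dl/avgdl)).
func bm25Term(tf, docLen, avgDocLen, nDocs, docFreq float64) float64 {
	const k1, b = 1.2, 0.75
	idf := math.Log(1 + (nDocs-docFreq+0.5)/(docFreq+0.5)) // rarer terms weigh more
	return idf * tf * (k1 + 1) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

func main() {
	// A term appearing 3 times in an average-length doc, rare in a 10K-doc corpus.
	score := bm25Term(3, 100, 100, 10000, 50)
	fmt.Printf("%.2f\n", score)
}
```

The length normalization is what keeps long documents from dominating purely by repeating terms.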

Reciprocal Rank Fusion:

RRF(d) = Σ 1/(k + rank(d))   where k=60
  • No score normalization needed
  • Robust to different score distributions
  • Proven in IR literature (Cormack et al. 2009)
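
The fusion step follows directly from the formula; ranks are 1-based and k=60 as above (`rrfFuse` is an illustrative name):

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several rank lists (each a slice of doc IDs, best first)
// by Reciprocal Rank Fusion: score(d) = Σ 1/(k + rank(d)).
func rrfFuse(lists [][]string, k float64) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1)) // ranks are 1-based
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	lexical := []string{"d1", "d2", "d3"} // BM25 ranking
	vector := []string{"d3", "d1", "d4"}  // kNN ranking
	fused := rrfFuse([][]string{lexical, vector}, 60)
	fmt.Println(fused[0]) // d1: ranked 1st lexically and 2nd by vector
}
```

Note that only ranks enter the sum — raw BM25 and cosine scores never need to be put on a common scale.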

Directory Structure

./memory/
├── config.toml           # Model, chunk size, fusion weights
├── chunks/               # Chunked text segments
│   ├── chunk-001.txt
│   └── chunk-001.vec     # Sidecar embedding file
├── index/
│   ├── lexical.idx       # BM25 inverted index
│   └── vectors.idx       # Vector index (chromem-go format)
└── sources/              # Optional: original source files
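
A hypothetical config.toml for the layout above (key names are illustrative, not a committed schema):

```toml
# ./memory/config.toml — illustrative; keys are not final
model      = "all-MiniLM-L6-v2"
dimensions = 384
chunk_size = 512
overlap    = 128

[fusion]
rrf_k          = 60
lexical_weight = 1.0
vector_weight  = 1.0
```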

Implementation Sketch

package main

import (
    "log"
    "os"

    "github.com/urfave/cli/v2"
    // plus chromem-go for storage and an ONNX runtime binding for embeddings
)

func main() {
    app := &cli.App{
        Name: "vecflow",
        Commands: []*cli.Command{
            {Name: "init", Action: initCmd},
            {Name: "add", Action: addCmd},
            {Name: "search", Action: searchCmd},
            {Name: "config", Action: configCmd},
            // ...
        },
    }
    if err := app.Run(os.Args); err != nil {
        log.Fatal(err)
    }
}

func embedText(text string, model string) []float32 {
    // Load the ONNX model, tokenize, run inference,
    // and return the pooled embedding vector.
    return nil // sketch
}

Why Go?

  • Cross-compilation: GOOS=linux GOARCH=arm64 go build → ARM binary
  • Single binary: No runtime dependencies, no DLLs
  • Concurrency: Parallel batch embedding with goroutines
  • ONNX ecosystem: onnxruntime-go, tract for inference
  • Matches sciclaw: Same language, easy integration
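
The concurrency point can be sketched as a small worker pool over chunks; `embedAll` is illustrative and `embed` stands in for the real ONNX inference call:

```go
package main

import (
	"fmt"
	"sync"
)

// embedAll embeds chunks in parallel with a fixed-size worker pool,
// preserving input order in the output slice.
func embedAll(chunks []string, workers int, embed func(string) []float32) [][]float32 {
	out := make([][]float32, len(chunks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = embed(chunks[i]) // each index written by exactly one goroutine
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	fake := func(s string) []float32 { return []float32{float32(len(s))} }
	vecs := embedAll([]string{"a", "bb", "ccc"}, 2, fake)
	fmt.Println(vecs[2][0]) // 3
}
```

Bounding the pool size keeps memory flat regardless of how many chunks a `vecflow add` run produces.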

Latency Targets

Operation                Target
Embed single chunk       <50ms
Vector kNN (10K chunks)  <20ms
BM25 search              <20ms
Full hybrid recall       <200ms

Use Cases

  1. Agent memory: Archive + recall for LLM agents (sciClaw, etc.)
  2. Local RAG: Query your own documents without cloud APIs
  3. Note search: Semantic search over Markdown/Obsidian vaults
  4. Code search: Find similar code snippets by meaning
  5. Research: Search papers/abstracts by concept

Relationship to sciclaw

vecflow is designed as a general-purpose tool that sciclaw can integrate:

  • sciclaw routes Discord channels to workspaces
  • Each workspace gets its own vecflow memory directory
  • sciclaw memory commands wrap vecflow functionality
  • Per-channel isolation automatic via workspace separation

But vecflow itself is standalone — usable by any CLI agent or directly by users.

Prior Art

Tool        Approach                Limitation
Chroma      Python + server         Requires running service
LanceDB     Rust + Python bindings  Python-centric
Qdrant      Rust + server           Requires running service
sqlite-vss  SQLite extension        Requires SQLite, C deps
chromem-go  Pure Go library         Library, not CLI (we build on this)

vecflow fills the gap: CLI-first, single-binary, no server, pure Go.

Open Questions

  1. Should vecflow be a separate repo or part of sciclaw?
  2. Which ONNX runtime? (onnxruntime-go vs tract vs custom)
  3. Include BM25 in binary or shell out to external tool?
  4. Support for incremental index updates vs full rebuild?

Acceptance Criteria

  • Single binary builds for Linux/macOS/Windows
  • vecflow add indexes text with local embeddings
  • vecflow search returns ranked hybrid results
  • <200ms end-to-end latency on 10K chunks
  • No external runtime dependencies
  • Works offline (no network required after model download)
