## Summary
vecflow is a proposed all-in-one Go CLI tool for local vector embedding generation, indexing, and semantic recall/search. Single binary, no external dependencies, no server, no cloud API required.
## Motivation
Existing vector/embedding solutions require:
- Running servers (Chroma, Weaviate, Qdrant)
- Cloud APIs (OpenAI embeddings, Pinecone)
- Python environments (sentence-transformers, LangChain)
- Database backends (PostgreSQL + pgvector)
This creates friction for lightweight, CLI-native agent runtimes that need semantic search without infrastructure overhead.
## Design Principles
- Single binary: One executable, cross-compiles to Linux/macOS/Windows/ARM
- Zero external dependencies: No Docker, no database, no cloud account
- Filesystem-native state: Index is a directory of files (Git-friendly, Dropbox-friendly)
- Local inference: Embeddings generated on-device via ONNX runtime
- Hybrid retrieval: BM25 lexical + vector semantic search with RRF fusion
- Fail-open: Degrades gracefully when embeddings are unavailable
## Proposed CLI Interface

```bash
# Initialize
vecflow init # Creates ./vecflow.db or memory dir
vecflow config model all-MiniLM-L6-v2 # Set embedding model (auto-downloads ONNX)
# Index content
vecflow add "Quick note or text" # Embed and index inline text
vecflow add ./docs/*.md --recursive # Index files with auto-chunking
vecflow add --chunk-size 512 --overlap 128 --batch 16
# Search
vecflow search "semantic query" --top-k 8 --min-score 0.7
# Output: ranked results with score, text snippet, source, metadata
# Manage
vecflow ls # List indexed items
vecflow stats # Count, dimensions, model info
vecflow rm --older-than 30d # Prune old entries
vecflow export jsonl > backup.jsonl # Export for portability
# Optional: serve OpenAI-compatible endpoint
vecflow serve --port 8080              # /v1/embeddings API
```
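The `--chunk-size`/`--overlap` flags above imply a sliding-window chunker. A minimal sketch of that behavior (the `chunkText` name and rune-based sizing are illustrative assumptions; a real implementation would likely count tokens):

```go
// chunkText splits text into overlapping windows so that context at chunk
// boundaries is not lost. Sizes are in runes here for simplicity.
func chunkText(text string, chunkSize, overlap int) []string {
	runes := []rune(text)
	if chunkSize <= 0 || chunkSize <= overlap {
		return nil // invalid configuration
	}
	var chunks []string
	for start := 0; start < len(runes); start += chunkSize - overlap {
		end := start + chunkSize
		if end >= len(runes) {
			chunks = append(chunks, string(runes[start:]))
			break
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}
```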
## Technical Architecture

### Vector Storage: chromem-go
chromem-go — pure Go, zero dependencies, Chroma-like API
```go
import "github.com/philippgille/chromem-go"

db := chromem.NewDB()
coll, _ := db.CreateCollection("docs", nil, nil)
_ = coll.AddDocument(ctx, chromem.Document{ID: "doc1", Content: "Your text here",
	Embedding: embedding, Metadata: map[string]string{"source": "file.md"}})
results, _ := coll.QueryEmbedding(ctx, queryEmbedding, 5, nil, nil) // top-5 similar
```
Features:
- In-memory with optional file persistence (see the sketch after this list)
- Built-in cosine/Euclidean similarity search
- Stores documents, embeddings ([]float32), metadata
- Fast for <500K items; add HNSW (USearch bindings) if scale grows
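For the file-persistence mode, a sketch using chromem-go's persistent DB (the path is this proposal's layout, not the library's; error handling abbreviated):

```go
// Persistent DB: collections are written under the given directory and
// reloaded on startup, so index state survives process restarts.
db, err := chromem.NewPersistentDB("./memory/index", false) // false = no gzip compression
if err != nil {
	panic(err) // sketch; the real CLI would surface the error
}
coll, _ := db.GetOrCreateCollection("docs", nil, nil)
```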
### Embedding Models
Local ONNX inference, no API calls:
| Model | Params | Dimensions | Speed | Use Case |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 22M | 384 | <5ms/chunk | Default, CPU-friendly |
| nomic-embed-text | 137M | 768 | ~20ms/chunk | Higher quality via Ollama |
| pplx-embed-v1-0.6b | 600M | 1024 | ~50ms/chunk | Best quality, needs more RAM |
Models auto-download on first use; quantized (int8) variants are available for a smaller footprint.
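Whichever runtime is chosen, these models emit one vector per token; the usual post-processing is mean pooling plus L2 normalization, so cosine similarity reduces to a dot product. A sketch of that step, with the ONNX inference call itself omitted:

```go
import "math"

// poolAndNormalize averages per-token vectors from the model's last hidden
// state into one unit-length sentence embedding. A production version would
// also mask padding tokens using the attention mask.
func poolAndNormalize(tokenVecs [][]float32, dim int) []float32 {
	out := make([]float32, dim)
	if len(tokenVecs) == 0 {
		return out
	}
	for _, v := range tokenVecs {
		for i := 0; i < dim; i++ {
			out[i] += v[i]
		}
	}
	var norm float64
	for i := range out {
		out[i] /= float32(len(tokenVecs))
		norm += float64(out[i]) * float64(out[i])
	}
	scale := 1.0 / math.Sqrt(norm)
	for i := range out {
		out[i] = float32(float64(out[i]) * scale)
	}
	return out
}
```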
### Hybrid Retrieval Pipeline

```
Query ─┬─→ BM25 (lexical) ──────→ Lexical Ranks ─┐
       │                                         ├─→ RRF Fusion → Top-K Results
       └─→ Embed → kNN (vector) → Vector Ranks ──┘
```
Reciprocal Rank Fusion, summed over each result list r (a code sketch follows the list below):

    RRF(d) = Σ_r 1/(k + rank_r(d))   where k = 60
- No score normalization needed
- Robust to different score distributions
- Proven in IR literature (Cormack et al. 2009)
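The formula translates directly to code; a sketch, where each input maps document ID to its 1-based rank in one result list:

```go
// rrfFuse combines several rankings into one fused score per document.
// Documents missing from a list simply contribute nothing for that list.
func rrfFuse(rankings []map[string]int, k float64) map[string]float64 {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for doc, rank := range ranking {
			scores[doc] += 1.0 / (k + float64(rank))
		}
	}
	return scores
}
```

Usage: `rrfFuse([]map[string]int{lexicalRanks, vectorRanks}, 60)`, then sort by descending score and keep the top K.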
### Directory Structure

```
./memory/
├── config.toml # Model, chunk size, fusion weights
├── chunks/ # Chunked text segments
│ ├── chunk-001.txt
│ └── chunk-001.vec # Sidecar embedding file
├── index/
│ ├── lexical.idx # BM25 inverted index
│ └── vectors.idx # Vector index (chromem-go format)
└── sources/              # Optional: original source files
```
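A sketch of the Go schema that config.toml could map onto (field names and defaults are assumptions, not a committed format):

```go
// Config mirrors ./memory/config.toml; parse it with a TOML library
// (e.g. github.com/BurntSushi/toml — one possible choice).
type Config struct {
	Model     string `toml:"model"`      // e.g. "all-MiniLM-L6-v2"
	ChunkSize int    `toml:"chunk_size"` // tokens per chunk, e.g. 512
	Overlap   int    `toml:"overlap"`    // chunk overlap, e.g. 128
	RRFK      int    `toml:"rrf_k"`      // fusion constant, default 60
	TopK      int    `toml:"top_k"`      // default results per query, e.g. 8
}
```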
### Implementation Sketch

```go
package main

import (
	"log"
	"os"

	"github.com/philippgille/chromem-go"
	"github.com/urfave/cli/v2"
	// ONNX runtime for embeddings
)

func main() {
	app := &cli.App{
		Name: "vecflow",
		Commands: []*cli.Command{
			{Name: "init", Action: initCmd},
			{Name: "add", Action: addCmd},
			{Name: "search", Action: searchCmd},
			{Name: "config", Action: configCmd},
			// ...
		},
	}
	if err := app.Run(os.Args); err != nil {
		log.Fatal(err)
	}
}

func embedText(text string, model string) []float32 {
	// Load ONNX model, run inference, return the embedding vector.
	// (Implementation elided in this sketch.)
	return nil
}
```

## Why Go?
- Cross-compilation: `GOOS=linux GOARCH=arm64 go build` → ARM binary
- Single binary: No runtime dependencies, no DLLs
- Concurrency: Parallel batch embedding with goroutines (see the sketch after this list)
- ONNX ecosystem: onnxruntime-go, tract for inference
- Matches sciclaw: Same language, easy integration
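What parallel batch embedding could look like with a bounded worker pool (the worker count and the `embedText` helper from the sketch above are illustrative):

```go
import "sync"

// embedBatch embeds chunks concurrently, bounding in-flight inferences so
// memory use stays predictable on small machines.
func embedBatch(chunks []string, model string, workers int) [][]float32 {
	out := make([][]float32, len(chunks))
	sem := make(chan struct{}, workers) // limits concurrent inferences
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			out[i] = embedText(chunk, model)
		}(i, chunk)
	}
	wg.Wait()
	return out
}
```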
## Latency Targets
| Operation | Target |
|---|---|
| Embed single chunk | <50ms |
| Vector kNN (10K chunks) | <20ms |
| BM25 search | <20ms |
| Full hybrid recall | <200ms |
## Use Cases
- Agent memory: Archive + recall for LLM agents (sciClaw, etc.)
- Local RAG: Query your own documents without cloud APIs
- Note search: Semantic search over Markdown/Obsidian vaults
- Code search: Find similar code snippets by meaning
- Research: Search papers/abstracts by concept
## Relationship to sciclaw
vecflow is designed as a general-purpose tool that sciclaw can integrate:
- sciclaw routes Discord channels to workspaces
- Each workspace gets its own vecflow memory directory
- `sciclaw memory` commands wrap vecflow functionality
- Per-channel isolation is automatic via workspace separation
But vecflow itself is standalone — usable by any CLI agent or directly by users.
## Prior Art
| Tool | Approach | Limitation |
|---|---|---|
| Chroma | Python + server | Requires running service |
| LanceDB | Rust + Python bindings | Python-centric |
| Qdrant | Rust + server | Requires running service |
| sqlite-vss | SQLite extension | Requires SQLite, C deps |
| chromem-go | Pure Go library | Library, not CLI (we build on this) |
vecflow fills the gap: CLI-first, single-binary, no server, pure Go.
## Open Questions
- Should vecflow be a separate repo or part of sciclaw?
- Which ONNX runtime? (onnxruntime-go vs tract vs custom)
- Include BM25 in binary or shell out to external tool?
- Support for incremental index updates vs full rebuild?
## References
- chromem-go — pure Go vector DB
- onnxruntime-go — ONNX inference in Go
- all-MiniLM-L6-v2 — embedding model
- RRF paper — Cormack et al. 2009
- Context Rot — Hong et al. 2025
## Acceptance Criteria
- Single binary builds for Linux/macOS/Windows
- `vecflow add` indexes text with local embeddings
- `vecflow search` returns ranked hybrid results
- <200ms end-to-end latency on 10K chunks
- No external runtime dependencies
- Works offline (no network required after model download)