Feature Parity: Memory & Knowledge Enhancements
Priority: P2-P3
Source: FEATURE_PARITY.md — Memory & Knowledge System
IronClaw has vector memory, hybrid search (BM25 + vector via RRF), OpenAI embeddings, flexible path structure, identity files, daily logs, and heartbeat. Several features are missing.
Missing
- Local embeddings (P2) — Run embedding models locally (e.g., the fastembed or candle crates)
- Gemini embeddings (P3) — Google embedding API support
- Embeddings batching (P2) — Batch embedding requests for efficiency
- Citation support (P2) — Track source attribution in search results
- Memory CLI commands (P2) — memory search/index/status subcommands
- SQLite-vec backend (P3) — Alternative to pgvector
- LanceDB backend (P3) — Alternative vector store
- QMD backend (P3) — Alternative vector store
Related PRs
- feat(memory): add citation support, line tracking, and cognitive routines (#63) — in progress
Related Issues
- Import OpenClaw memory, history, and settings (#58)
- feat: Audio pipeline (speech-to-text, text-to-speech, voice note handling) (#90) — transcripts stored in workspace memory
- feat: Meeting intelligence pipeline (recording, transcription, action items, proactive follow-up) (#91) — transcripts and action items stored in memory
Notes
- Local embeddings are important for offline/privacy use cases
- The EmbeddingProvider trait in src/workspace/embeddings.rs makes adding backends straightforward
- The libSQL backend already has partial vector support via libsql_vector_idx (not yet wired)
Design Considerations
Current EmbeddingProvider Trait
src/workspace/embeddings.rs defines the trait with 5 methods:
```rust
pub trait EmbeddingProvider: Send + Sync {
    fn dimension(&self) -> usize;
    fn model_name(&self) -> &str;
    fn max_input_length(&self) -> usize;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError>;
}
```
Three implementations exist: OpenAiEmbeddings (1536/3072 dims), NearAiEmbeddings (1536 dims), and MockEmbeddings (configurable).
Local Embeddings
Recommended crate: fastembed-rs — Rust bindings for ONNX-based embedding models. Supports all-MiniLM-L6-v2 (384 dims), BAAI/bge-small-en-v1.5 (384 dims), and others.
Alternative: candle — Hugging Face's Rust ML framework. More flexible but requires more setup.
Implementation:
```rust
pub struct LocalEmbeddings {
    model: fastembed::TextEmbedding,
    dimension: usize,
    model_name: String,
}

impl LocalEmbeddings {
    pub fn new(model_name: &str) -> Result<Self, EmbeddingError> {
        // Downloads the ONNX model on first use (~90MB for MiniLM), then loads it from the local cache.
        let model = fastembed::TextEmbedding::try_new(
            fastembed::InitOptions::new(model_name.parse()?)
        )?;
        // 384 dims holds for all-MiniLM-L6-v2 / bge-small-en-v1.5; look this up per model in practice.
        Ok(Self { model, dimension: 384, model_name: model_name.to_string() })
    }
}
```
Trade-offs:
- Pro: No API key, no network, no cost, fast for small batches
- Con: Lower quality than OpenAI ada-002/3-small, CPU-bound (blocks tokio unless spawned on the blocking pool; see the sketch after this list), model download on first use (~90MB for MiniLM)
- Dimension mismatch: Local models produce 384-dim vectors vs OpenAI's 1536. Cannot mix in same vector index — would need re-embedding all documents on provider switch.
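Because inference is CPU-bound (see the trade-offs above), the blocking work should be moved off the async runtime. A minimal sketch, assuming fastembed's TextEmbedding::embed(documents, batch_size) API and a hypothetical EmbeddingError::Provider variant for error conversion:
```rust
use std::sync::Arc;

/// Runs ONNX inference on tokio's blocking pool so it does not stall the async runtime.
/// Assumes the model is held in an Arc so it can be moved into the blocking closure.
async fn embed_off_runtime(
    model: Arc<fastembed::TextEmbedding>,
    text: String,
) -> Result<Vec<f32>, EmbeddingError> {
    tokio::task::spawn_blocking(move || {
        // fastembed embeds batches of documents; pass a single-element batch here.
        let mut vectors = model
            .embed(vec![text], None)
            .map_err(|e| EmbeddingError::Provider(e.to_string()))?; // hypothetical error variant
        vectors
            .pop()
            .ok_or_else(|| EmbeddingError::Provider("empty embedding batch".into()))
    })
    .await
    .map_err(|e| EmbeddingError::Provider(e.to_string()))? // the blocking task panicked or was cancelled
}
```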
Config:
```bash
EMBEDDING_PROVIDER=local                  # or "openai", "nearai"
EMBEDDING_LOCAL_MODEL=all-MiniLM-L6-v2    # Model name
```
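How this config could be resolved at startup is sketched below; EmbeddingConfig and embedding_config_from_env are illustrative names, not existing code, and the real provider construction lives in src/workspace/embeddings.rs.
```rust
use std::env;

/// Illustrative config enum mirroring the EMBEDDING_PROVIDER values above.
enum EmbeddingConfig {
    OpenAi,
    NearAi,
    Local { model: String },
}

/// Reads EMBEDDING_PROVIDER / EMBEDDING_LOCAL_MODEL; the "openai" fallback here is an assumption.
fn embedding_config_from_env() -> Result<EmbeddingConfig, String> {
    match env::var("EMBEDDING_PROVIDER").as_deref().unwrap_or("openai") {
        "openai" => Ok(EmbeddingConfig::OpenAi),
        "nearai" => Ok(EmbeddingConfig::NearAi),
        "local" => Ok(EmbeddingConfig::Local {
            model: env::var("EMBEDDING_LOCAL_MODEL")
                .unwrap_or_else(|_| "all-MiniLM-L6-v2".to_string()),
        }),
        other => Err(format!("unknown EMBEDDING_PROVIDER: {other}")),
    }
}
```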
Embeddings Batching
Current behavior: backfill_embeddings() in src/workspace/mod.rs fetches 100 chunks without embeddings and embeds them sequentially via the default embed_batch() implementation (which calls embed() in a loop).
OpenAI API supports up to 2,048 inputs per batch request. The OpenAiEmbeddings::embed_batch() already sends all texts in a single HTTP request, but backfill_embeddings() could be optimized:
```rust
// Current (sequential embeds over a batch of 100 chunks):
let chunks = storage.chunks_without_embeddings(100).await?;
for chunk in &chunks {
    let embedding = provider.embed(&chunk.content).await?;
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}

// Optimized (batch embed + batch update):
let chunks = storage.chunks_without_embeddings(500).await?;
let texts: Vec<String> = chunks.iter().map(|c| c.content.clone()).collect();
let embeddings = provider.embed_batch(&texts).await?; // Single API call
for (chunk, embedding) in chunks.iter().zip(embeddings) {
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}
```
Estimated improvement: 100 sequential API calls → 1 batch call; roughly 50x faster for backfill operations.
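For backfills larger than a single request, the batch also has to respect the provider's input limit (2,048 inputs for OpenAI, as noted above). A small sketch of the same loop split into provider-sized batches; max_batch is an assumed value, not an existing setting:
```rust
// Split the pending chunks into provider-sized batches before calling embed_batch().
let max_batch = 2_048; // OpenAI's documented per-request input limit
for batch in chunks.chunks(max_batch) {
    let texts: Vec<String> = batch.iter().map(|c| c.content.clone()).collect();
    let embeddings = provider.embed_batch(&texts).await?; // one API call per batch
    for (chunk, embedding) in batch.iter().zip(embeddings) {
        storage.update_chunk_embedding(chunk.id, &embedding).await?;
    }
}
```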
Citation Support
Current SearchResult already tracks document_id, chunk_id, content, score, fts_rank, and vector_rank. Citation needs (see the struct sketch after this list):
- Document path — Already available via the document_id → memory_documents.path join
- Chunk position — Add chunk_index: usize to MemoryChunk (ordinal position within the document)
- Line range — Compute from the chunk's offset within the original document
- Retrieval method attribution — is_hybrid() already returns true for cross-method hits; could expose per-method scores
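A possible shape for that citation payload, with illustrative field and type names (not the existing code); it serializes to the JSON shown next:
```rust
use serde::Serialize;

/// Illustrative citation attached to each search result.
#[derive(Debug, Clone, Serialize)]
pub struct Citation {
    /// Workspace-relative document path (from the memory_documents.path join).
    pub path: String,
    /// Ordinal chunk position within the document (the proposed chunk_index).
    pub chunk: usize,
    /// Line range in the original document, e.g. "15-28".
    pub lines: String,
    /// Retrieval methods that surfaced the chunk: "fts", "vector", or both.
    pub methods: Vec<String>,
}
```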
Memory tool output with citations:
```json
{
  "results": [
    {
      "content": "User prefers dark mode...",
      "score": 0.95,
      "citation": {
        "path": "context/preferences.md",
        "chunk": 2,
        "lines": "15-28",
        "methods": ["fts", "vector"]
      }
    }
  ]
}
```
PR #63 is already working on this — coordinate to avoid duplication.
Memory CLI Commands
Already partially implemented: src/cli/mod.rs has Memory(MemoryCommand) with subcommands search, read, write, tree, index. These mirror the LLM-callable memory tools.
Additional CLI commands needed (see the sketch after this list):
- memory status — Show embedding stats (total chunks, chunks with/without embeddings, last backfill time)
- memory reindex — Force re-chunk and re-embed all documents
- memory export — Dump the workspace to the filesystem for backup
- memory import — Import from the filesystem or OpenClaw format (#58: Import OpenClaw memory, history, and settings)
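A sketch of how the new subcommands could slot into the clap enum; the variant shapes and flags below are assumptions, and the existing MemoryCommand in src/cli/mod.rs already carries the Search/Read/Write/Tree/Index variants:
```rust
use clap::Subcommand;

/// Hypothetical additions to the existing MemoryCommand enum in src/cli/mod.rs.
#[derive(Debug, Subcommand)]
pub enum MemoryCommand {
    // ... existing Search, Read, Write, Tree, Index variants ...
    /// Show embedding stats: total / embedded / pending chunks, last backfill time.
    Status,
    /// Force re-chunk and re-embed all documents.
    Reindex,
    /// Dump the workspace to the filesystem for backup.
    Export {
        /// Destination directory for the dump.
        #[arg(long, default_value = "./memory-export")]
        out: std::path::PathBuf,
    },
    /// Import from the filesystem or an OpenClaw export (#58).
    Import {
        /// Source directory or OpenClaw export to import.
        path: std::path::PathBuf,
    },
}
```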
Vector Backend Completion (libSQL)
Current state: libSQL schema has libsql_vector_idx index on memory_chunks.embedding column, but the hybrid_search() implementation in src/db/libsql_backend.rs only uses FTS5 (vector search not wired).
To complete:
```sql
-- Vector similarity search in libSQL:
SELECT id, content, vector_distance_cos(embedding, vector('[0.1, 0.2, ...]'))
FROM memory_chunks
WHERE rowid IN (
    SELECT rowid FROM vector_top_k('memory_chunks_embedding_idx', vector('[0.1, 0.2, ...]'), 50)
)
```
This would bring the libSQL backend to full feature parity with PostgreSQL's pgvector cosine distance.
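Once the vector candidates are available, hybrid_search() in src/db/libsql_backend.rs would fuse them with the FTS5 hits the same way the pgvector path does (RRF, per the overview above). A minimal sketch; rrf_fuse and its k parameter are illustrative, not existing code:
```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank).
/// `fts_ids` and `vector_ids` are chunk ids ordered best-first by each method.
fn rrf_fuse(fts_ids: &[i64], vector_ids: &[i64], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in [fts_ids, vector_ids] {
        for (rank, id) in list.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formulation.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (rank as f64 + 1.0));
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    fused
}
```
With the FTS5 ids and the vector_top_k ids fused this way (k is commonly set to 60), the libSQL path can build the same ranked SearchResult list the PostgreSQL backend returns.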
Success Criteria
- Local embeddings work offline: EMBEDDING_PROVIDER=local generates embeddings without any API calls or network access
- Batch embedding is measurably faster: Backfill of 1000 chunks completes in < 30s with batch API calls (vs ~5min with sequential calls)
- Citations include document path and line range: Search results contain enough metadata to link back to source document location
- CLI memory status shows embedding health: Displays total/embedded/pending chunk counts and the last backfill timestamp
- libSQL vector search produces relevant results: The same query returns comparable top-5 results on both the PostgreSQL and libSQL backends
- Provider switch re-embeds: Changing the embedding provider triggers automatic re-embedding of all chunks (dimension mismatch detected and handled; see the sketch after this list)
- Model download is user-confirmed: First use of local embeddings prompts user to download model files before proceeding
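One way to satisfy the provider-switch criterion is a startup check that compares the configured provider's dimension against what is already stored. The sketch below is illustrative: stored_embedding_dimension() and clear_chunk_embeddings() are hypothetical storage helpers, and error types are simplified.
```rust
/// Illustrative dimension check run before backfill_embeddings().
async fn ensure_embedding_dimension(
    storage: &Storage,                 // hypothetical storage handle
    provider: &impl EmbeddingProvider,
) -> Result<(), EmbeddingError> {
    if let Some(stored_dim) = storage.stored_embedding_dimension().await? { // hypothetical accessor
        if stored_dim != provider.dimension() {
            // e.g. 1536 -> 384 after switching to a local model: clear stored embeddings
            // so the next backfill re-embeds every chunk with the new provider.
            storage.clear_chunk_embeddings().await?; // hypothetical helper
        }
    }
    Ok(())
}
```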