
feat: Memory & knowledge system enhancements (local embeddings, batching, citation, memory CLI) #87

@ilblackdragon

Description


Feature Parity: Memory & Knowledge Enhancements

Priority: P2-P3
Source: FEATURE_PARITY.md — Memory & Knowledge System

IronClaw has vector memory, hybrid search (BM25 + vector via RRF), OpenAI embeddings, flexible path structure, identity files, daily logs, and heartbeat. Several features are missing.

Missing

  • Local embeddings (P2) — Run embedding models locally (e.g., fastembed or candle crate)
  • Gemini embeddings (P3) — Google embedding API support
  • Embeddings batching (P2) — Batch embedding requests for efficiency
  • Citation support (P2) — Track source attribution in search results
  • Memory CLI commands (P2) — memory search/index/status subcommands
  • SQLite-vec backend (P3) — Alternative to pgvector
  • LanceDB backend (P3) — Alternative vector store
  • QMD backend (P3) — Alternative vector store

Related PRs

Related Issues

Notes

  • Local embeddings are important for offline/privacy use cases
  • The EmbeddingProvider trait in src/workspace/embeddings.rs makes adding backends straightforward
  • libSQL backend already has partial vector support via libsql_vector_idx (not yet wired)

Design Considerations

Current EmbeddingProvider Trait

src/workspace/embeddings.rs defines the trait with 5 methods:

pub trait EmbeddingProvider: Send + Sync {
    fn dimension(&self) -> usize;
    fn model_name(&self) -> &str;
    fn max_input_length(&self) -> usize;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError>;
}

Three implementations exist: OpenAiEmbeddings (1536/3072 dims), NearAiEmbeddings (1536 dims), MockEmbeddings (configurable).

Local Embeddings

Recommended crate: fastembed-rs — Rust bindings for ONNX-based embedding models. Supports all-MiniLM-L6-v2 (384 dims), BAAI/bge-small-en-v1.5 (384 dims), and others.

Alternative: candle — Hugging Face's Rust ML framework. More flexible but requires more setup.

Implementation:

pub struct LocalEmbeddings {
    model: fastembed::TextEmbedding,
    dimension: usize,
    model_name: String,
}

impl LocalEmbeddings {
    pub fn new(model_name: &str) -> Result<Self, EmbeddingError> {
        // Assumes the name resolves to a fastembed::EmbeddingModel variant and that
        // the parse/init errors convert into EmbeddingError.
        let model = fastembed::TextEmbedding::try_new(
            fastembed::InitOptions::new(model_name.parse()?),
        )?;
        // 384 matches all-MiniLM-L6-v2 / bge-small-en-v1.5; in practice the dimension
        // should be derived from the chosen model rather than hard-coded.
        Ok(Self { model, dimension: 384, model_name: model_name.to_string() })
    }
}
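
To keep the CPU-bound ONNX inference off the async executor (the blocking concern noted under Trade-offs below), the trait impl could push the work onto tokio's blocking facilities. A rough sketch, not existing code: it assumes fastembed's embed(&self, texts, batch_size) signature, an assumed EmbeddingError::Other(String) catch-all variant, and native async-fn-in-trait (add the attribute if the trait is actually declared with async_trait):

impl EmbeddingProvider for LocalEmbeddings {
    fn dimension(&self) -> usize { self.dimension }
    fn model_name(&self) -> &str { &self.model_name }
    fn max_input_length(&self) -> usize { 512 } // model-dependent; MiniLM-class models truncate early

    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError> {
        let mut out = self.embed_batch(&[text.to_string()]).await?;
        Ok(out.pop().expect("one embedding per input"))
    }

    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError> {
        // ONNX inference is CPU-bound: block_in_place keeps it off the async reactor
        // threads (requires the multi-threaded tokio runtime); spawn_blocking with an
        // Arc-wrapped model is the alternative.
        let inputs: Vec<&str> = texts.iter().map(String::as_str).collect();
        tokio::task::block_in_place(|| self.model.embed(inputs, None))
            // fastembed returns anyhow::Result; EmbeddingError::Other(String) is an
            // assumed catch-all variant for this sketch.
            .map_err(|e| EmbeddingError::Other(e.to_string()))
    }
}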

Trade-offs:

  • Pro: No API key, no network, no cost, fast for small batches
  • Con: Lower quality than OpenAI ada-002/3-small, CPU-bound (blocks tokio unless spawned on blocking pool), model download on first use (~90MB for MiniLM)
  • Dimension mismatch: Local models produce 384-dim vectors vs OpenAI's 1536. They cannot be mixed in the same vector index, so switching providers requires re-embedding all documents (see the sketch below).
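
A minimal sketch of how a provider switch could be detected at startup (ties into Success Criterion 6). Storage, stored_embedding_dimension(), and clear_embeddings() are hypothetical names, not existing API:

async fn ensure_embedding_dimension(
    storage: &Storage,
    provider: &impl EmbeddingProvider,
) -> Result<(), EmbeddingError> {
    // `stored_embedding_dimension` / `clear_embeddings` are hypothetical storage
    // methods; conversion of their errors into EmbeddingError is assumed.
    if let Some(stored) = storage.stored_embedding_dimension().await? {
        if stored != provider.dimension() {
            // Existing vectors are unusable with the new provider: drop them so the
            // next backfill_embeddings() pass re-embeds every chunk with the new model.
            storage.clear_embeddings().await?;
        }
    }
    Ok(())
}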

Config:

EMBEDDING_PROVIDER=local                 # or "openai", "nearai"
EMBEDDING_LOCAL_MODEL=all-MiniLM-L6-v2  # Model name

Embeddings Batching

Current behavior: backfill_embeddings() in src/workspace/mod.rs fetches 100 chunks without embeddings and embeds them one at a time with embed(), never taking advantage of embed_batch().

OpenAI API supports up to 2,048 inputs per batch request. The OpenAiEmbeddings::embed_batch() already sends all texts in a single HTTP request, but backfill_embeddings() could be optimized:

// Current (sequential batches of 100):
let chunks = storage.chunks_without_embeddings(100).await?;
for chunk in &chunks {
    let embedding = provider.embed(&chunk.content).await?;
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}

// Optimized (single batch embed call; updates remain per chunk):
let chunks = storage.chunks_without_embeddings(500).await?;
let texts: Vec<String> = chunks.iter().map(|c| c.content.clone()).collect();
let embeddings = provider.embed_batch(&texts).await?;  // Single API call
for (chunk, embedding) in chunks.iter().zip(embeddings) {
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}

Estimated improvement: 100 sequential API calls → 1 batch call. ~50x faster for backfill operations.
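
If the fetch size ever grows past the provider-side limit (2,048 inputs for OpenAI), the optimized snippet above can split the texts into sub-batches before embedding. Sketch only: MAX_BATCH is an illustrative constant, not existing config.

// Continues from the optimized snippet above: `texts` and `provider` as before.
const MAX_BATCH: usize = 2_048; // OpenAI's documented per-request input limit

let mut embeddings = Vec::with_capacity(texts.len());
for batch in texts.chunks(MAX_BATCH) {
    embeddings.extend(provider.embed_batch(batch).await?);
}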

Citation Support

Current SearchResult already tracks document_id, chunk_id, content, score, fts_rank, vector_rank. Citation needs:

  1. Document path — Already available via a document_id → memory_documents.path join
  2. Chunk position — Add chunk_index: usize to MemoryChunk (ordinal position within document)
  3. Line range — Compute from chunk offset within original document
  4. Retrieval method attribution — is_hybrid() already returns true for cross-method hits; could expose per-method scores

Memory tool output with citations:

{
  "results": [
    {
      "content": "User prefers dark mode...",
      "score": 0.95,
      "citation": {
        "path": "context/preferences.md",
        "chunk": 2,
        "lines": "15-28",
        "methods": ["fts", "vector"]
      }
    }
  ]
}
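
One possible shape for that citation object on the Rust side is a serde struct mirroring the JSON above; the struct and field names here are assumptions, not existing types:

use serde::Serialize;

#[derive(Debug, Clone, Serialize)]
pub struct Citation {
    /// Workspace-relative document path, e.g. "context/preferences.md".
    pub path: String,
    /// Ordinal chunk position within the document (the proposed chunk_index).
    pub chunk: usize,
    /// Line range in the source document, e.g. "15-28".
    pub lines: String,
    /// Retrieval methods that surfaced this chunk ("fts", "vector").
    pub methods: Vec<String>,
}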

PR #63 is already working on this — coordinate to avoid duplication.

Memory CLI Commands

Already partially implemented: src/cli/mod.rs has Memory(MemoryCommand) with subcommands search, read, write, tree, index. These mirror the LLM-callable memory tools.

Additional CLI commands needed:

  • memory status — Show embedding stats (total chunks, chunks with/without embeddings, last backfill time); see the sketch after this list
  • memory reindex — Force re-chunk and re-embed all documents
  • memory export — Dump workspace to filesystem for backup
  • memory import — Import from filesystem or OpenClaw format (see #58: Import OpenClaw memory, history and settings)
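
A clap-derive sketch of the status subcommand hanging off the existing MemoryCommand enum (referenced from the memory status bullet above). The handler's Storage type and its chunk_counts() / last_backfill_at() methods are hypothetical, and anyhow is used only for brevity:

use clap::Subcommand;

#[derive(Subcommand)]
pub enum MemoryCommand {
    // ...existing Search / Read / Write / Tree / Index variants...
    /// Show embedding health: total, embedded, and pending chunk counts.
    Status,
}

// `Storage`, `chunk_counts`, and `last_backfill_at` are placeholder names.
async fn run_status(storage: &Storage) -> anyhow::Result<()> {
    let (total, embedded) = storage.chunk_counts().await?;
    println!("chunks: {total} total, {embedded} embedded, {} pending", total - embedded);
    match storage.last_backfill_at().await? {
        Some(ts) => println!("last backfill: {ts}"),
        None => println!("last backfill: never"),
    }
    Ok(())
}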

Vector Backend Completion (libSQL)

Current state: libSQL schema has libsql_vector_idx index on memory_chunks.embedding column, but the hybrid_search() implementation in src/db/libsql_backend.rs only uses FTS5 (vector search not wired).

To complete:

-- Vector similarity search in libSQL:
SELECT id, content, vector_distance_cos(embedding, vector('[0.1, 0.2, ...]'))
FROM memory_chunks
WHERE rowid IN (
    SELECT id FROM vector_top_k('memory_chunks_embedding_idx', vector('[0.1, 0.2, ...]'), 50)
)

This would bring the libSQL backend to full feature parity with PostgreSQL's pgvector cosine-distance search.
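
Once both result sets exist, the libSQL hybrid path still needs the RRF merge the hybrid search already relies on (per the description above). A self-contained sketch of reciprocal rank fusion over the two id lists; the function name and the k = 60 default are the textbook formulation, not code from the repo:

use std::collections::HashMap;

/// Merge FTS5 and vector ranks with reciprocal rank fusion:
/// score(id) = sum over lists the id appears in of 1 / (k + rank).
fn reciprocal_rank_fusion(fts_ids: &[i64], vector_ids: &[i64], k: f32) -> Vec<(i64, f32)> {
    let mut scores: HashMap<i64, f32> = HashMap::new();
    for ids in [fts_ids, vector_ids] {
        for (rank, id) in ids.iter().enumerate() {
            // enumerate() is 0-based; RRF uses 1-based ranks.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut merged: Vec<(i64, f32)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

Chunks hit by both FTS and vector search accumulate two terms and float to the top, which is what is_hybrid() reports today; k = 60 is the usual default.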


Success Criteria

  1. Local embeddings work offline: EMBEDDING_PROVIDER=local generates embeddings without any API calls or network access
  2. Batch embedding is measurably faster: Backfill of 1000 chunks completes in < 30s with batch API calls (vs ~5min with sequential)
  3. Citations include document path and line range: Search results contain enough metadata to link back to source document location
  4. CLI memory status shows embedding health: Displays total/embedded/pending chunk counts and last backfill timestamp
  5. libSQL vector search produces relevant results: Same query returns comparable top-5 results on both PostgreSQL and libSQL backends
  6. Provider switch re-embeds: Changing embedding provider triggers automatic re-embedding of all chunks (dimension mismatch detected and handled)
  7. Model download is user-confirmed: First use of local embeddings prompts user to download model files before proceeding
