
feat: Memory & knowledge system enhancements (local embeddings, batching, citation, memory CLI) #87

@ilblackdragon

Description


Feature Parity: Memory & Knowledge Enhancements

Priority: P2-P3
Source: FEATURE_PARITY.md — Memory & Knowledge System

IronClaw has vector memory, hybrid search (BM25 + vector via RRF), OpenAI embeddings, flexible path structure, identity files, daily logs, and heartbeat. Several features are missing.

Missing

  • Local embeddings (P2) — Run embedding models locally (e.g., fastembed or candle crate)
  • Gemini embeddings (P3) — Google embedding API support
  • Embeddings batching (P2) — Batch embedding requests for efficiency
  • Citation support (P2) — Track source attribution in search results
  • Memory CLI commands (P2) — memory search/index/status subcommands
  • SQLite-vec backend (P3) — Alternative to pgvector
  • LanceDB backend (P3) — Alternative vector store
  • QMD backend (P3) — Alternative vector store

Related PRs

Related Issues

Notes

  • Local embeddings are important for offline/privacy use cases
  • The EmbeddingProvider trait in src/workspace/embeddings.rs makes adding backends straightforward
  • libSQL backend already has partial vector support via libsql_vector_idx (not yet wired)

Design Considerations

Current EmbeddingProvider Trait

src/workspace/embeddings.rs defines the trait with 5 methods:

pub trait EmbeddingProvider: Send + Sync {
    fn dimension(&self) -> usize;
    fn model_name(&self) -> &str;
    fn max_input_length(&self) -> usize;
    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError>;
}

Three implementations exist: OpenAiEmbeddings (1536/3072 dims), NearAiEmbeddings (1536 dims), MockEmbeddings (configurable).

Local Embeddings

Recommended crate: fastembed-rs — Rust bindings for ONNX-based embedding models. Supports all-MiniLM-L6-v2 (384 dims), BAAI/bge-small-en-v1.5 (384 dims), and others.

Alternative: candle — Hugging Face's Rust ML framework. More flexible but requires more setup.

Implementation:

pub struct LocalEmbeddings {
    model: fastembed::TextEmbedding,
    dimension: usize,
    model_name: String,
}

impl LocalEmbeddings {
    pub fn new(model_name: &str) -> Result<Self, EmbeddingError> {
        // Assumes the name resolves to a fastembed::EmbeddingModel variant and that
        // the parse/init errors convert into EmbeddingError.
        let model = fastembed::TextEmbedding::try_new(
            fastembed::InitOptions::new(model_name.parse()?),
        )?;
        // 384 matches all-MiniLM-L6-v2 / bge-small-en-v1.5; in practice the dimension
        // should be derived from the chosen model rather than hard-coded.
        Ok(Self { model, dimension: 384, model_name: model_name.to_string() })
    }
}
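
To keep the CPU-bound ONNX inference off the async executor (the blocking concern noted under Trade-offs below), the trait impl could push the work onto tokio's blocking facilities. A rough sketch, not existing code: it assumes fastembed's embed(&self, texts, batch_size) signature, an assumed EmbeddingError::Other(String) catch-all variant, and native async-fn-in-trait (add the attribute if the trait is actually declared with async_trait):

impl EmbeddingProvider for LocalEmbeddings {
    fn dimension(&self) -> usize { self.dimension }
    fn model_name(&self) -> &str { &self.model_name }
    fn max_input_length(&self) -> usize { 512 } // model-dependent; MiniLM-class models truncate early

    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError> {
        let mut out = self.embed_batch(&[text.to_string()]).await?;
        Ok(out.pop().expect("one embedding per input"))
    }

    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError> {
        // ONNX inference is CPU-bound: block_in_place keeps it off the async reactor
        // threads (requires the multi-threaded tokio runtime); spawn_blocking with an
        // Arc-wrapped model is the alternative.
        let inputs: Vec<&str> = texts.iter().map(String::as_str).collect();
        tokio::task::block_in_place(|| self.model.embed(inputs, None))
            // fastembed returns anyhow::Result; EmbeddingError::Other(String) is an
            // assumed catch-all variant for this sketch.
            .map_err(|e| EmbeddingError::Other(e.to_string()))
    }
}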

Trade-offs:

  • Pro: No API key, no network, no cost, fast for small batches
  • Con: Lower quality than OpenAI ada-002/3-small, CPU-bound (blocks tokio unless spawned on blocking pool), model download on first use (~90MB for MiniLM)
  • Dimension mismatch: Local models produce 384-dim vectors vs OpenAI's 1536. They cannot be mixed in the same vector index, so switching providers requires re-embedding all documents (see the sketch below).
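
A minimal sketch of how a provider switch could be detected at startup (ties into Success Criterion 6). Storage, stored_embedding_dimension(), and clear_embeddings() are hypothetical names, not existing API:

async fn ensure_embedding_dimension(
    storage: &Storage,
    provider: &impl EmbeddingProvider,
) -> Result<(), EmbeddingError> {
    // `stored_embedding_dimension` / `clear_embeddings` are hypothetical storage
    // methods; conversion of their errors into EmbeddingError is assumed.
    if let Some(stored) = storage.stored_embedding_dimension().await? {
        if stored != provider.dimension() {
            // Existing vectors are unusable with the new provider: drop them so the
            // next backfill_embeddings() pass re-embeds every chunk with the new model.
            storage.clear_embeddings().await?;
        }
    }
    Ok(())
}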

Config:

EMBEDDING_PROVIDER=local                 # or "openai", "nearai"
EMBEDDING_LOCAL_MODEL=all-MiniLM-L6-v2  # Model name

Embeddings Batching

Current behavior: backfill_embeddings() in src/workspace/mod.rs fetches 100 chunks without embeddings and embeds them one at a time with embed(), never taking advantage of embed_batch().

OpenAI API supports up to 2,048 inputs per batch request. The OpenAiEmbeddings::embed_batch() already sends all texts in a single HTTP request, but backfill_embeddings() could be optimized:

// Current (sequential batches of 100):
let chunks = storage.chunks_without_embeddings(100).await?;
for chunk in &chunks {
    let embedding = provider.embed(&chunk.content).await?;
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}

// Optimized (single batch embed call; updates remain per chunk):
let chunks = storage.chunks_without_embeddings(500).await?;
let texts: Vec<String> = chunks.iter().map(|c| c.content.clone()).collect();
let embeddings = provider.embed_batch(&texts).await?;  // Single API call
for (chunk, embedding) in chunks.iter().zip(embeddings) {
    storage.update_chunk_embedding(chunk.id, &embedding).await?;
}

Estimated improvement: 100 sequential API calls → 1 batch call. ~50x faster for backfill operations.
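
If the fetch size ever grows past the provider-side limit (2,048 inputs for OpenAI), the optimized snippet above can split the texts into sub-batches before embedding. Sketch only: MAX_BATCH is an illustrative constant, not existing config.

// Continues from the optimized snippet above: `texts` and `provider` as before.
const MAX_BATCH: usize = 2_048; // OpenAI's documented per-request input limit

let mut embeddings = Vec::with_capacity(texts.len());
for batch in texts.chunks(MAX_BATCH) {
    embeddings.extend(provider.embed_batch(batch).await?);
}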

Citation Support

Current SearchResult already tracks document_id, chunk_id, content, score, fts_rank, vector_rank. Citation needs:

  1. Document path — Already available via a document_id → memory_documents.path join
  2. Chunk position — Add chunk_index: usize to MemoryChunk (ordinal position within document)
  3. Line range — Compute from chunk offset within original document
  4. Retrieval method attribution — is_hybrid() already returns true for cross-method hits; could expose per-method scores

Memory tool output with citations:

{
  "results": [
    {
      "content": "User prefers dark mode...",
      "score": 0.95,
      "citation": {
        "path": "context/preferences.md",
        "chunk": 2,
        "lines": "15-28",
        "methods": ["fts", "vector"]
      }
    }
  ]
}
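
One possible shape for that citation object on the Rust side is a serde struct mirroring the JSON above; the struct and field names here are assumptions, not existing types:

use serde::Serialize;

#[derive(Debug, Clone, Serialize)]
pub struct Citation {
    /// Workspace-relative document path, e.g. "context/preferences.md".
    pub path: String,
    /// Ordinal chunk position within the document (the proposed chunk_index).
    pub chunk: usize,
    /// Line range in the source document, e.g. "15-28".
    pub lines: String,
    /// Retrieval methods that surfaced this chunk ("fts", "vector").
    pub methods: Vec<String>,
}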

PR #63 is already working on this — coordinate to avoid duplication.

Memory CLI Commands

Already partially implemented: src/cli/mod.rs has Memory(MemoryCommand) with subcommands search, read, write, tree, index. These mirror the LLM-callable memory tools.

Additional CLI commands needed:

  • memory status — Show embedding stats (total chunks, chunks with/without embeddings, last backfill time); see the sketch after this list
  • memory reindex — Force re-chunk and re-embed all documents
  • memory export — Dump workspace to filesystem for backup
  • memory import — Import from filesystem or OpenClaw format (see #58: Import OpenClaw memory, history and settings)
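
A clap-derive sketch of the status subcommand hanging off the existing MemoryCommand enum (referenced from the memory status bullet above). The handler's Storage type and its chunk_counts() / last_backfill_at() methods are hypothetical, and anyhow is used only for brevity:

use clap::Subcommand;

#[derive(Subcommand)]
pub enum MemoryCommand {
    // ...existing Search / Read / Write / Tree / Index variants...
    /// Show embedding health: total, embedded, and pending chunk counts.
    Status,
}

// `Storage`, `chunk_counts`, and `last_backfill_at` are placeholder names.
async fn run_status(storage: &Storage) -> anyhow::Result<()> {
    let (total, embedded) = storage.chunk_counts().await?;
    println!("chunks: {total} total, {embedded} embedded, {} pending", total - embedded);
    match storage.last_backfill_at().await? {
        Some(ts) => println!("last backfill: {ts}"),
        None => println!("last backfill: never"),
    }
    Ok(())
}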

Vector Backend Completion (libSQL)

Current state: libSQL schema has libsql_vector_idx index on memory_chunks.embedding column, but the hybrid_search() implementation in src/db/libsql_backend.rs only uses FTS5 (vector search not wired).

To complete:

-- Vector similarity search in libSQL:
SELECT id, content, vector_distance_cos(embedding, vector('[0.1, 0.2, ...]'))
FROM memory_chunks
WHERE rowid IN (
    SELECT id FROM vector_top_k('memory_chunks_embedding_idx', vector('[0.1, 0.2, ...]'), 50)
)

This would bring the libSQL backend to full feature parity with PostgreSQL's pgvector cosine-distance search.
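
Once both result sets exist, the libSQL hybrid path still needs the RRF merge the hybrid search already relies on (per the description above). A self-contained sketch of reciprocal rank fusion over the two id lists; the function name and the k = 60 default are the textbook formulation, not code from the repo:

use std::collections::HashMap;

/// Merge FTS5 and vector ranks with reciprocal rank fusion:
/// score(id) = sum over lists the id appears in of 1 / (k + rank).
fn reciprocal_rank_fusion(fts_ids: &[i64], vector_ids: &[i64], k: f32) -> Vec<(i64, f32)> {
    let mut scores: HashMap<i64, f32> = HashMap::new();
    for ids in [fts_ids, vector_ids] {
        for (rank, id) in ids.iter().enumerate() {
            // enumerate() is 0-based; RRF uses 1-based ranks.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut merged: Vec<(i64, f32)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

Chunks hit by both FTS and vector search accumulate two terms and float to the top, which is what is_hybrid() reports today; k = 60 is the usual default.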


Success Criteria

  1. Local embeddings work offline: EMBEDDING_PROVIDER=local generates embeddings without any API calls or network access
  2. Batch embedding is measurably faster: Backfill of 1000 chunks completes in < 30s with batch API calls (vs ~5min with sequential)
  3. Citations include document path and line range: Search results contain enough metadata to link back to source document location
  4. CLI memory status shows embedding health: Displays total/embedded/pending chunk counts and last backfill timestamp
  5. libSQL vector search produces relevant results: Same query returns comparable top-5 results on both PostgreSQL and libSQL backends
  6. Provider switch re-embeds: Changing embedding provider triggers automatic re-embedding of all chunks (dimension mismatch detected and handled)
  7. Model download is user-confirmed: First use of local embeddings prompts user to download model files before proceeding
