Skip to content

feat(memory): add citation support, line tracking, and cognitive routines#63

Open
joe-rlo wants to merge 25 commits intonearai:mainfrom
joe-rlo:microwave/memory-improvements
Open

feat(memory): add citation support, line tracking, and cognitive routines#63
joe-rlo wants to merge 25 commits intonearai:mainfrom
joe-rlo:microwave/memory-improvements

Conversation

@joe-rlo
Copy link

@joe-rlo joe-rlo commented Feb 13, 2026

Summary

This PR ports key memory system improvements from OpenClaw to IronClaw, focusing on better search citations, checkpointing, and agent cognitive routines.

Changes

1. Memory Citations with Line Numbers

  • Added ChunkWithPosition struct to track line numbers during chunking
  • New chunk_document_with_positions() function preserves source locations
  • MemoryChunk now includes line_start/line_end fields
  • Search results can now cite exact locations (e.g., path#L15-L23)

2. Cognitive Routines Module (src/agent/cognitive.rs)

  • CognitiveConfig - configurable checkpointing intervals and feature flags
  • CheckpointTracker - tracks exchanges, topics, and decisions for periodic checkpoints
  • pre_game_instructions() - 5-step checklist before task execution
  • after_action_template() - structured post-task review format
  • post_compaction_recovery() - recovery instructions after context loss
  • System prompt injection via cognitive_instructions() in Workspace

3. Security Hardening

  • Path traversal prevention: normalize_path() now rejects .. sequences and null bytes
  • Added validate_path() wrapper with proper InvalidPath error handling
  • All workspace file operations now validate paths before execution

Testing

  • 31 new tests across 3 test files:
    • tests/cognitive_integration.rs (9 tests)
    • tests/chunker_integration.rs (14 tests)
    • tests/security_integration.rs (8 tests)
  • Plus 4 inline unit tests for path validation
  • All new tests pass; 2 pre-existing test failures unrelated to this PR

Motivation

These changes address feature gaps identified when comparing IronClaw to OpenClaw:

  • OpenClaw agents can cite exact memory locations; IronClaw could not
  • OpenClaw has robust checkpointing to survive context compaction; IronClaw lacked this
  • Path traversal was a potential security concern in workspace operations

Breaking Changes

None. All changes are additive and backward compatible.


Happy to iterate on any feedback!

- Add 'path' field to RankedResult and SearchResult structs
- Include document path in FTS and vector search queries
- Add 'citation' field to memory_search tool output
- Prepare for future line number tracking (line_start/line_end fields)

Phase 1 of citation support - path-level citations work now,
line-level citations need chunker changes (Phase 2).
- Add ChunkWithPosition struct with line_start/line_end
- Add chunk_document_with_positions() function
- Add line_start/line_end fields to MemoryChunk
- Add citation() method for formatting 'line X' or 'lines X-Y'
- Add comprehensive tests for position tracking

Enables citations like 'Source: daily/2026-02-12.md#lines 15-23'
- Add CheckpointTracker for tracking exchanges and triggering checkpoints
- Add pre_game_instructions() for task preparation checklist
- Add after_action_template() for post-task reviews
- Add post_compaction_recovery() for context loss recovery
- Add CognitiveConfig for enabling/disabling features
- Add write_checkpoint() helper for workspace integration

Implements the cognitive architecture patterns from OpenClaw:
- Pre-game routine before task execution
- Mandatory checkpointing every N exchanges (default 15)
- After-action review templates
- Post-compaction recovery instructions
- Add cognitive_instructions() helper function
- Include mandatory memory recall requirements in system prompt
- Add pre-game routine checklist
- Add checkpointing reminder

Ensures the agent follows recall-first patterns and pre-game
routines automatically via system prompt injection.
- Add GeminiEmbeddings struct with embedContent and batchEmbedContents support
- Support gemini-embedding-001 (768 dims) and text-embedding-004
- Proper error handling for auth, rate limits, and API errors
- Export from workspace module
- Add V9 migration for line_start/line_end columns
- Add insert_chunk_with_lines() for storing line positions
- Update FTS and vector search queries to return line numbers
- Update RankedResult and SearchResult with line fields
- Format citations as 'path#line N' or 'path#lines N-M'

Citations now work end-to-end:
1. Chunker tracks line positions
2. DB stores line_start/line_end
3. Search returns line info
4. memory_search formats proper citations
- Add 'index' command for reindexing embeddings
- Add --full flag to search for complete content
- Add --from/--lines flags to read for line ranges
- Show citations with line numbers in search results
- Add has_embeddings() helper to Workspace

CLI now supports:
  ironclaw memory search 'query' --limit 10 --full
  ironclaw memory read MEMORY.md --from 15 --lines 20
  ironclaw memory index [--force]
- normalize_path() now rejects '..' sequences and null bytes
- Added validate_path() wrapper with proper error handling
- Added InvalidPath variant to WorkspaceError
- All file ops (read/write/append/delete/list) now validate paths

Security audit found:
- [FIXED] Path traversal in workspace operations
- [DOCUMENTED] Log injection in cognitive routines (caller responsibility)
- [DOCUMENTED] SQL stored proc in repository (external validation)

Tests:
- 4 new unit tests for path validation
- Created tests/security_integration.rs (8 tests)
- Created tests/cognitive_integration.rs (9 tests)
- Created tests/chunker_integration.rs (14 tests)
- All 24 new tests pass
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @joe-rlo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the agent's memory and operational capabilities. It introduces precise source citation for memory search results, enhances agent reliability through structured cognitive routines like checkpointing and pre-task planning, and strengthens the system's security by preventing malicious path manipulations. These changes aim to make the agent more robust, transparent, and capable of handling complex, long-running tasks.

Highlights

  • Memory Citation Support: Implemented line number tracking for memory chunks, allowing search results to cite exact source locations (e.g., path#L15-L23). This includes database schema updates, new chunking logic, and CLI/tool integration.
  • Agent Cognitive Routines: Added a new cognitive module to introduce structured agent behaviors such as pre-game checklists, periodic checkpointing to daily notes, after-action review templates, and post-compaction recovery instructions.
  • Security Hardening for Workspace Paths: Introduced robust path validation to prevent directory traversal attacks (e.g., ../) and null byte injection across all workspace file operations (read, write, append, delete, list).
  • Enhanced CLI Memory Commands: The memory search command now supports a --full flag to display complete content, and memory read allows specifying line ranges (--from, --lines). A new memory index command was added to backfill missing embeddings.
  • Gemini Embedding Provider: Integrated support for Google Gemini as a new embedding provider, offering an alternative to existing OpenAI and Near.ai options.
Changelog
  • migrations/V9__chunk_line_numbers.sql
    • Added line_start and line_end integer columns to the memory_chunks table.
    • Created an index idx_memory_chunks_lines for efficient line-based queries.
  • src/agent/cognitive.rs
    • Introduced a new module for agent cognitive routines.
    • Added CognitiveConfig for routine configuration, CheckpointTracker for session state management, and functions for pre-game instructions, after-action reviews, and post-compaction recovery.
  • src/agent/mod.rs
    • Exported the new cognitive module and its public structs and functions.
  • src/cli/memory.rs
    • Added a --full option to the memory search command to display full content.
    • Introduced --from and --lines options to the memory read command for reading specific line ranges.
    • Added a new memory index command with a --force option to reindex embeddings.
    • Updated search and read command handlers to utilize the new options.
  • src/error.rs
    • Added a new InvalidPath error variant to handle path validation failures.
  • src/tools/builtin/memory.rs
    • Modified the memory_search tool to include path, citation, line_start, and line_end in its JSON output for search results.
  • src/workspace/chunker.rs
    • Defined ChunkWithPosition struct to store chunk content along with its line and character positions.
    • Implemented chunk_document_with_positions function to generate chunks while preserving line number information.
    • Added a citation method to ChunkWithPosition for formatted line citations.
  • src/workspace/document.rs
    • Added line_start and line_end fields to the MemoryChunk struct.
    • Introduced a with_lines constructor for MemoryChunk to create chunks with line position data.
    • Added a citation method to MemoryChunk for formatted line citations.
  • src/workspace/embeddings.rs
    • Added GeminiEmbeddings struct and its implementation for the EmbeddingProvider trait, supporting Google Gemini embedding models.
  • src/workspace/mod.rs
    • Exported ChunkWithPosition and chunk_document_with_positions from the chunker module.
    • Exported GeminiEmbeddings from the embeddings module.
    • Added has_embeddings method to Workspace to check for configured embedding providers.
    • Integrated cognitive routine instructions into the system_prompt generation.
    • Refactored path normalization into normalize_path (returning Option<String>) and introduced validate_path (returning Result<String, WorkspaceError>) to enforce security policies across all workspace file operations.
    • Removed normalize_directory as its functionality is now covered by validate_path.
  • src/workspace/repository.rs
    • Modified insert_chunk to call a new insert_chunk_with_lines method, which now accepts optional line_start and line_end parameters.
    • Updated SQL queries for insert_chunk_with_lines to store line number information.
    • Modified get_chunks_for_document, fts_search, and vector_search queries to retrieve path, line_start, and line_end from the database.
  • src/workspace/search.rs
    • Added path, line_start, and line_end fields to SearchResult and RankedResult structs.
    • Updated the reciprocal_rank_fusion function to propagate and store the new line number and path information in search results.
  • tests/chunker_integration.rs
    • Added a new integration test file for chunk_document_with_positions and ChunkWithPosition citation formatting.
  • tests/cognitive_integration.rs
    • Added a new integration test file for the cognitive module, covering CheckpointTracker workflow, instruction content, and configuration.
  • tests/security_integration.rs
    • Added a new integration test file to verify path traversal and null byte injection prevention in workspace operations (read, write, delete, list).
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces key features from OpenClaw to IronClaw, including memory citations with line number tracking, cognitive routines, and significant security hardening against path traversal. However, critical security issues were identified, specifically a potential prompt injection vector in the cognitive checkpointing logic and the insecure transmission of API keys in the Gemini embedding provider. Furthermore, the new citation feature is currently ineffective for new or modified documents because the reindex_document function incorrectly uses chunk_document instead of chunk_document_with_positions. Minor comments were also left regarding code efficiency, CLI usability (including a panic condition in line-range reading logic), and potential refactoring.

- fix(security): Move Gemini API key from URL query param to x-goog-api-key header
- fix(security): Add sanitize_checkpoint_text() to prevent markdown/prompt injection
- refactor: Add SearchResult::citation() method to centralize citation formatting
- refactor: Improve add_topic() efficiency (avoid unnecessary String allocation)
- cleanup: Remove unimplemented --force flag from memory index command

Addresses all 5 issues from Gemini Code Assist review:
nearai#63
@joe-rlo
Copy link
Author

joe-rlo commented Feb 13, 2026

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a great PR that adds significant new capabilities to the memory system, including citation support with line numbers, cognitive routines for the agent, and important security hardening against path traversal. The code is well-structured and includes a comprehensive set of new tests.

I have a couple of suggestions for improvement in the comments.

One critical point that is not in the changed files but is highly relevant to this PR's features: the reindex_document function in src/workspace/mod.rs does not seem to be updated to use the new chunk_document_with_positions function. This means that documents created or updated via write and append will not have line number information stored for their chunks, which undermines the new citation feature. This seems like a critical oversight that should be addressed for the new features to work as intended.

Critical fix: reindex_document was using chunk_document() instead of
chunk_document_with_positions(), meaning line numbers were never
stored. Now properly tracks line_start/line_end for all chunks.

Thanks to Gemini Code Assist for catching this oversight.
Replace verbose match block with map_or + iterator methods.
More idiomatic Rust and easier to read.
Renamed test_checkpoint_content_escaping to test_checkpoint_content_sanitization.
Now verifies specific sanitization transformations:
- # escaped to \#
- Newlines flattened to spaces
- --- replaced with en-dashes
- ``` replaced with '''
@tribendu
Copy link

Code Review: PR #63 - "feat(memory): add citation support, line tracking, and cognitive routines"

Summary

This PR introduces significant features to the IronClaw memory system:

  1. Line tracking for citations - Adds line_start and line_end columns to memory chunks
  2. Cognitive routines module - Implements checkpointing, pre-game routines, and after-action reviews
  3. Position-aware chunking - New chunking algorithm that tracks character/line positions
  4. Path traversal security - Prevents directory traversal attacks in file operations
  5. Gemini embeddings provider - Adds Google Gemini API support for embeddings
  6. CLI enhancements - Improves search/read commands and adds indexing

The PR adds ~1600 new lines across 12 files with comprehensive test coverage. The changes are well-structured and include proper migration scripts.


Pros

Excellent Test Coverage

  • Three new test files with 171+ test cases
  • Tests cover chunking positions, cognitive routines, and security scenarios
  • Integration tests demonstrate edge cases (unicode, empty input, CRLF line endings)
  • Security tests specifically target path traversal prevention

Good Security Practices

  • Path traversal prevention: normalize_path() rejects ../ patterns correctly
  • Null byte injection protection: Rejects \0 characters in paths
  • Input sanitization: sanitize_checkpoint_text() escapes markdown characters
  • Validates all path inputs before database operations

Clean Code Organization

  • New cognitive.rs module is well-documented with clear module docstrings
  • Separation of concerns: chunking, cognitive routines, and repository logic are distinct
  • Descriptive function names: chunk_document_with_positions, generate_checkpoint_content

Practical Features

  • Citation format is intuitive: path#line 42 or path#lines 10-25
  • Checkpointing is configurable with sensible defaults (15 exchanges interval)
  • Pre-game instructions are clear and actionable
  • CLI enhancements (--full, --from, --lines) improve usability

Database Design

  • Migration script V9 adds line_start, line_end columns properly
  • Creates composite index for line-based queries
  • Includes column comments for documentation
  • Optional fields handle backward compatibility gracefully

Rust Best Practices

  • Uses Option<u32> for nullable line numbers (appropriate for existing data)
  • Implements Default trait where appropriate
  • Serde serialization for config structs
  • Builder pattern for ChunkConfig

Concerns

1. Potential Bug in char_to_line Function (chunker.rs:157-161)

The binary search for line numbers may produce incorrect results:

let char_to_line = |pos: usize| -> u32 {
    match line_starts.binary_search(&pos) {
        Ok(i) => (i + 1) as u32,
        Err(i) => i as u32,
    }
};

Issue: This assumes character positions map cleanly to line boundaries. For character position 0 (start of file), it returns line 1 correctly. However, for any position that's not exactly at a newline character boundary, it returns the line after the position, not the line containing it. This could lead to citations that are off by 1 line.

Example: If line 1 is "hello\n" (6 chars), position 5 is within line 1 but char_to_line(5) returns 2.

Severity: Medium - affects citation accuracy

2. Word-Based Chunking May Break Context (chunker.rs:165-183)

The chunking algorithm splits documents by words for position tracking:

for (i, c) in content.char_indices() {
    if c.is_whitespace() {
        if in_word {
            words.push(WordPos { word: &content[word_start..i], ... });
            in_word = false;
        }
    } else if !in_word {
        word_start = i;
        in_word = true;
    }
}

Issue: This discards all whitespace when reconstructing chunks.

  • Newlines are lost
  • Indentation is removed
  • Markdown formatting may be affected (e.g., code blocks, lists)

Impact: Code blocks could lose proper indentation, lists might break, paragraph separation could be lost.

Severity: Medium to High - affects semantic content integrity

3. Security: Path Traversal May Be Incomplete (workspace/mod.rs)

While normalize_path() rejects ../, the delete operation still uses the old normalize_path():

// Line 421 - Delete doesn't use validate_path()
pub async fn delete(&self, path: &str) -> Result<(), WorkspaceError> {
    let path = normalize_path(path);  // Old function, not validated!
    self.storage.delete_document(...).await
}

All other methods use validate_path() which rejects dangerous paths, but delete() uses normalize_path() which returns None for bad paths.

Issue: normalize_path() now returns Option<String>, but delete doesn't handle the None case properly - it could panic if unwrap is added incorrectly elsewhere.

Severity: Medium - inconsistent handling could lead to issues

4. Empty Content Handling Inconsistent (chunker.rs)

The function returns empty vec for empty content early:

pub fn chunk_document_with_positions(content: &str, config: ChunkConfig) -> Vec<ChunkWithPosition> {
    if content.is_empty() {
        return Vec::new();
    }

But what happens when indexing documents? The caller in reindex_document() doesn't check for empty chunks - it just iterates and inserts. This could lead to no chunks being indexed for whitespace-only documents.

Severity: Low - edge case, but should be handled

5. Missing Backfill Implementation

The CLI adds an index command that calls workspace.backfill_embeddings(), but this function is not implemented in the workspace module:

// src/cli/memory.rs:295
let count = workspace.backfill_embeddings().await?;

Issue: This will fail to compile or panic at runtime.

Severity: Critical - code won't work

6. Citation Format Ambiguity

The citation format uses # separator:

  • path#line 42 - single line
  • path#lines 10-25 - range

Issue: If a document path contains # (which is valid), citation parsing becomes ambiguous. While current paths are simple .md files, this could be a problem in the future.

Severity: Low - edge case

7. No Concurrent Write Protection

When reindexing documents:

// Delete old chunks first
self.storage.delete_chunks(document_id).await?;

// Then insert new chunks
for (index, chunk) in chunks.into_iter().enumerate() {
    self.storage.insert_chunk_with_lines(...).await?;
}

Issue: If another process reads/searches the document between these two operations, they get either no results or stale results. There's no transaction or locking mechanism shown.

Severity: Medium - could cause inconsistent search results

8. Gemini Embeddings Error Handling

Rate limit handling extracts retry-after header:

if status == reqwest::StatusCode::TOO_MANY_REQUESTS {
    let retry_after = response.headers()
        .get("retry-after")
        .and_then(|v| v.to_str().ok())
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs);
    return Err(EmbeddingError::RateLimited { retry_after });
}

Issue: Gemini's rate limit response format may differ from the expected header name or format. Should verify actual API behavior.

Severity: Low - operational concern

9. Memory Leak in CheckpointTracker

The tracker accumulates topics and decisions:

pub struct CheckpointTracker {
    pub topics: Vec<String>,
    pub decisions: Vec<String>,
    // ...
}

Issue: If checkpointing_enabled is false but the tracker is instantiated, topics/decisions would grow unbounded during long conversations. The reset() clears them, but only when checkpointing occurs.

Severity: Low - memory could grow if checkpointing disabled

10. Test Coverage Gap

No tests for:

  • backfill_embeddings() function (not implemented)
  • Citation parsing/verification
  • Concurrent reindexing scenarios
  • Database migration rollback path
  • Large document chunking performance (1000+ lines)

Severity: Medium - missing edge case validation


Suggestions

High Priority

  1. Fix word-based chunking to preserve structure

    suggestion: Instead of word-based, consider line-based chunking
    that preserves paragraph and code block structure.
  2. Implement backfill_embeddings() in workspace.rs

    pub async fn backfill_embeddings(&self) -> Result<usize, WorkspaceError> {
        let provider = self.embeddings.as_ref().ok_or_else(|| {
            WorkspaceError::EmbeddingError("No embedding provider configured".into())
        })?;
    
        // Query for chunks without embeddings
        let rows = self.storage.conn().await?.query(
            "SELECT id, content FROM memory_chunks WHERE embedding IS NULL",
            &[]
        ).await?;
    
        let mut count = 0;
        for row in rows {
            let id: Uuid = row.get("id");
            let content: String = row.get("content");
    
            if let Ok(embedding) = provider.embed(&content).await {
                self.storage.update_chunk_embedding(id, &embedding).await?;
                count += 1;
            }
        }
    
        Ok(count)
    }
  3. Fix char_to_line boundary logic

    suggestion: Use position that IS on a line, not just after a newline.
    Consider using binary search to find the greatest line_start <= position.
  4. Unify path validation - update delete()

    pub async fn delete(&self, path: &str) -> Result<(), WorkspaceError> {
        let path = validate_path(path)?;  // Use validated path
        // ...
    }

Medium Priority

  1. Add transactions for reindexing

    suggestion: Wrap delete + insert in a transaction to avoid
    inconsistent state during concurrent reads.
  2. Consider using a different citation separator

    suggestion: Use `path@lines 10-25` or `path:L10-15` instead of `#`
    to avoid ambiguity with paths containing hash characters.
  3. Add bounds checking to CheckpointTracker

    suggestion: Add max limits for topics/decisions to prevent
    unbounded growth if checkpointing is disabled.
  4. Preserve leading whitespace in chunks

    suggestion: Store indentation for each line to preserve
    code block formatting and markdown list structures.

Low Priority

  1. Add integration test for concurrent operations

    suggestion: Test that search works correctly while a document
    is being reindexed.
  2. Document API rate limit behavior

    suggestion: Add doc comments about what happens when Gemini
    rate limits are hit and how callers should retry.
  3. Add config option for citation format

    suggestion: Make citation format configurable for different
    output styles (GitHub links, file paths, etc.).
  4. Consider adding migration rollback script

    suggestion: Add V9__rollback.sql script for testing and
    emergency rollbacks.

Overall Assessment

Strengths: Well-structured, comprehensive test coverage, good security focus, practical features.

Critical Issues: Missing backfill_embeddings() implementation (won't compile), potential semantic loss in word-based chunking.

Recommendation: Request Changes - Address the critical issues before merging. The implementation is solid overall but has a blocking bug (missing function) and a concerning semantic issue (whitespace removal).

After fixes: This will be a valuable addition to IronClaw's memory system with proper citation support and cognitive routines.


Additional Notes

The cognitive routines feature is well-designed and follows good practices:

  • Clear separation between Preparation and Execution modes
  • Checkpoint sanitization prevents markdown injection in system prompts
  • Configurable defaults allow user customization

The security improvements are thoughtful:

  • Path traversal prevention is comprehensive
  • Null byte protection is proper
  • Error messages don't leak information

The Gemini embeddings implementation is complete with proper error handling and both single/batch operations supported.

joe-rlo and others added 4 commits February 15, 2026 03:07
Memory Guardian adds automatic memory discipline that doesn't rely on the
agent choosing to follow instructions. It operates at the system level,
writing breadcrumbs and injecting checkpoint pressure regardless of agent
attention state.

## New Features (cognitive.rs)

- Pre-compaction breadcrumbs: Raw state snapshot written synchronously
  before context compression. Acts as a 'black box recorder' — survives
  even when the LLM-generated compaction summary is lossy or incomplete.

- Auto-breadcrumbs: Every 10 tool calls, automatically append a breadcrumb
  to daily notes with recent tool names and call counts. Creates a paper
  trail even if the agent never explicitly writes a checkpoint.

- Checkpoint gate: Escalating pressure injected into the system prompt
  when too many tool calls happen without a checkpoint write:
  - Gentle (12 calls): 'Consider checkpointing'
  - Firm (20 calls): 'Checkpoint needed'
  - Urgent (30 calls): 'STOP and checkpoint NOW'
  Uses escalation (not hard blocking) because hard blocks would break
  legitimate multi-step workflows.

- Memory search tracking: Counts turns since last memory_search call,
  reminds agent to search before answering memory questions (default: 8).

- Tool calls as metric (not just exchanges): Tool calls are a better
  proxy for 'work that could be lost' — an agent might have 3 exchanges
  but 30 tool calls of file reads, shell commands, and code writes.

## Design Decisions

- Escalating vs hard blocking: Escalating pressure works with the agent's
  attention mechanism. Urgent-level messages are emphatic enough to
  interrupt most workflows without breaking them.

- Breadcrumbs don't reset checkpoint counter: If they did, the agent would
  never feel pressure to write its own checkpoints — the guardian would
  always relieve the pressure first.

- CheckpointTracker on Thread (not Session): Each conversation thread has
  independent memory state. Not serialized — resets on session reload,
  which is fine since it's a running tally for the current conversation.

## Integration Points (agent_loop.rs)

- Pre-compaction breadcrumb before auto-compaction and manual /compact
- Pre-reset breadcrumb before /new thread creation
- Tool call tracking after every tool execution
- Guardian context (gate + search reminder) injected into system prompt
- Exchange tracking on turn start

## Also Fixed

- workspace/mod.rs: Fixed normalize_path Option<String> handling,
  added insert_chunk_with_lines to WorkspaceStorage enum,
  replaced missing normalize_directory with normalize_path
- cli/memory.rs: Fixed missing pattern fields and arguments for
  Search (full) and Read (from, lines) commands, added Index arm

## Tests

14 new unit tests covering:
- Tool call tracking and breadcrumb triggering
- Checkpoint detection (writes to daily/*.md)
- Checkpoint gate escalation at 12/20/30 thresholds
- Memory search reminder timing
- Rolling tool window (last 10)
- Breadcrumb content generation
- Counter reset behavior (breadcrumb vs checkpoint)
- Config defaults
Copy link
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(memory): add citation support, line tracking, and cognitive routines

This is a well-designed PR that brings valuable features (citations, cognitive routines, path traversal prevention). The architecture of the memory guardian with its dual-layer approach (prompt-level guidance + system-level enforcement) is thoughtful and well-documented. However, there are several issues that need to be addressed before merging.

Blocking: Compilation Failure (libSQL Backend)

The PR does not compile. cargo check fails with 3 errors in src/db/libsql_backend.rs:

error[E0063]: missing fields `line_end` and `line_start` in initializer of `MemoryChunk`
    --> src/db/libsql_backend.rs:2403:25

error[E0063]: missing fields `line_end`, `line_start` and `path` in initializer of `RankedResult`
    --> src/db/libsql_backend.rs:2460:30

error[E0063]: missing fields `line_end`, `line_start` and `path` in initializer of `RankedResult`
    --> src/db/libsql_backend.rs:2509:30

The MemoryChunk and RankedResult structs gained new fields (line_start, line_end, path), but the libSQL backend was not updated. Per CLAUDE.md: "All new features that touch persistence MUST support both backends." Additionally, src/db/libsql_migrations.rs needs the line_start and line_end columns added to the memory_chunks table schema.

Fixes needed:

  1. Add line_start: None, line_end: None to the MemoryChunk construction at line 2403
  2. Add path: String::new(), line_start: None, line_end: None to the RankedResult constructions at lines 2460 and 2509 (ideally, join memory_documents to fetch the actual path, same as the postgres repository does)
  3. Update the FTS and vector search SQL queries in libsql_backend.rs to SELECT d.path, c.line_start, c.line_end
  4. Add line_start and line_end columns to the memory_chunks CREATE TABLE in libsql_migrations.rs

Blocking: Integration Test Won't Compile

tests/cognitive_integration.rs line 115 constructs CognitiveConfig with only 4 of 8 fields:

let config = CognitiveConfig {
    pre_game_enabled: false,
    checkpointing_enabled: true,
    checkpoint_interval: 20,
    after_action_enabled: true,
};

Missing: guardian_enabled, breadcrumb_interval, gate_thresholds, memory_search_reminder_interval. This won't compile. Either add the missing fields or use ..Default::default().

.unwrap() in Production Code

Per CLAUDE.md: "Never use .unwrap() or .expect() in production code."

In src/workspace/chunker.rs, inside chunk_document_with_positions:

let last: ChunkWithPosition = chunks.pop().unwrap();

While this is logically safe (guarded by !chunks.is_empty()), it violates the project convention. Suggestion:

let Some(last) = chunks.pop() else { break; };

Feature Gate Issue

The PR removes #[cfg(feature = "postgres")] from pub use repository::Repository (in src/workspace/mod.rs) but the mod repository declaration still has the feature gate. With default features ["postgres", "libsql"] this works, but building with --no-default-features --features libsql would fail because mod repository is not compiled while pub use repository::Repository is unconditional. Either:

  • Keep the #[cfg(feature = "postgres")] on the pub use, or
  • Also remove it from mod repository (if Repository should be available for all backends)

Minor Issues

  1. Trailing whitespace: Multiple files have trailing whitespace on blank lines (e.g., src/cli/memory.rs lines with trailing spaces). Run cargo fmt to clean up.

  2. super:: in tests: src/agent/cognitive.rs uses use super::*; in the test module (line 855 of the diff). Per project conventions, crate:: imports are preferred over super::. However, use super::* in mod tests within the same file is a common Rust pattern and could be considered an exception here.

  3. Security test dependency on PostgreSQL: All tests in tests/security_integration.rs require a PostgreSQL connection (deadpool_postgres::Pool). These tests will fail in CI environments without a running Postgres instance. Consider either:

    • Gating them behind the integration feature
    • Adding a connection check that skips the test if Postgres is unavailable
  4. GeminiEmbeddings added but not documented: The PR adds a full GeminiEmbeddings provider (~240 lines) which is not mentioned in the PR description. This is a separate feature that could be its own PR. It also doesn't appear in the configuration docs or CLAUDE.md.

Positive Notes

  • Excellent documentation throughout cognitive.rs -- the "Why" sections for design decisions are exactly the right level of documentation
  • Path traversal prevention is well-implemented with both .. and null byte checks
  • The escalating checkpoint gate (gentle/firm/urgent) is a smart design choice
  • Comprehensive test coverage: 31 new tests covering the core logic
  • The sanitization of checkpoint text to prevent markdown/prompt injection is a nice security touch
  • Clean separation between prompt-level guidance and system-level enforcement

Please fix the compilation failures and I'm happy to approve.

@tribendu
Copy link

GitHub PR Review Batch 3 - nearai/ironclaw

PR #71: Fix undo/redo deadlock and checkpoint behavior

Summary

This PR addresses critical concurrency issues in the undo/redo system. The main changes are:

  1. Reordering lock acquisitions in undo_turn() and redo_turn() to lock session before undo manager, matching process_user_input() pattern to prevent deadlocks
  2. Changing undo() to return owned Checkpoint instead of &Checkpoint, fixing a bug where repeated undos returned the same checkpoint reference instead of walking backwards through history
  3. Adding current_turn and current_messages parameters to redo() so it can save the current state before popping from redo stack
  4. Adding comprehensive tests to verify undo/redo stack invariants and state preservation

Pros

  • Critical bug fix: The deadlock prevention is well-reasoned and follows existing patterns in the codebase
  • Ownership fix: Returning owned Checkpoint from undo() eliminates the borrow/snapshot bug that prevented repeated undos
  • Better API design: The new redo() signature is more explicit about state management
  • Excellent test coverage: Added 3 new tests covering repeated undo, undo/redo cycles, and stack size consistency
  • Documentation: Added helpful docstrings explaining the stack behavior and invariants

Concerns

  • Missing error handling: No check if thread.restore_from_messages() succeeds/fails before returning success
  • Clone overhead: The undo() function now clones messages when saving to redo stack, but this is unavoidable with the API change
  • Test gap: No test for the deadlock scenario itself (would require concurrent test setup)
  • Migration note: The change from &Checkpoint to Checkpoint is a breaking API change for any external users of UndoManager

Suggestions

  • Consider returning Result<Checkpoint, ...> from undo() and propagate restoration errors
  • Add integration test with concurrent calls to verify deadlock is truly fixed
  • Document the breaking API change in upgrade notes if this is a public library

PR #66: Update README architecture diagram to use Unicode box drawing

Summary

This PR updates the README.md architecture diagram from ASCII art to Unicode box drawing characters. The change improves readability and visual appeal of the system architecture diagram, using characters like ╭───╮, , instead of +, -, |. The diagram structure and content remain unchanged.

Pros

  • Better aesthetics: Unicode box drawing creates cleaner, more professional diagrams
  • Improved readability: The visual distinction between components is clearer
  • Modern standard: Unicode box drawing is the standard for terminal-based diagrams in modern tools
  • Preserves structure: The logical diagram structure is unchanged

Concerns

  • Terminal compatibility: Some older terminals or Windows default fonts may not render Unicode box drawing correctly, showing replacement characters
  • Copy-paste issues: Unicode characters may cause issues in some markdown viewers or plain text contexts
  • Diff noise: This change makes future diff review harder since the diagram structure is visually the same but characters changed

Suggestions

  • Consider adding a note about terminal compatibility if users report issues
  • For CI environments, ensure the font/encoding is UTF-8 aware

PR #63: Memory Guardian and Cognitive Routines

Summary

This is a large PR (894 lines of new code) adding a comprehensive memory management system with two layers:

Cognitive Routines (Prompt-Level):

  • Pre-game routine (5-step checklist)
  • Checkpoint reminder templates
  • After-action review templates

Memory Guardian (System-Level):

  • Pre-compaction breadcrumbs (writes state snapshot before context loss)
  • Auto-breadcrumbs (every N tool calls, default 10)
  • Checkpoint gate (escalating pressure: gentle → firm → urgent at 12/20/30 calls)
  • Memory search tracking (reminds agent to search after 8 turns)

The PR also adds line number tracking to memory chunks for citation support (V9 migration), checkpoint tracker to Thread struct, and extensive testing (15 new tests).

Pros

  • Comprehensive documentation: Excellent module-level docs explaining the two-layer design rationale
  • Design rationale: Clear explanation of why both layers exist (instructions vs. enforcement)
  • Good defaults: Sensible thresholds (breadcrumb_interval=10, gate_thresholds=[12,20,30])
  • Defense in depth: Escalating pressure vs. hard blocking is a thoughtful UX choice
  • Sanitization functions: sanitize_checkpoint_text() prevents markdown/prompt injection
  • Extensive testing: 15 tests covering the full functionality

Concerns

  • Large PR: 894 lines makes review difficult; could be split into smaller PRs
  • Config not wired up: CognitiveConfig is defined but not loaded from configuration files or used to enable/disable features
  • Session lock pattern: Multiple session lock acquisitions in the agent loop could become a performance bottleneck with high tool call volume
  • Breadcrumb performance: Async I/O in the tool execution path for every Nth tool call adds latency
  • Missing integration tests: No tests verifying the guardian works end-to-end with compaction/reset
  • Tracker not persisted: CheckpointTracker is in-memory only, resets on session restart

Suggestions

  • Split into smaller PRs: (1) cognitive module structure, (2) guardian integration, (3) database migration
  • Wire up CognitiveConfig from config files/CLI flags so features can be toggled
  • Consider making breadcrumb writing non-blocking (spawn without await)
  • Document the performance impact of breadcrumb I/O
  • Add integration test verifying pre-compaction breadcrumb actually writes
  • Add metrics/logging for guardian actions (breadcrumb writes, gate triggers)

PR #62: Add Tinfoil private inference provider

Summary

This PR adds support for Tinfoil as a new LLM backend. The changes include:

  • Add Tinfoil variant to LlmBackend enum with string parsing
  • Add TinfoilConfig struct with api_key and model fields
  • Add create_tinfoil_provider() function that uses Rig's OpenAI client with https://inference.tinfoil.sh/v1 as base URL
  • Explicitly use completions_api() for Tinfoil (they don't support the newer Responses API)
  • Wire up in create_llm_provider() dispatch
  • Update setup wizard to include tinfoil field in config

Pros

  • Clean implementation: Follows existing provider patterns (Ollama, OpenAI-compatible)
  • Proper error handling: Returns LlmError::AuthFailed when tinfoil config is missing
  • Environment variable support: Reads TINFOIL_API_KEY and TINFOIL_MODEL from env with defaults
  • Good documentation: Note about why completions_api() is explicitly used

Concerns

  • Hardcoded base URL: TINFOIL_BASE_URL constant is not configurable via environment variable
  • No provider documentation: No usage examples or caveats in the main docs
  • Secret exposure: Uses expose_secret() on api_key when passing to Rig - this is the correct approach but worth verifying Rig doesn't log it
  • No health check: No verification that the Tinfoil endpoint works on startup

Suggestions

  • Make TINFOIL_BASE_URL configurable via environment variable
  • Add a brief section in the README about supported LLM providers including Tinfoil
  • Consider adding a quick health check or connection test on startup when tinfoil is configured

PR #61: Add PostgreSQL test workflow for CI

Summary

This PR adds a PostgreSQL with pgvector service to the GitHub Actions test workflow. The changes enable running tests against a real PostgreSQL database instead of libSQL-only testing. The service is configured with pgvector/pgvector:pg16 image and health checks.

Pros

  • Better test coverage: Enables testing against production database backend (PostgreSQL + pgvector)
  • Proper service configuration: Health checks, retries, and proper port mapping
  • Simple integration: Just adds the service and env var for DATABASE_URL

Concerns

  • No database fixtures: No test data setup or migrations for the test database
  • No test splitting: All tests still run against both backends; could be slow
  • Missing test matrix: Could benefit from matrix strategy testing libSQL and PostgreSQL in parallel
  • No cleanup between tests: Tests may interfere with each other's state
  • Flaky test risk: No fixture isolation or transaction rollback

Suggestions

  • Add a job matrix to test libSQL and PostgreSQL in parallel
  • Add migration step before running tests: cargo run -- database migrate
  • Consider wrapping tests in transactions with rollback for isolation
  • Add a separate integration test job vs. unit tests

PR #57: Sandbox job monitoring and credential grants persistence

Summary

This is a large PR (~6333 lines touched) introducing several features:

  1. Job Monitor System: New job_monitor.rs module spawns background tasks that watch for events from sandbox jobs (especially Claude Code) and inject assistant messages back into the agent loop via an injection channel in ChannelManager

  2. Credential Grants Persistence: Added credential_grants_json field to sandbox jobs for restoring credentials on job restart

  3. Claude Code OAuth Token Extraction: New logic to read OAuth tokens from macOS Keychain or Linux ~/.claude/.credentials.json for passing to containers via ANTHROPIC_API_KEY env var

  4. Database improvements: Changed LibSqlBackend::connect() to async and added PRAGMA busy_timeout = 5000 and PRAGMA journal_mode=WAL for better concurrency

  5. NEAR AI URL migration: Changed default from https://cloud-api.near.ai to https://private.near.ai

  6. Worker Docker image: Added GitHub CLI installation and improved layer caching

  7. Model command enhancement: /model now lists available models with the active one marked

Pros

  • Job monitoring is useful: Forwarding Claude Code output to the main agent enables better visibility into sub-agent work
  • Good test coverage: Job monitor has 4 unit tests for message forwarding, filtering, and lifecycle
  • Credential persistence: Restoring grants on job restart is a good UX improvement
  • Database concurrency: WAL mode and busy_timeout reduce lock contention in libSQL
  • OAuth extraction: Clever use of existing Claude auth rather than re-implementing auth

Concerns

  • Monolithic PR: Too many unrelated changes in one PR (job monitor, credentials, database, NEAR AI URLs, Docker, model command)
  • Cross-platform fragility: OAuth token extraction depends on macOS Keychain commands and Linux file paths - could break on different systems
  • Secret handling: OAuth token is passed via ANTHROPIC_API_KEY env var - this could be leaked in logs if the container or tool doesn't sanitize
  • Channel coupling: Job monitor couples Claude Code specifically to the agent loop - not generic for any job type
  • Database breaking change: Making connect() async is a breaking change for any code calling it synchronously
  • No rate limiting: Job monitor could spam the agent loop if the sub-agent produces many messages quickly
  • Missing test forOAuth: No tests for the token extraction logic

Suggestions

  • Split into smaller focused PRs: (1) job monitor, (2) credential grants, (3) database changes (4) Docker/URL migrations
  • Add rate limiting to job monitor (e.g., max 1 message/second)
  • Make job monitor generic (protocol-agnostic) instead of Claude-specific
  • Add tests for OAuth token extraction on both platforms (mocking the Keychain/file read)
  • Document the connect() async migration as a breaking change
  • Add warning logs when job monitor lags or drops messages
  • Verify ANTHROPIC_API_KEY doesn't leak in container logs

Summary

Highest Quality PRs

  1. PR fix: undo() peeks without popping, breaking repeated undo and leaking redo stack #71 - Well-scoped bug fix with excellent documentation and testing
  2. PR feat: add Tinfoil private inference provider #62 - Clean implementation following existing patterns

Needs Splitting

Minor Concerns

Recommendations

onlyamicrowave and others added 4 commits February 17, 2026 13:46
- Add line_start, line_end to MemoryChunk in libsql_backend
- Add path, line_start, line_end to RankedResult in FTS/vector queries
- Add line_start, line_end columns to libsql migrations schema
- Add get_opt_i64 helper for nullable integer columns
- Fix CognitiveConfig in tests with ..Default::default()
- Replace .unwrap() with safe patterns in chunker.rs
- Add #[cfg(feature = "postgres")] to pub use repository::Repository
- Add connection check to security tests (skip if Postgres unavailable)
- Document GeminiEmbeddings in CLAUDE.md config section
- Update feature parity to mention Gemini embeddings provider
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments