feat(memory): add citation support, line tracking, and cognitive routines by joe-rlo · Pull Request #63 · nearai/ironclaw

joe-rlo · 2026-02-13T15:10:08Z

Summary

This PR ports key memory system improvements from OpenClaw to IronClaw, focusing on better search citations, checkpointing, and agent cognitive routines.

Changes

1. Memory Citations with Line Numbers

Added ChunkWithPosition struct to track line numbers during chunking
New chunk_document_with_positions() function preserves source locations
MemoryChunk now includes line_start/line_end fields
Search results can now cite exact locations (e.g., path#L15-L23)

2. Cognitive Routines Module (`src/agent/cognitive.rs`)

CognitiveConfig - configurable checkpointing intervals and feature flags
CheckpointTracker - tracks exchanges, topics, and decisions for periodic checkpoints
pre_game_instructions() - 5-step checklist before task execution
after_action_template() - structured post-task review format
post_compaction_recovery() - recovery instructions after context loss
System prompt injection via cognitive_instructions() in Workspace

3. Security Hardening

Path traversal prevention: normalize_path() now rejects .. sequences and null bytes
Added validate_path() wrapper with proper InvalidPath error handling
All workspace file operations now validate paths before execution

Testing

31 new tests across 3 test files:
- tests/cognitive_integration.rs (9 tests)
- tests/chunker_integration.rs (14 tests)
- tests/security_integration.rs (8 tests)
Plus 4 inline unit tests for path validation
All new tests pass; 2 pre-existing test failures unrelated to this PR

Motivation

These changes address feature gaps identified when comparing IronClaw to OpenClaw:

OpenClaw agents can cite exact memory locations; IronClaw could not
OpenClaw has robust checkpointing to survive context compaction; IronClaw lacked this
Path traversal was a potential security concern in workspace operations

Breaking Changes

None. All changes are additive and backward compatible.

Happy to iterate on any feedback!

- Add 'path' field to RankedResult and SearchResult structs - Include document path in FTS and vector search queries - Add 'citation' field to memory_search tool output - Prepare for future line number tracking (line_start/line_end fields) Phase 1 of citation support - path-level citations work now, line-level citations need chunker changes (Phase 2).

- Add ChunkWithPosition struct with line_start/line_end - Add chunk_document_with_positions() function - Add line_start/line_end fields to MemoryChunk - Add citation() method for formatting 'line X' or 'lines X-Y' - Add comprehensive tests for position tracking Enables citations like 'Source: daily/2026-02-12.md#lines 15-23'

- Add CheckpointTracker for tracking exchanges and triggering checkpoints - Add pre_game_instructions() for task preparation checklist - Add after_action_template() for post-task reviews - Add post_compaction_recovery() for context loss recovery - Add CognitiveConfig for enabling/disabling features - Add write_checkpoint() helper for workspace integration Implements the cognitive architecture patterns from OpenClaw: - Pre-game routine before task execution - Mandatory checkpointing every N exchanges (default 15) - After-action review templates - Post-compaction recovery instructions

- Add cognitive_instructions() helper function - Include mandatory memory recall requirements in system prompt - Add pre-game routine checklist - Add checkpointing reminder Ensures the agent follows recall-first patterns and pre-game routines automatically via system prompt injection.

- Add GeminiEmbeddings struct with embedContent and batchEmbedContents support - Support gemini-embedding-001 (768 dims) and text-embedding-004 - Proper error handling for auth, rate limits, and API errors - Export from workspace module

- Add V9 migration for line_start/line_end columns - Add insert_chunk_with_lines() for storing line positions - Update FTS and vector search queries to return line numbers - Update RankedResult and SearchResult with line fields - Format citations as 'path#line N' or 'path#lines N-M' Citations now work end-to-end: 1. Chunker tracks line positions 2. DB stores line_start/line_end 3. Search returns line info 4. memory_search formats proper citations

- Add 'index' command for reindexing embeddings - Add --full flag to search for complete content - Add --from/--lines flags to read for line ranges - Show citations with line numbers in search results - Add has_embeddings() helper to Workspace CLI now supports: ironclaw memory search 'query' --limit 10 --full ironclaw memory read MEMORY.md --from 15 --lines 20 ironclaw memory index [--force]

- normalize_path() now rejects '..' sequences and null bytes - Added validate_path() wrapper with proper error handling - Added InvalidPath variant to WorkspaceError - All file ops (read/write/append/delete/list) now validate paths Security audit found: - [FIXED] Path traversal in workspace operations - [DOCUMENTED] Log injection in cognitive routines (caller responsibility) - [DOCUMENTED] SQL stored proc in repository (external validation) Tests: - 4 new unit tests for path validation - Created tests/security_integration.rs (8 tests) - Created tests/cognitive_integration.rs (9 tests) - Created tests/chunker_integration.rs (14 tests) - All 24 new tests pass

gemini-code-assist · 2026-02-13T15:10:35Z

Summary of Changes

Hello @joe-rlo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the agent's memory and operational capabilities. It introduces precise source citation for memory search results, enhances agent reliability through structured cognitive routines like checkpointing and pre-task planning, and strengthens the system's security by preventing malicious path manipulations. These changes aim to make the agent more robust, transparent, and capable of handling complex, long-running tasks.

Highlights

Memory Citation Support: Implemented line number tracking for memory chunks, allowing search results to cite exact source locations (e.g., path#L15-L23). This includes database schema updates, new chunking logic, and CLI/tool integration.
Agent Cognitive Routines: Added a new cognitive module to introduce structured agent behaviors such as pre-game checklists, periodic checkpointing to daily notes, after-action review templates, and post-compaction recovery instructions.
Security Hardening for Workspace Paths: Introduced robust path validation to prevent directory traversal attacks (e.g., ../) and null byte injection across all workspace file operations (read, write, append, delete, list).
Enhanced CLI Memory Commands: The memory search command now supports a --full flag to display complete content, and memory read allows specifying line ranges (--from, --lines). A new memory index command was added to backfill missing embeddings.
Gemini Embedding Provider: Integrated support for Google Gemini as a new embedding provider, offering an alternative to existing OpenAI and Near.ai options.

Changelog

migrations/V9__chunk_line_numbers.sql
- Added line_start and line_end integer columns to the memory_chunks table.
- Created an index idx_memory_chunks_lines for efficient line-based queries.
src/agent/cognitive.rs
- Introduced a new module for agent cognitive routines.
- Added CognitiveConfig for routine configuration, CheckpointTracker for session state management, and functions for pre-game instructions, after-action reviews, and post-compaction recovery.
src/agent/mod.rs
- Exported the new cognitive module and its public structs and functions.
src/cli/memory.rs
- Added a --full option to the memory search command to display full content.
- Introduced --from and --lines options to the memory read command for reading specific line ranges.
- Added a new memory index command with a --force option to reindex embeddings.
- Updated search and read command handlers to utilize the new options.
src/error.rs
- Added a new InvalidPath error variant to handle path validation failures.
src/tools/builtin/memory.rs
- Modified the memory_search tool to include path, citation, line_start, and line_end in its JSON output for search results.
src/workspace/chunker.rs
- Defined ChunkWithPosition struct to store chunk content along with its line and character positions.
- Implemented chunk_document_with_positions function to generate chunks while preserving line number information.
- Added a citation method to ChunkWithPosition for formatted line citations.
src/workspace/document.rs
- Added line_start and line_end fields to the MemoryChunk struct.
- Introduced a with_lines constructor for MemoryChunk to create chunks with line position data.
- Added a citation method to MemoryChunk for formatted line citations.
src/workspace/embeddings.rs
- Added GeminiEmbeddings struct and its implementation for the EmbeddingProvider trait, supporting Google Gemini embedding models.
src/workspace/mod.rs
- Exported ChunkWithPosition and chunk_document_with_positions from the chunker module.
- Exported GeminiEmbeddings from the embeddings module.
- Added has_embeddings method to Workspace to check for configured embedding providers.
- Integrated cognitive routine instructions into the system_prompt generation.
- Refactored path normalization into normalize_path (returning Option<String>) and introduced validate_path (returning Result<String, WorkspaceError>) to enforce security policies across all workspace file operations.
- Removed normalize_directory as its functionality is now covered by validate_path.
src/workspace/repository.rs
- Modified insert_chunk to call a new insert_chunk_with_lines method, which now accepts optional line_start and line_end parameters.
- Updated SQL queries for insert_chunk_with_lines to store line number information.
- Modified get_chunks_for_document, fts_search, and vector_search queries to retrieve path, line_start, and line_end from the database.
src/workspace/search.rs
- Added path, line_start, and line_end fields to SearchResult and RankedResult structs.
- Updated the reciprocal_rank_fusion function to propagate and store the new line number and path information in search results.
tests/chunker_integration.rs
- Added a new integration test file for chunk_document_with_positions and ChunkWithPosition citation formatting.
tests/cognitive_integration.rs
- Added a new integration test file for the cognitive module, covering CheckpointTracker workflow, instruction content, and configuration.
tests/security_integration.rs
- Added a new integration test file to verify path traversal and null byte injection prevention in workspace operations (read, write, delete, list).

Activity

No human activity has been recorded on this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces key features from OpenClaw to IronClaw, including memory citations with line number tracking, cognitive routines, and significant security hardening against path traversal. However, critical security issues were identified, specifically a potential prompt injection vector in the cognitive checkpointing logic and the insecure transmission of API keys in the Gemini embedding provider. Furthermore, the new citation feature is currently ineffective for new or modified documents because the reindex_document function incorrectly uses chunk_document instead of chunk_document_with_positions. Minor comments were also left regarding code efficiency, CLI usability (including a panic condition in line-range reading logic), and potential refactoring.

src/agent/cognitive.rs

src/workspace/embeddings.rs

src/agent/cognitive.rs

src/cli/memory.rs

src/tools/builtin/memory.rs

- fix(security): Move Gemini API key from URL query param to x-goog-api-key header - fix(security): Add sanitize_checkpoint_text() to prevent markdown/prompt injection - refactor: Add SearchResult::citation() method to centralize citation formatting - refactor: Improve add_topic() efficiency (avoid unnecessary String allocation) - cleanup: Remove unimplemented --force flag from memory index command Addresses all 5 issues from Gemini Code Assist review: nearai#63

joe-rlo · 2026-02-13T15:45:36Z

/gemini review

gemini-code-assist

Code Review

This is a great PR that adds significant new capabilities to the memory system, including citation support with line numbers, cognitive routines for the agent, and important security hardening against path traversal. The code is well-structured and includes a comprehensive set of new tests.

I have a couple of suggestions for improvement in the comments.

One critical point that is not in the changed files but is highly relevant to this PR's features: the reindex_document function in src/workspace/mod.rs does not seem to be updated to use the new chunk_document_with_positions function. This means that documents created or updated via write and append will not have line number information stored for their chunks, which undermines the new citation feature. This seems like a critical oversight that should be addressed for the new features to work as intended.

src/cli/memory.rs

tests/cognitive_integration.rs

Critical fix: reindex_document was using chunk_document() instead of chunk_document_with_positions(), meaning line numbers were never stored. Now properly tracks line_start/line_end for all chunks. Thanks to Gemini Code Assist for catching this oversight.

Replace verbose match block with map_or + iterator methods. More idiomatic Rust and easier to read.

Renamed test_checkpoint_content_escaping to test_checkpoint_content_sanitization. Now verifies specific sanitization transformations: - # escaped to \# - Newlines flattened to spaces - --- replaced with en-dashes - ``` replaced with '''

tribendu · 2026-02-15T00:57:39Z

Code Review: PR #63 - "feat(memory): add citation support, line tracking, and cognitive routines"

Summary

This PR introduces significant features to the IronClaw memory system:

Line tracking for citations - Adds line_start and line_end columns to memory chunks
Cognitive routines module - Implements checkpointing, pre-game routines, and after-action reviews
Position-aware chunking - New chunking algorithm that tracks character/line positions
Path traversal security - Prevents directory traversal attacks in file operations
Gemini embeddings provider - Adds Google Gemini API support for embeddings
CLI enhancements - Improves search/read commands and adds indexing

The PR adds ~1600 new lines across 12 files with comprehensive test coverage. The changes are well-structured and include proper migration scripts.

Pros

Excellent Test Coverage

Three new test files with 171+ test cases
Tests cover chunking positions, cognitive routines, and security scenarios
Integration tests demonstrate edge cases (unicode, empty input, CRLF line endings)
Security tests specifically target path traversal prevention

Good Security Practices

Path traversal prevention: normalize_path() rejects ../ patterns correctly
Null byte injection protection: Rejects \0 characters in paths
Input sanitization: sanitize_checkpoint_text() escapes markdown characters
Validates all path inputs before database operations

Clean Code Organization

New cognitive.rs module is well-documented with clear module docstrings
Separation of concerns: chunking, cognitive routines, and repository logic are distinct
Descriptive function names: chunk_document_with_positions, generate_checkpoint_content

Practical Features

Citation format is intuitive: path#line 42 or path#lines 10-25
Checkpointing is configurable with sensible defaults (15 exchanges interval)
Pre-game instructions are clear and actionable
CLI enhancements (--full, --from, --lines) improve usability

Database Design

Migration script V9 adds line_start, line_end columns properly
Creates composite index for line-based queries
Includes column comments for documentation
Optional fields handle backward compatibility gracefully

Rust Best Practices

Uses Option<u32> for nullable line numbers (appropriate for existing data)
Implements Default trait where appropriate
Serde serialization for config structs
Builder pattern for ChunkConfig

Concerns

1. Potential Bug in `char_to_line` Function (chunker.rs:157-161)

The binary search for line numbers may produce incorrect results:

let char_to_line = |pos: usize| -> u32 {
    match line_starts.binary_search(&pos) {
        Ok(i) => (i + 1) as u32,
        Err(i) => i as u32,
    }
};

Issue: This assumes character positions map cleanly to line boundaries. For character position 0 (start of file), it returns line 1 correctly. However, for any position that's not exactly at a newline character boundary, it returns the line after the position, not the line containing it. This could lead to citations that are off by 1 line.

Example: If line 1 is "hello\n" (6 chars), position 5 is within line 1 but char_to_line(5) returns 2.

Severity: Medium - affects citation accuracy

2. Word-Based Chunking May Break Context (chunker.rs:165-183)

The chunking algorithm splits documents by words for position tracking:

for (i, c) in content.char_indices() {
    if c.is_whitespace() {
        if in_word {
            words.push(WordPos { word: &content[word_start..i], ... });
            in_word = false;
        }
    } else if !in_word {
        word_start = i;
        in_word = true;
    }
}

Issue: This discards all whitespace when reconstructing chunks.

Newlines are lost
Indentation is removed
Markdown formatting may be affected (e.g., code blocks, lists)

Impact: Code blocks could lose proper indentation, lists might break, paragraph separation could be lost.

Severity: Medium to High - affects semantic content integrity

3. Security: Path Traversal May Be Incomplete (workspace/mod.rs)

While normalize_path() rejects ../, the delete operation still uses the old normalize_path():

// Line 421 - Delete doesn't use validate_path()
pub async fn delete(&self, path: &str) -> Result<(), WorkspaceError> {
    let path = normalize_path(path);  // Old function, not validated!
    self.storage.delete_document(...).await
}

All other methods use validate_path() which rejects dangerous paths, but delete() uses normalize_path() which returns None for bad paths.

Issue: normalize_path() now returns Option<String>, but delete doesn't handle the None case properly - it could panic if unwrap is added incorrectly elsewhere.

Severity: Medium - inconsistent handling could lead to issues

4. Empty Content Handling Inconsistent (chunker.rs)

The function returns empty vec for empty content early:

pub fn chunk_document_with_positions(content: &str, config: ChunkConfig) -> Vec<ChunkWithPosition> {
    if content.is_empty() {
        return Vec::new();
    }

But what happens when indexing documents? The caller in reindex_document() doesn't check for empty chunks - it just iterates and inserts. This could lead to no chunks being indexed for whitespace-only documents.

Severity: Low - edge case, but should be handled

5. Missing Backfill Implementation

The CLI adds an index command that calls workspace.backfill_embeddings(), but this function is not implemented in the workspace module:

// src/cli/memory.rs:295
let count = workspace.backfill_embeddings().await?;

Issue: This will fail to compile or panic at runtime.

Severity: Critical - code won't work

6. Citation Format Ambiguity

The citation format uses # separator:

path#line 42 - single line
path#lines 10-25 - range

Issue: If a document path contains # (which is valid), citation parsing becomes ambiguous. While current paths are simple .md files, this could be a problem in the future.

Severity: Low - edge case

7. No Concurrent Write Protection

When reindexing documents:

// Delete old chunks first
self.storage.delete_chunks(document_id).await?;

// Then insert new chunks
for (index, chunk) in chunks.into_iter().enumerate() {
    self.storage.insert_chunk_with_lines(...).await?;
}

Issue: If another process reads/searches the document between these two operations, they get either no results or stale results. There's no transaction or locking mechanism shown.

Severity: Medium - could cause inconsistent search results

8. Gemini Embeddings Error Handling

Rate limit handling extracts retry-after header:

if status == reqwest::StatusCode::TOO_MANY_REQUESTS {
    let retry_after = response.headers()
        .get("retry-after")
        .and_then(|v| v.to_str().ok())
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs);
    return Err(EmbeddingError::RateLimited { retry_after });
}

Issue: Gemini's rate limit response format may differ from the expected header name or format. Should verify actual API behavior.

Severity: Low - operational concern

9. Memory Leak in CheckpointTracker

The tracker accumulates topics and decisions:

pub struct CheckpointTracker {
    pub topics: Vec<String>,
    pub decisions: Vec<String>,
    // ...
}

Issue: If checkpointing_enabled is false but the tracker is instantiated, topics/decisions would grow unbounded during long conversations. The reset() clears them, but only when checkpointing occurs.

Severity: Low - memory could grow if checkpointing disabled

10. Test Coverage Gap

No tests for:

backfill_embeddings() function (not implemented)
Citation parsing/verification
Concurrent reindexing scenarios
Database migration rollback path
Large document chunking performance (1000+ lines)

Severity: Medium - missing edge case validation

Suggestions

High Priority

Fix word-based chunking to preserve structure

suggestion: Instead of word-based, consider line-based chunking
that preserves paragraph and code block structure.

Implement backfill_embeddings() in workspace.rs

pub async fn backfill_embeddings(&self) -> Result<usize, WorkspaceError> {
    let provider = self.embeddings.as_ref().ok_or_else(|| {
        WorkspaceError::EmbeddingError("No embedding provider configured".into())
    })?;

    // Query for chunks without embeddings
    let rows = self.storage.conn().await?.query(
        "SELECT id, content FROM memory_chunks WHERE embedding IS NULL",
        &[]
    ).await?;

    let mut count = 0;
    for row in rows {
        let id: Uuid = row.get("id");
        let content: String = row.get("content");

        if let Ok(embedding) = provider.embed(&content).await {
            self.storage.update_chunk_embedding(id, &embedding).await?;
            count += 1;
        }
    }

    Ok(count)
}

Fix char_to_line boundary logic

suggestion: Use position that IS on a line, not just after a newline.
Consider using binary search to find the greatest line_start <= position.

Unify path validation - update delete()

pub async fn delete(&self, path: &str) -> Result<(), WorkspaceError> {
    let path = validate_path(path)?;  // Use validated path
    // ...
}

Medium Priority

Add transactions for reindexing

suggestion: Wrap delete + insert in a transaction to avoid
inconsistent state during concurrent reads.

Consider using a different citation separator

suggestion: Use `path@lines 10-25` or `path:L10-15` instead of `#`
to avoid ambiguity with paths containing hash characters.

Add bounds checking to CheckpointTracker

suggestion: Add max limits for topics/decisions to prevent
unbounded growth if checkpointing is disabled.

Preserve leading whitespace in chunks

suggestion: Store indentation for each line to preserve
code block formatting and markdown list structures.

Low Priority

Add integration test for concurrent operations

suggestion: Test that search works correctly while a document
is being reindexed.

Document API rate limit behavior

suggestion: Add doc comments about what happens when Gemini
rate limits are hit and how callers should retry.

Add config option for citation format

suggestion: Make citation format configurable for different
output styles (GitHub links, file paths, etc.).

Consider adding migration rollback script

suggestion: Add V9__rollback.sql script for testing and
emergency rollbacks.

Overall Assessment

Strengths: Well-structured, comprehensive test coverage, good security focus, practical features.

Critical Issues: Missing backfill_embeddings() implementation (won't compile), potential semantic loss in word-based chunking.

Recommendation: Request Changes - Address the critical issues before merging. The implementation is solid overall but has a blocking bug (missing function) and a concerning semantic issue (whitespace removal).

After fixes: This will be a valuable addition to IronClaw's memory system with proper citation support and cognitive routines.

Additional Notes

The cognitive routines feature is well-designed and follows good practices:

Clear separation between Preparation and Execution modes
Checkpoint sanitization prevents markdown injection in system prompts
Configurable defaults allow user customization

The security improvements are thoughtful:

Path traversal prevention is comprehensive
Null byte protection is proper
Error messages don't leak information

The Gemini embeddings implementation is complete with proper error handling and both single/batch operations supported.

Memory Guardian adds automatic memory discipline that doesn't rely on the agent choosing to follow instructions. It operates at the system level, writing breadcrumbs and injecting checkpoint pressure regardless of agent attention state. ## New Features (cognitive.rs) - Pre-compaction breadcrumbs: Raw state snapshot written synchronously before context compression. Acts as a 'black box recorder' — survives even when the LLM-generated compaction summary is lossy or incomplete. - Auto-breadcrumbs: Every 10 tool calls, automatically append a breadcrumb to daily notes with recent tool names and call counts. Creates a paper trail even if the agent never explicitly writes a checkpoint. - Checkpoint gate: Escalating pressure injected into the system prompt when too many tool calls happen without a checkpoint write: - Gentle (12 calls): 'Consider checkpointing' - Firm (20 calls): 'Checkpoint needed' - Urgent (30 calls): 'STOP and checkpoint NOW' Uses escalation (not hard blocking) because hard blocks would break legitimate multi-step workflows. - Memory search tracking: Counts turns since last memory_search call, reminds agent to search before answering memory questions (default: 8). - Tool calls as metric (not just exchanges): Tool calls are a better proxy for 'work that could be lost' — an agent might have 3 exchanges but 30 tool calls of file reads, shell commands, and code writes. ## Design Decisions - Escalating vs hard blocking: Escalating pressure works with the agent's attention mechanism. Urgent-level messages are emphatic enough to interrupt most workflows without breaking them. - Breadcrumbs don't reset checkpoint counter: If they did, the agent would never feel pressure to write its own checkpoints — the guardian would always relieve the pressure first. - CheckpointTracker on Thread (not Session): Each conversation thread has independent memory state. Not serialized — resets on session reload, which is fine since it's a running tally for the current conversation. ## Integration Points (agent_loop.rs) - Pre-compaction breadcrumb before auto-compaction and manual /compact - Pre-reset breadcrumb before /new thread creation - Tool call tracking after every tool execution - Guardian context (gate + search reminder) injected into system prompt - Exchange tracking on turn start ## Also Fixed - workspace/mod.rs: Fixed normalize_path Option<String> handling, added insert_chunk_with_lines to WorkspaceStorage enum, replaced missing normalize_directory with normalize_path - cli/memory.rs: Fixed missing pattern fields and arguments for Search (full) and Read (from, lines) commands, added Index arm ## Tests 14 new unit tests covering: - Tool call tracking and breadcrumb triggering - Checkpoint detection (writes to daily/*.md) - Checkpoint gate escalation at 12/20/30 thresholds - Memory search reminder timing - Rolling tool window (last 10) - Breadcrumb content generation - Counter reset behavior (breadcrumb vs checkpoint) - Config defaults

ilblackdragon

Review: feat(memory): add citation support, line tracking, and cognitive routines

This is a well-designed PR that brings valuable features (citations, cognitive routines, path traversal prevention). The architecture of the memory guardian with its dual-layer approach (prompt-level guidance + system-level enforcement) is thoughtful and well-documented. However, there are several issues that need to be addressed before merging.

Blocking: Compilation Failure (libSQL Backend)

The PR does not compile. cargo check fails with 3 errors in src/db/libsql_backend.rs:

error[E0063]: missing fields `line_end` and `line_start` in initializer of `MemoryChunk`
    --> src/db/libsql_backend.rs:2403:25

error[E0063]: missing fields `line_end`, `line_start` and `path` in initializer of `RankedResult`
    --> src/db/libsql_backend.rs:2460:30

error[E0063]: missing fields `line_end`, `line_start` and `path` in initializer of `RankedResult`
    --> src/db/libsql_backend.rs:2509:30

The MemoryChunk and RankedResult structs gained new fields (line_start, line_end, path), but the libSQL backend was not updated. Per CLAUDE.md: "All new features that touch persistence MUST support both backends." Additionally, src/db/libsql_migrations.rs needs the line_start and line_end columns added to the memory_chunks table schema.

Fixes needed:

Add line_start: None, line_end: None to the MemoryChunk construction at line 2403
Add path: String::new(), line_start: None, line_end: None to the RankedResult constructions at lines 2460 and 2509 (ideally, join memory_documents to fetch the actual path, same as the postgres repository does)
Update the FTS and vector search SQL queries in libsql_backend.rs to SELECT d.path, c.line_start, c.line_end
Add line_start and line_end columns to the memory_chunks CREATE TABLE in libsql_migrations.rs

Blocking: Integration Test Won't Compile

tests/cognitive_integration.rs line 115 constructs CognitiveConfig with only 4 of 8 fields:

let config = CognitiveConfig {
    pre_game_enabled: false,
    checkpointing_enabled: true,
    checkpoint_interval: 20,
    after_action_enabled: true,
};

Missing: guardian_enabled, breadcrumb_interval, gate_thresholds, memory_search_reminder_interval. This won't compile. Either add the missing fields or use ..Default::default().

`.unwrap()` in Production Code

Per CLAUDE.md: "Never use .unwrap() or .expect() in production code."

In src/workspace/chunker.rs, inside chunk_document_with_positions:

let last: ChunkWithPosition = chunks.pop().unwrap();

While this is logically safe (guarded by !chunks.is_empty()), it violates the project convention. Suggestion:

let Some(last) = chunks.pop() else { break; };

Feature Gate Issue

The PR removes #[cfg(feature = "postgres")] from pub use repository::Repository (in src/workspace/mod.rs) but the mod repository declaration still has the feature gate. With default features ["postgres", "libsql"] this works, but building with --no-default-features --features libsql would fail because mod repository is not compiled while pub use repository::Repository is unconditional. Either:

Keep the #[cfg(feature = "postgres")] on the pub use, or
Also remove it from mod repository (if Repository should be available for all backends)

Minor Issues

Trailing whitespace: Multiple files have trailing whitespace on blank lines (e.g., src/cli/memory.rs lines with trailing spaces). Run cargo fmt to clean up.
super:: in tests: src/agent/cognitive.rs uses use super::*; in the test module (line 855 of the diff). Per project conventions, crate:: imports are preferred over super::. However, use super::* in mod tests within the same file is a common Rust pattern and could be considered an exception here.
Security test dependency on PostgreSQL: All tests in tests/security_integration.rs require a PostgreSQL connection (deadpool_postgres::Pool). These tests will fail in CI environments without a running Postgres instance. Consider either:
- Gating them behind the integration feature
- Adding a connection check that skips the test if Postgres is unavailable
GeminiEmbeddings added but not documented: The PR adds a full GeminiEmbeddings provider (~240 lines) which is not mentioned in the PR description. This is a separate feature that could be its own PR. It also doesn't appear in the configuration docs or CLAUDE.md.

Positive Notes

Excellent documentation throughout cognitive.rs -- the "Why" sections for design decisions are exactly the right level of documentation
Path traversal prevention is well-implemented with both .. and null byte checks
The escalating checkpoint gate (gentle/firm/urgent) is a smart design choice
Comprehensive test coverage: 31 new tests covering the core logic
The sanitization of checkpoint text to prevent markdown/prompt injection is a nice security touch
Clean separation between prompt-level guidance and system-level enforcement

Please fix the compilation failures and I'm happy to approve.

tribendu · 2026-02-17T13:19:13Z

GitHub PR Review Batch 3 - nearai/ironclaw

PR #71: Fix undo/redo deadlock and checkpoint behavior

Summary

This PR addresses critical concurrency issues in the undo/redo system. The main changes are:

Reordering lock acquisitions in undo_turn() and redo_turn() to lock session before undo manager, matching process_user_input() pattern to prevent deadlocks
Changing undo() to return owned Checkpoint instead of &Checkpoint, fixing a bug where repeated undos returned the same checkpoint reference instead of walking backwards through history
Adding current_turn and current_messages parameters to redo() so it can save the current state before popping from redo stack
Adding comprehensive tests to verify undo/redo stack invariants and state preservation

Pros

Critical bug fix: The deadlock prevention is well-reasoned and follows existing patterns in the codebase
Ownership fix: Returning owned Checkpoint from undo() eliminates the borrow/snapshot bug that prevented repeated undos
Better API design: The new redo() signature is more explicit about state management
Excellent test coverage: Added 3 new tests covering repeated undo, undo/redo cycles, and stack size consistency
Documentation: Added helpful docstrings explaining the stack behavior and invariants

Concerns

Missing error handling: No check if thread.restore_from_messages() succeeds/fails before returning success
Clone overhead: The undo() function now clones messages when saving to redo stack, but this is unavoidable with the API change
Test gap: No test for the deadlock scenario itself (would require concurrent test setup)
Migration note: The change from &Checkpoint to Checkpoint is a breaking API change for any external users of UndoManager

Suggestions

Consider returning Result<Checkpoint, ...> from undo() and propagate restoration errors
Add integration test with concurrent calls to verify deadlock is truly fixed
Document the breaking API change in upgrade notes if this is a public library

PR #66: Update README architecture diagram to use Unicode box drawing

Summary

This PR updates the README.md architecture diagram from ASCII art to Unicode box drawing characters. The change improves readability and visual appeal of the system architecture diagram, using characters like ╭───╮, │, ● instead of +, -, |. The diagram structure and content remain unchanged.

Pros

Better aesthetics: Unicode box drawing creates cleaner, more professional diagrams
Improved readability: The visual distinction between components is clearer
Modern standard: Unicode box drawing is the standard for terminal-based diagrams in modern tools
Preserves structure: The logical diagram structure is unchanged

Concerns

Terminal compatibility: Some older terminals or Windows default fonts may not render Unicode box drawing correctly, showing replacement characters
Copy-paste issues: Unicode characters may cause issues in some markdown viewers or plain text contexts
Diff noise: This change makes future diff review harder since the diagram structure is visually the same but characters changed

Suggestions

Consider adding a note about terminal compatibility if users report issues
For CI environments, ensure the font/encoding is UTF-8 aware

PR #63: Memory Guardian and Cognitive Routines

Summary

This is a large PR (894 lines of new code) adding a comprehensive memory management system with two layers:

Cognitive Routines (Prompt-Level):

Pre-game routine (5-step checklist)
Checkpoint reminder templates
After-action review templates

Memory Guardian (System-Level):

Pre-compaction breadcrumbs (writes state snapshot before context loss)
Auto-breadcrumbs (every N tool calls, default 10)
Checkpoint gate (escalating pressure: gentle → firm → urgent at 12/20/30 calls)
Memory search tracking (reminds agent to search after 8 turns)

The PR also adds line number tracking to memory chunks for citation support (V9 migration), checkpoint tracker to Thread struct, and extensive testing (15 new tests).

Pros

Comprehensive documentation: Excellent module-level docs explaining the two-layer design rationale
Design rationale: Clear explanation of why both layers exist (instructions vs. enforcement)
Good defaults: Sensible thresholds (breadcrumb_interval=10, gate_thresholds=[12,20,30])
Defense in depth: Escalating pressure vs. hard blocking is a thoughtful UX choice
Sanitization functions: sanitize_checkpoint_text() prevents markdown/prompt injection
Extensive testing: 15 tests covering the full functionality

Concerns

Large PR: 894 lines makes review difficult; could be split into smaller PRs
Config not wired up: CognitiveConfig is defined but not loaded from configuration files or used to enable/disable features
Session lock pattern: Multiple session lock acquisitions in the agent loop could become a performance bottleneck with high tool call volume
Breadcrumb performance: Async I/O in the tool execution path for every Nth tool call adds latency
Missing integration tests: No tests verifying the guardian works end-to-end with compaction/reset
Tracker not persisted: CheckpointTracker is in-memory only, resets on session restart

Suggestions

Split into smaller PRs: (1) cognitive module structure, (2) guardian integration, (3) database migration
Wire up CognitiveConfig from config files/CLI flags so features can be toggled
Consider making breadcrumb writing non-blocking (spawn without await)
Document the performance impact of breadcrumb I/O
Add integration test verifying pre-compaction breadcrumb actually writes
Add metrics/logging for guardian actions (breadcrumb writes, gate triggers)

PR #62: Add Tinfoil private inference provider

Summary

This PR adds support for Tinfoil as a new LLM backend. The changes include:

Add Tinfoil variant to LlmBackend enum with string parsing
Add TinfoilConfig struct with api_key and model fields
Add create_tinfoil_provider() function that uses Rig's OpenAI client with https://inference.tinfoil.sh/v1 as base URL
Explicitly use completions_api() for Tinfoil (they don't support the newer Responses API)
Wire up in create_llm_provider() dispatch
Update setup wizard to include tinfoil field in config

Pros

Clean implementation: Follows existing provider patterns (Ollama, OpenAI-compatible)
Proper error handling: Returns LlmError::AuthFailed when tinfoil config is missing
Environment variable support: Reads TINFOIL_API_KEY and TINFOIL_MODEL from env with defaults
Good documentation: Note about why completions_api() is explicitly used

Concerns

Hardcoded base URL: TINFOIL_BASE_URL constant is not configurable via environment variable
No provider documentation: No usage examples or caveats in the main docs
Secret exposure: Uses expose_secret() on api_key when passing to Rig - this is the correct approach but worth verifying Rig doesn't log it
No health check: No verification that the Tinfoil endpoint works on startup

Suggestions

Make TINFOIL_BASE_URL configurable via environment variable
Add a brief section in the README about supported LLM providers including Tinfoil
Consider adding a quick health check or connection test on startup when tinfoil is configured

PR #61: Add PostgreSQL test workflow for CI

Summary

This PR adds a PostgreSQL with pgvector service to the GitHub Actions test workflow. The changes enable running tests against a real PostgreSQL database instead of libSQL-only testing. The service is configured with pgvector/pgvector:pg16 image and health checks.

Pros

Better test coverage: Enables testing against production database backend (PostgreSQL + pgvector)
Proper service configuration: Health checks, retries, and proper port mapping
Simple integration: Just adds the service and env var for DATABASE_URL

Concerns

No database fixtures: No test data setup or migrations for the test database
No test splitting: All tests still run against both backends; could be slow
Missing test matrix: Could benefit from matrix strategy testing libSQL and PostgreSQL in parallel
No cleanup between tests: Tests may interfere with each other's state
Flaky test risk: No fixture isolation or transaction rollback

Suggestions

Add a job matrix to test libSQL and PostgreSQL in parallel
Add migration step before running tests: cargo run -- database migrate
Consider wrapping tests in transactions with rollback for isolation
Add a separate integration test job vs. unit tests

PR #57: Sandbox job monitoring and credential grants persistence

Summary

This is a large PR (~6333 lines touched) introducing several features:

Job Monitor System: New job_monitor.rs module spawns background tasks that watch for events from sandbox jobs (especially Claude Code) and inject assistant messages back into the agent loop via an injection channel in ChannelManager
Credential Grants Persistence: Added credential_grants_json field to sandbox jobs for restoring credentials on job restart
Claude Code OAuth Token Extraction: New logic to read OAuth tokens from macOS Keychain or Linux ~/.claude/.credentials.json for passing to containers via ANTHROPIC_API_KEY env var
Database improvements: Changed LibSqlBackend::connect() to async and added PRAGMA busy_timeout = 5000 and PRAGMA journal_mode=WAL for better concurrency
NEAR AI URL migration: Changed default from https://cloud-api.near.ai to https://private.near.ai
Worker Docker image: Added GitHub CLI installation and improved layer caching
Model command enhancement: /model now lists available models with the active one marked

Pros

Job monitoring is useful: Forwarding Claude Code output to the main agent enables better visibility into sub-agent work
Good test coverage: Job monitor has 4 unit tests for message forwarding, filtering, and lifecycle
Credential persistence: Restoring grants on job restart is a good UX improvement
Database concurrency: WAL mode and busy_timeout reduce lock contention in libSQL
OAuth extraction: Clever use of existing Claude auth rather than re-implementing auth

Concerns

Monolithic PR: Too many unrelated changes in one PR (job monitor, credentials, database, NEAR AI URLs, Docker, model command)
Cross-platform fragility: OAuth token extraction depends on macOS Keychain commands and Linux file paths - could break on different systems
Secret handling: OAuth token is passed via ANTHROPIC_API_KEY env var - this could be leaked in logs if the container or tool doesn't sanitize
Channel coupling: Job monitor couples Claude Code specifically to the agent loop - not generic for any job type
Database breaking change: Making connect() async is a breaking change for any code calling it synchronously
No rate limiting: Job monitor could spam the agent loop if the sub-agent produces many messages quickly
Missing test forOAuth: No tests for the token extraction logic

Suggestions

Split into smaller focused PRs: (1) job monitor, (2) credential grants, (3) database changes (4) Docker/URL migrations
Add rate limiting to job monitor (e.g., max 1 message/second)
Make job monitor generic (protocol-agnostic) instead of Claude-specific
Add tests for OAuth token extraction on both platforms (mocking the Keychain/file read)
Document the connect() async migration as a breaking change
Add warning logs when job monitor lags or drops messages
Verify ANTHROPIC_API_KEY doesn't leak in container logs

Summary

- Add line_start, line_end to MemoryChunk in libsql_backend - Add path, line_start, line_end to RankedResult in FTS/vector queries - Add line_start, line_end columns to libsql migrations schema - Add get_opt_i64 helper for nullable integer columns - Fix CognitiveConfig in tests with ..Default::default() - Replace .unwrap() with safe patterns in chunker.rs - Add #[cfg(feature = "postgres")] to pub use repository::Repository

- Add connection check to security tests (skip if Postgres unavailable) - Document GeminiEmbeddings in CLAUDE.md config section - Update feature parity to mention Gemini embeddings provider

onlyamicrowave added 8 commits February 13, 2026 04:54

feat(embeddings): add Gemini embedding provider

22eda69

- Add GeminiEmbeddings struct with embedContent and batchEmbedContents support - Support gemini-embedding-001 (768 dims) and text-embedding-004 - Proper error handling for auth, rate limits, and API errors - Export from workspace module

gemini-code-assist bot reviewed Feb 13, 2026

View reviewed changes

src/agent/cognitive.rs Show resolved Hide resolved

src/workspace/embeddings.rs Show resolved Hide resolved

src/agent/cognitive.rs Show resolved Hide resolved

src/cli/memory.rs Outdated Show resolved Hide resolved

src/tools/builtin/memory.rs Outdated Show resolved Hide resolved

gemini-code-assist bot reviewed Feb 13, 2026

View reviewed changes

src/cli/memory.rs Outdated Show resolved Hide resolved

tests/cognitive_integration.rs Outdated Show resolved Hide resolved

onlyamicrowave added 3 commits February 13, 2026 15:55

refactor: simplify read() line range logic with skip/take

5ec9c41

Replace verbose match block with map_or + iterator methods. More idiomatic Rust and easier to read.

joe-rlo marked this pull request as ready for review February 13, 2026 16:02

ilblackdragon mentioned this pull request Feb 14, 2026

feat: Memory & knowledge system enhancements (local embeddings, batching, citation, memory CLI) #87

Open

8 tasks

ilblackdragon and others added 2 commits February 13, 2026 22:17

Merge branch 'main' into microwave/memory-improvements

d701265

Merge branch 'main' into microwave/memory-improvements

349e5bd

joe-rlo and others added 4 commits February 15, 2026 03:07

Merge branch 'main' into microwave/memory-improvements

15f0a6d

Merge branch 'main' into microwave/memory-improvements

59be7c3

Merge branch 'main' into microwave/memory-improvements

9974efd

ilblackdragon requested changes Feb 17, 2026

View reviewed changes

tribendu mentioned this pull request Feb 17, 2026

fix: undo() peeks without popping, breaking repeated undo and leaking redo stack #71

Merged

5 tasks

This was referenced Feb 17, 2026

Revise architecture diagram for better representation #66

Open

feat: add Tinfoil private inference provider #62

Merged

This was referenced Feb 17, 2026

fix: Add PostgreSQL service to CI workflow for integration tests #61

Open

feat: Add PR review tools, job monitor, and channel injection for E2E sandbox workflows #57

Merged

onlyamicrowave and others added 4 commits February 17, 2026 13:46

fix: address remaining PR nearai#63 feedback

2c579cd

- Add connection check to security tests (skip if Postgres unavailable) - Document GeminiEmbeddings in CLAUDE.md config section - Update feature parity to mention Gemini embeddings provider

chore: remove trailing blank lines from Cargo.toml files

ef2d439

Merge branch 'main' into microwave/memory-improvements

470c2cd

joe-rlo requested a review from ilblackdragon February 17, 2026 14:03

ilblackdragon and others added 3 commits February 17, 2026 10:16

Merge branch 'main' into microwave/memory-improvements

62ab348

Merge branch 'main' into microwave/memory-improvements

61b52c5

Merge branch 'main' into microwave/memory-improvements

b930086

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(memory): add citation support, line tracking, and cognitive routines#63

feat(memory): add citation support, line tracking, and cognitive routines#63
joe-rlo wants to merge 25 commits intonearai:mainfrom
joe-rlo:microwave/memory-improvements

joe-rlo commented Feb 13, 2026

Uh oh!

gemini-code-assist bot commented Feb 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

joe-rlo commented Feb 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

tribendu commented Feb 15, 2026

Uh oh!

ilblackdragon left a comment

Uh oh!

tribendu commented Feb 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Comments

Conversation

joe-rlo commented Feb 13, 2026

Summary

Changes

1. Memory Citations with Line Numbers

2. Cognitive Routines Module (src/agent/cognitive.rs)

3. Security Hardening

Testing

Motivation

Breaking Changes

Uh oh!

gemini-code-assist bot commented Feb 13, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

joe-rlo commented Feb 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

tribendu commented Feb 15, 2026

Code Review: PR #63 - "feat(memory): add citation support, line tracking, and cognitive routines"

Summary

Pros

Excellent Test Coverage

Good Security Practices

Clean Code Organization

Practical Features

Database Design

Rust Best Practices

Concerns

1. Potential Bug in char_to_line Function (chunker.rs:157-161)

2. Word-Based Chunking May Break Context (chunker.rs:165-183)

3. Security: Path Traversal May Be Incomplete (workspace/mod.rs)

4. Empty Content Handling Inconsistent (chunker.rs)

5. Missing Backfill Implementation

6. Citation Format Ambiguity

7. No Concurrent Write Protection

8. Gemini Embeddings Error Handling

9. Memory Leak in CheckpointTracker

10. Test Coverage Gap

Suggestions

High Priority

Medium Priority

Low Priority

Overall Assessment

Additional Notes

Uh oh!

ilblackdragon left a comment

Choose a reason for hiding this comment

Review: feat(memory): add citation support, line tracking, and cognitive routines

Blocking: Compilation Failure (libSQL Backend)

Blocking: Integration Test Won't Compile

.unwrap() in Production Code

Feature Gate Issue

Minor Issues

Positive Notes

Uh oh!

tribendu commented Feb 17, 2026

GitHub PR Review Batch 3 - nearai/ironclaw

PR #71: Fix undo/redo deadlock and checkpoint behavior

Summary

Pros

Concerns

Suggestions

PR #66: Update README architecture diagram to use Unicode box drawing

Summary

2. Cognitive Routines Module (`src/agent/cognitive.rs`)

1. Potential Bug in `char_to_line` Function (chunker.rs:157-161)

`.unwrap()` in Production Code