Add intelligent tool response optimization with call_tool integration#267

Open
aponcedeleonch wants to merge 3 commits into main from call-tool-optimized

Conversation

@aponcedeleonch
Member

This adds a response optimization system that intelligently compresses large tool responses while preserving task-relevant information. The system integrates with call_tool to automatically optimize responses that exceed token thresholds.

Response Optimizer Features:

  • Content type classification (JSON, Markdown, unstructured text)
  • Structure-aware traversal using breadth-first strategy
  • LLMLingua-2 token-level summarization with ONNX model
  • Query hints for retrieving specific parts of original responses
  • KV store for temporary storage of original responses (TTL-based expiration)

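The classification step above can be sketched with simple heuristics. This is an illustrative approximation, not the PR's actual classifier; the function name and rules are invented:

```python
import json
import re

def classify_content(text: str) -> str:
    """Guess a tool response's content type (illustrative heuristics only)."""
    # Valid JSON wins outright.
    try:
        json.loads(text)
        return "json"
    except (json.JSONDecodeError, TypeError):
        pass
    # Markdown heuristic: any ATX heading line like "# Title".
    if re.search(r"^#{1,6}\s+\S", text, re.MULTILINE):
        return "markdown"
    return "text"
```

A real classifier would likely use more signals (lists, tables, fenced blocks), but the three-way split is the part that drives traverser selection.
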
New MCP Tool:

  • search_in_tool_response: Query stored responses using JQ (JSON), section headers (Markdown), or shell commands (text)

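A minimal dispatcher showing the intent of the per-type query modes. The dotted-path JSON lookup stands in for a real JQ filter, and every name here is hypothetical, not the tool's implementation:

```python
import json
import re

def search_in_response(content: str, content_type: str, query: str) -> str:
    """Hypothetical sketch of search_in_tool_response dispatch semantics."""
    if content_type == "json":
        # Dotted-path lookup as a stand-in for a JQ filter like ".user.name".
        value = json.loads(content)
        for key in query.lstrip(".").split("."):
            value = value[key]
        return json.dumps(value)
    if content_type == "markdown":
        # Return the body of the section under the matching header.
        pattern = rf"^#+\s*{re.escape(query)}\s*$(.*?)(?=^#+\s|\Z)"
        match = re.search(pattern, content, re.MULTILINE | re.DOTALL)
        return match.group(1).strip() if match else ""
    raise ValueError(f"unsupported content type: {content_type}")
```

The shell-command mode for unstructured text is omitted here since it raises the sanitization questions flagged later in the review.
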
Database:

  • Added tool_responses table for KV store with session-based grouping
  • Indexed by session_key, expires_at, and tool_name

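The table and indexes described above might look roughly like this in SQLite; the column names are assumptions, not the migration's exact schema:

```python
import sqlite3

# Hypothetical DDL approximating the tool_responses KV table.
SCHEMA = """
CREATE TABLE tool_responses (
    id          TEXT PRIMARY KEY,
    session_key TEXT NOT NULL,
    tool_name   TEXT NOT NULL,
    content     TEXT NOT NULL,
    expires_at  TIMESTAMP NOT NULL
);
CREATE INDEX ix_tool_responses_session_key ON tool_responses (session_key);
CREATE INDEX ix_tool_responses_expires_at  ON tool_responses (expires_at);
CREATE INDEX ix_tool_responses_tool_name   ON tool_responses (tool_name);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The expires_at index is what keeps TTL sweeps cheap; session_key supports the session-based grouping.
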
Configuration:

  • RESPONSE_OPTIMIZER_ENABLED: Enable intelligent optimization (default: false)
  • RESPONSE_OPTIMIZER_THRESHOLD: Token threshold for optimization (default: 1000)
  • RESPONSE_KV_TTL: TTL for stored responses in seconds (default: 300)
  • RESPONSE_HEAD_LINES/RESPONSE_TAIL_LINES: Lines preserved for unstructured text (default: 20)
  • LLMLINGUA_MODEL_PATH: Path to ONNX model directory (optional, see README)

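Put together, a hypothetical environment fragment using the flags above (values are the documented defaults):

```shell
# Enable intelligent optimization (off by default).
export RESPONSE_OPTIMIZER_ENABLED=true
export RESPONSE_OPTIMIZER_THRESHOLD=1000   # tokens
export RESPONSE_KV_TTL=300                 # seconds
export RESPONSE_HEAD_LINES=20
export RESPONSE_TAIL_LINES=20
# Optional; see examples/call_tool_optimizer/README.md for export instructions.
# export LLMLINGUA_MODEL_PATH=/path/to/onnx-model
```
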
AppWorld Experiment:

  • Example implementation using Pydantic AI agent with find_tool, call_tool, and search_in_tool_response
  • Task commands for running AppWorld experiments with resume capability
  • Measures task completion rates and response optimization effectiveness

Note: ONNX model files excluded from git (too large). See examples/call_tool_optimizer/README.md for export instructions.

@claude

claude bot commented Jan 15, 2026

Code Review

Summary

This PR adds an intelligent response optimization system with call_tool integration. The implementation is well-structured and follows best practices. Below are specific feedback items:


🔍 Code Quality & Best Practices

Strong Points:

  • Clean separation of concerns with modular architecture (classifier, traversers, summarizers, query_executor)
  • Good use of Pydantic for validation and type safety
  • Async/await properly implemented throughout
  • Follows repository patterns from existing code

Issues:

  1. Verbose fallback in LLMLingua summarizer (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:235-244)

    • Simple character-based truncation could be extracted to a utility function to reduce code duplication
    • Consider consolidating with the base summarizer's fallback logic
  2. Repetitive error handling in tool_response_ops.py

    • Multiple datetime parsing operations could use a shared helper method
    • A helper would cut the code from ~40 lines to ~25 and improve maintainability
  3. Migration date format inconsistency (migrations/versions/2025_08_18_0743-d2977d4c8c53_create_initial_tables.py)

    • Migration filename shows August 2025 (future date: 2025_08_18)
    • Should reflect actual creation date or use standardized naming

🐛 Potential Bugs

  1. Race condition in cleanup (src/mcp_optimizer/db/tool_response_ops.py:191-222)

    • cleanup_expired() count query + delete query pattern not atomic
    • If records expire between count and delete, count will be incorrect (minor impact)
    • Consider using RETURNING clause or accept slight inaccuracy
  2. Unsafe np.amax typing workaround (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:133)

    • Comment says "work around numpy typing limitations"
    • Using Any bypasses type checking - could mask errors
    • Suggest typing logits_max as np.ndarray if compatible
  3. Missing session_key validation in search_in_tool_response

    • Tool allows querying any response_id without session_key validation
    • Could expose responses across sessions if IDs are guessable
    • Consider adding session_key parameter for security

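The count-then-delete race can be closed by relying on the DELETE statement's own affected-row count (or DELETE ... RETURNING on SQLite >= 3.35). A portable sketch, with the table name taken from the PR and everything else assumed:

```python
import sqlite3

def cleanup_expired(conn: sqlite3.Connection, now: float) -> int:
    """Delete expired rows and count them in one statement.

    Cursor.rowcount reflects exactly the rows this DELETE removed,
    so there is no window for records to expire between a count
    query and a separate delete query.
    """
    cur = conn.execute(
        "DELETE FROM tool_responses WHERE expires_at <= ?", (now,)
    )
    conn.commit()
    return cur.rowcount
```
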
⚡ Performance Considerations

  1. ONNX model loading (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:60-102)

    • Model loads synchronously on first use, blocking the event loop
    • Large models could cause latency spikes
    • Consider: async loading or eager initialization during server startup
  2. Token estimation fallback (src/mcp_optimizer/response_optimizer/optimizer.py:22-25)

    • Character-based estimation (len(text) // 4) is a crude approximation
    • May cause over/under optimization for non-English or code-heavy content
    • Already mitigated by optional TokenCounter - good design
  3. Traverser recreation (src/mcp_optimizer/response_optimizer/optimizer.py:75-94)

    • Traversers use lazy initialization (good)
    • But TextTraverser gets head_lines/tail_lines which could change
    • Current design is correct, just noting for maintenance

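One way to keep the model load off the event loop, as suggested, is to run the blocking load in a worker thread behind an async lock; the names here are illustrative, not the summarizer's actual API:

```python
import asyncio

class LazyModel:
    """Load a heavy model without blocking the event loop (sketch)."""

    def __init__(self, loader):
        # loader is a blocking callable,
        # e.g. lambda: onnxruntime.InferenceSession(path)
        self._loader = loader
        self._model = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock prevents concurrent first-use calls from
        # loading the model twice.
        async with self._lock:
            if self._model is None:
                self._model = await asyncio.to_thread(self._loader)
        return self._model
```

Eager initialization at server startup (awaiting get() once before serving) would trade first-request latency for a slower boot.
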
🔒 Security Concerns

  1. TTL cleanup not automated

    • Expired responses remain in DB until manual cleanup_expired() call
    • Could lead to unbounded DB growth
    • Suggest: background task or per-request cleanup trigger
  2. JQ command injection risk (response_optimizer/query_executor.py assumed)

    • If query executor runs shell commands with user input, ensure proper sanitization
    • Couldn't verify without reading file - flag for review
  3. Database path permissions (config.py:426-484)

    • Good: Sets secure permissions (read/write owner only)
    • Issue: Skips security for /tmp and /data paths (line 633)
    • /tmp databases could be world-readable depending on system umask

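The suggested background task could be a simple periodic sweep; here delete_expired is a stand-in for an async wrapper around the KV store's cleanup_expired():

```python
import asyncio
import time

async def cleanup_loop(delete_expired, interval: float = 60.0):
    """Periodically purge expired rows so the DB cannot grow unbounded."""
    while True:
        try:
            await delete_expired(time.time())
        except Exception:
            pass  # log in real code; one failed sweep shouldn't kill the task
        await asyncio.sleep(interval)

# At server startup (sketch):
# task = asyncio.create_task(cleanup_loop(store.cleanup_expired))
```
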
🚀 Breaking Changes

None detected - All changes are additive:

  • New config flags default to disabled (response_optimizer_enabled=False)
  • New tools (search_in_tool_response) don't affect existing functionality
  • Legacy max_tool_response_tokens still supported

📝 Verbosity & Clarity

Excellent:

  • Comprehensive docstrings with Args/Returns
  • README with setup instructions and examples
  • Type hints throughout

Minor improvements:

  • Some log messages could be more concise (e.g., "Optimizing tool response" could include session_key for better debugging)
  • AppWorld experiment has many files - consider splitting README into setup vs usage docs

✅ Strengths

  1. Feature flag pattern allows safe rollout (response_optimizer_enabled)
  2. Graceful fallbacks when ONNX model unavailable
  3. Well-tested integration with Pydantic AI in examples
  4. Token metrics provide visibility into optimization effectiveness
  5. Follows repository conventions (uv, Taskfile, pydantic, no ORM)

📊 Metrics

  • Lines changed: 4459 additions, 36 deletions
  • Files touched: 30
  • New modules: 15
  • Test coverage: Not visible in PR (verify tests exist)

Recommendation

Approve with minor improvements suggested. The implementation is solid and follows best practices. Address security item #1 (TTL cleanup) before merging. Other items are nice-to-haves that can be addressed in follow-up PRs.

Estimated reading time: 1.5 minutes

@aponcedeleonch aponcedeleonch force-pushed the call-tool-optimized branch 5 times, most recently from e18cc82 to a911a68 Compare January 23, 2026 16:55
@therealnb
Collaborator

therealnb previously approved these changes Jan 26, 2026
I think it looks good so far, but we'll wait for the experiment results before merging.

@aponcedeleonch aponcedeleonch force-pushed the call-tool-optimized branch 4 times, most recently from 1ff8cbd to 479239d Compare January 26, 2026 17:07
aponcedeleonch and others added 3 commits January 27, 2026 18:44

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>