Add intelligent tool response optimization with call_tool integration#267

Open
aponcedeleonch wants to merge 3 commits into main from call-tool-optimized

Conversation

@aponcedeleonch
Member

This adds a response optimization system that intelligently compresses large tool responses while preserving task-relevant information. The system integrates with call_tool to automatically optimize responses that exceed token thresholds.

Response Optimizer Features:

  • Content type classification (JSON, Markdown, unstructured text)
  • Structure-aware traversal using breadth-first strategy
  • LLMLingua-2 token-level summarization with ONNX model
  • Query hints for retrieving specific parts of original responses
  • KV store for temporary storage of original responses (TTL-based expiration)

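The classification step above can be sketched with simple heuristics. This is an illustrative approximation, not the PR's actual classifier; the function name and rules are invented:

```python
import json
import re

def classify_content(text: str) -> str:
    """Guess a tool response's content type (illustrative heuristics only)."""
    # Valid JSON wins outright.
    try:
        json.loads(text)
        return "json"
    except (json.JSONDecodeError, TypeError):
        pass
    # Markdown heuristic: any ATX heading line like "# Title".
    if re.search(r"^#{1,6}\s+\S", text, re.MULTILINE):
        return "markdown"
    return "text"
```

A real classifier would likely use more signals (lists, tables, fenced blocks), but the three-way split is the part that drives traverser selection.
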
New MCP Tool:

  • search_in_tool_response: Query stored responses using JQ (JSON), section headers (Markdown), or shell commands (text)

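A minimal dispatcher showing the intent of the per-type query modes. The dotted-path JSON lookup stands in for a real JQ filter, and every name here is hypothetical, not the tool's implementation:

```python
import json
import re

def search_in_response(content: str, content_type: str, query: str) -> str:
    """Hypothetical sketch of search_in_tool_response dispatch semantics."""
    if content_type == "json":
        # Dotted-path lookup as a stand-in for a JQ filter like ".user.name".
        value = json.loads(content)
        for key in query.lstrip(".").split("."):
            value = value[key]
        return json.dumps(value)
    if content_type == "markdown":
        # Return the body of the section under the matching header.
        pattern = rf"^#+\s*{re.escape(query)}\s*$(.*?)(?=^#+\s|\Z)"
        match = re.search(pattern, content, re.MULTILINE | re.DOTALL)
        return match.group(1).strip() if match else ""
    raise ValueError(f"unsupported content type: {content_type}")
```

The shell-command mode for unstructured text is omitted here since it raises the sanitization questions flagged later in the review.
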
Database:

  • Added tool_responses table for KV store with session-based grouping
  • Indexed by session_key, expires_at, and tool_name

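The table and indexes described above might look roughly like this in SQLite; the column names are assumptions, not the migration's exact schema:

```python
import sqlite3

# Hypothetical DDL approximating the tool_responses KV table.
SCHEMA = """
CREATE TABLE tool_responses (
    id          TEXT PRIMARY KEY,
    session_key TEXT NOT NULL,
    tool_name   TEXT NOT NULL,
    content     TEXT NOT NULL,
    expires_at  TIMESTAMP NOT NULL
);
CREATE INDEX ix_tool_responses_session_key ON tool_responses (session_key);
CREATE INDEX ix_tool_responses_expires_at  ON tool_responses (expires_at);
CREATE INDEX ix_tool_responses_tool_name   ON tool_responses (tool_name);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The expires_at index is what keeps TTL sweeps cheap; session_key supports the session-based grouping.
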
Configuration:

  • RESPONSE_OPTIMIZER_ENABLED: Enable intelligent optimization (default: false)
  • RESPONSE_OPTIMIZER_THRESHOLD: Token threshold for optimization (default: 1000)
  • RESPONSE_KV_TTL: TTL for stored responses in seconds (default: 300)
  • RESPONSE_HEAD_LINES/RESPONSE_TAIL_LINES: Lines preserved for unstructured text (default: 20)
  • LLMLINGUA_MODEL_PATH: Path to ONNX model directory (optional, see README)

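Put together, a hypothetical environment fragment using the flags above (values are the documented defaults):

```shell
# Enable intelligent optimization (off by default).
export RESPONSE_OPTIMIZER_ENABLED=true
export RESPONSE_OPTIMIZER_THRESHOLD=1000   # tokens
export RESPONSE_KV_TTL=300                 # seconds
export RESPONSE_HEAD_LINES=20
export RESPONSE_TAIL_LINES=20
# Optional; see examples/call_tool_optimizer/README.md for export instructions.
# export LLMLINGUA_MODEL_PATH=/path/to/onnx-model
```
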
AppWorld Experiment:

  • Example implementation using Pydantic AI agent with find_tool, call_tool, and search_in_tool_response
  • Task commands for running AppWorld experiments with resume capability
  • Measures task completion rates and response optimization effectiveness

Note: ONNX model files excluded from git (too large). See examples/call_tool_optimizer/README.md for export instructions.

@claude

claude bot commented Jan 15, 2026

Code Review

Summary

This PR adds an intelligent response optimization system with call_tool integration. The implementation is well-structured and follows best practices. Below are specific feedback items:


🔍 Code Quality & Best Practices

Strong Points:

  • Clean separation of concerns with modular architecture (classifier, traversers, summarizers, query_executor)
  • Good use of Pydantic for validation and type safety
  • Async/await properly implemented throughout
  • Follows repository patterns from existing code

Issues:

  1. Verbose fallback in LLMLingua summarizer (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:235-244)

    • Simple character-based truncation could be extracted to a utility function to reduce code duplication
    • Consider consolidating with the base summarizer's fallback logic
  2. Repetitive error handling in tool_response_ops.py

    • Multiple datetime parsing operations could use a shared helper method
    • A helper would cut the code from ~40 lines to ~25 and improve maintainability
  3. Migration date format inconsistency (migrations/versions/2025_08_18_0743-d2977d4c8c53_create_initial_tables.py)

    • Migration filename shows August 2025 (future date: 2025_08_18)
    • Should reflect actual creation date or use standardized naming

🐛 Potential Bugs

  1. Race condition in cleanup (src/mcp_optimizer/db/tool_response_ops.py:191-222)

    • cleanup_expired() count query + delete query pattern not atomic
    • If records expire between count and delete, count will be incorrect (minor impact)
    • Consider using RETURNING clause or accept slight inaccuracy
  2. Unsafe np.amax typing workaround (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:133)

    • Comment says "work around numpy typing limitations"
    • Using Any bypasses type checking - could mask errors
    • Suggest typing logits_max as np.ndarray if compatible
  3. Missing session_key validation in search_in_tool_response

    • Tool allows querying any response_id without session_key validation
    • Could expose responses across sessions if IDs are guessable
    • Consider adding session_key parameter for security

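The count-then-delete race can be closed by relying on the DELETE statement's own affected-row count (or DELETE ... RETURNING on SQLite >= 3.35). A portable sketch, with the table name taken from the PR and everything else assumed:

```python
import sqlite3

def cleanup_expired(conn: sqlite3.Connection, now: float) -> int:
    """Delete expired rows and count them in one statement.

    Cursor.rowcount reflects exactly the rows this DELETE removed,
    so there is no window for records to expire between a count
    query and a separate delete query.
    """
    cur = conn.execute(
        "DELETE FROM tool_responses WHERE expires_at <= ?", (now,)
    )
    conn.commit()
    return cur.rowcount
```
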
⚡ Performance Considerations

  1. ONNX model loading (src/mcp_optimizer/response_optimizer/summarizers/llmlingua.py:60-102)

    • Model loads synchronously on first use, blocking the event loop
    • Large models could cause latency spikes
    • Consider: async loading or eager initialization during server startup
  2. Token estimation fallback (src/mcp_optimizer/response_optimizer/optimizer.py:22-25)

    • Character-based estimation (len(text) // 4) is a crude approximation
    • May cause over/under optimization for non-English or code-heavy content
    • Already mitigated by optional TokenCounter - good design
  3. Traverser recreation (src/mcp_optimizer/response_optimizer/optimizer.py:75-94)

    • Traversers use lazy initialization (good)
    • But TextTraverser gets head_lines/tail_lines which could change
    • Current design is correct, just noting for maintenance

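One way to keep the model load off the event loop, as suggested, is to run the blocking load in a worker thread behind an async lock; the names here are illustrative, not the summarizer's actual API:

```python
import asyncio

class LazyModel:
    """Load a heavy model without blocking the event loop (sketch)."""

    def __init__(self, loader):
        # loader is a blocking callable,
        # e.g. lambda: onnxruntime.InferenceSession(path)
        self._loader = loader
        self._model = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock prevents concurrent first-use calls from
        # loading the model twice.
        async with self._lock:
            if self._model is None:
                self._model = await asyncio.to_thread(self._loader)
        return self._model
```

Eager initialization at server startup (awaiting get() once before serving) would trade first-request latency for a slower boot.
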
🔒 Security Concerns

  1. TTL cleanup not automated

    • Expired responses remain in DB until manual cleanup_expired() call
    • Could lead to unbounded DB growth
    • Suggest: background task or per-request cleanup trigger
  2. JQ command injection risk (response_optimizer/query_executor.py assumed)

    • If query executor runs shell commands with user input, ensure proper sanitization
    • Couldn't verify without reading file - flag for review
  3. Database path permissions (config.py:426-484)

    • Good: Sets secure permissions (read/write owner only)
    • Issue: Skips security for /tmp and /data paths (line 633)
    • /tmp databases could be world-readable depending on system umask

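The suggested background task could be a simple periodic sweep; here delete_expired is a stand-in for an async wrapper around the KV store's cleanup_expired():

```python
import asyncio
import time

async def cleanup_loop(delete_expired, interval: float = 60.0):
    """Periodically purge expired rows so the DB cannot grow unbounded."""
    while True:
        try:
            await delete_expired(time.time())
        except Exception:
            pass  # log in real code; one failed sweep shouldn't kill the task
        await asyncio.sleep(interval)

# At server startup (sketch):
# task = asyncio.create_task(cleanup_loop(store.cleanup_expired))
```
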
🚀 Breaking Changes

None detected - All changes are additive:

  • New config flags default to disabled (response_optimizer_enabled=False)
  • New tools (search_in_tool_response) don't affect existing functionality
  • Legacy max_tool_response_tokens still supported

📝 Verbosity & Clarity

Excellent:

  • Comprehensive docstrings with Args/Returns
  • README with setup instructions and examples
  • Type hints throughout

Minor improvements:

  • Some log messages could be more concise (e.g., "Optimizing tool response" could include session_key for better debugging)
  • AppWorld experiment has many files - consider splitting README into setup vs usage docs

✅ Strengths

  1. Feature flag pattern allows safe rollout (response_optimizer_enabled)
  2. Graceful fallbacks when ONNX model unavailable
  3. Well-tested integration with Pydantic AI in examples
  4. Token metrics provide visibility into optimization effectiveness
  5. Follows repository conventions (uv, Taskfile, pydantic, no ORM)

📊 Metrics

  • Lines changed: 4459 additions, 36 deletions
  • Files touched: 30
  • New modules: 15
  • Test coverage: Not visible in PR (verify tests exist)

Recommendation

Approve with minor improvements suggested. The implementation is solid and follows best practices. Address security item #1 (TTL cleanup) before merging. Other items are nice-to-haves that can be addressed in follow-up PRs.

Estimated reading time: 1.5 minutes

@aponcedeleonch aponcedeleonch force-pushed the call-tool-optimized branch 5 times, most recently from e18cc82 to a911a68 Compare January 23, 2026 16:55
@therealnb
Collaborator

therealnb previously approved these changes Jan 26, 2026
I think it looks good so far, but we'll wait for the experiment results before merging.

@aponcedeleonch aponcedeleonch force-pushed the call-tool-optimized branch 4 times, most recently from 1ff8cbd to 479239d Compare January 26, 2026 17:07
aponcedeleonch and others added 3 commits January 27, 2026 18:44

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>