Description
The Claude Agent SDK (Python) does not currently expose the Tool Search Tool and defer_loading capabilities that exist in the raw Anthropic API via the advanced-tool-use-2025-11-20 beta header. For applications with large MCP tool catalogs (50-200+ tools), this results in significant context window consumption before any conversation begins, degrading model performance and increasing costs.
We are requesting native support for deferred tool loading in the Claude Agent SDK, similar to how Claude Code implements it internally.
Environment
- Package: claude-agent-sdk (Python)
- Version: Latest (as of January 2025)
- Use Case: Enterprise financial analysis platform with 150+ MCP tools
- Deployment: Azure Container App with co-located Agent SDK and MCP tools
The Problem
Context Token Consumption
With 150+ tools registered, our tool definitions consume approximately 40-60K tokens before any user message is processed. This creates three critical issues:
- Degraded reasoning quality - Less context available for actual task completion
- Increased costs - Paying for tool schema tokens on every request
- Context overflow risk - Multi-turn conversations can breach context limits
For reference, the GitHub MCP server alone (91 tools) consumes ~46,000 tokens—22% of Claude Opus's context window (source).
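To put a number on this for your own catalog, a rough estimator along the following lines can help. It is only a sketch: the 4-characters-per-token ratio is a common rule of thumb and an assumption here, not the model's real tokenizer, and the example tool definition is hypothetical.

```python
import json

# Assumption: roughly 4 characters per token. Real tokenizers will differ.
CHARS_PER_TOKEN = 4


def estimate_tool_tokens(tools: list[dict]) -> int:
    """Rough token estimate for a list of tool definitions."""
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def context_share(tool_tokens: int, context_window: int) -> float:
    """Fraction of the context window taken up by tool definitions."""
    return tool_tokens / context_window


# Hypothetical tool definition, for illustration only.
example_tools = [
    {
        "name": "mcp__finance__get_quote",
        "description": "Return the latest quote for a ticker symbol.",
        "input_schema": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    }
]

tokens = estimate_tool_tokens(example_tools)
print(f"~{tokens} tokens for {len(example_tools)} tool(s)")
```

Running this over the full catalog at startup gives a quick signal of how much of the window is spent before the first user message.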
The Solution Exists, But Not in the SDK
Anthropic released the advanced-tool-use-2025-11-20 beta in November 2025 with three features addressing this exact problem:
- Tool Search Tool - On-demand tool discovery
- defer_loading: true - Lazy tool schema injection
- Programmatic Tool Calling - Batch tool operations
These features are documented at:
- https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
- https://www.anthropic.com/engineering/advanced-tool-use
However, the Claude Agent SDK does not expose these capabilities:
# This works with raw API
response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    tools=[{"name": "my_tool", "defer_loading": True, ...}],
    ...
)

# But Agent SDK has no equivalent
agent = Claude(
    model="claude-sonnet-4-5-20250929",
    tools=[...],  # No defer_loading option
    # No beta header configuration
)
The only documented beta for Agent SDK is context-1m-2025-08-07:
SdkBeta = Literal["context-1m-2025-08-07"]  # No advanced-tool-use beta
Our Research: How Claude Code Implements Deferred Loading
We analyzed the Claude Code CLI source code to understand how deferred loading works internally, since the Agent SDK is based on Claude Code. Here are our findings:
1. Tool Eligibility Check
// From unminified/02-beautified/cli.js:283066-283069
function eV(A) {
  if (A.isMcp === !0) return !0; // Only MCP tools are deferred
  return !1; // Built-in tools: NEVER deferred
}
Only MCP tools are eligible for deferred loading. Built-in tools are never deferred.
2. MCP Tool Detection
// From line 320000
return A.name?.startsWith('mcp__') || A.isMcp === !0;
Tools are identified as MCP tools if:
- The name starts with the mcp__ prefix, OR
- The tool has the isMcp: true property
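Restated in Python, the two checks above might look like the sketch below. Representing a tool as a plain dict, and composing the two rules so that anything detected as an MCP tool is deferrable, are our assumptions; the function names are hypothetical and not part of any SDK.

```python
def is_mcp_tool(tool: dict) -> bool:
    """Detection rule: `mcp__` name prefix, or an explicit isMcp flag."""
    name = tool.get("name") or ""
    return name.startswith("mcp__") or tool.get("isMcp") is True


def split_deferrable(tools: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition tools into (deferrable MCP tools, always-loaded built-ins)."""
    deferred = [t for t in tools if is_mcp_tool(t)]
    eager = [t for t in tools if not is_mcp_tool(t)]
    return deferred, eager


tools = [
    {"name": "Read"},
    {"name": "mcp__github__create_issue"},
    {"name": "custom_lookup", "isMcp": True},
]
deferred, eager = split_deferrable(tools)
print([t["name"] for t in deferred])  # ['mcp__github__create_issue', 'custom_lookup']
print([t["name"] for t in eager])     # ['Read']
```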
3. Enable Modes via Environment Variable
The ENABLE_TOOL_SEARCH environment variable controls behavior:
| Value | Mode | Behavior |
|---|---|---|
| true | tst | Tool Search always enabled |
| <number> (e.g., 10) | tst-auto | Auto-enabled when deferred tool descriptions exceed threshold % of context |
| false or 100 | standard | Disabled |
| Not set | tst-auto | Defaults to auto mode |
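As a sketch of how we read this table, the mode resolution could be expressed as follows. The function names are hypothetical, the fallback for unrecognized values is our assumption, and the default threshold of 10 is borrowed from the table's example because the actual default is not known to us.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_tool_search_mode(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[int]]:
    """Map ENABLE_TOOL_SEARCH to (mode, threshold_percent)."""
    env = os.environ if env is None else env
    raw = env.get("ENABLE_TOOL_SEARCH")
    if raw is None:
        return ("tst-auto", None)  # not set: auto mode, default threshold
    value = raw.strip().lower()
    if value == "true":
        return ("tst", None)
    if value in ("false", "100"):
        return ("standard", None)
    if value.isdigit():
        return ("tst-auto", int(value))
    # Assumption: unrecognized values fall back to auto mode.
    return ("tst-auto", None)


def tool_search_active(
    mode: str,
    threshold_pct: Optional[int],
    deferred_tokens: int,
    context_tokens: int,
    default_threshold_pct: int = 10,  # assumption: real default is unknown
) -> bool:
    """Decide whether tool search is on for the given token counts."""
    if mode == "tst":
        return True
    if mode == "standard":
        return False
    limit = default_threshold_pct if threshold_pct is None else threshold_pct
    return deferred_tokens / context_tokens * 100 > limit


mode, threshold = resolve_tool_search_mode({"ENABLE_TOOL_SEARCH": "10"})
print(mode, threshold)  # tst-auto 10
```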
4. The defer_loading API Flag
// From lines 439862-439863
if (K.deferLoading) z.defer_loading = !0;
When a tool is marked for deferral, Claude Code sets defer_loading: true on the tool schema sent to the API.
5. ToolSearch Tool Injection
When deferred loading is active, Claude Code automatically includes a ToolSearch tool that supports:
- Keyword search: "slack message" finds Slack-related tools
- Direct selection: "select:mcp__my_tool" loads a specific tool
The system tracks which tools Claude has "discovered" via ToolSearch and only includes those in subsequent API calls.
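To make the behavior we are asking for concrete, here is a minimal sketch of the search-and-track loop described above. It is not Claude Code's implementation: the class name, the substring-based scoring, and the result limit are all our own assumptions.

```python
class ToolSearchIndex:
    """Hypothetical index over deferred tools with discovery tracking."""

    def __init__(self, deferred_tools: list[dict]):
        self._tools = {t["name"]: t for t in deferred_tools}
        self.discovered: set[str] = set()

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Handle 'select:<name>' or a keyword query; record what was found."""
        if query.startswith("select:"):
            name = query[len("select:"):].strip()
            matches = [name] if name in self._tools else []
        else:
            terms = query.lower().split()
            scored = []
            for name, tool in self._tools.items():
                haystack = f"{name} {tool.get('description', '')}".lower()
                # Score = number of query terms found as substrings.
                score = sum(term in haystack for term in terms)
                if score:
                    scored.append((score, name))
            scored.sort(key=lambda pair: (-pair[0], pair[1]))
            matches = [name for _, name in scored[:limit]]
        self.discovered.update(matches)
        return matches

    def tools_for_next_request(self) -> list[dict]:
        """Only tools discovered so far are sent with later API calls."""
        return [self._tools[name] for name in sorted(self.discovered)]


index = ToolSearchIndex([
    {"name": "mcp__slack__send_message", "description": "Send a message to a Slack channel."},
    {"name": "mcp__github__create_issue", "description": "Open a new GitHub issue."},
])
print(index.search("slack message"))                     # ['mcp__slack__send_message']
print(index.search("select:mcp__github__create_issue"))  # ['mcp__github__create_issue']
```

After both calls, tools_for_next_request() returns the two discovered tool definitions; a tool that was never searched for stays out of the request.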
What's Missing in Claude Agent SDK
Based on our analysis, the Agent SDK appears to be missing:
| Feature | Claude Code | Agent SDK |
|---|---|---|
| ENABLE_TOOL_SEARCH env var | ✅ Supported | ❓ Unknown/Undocumented |
| mcp__ prefix detection | ✅ Supported | ❓ Unknown |
| isMcp property handling | ✅ Supported | ❓ Unknown |
| defer_loading on tool schemas | ✅ Supported | ❌ Not exposed |
| ToolSearch tool injection | ✅ Automatic | ❌ Not available |
| advanced-tool-use-2025-11-20 beta | ✅ Used internally | ❌ Not configurable |
Proposed Solution
Option A: Expose Existing Claude Code Logic (Preferred)
If the Agent SDK inherits from Claude Code, expose the existing deferred loading configuration:
from claude_agent_sdk import Claude, AgentConfig

agent = Claude(
    model="claude-sonnet-4-5-20250929",
    config=AgentConfig(
        enable_tool_search=True,  # or "auto" or percentage threshold
    ),
    mcp_servers=[...],
)
Option B: Allow Beta Header Configuration
Allow users to specify beta headers for API calls:
agent = Claude(
    model="claude-sonnet-4-5-20250929",
    betas=["advanced-tool-use-2025-11-20"],
    tools=[
        {"name": "core_tool", "defer_loading": False, ...},
        {"name": "specialized_tool", "defer_loading": True, ...},
    ],
)
Option C: Native defer_loading Support on Tools
Add defer_loading parameter to tool registration:
from claude_agent_sdk import tool

@tool(
    name="specialized_calculator",
    description="...",
    defer_loading=True,  # New parameter
)
async def specialized_calculator(args):
    ...
Questions for the Team
- Does the Agent SDK inherit Claude Code's deferred loading logic? If so, is the ENABLE_TOOL_SEARCH env var respected?
- Is there a planned timeline for exposing defer_loading and Tool Search Tool in the Agent SDK?
- What is the recommended workaround for large tool catalogs (1000+ tools) today? Should we:
  - Use the raw Anthropic API instead of Agent SDK for tool-heavy operations?
  - Implement client-side tool filtering before passing to the SDK?
  - Something else?
- Will mcp__ prefixed tools automatically get isMcp: true when registered via create_sdk_mcp_server?
Impact
This feature would enable:
- 85-95% reduction in initial context token usage
- Improved model reasoning with more context available for actual tasks
- Cost savings on API calls
- Support for enterprise-scale tool catalogs (1000-1500+ tools)
Without this feature, users with large tool catalogs must either:
- Accept degraded performance and higher costs
- Abandon Agent SDK for raw API usage
- Implement complex client-side tool filtering workarounds
Related Issues & References
- TypeScript SDK Feature Request: anthropics/claude-agent-sdk-typescript#124
- Anthropic Blog: Introducing Advanced Tool Use
- API Documentation: Tool Search Tool
Workaround (Current)
For others facing this issue, our current workaround is semantic pre-filtering:
from sentence_transformers import SentenceTransformer
import numpy as np

# ALL_TOOLS is the full list of tool dicts ("name", "description", ...), defined elsewhere

# Pre-compute embeddings at startup
model = SentenceTransformer("all-MiniLM-L6-v2")
tool_embeddings = model.encode(
    [f"{t['name']}: {t['description']}" for t in ALL_TOOLS],
    normalize_embeddings=True,  # unit vectors, so the dot product is cosine similarity
)

def get_relevant_tools(query: str, top_k: int = 15) -> list:
    query_emb = model.encode(query, normalize_embeddings=True)
    similarities = np.dot(tool_embeddings, query_emb)
    top_indices = np.argsort(similarities)[-top_k:][::-1]  # highest similarity first
    return [ALL_TOOLS[i] for i in top_indices]

# Filter tools before creating agent
relevant_tools = get_relevant_tools(user_query)
agent = Claude(model="claude-sonnet-4-5-20250929", tools=relevant_tools)
This achieves ~85% context reduction but adds complexity and may miss relevant tools.
Thank you for considering this feature request. Happy to provide additional information or assist with testing.