Skip to content

[FEATURE] Support for Tool Search Tool and Deferred Loading (defer_loading) to Reduce Token Usage #525

@haseebrj17

Description

@haseebrj17

The Claude Agent SDK (Python) does not currently expose the Tool Search Tool and defer_loading capabilities that exist in the raw Anthropic API via the advanced-tool-use-2025-11-20 beta header. For applications with large MCP tool catalogs (50-200+ tools), this results in significant context window consumption before any conversation begins, degrading model performance and increasing costs.

We are requesting native support for deferred tool loading in the Claude Agent SDK, similar to how Claude Code implements it internally.

Environment

  • Package: claude-agent-sdk (Python)
  • Version: Latest (as of January 2025)
  • Use Case: Enterprise financial analysis platform with 150+ MCP tools
  • Deployment: Azure Container App with co-located Agent SDK and MCP tools

The Problem

Context Token Consumption

With 150+ tools registered, our tool definitions consume approximately 40-60K tokens before any user message is processed. This creates three critical issues:

  1. Degraded reasoning quality - Less context available for actual task completion
  2. Increased costs - Paying for tool schema tokens on every request
  3. Context overflow risk - Multi-turn conversations can breach context limits

For reference, the GitHub MCP server alone (91 tools) consumes ~46,000 tokens—22% of Claude Opus's context window (source).

The Solution Exists, But Not in the SDK

Anthropic released the advanced-tool-use-2025-11-20 beta in November 2025 with three features addressing this exact problem:

  1. Tool Search Tool - On-demand tool discovery
  2. defer_loading: true - Lazy tool schema injection
  3. Programmatic Tool Calling - Batch tool operations

These features are documented at:

However, the Claude Agent SDK does not expose these capabilities:

# This works with raw API
response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    tools=[{"name": "my_tool", "defer_loading": True, ...}],
    ...
)

# But Agent SDK has no equivalent
agent = Claude(
    model="claude-sonnet-4-5-20250929",
    tools=[...],  # No defer_loading option
    # No beta header configuration
)

The only documented beta for Agent SDK is context-1m-2025-08-07:

SdkBeta = Literal["context-1m-2025-08-07"]  # No advanced-tool-use beta

Our Research: How Claude Code Implements Deferred Loading

We analyzed the Claude Code CLI source code to understand how deferred loading works internally, since the Agent SDK is based on Claude Code. Here are our findings:

1. Tool Eligibility Check

// From unminified/02-beautified/cli.js:283066-283069
function eV(A) {
  if (A.isMcp === !0) return !0;  // Only MCP tools are deferred
  return !1;                       // Built-in tools: NEVER deferred
}

Only MCP tools are eligible for deferred loading. Built-in tools are never deferred.

2. MCP Tool Detection

// From line 320000
return A.name?.startsWith('mcp__') || A.isMcp === !0;

Tools are identified as MCP tools if:

  • The name starts with mcp__ prefix, OR
  • The tool has isMcp: true property

3. Enable Modes via Environment Variable

The ENABLE_TOOL_SEARCH environment variable controls behavior:

Value Mode Behavior
true tst Tool Search always enabled
<number> (e.g., 10) tst-auto Auto-enabled when deferred tool descriptions exceed threshold % of context
false or 100 standard Disabled
Not set tst-auto Defaults to auto mode

4. The defer_loading API Flag

// From lines 439862-439863
if (K.deferLoading) z.defer_loading = !0;

When a tool is marked for deferral, Claude Code sets defer_loading: true on the tool schema sent to the API.

5. ToolSearch Tool Injection

When deferred loading is active, Claude Code automatically includes a ToolSearch tool that supports:

  • Keyword search: "slack message" finds Slack-related tools
  • Direct selection: "select:mcp__my_tool" loads a specific tool

The system tracks which tools Claude has "discovered" via ToolSearch and only includes those in subsequent API calls.

What's Missing in Claude Agent SDK

Based on our analysis, the Agent SDK appears to be missing:

Feature Claude Code Agent SDK
ENABLE_TOOL_SEARCH env var ✅ Supported ❓ Unknown/Undocumented
mcp__ prefix detection ✅ Supported ❓ Unknown
isMcp property handling ✅ Supported ❓ Unknown
defer_loading on tool schemas ✅ Supported ❌ Not exposed
ToolSearch tool injection ✅ Automatic ❌ Not available
advanced-tool-use-2025-11-20 beta ✅ Used internally ❌ Not configurable

Proposed Solution

Option A: Expose Existing Claude Code Logic (Preferred)

If the Agent SDK inherits from Claude Code, expose the existing deferred loading configuration:

from claude_agent_sdk import Claude, AgentConfig

agent = Claude(
    model="claude-sonnet-4-5-20250929",
    config=AgentConfig(
        enable_tool_search=True,  # or "auto" or percentage threshold
    ),
    mcp_servers=[...],
)

Option B: Allow Beta Header Configuration

Allow users to specify beta headers for API calls:

agent = Claude(
    model="claude-sonnet-4-5-20250929",
    betas=["advanced-tool-use-2025-11-20"],
    tools=[
        {"name": "core_tool", "defer_loading": False, ...},
        {"name": "specialized_tool", "defer_loading": True, ...},
    ],
)

Option C: Native defer_loading Support on Tools

Add defer_loading parameter to tool registration:

from claude_agent_sdk import tool

@tool(
    name="specialized_calculator",
    description="...",
    defer_loading=True,  # New parameter
)
async def specialized_calculator(args):
    ...

Questions for the Team

  1. Does the Agent SDK inherit Claude Code's deferred loading logic? If so, is ENABLE_TOOL_SEARCH env var respected?

  2. Is there a planned timeline for exposing defer_loading and Tool Search Tool in the Agent SDK?

  3. What is the recommended workaround for large tool catalogs (1000+ tools) today? Should we:

    • Use the raw Anthropic API instead of Agent SDK for tool-heavy operations?
    • Implement client-side tool filtering before passing to the SDK?
    • Something else?
  4. Will mcp__ prefixed tools automatically get isMcp: true when registered via create_sdk_mcp_server?

Impact

This feature would enable:

  • 85-95% reduction in initial context token usage
  • Improved model reasoning with more context available for actual tasks
  • Cost savings on API calls
  • Support for enterprise-scale tool catalogs (1000-1500+ tools)

Without this feature, users with large tool catalogs must either:

  • Accept degraded performance and higher costs
  • Abandon Agent SDK for raw API usage
  • Implement complex client-side tool filtering workarounds

Related Issues & References

Workaround (Current)

For others facing this issue, our current workaround is semantic pre-filtering:

from sentence_transformers import SentenceTransformer
import numpy as np

# Pre-compute embeddings at startup
model = SentenceTransformer("all-MiniLM-L6-v2")
tool_embeddings = model.encode([f"{t['name']}: {t['description']}" for t in ALL_TOOLS])

def get_relevant_tools(query: str, top_k: int = 15) -> list:
    query_emb = model.encode(query)
    similarities = np.dot(tool_embeddings, query_emb)
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [ALL_TOOLS[i] for i in top_indices]

# Filter tools before creating agent
relevant_tools = get_relevant_tools(user_query)
agent = Claude(model="claude-sonnet-4-5-20250929", tools=relevant_tools)

This achieves ~85% context reduction but adds complexity and may miss relevant tools.


Thank you for considering this feature request. Happy to provide additional information or assist with testing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions