[FEATURE] Support for Tool Search Tool and Deferred Loading (`defer_loading`) to Reduce Token Usage #525

The Claude Agent SDK (Python) does not currently expose the **Tool Search Tool** and **`defer_loading`** capabilities that exist in the raw Anthropic API via the `advanced-tool-use-2025-11-20` beta header. For applications with large MCP tool catalogs (50-200+ tools), this results in significant context window consumption before any conversation begins, degrading model performance and increasing costs.

We are requesting native support for deferred tool loading in the Claude Agent SDK, similar to how Claude Code implements it internally.

## Environment

- **Package**: `claude-agent-sdk` (Python)
- **Version**: Latest (as of January 2026)
- **Use Case**: Enterprise financial analysis platform with 150+ MCP tools
- **Deployment**: Azure Container App with co-located Agent SDK and MCP tools

## The Problem

### Context Token Consumption

With 150+ tools registered, our tool definitions consume approximately **40-60K tokens** before any user message is processed. This creates three critical issues:

1. **Degraded reasoning quality** - Less context available for actual task completion
2. **Increased costs** - Paying for tool schema tokens on every request
3. **Context overflow risk** - Multi-turn conversations can breach context limits

For reference, the GitHub MCP server alone (91 tools) consumes ~46,000 tokens—22% of Claude Opus's context window ([source](https://www.anthropic.com/engineering/advanced-tool-use)).

### The Solution Exists, But Not in the SDK

Anthropic released the `advanced-tool-use-2025-11-20` beta in November 2025 with three features addressing this exact problem:

1. **Tool Search Tool** - On-demand tool discovery
2. **`defer_loading: true`** - Lazy tool schema injection
3. **Programmatic Tool Calling** - Batch tool operations

These features are documented at:
- https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
- https://www.anthropic.com/engineering/advanced-tool-use

However, the Claude Agent SDK does not expose these capabilities:
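```python
# This works with the raw API (anthropic Python SDK, beta namespace)
import anthropic

client = anthropic.Anthropic()
response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    tools=[{"name": "my_tool", "defer_loading": True, ...}],
    ...
)

# But the Agent SDK has no equivalent knob
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    model="claude-sonnet-4-5-20250929",
    mcp_servers={...},  # no per-tool defer_loading option
    # `betas` only accepts "context-1m-2025-08-07"; no advanced-tool-use beta
)
```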
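For comparison, here is a minimal sketch of the raw-API `tools` payload when most of a catalog is deferred. It only builds the request body (no network call), so it can be inspected offline. Assumptions: the tool-search entry's `type` and `name` strings reflect our reading of the Tool Search Tool docs linked above and should be verified there; `build_tools_payload` and the `core` set are hypothetical names for illustration.

```python
from typing import Any

# Assumption: type/name strings for the regex variant of the Tool Search Tool,
# as we read them in the Tool Search Tool docs; verify before use.
TOOL_SEARCH_TOOL: dict[str, Any] = {
    "type": "tool_search_tool_regex_20251119",
    "name": "tool_search_tool_regex",
}


def build_tools_payload(
    tools: list[dict[str, Any]], core: set[str]
) -> list[dict[str, Any]]:
    """Return a `tools` list in which every tool not named in `core` is deferred.

    Hypothetical helper: core tools keep their full schema in context; all
    other tools are sent with defer_loading=True so they are only expanded
    after the tool search tool surfaces them.
    """
    payload: list[dict[str, Any]] = [dict(TOOL_SEARCH_TOOL)]
    for t in tools:
        entry = dict(t)  # copy so the caller's definitions are not mutated
        if t["name"] not in core:
            entry["defer_loading"] = True
        payload.append(entry)
    return payload
```

The result would be passed as `tools=` to `client.beta.messages.create(...)` together with `betas=["advanced-tool-use-2025-11-20"]`. Keeping a small non-deferred core mirrors the `core_tool` / `specialized_tool` split proposed in Option B.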

The only beta the Agent SDK currently accepts is `context-1m-2025-08-07`:
```python
SdkBeta = Literal["context-1m-2025-08-07"]  # No advanced-tool-use beta
```

## Our Research: How Claude Code Implements Deferred Loading

We analyzed the Claude Code CLI source code to understand how deferred loading works internally, since the Agent SDK is based on Claude Code. Here are our findings:

### 1. Tool Eligibility Check
```javascript
// From unminified/02-beautified/cli.js:283066-283069
function eV(A) {
  if (A.isMcp === !0) return !0;  // Only MCP tools are deferred
  return !1;                       // Built-in tools: NEVER deferred
}
```

**Only MCP tools are eligible for deferred loading.** Built-in tools are never deferred.

### 2. MCP Tool Detection
```javascript
// From line 320000
return A.name?.startsWith('mcp__') || A.isMcp === !0;
```

Tools are identified as MCP tools if:
- The name starts with `mcp__` prefix, OR
- The tool has `isMcp: true` property
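The two checks above can be restated in Python for readability. This is our paraphrase of the minified JavaScript, not code from either SDK; the `ToolDef` dataclass and the function names are hypothetical. As quoted, the eligibility check (`eV`) looks only at `isMcp`, while the detection check also accepts the `mcp__` name prefix, and the sketch keeps that distinction.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    """Hypothetical stand-in for Claude Code's internal tool object."""
    name: str
    is_mcp: bool = False  # mirrors the `isMcp` property


def is_deferral_eligible(tool: ToolDef) -> bool:
    # Finding 1 (`eV`): only tools flagged isMcp === true are eligible;
    # built-in tools are never deferred.
    return tool.is_mcp is True


def is_mcp_tool(tool: ToolDef) -> bool:
    # Finding 2: a tool counts as MCP if its name has the `mcp__` prefix
    # OR it carries isMcp: true.
    return tool.name.startswith("mcp__") or tool.is_mcp is True
```

Under this reading, a tool that only has the `mcp__` prefix is detected as MCP but is not deferral-eligible unless `isMcp` is also set, which is why we ask about `create_sdk_mcp_server` in the questions below.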

### 3. Enable Modes via Environment Variable

The `ENABLE_TOOL_SEARCH` environment variable controls behavior:

| Value | Mode | Behavior |
|-------|------|----------|
| `true` | `tst` | Tool Search always enabled |
| `<number>` (e.g., `10`) | `tst-auto` | Auto-enabled when deferred tool descriptions exceed threshold % of context |
| `false` or `100` | `standard` | Disabled |
| Not set | `tst-auto` | Defaults to auto mode |
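The table can be read as a small parsing rule. The sketch below is our own restatement, not Claude Code's code: the function names are hypothetical, case-insensitive matching and the handling of unrecognized values are assumptions, and because we did not determine the default threshold used when the variable is unset, the sketch returns `None` for it rather than guessing.

```python
from typing import Optional, Tuple


def parse_enable_tool_search(value: Optional[str]) -> Tuple[str, Optional[float]]:
    """Map an ENABLE_TOOL_SEARCH value to (mode, threshold_percent)."""
    if value is None:
        return "tst-auto", None  # unset: auto mode, default threshold unknown
    v = value.strip().lower()  # assumption: matching is case-insensitive
    if v == "true":
        return "tst", None
    if v == "false":
        return "standard", None
    try:
        pct = float(v)
    except ValueError:
        return "tst-auto", None  # assumption: unrecognized values fall back to auto
    if pct == 100:
        return "standard", None  # per the table, 100 disables Tool Search
    return "tst-auto", pct


def auto_enabled(deferred_tokens: int, context_window: int, threshold_pct: float) -> bool:
    # tst-auto: enable Tool Search when deferred tool descriptions exceed
    # threshold_pct percent of the context window.
    return deferred_tokens > context_window * threshold_pct / 100
```

For example, with `ENABLE_TOOL_SEARCH=10` and a 200K-token window, the ~46,000-token GitHub server cited above exceeds the 20,000-token threshold, so `auto_enabled(46_000, 200_000, 10)` is true.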

### 4. The `defer_loading` API Flag
```javascript
// From lines 439862-439863
if (K.deferLoading) z.defer_loading = !0;
```

When a tool is marked for deferral, Claude Code sets `defer_loading: true` on the tool schema sent to the API.
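In Python terms, the quoted line amounts to the serialization step below. This is a hypothetical sketch: `ToolSpec` and `to_api_schema` are illustrative names modeled on the minified code, not the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Hypothetical internal tool record before it is sent to the API."""
    name: str
    description: str
    input_schema: dict[str, Any] = field(default_factory=lambda: {"type": "object"})
    defer_loading: bool = False  # mirrors `K.deferLoading`


def to_api_schema(spec: ToolSpec) -> dict[str, Any]:
    schema: dict[str, Any] = {
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.input_schema,
    }
    # Equivalent of `if (K.deferLoading) z.defer_loading = !0;`:
    # the key is only emitted when true, so non-deferred tools are unchanged.
    if spec.defer_loading:
        schema["defer_loading"] = True
    return schema
```

Emitting the key only when it is true means non-deferred tool schemas are byte-for-byte what they would be without the beta.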

### 5. ToolSearch Tool Injection

When deferred loading is active, Claude Code automatically includes a ToolSearch tool that supports:
- Keyword search: `"slack message"` finds Slack-related tools
- Direct selection: `"select:mcp__my_tool"` loads a specific tool

The system tracks which tools Claude has "discovered" via ToolSearch and only includes those in subsequent API calls.
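A simplified model of this behavior helps pin down what we are asking the SDK to expose. This is a hypothetical sketch, not Claude Code's implementation: the keyword rule (every whitespace-separated term must appear in the tool's name or description, case-insensitively) is an assumption, and `DeferredCatalog` and its methods are invented names. Only the `select:` prefix and the tracking of discovered tools come from our findings.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredCatalog:
    """Hypothetical registry of deferred tools plus those Claude has discovered."""
    tools: dict[str, str]  # tool name -> description
    discovered: set[str] = field(default_factory=set)

    def search(self, query: str) -> list[str]:
        q = query.strip()
        if q.startswith("select:"):
            # Direct selection: "select:mcp__my_tool" loads exactly that tool.
            wanted = q[len("select:"):].strip()
            hits = [wanted] if wanted in self.tools else []
        else:
            # Keyword search (assumed rule): all terms must match name or description.
            terms = q.lower().split()
            hits = [
                name
                for name, desc in self.tools.items()
                if all(t in f"{name} {desc}".lower() for t in terms)
            ]
        self.discovered.update(hits)
        return hits

    def tools_for_next_call(self) -> list[str]:
        # Only discovered deferred tools are included in subsequent API calls.
        return sorted(self.discovered)
```

With a catalog containing `mcp__slack__post_message` and `mcp__github__create_issue`, `search("slack message")` returns only the Slack tool, and after a `select:` call both tools appear in `tools_for_next_call()`.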

## What's Missing in Claude Agent SDK

Based on our analysis, the Agent SDK appears to be missing:

| Feature | Claude Code | Agent SDK |
|---------|-------------|-----------|
| `ENABLE_TOOL_SEARCH` env var | ✅ Supported | ❓ Unknown/Undocumented |
| `mcp__` prefix detection | ✅ Supported | ❓ Unknown |
| `isMcp` property handling | ✅ Supported | ❓ Unknown |
| `defer_loading` on tool schemas | ✅ Supported | ❌ Not exposed |
| ToolSearch tool injection | ✅ Automatic | ❌ Not available |
| `advanced-tool-use-2025-11-20` beta | ✅ Used internally | ❌ Not configurable |

## Proposed Solution

### Option A: Expose Existing Claude Code Logic (Preferred)

If the Agent SDK shares Claude Code's implementation, expose the existing deferred-loading configuration through the SDK's options object:
```python
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    model="claude-sonnet-4-5-20250929",
    mcp_servers={...},
    # Proposed field (does not exist today), mirroring ENABLE_TOOL_SEARCH:
    enable_tool_search=True,  # or "auto", or a percentage threshold such as 10
)
```

### Option B: Allow Beta Header Configuration

Allow users to specify beta headers for API calls:
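```python
from claude_agent_sdk import ClaudeAgentOptions

# Proposed: widen SdkBeta to accept the advanced-tool-use beta and let tool
# definitions carry defer_loading (neither is accepted today)
options = ClaudeAgentOptions(
    model="claude-sonnet-4-5-20250929",
    betas=["advanced-tool-use-2025-11-20"],
    tools=[
        {"name": "core_tool", "defer_loading": False, ...},
        {"name": "specialized_tool", "defer_loading": True, ...},
    ],
)
```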

### Option C: Native `defer_loading` Support on Tools

Add `defer_loading` parameter to tool registration:
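```python
from claude_agent_sdk import tool

@tool(
    name="specialized_calculator",
    description="...",
    input_schema={"expression": str},  # hypothetical schema; required by @tool
    defer_loading=True,  # proposed new parameter; not accepted today
)
async def specialized_calculator(args):
    ...
```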

## Questions for the Team

1. **Does the Agent SDK inherit Claude Code's deferred loading logic?** If so, is the `ENABLE_TOOL_SEARCH` environment variable respected (for example, when passed through `ClaudeAgentOptions.env`)?

2. **Is there a planned timeline** for exposing `defer_loading` and Tool Search Tool in the Agent SDK?

3. **What is the recommended workaround** for large tool catalogs (1000+ tools) today? Should we:
   - Use the raw Anthropic API instead of Agent SDK for tool-heavy operations?
   - Implement client-side tool filtering before passing to the SDK?
   - Something else?

4. **Will `mcp__` prefixed tools automatically get `isMcp: true`** when registered via `create_sdk_mcp_server`?

## Impact

This feature would enable:
- **85-95% reduction** in initial context token usage
- **Improved model reasoning** with more context available for actual tasks
- **Cost savings** on API calls
- **Support for enterprise-scale tool catalogs** (1000-1500+ tools)

Without this feature, users with large tool catalogs must choose one of the following:
- Accept degraded performance and higher costs
- Abandon Agent SDK for raw API usage
- Implement complex client-side tool filtering workarounds

## Related Issues & References

- TypeScript SDK Feature Request: [anthropics/claude-agent-sdk-typescript#124](https://github.com/anthropics/claude-agent-sdk-typescript/issues/124)
- Anthropic Blog: [Introducing Advanced Tool Use](https://www.anthropic.com/engineering/advanced-tool-use)
- API Documentation: [Tool Search Tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool)

## Workaround (Current)

For others facing this issue, our current workaround is semantic pre-filtering:
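```python
import numpy as np
from sentence_transformers import SentenceTransformer
from claude_agent_sdk import ClaudeAgentOptions, create_sdk_mcp_server

# ALL_TOOLS: our full catalog of @tool-decorated functions (SdkMcpTool objects)

# Pre-compute normalized embeddings at startup so a dot product is cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
tool_embeddings = model.encode(
    [f"{t.name}: {t.description}" for t in ALL_TOOLS],
    normalize_embeddings=True,
)

def get_relevant_tools(query: str, top_k: int = 15) -> list:
    query_emb = model.encode(query, normalize_embeddings=True)
    similarities = tool_embeddings @ query_emb  # cosine similarity per tool
    top_indices = np.argsort(similarities)[-top_k:][::-1]  # best match first
    return [ALL_TOOLS[i] for i in top_indices]

# Filter tools per request, then expose only those through an in-process MCP server
relevant_tools = get_relevant_tools(user_query)
server = create_sdk_mcp_server(name="filtered_tools", version="1.0.0", tools=relevant_tools)
options = ClaudeAgentOptions(
    model="claude-sonnet-4-5-20250929",
    mcp_servers={"filtered_tools": server},
)
```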

This achieves ~85% context reduction but adds complexity and may miss relevant tools.

---

**Thank you for considering this feature request.** Happy to provide additional information or assist with testing.
