3 changes: 2 additions & 1 deletion docs/examples.md
@@ -77,13 +77,14 @@ Check out a variety of sample implementations of the SDK in the examples section
Simple deep research clone that demonstrates complex multi-agent research workflows.

- **[tools](https://github.com/openai/openai-agents-python/tree/main/examples/tools):**
Learn how to implement OAI hosted tools and experimental Codex tooling such as:

- Web search and web search with filters
- File search
- Code interpreter
- Computer use
- Image generation
- Experimental Codex tool workflows (`examples/tools/codex.py`)

- **[voice](https://github.com/openai/openai-agents-python/tree/main/examples/voice):**
See examples of voice agents, using our TTS and STT models, including streamed voice examples.
130 changes: 130 additions & 0 deletions docs/human_in_the_loop.md
@@ -0,0 +1,130 @@
# Human-in-the-loop

Use the human-in-the-loop (HITL) flow to pause agent execution until a person approves or rejects sensitive tool calls. Tools declare when they need approval, run results surface pending approvals as interruptions, and `RunState` lets you serialize and resume runs after decisions are made.

## Marking tools that need approval

Set `needs_approval` to `True` to always require approval or provide an async function that decides per call. The callable receives the run context, parsed tool parameters, and the tool call ID.

```python
from agents import Agent, Runner, function_tool


@function_tool(needs_approval=True)
async def cancel_order(order_id: int) -> str:
    return f"Cancelled order {order_id}"


async def requires_review(_ctx, params, _call_id) -> bool:
    return "refund" in params.get("subject", "").lower()


@function_tool(needs_approval=requires_review)
async def send_email(subject: str, body: str) -> str:
    return f"Sent '{subject}'"


agent = Agent(
    name="Support agent",
    instructions="Handle tickets and ask for approval when needed.",
    tools=[cancel_order, send_email],
)
```

`needs_approval` is available on [`function_tool`][agents.tool.function_tool], [`Agent.as_tool`][agents.agent.Agent.as_tool], [`ShellTool`][agents.tool.ShellTool], and [`ApplyPatchTool`][agents.tool.ApplyPatchTool]. Local MCP servers also support approvals through `require_approval` on [`MCPServerStdio`][agents.mcp.server.MCPServerStdio], [`MCPServerSse`][agents.mcp.server.MCPServerSse], and [`MCPServerStreamableHttp`][agents.mcp.server.MCPServerStreamableHttp]. Hosted MCP servers support approvals via [`HostedMCPTool`][agents.tool.HostedMCPTool] with `tool_config={"require_approval": "always"}` and an optional `on_approval_request` callback. Shell and apply_patch tools accept an `on_approval` callback if you want to auto-approve or auto-reject without surfacing an interruption.
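For example, a hosted MCP tool can be forced into the approval flow like this (a sketch; the server label and URL are placeholders):

```python
from agents import Agent, HostedMCPTool

agent = Agent(
    name="Assistant",
    tools=[
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "docs",  # placeholder label
                "server_url": "https://example.com/mcp",  # placeholder URL
                "require_approval": "always",
            }
        )
    ],
)
```

With `"require_approval": "always"`, every call to this server surfaces as an interruption unless an `on_approval_request` callback resolves it first.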

## How the approval flow works

1. When the model emits a tool call, the runner evaluates `needs_approval`.
2. If an approval decision for that tool call is already stored in the [`RunContextWrapper`][agents.run_context.RunContextWrapper] (for example, from `always_approve=True`), the runner proceeds without prompting. Per-call approvals are scoped to the specific call ID; use `always_approve=True` to allow future calls automatically.
3. Otherwise, execution pauses and `RunResult.interruptions` (or `RunResultStreaming.interruptions`) contains `ToolApprovalItem` entries with details such as `agent.name`, `name`, and `arguments`.
4. Convert the result to a `RunState` with `result.to_state()`, call `state.approve(...)` or `state.reject(...)` (optionally passing `always_approve` or `always_reject`), and then resume with `Runner.run(agent, state)` or `Runner.run_streamed(agent, state)`.
5. The resumed run continues where it left off and will re-enter this flow if new approvals are needed.

## Example: pause, approve, resume

The snippet below mirrors the JavaScript HITL guide: it pauses when a tool needs approval, persists state to disk, reloads it, and resumes after collecting a decision.

```python
import asyncio
import json
from pathlib import Path

from agents import Agent, Runner, RunState, function_tool


async def needs_oakland_approval(_ctx, params, _call_id) -> bool:
    return "Oakland" in params.get("city", "")


@function_tool(needs_approval=needs_oakland_approval)
async def get_temperature(city: str) -> str:
    return f"The temperature in {city} is 20° Celsius"


agent = Agent(
    name="Weather assistant",
    instructions="Answer weather questions with the provided tools.",
    tools=[get_temperature],
)

STATE_PATH = Path(".cache/hitl_state.json")


def prompt_approval(tool_name: str, arguments: str | None) -> bool:
    answer = input(f"Approve {tool_name} with {arguments}? [y/N]: ").strip().lower()
    return answer in {"y", "yes"}


async def main() -> None:
    result = await Runner.run(agent, "What is the temperature in Oakland?")

    while result.interruptions:
        # Persist the paused state.
        state = result.to_state()
        STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
        STATE_PATH.write_text(state.to_string())

        # Load the state later (could be a different process).
        stored = json.loads(STATE_PATH.read_text())
        state = await RunState.from_json(agent, stored)

        for interruption in result.interruptions:
            approved = await asyncio.get_running_loop().run_in_executor(
                None, prompt_approval, interruption.name or "unknown_tool", interruption.arguments
            )
            if approved:
                state.approve(interruption, always_approve=False)
            else:
                state.reject(interruption)

        result = await Runner.run(agent, state)

    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

In this example, `prompt_approval` is synchronous because it uses `input()` and is executed with `run_in_executor(...)`. If your approval source is already asynchronous (for example, an HTTP request or async database query), you can use an `async def` function and `await` it directly instead.

To stream output while waiting for approvals, call `Runner.run_streamed`, consume `result.stream_events()` until it completes, and then follow the same `result.to_state()` and resume steps shown above.
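A sketch of that streaming loop, reusing the agent from the example above (event handling is elided):

```python
result = Runner.run_streamed(agent, "What is the temperature in Oakland?")
async for event in result.stream_events():
    ...  # forward events to your UI as they arrive

while result.interruptions:
    state = result.to_state()
    for interruption in result.interruptions:
        state.approve(interruption)  # or state.reject(interruption)
    # Resume, then drain the new event stream the same way.
    result = Runner.run_streamed(agent, state)
    async for event in result.stream_events():
        ...
```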

## Other patterns in this repository

- **Streaming approvals**: `examples/agent_patterns/human_in_the_loop_stream.py` shows how to drain `stream_events()` and then approve pending tool calls before resuming with `Runner.run_streamed(agent, state)`.
- **Agent as tool approvals**: `Agent.as_tool(..., needs_approval=...)` applies the same interruption flow when delegated agent tasks need review.
- **Shell and apply_patch tools**: `ShellTool` and `ApplyPatchTool` also support `needs_approval`. Use `state.approve(interruption, always_approve=True)` or `state.reject(..., always_reject=True)` to cache the decision for future calls. For automatic decisions, provide `on_approval` (see `examples/tools/shell.py`); for manual decisions, handle interruptions (see `examples/tools/shell_human_in_the_loop.py`).
- **Local MCP servers**: Use `require_approval` on `MCPServerStdio` / `MCPServerSse` / `MCPServerStreamableHttp` to gate MCP tool calls (see `examples/mcp/get_all_mcp_tools_example/main.py` and `examples/mcp/tool_filter_example/main.py`).
- **Hosted MCP servers**: Set `require_approval` to `"always"` on `HostedMCPTool` to force HITL, optionally providing `on_approval_request` to auto-approve or reject (see `examples/hosted_mcp/human_in_the_loop.py` and `examples/hosted_mcp/on_approval.py`). Use `"never"` for trusted servers (`examples/hosted_mcp/simple.py`).
- **Sessions and memory**: Pass a session to `Runner.run` so approvals and conversation history survive multiple turns. SQLite and OpenAI Conversations session variants are in `examples/memory/memory_session_hitl_example.py` and `examples/memory/openai_session_hitl_example.py`.
- **Realtime agents**: The realtime demo exposes WebSocket messages that approve or reject tool calls via `approve_tool_call` / `reject_tool_call` on the `RealtimeSession` (see `examples/realtime/app/server.py` for the server-side handlers).

## Long-running approvals

`RunState` is designed to be durable. Use `state.to_json()` or `state.to_string()` to store pending work in a database or queue and recreate it later with `RunState.from_json(...)` or `RunState.from_string(...)`. Pass `context_override` if you do not want to persist sensitive context data in the serialized payload.
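A sketch of that hand-off, with `reviewer_queue` and `fresh_context` standing in for whatever storage and context-rebuilding you use (both are placeholders, not SDK names):

```python
# Producer: park the paused run for a human reviewer.
reviewer_queue.put({"run_state": state.to_json()})

# Consumer (possibly a different process): rebuild and resume.
job = reviewer_queue.get()
# context_override re-supplies data that was deliberately left out of the payload.
state = await RunState.from_json(agent, job["run_state"], context_override=fresh_context)
result = await Runner.run(agent, state)
```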

## Versioning pending tasks

If approvals may sit for a while, store a version marker for your agent definitions or SDK alongside the serialized state. You can then route deserialization to the matching code path to avoid incompatibilities when models, prompts, or tool definitions change.
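One way to sketch this versioned envelope with only the standard library (the version string and wrapping format are your own convention, not part of the SDK):

```python
import json

AGENT_DEFINITION_VERSION = "2025-01-15"  # bump when prompts, tools, or models change


def wrap_state(serialized_state: str) -> str:
    """Wrap a serialized RunState with a version marker before storing it."""
    return json.dumps({"version": AGENT_DEFINITION_VERSION, "state": serialized_state})


def unwrap_state(payload: str) -> tuple[str, str]:
    """Return (version, serialized_state); route on the version before deserializing."""
    record = json.loads(payload)
    return record["version"], record["state"]
```

On load, dispatch on the returned version to the matching agent definitions before calling `RunState.from_string(...)`.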
82 changes: 82 additions & 0 deletions docs/mcp.md
@@ -24,6 +24,33 @@ matrix below summarises the options that the Python SDK supports.

The sections below walk through each option, how to configure it, and when to prefer one transport over another.

## Agent-level MCP configuration

In addition to choosing a transport, you can tune how MCP tools are prepared by setting `Agent.mcp_config`.

```python
from agents import Agent

agent = Agent(
    name="Assistant",
    mcp_servers=[server],
    mcp_config={
        # Try to convert MCP tool schemas to strict JSON schema.
        "convert_schemas_to_strict": True,
        # If None, MCP tool failures are raised as exceptions instead of
        # returning model-visible error text.
        "failure_error_function": None,
    },
)
```

Notes:

- `convert_schemas_to_strict` is best-effort. If a schema cannot be converted, the original schema is used.
- `failure_error_function` controls how MCP tool call failures are surfaced to the model.
- When `failure_error_function` is unset, the SDK uses the default tool error formatter.
- Server-level `failure_error_function` overrides `Agent.mcp_config["failure_error_function"]` for that server.
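For example, a custom formatter might look like this (a sketch; the `(run_context, exception) -> str` signature mirrors the function-tool error handler and is an assumption here):

```python
def mcp_error_text(run_context, error: Exception) -> str:
    # Whatever string is returned becomes the model-visible tool output on failure.
    return f"The MCP tool call failed ({type(error).__name__}). Try different arguments."
```

Passing it as `mcp_config={"failure_error_function": mcp_error_text}` would apply it to every local MCP server on the agent that does not set its own handler.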

## 1. Hosted MCP server tools

Hosted tools push the entire tool round-trip into OpenAI's infrastructure. Instead of your code listing and calling tools, the
@@ -178,6 +205,61 @@ The constructor accepts additional options:
- `use_structured_content` toggles whether `tool_result.structured_content` is preferred over textual output.
- `max_retry_attempts` and `retry_backoff_seconds_base` add automatic retries for `list_tools()` and `call_tool()`.
- `tool_filter` lets you expose only a subset of tools (see [Tool filtering](#tool-filtering)).
- `require_approval` enables human-in-the-loop approval policies on local MCP tools.
- `failure_error_function` customizes model-visible MCP tool failure messages; set it to `None` to raise errors instead.
- `tool_meta_resolver` injects per-call MCP `_meta` payloads before `call_tool()`.

### Approval policies for local MCP servers

`MCPServerStdio`, `MCPServerSse`, and `MCPServerStreamableHttp` all accept `require_approval`.

Supported forms:

- `"always"` or `"never"` for all tools.
- `True` / `False` (equivalent to always/never).
- A per-tool map, for example `{"delete_file": "always", "read_file": "never"}`.
- A grouped object:
`{"always": {"tool_names": [...]}, "never": {"tool_names": [...]}}`.

```python
async with MCPServerStreamableHttp(
    name="Filesystem MCP",
    params={"url": "http://localhost:8000/mcp"},
    require_approval={"always": {"tool_names": ["delete_file"]}},
) as server:
    ...
```

For a full pause/resume flow, see [Human-in-the-loop](human_in_the_loop.md) and `examples/mcp/get_all_mcp_tools_example/main.py`.

### Per-call metadata with `tool_meta_resolver`

Use `tool_meta_resolver` when your MCP server expects request metadata in `_meta` (for example, tenant IDs or trace context). The example below assumes you pass a `dict` as `context` to `Runner.run(...)`.

```python
from agents.mcp import MCPServerStreamableHttp, MCPToolMetaContext


def resolve_meta(context: MCPToolMetaContext) -> dict[str, str] | None:
    run_context_data = context.run_context.context or {}
    tenant_id = run_context_data.get("tenant_id")
    if tenant_id is None:
        return None
    return {"tenant_id": str(tenant_id), "source": "agents-sdk"}


server = MCPServerStreamableHttp(
    name="Metadata-aware MCP",
    params={"url": "http://localhost:8000/mcp"},
    tool_meta_resolver=resolve_meta,
)
```

If your run context is a Pydantic model, dataclass, or custom class, read the tenant ID with attribute access instead.

### MCP tool outputs: text and images

When an MCP tool returns image content, the SDK maps it to image tool output entries automatically. Mixed text/image responses are forwarded as a list of output items, so agents can consume MCP image results the same way they consume image output from regular function tools.

## 3. HTTP with SSE MCP servers

3 changes: 3 additions & 0 deletions docs/ref/agent_tool_input.md
@@ -0,0 +1,3 @@
# `Agent Tool Input`

::: agents.agent_tool_input
3 changes: 3 additions & 0 deletions docs/ref/agent_tool_state.md
@@ -0,0 +1,3 @@
# `Agent Tool State`

::: agents.agent_tool_state
3 changes: 3 additions & 0 deletions docs/ref/memory/session_settings.md
@@ -0,0 +1,3 @@
# `Session Settings`

::: agents.memory.session_settings
3 changes: 3 additions & 0 deletions docs/ref/run_config.md
@@ -0,0 +1,3 @@
# `Run Config`

::: agents.run_config
3 changes: 3 additions & 0 deletions docs/ref/run_error_handlers.md
@@ -0,0 +1,3 @@
# `Run Error Handlers`

::: agents.run_error_handlers
3 changes: 3 additions & 0 deletions docs/ref/run_internal/agent_runner_helpers.md
@@ -0,0 +1,3 @@
# `Agent Runner Helpers`

::: agents.run_internal.agent_runner_helpers
3 changes: 3 additions & 0 deletions docs/ref/run_internal/approvals.md
@@ -0,0 +1,3 @@
# `Approvals`

::: agents.run_internal.approvals
3 changes: 3 additions & 0 deletions docs/ref/run_internal/error_handlers.md
@@ -0,0 +1,3 @@
# `Error Handlers`

::: agents.run_internal.error_handlers
3 changes: 3 additions & 0 deletions docs/ref/run_internal/guardrails.md
@@ -0,0 +1,3 @@
# `Guardrails`

::: agents.run_internal.guardrails
3 changes: 3 additions & 0 deletions docs/ref/run_internal/items.md
@@ -0,0 +1,3 @@
# `Items`

::: agents.run_internal.items
3 changes: 3 additions & 0 deletions docs/ref/run_internal/oai_conversation.md
@@ -0,0 +1,3 @@
# `Oai Conversation`

::: agents.run_internal.oai_conversation
3 changes: 3 additions & 0 deletions docs/ref/run_internal/run_loop.md
@@ -0,0 +1,3 @@
# `Run Loop`

::: agents.run_internal.run_loop
3 changes: 3 additions & 0 deletions docs/ref/run_internal/run_steps.md
@@ -0,0 +1,3 @@
# `Run Steps`

::: agents.run_internal.run_steps
3 changes: 3 additions & 0 deletions docs/ref/run_internal/session_persistence.md
@@ -0,0 +1,3 @@
# `Session Persistence`

::: agents.run_internal.session_persistence
3 changes: 3 additions & 0 deletions docs/ref/run_internal/streaming.md
@@ -0,0 +1,3 @@
# `Streaming`

::: agents.run_internal.streaming
3 changes: 3 additions & 0 deletions docs/ref/run_internal/tool_actions.md
@@ -0,0 +1,3 @@
# `Tool Actions`

::: agents.run_internal.tool_actions
3 changes: 3 additions & 0 deletions docs/ref/run_internal/tool_execution.md
@@ -0,0 +1,3 @@
# `Tool Execution`

::: agents.run_internal.tool_execution
3 changes: 3 additions & 0 deletions docs/ref/run_internal/tool_planning.md
@@ -0,0 +1,3 @@
# `Tool Planning`

::: agents.run_internal.tool_planning
3 changes: 3 additions & 0 deletions docs/ref/run_internal/tool_use_tracker.md
@@ -0,0 +1,3 @@
# `Tool Use Tracker`

::: agents.run_internal.tool_use_tracker
3 changes: 3 additions & 0 deletions docs/ref/run_internal/turn_preparation.md
@@ -0,0 +1,3 @@
# `Turn Preparation`

::: agents.run_internal.turn_preparation
3 changes: 3 additions & 0 deletions docs/ref/run_internal/turn_resolution.md
@@ -0,0 +1,3 @@
# `Turn Resolution`

::: agents.run_internal.turn_resolution
3 changes: 3 additions & 0 deletions docs/ref/run_state.md
@@ -0,0 +1,3 @@
# `Run State`

::: agents.run_state
3 changes: 3 additions & 0 deletions docs/ref/tracing/context.md
@@ -0,0 +1,3 @@
# `Context`

::: agents.tracing.context
3 changes: 3 additions & 0 deletions docs/ref/tracing/model_tracing.md
@@ -0,0 +1,3 @@
# `Model Tracing`

::: agents.tracing.model_tracing
14 changes: 14 additions & 0 deletions docs/release.md
@@ -19,6 +19,20 @@ We will increment `Z` for non-breaking changes:

## Breaking change changelog

### 0.8.0

In this version, two runtime behavior changes may require migration work:

- Function tools wrapping **synchronous** Python callables now execute on worker threads via `asyncio.to_thread(...)` instead of running on the event loop thread. If your tool logic depends on thread-local state or thread-affine resources, migrate to an async tool implementation or make thread affinity explicit in your tool code.
- Local MCP tool failure handling is now configurable, and the default behavior can return model-visible error output instead of failing the whole run. If you rely on fail-fast semantics, set `mcp_config={"failure_error_function": None}`. Server-level `failure_error_function` values override the agent-level setting, so set `failure_error_function=None` on each local MCP server that has an explicit handler.
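A migration sketch for the fail-fast case (the stdio command is a placeholder):

```python
from agents import Agent
from agents.mcp import MCPServerStdio

server = MCPServerStdio(
    params={"command": "my-mcp-server"},  # placeholder command
    failure_error_function=None,  # server-level: raise instead of returning error text
)

agent = Agent(
    name="Assistant",
    mcp_servers=[server],
    mcp_config={"failure_error_function": None},  # agent-level default for other servers
)
```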

### 0.7.0

In this version, there were a few behavior changes that can affect existing applications:

- Nested handoff history is now **opt-in** (disabled by default). If you depended on the v0.6.x default nested behavior, explicitly set `RunConfig(nest_handoff_history=True)`.
- The default `reasoning.effort` for `gpt-5.1` / `gpt-5.2` changed to `"none"` (from the previous default `"low"` configured by SDK defaults). If your prompts or quality/cost profile relied on `"low"`, set it explicitly in `model_settings`.
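To restore the v0.6.x behavior explicitly, something like the following (a sketch to run inside an async function; the `Reasoning` import path is from the `openai` package):

```python
from agents import Agent, ModelSettings, RunConfig, Runner
from openai.types.shared import Reasoning

agent = Agent(
    name="Assistant",
    model="gpt-5.1",
    # Pin the previous default reasoning effort instead of relying on SDK defaults.
    model_settings=ModelSettings(reasoning=Reasoning(effort="low")),
)

result = await Runner.run(
    agent,
    "Hello",
    run_config=RunConfig(nest_handoff_history=True),  # opt back in to nested history
)
```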

### 0.6.0

In this version, the default handoff history is now packaged into a single assistant message instead of exposing the raw user/assistant turns, giving downstream agents a concise, predictable recap.
27 changes: 27 additions & 0 deletions docs/results.md
@@ -52,3 +52,30 @@ The [`raw_responses`][agents.result.RunResultBase.raw_responses] property contai
### Original input

The [`input`][agents.result.RunResultBase.input] property contains the original input you provided to the `run` method. In most cases you won't need this, but it's available in case you do.

### Interruptions and resuming runs

If a run pauses for tool approval, pending approvals are exposed in [`interruptions`][agents.result.RunResultBase.interruptions]. Convert the result into a [`RunState`][agents.run_state.RunState] with `to_state()`, approve or reject the interruption(s), and resume with `Runner.run(...)` or `Runner.run_streamed(...)`.

```python
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="Use tools when needed.")
result = await Runner.run(agent, "Delete temp files that are no longer needed.")

if result.interruptions:
    state = result.to_state()
    for interruption in result.interruptions:
        state.approve(interruption)
    result = await Runner.run(agent, state)
```

Both [`RunResult`][agents.result.RunResult] and [`RunResultStreaming`][agents.result.RunResultStreaming] support `to_state()`.

### Convenience helpers

`RunResultBase` includes a few helper methods/properties that are useful in production flows:

- [`final_output_as(...)`][agents.result.RunResultBase.final_output_as] casts final output to a specific type (optionally with runtime type checking).
- [`last_response_id`][agents.result.RunResultBase.last_response_id] returns the latest model response ID, useful for response chaining.
- [`release_agents(...)`][agents.result.RunResultBase.release_agents] drops strong references to agents when you want to reduce memory pressure after inspecting results.
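A sketch of these helpers together, inside an async function and assuming an agent configured with `output_type=Summary`:

```python
from pydantic import BaseModel

from agents import Agent, Runner


class Summary(BaseModel):
    title: str
    body: str


agent = Agent(name="Summarizer", instructions="Summarize the input.", output_type=Summary)
result = await Runner.run(agent, "Summarize: the quarterly report...")

# Raises if the final output is not actually a Summary instance.
summary = result.final_output_as(Summary, raise_if_incorrect_type=True)
print(summary.title, result.last_response_id)

result.release_agents()  # drop agent references once you're done inspecting
```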