Python: Fix: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions #3722

Open
Copilot wants to merge 9 commits into main from copilot/fix-agentthread-tool-resubmission

Conversation

Contributor

Copilot AI commented Feb 6, 2026

Motivation and Context

When using AzureAIProjectAgentProvider with @tool decorated functions and a persistent AgentThread, subsequent turns re-submit tool call outputs from previous turns, causing Azure AI to reject requests with:

Error code: 400 - {'error': {'message': 'No tool call found for function call output with call_id call_XXXXX.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}

Root Cause: The Azure AI Responses API uses a previous_response_id pattern for chaining responses, where the server maintains the full conversation history. However, AgentThread.message_store accumulates ALL messages, including function_result content from prior turns, and these were being re-submitted on each turn. Azure AI rejects function_call_output items that lack a corresponding function_call in the SAME request.

Description

Modified AzureAIClient._prepare_messages_for_azure_ai() to intelligently filter messages when using response chaining:

The Fix:

  • Detects when previous_response_id is being used (conversation_id starts with "resp_")
  • Finds the last assistant message in the conversation history
  • Only includes messages that come AFTER the last assistant message (these are the "new" messages from the current turn)
  • Filters out function_result messages even from new messages
  • Skips old function results and assistant messages (already in server-side history)

Backward Compatibility:

  • Without previous_response_id (first turn or conv_* IDs): All messages are sent as before
  • With previous_response_id (resp_* IDs): Only messages after the last assistant message are sent
  • Does not affect V1 Agents API (AzureAIAgentClient) which uses threads

Changes Made:

  1. python/packages/azure-ai/agent_framework_azure_ai/_client.py

    • Modified _prepare_options() to pass options to _prepare_messages_for_azure_ai()
    • Enhanced _prepare_messages_for_azure_ai() with intelligent message filtering logic that finds the last assistant message and only includes subsequent messages when using response chaining
  2. python/packages/azure-ai/tests/test_azure_ai_client.py

    • Fixed existing tests to pass new parameters
    • Added test_prepare_messages_filters_old_function_results_with_previous_response_id()
    • Added test_prepare_messages_includes_all_without_previous_response_id()
  3. python/packages/azure-ai/tests/test_multi_turn_function_tools.py

    • Added comprehensive multi-turn test scenarios
    • Fixed test structure to properly simulate service-managed threads

Impact:
This fix enables multi-turn conversations with @tool functions in AzureAIProjectAgentProvider, resolving a core use case blocker for the framework.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible (246 passed, 43 skipped in azure-ai package)
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.
Original prompt

This section details the original issue you should resolve

<issue_title>Python: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions</issue_title>
<issue_description>## Description

When using AzureAIProjectAgentProvider with @tool decorated functions and a persistent AgentThread, subsequent turns re-submit tool call outputs from previous turns, causing Azure AI to reject the request with:

Error code: 400 - {'error': {'message': 'No tool call found for function call output with call_id call_XXXXX.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}

Environment

  • agent-framework: 1.0.0b251204 (and later)
  • agent-framework-azure-ai: latest
  • Python: 3.12
  • OS: Linux (WSL2)

Minimal Reproduction

import asyncio
from typing import Annotated
from agent_framework import tool
from agent_framework.azure import AzureAIProjectAgentProvider
from azure.identity.aio import AzureCliCredential
from pydantic import Field


@tool(approval_mode="never_require")
def calculate_tip(
    bill_amount: Annotated[float, Field(description="Bill amount in dollars")],
    tip_percent: Annotated[float, Field(description="Tip percentage")],
) -> str:
    """Calculate tip amount for a bill."""
    tip = bill_amount * (tip_percent / 100)
    return f"Tip: ${tip:.2f}, Total: ${bill_amount + tip:.2f}"


async def main():
    async with (
        AzureCliCredential() as credential,
        AzureAIProjectAgentProvider(credential=credential) as provider,
    ):
        agent = await provider.create_agent(
            name="tip-calculator",
            instructions="Use the calculate_tip tool to help with calculations.",
            tools=[calculate_tip],
        )

        # Single thread for multi-turn (BUG TRIGGER)
        thread = agent.get_new_thread()

        # Turn 1: Works fine
        result1 = await agent.run("Calculate 15% tip on an $85 bill", thread=thread)
        print(f"Turn 1: {result1.text}")

        # Turn 2: FAILS - re-submits tool output from Turn 1
        result2 = await agent.run("Now calculate 20% tip on the same $85 bill", thread=thread)
        print(f"Turn 2: {result2.text}")


asyncio.run(main())

Expected Behavior

Multi-turn conversations with @tool functions should work on the same AgentThread. Turn 2 should only submit the new user message, not re-submit tool outputs from Turn 1.

Actual Behavior

Turn 1 succeeds, but Turn 2 fails immediately with:

Error code: 400 - {'error': {'message': 'No tool call found for function call output with call_id call_XXXXX.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}

The error indicates that AgentThread accumulates function_call_output items from prior turns and re-submits them on subsequent turns. Azure AI rejects these because there's no corresponding function_call in the current request.

Analysis

The bug appears to be in how AgentThread manages conversation state when function tools are used:

  1. Turn 1: User message → Agent calls tool → Tool output submitted → Agent responds ✅
  2. Turn 2: User message + stale tool output from Turn 1 → Azure rejects (no matching call_id) ❌
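Concretely, the Turn 2 input that triggers the rejection looks something like the following (illustrative item shapes and a placeholder call id, not a capture from the service):

```python
# Simplified Responses API input items as submitted on Turn 2:
turn2_input = [
    {"type": "message", "role": "user",
     "content": "Calculate 15% tip on an $85 bill"},            # Turn 1 message (replayed)
    {"type": "function_call_output", "call_id": "call_abc123",  # stale output from Turn 1
     "output": "Tip: $12.75, Total: $97.75"},
    {"type": "message", "role": "user",
     "content": "Now calculate 20% tip on the same $85 bill"},  # the only genuinely new item
]

# No "function_call" item with call_id "call_abc123" appears in the same request,
# which is exactly what the 400 error complains about:
stale_outputs = [
    item for item in turn2_input
    if item["type"] == "function_call_output"
    and not any(
        other["type"] == "function_call" and other.get("call_id") == item["call_id"]
        for other in turn2_input if other is not item
    )
]
```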

The AgentThread should either:

  • Clear tool outputs after each turn completes
  • Not accumulate tool outputs in the thread state
  • Filter out stale tool outputs before submitting to Azure

Observed Patterns

| Scenario | Result |
| --- | --- |
| Hosted tools (CodeInterpreter, FileSearch, WebSearch) | ✅ Works - multi-turn stable |
| Function tools (@tool) on same thread | ❌ Fails on turn 2+ |
| Function tools with fresh thread per turn | ✅ Works - but loses context |
| SDK-level previous_response_id pattern | ✅ Works - canonical solution |

Workarounds

1. Use fresh thread per turn (loses context)

for message in messages:
    thread = agent.get_new_thread()  # Fresh thread each time
    result = await agent.run(message, thread=thread)

2. Use SDK-level previous_response_id pattern

# Bypass AgentThread, use azure-ai-projects SDK directly
response = await openai_client.responses.create(
    input=message,
    previous_response_id=last_response_id,
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)
# Process tool calls, chain via response.id
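A full multi-turn loop around this pattern might look like the following sketch. Assumptions: `run_turns` is a hypothetical helper, tool-call handling is omitted, and `response.id` / `response.output_text` follow the OpenAI Responses client shape:

```python
async def run_turns(openai_client, agent, user_messages: list[str]) -> list[str]:
    """Chain each turn to the previous response so the server keeps the history."""
    replies: list[str] = []
    last_response_id: str | None = None  # None on the very first turn
    for message in user_messages:
        response = await openai_client.responses.create(
            input=message,
            previous_response_id=last_response_id,
            extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
        )
        last_response_id = response.id  # chain the next turn to this response
        replies.append(response.output_text)
    return replies
```

Because each request carries only the new user message plus a pointer to the prior response, no stale function_call_output items are ever re-submitted.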

3. Use hosted tools instead of function tools

Hosted tools (CodeInterpreter, FileSearch, WebSearch) work correctly with multi-turn on the same thread.

Impact

This bug makes @tool functions unusable for multi-turn conversations with AzureAIProjectAgentProvider, which is a core use case for the framework.

Related Issues

<agent_instructions>Fix the ...



Copilot AI and others added 4 commits February 6, 2026 11:16
When using Azure AI Responses API with previous_response_id (response chaining),
the server maintains the full conversation history. We should not re-submit old
function results and assistant messages as they're already part of the server-side
history. This fixes the bug where AgentThread was re-submitting tool outputs from
previous turns, causing Azure AI to reject requests with error:
"No tool call found for function call output with call_id".

Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Added unit tests to verify that when using previous_response_id (response chaining),
old function results and assistant messages are filtered out and only new user messages
are sent. Also updated existing tests to pass required parameters.

Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Simplified the filtering condition to only check for function_result content
since function_call content should never appear in user messages.

Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix AgentThread re-submitting tool outputs on subsequent turns Fix: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions Feb 6, 2026
@github-actions github-actions bot changed the title Fix: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions Python: Fix: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions Feb 6, 2026
Applied ruff formatting as requested in code review to ensure consistent
code style across the changes.

Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot Run python -m pip install pre-commit to fix formatting of the changes

Done. Applied ruff formatting in commit f8e6a70.

@markwallace-microsoft
Member

@copilot Run the unit tests (Run uv run poe all-tests) and fix any failures.

- Updated _prepare_messages_for_azure_ai to find last assistant message and only
  include messages after it when using response chaining
- Fixed test_multi_turn_with_previous_response_id_filters_old_messages to avoid
  message_store/service_thread_id conflict
- All azure-ai package tests now pass (246 passed, 43 skipped)

Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot Run the unit tests (Run uv run poe all-tests) and fix any failures.

Done. Fixed test failures in commit 71e1810. All azure-ai package tests now pass (246 passed, 43 skipped).

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • astral.sh
    • Triggering command: /usr/bin/curl curl -LsSf REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@markwallace-microsoft
Member

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/azure-ai/agent_framework_azure_ai/_client.py | 208 | 7 | 96% | 360, 362, 441, 444, 552, 554, 630 |
| TOTAL | 16596 | 2018 | 87% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 3916 | 225 💤 | 0 ❌ | 0 🔥 | 1m 9s ⏱️ |

@markwallace-microsoft markwallace-microsoft marked this pull request as ready for review February 6, 2026 15:54
Copilot AI review requested due to automatic review settings February 6, 2026 15:54
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug in the Azure AI integration where multi-turn conversations with @tool decorated functions would re-submit old function call outputs from previous turns, causing Azure AI to reject requests with a 400 error. The root cause was that AgentThread.message_store accumulates all messages including function results, and these were being re-submitted on each turn when using the Azure AI Responses API's previous_response_id pattern.

Changes:

  • Modified _prepare_messages_for_azure_ai() to intelligently filter messages when using response chaining (conversation_id starting with "resp_"), only including messages after the last assistant message
  • Updated the method signature to accept options and **kwargs parameters for response chaining detection
  • Added comprehensive test coverage for the message filtering logic in both unit and integration tests

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Enhanced _prepare_messages_for_azure_ai() with response chaining detection and message filtering logic; updated _prepare_options() to pass options to the message preparation method |
| python/packages/azure-ai/tests/test_azure_ai_client.py | Updated existing tests to pass new parameters to _prepare_messages_for_azure_ai(); added two new tests verifying filtering behavior with and without previous_response_id |
| python/packages/azure-ai/tests/test_multi_turn_function_tools.py | New test file with comprehensive multi-turn scenarios testing the fix for function tool re-submission issue |

Comment on lines +124 to +126
# Count function_call_output items in turn 2
sum(1 for item in turn2_input if isinstance(item, dict) and item.get("type") == "function_call_output")

Copilot AI Feb 6, 2026

The result of this sum() operation is not being assigned to a variable or used in any assertion. This line appears to be dead code. Either remove this line or assign the result to a variable if it was meant to be used for verification.

Suggested change
# Count function_call_output items in turn 2
sum(1 for item in turn2_input if isinstance(item, dict) and item.get("type") == "function_call_output")

Comment on lines +525 to +527
# Also filter out function result messages
has_function_result = any(content.type == "function_result" for content in message.contents)
if not has_function_result:
Copilot AI Feb 6, 2026

When a message contains both function_result content AND other content types (e.g., text), the entire message is filtered out if any content is a function_result. This could lead to loss of non-function-result content. Consider filtering at the content level rather than the message level, or creating a new message with only the non-function-result content items.

Suggested change
-# Also filter out function result messages
-has_function_result = any(content.type == "function_result" for content in message.contents)
-if not has_function_result:
+# Filter out function result contents while preserving any other content types.
+has_function_result = any(content.type == "function_result" for content in message.contents)
+if has_function_result:
+    non_function_contents = [
+        content for content in message.contents if content.type != "function_result"
+    ]
+    # If there are non-function_result contents, include them; otherwise skip this message.
+    if non_function_contents:
+        # Preserve the original message object when possible to avoid losing metadata.
+        if len(non_function_contents) == len(message.contents):
+            result.append(message)
+        else:
+            result.append(
+                ChatMessage(
+                    role=message.role,
+                    contents=non_function_contents,
+                )
+            )
+else:

Comment on lines +517 to 531
for idx, message in enumerate(messages):
    if message.role in ["system", "developer"]:
        for text_content in [content for content in message.contents if content.type == "text"]:
            instructions_list.append(text_content.text)  # type: ignore[arg-type]
    elif use_response_chaining:
        # When using response chaining, only include messages after the last assistant message
        # These are the "new" messages from the current turn
        if idx > last_assistant_idx:
            # Also filter out function result messages
            has_function_result = any(content.type == "function_result" for content in message.contents)
            if not has_function_result:
                result.append(message)
        # Skip all messages at or before the last assistant message (already in server history)
    else:
        result.append(message)
Copilot AI Feb 6, 2026

The logic for system/developer messages doesn't respect the response chaining behavior correctly. If a system/developer message appears after the last assistant message when using response chaining, it will still be extracted to instructions (lines 518-520) and won't be included in the result list. However, the elif condition at line 521 will never be reached for these messages. This means system/developer messages after the last assistant message will be extracted to instructions but not appear in the result, which may not be the intended behavior. Consider whether system/developer messages should be handled differently in the response chaining scenario.

Comment on lines +331 to +406
async def test_prepare_messages_filters_old_function_results_with_previous_response_id(
    mock_project_client: MagicMock,
) -> None:
    """Test _prepare_messages_for_azure_ai filters old function results when using previous_response_id."""
    client = create_test_azure_ai_client(mock_project_client)

    # Simulate a multi-turn conversation with function calls
    messages = [
        # Turn 1 - user asks a question
        ChatMessage(role="user", contents=[Content.from_text(text="Calculate 15% tip on $85")]),
        # Turn 1 - assistant makes a function call
        ChatMessage(
            role="assistant",
            contents=[
                Content.from_function_call(
                    call_id="call_123", name="calculate_tip", arguments='{"bill_amount": 85, "tip_percent": 15}'
                )
            ],
        ),
        # Turn 1 - function result
        ChatMessage(
            role="user",
            contents=[Content.from_function_result(call_id="call_123", result="Tip: $12.75, Total: $97.75")],
        ),
        # Turn 1 - assistant responds with text
        ChatMessage(role="assistant", contents=[Content.from_text(text="The tip is $12.75")]),
        # Turn 2 - NEW user message
        ChatMessage(role="user", contents=[Content.from_text(text="Now calculate 20% tip on $85")]),
    ]

    # Test WITH previous_response_id (should filter to only new user message)
    options = {"conversation_id": "resp_turn1"}
    result_messages, instructions = client._prepare_messages_for_azure_ai(messages, options)  # type: ignore

    # Should only have the NEW user message from turn 2
    assert len(result_messages) == 1
    assert result_messages[0].role == "user"
    assert any(c.type == "text" for c in result_messages[0].contents)
    # Should not have function results
    assert not any(c.type == "function_result" for c in result_messages[0].contents)
    assert instructions is None


async def test_prepare_messages_includes_all_without_previous_response_id(
    mock_project_client: MagicMock,
) -> None:
    """Test _prepare_messages_for_azure_ai includes all messages without previous_response_id."""
    client = create_test_azure_ai_client(mock_project_client)

    # Same messages as previous test
    messages = [
        ChatMessage(role="user", contents=[Content.from_text(text="Calculate 15% tip on $85")]),
        ChatMessage(
            role="assistant",
            contents=[
                Content.from_function_call(
                    call_id="call_123", name="calculate_tip", arguments='{"bill_amount": 85, "tip_percent": 15}'
                )
            ],
        ),
        ChatMessage(
            role="user",
            contents=[Content.from_function_result(call_id="call_123", result="Tip: $12.75, Total: $97.75")],
        ),
        ChatMessage(role="assistant", contents=[Content.from_text(text="The tip is $12.75")]),
        ChatMessage(role="user", contents=[Content.from_text(text="Now calculate 20% tip on $85")]),
    ]

    # Test WITHOUT previous_response_id (should include all messages)
    options: dict[str, Any] = {}
    result_messages, instructions = client._prepare_messages_for_azure_ai(messages, options)  # type: ignore

    # Should have all non-system messages (5 in this case)
    assert len(result_messages) == 5
    assert instructions is None
Copilot AI Feb 6, 2026

The test coverage for the new filtering logic should include edge cases such as: (1) messages with mixed content types where some contents are function_result and others are not, (2) system/developer messages appearing after the last assistant message when using response chaining, and (3) scenarios where there is no assistant message in the history but response chaining is active. Consider adding test cases for these scenarios to ensure the filtering logic handles them correctly.

Comment on lines +507 to +528
# When using response chaining, find the index of the last assistant message
# Messages after that are "new" and should be included
last_assistant_idx = -1
if use_response_chaining:
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].role == "assistant":
            last_assistant_idx = i
            break

# System/developer messages are turned into instructions, since there is no such message roles in Azure AI.
-for message in messages:
+for idx, message in enumerate(messages):
    if message.role in ["system", "developer"]:
        for text_content in [content for content in message.contents if content.type == "text"]:
            instructions_list.append(text_content.text)  # type: ignore[arg-type]
    elif use_response_chaining:
        # When using response chaining, only include messages after the last assistant message
        # These are the "new" messages from the current turn
        if idx > last_assistant_idx:
            # Also filter out function result messages
            has_function_result = any(content.type == "function_result" for content in message.contents)
            if not has_function_result:
                result.append(message)
Copilot AI Feb 6, 2026

When using response chaining (conversation_id starts with "resp_") but there is no assistant message in the history (last_assistant_idx remains -1), the condition at line 524 (idx > last_assistant_idx) will be true for ALL messages (since all indices are >= 0). This means all non-system/developer messages after filtering will be included, which may not be the intended behavior. Consider handling the case where no assistant message exists in the conversation history when response chaining is active.
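The concern can be demonstrated with plain data — a sketch of the admission condition only, not the merged code:

```python
# Chaining is active, but the local history contains no assistant message:
messages = [
    {"role": "user", "type": "text"},             # possibly already in server history
    {"role": "user", "type": "function_result"},  # stale output
    {"role": "user", "type": "text"},             # the genuinely new message
]
last_assistant_idx = -1  # no assistant message was found

# idx > -1 holds for every index, so every non-function_result message is admitted:
admitted = [
    m for idx, m in enumerate(messages)
    if idx > last_assistant_idx and m["type"] != "function_result"
]
# Both text messages pass through, including ones the server may already have.
```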

Development

Successfully merging this pull request may close these issues.

Python: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions
