Python: Fix: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions (#3722)
Conversation
When using Azure AI Responses API with previous_response_id (response chaining), the server maintains the full conversation history. We should not re-submit old function results and assistant messages as they're already part of the server-side history. This fixes the bug where AgentThread was re-submitting tool outputs from previous turns, causing Azure AI to reject requests with error: "No tool call found for function call output with call_id". Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Added unit tests to verify that when using previous_response_id (response chaining), old function results and assistant messages are filtered out and only new user messages are sent. Also updated existing tests to pass required parameters. Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Simplified the filtering condition to only check for function_result content since function_call content should never appear in user messages. Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Applied ruff formatting as requested in code review to ensure consistent code style across the changes. Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
@copilot Run the unit tests
- Updated _prepare_messages_for_azure_ai to find last assistant message and only include messages after it when using response chaining - Fixed test_multi_turn_with_previous_response_id_filters_old_messages to avoid message_store/service_thread_id conflict - All azure-ai package tests now pass (246 passed, 43 skipped) Co-authored-by: markwallace-microsoft <127216156+markwallace-microsoft@users.noreply.github.com>
Done. Fixed test failures in commit 71e1810. All azure-ai package tests now pass (246 passed, 43 skipped).

Warning: firewall rules blocked me from connecting to one or more addresses.

If you need me to access, download, or install something from one of these locations, you can either:
Pull request overview
This PR fixes a critical bug in the Azure AI integration where multi-turn conversations with @tool decorated functions would re-submit old function call outputs from previous turns, causing Azure AI to reject requests with a 400 error. The root cause was that AgentThread.message_store accumulates all messages including function results, and these were being re-submitted on each turn when using the Azure AI Responses API's previous_response_id pattern.
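The detection step the fix relies on is small enough to sketch. The following is an illustrative stand-alone function, not the PR's actual code; the `conversation_id` key and the `resp_`/`conv_` prefixes come from the PR description:

```python
# Hedged sketch of response-chaining detection: a conversation_id that is
# really a previous response id (resp_*) means the server already holds
# the history, while conv_* ids keep the old send-everything behavior.
def uses_response_chaining(options: dict) -> bool:
    conversation_id = options.get("conversation_id") or ""
    return conversation_id.startswith("resp_")

print(uses_response_chaining({"conversation_id": "resp_turn1"}))  # True
print(uses_response_chaining({"conversation_id": "conv_abc"}))    # False
print(uses_response_chaining({}))                                 # False
```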
Changes:
- Modified `_prepare_messages_for_azure_ai()` to intelligently filter messages when using response chaining (conversation_id starting with "resp_"), only including messages after the last assistant message
- Updated the method signature to accept `options` and `**kwargs` parameters for response chaining detection
- Added comprehensive test coverage for the message filtering logic in both unit and integration tests
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Enhanced _prepare_messages_for_azure_ai() with response chaining detection and message filtering logic; updated _prepare_options() to pass options to the message preparation method |
| python/packages/azure-ai/tests/test_azure_ai_client.py | Updated existing tests to pass new parameters to _prepare_messages_for_azure_ai(); added two new tests verifying filtering behavior with and without previous_response_id |
| python/packages/azure-ai/tests/test_multi_turn_function_tools.py | New test file with comprehensive multi-turn scenarios testing the fix for function tool re-submission issue |
```python
# Count function_call_output items in turn 2
sum(1 for item in turn2_input if isinstance(item, dict) and item.get("type") == "function_call_output")
```

The result of this `sum()` call is not assigned to a variable or used in any assertion, so the line is dead code. Either remove it or assign the result to a variable if it was meant to be used for verification.

Suggested change: delete both lines.
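If the count was meant to feed an assertion, one hedged fix looks like this; `turn2_input` is stubbed here as a minimal stand-in for the real request payload:

```python
# Hypothetical minimal stand-in for the items sent in turn 2.
turn2_input = [
    {"type": "message", "role": "user", "content": "Now calculate 20% tip on $85"},
]

# Assign the count and assert on it instead of discarding the result.
output_count = sum(
    1
    for item in turn2_input
    if isinstance(item, dict) and item.get("type") == "function_call_output"
)
assert output_count == 0, "turn 2 must not re-submit function_call_output items"
```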
```python
# Also filter out function result messages
has_function_result = any(content.type == "function_result" for content in message.contents)
if not has_function_result:
```

When a message contains both function_result content AND other content types (e.g., text), the entire message is filtered out if any content is a function_result. This could lead to loss of non-function-result content. Consider filtering at the content level rather than the message level, or creating a new message with only the non-function-result content items.

Suggested change:

```python
# Filter out function result contents while preserving any other content types.
has_function_result = any(content.type == "function_result" for content in message.contents)
if has_function_result:
    non_function_contents = [
        content for content in message.contents if content.type != "function_result"
    ]
    # If there are non-function_result contents, include them; otherwise skip this message.
    if non_function_contents:
        # Preserve the original message object when possible to avoid losing metadata.
        if len(non_function_contents) == len(message.contents):
            result.append(message)
        else:
            result.append(
                ChatMessage(
                    role=message.role,
                    contents=non_function_contents,
                )
            )
else:
```
```python
for idx, message in enumerate(messages):
    if message.role in ["system", "developer"]:
        for text_content in [content for content in message.contents if content.type == "text"]:
            instructions_list.append(text_content.text)  # type: ignore[arg-type]
    elif use_response_chaining:
        # When using response chaining, only include messages after the last assistant message
        # These are the "new" messages from the current turn
        if idx > last_assistant_idx:
            # Also filter out function result messages
            has_function_result = any(content.type == "function_result" for content in message.contents)
            if not has_function_result:
                result.append(message)
        # Skip all messages at or before the last assistant message (already in server history)
    else:
        result.append(message)
```
The logic for system/developer messages doesn't respect the response chaining behavior correctly. If a system/developer message appears after the last assistant message when using response chaining, it will still be extracted to instructions (lines 518-520) and won't be included in the result list. However, the elif condition at line 521 will never be reached for these messages. This means system/developer messages after the last assistant message will be extracted to instructions but not appear in the result, which may not be the intended behavior. Consider whether system/developer messages should be handled differently in the response chaining scenario.
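One way to make the concern concrete: a plain-Python sketch (the `Msg` class and `split_messages` function are stand-ins, not the framework's API) showing that a system message arriving after the last assistant turn still ends up in instructions rather than in the forwarded messages:

```python
from dataclasses import dataclass, field


@dataclass
class Msg:  # stand-in for ChatMessage; field names are illustrative
    role: str
    texts: list = field(default_factory=list)


def split_messages(messages, use_chaining):
    """Sketch: extract system/developer text to instructions; when chaining,
    forward only messages newer than the last assistant turn."""
    last_assistant = max(
        (i for i, m in enumerate(messages) if m.role == "assistant"),
        default=-1,
    )
    instructions, result = [], []
    for i, m in enumerate(messages):
        if m.role in ("system", "developer"):
            # Note: instructions are extracted even for "new" system
            # messages -- exactly the behavior this review questions.
            instructions.extend(m.texts)
        elif not use_chaining or i > last_assistant:
            result.append(m)
    return instructions, result


msgs = [
    Msg("user", ["hi"]),
    Msg("assistant", ["hello"]),
    Msg("system", ["be terse"]),   # appears AFTER the last assistant message
    Msg("user", ["next question"]),
]
instructions, result = split_messages(msgs, use_chaining=True)
# The late system message becomes instructions, not part of result.
```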
```python
async def test_prepare_messages_filters_old_function_results_with_previous_response_id(
    mock_project_client: MagicMock,
) -> None:
    """Test _prepare_messages_for_azure_ai filters old function results when using previous_response_id."""
    client = create_test_azure_ai_client(mock_project_client)

    # Simulate a multi-turn conversation with function calls
    messages = [
        # Turn 1 - user asks a question
        ChatMessage(role="user", contents=[Content.from_text(text="Calculate 15% tip on $85")]),
        # Turn 1 - assistant makes a function call
        ChatMessage(
            role="assistant",
            contents=[
                Content.from_function_call(
                    call_id="call_123", name="calculate_tip", arguments='{"bill_amount": 85, "tip_percent": 15}'
                )
            ],
        ),
        # Turn 1 - function result
        ChatMessage(
            role="user",
            contents=[Content.from_function_result(call_id="call_123", result="Tip: $12.75, Total: $97.75")],
        ),
        # Turn 1 - assistant responds with text
        ChatMessage(role="assistant", contents=[Content.from_text(text="The tip is $12.75")]),
        # Turn 2 - NEW user message
        ChatMessage(role="user", contents=[Content.from_text(text="Now calculate 20% tip on $85")]),
    ]

    # Test WITH previous_response_id (should filter to only new user message)
    options = {"conversation_id": "resp_turn1"}
    result_messages, instructions = client._prepare_messages_for_azure_ai(messages, options)  # type: ignore

    # Should only have the NEW user message from turn 2
    assert len(result_messages) == 1
    assert result_messages[0].role == "user"
    assert any(c.type == "text" for c in result_messages[0].contents)
    # Should not have function results
    assert not any(c.type == "function_result" for c in result_messages[0].contents)
    assert instructions is None


async def test_prepare_messages_includes_all_without_previous_response_id(
    mock_project_client: MagicMock,
) -> None:
    """Test _prepare_messages_for_azure_ai includes all messages without previous_response_id."""
    client = create_test_azure_ai_client(mock_project_client)

    # Same messages as previous test
    messages = [
        ChatMessage(role="user", contents=[Content.from_text(text="Calculate 15% tip on $85")]),
        ChatMessage(
            role="assistant",
            contents=[
                Content.from_function_call(
                    call_id="call_123", name="calculate_tip", arguments='{"bill_amount": 85, "tip_percent": 15}'
                )
            ],
        ),
        ChatMessage(
            role="user",
            contents=[Content.from_function_result(call_id="call_123", result="Tip: $12.75, Total: $97.75")],
        ),
        ChatMessage(role="assistant", contents=[Content.from_text(text="The tip is $12.75")]),
        ChatMessage(role="user", contents=[Content.from_text(text="Now calculate 20% tip on $85")]),
    ]

    # Test WITHOUT previous_response_id (should include all messages)
    options: dict[str, Any] = {}
    result_messages, instructions = client._prepare_messages_for_azure_ai(messages, options)  # type: ignore

    # Should have all non-system messages (5 in this case)
    assert len(result_messages) == 5
    assert instructions is None
```
|
The test coverage for the new filtering logic should include edge cases such as: (1) messages with mixed content types where some contents are function_result and others are not, (2) system/developer messages appearing after the last assistant message when using response chaining, and (3) scenarios where there is no assistant message in the history but response chaining is active. Consider adding test cases for these scenarios to ensure the filtering logic handles them correctly.
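As a starting point, the first and third edge cases can be pinned down against a content-level filter. This plain-dict sketch (`filter_new_messages` is illustrative, not the framework's API) shows what such tests might assert:

```python
def filter_new_messages(messages, last_assistant_idx):
    """Keep messages after the last assistant turn, dropping only the
    function_result contents while preserving any other content."""
    result = []
    for idx, msg in enumerate(messages):
        if idx <= last_assistant_idx:
            continue  # already in server-side history
        kept = [c for c in msg["contents"] if c["type"] != "function_result"]
        if kept:
            result.append({**msg, "contents": kept})
    return result


# Edge case 1: mixed content -- the text survives, the function result is dropped.
mixed = [
    {"role": "assistant", "contents": [{"type": "text", "text": "done"}]},
    {"role": "user", "contents": [
        {"type": "function_result", "call_id": "c1", "result": "42"},
        {"type": "text", "text": "and explain the math"},
    ]},
]
out = filter_new_messages(mixed, last_assistant_idx=0)

# Edge case 3: chaining active but no assistant message in history --
# everything counts as new.
first_turn = [{"role": "user", "contents": [{"type": "text", "text": "hi"}]}]
out_first = filter_new_messages(first_turn, last_assistant_idx=-1)
```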
```python
# When using response chaining, find the index of the last assistant message
# Messages after that are "new" and should be included
last_assistant_idx = -1
if use_response_chaining:
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].role == "assistant":
            last_assistant_idx = i
            break

# System/developer messages are turned into instructions, since there is no such message roles in Azure AI.
for idx, message in enumerate(messages):
    if message.role in ["system", "developer"]:
        for text_content in [content for content in message.contents if content.type == "text"]:
            instructions_list.append(text_content.text)  # type: ignore[arg-type]
    elif use_response_chaining:
        # When using response chaining, only include messages after the last assistant message
        # These are the "new" messages from the current turn
        if idx > last_assistant_idx:
            # Also filter out function result messages
            has_function_result = any(content.type == "function_result" for content in message.contents)
            if not has_function_result:
                result.append(message)
```
When using response chaining (conversation_id starts with "resp_") but there is no assistant message in the history (last_assistant_idx remains -1), the condition at line 524 (idx > last_assistant_idx) will be true for ALL messages (since all indices are >= 0). This means all non-system/developer messages after filtering will be included, which may not be the intended behavior. Consider handling the case where no assistant message exists in the conversation history when response chaining is active.
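A hedged sketch of one possible explicit guard; names mirror the diff, and whether "send everything" is the right fallback when no assistant message exists is a design decision for the maintainers:

```python
def messages_to_send(roles, use_response_chaining):
    """Return indices of messages to send; roles is a stand-in for the
    real message list, keyed by role only."""
    last_assistant_idx = -1
    for i in range(len(roles) - 1, -1, -1):
        if roles[i] == "assistant":
            last_assistant_idx = i
            break
    if not use_response_chaining:
        return list(range(len(roles)))
    if last_assistant_idx == -1:
        # Explicit decision: with no local assistant turn, treat every
        # message as new rather than relying on idx > -1 by accident.
        return list(range(len(roles)))
    return [i for i in range(len(roles)) if i > last_assistant_idx]


print(messages_to_send(["user", "assistant", "user"], True))   # [2]
print(messages_to_send(["user", "user"], True))                # [0, 1]
```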
Motivation and Context

When using `AzureAIProjectAgentProvider` with `@tool`-decorated functions and a persistent `AgentThread`, subsequent turns re-submit tool call outputs from previous turns, causing Azure AI to reject requests with: "No tool call found for function call output with call_id".

Root Cause: The Azure AI Responses API uses a `previous_response_id` pattern for chaining responses, where the server maintains the full conversation history. However, `AgentThread.message_store` accumulates ALL messages, including `function_result` content from prior turns, and these were being re-submitted on each turn. Azure AI rejects `function_call_output` items without a corresponding `function_call` in the SAME request.

Description

Modified `AzureAIClient._prepare_messages_for_azure_ai()` to intelligently filter messages when using response chaining.

The Fix:
- Detect when `previous_response_id` is being used (conversation_id starts with "resp_")
- Filter out `function_result` messages, even from new messages

Backward Compatibility:
- Without `previous_response_id` (first turn or conv_* IDs): all messages are sent as before
- With `previous_response_id` (resp_* IDs): only messages after the last assistant message are sent

Changes Made:

`python/packages/azure-ai/agent_framework_azure_ai/_client.py`
- Updated `_prepare_options()` to pass options to `_prepare_messages_for_azure_ai()`
- Enhanced `_prepare_messages_for_azure_ai()` with message filtering logic that finds the last assistant message and only includes subsequent messages when using response chaining

`python/packages/azure-ai/tests/test_azure_ai_client.py`
- Added `test_prepare_messages_filters_old_function_results_with_previous_response_id()`
- Added `test_prepare_messages_includes_all_without_previous_response_id()`

`python/packages/azure-ai/tests/test_multi_turn_function_tools.py`
- New test file with comprehensive multi-turn scenarios

Impact:

This fix enables multi-turn conversations with `@tool` functions in `AzureAIProjectAgentProvider`, resolving a core use case blocker for the framework.

Contribution Checklist
Original prompt
This section details the original issue you should resolve
<issue_title>Python: AgentThread re-submits prior tool outputs on subsequent turns with @tool functions</issue_title>
<issue_description>## Description
When using `AzureAIProjectAgentProvider` with `@tool`-decorated functions and a persistent `AgentThread`, subsequent turns re-submit tool call outputs from previous turns, causing Azure AI to reject the request with:

Environment
Minimal Reproduction
Expected Behavior
Multi-turn conversations with `@tool` functions should work on the same `AgentThread`. Turn 2 should only submit the new user message, not re-submit tool outputs from Turn 1.

Actual Behavior

Turn 1 succeeds, but Turn 2 fails immediately with:

The error indicates that `AgentThread` accumulates `function_call_output` items from prior turns and re-submits them on subsequent turns. Azure AI rejects these because there's no corresponding `function_call` in the current request.

Analysis

The bug appears to be in how `AgentThread` manages conversation state when function tools are used. The `AgentThread` should either:

Observed Patterns

- (`@tool`) on same thread
- `previous_response_id` pattern

Workarounds

1. Use fresh thread per turn (loses context)
2. Use SDK-level `previous_response_id` pattern
3. Use hosted tools instead of function tools

Hosted tools (CodeInterpreter, FileSearch, WebSearch) work correctly with multi-turn on the same thread.

Impact

This bug makes `@tool` functions unusable for multi-turn conversations with `AzureAIProjectAgentProvider`, which is a core use case for the framework.

Related Issues
</issue_description>
<agent_instructions>Fix the ...