Skip to content

fix: preserve tool call/response pairing in forceCompression#871

Open
QuietyAwe wants to merge 2 commits intosipeed:mainfrom
QuietyAwe:fix/force-compression-tool-pairing
Open

fix: preserve tool call/response pairing in forceCompression#871
QuietyAwe wants to merge 2 commits intosipeed:mainfrom
QuietyAwe:fix/force-compression-tool-pairing

Conversation

@QuietyAwe
Copy link

Problem

The forceCompression function was cutting messages from the middle without considering tool call/response pairs, causing orphaned tool role messages that trigger API 400 errors:

"messages with role 'tool' must be a response to a preceeding message with 'tool_calls'"

Root Cause

When context limit is hit, forceCompression would simply cut at mid = len(conversation) / 2, which could break a tool call/response pair:

Before compression:

[0] user: "What's the weather?"
[1] assistant: [tool_calls: get_weather]
[2] tool: "Sunny, 25°C"
[3] assistant: "It's sunny..."
[4] user: "Thanks!"

After compression (mid=2, keeps [2], [3], [4]):

[0] system: "..."
[1] tool: "Sunny, 25°C"      // ❌ Orphaned! No preceding assistant with tool_calls
[2] assistant: "It's sunny..."
[3] user: "Thanks!"

Solution

  1. findSafeCutPoint() - Finds a safe cut point after a user message, ensuring tool call/response pairs are kept together. User messages are always safe to cut after because they don't have tool call dependencies.

  2. removeOrphanedToolMessages() - Safety net that removes any orphaned tool messages at the start of the kept conversation (edge case handling).

Testing

Added comprehensive unit tests:

  • TestFindSafeCutPoint - Tests various scenarios for finding safe cut points
  • TestRemoveOrphanedToolMessages - Tests orphaned message removal
  • TestForceCompressionPreservesToolPairs - Integration test for tool pair preservation

All tests pass ✅

Fixes #870

The forceCompression function was cutting messages from the middle without
considering tool call/response pairs, causing orphaned 'tool' role messages
that trigger API 400 errors:

  "messages with role 'tool' must be a response to a preceeding message
   with 'tool_calls'"

Changes:
- Add findSafeCutPoint() to find a safe cut point after a user message,
  which ensures tool call/response pairs are kept together
- Add removeOrphanedToolMessages() as a safety net to remove any orphaned
  tool messages at the start of the kept conversation
- Add comprehensive unit tests for both helper functions

Fixes sipeed#870
Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good approach with findSafeCutPoint — finding a user message boundary is a clean prevention strategy. A few observations:

  1. Coverage gap: This fix only covers forceCompression() but the same bug exists in summarizeSession() via TruncateHistory(key, 4). PR #665 covers both paths.

  2. Fallback to mid: findSafeCutPoint falls back to mid when no user message is found, which reintroduces the original bug. Edge case: conversations with very few user messages (e.g., one long user prompt followed by many tool calls).

  3. Orphaned assistant messages: removeOrphanedToolMessages strips leading tool role messages but does not handle orphaned assistant messages with ToolCalls at the end of the kept conversation. These would cause the provider to expect tool results that were cut away.

  4. PR #730: Already merged, addresses the same class of bug in sanitizeHistoryForProvider(). Consider whether this PR and #665 are still needed as additional layers of defense.

  5. Issue #870 vs #475: Issue #870 appears to be a duplicate of #475 (filed 2026-02-19), which is the issue our PR #665 closes.

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix for a real production issue. The root cause is well-explained: forceCompression was naively cutting at mid = len/2 without considering tool call/response pairing, which would leave orphaned tool role messages that trigger API 400 errors.

The approach is sound:

  1. findSafeCutPoint() searches for a user message boundary (always safe since user messages have no tool call dependencies)
  2. removeOrphanedToolMessages() acts as a safety net for edge cases

Code review notes:

Tests are thorough -- unit tests for findSafeCutPoint, removeOrphanedToolMessages, and an integration test for the full scenario. The test cases cover forward search, backward search, fallback, and tool pair preservation.

One edge case to consider (non-blocking): if findSafeCutPoint falls back to mid (no user messages found), the cut still happens at a potentially unsafe location, and removeOrphanedToolMessages only strips leading tool messages. If the cut happens in the middle of a multi-tool-call sequence (assistant with multiple tool_calls followed by multiple tool responses), some tool responses could survive while their corresponding tool_calls are removed. In practice this is extremely unlikely since conversations almost always have user messages, so the fallback path should never trigger in real usage.

Nit: test file is missing a trailing newline.

LGTM. This prevents a class of 400 errors that are otherwise very confusing to debug.

@QuietyAwe
Copy link
Author

Thanks for the thorough review @nikolasdehor! Great points. Let me address each:

1. Coverage gap & PR #665:
You're right that summarizeSession() has the same issue via TruncateHistory(). I'll review #665 to see if it covers both paths comprehensively. If so, this PR can serve as an additional defense layer focused specifically on forceCompression().

2. Fallback to mid:
Agreed this is a theoretical risk. In practice, conversations almost always have user messages, so the fallback should rarely trigger. However, I can add an additional check to ensure the fallback cut point doesn't orphan tool pairs — perhaps by extending findSafeCutPoint to also consider cutting after the last complete tool call/response sequence.

3. Orphaned assistant messages with ToolCalls:
Good catch! If we cut away tool responses but keep an assistant message with ToolCalls, the provider will expect results that don't exist. I'll add removeOrphanedAssistantWithToolCalls() to handle this edge case.

4. PR #730:
Good to know #730 already addresses this in sanitizeHistoryForProvider(). These fixes can coexist as defense-in-depth — #730 sanitizes before provider calls, while this PR prevents bad cuts at the source.

5. Issue #870 vs #475:
Thanks for pointing this out. I'll reference #475 in the PR description and close #870 as a duplicate.

6. Nit (trailing newline):
Will fix in the next commit.


I'll push updates soon to address #2, #3, and #6. Let me know if you have additional suggestions!

- Enhanced findSafeCutPoint to handle fallback cases by finding safe
  cut points after complete tool sequences when no user message exists
- Added removeOrphanedAssistantWithToolCalls to handle orphaned
  assistant messages with ToolCalls at the end of kept conversation
- Added comprehensive tests for new edge cases
- All tests pass

Addresses review feedback from @nikolasdehor on PR sipeed#871
@QuietyAwe
Copy link
Author

Updates pushed ✅

I've addressed the review feedback:

1. Improved findSafeCutPoint fallback handling

When no user message is found, the function now searches for a safe cut point after complete tool sequences instead of naively falling back to mid. It finds:

  • Assistant messages without tool_calls (safe cut points)
  • Complete tool_call → tool_result sequences (cuts after all results)

2. Added removeOrphanedAssistantWithToolCalls

This new function handles orphaned assistant messages with ToolCalls at the end of the kept conversation. It:

  • Detects assistant messages whose tool_calls don't have all their results
  • Strips ToolCalls and keeps text content if present
  • Removes entirely if no text content

3. Fixed test file trailing newline ✅

Test Results

All tests pass including new edge case tests:

  • TestFindSafeCutPoint_FallbackToToolSequence
  • TestRemoveOrphanedAssistantWithToolCalls (4 scenarios)

Note on PR #665: I see that #665 takes a complementary approach with sanitizeToolPairs() that can be applied after both forceCompression() and summarizeSession(). The approaches can coexist:

Both provide defense-in-depth against tool pair orphaning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] forceCompression breaks tool call/response pairing, causing 400 error

2 participants