Commit 77e4acf
Author: Mateusz
Parent: 2b735fb

feat(session): finalize B2BUA session handling and session continuity improvements

- Implement status-aware request deduplication to handle zombie retries and legitimate 429/503 retries.
- Enhance intelligent session resolution with structural evidence requirements for topic similarity matching to prevent cross-session contamination.
- Add comprehensive integration and unit tests for parallel agent isolation and status-aware deduplication.
- Update documentation and configuration examples for session continuity and B2BUA handling.

21 files changed: +3680 / −2732 lines

Changed files (large diffs not rendered):

- `.kiro/specs/b2bua-like-session-handling/design.md` (+602 / −578)
- `.kiro/specs/b2bua-like-session-handling/requirements.md` (+245 / −242)
- `.kiro/specs/b2bua-like-session-handling/tasks.md` (+184 / −176)

`config/config.example.yaml` (+12 / −0):

```yaml
session:
  cleanup_enabled: true
  cleanup_interval: 3600 # 1 hour

  # Session continuity (proxy-side heuristic session reuse when clients do not send session ids)
  session_continuity:
    enabled: true
    fuzzy_matching: true
    max_session_age_seconds: 604800 # 7 days
    fingerprint_message_count: 5
    client_key_includes_ip: true
    # Topic similarity is inherently weaker and can increase risk of cross-session merges.
    # Disabled by default. Enable only if you understand the trade-offs.
    enable_topic_similarity_matching: false

  max_age: 86400 # 1 day
  default_interactive_mode: true
  force_set_project: false
```
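To illustrate how the `fingerprint_message_count` and `client_key_includes_ip` options interact, here is a hedged sketch of a client/fingerprint key derivation. This is a hypothetical illustration, not the proxy's actual matching code; the function name, delimiter, and hash choice are assumptions:

```python
import hashlib

def continuity_key(client_ip: str, messages: list[str],
                   fingerprint_message_count: int = 5,
                   include_ip: bool = True) -> str:
    """Hypothetical sketch: derive a session-continuity key from the first
    N messages (mirroring fingerprint_message_count), optionally mixing in
    the client IP (mirroring client_key_includes_ip)."""
    # Only the first N messages contribute, so appending later turns
    # does not change the key and the session can still be re-matched.
    prefix = "\x1f".join(messages[:fingerprint_message_count])
    if include_ip:
        prefix = client_ip + "\x1f" + prefix
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()
```

A key derived this way stays stable as the conversation grows, which is the property heuristic session reuse depends on.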
New file (+156 / −0):
# Zombie Request Deduplication Fix

## Problem: Phantom Requests After Client Shutdown

**Production Issue (2026-01-25):** After stopping OpenCode clients, the proxy continued processing new incoming HTTP requests with identical payloads, indicating client-side retry logic that kept firing even after shutdown.

### Impact

- **Cost waste:** Each zombie retry consumed backend API quota (87k+ tokens per request)
- **Log pollution:** Repeated identical requests made debugging difficult
- **False metrics:** Inflated usage statistics
- **Session confusion:** Zombie retries interleaved with legitimate requests
## Root Cause

The proxy **was working correctly**: it processed every incoming HTTP request as designed. The real problems were:

1. **Streaming bypass:** Deduplication was disabled for all streaming requests
2. **Client bugs:** OpenCode's retry loop was not cleared on shutdown
3. **No status tracking:** The proxy could not distinguish zombie retries from legitimate 429 retries
## Solution: Status-Aware Deduplication

Enhanced `RequestDeduplicationService` to track each request's completion status and make status-aware duplicate decisions:

### Deduplication Matrix

| Original Status | Duplicate Arrives | Behavior | Reason |
|-----------------|-------------------|----------|--------|
| **IN_FLIGHT** | Any time | ❌ BLOCKED | True parallel duplicate |
| **SUCCESS (200)** | Within window | ❌ BLOCKED | Zombie retry after success |
| **RETRIABLE_ERROR (429, 503, 502, 504, 408)** | **Any time** | ✅ **ALLOWED** | **Legitimate retry** |
| **CLIENT_DISCONNECT** | Within window | ❌ BLOCKED | Zombie retry after disconnect |
| Any status | After window expires | ✅ ALLOWED | Window expired; treat as new |

### Critical Guarantee

**Retries after 429/503 errors are NEVER blocked, regardless of timing.**

This ensures the fix does not interfere with legitimate retry workflows while still preventing zombie request waste.
## Implementation Changes

### 1. Enhanced `RequestDeduplicationService`

```python
from dataclasses import dataclass
from enum import Enum, auto
import time

class RequestStatus(Enum):
    IN_FLIGHT = auto()
    SUCCESS = auto()
    RETRIABLE_ERROR = auto()      # 429, 503, 502, 504, 408
    CLIENT_DISCONNECT = auto()

@dataclass
class TrackedRequest:
    timestamp: float
    status: RequestStatus
    status_code: int | None = None

async def check_and_register(self, request, session_id) -> tuple[bool, str]:
    """Simplified excerpt: returns (is_duplicate, request_hash).
    Hash computation and cache lookup are elided into self._lookup()."""
    request_hash, tracked = self._lookup(request, session_id)

    # CRITICAL: always allow retries after retriable errors
    if tracked is not None and tracked.status is RequestStatus.RETRIABLE_ERROR:
        return (False, request_hash)  # not a duplicate

    # Block duplicates of in-flight, successful, or disconnected requests
    if tracked is not None and time.monotonic() - tracked.timestamp < self._window_seconds:
        return (True, request_hash)  # is a duplicate

    return (False, request_hash)  # unseen or window expired: treat as new
```
61+
62+
### 2. Updated `BackendRequestManager`

- Removed the streaming bypass (deduplication now applies to all requests)
- Calls `mark_request_complete()` with the status code after each request completes
- Handles client disconnects (`asyncio.CancelledError`)
- Preserves the `x-llmproxy-no-dedup` header for opt-out
### 3. Enabled Streaming Deduplication

**Before:**

```python
if request.stream:
    return True  # Bypass deduplication
```

**After:**

```python
# Only bypass if explicitly requested via header
if headers.get("x-llmproxy-no-dedup") == "true":
    return True
return False  # Apply deduplication
```
## Configuration

No configuration changes are required; the existing deduplication settings apply:

```yaml
# Default values (configured via DI registration)
deduplication:
  window_seconds: 3.0   # Block duplicates within 3 seconds
  enabled: true         # Now applies to streaming too
  max_cache_size: 10000
```

To opt out (for specific clients):

```bash
curl -H "x-llmproxy-no-dedup: true" ...
```
## Test Coverage

### Unit Tests (22/22 passed)

- ✅ `test_retry_after_429_always_allowed` - Never blocks 429 retries
- ✅ `test_retry_after_503_allowed` - Allows service-unavailable retries
- ✅ `test_retry_after_success_blocked` - Blocks zombie retries after 200
- ✅ `test_retry_after_client_disconnect_blocked` - Blocks zombie retries after disconnect
- ✅ `test_parallel_duplicate_blocked` - Blocks true parallel duplicates
- ✅ `test_multiple_retries_after_429_allowed` - Allows retry loops
- ✅ `test_zombie_pattern_detection` - Reproduces the production scenario

### Integration Tests (4/4 passed)

- ✅ `test_streaming_dedup_enabled_for_streaming_requests` - Streaming requests are now deduplicated
- ✅ `test_streaming_dedup_bypass_via_header` - Opt-out still works
- ✅ Backend request manager integration tests
## Backward Compatibility

- ✅ Legitimate 429 retries: **Unaffected** (always allowed)
- ✅ Normal workflows: **Unaffected** (only identical requests are deduplicated)
- ✅ Opt-out header: **Still works** (`x-llmproxy-no-dedup: true`)
- ⚠️ Breaking: Streaming requests are now deduplicated (previously bypassed)

**Migration:** If clients rely on the streaming bypass, add the `x-llmproxy-no-dedup: true` header.
## Production Verification

The fix prevents the exact scenario seen in production logs:

```
# Before fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
01:53:42 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
...continues indefinitely

# After fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects → marked as CLIENT_DISCONNECT
01:53:42 - Client sends SAME request → BLOCKED (duplicate)
```
## Monitoring

Check deduplication stats via the diagnostics endpoint:

```python
stats = dedup_service.get_stats()
print(f"Retries after errors: {stats.extra['retries_after_error_allowed']}")
print(f"Zombies blocked: {stats.duplicates_blocked}")
```

High `duplicates_blocked` combined with low `retries_after_error_allowed` indicates zombie request patterns.
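As a hedged illustration, the two counters could be combined into a simple alert heuristic. The stats shape mirrors the fields used above, but the class name, threshold, and ratio are assumptions for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class DedupStats:
    """Hypothetical stats shape matching the fields read above."""
    duplicates_blocked: int
    extra: dict = field(default_factory=dict)

def looks_like_zombie_pattern(stats: DedupStats, min_blocked: int = 10) -> bool:
    """Heuristic: many blocked duplicates but few allowed error retries
    suggests zombie retries rather than legitimate 429/503 retry loops."""
    allowed = stats.extra.get("retries_after_error_allowed", 0)
    return (stats.duplicates_blocked >= min_blocked
            and allowed < stats.duplicates_blocked // 10)
```

The thresholds here are arbitrary starting points; tune them against real traffic before alerting on them.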
