
Commit d5e93b0

Mateusz committed

Update B2BUA spec phase and optimize test performance

- Move B2BUA-like session handling spec to design-generated phase
- Add design and research documentation for B2BUA feature
- Optimize property tests by reducing example counts for better performance
- Adjust regression test parameters for improved execution time
1 parent f0ba058 commit d5e93b0


9 files changed: +658 / -67 lines changed


.kiro/specs/b2bua-like-session-handling/design.md

Lines changed: 455 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# Research & Design Decisions

---
**Purpose**: Capture discovery findings, architectural investigations, and rationale that inform the technical design.

**Project Context**: Universal LLM Proxy - FastAPI async, DI containers, staged initialization, adapter pattern.

---

## Summary
- **Feature**: `b2bua-like-session-handling`
- **Discovery Scope**: Complex Integration (Extension)
- **Key Findings**:
  - The codebase currently uses a single `session_id` across session state, backend execution, usage tracking, and wire capture; several paths treat client inputs as authoritative, and some subsystems fall back to `request_id` as a “session-like” identifier.
  - Backend orchestration already supports multi-attempt failover, but has no first-class per-attempt identity; the same identifier is reused across attempts, which blocks correct A-leg→B-leg mapping.
  - Multi-user auth enforcement (SSO middleware) gates requests but does not attach a stable token identity to the request context; session continuity scoping therefore needs an explicit `auth_scope_id` derivation/injection strategy.

## Research Log

### Existing Codebase Analysis
- **Components Reviewed**:
  - RequestContext + adapters: `src/core/domain/request_context.py`, `src/core/transport/fastapi/request_adapters.py`
  - Session resolution and continuity: `src/core/services/session_resolver_service.py`, `src/core/services/intelligent_session_resolver.py`, `src/request_middleware.py`
  - Session-scoped state: `src/core/domain/session.py`, `src/core/services/session_enricher.py`, `src/core/services/session_manager_service.py`
  - Backend orchestration and multi-attempt flows: `src/core/services/backend_completion_flow/service.py`, `src/core/services/backend_completion_flow/completion_session_resolver.py`
  - Connector dispatch: `src/core/services/connector_invoker.py`, connectors under `src/connectors/`
  - Wire capture and fallbacks: `src/core/services/cbor_wire_capture_service.py`, `src/core/transport/fastapi/adapters/capture/wire_capture_coordinator.py`, `src/core/services/stream_session_id_resolver.py`
  - Usage tracking persistence: `src/core/domain/usage_record.py`, `src/core/database/models/usage.py`
  - Auth gating: `src/core/app/middleware/sso_middleware_adapter.py`, `src/core/auth/sso/middleware.py`
- **Patterns Identified**:
  - Staged init and DI seams are stable and support introducing new services behind flags.
  - Backend calls are centralized in `BackendCompletionFlow`, which is the natural boundary for per-attempt (B-leg) identity allocation.
  - `RequestContext.extensions` exists as the explicitly allowed JSON-safe extension channel for cross-layer data, which can bridge identity propagation during migration.
- **Implications**:
  - The design should introduce a dedicated identity/mapping service and thread A-leg and B-leg identities through existing DI seams rather than continuing to overload the single `session_id`.
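
`RequestContext.extensions` is described as the JSON-safe channel for cross-layer data. A migration-era bridge could therefore stash identities there under a namespaced key. The following is a minimal sketch under that assumption. `RequestContext` here is a stand-in that models only `extensions`, and `put_identity`, `get_identity`, and the `b2bua.` key prefix are hypothetical names, not the project's API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RequestContext:
    """Hypothetical stand-in for the project's RequestContext; only `extensions` is modeled."""

    extensions: dict[str, Any] = field(default_factory=dict)


# Hypothetical namespace so migration-era identity keys cannot collide with other extensions.
_IDENTITY_PREFIX = "b2bua."


def put_identity(ctx: RequestContext, name: str, value: Any) -> None:
    """Store an identity value in ctx.extensions, rejecting anything that is not JSON-safe."""
    json.dumps(value)  # raises TypeError for non-JSON-serializable values (e.g. a set)
    ctx.extensions[_IDENTITY_PREFIX + name] = value


def get_identity(ctx: RequestContext, name: str) -> Optional[str]:
    """Read an identity back; returns None when it was never set or is not a string."""
    value = ctx.extensions.get(_IDENTITY_PREFIX + name)
    return value if isinstance(value, str) else None
```

Validating with `json.dumps` at write time keeps the "JSON-safe" contract enforced where the data enters, instead of failing later during serialization.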

### Session Identity Semantics Collisions
- **Context**: Multiple subsystems assume `session_id` is both (1) the key for session-scoped state and (2) a backend-facing correlation identifier.
- **Findings**:
  - `BackendProcessor` loads session state using `session_id` (`src/core/services/backend_processor.py`), implying the “session_id” on that boundary must be the A-leg identifier.
  - Backend completion uses a `session_id_for_backend` value for both per-session backend caches and backend-call kwargs (`src/core/services/backend_completion_flow/service.py`), which must be split into A-leg vs B-leg identifiers.
  - Some components fall back to `request_id` for “session-like” correlation (`src/core/services/stream_session_id_resolver.py`, `src/core/transport/fastapi/adapters/capture/wire_capture_coordinator.py`), which violates the session-vs-request separation requirement.
- **Implications**:
  - The design must make A-leg identity the only key for session-scoped state and ensure B-leg identity is used only for outbound provider session/conversation correlation and per-attempt observability.
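
The split of `session_id_for_backend` into two roles can be made explicit with a small typed pair. In this sketch, `LegIdentity`, `backend_cache_key`, and `backend_call_kwargs` are hypothetical names. The `session_id` kwarg name is an assumption about the connector signature.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LegIdentity:
    """Hypothetical replacement for the single overloaded `session_id_for_backend` value."""

    a_session_id: str  # A-leg: internal key for session-scoped state and per-session caches
    b_session_id: str  # B-leg: per-attempt identifier that may be sent to the provider


def backend_cache_key(identity: LegIdentity, backend_name: str) -> tuple[str, str]:
    """Per-session backend caches are keyed by the A-leg, so all attempts of one session share them."""
    return (identity.a_session_id, backend_name)


def backend_call_kwargs(identity: LegIdentity) -> dict[str, str]:
    """Outbound kwargs carry only the B-leg; the A-leg never reaches the provider."""
    # "session_id" as the kwarg name is an assumption about the connector signature.
    return {"session_id": identity.b_session_id}
```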

### Auth Scope Identity Availability
- **Context**: Requirements scope continuity to (`auth_scope_id`, `client_session_id`), and the default multi-user behavior is “resume only with the same bearer token”.
- **Findings**:
  - SSO middleware validates tokens but does not propagate token identity or user identity into `RequestContext` / `request.state` (`src/core/app/middleware/sso_middleware_adapter.py`, `src/core/auth/sso/middleware.py`).
  - The SSO token database yields stable identifiers (`token_id`, `user_id`) once validated; those are suitable sources for `auth_scope_id` in multi-user mode (token-scoped by default).
  - In single-user localhost mode, there is no token; continuity needs an explicit “single implicit scope” representation.
- **Implications**:
  - The design needs an explicit `IAuthScopeResolver` (or equivalent) and a transport integration point (middleware/controller adapter) to inject `auth_scope_id` into `RequestContext` in a reliable, testable way.

### Concurrency and Atomic B-leg Sequencing
- **Context**: Requirement 2.5 demands atomic per-A-leg sequence allocation even under concurrency and multiple worker processes.
- **Findings**:
  - In-memory counters are not safe under multi-process deployments, because each process holds its own copy; atomicity needs a shared coordination mechanism.
  - The project already includes a DB layer (SQLModel/Alembic) and uses SQLite for other features; a persistence-backed counter can provide atomic increments when configured.
- **Implications**:
  - The design should support both in-memory (single-process) and persistent (multi-process-safe) sequence allocation with clear configuration gating and failure modes.
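
The in-memory (single-process) variant only needs the read-increment-write on a per-A-leg counter to be atomic with respect to other threads. The following is a minimal sketch; `InMemorySequenceAllocator` and `next_seq` are hypothetical names. As the findings note, this does nothing for multiple worker processes.

```python
import threading
from collections import defaultdict


class InMemorySequenceAllocator:
    """Hypothetical per-A-leg <seq> allocator; safe across threads within ONE process only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: defaultdict[str, int] = defaultdict(int)

    def next_seq(self, a_session_id: str) -> int:
        # Holding the lock makes increment-and-read a single atomic step per A-leg.
        # No await happens inside, so it is also safe to call from asyncio code.
        with self._lock:
            self._counters[a_session_id] += 1
            return self._counters[a_session_id]
```

Under contention, every allocated value is unique and the sequence has no gaps. Each A-leg starts at 1 independently.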

### Wire Capture Schema Evolution
- **Context**: The requirements call for capturing `a_session_id` and `b_session_id` as distinct fields.
- **Findings**:
  - Capture metadata currently supports only a single `session_id` field (`src/core/domain/cbor_capture.py`), and several call sites supply `session_id` from `context.session_id`.
  - `CborWireCaptureService` also falls back to `request_id` or an internal capture file id when `session_id` is empty (`src/core/services/cbor_wire_capture_service.py`).
- **Implications**:
  - The design should add explicit metadata fields for A-leg and B-leg IDs and remove the “request_id as session fallback” behavior when B2BUA mode is enabled (preserving legacy behavior when disabled).
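
A backward-compatible way to express this is to add optional fields next to the existing `session_id` and apply the legacy fallback only when the flag is off. This sketch models only the `request_id` fallback, not the capture-file-id one. `CaptureMetadata` and `build_capture_metadata` are hypothetical names, not the contract in `cbor_capture.py`. Mirroring the A-leg into the legacy field is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureMetadata:
    """Hypothetical capture metadata; the real contract is in src/core/domain/cbor_capture.py."""

    session_id: str                     # existing field, kept for backward-compatible tooling
    a_session_id: Optional[str] = None  # added: A-leg identity
    b_session_id: Optional[str] = None  # added: B-leg identity for this attempt


def build_capture_metadata(
    *,
    b2bua_enabled: bool,
    context_session_id: Optional[str],
    request_id: str,
    a_session_id: Optional[str] = None,
    b_session_id: Optional[str] = None,
) -> CaptureMetadata:
    if not b2bua_enabled:
        # Legacy behavior preserved: an empty session id falls back to request_id.
        return CaptureMetadata(session_id=context_session_id or request_id)
    if not a_session_id:
        # No request_id fallback in B2BUA mode: a missing A-leg is an error, not a default.
        raise ValueError("a_session_id is required when B2BUA mode is enabled")
    # Assumption: the legacy field mirrors the A-leg so older inspection tools keep working.
    return CaptureMetadata(
        session_id=a_session_id, a_session_id=a_session_id, b_session_id=b_session_id
    )
```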

## Architecture Pattern Evaluation

| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| Extend existing `session_id` semantics | Reinterpret `session_id` as A-leg everywhere; derive B-leg ad hoc | Fewer new types | High risk of leakage and subtle drift | Not preferred |
| Dedicated identity service | Introduce explicit identity/mapping service behind DI seam | Clear boundaries, testable | Requires threading identities through orchestrators | Preferred foundation |
| Hybrid staged migration | Add identity service + migrate highest-leverage seams first | Reduces rollout risk | Mixed-model period requires discipline | Recommended approach |

## Design Decisions

### Decision: Treat all client-provided session identifiers as untrusted metadata
- **Context**: Prevent spoofing/session fixation and eliminate identifier leaks.
- **Alternatives Considered**:
  1. Trust `x-session-id` as the canonical session key
  2. Treat it as metadata and map it to internal A-leg IDs
- **Selected Approach**: Store client-provided identifiers as `client_session_id` metadata only; never use them as canonical internal IDs or forward them upstream.
- **Rationale**: Aligns with security requirements and B2BUA isolation.
- **Trade-offs**: Requires a continuity mapping store to preserve usability for clients that expect stable sessions.
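
The continuity mapping store can be sketched as a dictionary keyed by (`auth_scope_id`, `client_session_id`) that always returns an internally minted A-leg. The sketch also meets the "O(1) with TTL and bounded growth" target listed under Performance Considerations. Several choices are assumptions: `ContinuityMap`, the `a-<hex>` ID format, the sliding TTL, and LRU eviction are illustrative, and the project's store may differ.

```python
from __future__ import annotations

import time
import uuid
from collections import OrderedDict
from typing import Callable, Optional


class ContinuityMap:
    """Hypothetical map of (auth_scope_id, client_session_id) -> internal A-leg id.

    Entries expire after `ttl_seconds` of inactivity (sliding TTL), and the map never
    holds more than `max_entries` keys (least recently used entries are evicted first).
    """

    def __init__(self, ttl_seconds: float, max_entries: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max_entries = max_entries
        self._clock = clock
        self._entries: OrderedDict[tuple[str, str], tuple[str, float]] = OrderedDict()

    def resolve(self, auth_scope_id: str, client_session_id: Optional[str]) -> str:
        """Return the A-leg to use, minting a fresh one when there is nothing valid to resume."""
        if not client_session_id:
            return self._mint()  # client sent no id: nothing to resume, nothing to store
        now = self._clock()
        key = (auth_scope_id, client_session_id)
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            a_session_id = entry[0]
        else:
            # The client value is metadata only; the A-leg is always generated internally.
            a_session_id = self._mint()
        self._entries[key] = (a_session_id, now)  # refresh last-seen time
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return a_session_id

    @staticmethod
    def _mint() -> str:
        return f"a-{uuid.uuid4().hex}"
```

Because the scope is part of the key, the same `client_session_id` presented under a different token maps to a different A-leg. This is what prevents one tenant from resuming another's session.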

### Decision: Use `auth_scope_id` for continuity scoping
- **Context**: Multi-user mode must scope continuity to the same bearer token by default; localhost mode has no token.
- **Alternatives Considered**:
  1. User-level scope (`user_id`)
  2. Token-level scope (`token_id`)
  3. Token hash (derived from raw token)
- **Selected Approach**: Default `auth_scope_id` to a stable token identity (`token_id`) when available; represent localhost mode as a single implicit scope.
- **Rationale**: Matches the “resume only with the same token” default and avoids storing raw tokens.
- **Trade-offs**: Requires explicit context injection, since the current middleware gates requests but does not expose identity.
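
The selected approach for an `IAuthScopeResolver`-style component can be sketched as a pure function of the validated token and the deployment mode. `ValidatedToken`, `resolve_auth_scope_id`, `LOCALHOST_SCOPE_ID`, and the `token:` prefix are hypothetical. The only grounded inputs are `token_id` and `user_id`, which the research notes say the SSO token database provides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical constant representing the single implicit scope of localhost mode.
LOCALHOST_SCOPE_ID = "local:implicit"


@dataclass(frozen=True)
class ValidatedToken:
    """Hypothetical projection of a validated SSO token (token_id and user_id)."""

    token_id: str
    user_id: str


def resolve_auth_scope_id(token: Optional[ValidatedToken], *, multi_user: bool) -> str:
    """Derive auth_scope_id: token-scoped in multi-user mode, one implicit scope on localhost."""
    if not multi_user:
        return LOCALHOST_SCOPE_ID
    if token is None:
        # The SSO gate should already have rejected this request; fail closed if it did not.
        raise PermissionError("multi-user mode requires a validated token")
    # Token-level scope: the same user with a different token does NOT resume the session.
    # Only the stable token_id is used, never the raw bearer token.
    return f"token:{token.token_id}"
```

Switching to the user-level alternative would only change the last line to use `user_id`. Keeping the derivation in one function makes that policy change easy to test.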

### Decision: Allocate B-legs at the backend-attempt boundary
- **Context**: B-legs must be created per attempt (failover, follow-ups), and each must carry an atomically incremented `<seq>`.
- **Alternatives Considered**:
  1. Allocate B-leg once per inbound request
  2. Allocate B-leg per backend attempt inside `BackendCompletionFlow`
- **Selected Approach**: Allocate B-leg within backend-attempt orchestration (inside/adjacent to `BackendCompletionFlow`) so each attempt gets a unique `b_session_id`.
- **Rationale**: Matches existing multi-attempt orchestration and supports accurate mapping and observability.
- **Trade-offs**: Requires threading B-leg identity into connector invocation and capture/usage layers.
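
Per-attempt allocation inside a failover loop might look like the sketch below. Every name here is hypothetical: `run_with_failover`, `Attempt`, `BackendAttemptFailed`, and the opaque `b-<hex>` format. The spec only names `<seq>`, so the sketch keeps the sequence number as local mapping data rather than embedding the A-leg in the outbound id.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class BackendAttemptFailed(Exception):
    """Hypothetical error a connector raises to trigger failover to the next backend."""


@dataclass(frozen=True)
class Attempt:
    backend: str
    seq: int           # per-A-leg <seq>, allocated atomically by `next_seq`
    b_session_id: str  # fresh, opaque B-leg id for this attempt only


def run_with_failover(
    a_session_id: str,
    backends: Sequence[str],
    call: Callable[[str, str], T],
    next_seq: Callable[[str], int],
) -> tuple[T, list[Attempt]]:
    """Try backends in order; every attempt gets its own B-leg, recorded for A->B mapping."""
    attempts: list[Attempt] = []
    for backend in backends:
        # Allocate the B-leg per attempt, not once per inbound request.
        attempt = Attempt(backend, next_seq(a_session_id), f"b-{uuid.uuid4().hex}")
        attempts.append(attempt)
        try:
            # Only the B-leg is handed to the connector; the A-leg stays internal.
            return call(backend, attempt.b_session_id), attempts
        except BackendAttemptFailed:
            continue  # fail over; the next attempt gets a new B-leg
    raise BackendAttemptFailed(f"all {len(attempts)} attempts failed for this A-leg")
```

The returned `attempts` list is the A-leg→B-leg mapping for this request, which is what capture and usage layers would need.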

### Decision: DI Lifetime Selection
- **Context**: Session identity and mapping are cross-cutting and must be concurrency-safe.
- **Selected Approach**:
  - Identity formatters/allocators: `Singleton` (stateless/pure functions).
  - Mapping store: `Singleton` (shared state / shared DB handle per process).
  - Per-request identity “view”: carried in `RequestContext` and/or a request-scoped helper.
- **Rationale**: Aligns with existing DI patterns and avoids per-request recomputation of persistent mapping state.

### Decision: Error Handling Strategy
- **Context**: Failures in mapping/persistence must not leak identifiers and should fail open safely.
- **Selected Approach**:
  - Introduce domain errors that extend `LLMProxyError` for identity/mapping failures.
  - When mapping store operations fail, create a new A-leg session and continue with degraded continuity (subject to config), emitting diagnostic logs.
- **Rationale**: Preserves availability while keeping security properties.
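
The fail-open path can be sketched as a wrapper around the mapping lookup. Failing open here means starting a fresh internal A-leg. It never means falling back to the client-supplied id, which is why availability is preserved without weakening the security property. `LLMProxyError` below is a local stand-in for the project's class. `SessionMappingError`, `resolve_a_leg_fail_open`, and the `allow_degraded` flag are hypothetical.

```python
from __future__ import annotations

import logging
import uuid
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class LLMProxyError(Exception):
    """Local stand-in for the project's LLMProxyError base class."""


class SessionMappingError(LLMProxyError):
    """Hypothetical domain error raised when the continuity mapping store fails."""


def _new_a_leg() -> str:
    return f"a-{uuid.uuid4().hex}"


def resolve_a_leg_fail_open(
    lookup: Callable[[str, str], Optional[str]],
    auth_scope_id: str,
    client_session_id: str,
    *,
    allow_degraded: bool = True,
) -> tuple[str, bool]:
    """Return (a_session_id, degraded). On store failure, start a new A-leg if config allows."""
    try:
        existing = lookup(auth_scope_id, client_session_id)
    except SessionMappingError:
        if not allow_degraded:
            raise
        # The diagnostic log deliberately omits the client-supplied id to avoid leaking it.
        logger.warning("continuity mapping unavailable; starting new A-leg (degraded)")
        return _new_a_leg(), True
    return (existing or _new_a_leg()), False
```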

## Risks & Mitigations
- Risk 1: Identifier leakage through legacy `session_id` paths - Mitigation: Centralize B-leg injection at connector boundary; add explicit “no request_id fallback” rules when B2BUA enabled.
- Risk 2: Atomic `<seq>` allocation breaks under multi-process - Mitigation: Require a shared persistent allocator when multi-worker is enabled; provide clear config and startup validation.
- Risk 3: Schema evolution for captures/usage breaks tooling - Mitigation: Backward-compatible metadata additions and version-aware inspection tooling updates.

## Performance Considerations
- Mapping lookups should be O(1) with TTL and bounded growth.
- Persistent atomic sequencing introduces DB overhead; keep operations minimal (single-row transactional increment) and make persistence optional.
- Wire capture enrichment adds metadata only; payload capture remains byte-precise.
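
A persistent, multi-process-safe `<seq>` allocator can be kept to one short transaction on one row. The sketch below uses the standard-library `sqlite3` module directly instead of the project's SQLModel layer. The `b_leg_sequence` table and both function names are hypothetical. It assumes all workers share the same SQLite file, each through its own connection.

```python
import sqlite3


def open_sequence_db(path: str) -> sqlite3.Connection:
    """Open a connection and ensure the (hypothetical) per-A-leg counter table exists."""
    # isolation_level=None disables the module's implicit transactions so that
    # next_seq() can control them explicitly with BEGIN IMMEDIATE / COMMIT.
    conn = sqlite3.connect(path, isolation_level=None, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS b_leg_sequence ("
        " a_session_id TEXT PRIMARY KEY,"
        " seq INTEGER NOT NULL)"
    )
    return conn


def next_seq(conn: sqlite3.Connection, a_session_id: str) -> int:
    """Atomically increment and return the <seq> counter for one A-leg."""
    # BEGIN IMMEDIATE acquires SQLite's write lock up front, so concurrent connections
    # (for example, one per worker process) serialize on this single-row update.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT OR IGNORE INTO b_leg_sequence (a_session_id, seq) VALUES (?, 0)",
            (a_session_id,),
        )
        conn.execute(
            "UPDATE b_leg_sequence SET seq = seq + 1 WHERE a_session_id = ?",
            (a_session_id,),
        )
        (seq,) = conn.execute(
            "SELECT seq FROM b_leg_sequence WHERE a_session_id = ?", (a_session_id,)
        ).fetchone()
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return seq
```

The `timeout` argument bounds how long a worker waits for the write lock. A startup check that the file is reachable from every worker would cover the "clear config and startup validation" mitigation listed under Risk 2.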

## References
- `src/core/services/backend_completion_flow/service.py` - Multi-attempt orchestration boundary
- `src/core/services/session_enricher.py` - Session-scoped state enrichment boundary
- `src/core/domain/request_context.py` - Cross-layer context contract + extensions container
- `src/core/services/connector_invoker.py` - Canonical connector context projection
- `src/core/services/cbor_wire_capture_service.py` and `src/core/domain/cbor_capture.py` - Capture metadata contract
- `.kiro/specs/b2bua-like-session-handling/requirements.md` - Requirements and EARS constraints
- `.kiro/specs/b2bua-like-session-handling/gap-analysis.md` - Implementation gap findings

.kiro/specs/b2bua-like-session-handling/spec.json

Lines changed: 4 additions & 4 deletions

    @@ -1,16 +1,16 @@
     {
       "feature_name": "b2bua-like-session-handling",
       "created_at": "2025-12-31T16:11:07Z",
    -  "updated_at": "2025-12-31T16:49:21Z",
    +  "updated_at": "2025-12-31T17:17:19Z",
       "language": "en",
    -  "phase": "requirements-generated",
    +  "phase": "design-generated",
       "approvals": {
         "requirements": {
           "generated": true,
    -      "approved": false
    +      "approved": true
         },
         "design": {
    -      "generated": false,
    +      "generated": true,
           "approved": false
         },
         "tasks": {

tests/property/core/services/test_eos_dedupe_properties.py

Lines changed: 23 additions & 23 deletions

    @@ -86,14 +86,14 @@ def signal_strategy() -> st.SearchStrategy[EndOfSessionSignal]:
     
     @pytest.mark.asyncio
     @given(signals=st.lists(signal_strategy(), min_size=2, max_size=5))
    -@property_test_settings(
    -    max_examples=10,
    -    suppress_health_check=[
    -        HealthCheck.too_slow,
    -        HealthCheck.data_too_large,
    -        HealthCheck.function_scoped_fixture,
    -    ],
    -)
    +@property_test_settings(
    +    max_examples=10, # Reduced from 15 for performance
    +    suppress_health_check=[
    +        HealthCheck.too_slow,
    +        HealthCheck.data_too_large,
    +        HealthCheck.function_scoped_fixture,
    +    ],
    +)
     async def test_property_multiple_signals_single_emission(
         eos_service: EndOfSessionService,
         mock_event_bus: IEventBus,

    @@ -152,21 +152,21 @@ async def claim_side_effect(*args, **kwargs):
         assert mock_session_repository.claim_eos_emission.await_count >= 1
     
     
    -@pytest.mark.asyncio
    -@given(
    -    session_ids=st.lists(
    -        st.text(min_size=1, max_size=50), min_size=2, max_size=5, unique=True
    -    ),
    -    signals_per_session=st.integers(min_value=2, max_value=5),
    -)
    -@property_test_settings(
    -    max_examples=15, # Reduced from 20 for performance
    -    suppress_health_check=[
    -        HealthCheck.too_slow,
    -        HealthCheck.data_too_large,
    -        HealthCheck.function_scoped_fixture,
    -    ],
    -)
    +@pytest.mark.asyncio
    +@given(
    +    session_ids=st.lists(
    +        st.text(min_size=1, max_size=50), min_size=2, max_size=5, unique=True
    +    ),
    +    signals_per_session=st.integers(min_value=2, max_value=5),
    +)
    +@property_test_settings(
    +    max_examples=10, # Reduced from 15 for performance
    +    suppress_health_check=[
    +        HealthCheck.too_slow,
    +        HealthCheck.data_too_large,
    +        HealthCheck.function_scoped_fixture,
    +    ],
    +)
     @freeze_time("2024-01-01 12:00:00")
     async def test_property_concurrent_sessions_independent_dedupe(
         eos_service: EndOfSessionService,

tests/property/memory/test_summary_storage_completeness_properties.py

Lines changed: 4 additions & 8 deletions

    @@ -159,10 +159,8 @@ def minimal_session_summary_for_nested_validation(draw: st.DrawFn) -> SessionSum
     
     @given(summary=session_summary_strategy())
     @property_test_settings(
    -    max_examples=15, # Reduced from 20 for performance
    -    suppress_health_check=[
    -        HealthCheck.filter_too_much
    -    ],
    +    max_examples=10, # Reduced from 15 for performance
    +    suppress_health_check=[HealthCheck.filter_too_much],
     )
     @freeze_time("2024-01-01 12:00:00")
     def test_property_7_summary_has_all_required_fields(summary: SessionSummary) -> None:

    @@ -203,9 +201,7 @@ def test_property_7_summary_has_all_required_fields(summary: SessionSummary) ->
     @given(summary=session_summary_strategy())
     @property_test_settings(
         max_examples=6, # Reduced from 8 for performance
    -    suppress_health_check=[
    -        HealthCheck.filter_too_much
    -    ],
    +    suppress_health_check=[HealthCheck.filter_too_much],
     )
     @freeze_time("2024-01-01 12:00:00")
     def test_property_7_summary_model_format(summary: SessionSummary) -> None:

    @@ -297,7 +293,7 @@ def test_property_7_summary_is_immutable(summary: SessionSummary) -> None:
     @given(summary=session_summary_strategy())
     @property_test_settings(
         max_examples=5, # Reduced from 6 for performance
    -    suppress_health_check=[HealthCheck.filter_too_much]
    +    suppress_health_check=[HealthCheck.filter_too_much],
     )
     @freeze_time("2024-01-01 12:00:00")
     def test_property_7_summary_serializable(summary: SessionSummary) -> None:

tests/property/test_non_forwardable_message_properties.py

Lines changed: 2 additions & 2 deletions

    @@ -552,8 +552,8 @@ async def is_tagged_side_effect(
         session_id=st.text(min_size=1, max_size=50),
         scope=st.sampled_from(list(NonForwardableTagScope)),
     )
    -@property_test_settings(max_examples=30) # Reduced for async tests
    -async def test_property_filtering_removes_only_tagged_messages(
    +@property_test_settings(max_examples=20) # Reduced from 30 for performance
    +async def test_property_filtering_removes_only_tagged_messages(
         messages: list[ChatMessage],
         session_id: str,
         scope: NonForwardableTagScope,

tests/property/test_sso_startup_properties.py

Lines changed: 4 additions & 4 deletions

    @@ -158,10 +158,10 @@ def test_property_legacy_auth_disabled_in_sso_mode(sso_config, host, legacy_keys
         )
     
     
    -# Property 3: Non-Loopback Startup Rejection
    -@settings(max_examples=50)
    -@given(host=non_loopback_address_strategy())
    -def test_property_non_loopback_startup_rejection(host):
    +# Property 3: Non-Loopback Startup Rejection
    +@settings(max_examples=20) # Reduced from 50 for performance
    +@given(host=non_loopback_address_strategy())
    +def test_property_non_loopback_startup_rejection(host):
         """
         Feature: sso-authentication, Property 3: Non-Loopback Startup Rejection
     

tests/property/test_usage_data_preservation_properties.py

Lines changed: 3 additions & 3 deletions

    @@ -217,9 +217,9 @@ def test_property_1_usage_at_top_level_in_sse_output(
         )
     
     
    -@given(chunk=stop_chunk_with_usage_strategy())
    -@property_test_settings()
    -def test_property_1_usage_not_in_delta_content(chunk: StopChunkWithUsage) -> None:
    +@given(chunk=stop_chunk_with_usage_strategy())
    +@property_test_settings(max_examples=25) # Reduced from 50 for performance
    +def test_property_1_usage_not_in_delta_content(chunk: StopChunkWithUsage) -> None:
         """
         **Feature: gemini-oauth-streaming-fix, Property 1: Usage data preservation**
         **Validates: Requirements 1.1, 4.1**

tests/regression/test_parameter_resolution_leak_regression.py

Lines changed: 23 additions & 23 deletions

    @@ -47,29 +47,29 @@ def test_repeated_record_calls_replace_previous_entries(
                 "Repeated record() calls should replace previous entries."
             )
     
    -    def test_history_bounded_by_max_size(self, resolution: ParameterResolution) -> None:
    -        """Test that _history is bounded by _MAX_HISTORY_SIZE."""
    -        from src.core.config.parameter_resolution import ParameterResolution
    -
    -        max_size = ParameterResolution._MAX_HISTORY_SIZE
    -
    -        # Record many unique parameters (more than max size)
    -        num_parameters = max_size + 1000
    -        for i in range(num_parameters):
    -            parameter_name = f"test.parameter.{i}"
    -            resolution.record(
    -                name=parameter_name,
    -                value=i,
    -                source=ParameterSource.CONFIG_FILE,
    -                origin=f"config_{i}.yaml",
    -            )
    -
    -        # History should not exceed max size
    -        history_size = len(resolution._history)
    -        assert history_size <= max_size, (
    -            f"History size ({history_size}) exceeded max size ({max_size}). "
    -            "Oldest entries should be evicted."
    -        )
    +    def test_history_bounded_by_max_size(self, resolution: ParameterResolution) -> None:
    +        """Test that _history is bounded by _MAX_HISTORY_SIZE."""
    +        from src.core.config.parameter_resolution import ParameterResolution
    +
    +        max_size = ParameterResolution._MAX_HISTORY_SIZE
    +
    +        # Record many unique parameters (more than max size)
    +        num_parameters = max_size + 500 # Reduced from 1000 for performance
    +        for i in range(num_parameters):
    +            parameter_name = f"test.parameter.{i}"
    +            resolution.record(
    +                name=parameter_name,
    +                value=i,
    +                source=ParameterSource.CONFIG_FILE,
    +                origin=f"config_{i}.yaml",
    +            )
    +
    +        # History should not exceed max size
    +        history_size = len(resolution._history)
    +        assert history_size <= max_size, (
    +            f"History size ({history_size}) exceeded max size ({max_size}). "
    +            "Oldest entries should be evicted."
    +        )
     
         def test_build_report_uses_latest_entry(
             self, resolution: ParameterResolution
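
Reducing the overshoot from `max_size + 1000` to `max_size + 500` still exercises eviction. Any count above the bound forces the oldest entries out, so the margin only needs to be positive. The eviction behavior the test relies on can be sketched with an `OrderedDict`. That type also covers the "repeated record() calls should replace previous entries" behavior from the neighboring test. `BoundedHistory` is a hypothetical stand-in, since `ParameterResolution`'s internals are not shown in this diff.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any


class BoundedHistory:
    """Hypothetical stand-in for a bounded, name-keyed parameter history."""

    def __init__(self, max_size: int = 1000) -> None:
        self._max_size = max_size
        self._history: OrderedDict[str, Any] = OrderedDict()

    def record(self, name: str, value: Any) -> None:
        self._history.pop(name, None)  # replace any previous entry for the same name
        self._history[name] = value    # newest entries live at the end
        while len(self._history) > self._max_size:
            self._history.popitem(last=False)  # evict the oldest entry
```

After `max_size + 500` unique records, exactly `max_size` entries remain, and the first 500 names are gone.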
