Skip to content

fix: resolve message processing failures in multi-session scenarios#899

Closed
hahaschool wants to merge 2 commits intothedotmack:mainfrom
hahaschool:fix/multi-session-message-processing
Closed

fix: resolve message processing failures in multi-session scenarios#899
hahaschool wants to merge 2 commits intothedotmack:mainfrom
hahaschool:fix/multi-session-message-processing

Conversation

@hahaschool
Copy link

Summary

  • Fix AbortController state issue causing generators to abort within ~70ms
  • Fix FOREIGN KEY constraint violation when worker restarts with existing observations
  • Fix deprecated queueDepth logging showing 0 instead of actual count

Problem

When multiple Claude Code sessions were running in the same project folder, messages would accumulate without being processed. Root causes:

  1. AbortController reuse: When a generator finished naturally, it aborted its controller but kept the session in memory. New messages would try to start a generator with the already-aborted controller, causing immediate exit.

  2. FK constraint violation: After worker restart, SDK generates new session ID. Updating sdk_sessions.memory_session_id failed because existing observations referenced the old ID (FK without ON UPDATE CASCADE).

Solution

  1. Check and replace aborted AbortController before starting generators
  2. Use database's existing memory_session_id for observation storage (FK integrity)
  3. Only update memory_session_id in database when it's NULL

Test plan

  • Verified messages flow correctly: ENQUEUED → CLAIMED → STORED
  • No FK constraint errors in logs
  • Pending queue stays empty (no accumulation)
  • New observations stored successfully after fix

🤖 Generated with Claude Code

This PR fixes two critical bugs that caused message accumulation when
multiple Claude Code sessions were running in the same project:

1. **AbortController State Issue**: When a generator finished naturally
   with no pending work, it aborted its AbortController but didn't replace
   it. Later HTTP requests got the cached session with the already-aborted
   controller, causing the message iterator to exit immediately (~70ms).

   Fix: Check if AbortController is already aborted before starting a
   generator and create a new one if needed.

2. **FOREIGN KEY Constraint Violation**: After worker restart, the SDK
   would generate a new session ID, and we'd try to UPDATE
   `sdk_sessions.memory_session_id`. However, existing observations
   referenced the old ID, and since the FK constraint doesn't have
   `ON UPDATE CASCADE`, SQLite blocked the update.

   Fix:
   - In SDKAgent: Only update memory_session_id in database if it's NULL
   - In ResponseProcessor: Always use the database's memory_session_id
     for storage to maintain FK integrity

Also fixed: The queueDepth log was showing 0 from the deprecated
`session.pendingMessages.length` instead of actual database count.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

This PR fixes three critical issues affecting message processing in multi-session scenarios:

  • AbortController reuse bug: When generators completed naturally, they aborted their controller but kept sessions in memory. New messages would try to start generators with already-aborted controllers, causing immediate exit within ~70ms. Fixed by checking and replacing aborted controllers before starting generators.

  • FK constraint violation on worker restart: After worker restart, SDK generates new session ID. Updating sdk_sessions.memory_session_id failed because existing observations referenced the old ID (FK constraint lacks ON UPDATE CASCADE). Fixed by implementing dual-tracking: in-memory value tracks current SDK session for resume, while database value maintains FK integrity with existing observations.

  • Stale resume crash recovery: Added detection of stale resume failures ("aborted by user") with automatic recovery that clears memorySessionId, sets forceInit flag, and retries with fresh session.

The implementation correctly handles the tension between resume functionality (needs current SDK session_id) and data integrity (needs stable memory_session_id for existing observations). The forceInit flag provides one-time override for stale resume recovery.

Confidence Score: 4/5

  • Safe to merge with minor type safety concern that should be addressed
  • The PR addresses real bugs with well-reasoned solutions. The AbortController fix is straightforward and correct. The FK constraint handling is sophisticated and correct, using dual-tracking to balance resume functionality with data integrity. The crash recovery logic is comprehensive. One syntax issue exists: updateMemorySessionId is called with null using type assertion as any, which bypasses type safety.
  • src/services/worker-service.ts line 391 has type safety issue with null as any

Important Files Changed

Filename Overview
src/services/worker-service.ts Adds AbortController replacement logic and comprehensive crash recovery with stale resume detection
src/services/worker-types.ts Adds forceInit flag to ActiveSession interface for stale resume recovery
src/services/worker/SDKAgent.ts Implements FK-safe memory_session_id handling and forceInit support for crash recovery
src/services/worker/agents/ResponseProcessor.ts Uses database-authoritative memory_session_id to prevent FK constraint violations during storage
src/services/worker/http/routes/SessionRoutes.ts Adds AbortController replacement and fixes deprecated queueDepth logging

Sequence Diagram

sequenceDiagram
    participant User as User Session
    participant WS as WorkerService
    participant SM as SessionManager
    participant SDK as SDKAgent
    participant RP as ResponseProcessor
    participant DB as Database

    Note over User,DB: Multi-Session Message Processing Flow

    User->>WS: Message arrives
    WS->>SM: getSession(sessionDbId)
    SM->>DB: Load session (if not in memory)
    DB-->>SM: session data + memory_session_id
    
    Note over WS: Check AbortController state
    alt AbortController is aborted
        WS->>WS: Create new AbortController
        Note right of WS: Fix #1: Replace aborted controller<br/>to prevent immediate exit
    end

    WS->>SDK: startSession(session)
    
    Note over SDK: Resume decision logic
    alt forceInit flag set
        SDK->>SDK: Skip resume (stale session recovery)
        SDK->>SDK: Clear forceInit flag
    else memorySessionId exists AND lastPromptNumber > 1
        SDK->>SDK: Resume with memorySessionId
    else
        SDK->>SDK: Start fresh session
    end

    SDK->>DB: Check existing memory_session_id
    DB-->>SDK: dbSession.memory_session_id
    
    alt database has memory_session_id
        Note right of SDK: Fix #2: Use DB value for FK integrity
        SDK->>SDK: Keep DB value, don't update
    else database has NULL
        SDK->>SDK: Capture new session_id from SDK
        SDK->>DB: updateMemorySessionId(new_id)
    end

    SDK->>RP: Process assistant response
    RP->>DB: Get dbSession.memory_session_id
    Note right of RP: Fix #2: Use authoritative DB value
    RP->>DB: storeObservations(memorySessionIdForStorage)
    DB-->>RP: Success (no FK violation)

    SDK-->>WS: Generator completes

    alt Generator crashed (not aborted)
        WS->>SM: Check pending messages
        SM-->>WS: pendingCount > 0
        WS->>WS: Replace AbortController
        WS->>WS: Abort old controller (cleanup)
        Note right of WS: Crash recovery with delay
        WS-->>WS: setTimeout restart
    else Generator naturally completed
        WS->>WS: Abort controller (cleanup)
    end

    Note over WS: Stale Resume Error Handling
    alt SDK error: "aborted by user"
        WS->>WS: Set forceInit flag
        WS->>DB: Clear memory_session_id
        WS->>WS: Retry with fresh session
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Update function signature to accept `string | null` instead of using
`as any` type assertion. This properly supports clearing memory_session_id
during stale resume recovery.

- SessionStore.updateMemorySessionId: string → string | null
- sessions/create.ts updateMemorySessionId: string → string | null
- Remove `as any` cast in worker-service.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@hahaschool
Copy link
Author

Thanks for the thorough review, @greptile-apps!

I've addressed the type safety concern:

Fixed in commit 76227ec:

  • Updated updateMemorySessionId function signature to accept string | null instead of just string
  • Removed the as any type assertion in worker-service.ts line 391
  • Updated both SessionStore.ts and sessions/create.ts for consistency

The function now properly supports clearing memory_session_id during stale resume recovery without bypassing type safety.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 3, 2026

Perfect! The type safety improvements look excellent. You've properly addressed the concern by:

  1. Updated function signatures to accept string | null instead of just string in both SessionStore.ts and sessions/create.ts
  2. Removed the as any type assertion from worker-service.ts line 391, which was the main type safety issue
  3. Maintained consistency across all related files

This is a much cleaner solution that properly expresses the intent of the code - that memory_session_id can legitimately be cleared (set to null) during stale resume recovery scenarios. The type system now accurately reflects the runtime behavior without any unsafe casting.

The changes are minimal and focused, which is ideal for this type of type safety fix. Great work addressing this feedback!

@thedotmack
Copy link
Owner

Thank you for this thorough fix, @hahaschool! You identified some real issues in the multi-session message processing flow. However, this PR now has significant overlap with several PRs that were merged after yours was opened, and the rebase produces conflicts in 3 files (worker-service.ts, worker-types.ts, SDKAgent.ts).

Already addressed by recently merged PRs:

Still valuable — consider submitting as focused PRs:

  1. AbortController check at generator start — The check for session.abortController.signal.aborted before starting a new generator (in both startSessionProcessor and startGeneratorWithProvider) is NOT covered by existing merges. When a generator completes naturally and aborts its controller, a new incoming message could try to start with the stale aborted controller. This is a real bug. Would be a clean ~10-line PR.

  2. FK constraint protection in SDKAgent/ResponseProcessor — The logic to check the DB's memory_session_id before overwriting it (to avoid FK violations when the SDK generates a new session ID after restart) is genuinely valuable and not addressed elsewhere. This would be a focused PR touching SDKAgent.ts and ResponseProcessor.ts.

  3. queueDepth logging using pendingStore.getPendingCount() — Minor but correct fix for deprecated session.pendingMessages.length. Could be included with #1.

If you'd like to re-submit these as 1-2 focused PRs, they'd be straightforward to review and merge. Thanks for the thorough investigation!

@thedotmack thedotmack closed this Feb 6, 2026
thedotmack added a commit that referenced this pull request Feb 6, 2026
…, #940

Reviewed PR #899 (multi-session message processing fix). Identified significant
overlap with 3 recently merged PRs causing merge conflicts in 3 files. Closed
with detailed feedback requesting focused re-submission of the genuinely valuable
parts: AbortController stale-check, FK constraint protection, and queueDepth fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants