-
Notifications
You must be signed in to change notification settings - Fork 301
Fix data race on isResumed flag in NostrNetworkManager #3586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis pull request refactors NostrNetworkManager's connection continuation handling to eliminate a data race on the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant awaitConnection
participant Dictionary as Continuations<br/>Dictionary
participant Timeout as Timeout Task
participant Connect as connect()
Client->>awaitConnection: awaitConnection(timeout)
alt Already Connected
awaitConnection->>Client: Resume immediately
else Not Connected
awaitConnection->>Dictionary: Store continuation<br/>(under lock)
Dictionary->>awaitConnection: Done
par Timeout Path
Timeout->>Timeout: Sleep(timeout duration)
Timeout->>Dictionary: resumeConnectionContinuation<br/>(requestId)
alt Continuation Exists
Dictionary->>Timeout: Remove & resume
else Already Resumed
Dictionary->>Timeout: Not found, return
end
and Connect Path
Connect->>Connect: Mark connected
Connect->>Dictionary: Resume all pending<br/>continuations
Dictionary->>Client: Resume all
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
damus/Core/Networking/NostrNetworkManager/NostrNetworkManager.swift (1)
95-121:⚠️ Potential issue | 🟠 MajorPrevent a race between connection check and continuation registration.
If
connect()flipsisConnectedto true after Line 96 but before Line 109, this call can wait until the timeout even though the connection is already established. Re-check (and/or insert) under the same lock, or resume immediately if already connected.🛠️ Proposed fix (single locked check + immediate resume)
- // Short-circuit if already connected - continuationsLock.lock() - let alreadyConnected = isConnected - continuationsLock.unlock() - - guard !alreadyConnected else { - return - } - let requestId = UUID() var timeoutTask: Task<Void, Never>? await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in // Store the continuation in a thread-safe manner continuationsLock.lock() - connectionContinuations[requestId] = continuation - continuationsLock.unlock() + if isConnected { + continuationsLock.unlock() + continuation.resume() + return + } + connectionContinuations[requestId] = continuation + continuationsLock.unlock()
42d01d9 to
6d3a2a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@damusTests/NostrNetworkManagerConcurrencyTests.swift`:
- Around line 156-205: Add brief API docstrings for each helper method to
satisfy doc coverage: document awaitConnection(timeout:) to explain it
asynchronously waits for connection with optional timeout and how the
continuation is registered/cleared; document
resumeConnectionContinuation(requestId:) to state it resumes and removes the
stored continuation for the given UUID; document connect() to state it marks the
manager connected and resumes all pending continuations; and document reset() to
state it clears connection state and pending continuations. Place these comments
immediately above the corresponding function declarations (awaitConnection,
resumeConnectionContinuation, connect, reset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@damusTests/NostrNetworkManagerIntegrationTests.swift`:
- Around line 27-36: Add brief docstrings above the setUpWithError() and
tearDownWithError() methods describing their purpose: setUpWithError should note
it initializes damusState for each test (mention generate_test_damus_state,
mock_profile_info and addNdbToRelayPool settings) and tearDownWithError should
note it cleans up by setting damusState to nil. Keep each comment one or two
sentences and follow the file's existing docstring style and formatting.
- Around line 73-89: The test testRapidConnectDisconnectCycles currently only
calls damusState.nostrNetwork.connect(), so add a matching disconnect call to
exercise the disconnect path: inside the loop after the brief Task.sleep(for:
.milliseconds(50)) call, invoke await damusState.nostrNetwork.disconnect() (and
optionally a short sleep after disconnect) so each cycle performs connect then
disconnect; update the loop in the test method to call disconnect on the
nostrNetwork instance.
- Around line 93-136: The testConcurrentSubscriptions pre-population quietly
no-ops if test_notes.jsonl is missing and performs blocking I/O on the
`@MainActor`; change it to fail fast if the fixture is missing (XCTFail and return
when fileURL is nil) and move the file read off the main thread by awaiting a
background task (e.g., let notesJSONL = try await Task.detached { try
String(contentsOf: fileURL, encoding: .utf8) }.value), then iterate the lines
and call damusState.ndb.processEvent(...) on the appropriate actor; keep the
rest of the test (including the stream loop on
damusState.nostrNetwork.reader.advancedStream) unchanged.
🧹 Nitpick comments (1)
damusTests/NostrNetworkManagerIntegrationTests.swift (1)
138-191: Move bulk pre-population off@MainActorto keep the main thread unblocked.These tests do large synchronous
ndb.processEventloops on the MainActor. IfNdbis thread-safe, consider moving the pre-population work to a background task/queue and awaiting completion.♻️ Example refactor (apply similarly where safe)
- // Pre-populate - for i in 0..<20 { - guard let testNote = NostrEvent( - content: "Stress test note \(i)", - keypair: test_keypair, - kind: NostrKind.text.rawValue, - tags: [] - ) else { continue } - - let eventJson = encode_json(testNote)! - _ = damusState.ndb.processEvent("[\"EVENT\",\"subid\",\(eventJson)]") - } + // Pre-populate off MainActor (if Ndb is thread-safe). + let ndb = damusState.ndb + await Task.detached { + for i in 0..<20 { + guard let testNote = NostrEvent( + content: "Stress test note \(i)", + keypair: test_keypair, + kind: NostrKind.text.rawValue, + tags: [] + ) else { continue } + + let eventJson = encode_json(testNote)! + _ = ndb.processEvent("[\"EVENT\",\"subid\",\(eventJson)]") + } + }.valueAs per coding guidelines: Never block the main thread: All network requests, database queries, and expensive computations must run on background threads/queues using Task { }, DispatchQueue.global(), or Swift concurrency (async/await).
Also applies to: 195-248, 252-302, 330-380
…fixes" This reverts commit 0098c4a as these tests belong in their respective PR branches: - NostrNetworkManagerIntegrationTests.swift → PR2 damus-io#3586 - WalletModelIntegrationTests.swift → PR3 damus-io#3587 - FilterStateConcurrencyTests.swift → PR4 damus-io#3588 Signed-off-by: alltheseas <alltheseas@users.noreply.github.com>
3d09763 to
e014269
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@damusTests/NostrNetworkManagerIntegrationTests.swift`:
- Around line 101-151: The test currently calls damusState.ndb.processEvent(...)
on the main actor, which performs synchronous work through withNdb() →
keepNdbOpen() → ndb_process_event and can block the main thread; move the entire
pre-population loop that iterates notesJSONL.split(...) into a background task
(e.g. Task.detached { ... } or DispatchQueue.global().async) so processEvent
runs off the main actor, await its completion before continuing the test, and
apply the same change to the other test methods that pre-populate the DB.
Agent Requirements Compliance Review
Note on actors vs locks: NSLock used for consistency with existing NostrNetworkManager synchronization. Refactoring to actors would be a larger change better suited for a dedicated PR. |
0d230f6 to
b92c769
Compare
The previous implementation had two issues: 1. isResumed local variable was accessed without synchronization between timeout task and resumeConnectionContinuation() 2. Race between checking isConnected and registering the continuation could cause unnecessary timeout waits Changes: - Eliminate isResumed flag - use continuation dictionary as source of truth - Check isConnected atomically inside the lock when storing continuation - Use removeValue(forKey:) to atomically remove and return continuation Closes damus-io#3572 Changelog-Fixed: Fix data race in NostrNetworkManager connection handling Signed-off-by: alltheseas <alltheseas@users.noreply.github.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive test coverage for the isResumed race fix: Concurrency Tests (NostrNetworkManagerConcurrencyTests.swift - 4 tests): - Concurrent awaitConnection calls without double-resume - Timeout races with connect without double-resume - Stress test with many iterations - Already-connected immediate return Integration Tests (NostrNetworkManagerIntegrationTests.swift - 8 tests): - Concurrent awaitConnection calls - Rapid connect/disconnect cycles - Concurrent subscriptions - Concurrent subscriptions stress test - Event flow integrity - Subscription cancellation - Background/foreground transitions - Concurrent ops with lifecycle transitions Run with Thread Sanitizer for best results: xcodebuild test -enableThreadSanitizer YES Signed-off-by: alltheseas <alltheseas@users.noreply.github.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b92c769 to
f64170f
Compare
Summary
Fix data race on isResumed flag in NostrNetworkManager.awaitConnection().
The previous implementation had two issues:
isResumedlocal variable was accessed without synchronization between timeout task andresumeConnectionContinuation()isConnectedand registering the continuation could cause unnecessary timeout waitsChanges:
isResumedflag - use continuation dictionary as source of truthisConnectedatomically inside the lock when storing continuationremoveValue(forKey:)to atomically remove and return continuationChecklist
Standard PR Checklist
Closes:orFixes:tags in the commit messages wherever applicable.Test report
Device: iPhone 17 Pro Simulator
iOS: 26.2
Damus: fix/network-layer-races branch
Setup: Thread Sanitizer enabled for race detection
Steps:
xcodebuild test -scheme damus -destination 'platform=iOS Simulator,name=iPhone 17 Pro' -only-testing:damusTests/NostrNetworkManagerConcurrencyTests -only-testing:damusTests/NostrNetworkManagerIntegrationTests -enableThreadSanitizer YESResults:
Unit Tests (4 tests)
testAwaitConnection_ConcurrentCalls_NoDoubleResumetestAwaitConnection_TimeoutRacesWithConnect_NoDoubleResumetestAwaitConnection_StressTest_ManyIterationstestAwaitConnection_AlreadyConnected_ImmediateReturnIntegration Tests (8 tests)
testConcurrentAwaitConnectionCallstestRapidConnectDisconnectCyclestestConcurrentSubscriptionstestConcurrentSubscriptions_StressTesttestEventFlowIntegritytestSubscriptionCancellationtestBackgroundForegroundTransitionstestConcurrentOpsWithLifecycleTransitionsTotal: 12 tests, 0 failures, ~4.6 seconds
Other notes
Tests designed for Thread Sanitizer detection of data races.
Closes #3572
🤖 Generated with Claude Code
Summary by CodeRabbit