fix: prevent handshake race condition causing ConnectionClosed failures by diegomrsantos · Pull Request #644 · sigp/anchor

diegomrsantos · 2025-10-01T12:58:40Z

Issue Addressed

Fixes SSV handshake failures caused by race conditions between connection establishment and protocol stream opening.

Observed Symptoms

2025 Sep 30, @ 23:34:08.674  DEBUG  Handshake failed peer_id=16Uiu2HAmQ13kHGvufdTn7WFS28A2szS8Avk5GVBjcKJergzCd5v1  error=Outbound(ConnectionClosed)
2025 Sep 30, @ 23:34:08.663  DEBUG  Failed to dial peer err=Dial error: dial condition was configured to only happen when both disconnected and there is currently no ongoing dialing attempt, but node is already connected or dial is in progress
2025 Sep 30, @ 23:34:08.662  DEBUG  Let's dial! peer=PeerId(16Uiu2HAmQ13kHGvufdTn7WFS28A2szS8Avk5GVBjcKJergzCd5v1)

Proposed Changes

Evolution of Approach

First Attempt: Identify-based Coordination

Queue peers on outbound ConnectionEstablished, wait for Identify::Received, then initiate handshake
Why This Was Wrong: Not composable (couples Identify and Handshake), unnecessary complexity (pending_handshakes queue), Identify is optional, against libp2p patterns (behaviors should be self-contained)

Current Approach: Self-Contained Behaviour with Semantic Events

The handshake behaviour is now fully self-contained and auto-initiating:

impl NetworkBehaviour for Behaviour {
    type ToSwarm = Event;  // Emits Event::Completed or Event::Failed
    
    fn on_swarm_event(&mut self, event: FromSwarm) {
        if let FromSwarm::ConnectionEstablished(conn_est) = &event {
            if let ConnectedPoint::Dialer { .. } = conn_est.endpoint {
                if conn_est.other_established == 0 {
                    // Auto-initiate handshake on first outbound connection
                    self.inner.send_request(&conn_est.peer_id, self.node_info.clone());
                }
            }
        }
        self.inner.on_swarm_event(event);
    }
    
    fn poll(&mut self, cx: &mut Context) -> Poll<ToSwarm<Event, ...>> {
        // Process RequestResponse events internally
        // Emit high-level Event::Completed or Event::Failed
    }
}

Why This Works:

Composable: Self-contained with no external dependencies
Handles concurrent dials: other_established == 0 ensures only the FIRST connection initiates, preventing duplicates
Clean event emission: Custom poll() processes low-level RequestResponse events and emits semantic events
Simple: No coordination, no pending queues, no external state

Evidence

The concurrent dials test proves the fix:

Without other_established == 0: Both peers initiate handshakes → duplicate requests

With the check: First ConnectionEstablished (other_established == 0) initiates, second (other_established == 1) skips → only ONE handshake

Implementation Details

Behaviour Changes:

Custom poll() processes RequestResponse events internally and emits semantic Event::Completed/Event::Failed
Events queue (VecDeque<Event>) for buffering handshake results
Auto-initiation on first outbound connection via on_swarm_event()
Stores NodeInfo for automatic handshake requests

Network Changes:

Removed pending_handshakes queue and manual initiation
Updated to handle semantic events directly
Changed dial policy from NotDialing to DisconnectedAndNotDialing
Simplified connection handling

Test Improvements:

All tests use drive() for clean, declarative testing
Reduced from 611 to 397 lines (35% reduction)
Helper functions eliminate duplication

Additional Info

Architecture

Follows idiomatic rust-libp2p patterns where behaviours are self-contained and composable. The behaviour:

Handles its own coordination without external state
Emits high-level semantic events for consumers
Processes low-level protocol details internally in poll()
Has no dependencies on other behaviours

This is the same pattern used in Lighthouse and other production rust-libp2p implementations.

Compatibility

The asymmetric dialer-initiates pattern matches Go SSV's handshake flow, ensuring proper interoperability.

Copilot

Pull Request Overview

This PR fixes handshake race conditions in the SSV network implementation by deferring handshake initiation until after the Identify protocol confirms connection stability and protocol support.

Key Changes:

Queue outbound handshakes instead of initiating immediately on connection establishment
Wait for Identify protocol completion before starting handshakes to ensure stable connections
Clean up pending handshakes on connection errors and closures

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
anchor/network/src/peer_manager/discovery.rs	Changed dial condition from DisconnectedAndNotDialing to NotDialing to reduce noise
anchor/network/src/network.rs	Added pending handshake queue, deferred initiation logic, and enhanced connection event handling
anchor/network/src/handshake/mod.rs	Added documentation and trace logging for handshake failure debugging

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

anchor/network/src/network.rs

AgeManning

I'm not sure I'm following why this is necessary.

Firstly, Identify at least in my mind is mainly used for debugging. In Lighthouse its more or less optional, I know ssv doesn't really have network specs, but I'm not sure if identify is a necessary required protocol.

I think all libp2p behaviours should work independently of each other. It allows behaviours to be composable. I think by tying the Handshake to Identify you are effectively saying the handshake protocol cannot function without identify. Which I dont think should be the case.

In regards to the actual issue, which seems to be concurrent handshake requests due to concurrent connections. I would have expected that when we dial, we dial a few multiaddrs (if multiple transports are supported). We do this in a single dial, via the dialopts. We then get libp2p to decide which transport we prioritize. I'm not sure why we would be calling dial again for any reason on an already connected peer. We should ensure we don't re-dial connected peers.

When we open a stream, libp2p uses multistream select to determine which protocols are supported. So we can have a connection with a peer, and the handshake can fail if they dont support it. I don't think this is an issue, if the negotation fails, we just drop the peer. We don't need identify to determine this in advance I dont think.

Would it be sufficient just to check num_established in connection_established and the connection direction before sending the handshake?. I.e if num_established = 1 and direction is outbound, then we do the handshake. This would simplify the changes and not rely on identify. Or is there something else I'm missing?

Copilot

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

anchor/network/src/handshake/mod.rs

anchor/network/src/network.rs

Copilot

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

anchor/network/src/network.rs

anchor/network/src/handshake/mod.rs

Fixes SSV handshake failures caused by opening protocol streams on connections that get closed due to concurrent dials or connection trimming. Observed as OutboundFailure::ConnectionClosed errors. The fix implements a pending handshake queue that defers handshake initiation until after the Identify protocol confirms the connection is stable and supports the SSV handshake protocol. Key changes: - Add pending_handshakes queue to defer handshake initiation - Wait for Identify protocol before initiating handshakes - Change dial policy from DisconnectedAndNotDialing to NotDialing - Add 30-second idle connection timeout for handshake completion - Improve logging to differentiate expected vs unexpected dial failures - Add comprehensive documentation explaining the race condition

Convert handshake behaviour from type alias to self-contained struct that automatically initiates handshakes on first outbound connections. This eliminates the need for external coordination and simplifies the architecture. Key Changes: - Replace type alias with proper Behaviour struct wrapping RequestResponseBehaviour - Implement NetworkBehaviour trait with auto-initiation in on_swarm_event - Store NodeInfo in the behaviour for automatic handshake initiation - Use `other_established == 0` check to prevent duplicate initiations during concurrent dials - Change create_behaviour() to Behaviour::new() (idiomatic Rust constructor) - Move initiate() to be a method on Behaviour struct - Remove manual handshake initiation from network.rs - Add concurrent_dials_both_initiate_handshake test proving the fix works - Add helper functions for cleaner test structure Evidence: The concurrent dials test proves that other_established == 0 prevents duplicate handshake initiations. Without the check, both peers initiate when they get Dialer connections (duplicate). With the check, only the first peer initiates. This makes handshakes fully composable without requiring Identify protocol coordination or external state management.

Convert standalone handle_event function to a method on Behaviour for better encapsulation and more idiomatic Rust. Extract test event handling logic into helper function to eliminate duplication and improve readability. API Changes: - handle_event() is now a method: behaviour.handle_event(event) - handle_request() and handle_response() moved into Behaviour impl as private methods - Removed unused initiate() method (handshakes are auto-initiated) - Simplified call sites in network.rs and tests Test Improvements: - Added handle_swarm_event_for_test() helper to eliminate 40+ lines of duplication - Tests are now much more readable and maintainable - All assertions and tracking logic centralized in one place Cleanup: - Removed .with_bandwidth_metrics() call (doesn't exist in current libp2p version) - Removed unused parameters from test helpers

dknopik · 2025-10-16T11:29:06Z

@diegomrsantos: As Joao noted here #660 the second log line is not necessarily associated with the other two. I've checked the recent logs, and can not really find instances of this issue anymore.

I want to avoid making significant changes to the handshake if they turn out to be motivated by an issue that does not exist.

Nevertheless, during the evolution of this PR it seems to have become a reasonable refactor. However, some changes seem to be unrelated, such as removal of logging. Also, some of the comments seem to still reference the original approach of using the identity protocol. Can you please clean it up a bit so that it is easier to review and more focused on the refactor?

diegomrsantos · 2025-10-16T12:21:40Z

@diegomrsantos: As Joao noted here #660 the second log line is not necessarily associated with the other two. I've checked the recent logs, and can not really find instances of this issue anymore.

I want to avoid making significant changes to the handshake if they turn out to be motivated by an issue that does not exist.

Nevertheless, during the evolution of this PR it seems to have become a reasonable refactor. However, some changes seem to be unrelated, such as removal of logging. Also, some of the comments seem to still reference the original approach of using the identity protocol. Can you please clean it up a bit so that it is easier to review and more focused on the refactor?

if other_established == 0 is commented out, two handshakes are triggered in concurrent_dials_only_one_handshake

…ify reference

claude-code-actions-sigp · 2025-10-23T09:51:54Z

Claude finished @diegomrsantos's task —— View job

Code Review Complete

I've conducted a comprehensive review of this PR focusing on code quality, security, performance, testing, and documentation. Here are my findings:

Summary

This PR refactors the handshake behavior to be self-contained and auto-initiating, addressing a potential race condition in concurrent connection scenarios. The implementation follows idiomatic rust-libp2p patterns and includes significant test improvements.

Key Change: The handshake behavior now automatically initiates handshakes on the first outbound connection using the other_established == 0 check, eliminating the need for external coordination.

Critical Issues

1. Race Condition Fix Necessity (Severity: Discussion Needed)

Location: Throughout the PR

As reviewers @AgeManning and @dknopik noted, the original race condition may no longer exist in production. The PR description shows logs that may not be causally related. However, the author demonstrates in concurrent_dials_only_one_handshake test that without the other_established == 0 check, duplicate handshakes would occur.

Recommendation:

Clarify whether this is primarily a refactor (main benefit) or a bug fix (secondary benefit)
The refactor itself has merit: better encapsulation, cleaner tests, self-contained behavior
If positioning as a bug fix, provide concrete evidence from production logs showing the race condition

2. Breaking Change in Public API (Severity: High)

Location: anchor/network/src/handshake/mod.rs:23-42

The PR fundamentally changes the public API:

Before: pub type Behaviour = RequestResponseBehaviour<Codec> + handle_event() function
After: Custom pub struct Behaviour with different event types

// Old API (removed)
pub type Behaviour = RequestResponseBehaviour<Codec>;
pub type Event = <Behaviour as NetworkBehaviour>::ToSwarm;
pub fn handle_event(...) -> Option<Result<Completed, Failed>>

// New API
pub struct Behaviour { ... }
pub enum Event { Completed {...}, Failed {...} }

Impact: This is a breaking change for any external code using the handshake module.

Recommendation:

Document this as a breaking change in commit message and PR description
Since this is within the anchor workspace and not a published library, the impact is contained
Consider whether gradual migration path is needed

3. Error Handling: Silent Failure (Severity: Medium)

Location: anchor/network/src/handshake/mod.rs:111

let _ = self.inner.send_response(channel, self.node_info.clone());

The response sending can fail, but errors are silently discarded. While this may be acceptable for handshake responses (the peer will timeout), it makes debugging difficult.

Recommendation:

if let Err(e) = self.inner.send_response(channel, self.node_info.clone()) {
    trace!(?peer_id, ?e, "Failed to send handshake response");
}

4. Potential Panic in Production Code (Severity: High)

Location: anchor/network/src/handshake/mod.rs:260

other => {
    return Poll::Ready(
        other.map_out(|_| unreachable!("GenerateEvent already handled")),
    );
}

While the comment explains this should be unreachable, using unreachable!() will panic if the assumption is violated (e.g., due to future libp2p changes).

Recommendation: Consider a more defensive approach:

other => {
    trace!("Bubbling up non-GenerateEvent ToSwarm event");
    return Poll::Ready(
        other.map_out(|_| {
            // This closure should never be called because we've handled all GenerateEvent variants.
            // If it does get called, it indicates a programming error or unexpected libp2p behavior.
            panic!("Unexpected: map_out closure called on non-GenerateEvent. This is a bug.")
        }),
    );
}

Or use a custom error type instead of panicking.

Code Quality Issues

5. Let Chains Syntax (Severity: Info)

Location: anchor/network/src/handshake/mod.rs:133-136

if let libp2p::swarm::FromSwarm::ConnectionEstablished(conn_est) = event
    && let libp2p::core::ConnectedPoint::Dialer { .. } = conn_est.endpoint
    && conn_est.other_established == 0

Let chains require Rust 1.64+. While likely fine for this project, consider documenting minimum Rust version in Cargo.toml if not already present.

6. Clone Performance (Severity: Low)

Location: Multiple locations in handshake/mod.rs

self.inner.send_request(peer_id, self.node_info.clone());

NodeInfo is cloned on every handshake request/response. While NodeInfo is relatively small, consider if Arc<NodeInfo> would be more efficient, especially under high connection load.

Analysis:

NodeInfo contains String fields and an Option<NodeMetadata> with more Strings
Each clone involves multiple heap allocations
Under heavy connection churn, this could add up

Recommendation: Profile first, optimize only if this shows up as a bottleneck.

7. Test Quality Improvement Suggestion

Location: anchor/network/src/handshake/mod.rs:402-408

The concurrent dials test verifies no duplicate handshakes by expecting a timeout on the second drive() call. This is good, but the 100ms timeout is arbitrary and could cause flakiness on slow CI systems.

Recommendation:

// Instead of timing out, explicitly check the event queue is empty
// Or use a shorter timeout (10-50ms) with a clear comment about why
let result = timeout(Duration::from_millis(50), async {
    // This should timeout - no more events should occur
    let ([_local], [_remote]): ([Event; 1], [Event; 1]) =
        drive(&mut local_swarm, &mut remote_swarm).await;
}).await;

assert!(result.is_err(), 
    "Expected no more handshake events (timeout), but got additional events. \
     This indicates duplicate handshakes occurred, violating the other_established==0 check.");

Security Considerations

8. Network Mismatch Handling (Severity: Info)

Location: anchor/network/src/handshake/mod.rs:144-152

The network mismatch detection is good. The code properly rejects connections from peers on different networks. No issues found.

9. DoS Vector: Resource Exhaustion (Severity: Low)

Location: anchor/network/src/handshake/mod.rs:84

events: VecDeque::new(),

Events are buffered in a VecDeque with no size limit. A malicious peer rapidly initiating connections could theoretically cause memory growth.

Analysis:

Mitigated by connection limits in ConnectionManager
Events are consumed in poll() on each event loop iteration
Unlikely to be a real issue in practice

Recommendation: Document the assumption that connection limits prevent unbounded growth, or add a maximum queue size.

Performance Considerations

10. Idle Connection Timeout (Severity: Info)

Location: anchor/network/src/network.rs:754

.with_idle_connection_timeout(Duration::from_secs(30));

Good improvement. The 30-second timeout provides time for handshake completion while still cleaning up idle connections. The comment explains the rationale well.

11. Dial Condition Change (Severity: Important)

Location: anchor/network/src/peer_manager/discovery.rs:216

- .condition(PeerCondition::NotDialing)
+ .condition(PeerCondition::DisconnectedAndNotDialing)

This is a significant behavior change:

Before (NotDialing): Could dial already-connected peers (creating multiple connections)
After (DisconnectedAndNotDialing): Only dial if both disconnected AND not currently dialing

Impact: This prevents creating duplicate connections from concurrent dials, which complements the other_established == 0 check.

Recommendation: This change should be highlighted in the PR description as it's a critical part of preventing the race condition.

Testing

12. Test Improvements (Severity: Positive)

Location: anchor/network/src/handshake/mod.rs:276-436

Excellent test improvements:

✅ Reduced from 611 to 397 lines (35% reduction)
✅ All tests use drive() for declarative testing
✅ Helper functions eliminate duplication (assert_completed, assert_network_mismatch)
✅ New test for concurrent dials (concurrent_dials_only_one_handshake)
✅ Tests no longer call internal functions - fully black-box

Positive feedback: The test quality is significantly improved.

13. Test Coverage Gap (Severity: Medium)

Location: Test suite

Missing test cases:

Rapid connection/disconnection: Does other_established handle reconnection properly?
Multiple simultaneous dials: What if 3+ connection attempts happen simultaneously?
Handshake timeout: No test for handshake timeout scenarios
Connection closed during handshake: What happens if connection drops mid-handshake?

Recommendation: Consider adding these edge case tests, especially for rapid reconnection scenarios.

Documentation

14. Documentation Quality (Severity: Positive)

Location: anchor/network/src/handshake/mod.rs

Good documentation:

✅ Clear doc comments on public items
✅ Explanation of other_established == 0 check
✅ Comments explain the asymmetric dialer-initiates pattern

15. PR Description Clarity (Severity: Info)

The PR description is excellent:

✅ Explains the evolution from Identify-based to self-contained approach
✅ Provides evidence (concurrent dials test)
✅ Explains architecture rationale

Minor suggestion: Add a migration guide for any downstream code using the old handle_event() API.

Architecture Assessment

16. Self-Contained Behavior Pattern (Severity: Positive)

The refactor to make the handshake behavior self-contained is architecturally sound:

✅ Follows rust-libp2p idioms (composable, independent behaviors)
✅ No coupling to Identify protocol
✅ Clean separation of concerns
✅ Events are semantic (Completed/Failed) rather than low-level (RequestResponse events)

This aligns with how Lighthouse and other production rust-libp2p implementations structure their code.

17. Event Processing in `poll()` (Severity: Good)

The custom poll() implementation that translates RequestResponse events to semantic events is well-designed:

✅ Properly handles all event variants
✅ Queues events for emission
✅ Bubbles up non-GenerateEvent ToSwarm events

Minor Issues

18. Unused Import (Severity: Trivial)

The PR already fixed this (debug import removed from handshake/mod.rs).

19. Logging Consistency (Severity: Low)

Location: anchor/network/src/network.rs:637

The handshake failure logging uses debug! level, which is appropriate. However, consider adding a counter metric for monitoring production handshake success/failure rates.

Note: The code already has metrics in handle_handshake_result() - good!

Recommendations Summary

Must Fix (Before Merge):

Replace unreachable!() with safer error handling (Issue Correct CI #4)
Add trace logging for handshake response send failures (Issue Remove sigp-crates dependency #3)
Clarify PR positioning: refactor vs bug fix (Issue Draft a Testing Framework #1)

Should Fix:
4. Document breaking API changes in PR description (Issue #2)
5. Add test for rapid reconnection scenarios (Issue #13)
6. Highlight dial condition change in PR description (Issue #11)

Nice to Have:
7. Consider Arc<NodeInfo> for clone efficiency (Issue #6)
8. Add event queue size limit documentation (Issue #9)
9. Add migration guide for old API (Issue #15)

Conclusion

This is a well-executed refactor that improves code organization, test quality, and follows rust-libp2p best practices. The self-contained behavior pattern is architecturally sound.

Main concern: The necessity of the race condition fix needs clarification. If this is primarily a refactor (with race prevention as a secondary benefit), that should be clearly stated.

Code quality: High. The implementation is clean, well-tested, and follows Rust idioms.

Recommendation: Approve with requested changes to address the critical issues (unreachable panic, error handling, PR description clarity).

dknopik

two nits, otherwise good!

anchor/network/src/network.rs

Remove redundant Completed and Failed structs that duplicated Event variant data. Pass Event directly to handlers instead of wrapping in Result. This eliminates unnecessary type conversions and follows idiomatic libp2p patterns for NetworkBehaviour events.

Re-add structured logging for connection errors that was removed in a previous refactoring. These errors are distinct from routine connection events: - OutgoingConnectionError: Failed to establish outgoing connection - IncomingConnectionError: Failed to accept incoming connection - ConnectionClosed with cause: Established connection failed at runtime Log errors at debug level and clean closures at trace level following libp2p's own logging patterns.

Replace silent failure handling with trace-level logging when handshake response cannot be sent. This occurs when the response channel is already closed (peer disconnected), which is not an error condition but useful for debugging handshake flows. Uses trace level as this is an expected race condition that doesn't require operator attention.

dknopik

LGTM, thanks!

diegomrsantos self-assigned this Oct 1, 2025

diegomrsantos added bug Something isn't working network v1.0.0 First Mainnet-release labels Oct 1, 2025

diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 33e1d3d to 66a2b48 Compare October 1, 2025 13:00

diegomrsantos requested a review from Copilot October 1, 2025 18:20

Copilot AI reviewed Oct 1, 2025

View reviewed changes

anchor/network/src/network.rs Outdated Show resolved Hide resolved

anchor/network/src/network.rs Show resolved Hide resolved

diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from d6194c9 to eac0a0e Compare October 1, 2025 19:47

diegomrsantos requested review from AgeManning, jxs and petarjuki7 October 1, 2025 23:27

diegomrsantos marked this pull request as ready for review October 1, 2025 23:28

AgeManning reviewed Oct 2, 2025

View reviewed changes

diegomrsantos requested a review from AgeManning October 2, 2025 19:15

diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 2208363 to c141a85 Compare October 2, 2025 19:40

diegomrsantos requested a review from Copilot October 2, 2025 19:41

Copilot AI reviewed Oct 2, 2025

View reviewed changes

anchor/network/src/handshake/mod.rs Show resolved Hide resolved

anchor/network/src/handshake/mod.rs Outdated Show resolved Hide resolved

anchor/network/src/network.rs Outdated Show resolved Hide resolved

diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 3c4376a to 166809c Compare October 2, 2025 21:12

diegomrsantos requested a review from Copilot October 2, 2025 21:35

diegomrsantos removed the v1.0.0 First Mainnet-release label Oct 2, 2025

diegomrsantos changed the base branch from release-v1.0.0 to unstable October 2, 2025 21:36

diegomrsantos changed the base branch from unstable to release-v1.0.0 October 2, 2025 21:36

diegomrsantos added the do-not-merge label Oct 2, 2025

Copilot AI reviewed Oct 2, 2025

View reviewed changes

anchor/network/src/network.rs Outdated Show resolved Hide resolved

anchor/network/src/network.rs Show resolved Hide resolved

anchor/network/src/handshake/mod.rs Show resolved Hide resolved

anchor/network/src/handshake/mod.rs Show resolved Hide resolved

diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 8be966d to b7c427c Compare October 6, 2025 09:41

diegomrsantos changed the base branch from release-v1.0.0 to unstable October 6, 2025 09:42

diegomrsantos added 4 commits October 6, 2025 11:43

refactor: remove logging-only SwarmEvent handlers and fix SwarmBuilder

194c116

mergify bot removed the waiting-on-author label Oct 15, 2025

diegomrsantos added 4 commits October 16, 2025 14:49

change log level to debug

df10f13

Merge branch 'unstable' into fix/handshake-connection-closed-race

c13b6a0

Merge branch 'unstable' into fix/handshake-connection-closed-race

2b9af25

docs: update idle_connection_timeout comment to remove outdated Ident…

15edef7

…ify reference

diegomrsantos added ready-for-review This PR is ready to be reviewed and removed ready-for-review This PR is ready to be reviewed labels Oct 17, 2025

Merge branch 'unstable' into fix/handshake-connection-closed-race

3cac64b

diegomrsantos added ready-for-review This PR is ready to be reviewed and removed ready-for-review This PR is ready to be reviewed labels Oct 17, 2025

Merge branch 'unstable' into fix/handshake-connection-closed-race

3b4e2a3

diegomrsantos marked this pull request as draft October 23, 2025 09:51

diegomrsantos marked this pull request as ready for review October 23, 2025 09:51

Merge branch 'unstable' into fix/handshake-connection-closed-race

fefa57b

dknopik reviewed Nov 3, 2025

View reviewed changes

anchor/network/src/network.rs Outdated Show resolved Hide resolved

anchor/network/src/network.rs Show resolved Hide resolved

diegomrsantos added 4 commits November 3, 2025 14:02

Merge branch 'unstable' into fix/handshake-connection-closed-race

b7fd2d7

diegomrsantos requested a review from dknopik November 3, 2025 14:36

dknopik approved these changes Nov 3, 2025

View reviewed changes

dknopik added ready-for-merge and removed ready-for-review This PR is ready to be reviewed labels Nov 3, 2025

mergify bot merged commit e2475f4 into sigp:unstable Nov 3, 2025
17 checks passed

diegomrsantos deleted the fix/handshake-connection-closed-race branch November 3, 2025 16:21

Comments

Conversation

diegomrsantos commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue Addressed

Observed Symptoms

Proposed Changes

Evolution of Approach

Evidence

Implementation Details

Additional Info

Architecture

Compatibility

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

AgeManning left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dknopik commented Oct 16, 2025

Uh oh!

diegomrsantos commented Oct 16, 2025

Uh oh!

claude-code-actions-sigp bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete

Summary

Critical Issues

1. Race Condition Fix Necessity (Severity: Discussion Needed)

2. Breaking Change in Public API (Severity: High)

3. Error Handling: Silent Failure (Severity: Medium)

4. Potential Panic in Production Code (Severity: High)

Code Quality Issues

5. Let Chains Syntax (Severity: Info)

6. Clone Performance (Severity: Low)

7. Test Quality Improvement Suggestion

Security Considerations

8. Network Mismatch Handling (Severity: Info)

9. DoS Vector: Resource Exhaustion (Severity: Low)

Performance Considerations

10. Idle Connection Timeout (Severity: Info)

11. Dial Condition Change (Severity: Important)

Testing

12. Test Improvements (Severity: Positive)

13. Test Coverage Gap (Severity: Medium)

Documentation

14. Documentation Quality (Severity: Positive)

15. PR Description Clarity (Severity: Info)

Architecture Assessment

16. Self-Contained Behavior Pattern (Severity: Positive)

17. Event Processing in poll() (Severity: Good)

Minor Issues

18. Unused Import (Severity: Trivial)

19. Logging Consistency (Severity: Low)

Recommendations Summary

Conclusion

Uh oh!

dknopik left a comment

Choose a reason for hiding this comment

Uh oh!

diegomrsantos commented Oct 1, 2025 •

edited

Loading

claude-code-actions-sigp bot commented Oct 23, 2025 •

edited

Loading

17. Event Processing in `poll()` (Severity: Good)