Skip to content

Comments

fix: Disconnect peers on handshake failure#441

Merged
mergify[bot] merged 1 commit intosigp:release-v0.3.0from
diegomrsantos:improve-handshake-test
Aug 26, 2025
Merged

fix: Disconnect peers on handshake failure#441
mergify[bot] merged 1 commit intosigp:release-v0.3.0from
diegomrsantos:improve-handshake-test

Conversation

@diegomrsantos
Copy link
Member

@diegomrsantos diegomrsantos commented Jul 18, 2025

Issue Addressed

#438

Proposed Changes

  • Add explicit peer disconnection: When a handshake fails (e.g., due to network mismatch), the network layer now explicitly disconnects the peer via disconnect_peer_id()
  • Improve disconnection timing: Instead of relying on keep-alive timeouts (~10 seconds), peers are disconnected immediately upon handshake failure
  • Preserve existing test: The handshake test continues to validate failure detection; disconnection behavior will be naturally tested through integration tests that use the full network stack
  • Add testing best practices: Updated CLAUDE.md with guidance about not simulating production behavior in tests

Technical Details

The change is made in Network::handle_handshake_result() where handshake failures are processed. When a handshake fails, we now call self.swarm.disconnect_peer_id(peer_id) to immediately disconnect the peer, rather than waiting for keep-alive timeouts.

This ensures faster and more deterministic peer disconnection on handshake failures, improving network behavior and reducing connection overhead.

@diegomrsantos diegomrsantos requested review from Copilot and dknopik and removed request for dknopik July 18, 2025 12:14
@diegomrsantos diegomrsantos self-assigned this Jul 18, 2025
@diegomrsantos diegomrsantos requested a review from dknopik July 18, 2025 12:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the mismatched_networks_handshake_failed test by adding verification that peers are properly disconnected when handshake fails due to network mismatch. The existing disconnection functionality was already working correctly - this change adds test coverage to verify the behavior.

  • Adds tracking of connection close events for both local and remote peers
  • Extends the test loop condition to wait for both handshake failures and disconnections
  • Adds explicit assertions to verify both handshake failures and disconnections occurred

@diegomrsantos diegomrsantos added the ready-for-review This PR is ready to be reviewed label Jul 18, 2025
@dknopik
Copy link
Member

dknopik commented Jul 21, 2025

I think this works because at some point there is a keepalive timeout as nothing is being sent anymore:

2025-07-21T09:33:38.548712Z DEBUG Swarm::poll: libp2p_swarm: Connection closed with error KeepAliveTimeout: Connected { endpoint: Listener { local_addr: /memory/4092149446297013635, send_back_addr: /memory/12899751130364311786 }, peer_id: PeerId("12D3KooWB1vGoK219R7Dnw7YUi4EcViLPWqjanY8F9P4TF27xL7u") } total_peers=0
2025-07-21T09:33:38.548789Z TRACE libp2p_swarm_test: event=ConnectionClosed { peer_id: PeerId("12D3KooWB1vGoK219R7Dnw7YUi4EcViLPWqjanY8F9P4TF27xL7u"), connection_id: ConnectionId(2), endpoint: Listener { local_addr: /memory/4092149446297013635, send_back_addr: /memory/12899751130364311786 }, num_established: 0, cause: Some(KeepAliveTimeout) }
2025-07-21T09:33:38.548814Z TRACE async_io::block_on: async_io::driver: sleep until notification

The test takes 10 seconds.

So I think #438 is actually valid.

We should modify this test to ensure the disconnection reason is correct (if possible), or ensure that the disconnection is soon after handshake failure.

@dknopik dknopik added waiting-on-author and removed ready-for-review This PR is ready to be reviewed labels Aug 19, 2025
@diegomrsantos diegomrsantos changed the title test: Improve handshake test fix: Disconnect peers on handshake failure Aug 25, 2025
@diegomrsantos diegomrsantos force-pushed the improve-handshake-test branch from 870bad0 to 23c281a Compare August 25, 2025 13:41
@mergify mergify bot added ready-for-review This PR is ready to be reviewed and removed waiting-on-author labels Aug 25, 2025
@dknopik dknopik added the v1.0.0 First Mainnet-release label Aug 26, 2025
dknopik
dknopik previously approved these changes Aug 26, 2025
Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@dknopik dknopik changed the base branch from unstable to release-v0.3.0 August 26, 2025 06:55
@dknopik dknopik dismissed their stale review August 26, 2025 06:55

The base branch was changed.

@dknopik
Copy link
Member

dknopik commented Aug 26, 2025

Sorry, can you please rebase this from unstable to release-v0.3.0?

@diegomrsantos
Copy link
Member Author

It seems to be good to be merged into release-v0.3.0

@diegomrsantos diegomrsantos force-pushed the improve-handshake-test branch from c699312 to 46cdf05 Compare August 26, 2025 09:58
@diegomrsantos diegomrsantos requested a review from dknopik August 26, 2025 09:59
@dknopik
Copy link
Member

dknopik commented Aug 26, 2025

Currently, other changes are in this PR: 8f9cc96 and 17b8f05

- Add explicit peer disconnection in network layer when handshake fails
- Update CLAUDE.md with testing best practices about not simulating production behavior in tests
- The existing handshake test validates failure detection; disconnection behavior will be tested through integration tests that use the full network stack
@diegomrsantos diegomrsantos force-pushed the improve-handshake-test branch from 46cdf05 to 8a3dd50 Compare August 26, 2025 13:49
@diegomrsantos
Copy link
Member Author

Currently, other changes are in this PR: 8f9cc96 and 17b8f05

done

Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@dknopik dknopik added ready-for-merge and removed ready-for-review This PR is ready to be reviewed labels Aug 26, 2025
@mergify mergify bot merged commit 32e5130 into sigp:release-v0.3.0 Aug 26, 2025
15 checks passed
diegomrsantos added a commit to diegomrsantos/anchor that referenced this pull request Sep 17, 2025
sigp#438


  - **Add explicit peer disconnection**: When a handshake fails (e.g., due to network mismatch), the network layer now explicitly disconnects the peer via `disconnect_peer_id()`
- **Improve disconnection timing**: Instead of relying on keep-alive timeouts (~10 seconds), peers are disconnected immediately upon handshake failure
- **Preserve existing test**: The handshake test continues to validate failure detection; disconnection behavior will be naturally tested through integration tests that use the full network stack
- **Add testing best practices**: Updated CLAUDE.md with guidance about not simulating production behavior in tests
jking-aus pushed a commit to jking-aus/anchor that referenced this pull request Oct 8, 2025
sigp#438


  - **Add explicit peer disconnection**: When a handshake fails (e.g., due to network mismatch), the network layer now explicitly disconnects the peer via `disconnect_peer_id()`
- **Improve disconnection timing**: Instead of relying on keep-alive timeouts (~10 seconds), peers are disconnected immediately upon handshake failure
- **Preserve existing test**: The handshake test continues to validate failure detection; disconnection behavior will be naturally tested through integration tests that use the full network stack
- **Add testing best practices**: Updated CLAUDE.md with guidance about not simulating production behavior in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge v1.0.0 First Mainnet-release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants