Skip to content

Comments

fix: implement periodic peer discovery without connection failure tracking#650

Merged
mergify[bot] merged 13 commits intosigp:release-v1.0.0from
diegomrsantos:fix/periodic-discovery-without-blacklist
Oct 7, 2025
Merged

fix: implement periodic peer discovery without connection failure tracking#650
mergify[bot] merged 13 commits intosigp:release-v1.0.0from
diegomrsantos:fix/periodic-discovery-without-blacklist

Conversation

@diegomrsantos
Copy link
Member

@diegomrsantos diegomrsantos commented Oct 6, 2025

Issue Addressed

Fixes peer count plateau and connection denial issues caused by generic discovery attempting to connect to peers without needed subnets.

Proposed Changes

Subnet-Aware Periodic Discovery

  • Trigger discovery every 30s when peer count is below target
  • Uses subnet filtering via start_subnet_query() instead of generic discover_peers()
  • Only discovers and dials peers that offer our needed subnets
  • Prevents MissingNeededSubnets connection denials

Connection Metrics

  • Track inbound vs outbound connection counts
  • Provides visibility into connection direction distribution
  • Helps monitor peer connectivity patterns

Logging Improvements

  • Connection errors logged at debug level (normal P2P behavior)
  • Reduces log noise while maintaining debugging capability
  • Increased log retention (5 files vs 1) for better historical tracking

Root Cause Analysis

The original PR #649 added generic discover_peers() which found ANY peers on the network:

  1. Discovery finds peer with subnets [1, 2, 3]
  2. We need subnets [5, 6, 7]
  3. We dial the peer
  4. Connection attempt succeeds
  5. Connection manager checks if peer offers needed subnets
  6. Connection denied with MissingNeededSubnets error
  7. Wasted connection attempt, cycle repeats

Solution

Use subnet-aware discovery that filters ENRs before dialing:

  1. Discovery filters by our needed subnets [5, 6, 7]
  2. Only peers offering these subnets are discovered
  3. We dial filtered peers
  4. Connection manager checks subnets
  5. Connection accepted - peer offers what we need
  6. Peer count grows successfully

Additional Info

This PR intentionally excludes the connection failure blacklist mechanism from PR #649 to evaluate a simpler approach.

Two complementary discovery mechanisms:

  1. Per-subnet discovery (existing check_subnet_peers): Ensures minimum 6 peers per subnet
  2. Overall discovery (new, subnet-aware): Helps reach target peer count by discovering for ALL needed subnets

Comparison with PR #649:

  • ✅ Includes: Periodic subnet-aware discovery, connection tracking, logging improvements
  • ❌ Excludes: Connection failure tracking, peer blacklisting, memory-bounded failure records
  • ✅ Fixes: Connection denial bug from generic discovery

Advantages:

  • Simpler implementation with less state management
  • No memory overhead for failure tracking
  • Avoids potential issues with false-positive blacklisting
  • Prevents wasted connection attempts to unsuitable peers
  • Focus on proactive subnet-filtered discovery

Trade-offs:

  • No automatic avoidance of persistently failing peers
  • Relies on discovery to find working peers with correct subnets

diegomrsantos and others added 11 commits October 4, 2025 10:28
Increases max_keep_files for libp2p.log and discv5.log from 1 to 5.
This provides better debugging history without excessive disk usage
(~500MB max for both logs combined at 100MB each).
- Add clarifying comments for ENR subnet state management
- Upgrade log level from debug to info for subnet updates (important state changes)
- Document that subnet bitfield is populated dynamically via gossipsub subscriptions
Logs the subnets bitfield when initiating handshakes to help debug
subnet-based peering issues.
Adds intelligent failure tracking to prevent repeated connection attempts to
persistently failing peers while avoiding memory leaks.

Features:
- Blacklist peers after 5 consecutive failures
- 1-hour expiration for automatic retry
- Hard limit of 1000 records (defense-in-depth)
- Periodic cleanup in heartbeat
- Auto-clear on successful connection

Tested over 2 hours: Connection errors reduced 97% (2,358 → ~60)
…d peers

Implements custom ENR to multiaddr conversion that only returns TCP and QUIC
addresses that libp2p can actually use, avoiding wasted connection attempts.

Changes:
- Add enr_to_multiaddrs() function for TCP/QUIC extraction
- Skip plain UDP addresses (not usable by libp2p)
- Skip blacklisted peers in subnet peer selection
- Properly handle both IPv4 and IPv6 addresses

This reduces unnecessary connection attempts and respects the failure tracking.
Exposes connection state and failure tracking methods needed by the network
event loop, and integrates periodic cleanup in the heartbeat.

Changes:
- Add connected_peers() accessor
- Add record_connection_failure() method
- Call cleanup_expired_failures() in heartbeat (every 30s)

This completes the peer management infrastructure for periodic discovery.
…dling

Fixes peer count plateau by triggering discovery every 30s when below target.
Previously discovery only ran once at startup, limiting peer growth to 12-14.

Changes:
- Add periodic discovery check in heartbeat (every 30s)
- Log connection errors at debug level (normal P2P behavior)
- Record connection failures for blacklist system
- Simplify subnet metadata update with match statement
- Add structured logging for subnet bitfield changes

Tested over 2 hours: 4→30 peers (650% increase), connection errors reduced 97%
Addresses Copilot review feedback - replace inline blacklist check with
dedicated method to improve code maintainability and avoid duplication.
Track and log inbound vs outbound peer connections by:
- Adding counters for inbound_count and outbound_count
- Using endpoint.is_dialer() from ConnectionEstablished/Closed events
- Displaying counts in Network status heartbeat logs

This provides better visibility into connection patterns without
needing to maintain additional HashMaps of peer IDs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace custom enr_to_multiaddrs() with EnrExt trait methods to get
only TCP and QUIC addresses (excludes plain UDP).
…cking

Implements periodic peer discovery to maintain target peer count without
the connection failure blacklist mechanism.

Core changes:
- Trigger peer discovery every 30s when below target peer count
- Track inbound vs outbound connection counts for metrics
- Log connection errors at debug level (normal P2P behavior)

This approach focuses on proactive peer discovery rather than reactive
blacklisting of failing peers.
@diegomrsantos diegomrsantos changed the base branch from stable to release-v1.0.0 October 6, 2025 14:23
@diegomrsantos diegomrsantos self-assigned this Oct 6, 2025
@diegomrsantos diegomrsantos added bug Something isn't working network labels Oct 6, 2025
diegomrsantos and others added 2 commits October 7, 2025 10:29
Changes periodic discovery from generic peer discovery to subnet-aware
discovery, preventing connections from being denied due to MissingNeededSubnets.

Root cause:
- Generic discover_peers() found ANY peers on the network
- We dialed peers that didn't offer our needed subnets
- Connection was accepted initially but then denied with MissingNeededSubnets
- This created a cycle of discovery -> dial -> deny

Solution:
- Use start_subnet_query() instead to filter by needed subnets
- Only discover and dial peers that offer subnets we actually need
- Aligns with existing subnet-aware discovery used elsewhere

This complements the existing per-subnet discovery (which ensures minimum
peers per subnet) by helping reach overall target peer count through
subnet-filtered discovery of all needed subnets simultaneously.
Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!!!

@dknopik dknopik added ready-for-merge v1.0.0 First Mainnet-release labels Oct 7, 2025
@mergify mergify bot merged commit 0b9da22 into sigp:release-v1.0.0 Oct 7, 2025
15 checks passed
petarjuki7 pushed a commit to petarjuki7/anchor that referenced this pull request Oct 16, 2025
…cking (sigp#650)

Fixes peer count plateau and connection denial issues caused by generic discovery attempting to connect to peers without needed subnets.


  ### Subnet-Aware Periodic Discovery
- Trigger discovery every 30s when peer count is below target
- **Uses subnet filtering** via `start_subnet_query()` instead of generic `discover_peers()`
- Only discovers and dials peers that offer our needed subnets
- Prevents `MissingNeededSubnets` connection denials

### Connection Metrics
- Track inbound vs outbound connection counts
- Provides visibility into connection direction distribution
- Helps monitor peer connectivity patterns

### Logging Improvements
- Connection errors logged at debug level (normal P2P behavior)
- Reduces log noise while maintaining debugging capability
- Increased log retention (5 files vs 1) for better historical tracking


Co-Authored-By: diego <diego@sigmaprime.io>

Co-Authored-By: Age Manning <Age@AgeManning.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working network ready-for-merge v1.0.0 First Mainnet-release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants