Skip to content

Comments

Sync fixes for fusaka-devnet-4#7876

Closed
jimmygchen wants to merge 16 commits intosigp:unstablefrom
jimmygchen:fusaka-devnet-4-revive
Closed

Sync fixes for fusaka-devnet-4#7876
jimmygchen wants to merge 16 commits intosigp:unstablefrom
jimmygchen:fusaka-devnet-4-revive

Conversation

@jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Aug 14, 2025

Issue Addressed

@jimmygchen jimmygchen requested a review from jxs as a code owner August 14, 2025 01:17
/// For non sampling columns, requirements is lower, but need to ensure there is peer for
/// publishing.
pub const MIN_SAMPLING_COLUMN_SUBNET_PEERS: u64 = TARGET_SUBNET_PEERS as u64;
pub const MIN_NON_SAMPLING_COLUMN_SUBNET_PEERS: u64 = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 1 peer and not 0 in this subnets?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be useful to have at least one peer per column in the case where we're publishing data columns.

@dapplion
Copy link
Collaborator

Did a first review pass and approach looks solid, just some minor comments

err
),
peer_action: Some(PeerAction::LowToleranceError),
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawanjay176 do we need this change?

This may be due to optimistic EL right?

// An honest optimistic node may propagate blocks which are rejected by an EE, do not
// penalize them.
ExecutionPayloadError::RejectedByExecutionEngine { .. } => false,

.filter(move |(_, info)| {
// We check both the metadata and gossipsub data as we only want to count long-lived subscribed peers
info.is_connected()
&& info.is_synced_or_advanced()
Copy link
Member Author

@jimmygchen jimmygchen Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think again about this change - this is used to determine if we have enough good peers when starting discovery query:

// Already have target number of peers, no need for subnet discovery
let peers_on_subnet = self
.network_globals
.peers
.read()
.good_peers_on_subnet(s.subnet)
.count();
if peers_on_subnet >= TARGET_SUBNET_PEERS {
trace!(
subnet = ?s.subnet,
reason = "Already connected to desired peers",

So the condition to push a PeerManagerEvent::DiscoverSubnetPeers below must also satisfy this

fn maintain_custody_peers(&mut self) {
let subnets_to_discover: Vec<SubnetDiscovery> = self
.network_globals
.sampling_subnets()
.iter()
.filter_map(|custody_subnet| {
if self
.network_globals
.peers
.read()
.has_good_peers_in_custody_subnet(
custody_subnet,
MIN_SAMPLING_COLUMN_SUBNET_PEERS as usize,
)
{
None
} else {
Some(SubnetDiscovery {
subnet: Subnet::DataColumn(*custody_subnet),
min_ttl: None,
})
}
})
.collect();

Otherwise it might result in pushing a DiscoverSubnetPeers event to network but then it gets ignored by network.

@jimmygchen jimmygchen added the ready-for-review The code is ready for review label Aug 15, 2025
@mergify
Copy link

mergify bot commented Aug 15, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Aug 15, 2025
@jimmygchen
Copy link
Member Author

@pawanjay176 I've done a bit of clean up, but there's still a bit of mess around maintaining peers & discovery:

#7876 (comment)

I'll revisit next week.

@jimmygchen jimmygchen self-assigned this Aug 18, 2025
jimmygchen added a commit that referenced this pull request Aug 18, 2025
Squashed commit of the following:

commit d8b6a99
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Mon Aug 18 23:48:28 2025 +1000

    Remove unnecessary test

commit 832d862
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Fri Aug 15 16:22:48 2025 +1000

    Clippy

commit 0437243
Merge: a250daa 317dc0f
Author: Jimmy Chen <jimmy@sigmaprime.io>
Date:   Fri Aug 15 16:22:30 2025 +1000

    Merge branch 'unstable' into fusaka-devnet-4-revive

commit a250daa
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Fri Aug 15 16:02:05 2025 +1000

    Clean up and refactor custody peer checks

commit ec7956b
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Fri Aug 15 14:40:06 2025 +1000

    Revert "Increase target peer count to 200."

    This reverts commit 5f38f31.

commit cdc417f
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 23:39:10 2025 +1000

    Prevent sampling subnets from getting pruned if below min target threshold.

commit 9ce9318
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 17:39:57 2025 +1000

    Check for sync status for custody subnet discovery

commit 1b8f0e4
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 17:30:30 2025 +1000

    Fix tests.

commit 1465d72
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 11:47:33 2025 +1000

    Only send lookup requests to peers that are synced or advacned.

commit 5f38f31
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 11:30:04 2025 +1000

    Increase target peer count to 200.

commit 56ca5bf
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 11:14:20 2025 +1000

    Remove penalties for block sidecar coupling.

commit 8e95a54
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Thu Aug 14 11:07:50 2025 +1000

    Revert "Reduce number of head chains we sync to"

    This reverts commit 1f776a4

commit 0144205
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Aug 13 17:33:47 2025 -0700

    Request columns from global peer pool

commit a8f6801
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Aug 13 16:37:12 2025 -0700

    Priorotize status v2

commit c8bbe47
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Aug 13 16:36:56 2025 -0700

    Penalize if invalid EL block

commit 1f776a4
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Aug 13 16:36:13 2025 -0700

    Reduce number of head chains we sync to
mergify bot pushed a commit that referenced this pull request Aug 21, 2025
Prioritise `StatusV2` over `StatusV1` RPC protocol.

A bug discovered during devnet-4 testing and extracted from the sync fixes PR #7876.
mergify bot pushed a commit that referenced this pull request Sep 4, 2025
Closes:
- #7865
- #7855

Changes extracted from earlier PR #7876

This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allows Lighthouse to send discovery requests if there isn't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there isn't enough usable peers in sampling subnet but no discovery is attempted.


  - Make peer discovery queries if custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2)
- Check sync status when making discovery requests, to make sure we don't ignore requests if there isn't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
@jimmygchen
Copy link
Member Author

Covered in #7912, #7915 and #7946

@jimmygchen jimmygchen closed this Sep 5, 2025
kevaundray pushed a commit to kevaundray/lighthouse that referenced this pull request Sep 13, 2025
Closes:
- sigp#7865
- sigp#7855

Changes extracted from earlier PR sigp#7876

This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allows Lighthouse to send discovery requests if there isn't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there isn't enough usable peers in sampling subnet but no discovery is attempted.

  - Make peer discovery queries if custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2)
- Check sync status when making discovery requests, to make sure we don't ignore requests if there isn't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
jtraglia pushed a commit to jtraglia/lighthouse that referenced this pull request Sep 16, 2025
Closes:
- sigp#7865
- sigp#7855

Changes extracted from earlier PR sigp#7876

This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allows Lighthouse to send discovery requests if there isn't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there isn't enough usable peers in sampling subnet but no discovery is attempted.


  - Make peer discovery queries if custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2)
- Check sync status when making discovery requests, to make sure we don't ignore requests if there isn't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge waiting-on-author The reviewer has suggested changes and awaits thier implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants