Conversation
|
When adding the subnet logic, we decided to separate it into the What do you think of moving some of the logic here into the |
3b2f0d6 to
7516535
Compare
True, the current implementation needs to be changed. How about? pub enum SubnetEvent {
Join(SubnetId, Vec<CommitteeInfo>),
Leave(SubnetId),
} |
|
This would require us to send |
How about both events with the vec? But the second one maybe |
|
I am also unsure about where we should place the actual calculation of the rate - it kind of fits both crates IMO, depending how you look at it. It feels like an application specific calculation (just like the decision what subnets to subscribe to), and maybe should therefore also live in the |
Good observation. I think I'll move |
6135908 to
222eaea
Compare
|
The current state does not update the params on updated committee data, right? Also, based on the the discussion above, we have to adjust the events passed to |
222eaea to
21e561b
Compare
anchor/subnet_service/src/lib.rs
Outdated
| // For the "all subnets" case, we don't have specific committee info, so pass an empty | ||
| // vec | ||
| if let Err(err) = tx.try_send(SubnetEvent::Join(subnet, Vec::new())) { | ||
| error!(?err, "Impossible error while subscribing to all subnets"); | ||
| } |
There was a problem hiding this comment.
We can still provide information on the committee in this case.
anchor/subnet_service/src/lib.rs
Outdated
| // `previous_subnets` tracks which subnets were joined in the last iteration. | ||
| let mut previous_subnets = HashSet::new(); | ||
| // Track committee info hash for each subnet to detect changes efficiently | ||
| let mut previous_committee_hashes: HashMap<SubnetId, u64> = HashMap::new(); |
There was a problem hiding this comment.
I think these can be unified by hashing subnets, not committees.
But regardless, can we not just calculate the score and compare if the old score is different from the new score?
There was a problem hiding this comment.
you guys probably know better than me, but I think this approach maintains cleaner boundaries: subnet service handles membership, network handles scoring. Wouldn't hashing the subnet mean we cant distinguish joining and committee changes? I dont mind either way just my 0.02.
There was a problem hiding this comment.
Sorry, my comment above is incorrect, we are hashing subnets here. Still, the set and the map can be unified: After an iteration, a subnet is always either in both previous_subnets and
previous_committee_hashes, or neither. So previous_subnets does not provide value anymore, unless I am missing something.
I think this approach maintains cleaner boundaries
IMO it does not. If we have the network crate calculate the message rate, it needs access to the network state as provided by the database, which we avoided until now. This means one more thing passed into it, one more place where we may wait on a lock, and more domain specific logic (the message rate calculation algorithm) inside it. Note that by inside it I mean the code flow, not the location of the file where the message rate is being calculated.
Basically, the way I think of it is: the network crate handles scoring and subnet membership based on the parameters it is passed. It does not care about how we decide about which subnets to describe on, or how we calculate the subnet message rate, as it does not have any domain logic. We do the same approach for messages: it immediately passes them to other components to avoid any domain logic computations in the code flow of the network loop.
There was a problem hiding this comment.
if you unified the hashmap and set could you distinguish for committeeupdate events for committee information changes not just when we join/leave subnet? If we can all good. Thanks for the clarification and dont disagree - the separation as it is here is what I was referring to as a preference and dont want to add state dependencies. Apologies if I oversimplified.
anchor/subnet_service/src/lib.rs
Outdated
| /// Compute a lightweight hash of committee information to detect changes efficiently | ||
| fn compute_committee_hash(committees: &[CommitteeInfo]) -> u64 { | ||
| let mut hasher = std::collections::hash_map::DefaultHasher::new(); | ||
|
|
||
| // Hash the number of committees first | ||
| committees.len().hash(&mut hasher); | ||
|
|
||
| // Hash each committee's essential data | ||
| for committee in committees { | ||
| // Hash committee members by converting to a sorted vector | ||
| let mut members: Vec<_> = committee.committee_members.iter().collect(); | ||
| members.sort_unstable(); | ||
| members.hash(&mut hasher); | ||
|
|
||
| // Hash validator indices | ||
| committee.validator_indices.hash(&mut hasher); | ||
| } | ||
|
|
||
| hasher.finish() | ||
| } |
There was a problem hiding this comment.
It might be interesting to check if this is considerably faster than just calculating the score.
There was a problem hiding this comment.
Recalculating is faster than computing the hashes, so I'll recalculate everything at every epoch
jking-aus
left a comment
There was a problem hiding this comment.
lgtm - just address daniel's comments
dknopik
left a comment
There was a problem hiding this comment.
Very nice! Only a few nitpicks left
anchor/subnet_service/src/lib.rs
Outdated
| let current_state = db.borrow(); | ||
| for subnet in (0..(subnet_count as u64)).map(SubnetId) { | ||
| let committees_info = get_committee_info_for_subnet(&subnet, &*current_state); | ||
| let message_rate = message_rate::calculate_message_rate_for_topic::<E>( | ||
| &committees_info, | ||
| &chain_spec, | ||
| ); | ||
|
|
||
| if let Err(err) = tx.try_send(SubnetEvent::Join(subnet, message_rate)) { | ||
| error!( | ||
| ?err, | ||
| subnet = *subnet, | ||
| "Failed to send subnet join event during initialization" | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Currently, this only runs on startup - so we never update the rate when in --subscribe-all-subnets mode
There was a problem hiding this comment.
Pull Request Overview
Replaces the legacy subnet_tracker with a new subnet_service crate that tracks subnet membership and computes dynamic gossipsub message rates at epoch boundaries, and updates all downstream consumers to use the new API and pre-calculated rates.
- Introduces
subnet_servicewith dynamic message-rate computation and epoch scheduling - Updates network, client, scoring, and message-sender modules to use
subnet_serviceand pre-calculated rates - Removes the old
subnet_trackercrate and adjusts workspace dependencies
Reviewed Changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| anchor/subnet_tracker/src/lib.rs | Deleted old subnet_tracker implementation |
| anchor/subnet_service/src/lib.rs | Added new subnet_service with epoch-based rate updates |
| anchor/network/src/network.rs | Adapted Network to subscribe with optional rates and handle RateUpdate |
| anchor/client/src/lib.rs | Switched from start_subnet_tracker to start_subnet_service |
Comments suppressed due to low confidence (1)
anchor/subnet_service/src/lib.rs:256
- [nitpick] The
handle_epoch_committee_updatelogic emitsRateUpdateevents every epoch—this is critical dynamic behavior. Consider adding unit or integration tests for this function to verify that message rates are recalculated and sent correctly when an epoch boundary is reached.
async fn handle_epoch_committee_update<E: EthSpec>(
| ) -> Vec<CommitteeInfo> { | ||
| network_state | ||
| .clusters() | ||
| .values() | ||
| .filter(|cluster| { | ||
| let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), SUBNET_COUNT); |
There was a problem hiding this comment.
In get_committee_info_for_subnet, using the constant SUBNET_COUNT can lead to inconsistencies if the service is initialized with a different subnet_count. Consider passing the runtime subnet_count into this helper instead of the fixed constant to ensure correct committee-to-subnet mapping.
| ) -> Vec<CommitteeInfo> { | |
| network_state | |
| .clusters() | |
| .values() | |
| .filter(|cluster| { | |
| let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), SUBNET_COUNT); | |
| subnet_count: usize, | |
| ) -> Vec<CommitteeInfo> { | |
| network_state | |
| .clusters() | |
| .values() | |
| .filter(|cluster| { | |
| let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), subnet_count); |
|
Neat! Thanks! One more round of testing over night, fingers crossed! |
The topic params are disabled in the current code |
| tokio::select! { | ||
| // Handle database changes for subnet join/leave (only if not subscribe_all_subnets) | ||
| _ = db.changed(), if !subscribe_all_subnets => { | ||
| handle_subnet_changes::<E>(&tx, &mut db, &mut previous_subnets, subnet_count, &chain_spec, disable_gossipsub_topic_scoring).await; | ||
| } | ||
|
|
||
| // Handle scheduled epoch boundaries (for both modes, but only if scoring is enabled) | ||
| _ = sleep(next_epoch_delay), if !disable_gossipsub_topic_scoring => { | ||
| handle_epoch_committee_update::<E>(&tx, &mut db, &previous_subnets, &chain_spec).await; | ||
| // Recalculate the next epoch delay only after we've processed the epoch boundary | ||
| next_epoch_delay = calculate_duration_to_next_epoch::<E>(&slot_clock); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
select panics if all branches are disabled via if, unless an else branch is provided
I suggest adding an else branch with return, as the network crate should handle this gracefully.
| /// Disables gossipsub topic scoring and message rate calculations. | ||
| pub disable_gossipsub_topic_scoring: bool, |
There was a problem hiding this comment.
Do we want to add a CLI flag to re-enable this, maybe with a note that it is experimental?
…ub_topic_scoring are true
# Conflicts: # Cargo.lock # anchor/fuzz/Cargo.toml # anchor/fuzz/fuzz_targets/setup.rs
92a4596 to
5bd856a
Compare
48c8ca4 to
5bd856a
Compare
5bd856a to
d193578
Compare
dknopik
left a comment
There was a problem hiding this comment.
LGTM, thank you for tackling this!
Thank you for the great review |
This fixes the issues arising with #406. The root cause is as follows: We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents. The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously. This causes nodes to be penalized for "self" messages, which are not actually self messages. Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
sigp#371 This PR replaces the old `subnet_tracker` module with a new `subnet_service` that tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates. - Introduces `subnet_service` crate with dynamic message-rate computation and epoch scheduling - Updates topic-scoring API to accept pre-calculated rates (`new_with_rate`, `topic_score_params_for_subnet_with_rate`) - Refactors all crates (network, client, message_sender, fuzz, peer_manager, handshake, discovery) to depend on `subnet_service` instead of `subnet_tracker` - creates `disable_gossipsub_topic_scoring` and sets it to `false`by default
This fixes the issues arising with sigp#406. The root cause is as follows: We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents. The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously. This causes nodes to be penalized for "self" messages, which are not actually self messages. Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
sigp#371 This PR replaces the old `subnet_tracker` module with a new `subnet_service` that tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates. - Introduces `subnet_service` crate with dynamic message-rate computation and epoch scheduling - Updates topic-scoring API to accept pre-calculated rates (`new_with_rate`, `topic_score_params_for_subnet_with_rate`) - Refactors all crates (network, client, message_sender, fuzz, peer_manager, handshake, discovery) to depend on `subnet_service` instead of `subnet_tracker` - creates `disable_gossipsub_topic_scoring` and sets it to `false`by default
This fixes the issues arising with sigp#406. The root cause is as follows: We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents. The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously. This causes nodes to be penalized for "self" messages, which are not actually self messages. Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
Issue Addressed
#371
Proposed Changes
This PR replaces the old
subnet_trackermodule with a newsubnet_servicethat tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates.subnet_servicecrate with dynamic message-rate computation and epoch schedulingnew_with_rate,topic_score_params_for_subnet_with_rate)subnet_serviceinstead ofsubnet_trackerdisable_gossipsub_topic_scoringand sets it tofalseby defaultAdditional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.