Validate qbft logic #191

Merged
dknopik merged 22 commits into sigp:unstable from diegomrsantos:validate-qbft-logic
Mar 31, 2025
Conversation


@diegomrsantos diegomrsantos commented Mar 14, 2025

@diegomrsantos diegomrsantos self-assigned this Mar 14, 2025
@diegomrsantos diegomrsantos force-pushed the validate-qbft-logic branch 8 times, most recently from 91b4120 to 381199f Compare March 21, 2025 13:49
@diegomrsantos diegomrsantos requested a review from Copilot March 21, 2025 14:03
Copilot AI left a comment
Pull Request Overview

This PR focuses on validating and enforcing QBFT consensus logic while integrating a new Round type and slot_clock dependency into various modules. Key changes include:

  • Introducing a new Round type in ssv_types and removing the embedded Round implementation from qbft_types.
  • Updating consensus validation logic to use slot_clock and improve state management via consensus_state.
  • Adding and updating tests and dependency configurations across several crates.

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Summary per file:

  • anchor/common/ssv_types/src/round.rs: Introduces a new Round type backed by NonZeroUsize
  • anchor/message_validator/src/consensus_state.rs: Implements consensus state management for operators
  • anchor/message_validator/src/consensus_message.rs: Updates consensus validation logic and round validation functions
  • anchor/message_validator/src/message_counts.rs: Adds message count tracking and limits validation
  • Others (Cargo.toml, network, qbft modules, etc.): Update dependency and parameter passing for slot_clock integration
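The Round type summarized above could be sketched roughly like this; the method names and exact API are assumptions for illustration, not the PR's actual code:

```rust
use std::num::NonZeroUsize;

/// Sketch of a Round newtype over NonZeroUsize: "round zero" is
/// unrepresentable, and the niche keeps Option<Round> pointer-sized.
/// Method names here are assumptions, not the PR's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Round(NonZeroUsize);

impl Round {
    pub fn new(round: usize) -> Option<Self> {
        NonZeroUsize::new(round).map(Round)
    }

    pub fn get(self) -> usize {
        self.0.get()
    }

    /// Advance to the next round, failing only on overflow.
    pub fn next(self) -> Option<Self> {
        self.0.get().checked_add(1).and_then(Self::new)
    }
}

impl Default for Round {
    fn default() -> Self {
        // Rounds start at 1, never 0.
        Round(NonZeroUsize::new(1).unwrap())
    }
}

fn main() {
    assert!(Round::new(0).is_none());
    assert_eq!(Round::default().get(), 1);
    assert_eq!(Round::default().next().unwrap().get(), 2);
}
```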

@diegomrsantos diegomrsantos marked this pull request as ready for review March 25, 2025 19:21
Copilot AI left a comment

Pull Request Overview

This PR refactors and improves the QBFT logic by updating consensus message validation and round handling, and by integrating a generic slot clock for improved time-based validations.

  • Introduces a new Round type in the ssv_types module and refactors related QBFT types accordingly.
  • Enhances consensus message validation with detailed round and leader checks using the slot_clock interface.
  • Updates multiple modules (validator, network, message_receiver) to use a generic SlotClock and improved consensus state management.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Summary per file:

  • anchor/common/ssv_types/src/round.rs: Introduces a new Round type wrapping NonZeroUsize for round safety.
  • anchor/message_validator/src/message_counts.rs: Adjusts message count validations with updated error reporting.
  • anchor/message_validator/src/consensus_state.rs: Refactors operator state management for consensus message updates.
  • anchor/message_validator/src/consensus_message.rs: Implements new QBFT logic with round estimation, leader selection, and time-based validations.
  • anchor/message_validator/src/lib.rs: Updates error variants and integrates consensus state handling.
  • anchor/message_receiver/src/lib.rs & manager.rs: Renames and refactors the MessageReceiver into a generic NetworkMessageReceiver.
  • anchor/message_sender/src/network.rs: Adapts NetworkMessageSender to use a generic SlotClock for time-based operations.
  • Various Cargo.toml updates: Adds dependencies (slot_clock, dashmap, parking_lot) and ensures consistent integration.
  • anchor/common/qbft/src/qbft_types.rs: Removes the duplicate Round implementation in favor of the new one.
  • anchor/network/src/discovery.rs: Minor adjustments to error mappings and ENR building logic.
  • anchor/common/qbft/src/config.rs: Adjusts usage of the Round type in QBFT configuration.
Comments suppressed due to low confidence (1)

anchor/message_sender/src/network.rs:22

  • [nitpick] The introduction of the generic parameter S: SlotClock improves flexibility, but consider adding documentation or more explicit bounds to clarify the expected behavior and constraints for implementers of SlotClock.
pub struct NetworkMessageSender<S: SlotClock> {

diegomrsantos and others added 2 commits March 25, 2025 20:38
@diegomrsantos diegomrsantos requested a review from Copilot March 25, 2025 19:43
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@diegomrsantos diegomrsantos requested a review from Copilot March 25, 2025 19:46
Copilot AI left a comment

Pull Request Overview

This PR refines the QBFT logic validation and consensus state management while integrating a dedicated Round type and the slot_clock dependency. Key changes include:

  • Added a new Round type in anchor/common/ssv_types/src/round.rs to encapsulate round logic.
  • Extended and integrated consensus state and QBFT validation in message validator modules with enhanced error details.
  • Updated network, message receiver, and message sender modules to propagate the slot_clock and updated validator interfaces.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Summary per file:

  • anchor/common/ssv_types/src/round.rs: Introduces the Round type with conversion, defaulting, and arithmetic operators.
  • anchor/message_validator/src/consensus_state.rs: Adds consensus state management and updates operator/signer state handling.
  • anchor/message_validator/src/consensus_message.rs: Integrates new QBFT logic, including round validation based on slot timing.
  • anchor/message_validator/src/lib.rs: Updates the ValidationFailure enum and validator integration to support new consensus logic.
  • anchor/message_receiver/src/lib.rs: Refactors and renames the message receiver to NetworkMessageReceiver with a trait implementation.
  • anchor/client/src/lib.rs, anchor/message_sender/src/network.rs, anchor/network/*: Updates integration points to pass slot_clock and updated validator instances.

@diegomrsantos diegomrsantos requested a review from Copilot March 26, 2025 10:44
Copilot AI left a comment

Pull Request Overview

This PR refactors and extends the consensus validation and message handling logic, incorporating enhanced QBFT logic, a generic SlotClock, and improved error contexts. Key changes include new modules for Round handling and ConsensusState, modifications to Validator and MessageReceiver to support a generic SlotClock, and updates to error variants and testing for consensus message validation.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Summary per file:

  • anchor/common/ssv_types/src/round.rs: Adds a dedicated Round type with conversions and arithmetic.
  • anchor/message_validator/src/{message_counts,consensus_state,consensus_message,lib.rs}: Implements refined consensus state tracking, QBFT logic validation, and enhanced error variants.
  • anchor/message_receiver/src/{lib.rs,manager.rs}: Adapts the message receiver to use a trait and generic SlotClock.
  • anchor/message_sender/src/network.rs: Updates NetworkMessageSender to be generic over SlotClock.
  • Various Cargo.toml files and qbft modules: Introduces SlotClock dependency and removes redundant inlined Round type.
Comments suppressed due to low confidence (1)

anchor/message_validator/src/lib.rs:37

  • [nitpick] With the updated error variants (e.g. RoundAlreadyAdvanced now including fields), consider adding a brief code comment to clarify the use of wildcard matches for these variants to aid future developers’ readability.
| ValidationFailure::RoundAlreadyAdvanced { got: _, want: _ } => MessageAcceptance::Ignore,

@Zacholme7 Zacholme7 left a comment (Member)

Phew, this was dense! Awesome job though, this is great and super useful. Definitely should get @dknopik's and @jking-aus's eyes on this too.

}

/// Hash a list of operator IDs to create a unique identifier, using SHA-256
fn hash_operators(operators: &Vec<OperatorId>) -> [u8; 32] {
Member:

nit: &Vec<OperatorId> to &[OperatorId]

Member Author:

Nice catch
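The slice-parameter nit can be shown with a small sketch. DefaultHasher stands in for SHA-256 here so the example needs no external crate, and OperatorId's shape is assumed:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Assumed shape of the id type; the real one lives in ssv_types.
#[derive(Clone, Copy)]
struct OperatorId(u64);

/// Taking &[OperatorId] instead of &Vec<OperatorId> accepts vectors,
/// arrays, and slices alike. DefaultHasher stands in for SHA-256 so the
/// sketch needs no external crate.
fn hash_operators(operators: &[OperatorId]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for &operator in operators {
        // Little-endian bytes, mirroring the buffer construction in the PR.
        hasher.write(&operator.0.to_le_bytes());
    }
    hasher.finish()
}

fn main() {
    let from_vec = vec![OperatorId(1), OperatorId(2)];
    let from_array = [OperatorId(1), OperatorId(2)];
    // Both call sites compile thanks to the slice parameter.
    assert_eq!(hash_operators(&from_vec), hash_operators(&from_array));
}
```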


/// ConsensusState manages the state for consensus validation across operators and slots
pub(crate) struct ConsensusState {
operators: HashMap<OperatorId, RefCell<OperatorState>>,
Member:

It looks like it all works here, but this RefCell is very hard to reason about with all the clones and could cause unexpected runtime panics.

I recommend we either really reason this through, use a DashMap here, or use the HashMap Entry API more directly

Member Author:

Good catch, thanks. I removed the RefCell.

Member:

Sorry, unresolving: Now that we have the DashMap, is an Arc<Mutex<_>> really necessary?

Member Author:

This is a different map, and there I need the Mutex because the consensus state can't be changed concurrently.
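The two-level locking being discussed can be sketched as follows; a Mutex<HashMap> stands in for the DashMap so the example uses only the standard library, and the field names are illustrative:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-ins; the real MessageId and ConsensusState live in anchor.
type MessageId = String;

#[derive(Default)]
struct ConsensusState {
    // Hypothetical field, just to have something to mutate.
    messages_processed: usize,
}

/// The outer map only guards insertion and lookup; each entry's
/// Arc<Mutex<_>> serializes mutation of that message id's state. The PR
/// uses DashMap for the outer map; Mutex<HashMap> stands in here so the
/// sketch needs only the standard library.
struct Validator {
    consensus_state_map: Mutex<HashMap<MessageId, Arc<Mutex<ConsensusState>>>>,
}

impl Validator {
    fn state_for(&self, message_id: &MessageId) -> Arc<Mutex<ConsensusState>> {
        let mut map = self.consensus_state_map.lock().unwrap();
        map.entry(message_id.clone())
            .or_insert_with(|| Arc::new(Mutex::new(ConsensusState::default())))
            .clone()
    }
}

fn main() {
    let validator = Validator { consensus_state_map: Mutex::new(HashMap::new()) };
    let id = "committee-1".to_string();
    validator.state_for(&id).lock().unwrap().messages_processed += 1;
    // A second lookup sees the same shared state.
    assert_eq!(validator.state_for(&id).lock().unwrap().messages_processed, 1);
}
```

Cloning the Arc out of the entry keeps the outer lock held only briefly, while the per-entry Mutex protects the actual state mutation.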

self.consensus_state_map
    .entry(message_id.clone())
    .or_insert_with(|| {
        // Create a new consensus state with storage for two epochs worth of slots
Member:

Is there anything that is cleaning up ConsensusState after two epochs have passed?

Member Author:

I'm not sure if this comment is correct. When does the consensus instance associated with the message IDs stop being active?

Member:

I'm probably just confused. It doesn't stop being active, but my understanding from the comment was that a fresh instance would be created every two epochs. What happens to the consensus state after two epochs pass and the storage is taken up?

Member:

The message id does not change between instance heights. The only case where we do not clean up is when we leave a committee or a validator is removed.

Member:

This kind of inherits the confusing "message id" from go-ssv, which I hope to get rid of in a future SIP ;)

Member Author:

The comment is incorrect, I removed it. It's one ConsensusState per message id

if signers.len() == 1 {
    validate_round_in_allowed_spread(consensus_message, received_at, slot_clock)?;
}

@Zacholme7 Zacholme7 (Member) Mar 26, 2025:

I think we can now remove some of these checks from qbft. separate issue

Member Author:

Do you remember where it is done?

Member:

I will have to take a look. It would be good to group this into issue #182: just a pass over the whole pipeline to make sure we are not performing redundant work.

Member Author:

Done

-pub struct Validator {
+pub struct Validator<S: SlotClock> {
     network_state_rx: Receiver<NetworkState>,
     consensus_state_map: DashMap<MessageId, Arc<Mutex<ConsensusState>>>,
Member:

nit: In terms of readability it might help to document the structure of the map. It goes very deep and has a lot of moving parts

Here is my understanding. Top level we have MessageID -> ConsensusState where ConsensusState holds all the state for Operators pertaining to MessageID.

OperatorState contains specific operator data for each slot via a circular buffer. For each slot, there is SignerState which represents signer info if this operator was a signer.

Member Author:

That's right. I'll add more docs in the next PR.
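The layered structure described above could be sketched as the following type skeleton; all names, fields, and the ring-buffer capacity are assumptions based on the description, not the PR's actual code:

```rust
use std::collections::HashMap;

// Stand-ins for the real anchor types; shapes here are assumptions.
type OperatorId = u64;
type Slot = u64;

/// Signer info for one slot, present only if the operator signed there.
#[derive(Clone, Default)]
struct SignerState {
    messages_seen: usize,
}

/// Per-operator data, one cell per recent slot via a ring buffer.
struct OperatorState {
    slots: Vec<Option<SignerState>>,
}

impl OperatorState {
    fn new(capacity: usize) -> Self {
        OperatorState { slots: vec![None; capacity] }
    }

    /// Index by slot modulo capacity; old slots are overwritten as the
    /// buffer wraps around.
    fn signer_state_mut(&mut self, slot: Slot) -> &mut Option<SignerState> {
        let capacity = self.slots.len() as u64;
        &mut self.slots[(slot % capacity) as usize]
    }
}

/// Everything tracked for one message id: one OperatorState per operator.
struct ConsensusState {
    operators: HashMap<OperatorId, OperatorState>,
}

fn main() {
    let mut state = ConsensusState { operators: HashMap::new() };
    state.operators.insert(1, OperatorState::new(64));
    let operator = state.operators.get_mut(&1).unwrap();
    *operator.signer_state_mut(100) = Some(SignerState { messages_seen: 1 });
    assert_eq!(operator.signer_state_mut(100).as_ref().unwrap().messages_seen, 1);
}
```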

Comment on lines 177 to 188
// Create a buffer (no sorting, to match Go implementation)
let mut buffer = Vec::new();

// Write each operator ID to the buffer in little endian order
for &operator in operators {
    buffer.extend_from_slice(&operator.0.to_le_bytes());
}

// Hash the buffer with SHA-256
let mut hasher = Sha256::new();
hasher.update(&buffer);
hasher.finalize().into()
Member:

Is there any reason behind not sorting? Otherwise this is the same computation as for CommitteeId.

Member Author:

There's a validation checking this list is sorted, so I'm not sure sorting is necessary.

Member:

If the operators are sorted, this is a duplicate of computing CommitteeId. Can that just be used here?

impl From<Vec<OperatorId>> for CommitteeId {
    fn from(mut operator_ids: Vec<OperatorId>) -> Self {
        // Sort the operator IDs
        operator_ids.sort();
        let mut hasher = Sha256::new();
        // Add the operator IDs as 32 byte values
        for id in operator_ids {
            hasher.update((*id as u32).to_le_bytes());
        }
        // Hash it all
        <[u8; COMMITTEE_ID_LEN]>::from(hasher.finalize()).into()
    }
}

Member Author:

Pushed a new version, let me know what you think

Member:

lgtm

@dknopik dknopik left a comment (Member)

Nice! Some remarks, but mostly nitpicks.

Anyway, before we merge this I'd like to test this on interop when convenient.

 | ValidationFailure::LateSlotMessage
 | ValidationFailure::SlotAlreadyAdvanced
-| ValidationFailure::RoundAlreadyAdvanced
+| ValidationFailure::RoundAlreadyAdvanced { got: _, want: _ }
Member:

Nitpick: use .. instead of listing the fields; it's a bit cleaner. Same below.

Member Author:

Done
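A minimal illustration of the .. rest pattern suggested here; the enum shapes are assumed stand-ins for the real types:

```rust
/// Illustrative enum shapes; the real ValidationFailure has more variants.
#[allow(dead_code)]
enum ValidationFailure {
    LateSlotMessage,
    RoundAlreadyAdvanced { got: u64, want: u64 },
}

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum MessageAcceptance {
    Ignore,
    Reject,
}

fn acceptance(failure: &ValidationFailure) -> MessageAcceptance {
    match failure {
        // `{ .. }` ignores all fields without naming them, and keeps
        // compiling if fields are later added or renamed.
        ValidationFailure::LateSlotMessage
        | ValidationFailure::RoundAlreadyAdvanced { .. } => MessageAcceptance::Ignore,
    }
}

fn main() {
    let failure = ValidationFailure::RoundAlreadyAdvanced { got: 3, want: 5 };
    assert_eq!(acceptance(&failure), MessageAcceptance::Ignore);
}
```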

 network_state_rx: watch::Receiver<NetworkState>,
 outcome_tx: mpsc::Sender<Outcome>,
-validator: Validator,
+validator: Validator<S>,
Member:

Now that the Validator actually has state, this needs to be an Arc, right? Or else the state of the receiver validator and sender validator diverges.

Member Author:

Good question. Do we want to share the consensus state? I think so.

Comment on lines 169 to 173
if signers.is_empty() {
    return Err(ValidationFailure::NoSigners);
}

let signer = signers[0];
Member:

Very nitpicky, but a pattern I like in these cases is

let Some(signer) = signers.first() else {
    return Err(ValidationFailure::NoSigners);
};

to combine the check and retrieval.

Member Author:

Nice suggestion, thanks. Had to use:

let Some(&signer) = signers.first() else {
     return Err(ValidationFailure::NoSigners);
};

Comment on lines 258 to 264
// Calculate the first round index
let first_round_index = if height != FIRST_HEIGHT {
    // If not the first height, add height % len(committee) to the index
    height % committee.len() as u64
} else {
    0
};
Member:

I don't get this. 0 % anything is 0, so why the differentiation?

Member Author:

You're right; I copied their approach and didn't catch this.
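The simplification boils down to 0 % n == 0, so the modulo alone covers the first height too. A sketch with assumed names (committee_len must be nonzero):

```rust
/// Sketch: since 0 % n == 0, the special case for FIRST_HEIGHT is
/// redundant and the modulo alone suffices (committee_len must be > 0).
fn first_round_index(height: u64, committee_len: usize) -> u64 {
    height % committee_len as u64
}

fn main() {
    const FIRST_HEIGHT: u64 = 0; // assumed value of the removed constant
    // Same result as the old `if height != FIRST_HEIGHT` branch:
    assert_eq!(first_round_index(FIRST_HEIGHT, 4), 0);
    assert_eq!(first_round_index(7, 4), 3);
}
```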

Comment on lines +282 to +286
let slot = Slot::new(consensus_message.height);
let slot_start_time = match slot_clock.start_of(slot) {
    Some(time) => UNIX_EPOCH + time,
    None => return Err(ValidationFailure::SlotStartTimeNotFound),
};
Member:

Attestation voting only starts one third into the slot, so always starting to measure at the start of the slot seems incorrect.
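One way to account for that offset, as a sketch only: measure round timing from a per-duty baseline. The 12-second slot and one-third offset follow Ethereum mainnet timing; the function and parameter names are hypothetical:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: pick the baseline from which round timing is
/// measured. Attestation duties start one third into the slot, so their
/// rounds would be measured from that point rather than the slot start.
fn round_base_time(slot_start_offset: Duration, is_attestation_duty: bool) -> SystemTime {
    // 12-second slots, as on Ethereum mainnet (an assumption here).
    let slot_duration = Duration::from_secs(12);
    let slot_start = UNIX_EPOCH + slot_start_offset;
    if is_attestation_duty {
        slot_start + slot_duration / 3
    } else {
        slot_start
    }
}

fn main() {
    let offset = Duration::from_secs(1_000_000);
    let attestation = round_base_time(offset, true);
    let proposal = round_base_time(offset, false);
    // The attestation baseline sits 4 seconds (one third of 12) later.
    assert_eq!(
        attestation.duration_since(proposal).unwrap(),
        Duration::from_secs(4)
    );
}
```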

@diegomrsantos diegomrsantos left a comment (Member Author)

Thanks a lot, replied to most of the comments. I'll finish tomorrow


@diegomrsantos diegomrsantos requested a review from Copilot March 28, 2025 09:29


@diegomrsantos diegomrsantos requested a review from dknopik March 31, 2025 11:44
@dknopik dknopik left a comment (Member)

LGTM, thanks!

@dknopik dknopik merged commit d2eb521 into sigp:unstable Mar 31, 2025
10 checks passed