Skip to content

Comments

refactor: remove DutyState to use OperatorState for better parallelism#801

Open
PoulavBhowmick03 wants to merge 3 commits intosigp:unstablefrom
PoulavBhowmick03:refactor/operator_state
Open

refactor: remove DutyState to use OperatorState for better parallelism#801
PoulavBhowmick03 wants to merge 3 commits intosigp:unstablefrom
PoulavBhowmick03:refactor/operator_state

Conversation

@PoulavBhowmick03
Copy link

Issue Addressed

Fixes #745

Proposed Changes

  • Changed duty_state_map from DashMap<MessageId, DutyState> to DashMap<(MessageId, OperatorId), OperatorState> to enable parallel processing of messages
  • Removed DutyState wrapper struct.
  • Updated validation functions

@cla-assistant
Copy link

cla-assistant bot commented Feb 4, 2026

CLA assistant check
All committers have signed the CLA.

}
// Get or create the operator state first, then check if there's a signer state
let Some(signer_state) = operator_state.get_signer_state(&msg_slot) else {
return Ok(());
Copy link
Member

@shane-moore shane-moore Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the pr! This return Ok() causes validate_round_in_allowed_spread to be skipped for single signer messages when there's no prior state. The old continue fell through to the round
spread check after the loop

perhaps move the round spread check before the early return:

if signers.len() == 1 {
    validate_round_in_allowed_spread(consensus_message, validation_context)?;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i missed that. will make the change

Copy link
Member

@shane-moore shane-moore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in anchor/message_validator/src/lib.rs, line 440-443, lets do .ok_or(ValidationFailure::NoSigners)?; instead of .expect

@shane-moore
Copy link
Member

Two loops over signers were removed in this PR

1. validate_qbft_logic - Valid Optimization (but deviates from Go)

The Go implementation loops over all signers for the SeenSigners duplicate check. However, this loop is redundant because:

  • Operator IDs are always sorted
  • Same quorum → same sorted order → same first operator
  • Checking only the first operator's seen_signers is sufficient

While technically redundant, the savings are negligible (a few extra hash
lookups) and matching Go makes the code easier to audit and maintain. so I'm in favor of restoring the loop

2. validate_qbft_message_by_duty_logic - Bug

The Go implementation validates duty count for ALL
signers
:

for _, signer := range signedSSVMessage.OperatorIDs {
    signerStateBySlot := state.Signer(committeeInfo.signerIndex(signer))
    if err := mv.validateDutyCount(..., signerStateBySlot); err != nil {
        return err
    }
}

This loop needs to be restored. Each operator has their own per-epoch duty limit, so all signers must be validated. The new DashMap<(MessageId, OperatorId), OperatorState>
structure still supports this, just look up each operator's state individually.

@dknopik
Copy link
Member

dknopik commented Feb 10, 2026

@PoulavBhowmick03 can you please take a look at the above comments and merge from unstable to fix CI?

@PoulavBhowmick03
Copy link
Author

Two loops over signers were removed in this PR

1. validate_qbft_logic - Valid Optimization (but deviates from Go)

The Go implementation loops over all signers for the SeenSigners duplicate check. However, this loop is redundant because:

  • Operator IDs are always sorted
  • Same quorum → same sorted order → same first operator
  • Checking only the first operator's seen_signers is sufficient

While technically redundant, the savings are negligible (a few extra hash lookups) and matching Go makes the code easier to audit and maintain. so I'm in favor of restoring the loop

2. validate_qbft_message_by_duty_logic - Bug

The Go implementation validates duty count for ALL signers:

for _, signer := range signedSSVMessage.OperatorIDs {
    signerStateBySlot := state.Signer(committeeInfo.signerIndex(signer))
    if err := mv.validateDutyCount(..., signerStateBySlot); err != nil {
        return err
    }
}

This loop needs to be restored. Each operator has their own per-epoch duty limit, so all signers must be validated. The new DashMap<(MessageId, OperatorId), OperatorState> structure still supports this, just look up each operator's state individually.

The Go code receives a state object that can look up any signer's state via state.Signer(committeeInfo.signerIndex(signer))
in that case, should I pass DashMap<(MessageId, OperatorId), OperatorState> to the function for the state lookup?

@PoulavBhowmick03
Copy link
Author

@PoulavBhowmick03 can you please take a look at the above comments and merge from unstable to fix CI?

Yes sure, working on it rn

@dknopik
Copy link
Member

dknopik commented Feb 13, 2026

Hey @PoulavBhowmick03!

Again, thanks for your PR.

A problem that I did not think about when creating the original issue is that "Decided messages", i.e. messages with multiple operators as sender, would require holding a write lock on multiple entries of the map after the refactor. This is not possible without risking a deadlock.

Decided messages are likely to be removed in future: ssvlabs/SIPs#63
The current approach is more likely to work then.

That being said, we might still be able to make this work: SignerState::seen_signers is only used for decided messages. We might be able to create a second DashMap, used exclusively for tracking which decided messages we have seen. Something like: seen_decideds: DashMap<MessageId, SeenDecideds>
This will add additional complexity though.

Another idea might be a more involved general refactor of message validation, which was previously ported from go-ssv in very straightforward way. I am sure there is a better way to structure the existing code.

Apologies that I did not realize that this isn't as straightforward as I thought when creating the issue. Let me know your thoughts!

@PoulavBhowmick03
Copy link
Author

Hey @PoulavBhowmick03!

Again, thanks for your PR.

A problem that I did not think about when creating the original issue is that "Decided messages", i.e. messages with multiple operators as sender, would require holding a write lock on multiple entries of the map after the refactor. This is not possible without risking a deadlock.

Decided messages are likely to be removed in future: ssvlabs/SIPs#63 The current approach is more likely to work then.

That being said, we might still be able to make this work: SignerState::seen_signers is only used for decided messages. We might be able to create a second DashMap, used exclusively for tracking which decided messages we have seen. Something like: seen_decideds: DashMap<MessageId, SeenDecideds> This will add additional complexity though.

Another idea might be a more involved general refactor of message validation, which was previously ported from go-ssv in very straightforward way. I am sure there is a better way to structure the existing code.

Apologies that I did not realize that this isn't as straightforward as I thought when creating the issue. Let me know your thoughts!

Hey! I would need a little time to look into this as well, and the different ways to work around it for now. I will get back to you on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants