Aggregator Committee Runner - Pre-Consensus Completeness Improvement#604
Aggregator Committee Runner - Pre-Consensus Completeness Improvement#604MatheusFranco99 wants to merge 12 commits intoboolefrom
Conversation
Greptile OverviewGreptile SummaryThis PR enhances the Aggregator Committee Runner's pre-consensus phase to improve the liveness vs. completeness trade-off by waiting for more messages when no aggregator is selected. Key Changes:
Critical Issue Found: Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant O1 as Operator 1
participant O2 as Operator 2
participant O3 as Operator 3
participant O4 as Operator 4
participant Runner as Aggregator Committee Runner
participant Beacon as Beacon Node
participant QBFT as QBFT Consensus
Note over Runner: StartNewDuty()
Note over Runner: Reset preConsensusSeenSigners<br/>Reset preConsensusSelectionsTested
O1->>Runner: Pre-consensus message (validator A)
Runner->>Runner: Mark O1 as seen
Note over Runner: Process message, check for quorum
alt No quorum yet
Runner-->>O1: Wait for more messages
end
O2->>Runner: Pre-consensus message (validator A)
Runner->>Runner: Mark O2 as seen
Note over Runner: Process message, check for quorum
O3->>Runner: Pre-consensus message (validator A)
Runner->>Runner: Mark O3 as seen
Note over Runner: Quorum reached for validator A
Runner->>Beacon: IsAggregator(validator A)?
Beacon-->>Runner: Selection result
Runner->>Runner: Mark (validator A, root) as checked
alt Validator A selected
Runner->>QBFT: Start consensus
Note over Runner,QBFT: Proceed to consensus phase
else Validator A not selected
alt All operators seen OR all selections checked
Note over Runner: Terminate duty (finished=true)
else More messages expected
Note over Runner: Wait for more messages
O4->>Runner: Pre-consensus message (validator B)
Runner->>Runner: Mark O4 as seen
Note over Runner: Check if all operators seen now
alt All operators seen AND no aggregators
Note over Runner: Terminate duty (finished=true)
end
end
end
|
Additional Comments (1)
|
…eckedForSelection
GalRogozinski
left a comment
There was a problem hiding this comment.
Pre‑consensus signer “seen” tracking happens before validation in processing. This is fine since we have p2p message validation checking signatures.
So again we see a motivation adding this to spec... Just a little observation not for this PR.
| // If already started consensus, ignore pre-consensus messages. | ||
| // This is important because it avoids redundant processing and prevents the pre-consensus termination checks from being made. | ||
| if r.HasStartedConsensus() { | ||
| return types.NewError(types.AggCommPreConsensusIgnoredSinceAlreadyStartedConsensusErrorCode, "ignoring pre-consensus message since consensus already started") |
There was a problem hiding this comment.
hmmm...
I just wonder if it should be an error... because this isn't really an error (we are expecting N messages in the happy flow)
I think you can test w.o the error by just seeing that the last preconf message doesn't add validators
There was a problem hiding this comment.
Yeah, originally I didn't add it, but didn't know how to test.
I see, I didn't find a way to test it but I'll look at it
There was a problem hiding this comment.
After looking I think we can still just show the output of a single QBFT message. There is also the state introspection in the generated state that should be useful
There was a problem hiding this comment.
we can still just show the output of a single QBFT message
how so?
There was a problem hiding this comment.
we will show the expected CD doesn't change
| // If all duties have been tested for selection or all messages (from all operators) have been seen, terminate. | ||
| if r.HaveCheckedAllDutiesForSelection(aggregatorMap, contributionMap) || r.HasSeenAllPreConsensusSigners() { | ||
| r.BaseRunner.State.Finished = true | ||
| return nil |
There was a problem hiding this comment.
I think it can happen on that you see all preconsensus signers and there is an error for processing selectionProof.
i.e. if the beacon node is bad
| return nil | |
| return anyErr |
There was a problem hiding this comment.
Yes. I think this motivates us to bring the selection check to spec (which is currently in the implementation but doesn't need the beacon ndoe).
Also, about bad communication, maybe the impl should have retries and for spec this matter less? Wdyt?
There was a problem hiding this comment.
hmm dunno if we need to specify what is already specified in ETH spec
even though I think currently we have some things that are duplicate specifications but we don't always do it.
In spec we don't return error for selection, but for getting the aggregate attestations (that impl does refer to the beacon node for it).
It is just about propagating the error correctly upwards. It doesn't mean we retry.
There was a problem hiding this comment.
In spec we don't return error for selection, but for getting the aggregate attestations (that impl does refer to the beacon node for it).
In implementation right? As spec doesn't have this detail
ssv/spectest/tests/runner/preconsensus/single_validator_selected.go
Outdated
Show resolved
Hide resolved
| "Msgs": { | ||
| "1": [ | ||
| { | ||
| "SignedMessage": { | ||
| "Signatures": [ | ||
| "Dh866O3mNb1dj0DSNzv6IpSnnjhgmmpUCDHTpsrtxVEUGs4sgCWyPUdcp/g27EXoyfPwPomQyoCY5M3AwjL0rJPOuMutNFqcqmOy86f7We/Zdg0U3brmnJ2zJAZt4CG9sro/GXzzMzdtDJH0dvhm4Z4vtv87k+485vM2R5CqlDt3j8bfsWRa5EcDiOedxL4V97WeBR/cLV7buIcwBxnYuek9UsS7qDab3wAYkGlDHmXFUPNkC9NcGoiGJklBUF+7Ljsx3It9UycBKw/rEEHw5kQ7lLCwZEqyAs3LN51L1Ao6ZR8LKDchnTCdqaa5xpTeA+rCXeKX2WBz9738PAHImQ==" | ||
| ], | ||
| "OperatorIDs": [ | ||
| 1 | ||
| ], | ||
| "SSVMessage": { | ||
| "MsgType": 0, | ||
| "MsgID": "000003010600000000000000000000000000000000000000cf97adeedb59e05bfd73a2b4c2a8885708c4f4f70c84c64b27120e72ab733b72", | ||
| "Data": "AAAAAAAAAAAMSHEAAAAAAAEAAAAAAAAATAAAAKV7Rbu15E9BsS9PCaIudvca/pUK8NZtWKS73NHxnlw4AAAAAAAAAACEAAAAhAAAAAAAAwEGAAAAAAAAAAAAAAAAAAAAAAAAAM+Xre7bWeBb/XOitMKoiFcIxPT3DITGSycSDnKrczty" | ||
| }, | ||
| "FullData": "BgAAAAAAAAAcAAAAjAAAAJQAAACVAQAAlQEAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAADAAAAAAAAAAQAAADsAAAADEhxAAAAAAAAAAAAAAAAAAECAwQFBgcICQoBAgMEBQYHCAkKAQIDBAUGBwgJCgECAAAAAAAAAAABAgMEBQYHCAkKAQIDBAUGBwgJCgECAwQFBgcICQoBAgEAAAAAAAAAAQIDBAUGBwgJCgECAwQFBgcICQoBAgMEBQYHCAkKAQIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB" | ||
| }, | ||
| "QBFTMessage": { | ||
| "DataRound": 0, | ||
| "Height": 7424012, | ||
| "Identifier": "AAADAQYAAAAAAAAAAAAAAAAAAAAAAAAAz5et7ttZ4Fv9c6K0wqiIVwjE9PcMhMZLJxIOcqtzO3I=", | ||
| "MsgType": 0, |
There was a problem hiding this comment.
not sure why this changed
There was a problem hiding this comment.
I had to change the post decided test. Pls, checkout the description above on the "Side effects" section
| } | ||
|
|
||
| multiSpecTest.Tests = append(multiSpecTest.Tests, &tests.MsgProcessingSpecTest{ | ||
| Name: fmt.Sprintf("single validator quorum not selected doesn't start consensus (%s)", version.String()), |
There was a problem hiding this comment.
go: downloading github.com/ssvlabs/ssv-spec v1.2.3-0.20260216165555-9749b2d49c63
go: github.com/ssvlabs/ssv-spec@agg-comm-pre-consensus-termination-fix-repo-size: create zip: ssv/spectest/generate/state_comparison/tests_MultiMsgProcessingSpecTest/pre_consensus_single_validator_not_selected/single validator quorum not selected doesn't start consensus (electra).json: malformed file path "ssv/spectest/generate/state_comparison/tests_MultiMsgProcessingSpecTest/pre_consensus_single_validator_not_selected/single validator quorum not selected doesn't start consensus (electra).json": invalid char '\''
ssv/spectest/generate/state_comparison/tests_MultiMsgProcessingSpecTest/pre_consensus_single_validator_not_selected/single validator quorum not selected doesn't start consensus (phase0).json: malformed file path "ssv/spectest/generate/state_comparison/tests_MultiMsgProcessingSpecTest/pre_consensus_single_validator_not_selected/single validator quorum not selected doesn't start consensus (phase0).json": invalid char '\''
| Name: fmt.Sprintf("single validator quorum not selected doesn't start consensus (%s)", version.String()), | |
| Name: fmt.Sprintf("single validator quorum not selected does not start consensus (%s)", version.String()), |
shane-moore
left a comment
There was a problem hiding this comment.
hi, reviewed the PR from the lens of anchor, lgtm! on our end, we collect selection proofs independently per validator, so we're unaffected by the cross-validator early termination bug this fixes. def agree it's a good idea to postpone ending the pre-consensus phase in your agg committee runner until after you've received messages from all operators in a committee or after you've received partial sig messages back for all duties in the operator's view
also, thanks for tagging us to review the spec PR! small ask to tag me in lieu of @dknopik in future spec PR's 🙏
| // HasStartedConsensus checks if consensus has already started by verifying if we have a QBFT instance for the duty's slot. | ||
| func (r *AggregatorCommitteeRunner) HasStartedConsensus() bool { | ||
| if r.BaseRunner.QBFTController == nil { | ||
| return false | ||
| } | ||
| if r.BaseRunner.State == nil { | ||
| return false | ||
| } | ||
| if r.BaseRunner.State.StartingDuty == nil { | ||
| return false | ||
| } | ||
| return r.BaseRunner.QBFTController.InstanceForHeight(qbft.Height(r.BaseRunner.State.StartingDuty.DutySlot())) != nil | ||
| } |
There was a problem hiding this comment.
To simplify, the implementation of this func can probably be just this instead ?
// HasStartedConsensus checks if consensus has already started (or maybe even finished) for the current duty.
func (r *AggregatorCommitteeRunner) HasStartedConsensus() bool {
if r.BaseRunner.State == nil {
return false
}
return r.BaseRunner.State.RunningInstance != nil || r.BaseRunner.State.DecidedValue != nil
}There was a problem hiding this comment.
Agree, thanks!
| if !hasNewQuorum { | ||
| // If didn't get any new quorum, didn't yet start QBFT, and has received the last message, then terminate. | ||
| if r.HasSeenAllPreConsensusSigners() { | ||
| if !r.HasStartedConsensus() { | ||
| r.BaseRunner.State.Finished = true | ||
| } | ||
| } |
There was a problem hiding this comment.
!r.HasStartedConsensus() at this point is always true, the early return at the top of ProcessPreConsensus already exits when consensus has started, and basePreConsensusMsgProcessing doesn't start consensus.
Consider removing the inner check to avoid implying there's a path where consensus could have started between the two checks.
if !hasNewQuorum {
if r.HasSeenAllPreConsensusSigners() {
r.BaseRunner.State.Finished = true
}
return nil
}There was a problem hiding this comment.
Agree, thanks!
| // compareConsensusDataSample compares two consensus data samples. | ||
| // It tries decoding into ProposerConsensusData, AggregatorCommitteeConsensusData, or BeaconVote | ||
| // and compare each type accordingly. | ||
| func compareConsensusDataSample(t *testing.T, expectedData []byte, actualData []byte) { |
There was a problem hiding this comment.
Consider adding a require.Fail at the end of compareConsensusDataSample if none of the decode attempts succeed, otherwise the test silently passes without comparing anything.
require.Fail(t, "could not decode consensus data as any known type")There was a problem hiding this comment.
Agree, thanks!
Overview
This PR improves the liveness vs. completeness trade-off of the Pre-Consensus phase of the Aggregator Committee Runner.
More explicitly, with the new change, if a quorum is reached for some validator duty but it isn't selected (as aggregator or sync committee contribution), the duty wont' stop, and it will wait for more messages that may discover selected validators.
Spec Change
AggregatorCommitteeRunnerhave two new state variables:preConsensusSeenSigners: that registers the signers of the pre-consensus messages we've seen.preConsensusDutiesCheckedForSelection: that registers the(validator index, root)tuples (a bijection to validator duties) that we've already checked for selection in pre-consensus.In
PreConsensusprocessing:basePreConsensusMsgProcessing, we can't simply return. Instead, we should check if this is the last message we were waiting for. For example, we received messages from operators 1, 2, and 3. No aggregator have been selected. Then, we receive a message from operator 4, but it doesn't trigger any new quorum (only contributing to already checked duties). In this case, we now check that all operators have been seen, and since no aggregator was selected, we terminate the duty.Tests
TestingBeaconNodeis expanded with a new fieldCommitteeIndexAggregators map[phase0.CommitteeIndex]bool, that allows the test to specify the validators (through their CommitteeIndex`) it expects to be selected as aggregators.Accordingly,
MsgProcessingSpecTestis expanded with two fieldsBeaconAggregators []phase0.CommitteeIndexandBeaconAggregatorsValues []boolto specify the same values for the beacon node. This is needed as the beacon node interface isn't added to the json file. And it needs two field asmapcan't as well be used for json files.MsgProcessingSpecTestalso has a new fieldQBFTProposals [][]byteto specify the expected consensus data in the QBFT proposal (if any). This is needed to check cases in which only some validators are selected, which should reflect in the consensus data of the proposal.New tests
SingleValidatorNotSelected: tests that a single validator quorum that is not selected won't cause the duty to terminate or start consensus.SingleValidatorSelected: tests that a single validator quorum that is selected will cause the consensus to start (one test for aggregator and one for sync committee contribution).TerminatedWhenAllSignersSeen: tests that if all messages from all operators are received but no validator duty is selected as aggregator, then the duty is terminated.TerminatesWhenAllSelectionsChecked: tests that if all validator duties are checked for selection but none is selected, then the duty is terminated.IgnoreIfAlreadyStartedConsensus: tests that if consensus is already started, pre-consensus messages are ignored (and thus don't cause any state change, such termination checks being made).Side effects
runner/preconsensus/post_decided.gohappened because the new unexported states (of the aggregator committee runner) were not being passed to the json files, thus causing post state mismatches (in the json test). However, the logic has been kept the same, simply by passing the messages to theMessagesfield, instead of only adding its impact on the runner as done before.TestAggregatorCommitteeConsensusDataForDutyhas been expanded to put correct selection signatures in the consensus data, which wasn't being tested before by the testing value check functions. This is needed to properly test theQBFTProposalsfield of the new tests, which checks the exact match of the consensus data of the proposal.