Skip to content

spectests: align with robust message replay#576

Closed
iurii-ssv wants to merge 3 commits intomainfrom
spectests-align-with-robust-message-replay
Closed

spectests: align with robust message replay#576
iurii-ssv wants to merge 3 commits intomainfrom
spectests-align-with-robust-message-replay

Conversation

@iurii-ssv
Copy link
Contributor

@iurii-ssv iurii-ssv commented Aug 20, 2025

This PR intends to update/align the code in ssv-spec repo with the changes in ssvlabs/ssv#2445 so that we can have spec-tests passing in ssv repo.

Note, I'm not fully replicating the whole retry-logic implemented in ssvlabs/ssv#2445 - rather I'm just copying over the bare minimum we need to make spec-tests pass (so I'm mostly updating error-messages and re-generating json-test-files) - this is because the code in ssv repo already diverges from the code in ssv-spec repo ... and re-implementing every feature we add from now on in both doesn't make much sense. So, what this PR is doing isn't really viable long-term, ideally we'd want to copy all the spec-tests over to ssv repo and update/adjust them there as we need.

@iurii-ssv iurii-ssv marked this pull request as draft August 20, 2025 12:12
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR synchronizes the ssv-spec repository with changes from the main SSV repository (PR #2445) to implement robust message replay functionality and ensure spec-tests pass correctly. The changes focus on three main areas:

Enhanced Error Reporting: The PR improves error messages throughout the codebase by replacing generic error strings with more descriptive messages that include specific slot numbers and validation contexts. For example, generic "invalid partial sig slot" errors are now replaced with detailed messages like "future partial sig msg, message slot: 50, want slot: 12" or "invalid partial sig slot: 50, want: 7413760". This provides better debugging information when slot validation fails.

Method Name Simplification: The validatePartialSigMsgForSlot method is renamed to validatePartialSigMsg across multiple files, following the principle of shorter, clearer names since the method already takes a slot parameter making the "ForSlot" suffix redundant.

Test Infrastructure Improvements: Multiple test files are updated to enhance error handling in test execution. The GetPostState() methods in various spec test classes now provide formatted error messages when expected errors don't match actual errors, showing both the expected and actual error strings for better debugging.

The changes also include updates to JSON test specification files that define expected error messages for various slot validation scenarios, ensuring the spec tests validate the new error message formats correctly. The modifications span across runner validation logic, signature validation, and comprehensive test suites for both pre-consensus and post-consensus message processing scenarios.

Confidence score: 1/5

  • This PR has a critical blocking issue that makes it unsafe to merge
  • Score severely lowered due to an empty/corrupted test specification file that would break the test suite
  • The file ssv/spectest/generate/tests/tests.MultiMsgProcessingSpecTest_post_consensus_invalid_msg_slot.json is completely empty despite being referenced in the changes

9 files reviewed, 1 comment

Edit Code Review Bot Settings | Greptile

),
Duty: testingutils.TestingAggregatorDuty(spec.DataVersionElectra),
Messages: []*types.SignedSSVMessage{
testingutils.SignPartialSigSSVMessage(ks, testingutils.SSVMsgAggregator(nil, invalidateSlot(testingutils.PostConsensusAggregatorMsg(ks.Shares[1], 1, spec.DataVersionElectra)))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Using invalidateSlot instead of invalidateSlotV - this may not test the version-specific slot validation logic for Electra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR I'm not really looking at what this test intends to check, I'm just trying to make it work with updated error-messages ...

As for what is the original intent of this test is (and how well invalidateSlot fits it, vs invalidateSlotV) - I'll leave this discussion open for somebody from spec-team to check out.

@iurii-ssv iurii-ssv force-pushed the spectests-align-with-robust-message-replay branch from 75a57db to c60049c Compare August 21, 2025 09:26
Comment on lines 33 to 36
nil,
// TODO: before merge ask engineering how often they see such message in production
"could not process msg: invalid signed message: did not receive proposal for this round",
"no proposal for round: invalid signed message: did not receive proposal for this round",
nil,
ks,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do see quite a bit of these, here is an example from our production cluster:
image

So, with ssvlabs/ssv#2445 we should be able to buffer messages in this scenario and "replay" it later (vs just dropping the message as if it never came, as is done currently)

ExpectedError: "no post consensus phase for voluntary exit",
},
{
Name: fmt.Sprintf("aggregator (%s)", spec.DataVersionPhase0.String()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv
can you explain the reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so in ssvlabs/ssv#2445 I had to introduce a new error to split invalid partial sig slot in 2:

  • invalid partial sig slot
  • ErrFuturePartialSigMsg(this is a newly-defined error current spec-tests don't know about and this change fixes that)

the reason to split it like that is because ErrFuturePartialSigMsg is a retryable error (we can buffer the message and replay it ~100ms later successfully)

@iurii-ssv
Copy link
Contributor Author

This is no longer needed since #585 fully supersedes it - hence closing.

@iurii-ssv iurii-ssv closed this Oct 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants