Skip to content

Comments

add signature collector cleanup#107

Merged
jking-aus merged 2 commits intosigp:unstablefrom
dknopik:clean-up-signature-collectors
Jan 20, 2025
Merged

add signature collector cleanup#107
jking-aus merged 2 commits intosigp:unstablefrom
dknopik:clean-up-signature-collectors

Conversation

@dknopik
Copy link
Member

@dknopik dknopik commented Jan 16, 2025

Issue Addressed

Signature Collector never cleans up, causing a memory leak.

Proposed Changes

Introduce an expiry slot to the SignatureRequest. The signature collector removes all requests older than this expiry slot, plus a safety margin (which is 1 slot for now, we'll see if that is sufficient)

Additional Info

The design of this is very similar to how clean up is done in the QBFT manager.

Along the way, I noticed that the sign_validator_registration_data logic is incorrect, as LH passes the current timestamp, but in SSV we need the epoch timestamp, so I fixed that as well.

@Zacholme7
Copy link
Member

Code wise all looks good to me, good job!

Question on cleanup, What is the point of keeping any old requests around? If enough signatures are collected or slot N+1 begins making sigs for slot N invalid, shouldn't the collector be removed right away in both of these cases?

@dknopik
Copy link
Member Author

dknopik commented Jan 16, 2025

What is the point of keeping any old requests around?

Let's say our local node is a bit under stress for some reason. All the other operators sign something, but we never locally try to get this signature as the processor is a bit clogged up or the beacon node responds slowly or whatever. In this scenario, we need a timeout: at which point can we be sure that it won't be locally needed anymore?

Furthermore, sign_validator_registration_data for example lives longer than one slot, as the signature is valid for a whole epoch. I don't know when exactly the validator_services call sign_validator_registration_data, but if they do it multiple times per epoch we should be able to get the final signature again.

The last example is relevant for QBFT only: but there we decide the BeaconVote once for all validators in a committee. We need to keep the result for a while, as sign requests for the different validators will need that value to perform the signing.

There are probably more weird edge cases or "race conditions" I did not think about, and that is why I decided for the timeout mechanism as easiest prevention.

Hope this makes sense, let me know if something is unclear

Copy link
Member

@jking-aus jking-aus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jking-aus jking-aus merged commit 49f69c0 into sigp:unstable Jan 20, 2025
9 checks passed
@dknopik dknopik deleted the clean-up-signature-collectors branch February 5, 2025 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants