Fix deadlock in message validator#274
Merged
jking-aus merged 1 commit intosigp:unstablefrom Apr 17, 2025
Merged
Conversation
Contributor
There was a problem hiding this comment.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
anchor/message_validator/src/lib.rs:253
- [nitpick] Consider using a scoped block to limit the lifetime of 'network_state' rather than explicitly calling drop. This approach can help prevent accidental misuse of the lock in future modifications.
drop(network_state);
| let operators_pks = self.get_operator_pks(signed_ssv_message.operator_ids())?; | ||
| let operators_pks = | ||
| get_operator_pks(&network_state, signed_ssv_message.operator_ids())?; | ||
| drop(network_state); |
Member
There was a problem hiding this comment.
That's a good catch, thanks. Is there a way we can avoid repeating this in the future? It still seems error-prone.
Member
Author
There was a problem hiding this comment.
Well, we havelockbud in CI, which is supposed to catch stuff like this, but I guess it has its limits.
Otherwise we just have to stay extra vigilant during review, and adopt patterns like either dropping locks ASAP, or even better, acquiring them in a limited scope such as a function.
jking-aus
approved these changes
Apr 17, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I noticed that sometimes, during a burst of incoming messages, Anchor would deadlock. More specifically, threads holding a
processorpermit would deadlock and the processor queue would become full shortly after.The issue was doubly locking the network state. As
RwLock(and thustokio::sync::watch) avoids writer starvation by not allowing new readers while a writer is waiting, so trying to (read) borrow the lock whilea) holding a read lock already, and
b) a writer is waiting
is a deadlock, as we are now waiting for the writer, while the writer is waiting for us to release the already held read lock.
This PR fixes the issue by using the existing borrowed value instead of borrowing again. Furthermore, we explicitly drop the read guard to release it as early as possible.