Fix data columns not persisting for PeerDAS due to a getBlobs race condition#6756
mergify[bot] merged 5 commits into sigp:unstable
```diff
  // Check if we have all components and entire set is consistent.
  if pending_components.is_available(self.sampling_column_count, log) {
-     write_lock.put(block_root, pending_components.clone());
+     write_lock.put(block_root, pending_components.clone_without_column_recv());
```
These clones where we delete the receiver are OK because we should be using the receiver the first time it is returned inside an available block, right?
Yes exactly - I added some docs to the function, but I still don't think it's immediately obvious:
Added some more comments above this line in 4173135: "Add more docs on `data_column_recv`."
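To make the comment above concrete, here is a minimal self-contained sketch of what `clone_without_column_recv` could look like. This is a simplification under stated assumptions: the real `PendingComponents` in `overflow_lru_cache.rs` holds many more fields, and the struct and field names here are illustrative. The key point is that a channel `Receiver` is not `Clone`, so the copy written back into the cache drops it, while the copy returned inside the available block keeps the live receiver.

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical, simplified stand-in for the real PendingComponents.
struct PendingComponents {
    block_root: u64, // stand-in for Hash256
    data_column_recv: Option<Receiver<Vec<u8>>>,
}

impl PendingComponents {
    // Clone everything except the (non-cloneable) column receiver.
    // The cached copy gets `None`; the original keeps the live receiver.
    fn clone_without_column_recv(&self) -> Self {
        PendingComponents {
            block_root: self.block_root,
            data_column_recv: None,
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    tx.send(vec![1u8, 2, 3]).unwrap();

    let pending = PendingComponents {
        block_root: 42,
        data_column_recv: Some(rx),
    };
    let cached = pending.clone_without_column_recv();

    // The cached copy has no receiver; the original still receives.
    assert!(cached.data_column_recv.is_none());
    assert_eq!(pending.data_column_recv.unwrap().recv().unwrap(), vec![1, 2, 3]);
}
```

This matches the review discussion: the receiver is only consumed once, from the copy returned inside the available block, so dropping it from the cached clone is safe.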
```rust
// If `data_column_recv` is `Some`, it means we have all the blobs from engine, and have
// started computing data columns. We store the receiver in `PendingComponents` for
// later use when importing the block.
pending_components.data_column_recv = data_column_recv;
```
If `data_column_recv` is already `Some`, it means we have double reconstruction happening. Should we mind that case or do something?
Yes, I chatted with Michael about this - currently it's possible for any of these three to happen at the same time:

- gossip block triggering EL `getBlobs` and column computation
- rpc block triggering EL `getBlobs` and column computation
- gossip/rpc column triggering column reconstruction (only one computation at a time)

I initially wanted to address these altogether, but it got a bit messy, so I decided to leave this issue to a separate PR and keep this PR small, as I wanted to get this PR merged first and start interop testing.
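One possible shape for the deferred fix (not part of this PR) is to keep only the first receiver and discard any duplicate from a concurrent computation. A minimal sketch, with hypothetical names - `merge_column_recv` is not the actual Lighthouse API:

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical, simplified stand-in for the real PendingComponents.
struct PendingComponents {
    data_column_recv: Option<Receiver<Vec<u8>>>,
}

impl PendingComponents {
    // Keep only the first column receiver; a later concurrent computation's
    // receiver is dropped rather than overwriting the stored one.
    fn merge_column_recv(&mut self, recv: Option<Receiver<Vec<u8>>>) {
        if self.data_column_recv.is_none() {
            self.data_column_recv = recv;
        }
        // else: a computation is already in flight; drop the duplicate.
    }
}

fn main() {
    let (tx1, rx1) = channel();
    let (_tx2, rx2) = channel();
    let mut pending = PendingComponents { data_column_recv: None };

    pending.merge_column_recv(Some(rx1)); // first getBlobs result is kept
    pending.merge_column_recv(Some(rx2)); // duplicate is discarded

    tx1.send(vec![0u8]).unwrap();
    // The stored receiver is still the first one and still works.
    assert!(pending.data_column_recv.unwrap().recv().is_ok());
}
```

Dropping the duplicate receiver lets the redundant computation's send fail harmlessly; whether to instead cancel the duplicate computation entirely is part of the follow-up discussion.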
# Conflicts:
#   beacon_node/beacon_chain/src/block_verification_types.rs
#   beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
@mergify queue

✅ The pull request has been merged automatically at dd7591f
Issue Addressed
This PR fixes a race condition with `getBlobs` in PeerDAS, which caused data columns not to be persisted on block import. This bug was discovered on a local devnet. The sequence of events is described below:
process_availability.

lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs (lines 3929 to 3931 in fa6c4c0)

The `getBlobsSidecars` endpoint shows that both supernode and fullnode are missing columns.

Proposed Changes
After some discussion with @michaelsproul, we think the optimisation #6600 is useful to keep, as it enables data columns to be computed and propagated to the network sooner. The solution proposed is to store the data column `Receiver` in the `DataAvailabilityChecker` (a new field in `BlockImportData`), so it's always available on block import.

TODO
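To illustrate the proposed fix, here is a minimal sketch of the receiver travelling with the block through import so that persistence can pick up the engine-computed columns. This is an assumption-laden simplification: the field and function names (`persist_columns`, the `u64` block root, the `Vec<Vec<u8>>` column payload) are illustrative stand-ins, not the actual Lighthouse types.

```rust
use std::sync::mpsc::{channel, Receiver};

// Hypothetical, simplified stand-in for the real BlockImportData.
struct BlockImportData {
    block_root: u64, // stand-in for Hash256
    data_column_recv: Option<Receiver<Vec<Vec<u8>>>>,
}

// At import time, take the receiver and wait for the computed columns
// before persisting. In the real code this would run on a blocking task,
// not the import thread. Returns the number of columns "persisted".
fn persist_columns(import_data: BlockImportData) -> usize {
    let mut persisted = 0;
    if let Some(recv) = import_data.data_column_recv {
        if let Ok(columns) = recv.recv() {
            persisted = columns.len(); // stand-in for a store write
        }
    }
    persisted
}

fn main() {
    let (tx, rx) = channel();
    // The getBlobs/column-computation task sends the full column set once.
    tx.send(vec![vec![1u8], vec![2u8]]).unwrap();

    let import_data = BlockImportData {
        block_root: 7,
        data_column_recv: Some(rx),
    };
    assert_eq!(persist_columns(import_data), 2);
}
```

Because the receiver is carried in the import data rather than left behind in the availability cache, block import can no longer race ahead of column computation and drop the columns.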