Skip to content

Comments

Tree-sync friendly lookup sync tests#8592

Merged
mergify[bot] merged 50 commits intosigp:unstablefrom
dapplion:more-sync-tests
Feb 13, 2026
Merged

Tree-sync friendly lookup sync tests#8592
mergify[bot] merged 50 commits intosigp:unstablefrom
dapplion:more-sync-tests

Conversation

@dapplion
Copy link
Collaborator

Issue Addressed

Current lookup sync tests are written in an explicit way that assume how the internals of lookup sync work. For example the test would do:

  • Emit unknown block parent message
  • Expect block request for X
  • Respond with successful block request
  • Expect block processing request for X
  • Response with successful processing request
  • etc..

This is unnecessarily verbose. And it will requires a complete re-write when something changes in the internals of lookup sync (has happened a few times, mostly for deneb and fulu).

What we really want to assert is:

  • WHEN: we receive an unknown block parent message
  • THEN: Lookup sync can sync that block
  • ASSERT: Without penalizing peers, without unnecessary retries

Proposed Changes

Keep all existing tests and add new cases but written in the new style described above. The logic to serve and respond to request is in this function fn simulate https://github.com/dapplion/lighthouse/blob/2288a3aeb11164bb1960dc803f41696c984c69ff/beacon_node/network/src/sync/tests/lookups.rs#L301

  • It controls peer behavior based on a CompleteStrategy where you can set for example "respond to BlocksByRoot requests with empty"
  • It actually runs beacon processor messages running their clousures. Now sync tests actually import blocks, increasing the test coverage to the interaction of sync and the da_checker.
  • To achieve the above the tests create real blocks with the test harness. To make the tests as fast as before, I disabled crypto with TestConfig

Along the way I found a couple bugs, which I documented on the diff.

Review guide

Look at lighthouse/beacon_node/network/src/sync/tests/lookups.rs directly (no diff).

Other changes are very minor and should not affect production paths

@dapplion dapplion requested a review from jxs as a code owner December 16, 2025 05:16
@dapplion dapplion added ready-for-review The code is ready for review syncing labels Dec 16, 2025
// removed from the da_checker. Note that ALL components are removed from the da_checker
// so when we re-download and process the block we get the error
// MissingComponentsAfterAllProcessed and get stuck.
lookup.reset_requests();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug found on testing, lookups may be stuck given this sequence of events

// sending retry requests to the disconnecting peer.
for sync_request_id in self.network.peer_disconnected(peer_id) {
self.inject_error(*peer_id, sync_request_id, RPCError::Disconnected);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor bug, we need to remove the peer from the sync states (e.g. self.block_lookups) then inject the disconnect events. Otherwise we may send requests to peers that are already disconnected. I don't think there's risk of sync getting stuck if libp2p rejects sending messages to disconnected peers, but deserves a fix anyway.

@michaelsproul michaelsproul added the test improvement Improve tests label Dec 16, 2025
@mergify
Copy link

mergify bot commented Dec 16, 2025

This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Dec 16, 2025
@dapplion
Copy link
Collaborator Author

dapplion commented Feb 6, 2026

@jimmygchen Thanks for the review! I've addressed all your feedback:

Already fixed:

  • Documentation, naming improvements (max_seen_peersseen_peers, with_process_result)
  • Removed unused code (disable_fetch_blobs, external_harness field, random queue selection)
  • Fixed panic message and typos

Just fixed (8d29e73):

  • Renamed import_block_before_processwith_import_block_before_process for builder pattern consistency
  • Fixed incomplete test documentation

All 12 review threads resolved. The 2 remaining threads are my own bug notes.

@dapplion dapplion added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Feb 12, 2026
};

data.signature = data.create_signature(&keypair.sk, spec);
data.signature = SignatureBytes::empty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check if we need these changes in this file.
Should we feature gate this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentialy delete this function in a future PR?

);
self.expect_empty_network();
self.expect_no_active_lookups();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add documentation for naming conventions, e.g.

  • .expect_ for mocking / setting up component behaviour
  • .assert_ for making assertion on behaviour

// The signature should be checked for new validators. Return early for a bad
// signature.
if is_valid_deposit_signature(&deposit_data, spec).is_err() {
if verify_signatures.is_true() && is_valid_deposit_signature(&deposit_data, spec).is_err() {
Copy link
Member

@jimmygchen jimmygchen Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add the same check to all other places where is_valid_deposit_signature is called? or would it be cleaner to use feature so we don't have to pass VerifySignatures around and remember to add this check whereever we verify signature?

@mergify
Copy link

mergify bot commented Feb 12, 2026

Some required checks have failed. Could you please take a look @dapplion? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Feb 12, 2026
@dapplion
Copy link
Collaborator Author

dapplion commented Feb 12, 2026

The network-tests CI failure is a pre-existing flaky test unrelated to this PR's changes:

data_column_reconstruction_at_deadline
  left: ["column_reconstruction", "gossip_data_column_sidecar", "worker_freed"]
 right: ["gossip_data_column_sidecar", "worker_freed", "nothing_to_do"]

The test has a latent race: on the last loop iteration, enqueuing the final gossip column triggers reconstruction nearly simultaneously. The assert_event_journal_completes in the loop expects to drain gossip_data_column_sidecar first, but column_reconstruction can fire before it. All merge queue runs on unstable pass — the race just surfaces under different CI timing/load. Worth hardening the event ordering assertion in a follow-up.

@jimmygchen
Copy link
Member

The network-tests CI failure is a pre-existing flaky test unrelated to this PR's changes:

data_column_reconstruction_at_deadline
  left: ["column_reconstruction", "gossip_data_column_sidecar", "worker_freed"]
 right: ["gossip_data_column_sidecar", "worker_freed", "nothing_to_do"]

The test has a latent race: on the last loop iteration, enqueuing the final gossip column triggers reconstruction nearly simultaneously. The assert_event_journal_completes in the loop expects to drain gossip_data_column_sidecar first, but column_reconstruction can fire before it. All merge queue runs on unstable pass — the race just surfaces under different CI timing/load. Worth hardening the event ordering assertion in a follow-up.

are you @dapplion

@dapplion dapplion added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Feb 13, 2026
r.corrupt_last_block_signature();
r.trigger_with_last_block();
r.simulate(SimulateConfig::happy_path()).await;
if cfg!(feature = "fake_crypto") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this condition in the same test feels a bit confusing, as we're testing different outcomes and in the case of fake_crypto scenario, it doesn't match what the test name suggests,

I prefer something like:

  #[tokio::test]
  #[cfg(feature = "fake_crypto")]
  async fn block_signature_not_checked_with_fake_crypto() { ... }

  #[tokio::test]
  #[cfg(not(feature = "fake_crypto"))]
  async fn invalid_block_signature_fails_processing() { ... }

but yeah this is a minor style issue

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dapplion

Nice work! This is a great improvement to the existing tests, which is becoming hard to maintain and will break once we change how sync works.

It looks pretty solid now and we can keep iterating over the test interfaces as we add more tests.

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Feb 13, 2026
@mergify mergify bot added the queued label Feb 13, 2026
@mergify
Copy link

mergify bot commented Feb 13, 2026

Merge Queue Status

Rule: default


This pull request spent 32 minutes 9 seconds in the queue, including 29 minutes 48 seconds running CI.

Required conditions to merge
  • check-success=local-testnet-success
  • check-success=test-suite-success

mergify bot added a commit that referenced this pull request Feb 13, 2026
@mergify mergify bot merged commit f4a6b8d into sigp:unstable Feb 13, 2026
36 checks passed
@mergify mergify bot removed the queued label Feb 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge This PR is ready to merge. syncing test improvement Improve tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants