Skip to content

Conversation

@SuperFluffy
Copy link
Contributor

This patch implements a constant backoff from 1s to 30s when reading the validator config from execution layer state at the end of an epoch. Spamming this every second was excessive, especially while a node was syncing.

The event level was also lowered to WARN - after all, the node is actively trying to recover.

Finally, the event provides extra information about the execution layer, which potentially helps understand why it currently fails to read state.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR enhances retry logic and observability for validator config reads:

Changes:

  • Replaces fixed 1-second retry delay with linear backoff (1s to 30s)
  • Adds network sync status and block height telemetry to help diagnose failures
  • Removes escalation from warn! to error! after 10 attempts (now always warn!)
  • Adds utility functions display_result and display_option for cleaner telemetry formatting

Additional Notes:

  • The new telemetry fields (is_syncing, best_block, target_block, blocks_behind) provide valuable context for debugging validator config read failures, especially during node sync
  • The display_result and display_option utilities are well-implemented with proper error chain handling
  • Removing the warn-to-error escalation reduces log severity for persistent failures; consider whether this aligns with alerting requirements

View this review on Amp

best_block = %tempo_telemetry_util::display_result(&best_block),
target_block,
blocks_behind = %tempo_telemetry_util::display_option(&blocks_behind),
"reading validator config from contract failed; will retry",
Copy link
Contributor

@joshieDo joshieDo Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like this might scare operators, maybe we can add to the msg that this is expected when we're behind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though is_syncing is in there, i feel like it might be missed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, or separating the two cases into different logs

Copy link
Contributor Author

@SuperFluffy SuperFluffy Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, right now it emits errors for hours, so this is an improvement.

WARN is appropriate because it is being handled, yet does constitute a hickup in normal operation and should be looked into.

We already have a log for this (from reth), and it is easily missed. That's why this info is added here.

Separating this into separate events feels incorrect because this is just one event.

@SuperFluffy SuperFluffy added this pull request to the merge queue Feb 9, 2026
Merged via the queue into main with commit d082fdd Feb 9, 2026
15 checks passed
@SuperFluffy SuperFluffy deleted the janis/epoch-reads branch February 9, 2026 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants