
Milestone event processing introduces unnecessary latency in sync cycle #119

@madumas

Description

The milestone synchronization mechanism introduces structural latency at multiple points in the sync pipeline. This latency is inherent to the current design and affects both the syncToTip phase and the event loop at tip, regardless of whether the node eventually catches up.

Root causes

1. Milestone scraper pollDelay = 1s (vs 200ms for spans)

In polygon/heimdall/service.go:89, the milestone scraper polls Heimdall every 1 second:

milestoneScraper := NewScraper(
    "milestones",
    store.Milestones(),
    milestoneFetcher,
    1*time.Second,        // ← 5x slower than spans (200ms)
    ...
)

During syncToTip, each cycle calls SynchronizeMilestones() which blocks on syncEvent.Wait() until the scraper completes a full poll cycle. This adds up to ~1s of dead time per syncToTip iteration, contributing to the 32-58s inter-cycle gap observed in production.

The span scraper uses 200*time.Millisecond (service.go:98). There's no reason for milestones to be 5x slower.
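
A minimal sketch of the first suggested improvement below, assuming the NewScraper constructor keeps the argument order shown above (only the poll delay argument changes; the remaining arguments are elided as in the snippet):

milestoneScraper := NewScraper(
    "milestones",
    store.Milestones(),
    milestoneFetcher,
    200*time.Millisecond,  // match the span scraper's poll delay
    ...
)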

2. futureMilestoneDelay = 1s re-queue polling loop

In polygon/sync/sync.go:289-308, when a milestone arrives ahead of the current tip (which is the common case since Heimdall publishes milestones before the node has executed the blocks):

if milestone.EndBlock().Uint64() > ccb.Tip().Number.Uint64() {
    // finality is already tracked here (line 293) ✓
    go func() {
        time.Sleep(futureMilestoneDelay)  // 1s
        s.tipEvents.events.PushEvent(...)  // re-queue
    }()
    return nil
}

This spawns a goroutine that sleeps 1s and re-pushes the event, repeating until the tip catches up. For a milestone 3 blocks (6s) ahead of tip, this creates ~6 polling goroutines. The finality tracking (lastFinalizedBlockNum) is already done at line 293 before the re-queue — the re-queue only serves CCB pruning and milestone verification, which the next on-time milestone will handle anyway (~32s later).
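
A sketch of the second suggested improvement below, keeping the existing finality tracking and simply dropping the event instead of re-queuing it; names mirror the snippet above rather than the exact code in sync.go:

if milestone.EndBlock().Uint64() > ccb.Tip().Number.Uint64() {
    // lastFinalizedBlockNum is already updated above, so the event can simply
    // be dropped; the next on-time milestone (~32s later) covers CCB pruning
    // and milestone verification once the tip has caught up
    return nil
}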

3. WaitUntilHeimdallIsSynced + SynchronizeSpans on every block event

In polygon/sync/sync.go:364-376, every single block event in the event loop triggers:

err := s.heimdallSync.WaitUntilHeimdallIsSynced(ctx, 200*time.Millisecond)
err = s.heimdallSync.SynchronizeSpans(ctx, math.MaxUint64)

This is fast in steady state, but during span rotation (every 128 blocks / ~256s), SynchronizeSpans must fetch from Heimdall and recompute producer selection, adding ~12s of overhead — a systematic source of lag at a predictable interval.
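
One way to express the third suggested improvement below, as a sketch only: keep the cheap Heimdall-synced check on every block event, but only block on span synchronization when the new tip is around a span boundary. isSpanBoundary and newTip are assumed names for illustration, not existing identifiers:

if err := s.heimdallSync.WaitUntilHeimdallIsSynced(ctx, 200*time.Millisecond); err != nil {
    return err
}
// Only pay the span fetch + producer-selection cost when the producer set can
// actually change, i.e. near a span boundary.
if isSpanBoundary(newTip.Number.Uint64()) {
    if err := s.heimdallSync.SynchronizeSpans(ctx, math.MaxUint64); err != nil {
        return err
    }
}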

Production data

From v3.4.0-beta (bor-mainnet, commit 48d7b0b):

  • syncToTip phase: 32-58s inter-cycle gap. Execution and trie computation dominate, but the 1s scraper poll is dead time on every iteration.
  • Event loop at tip: FC cycle avg = 2.07s against a 2s block time; steady-state head age = 2-4s. With the average cycle already slightly longer than the block interval, every second of unnecessary latency translates directly into a growing head age.

From issue #59 logs (v3.1.2):

[sync] update fork choice done           in=8.7s
[sync] applying new milestone event      milestoneId=3648361 ...
[sync] applying new milestone event      milestoneId=3648362 ...
[span-rotation] need to wait for span rotation ...
[bor.heimdall] anticipating new span update within 8 seconds
[span-rotation] producer set was not updated within 8 seconds

Milestones are processed sequentially after the FC update, then span rotation adds another 8s — the node is idle for seconds doing Heimdall bookkeeping instead of processing blocks.

Suggested improvements

  1. Reduce milestone pollDelay from 1s to 200ms — align with spans, one-line change in service.go:89
  2. Remove the futureMilestoneDelay re-queue loop — finality is already tracked; drop the event and let the next on-time milestone handle CCB pruning and verification
  3. Consider making span synchronization non-blocking for block events that don't fall on a span boundary
