Add multi-worker work support to BeaconProcessor#7720

Closed
jimmygchen wants to merge 1 commit into sigp:unstable from jimmygchen:bp-rayon

Conversation

@jimmygchen
Member

@jimmygchen jimmygchen commented Jul 9, 2025

Issue Addressed

Proposed Changes

(Don't look, the code it splits out is not great 😅 )

Implements generic multi-worker configuration allowing tasks to allocate multiple workers with scoped rayon thread pools:

  • Per-work-type configuration: Each work type can specify worker count (defaults to 1)
  • Column reconstruction: Uses min(4, max_workers) by default
  • Scoped thread pools: Prevents oversubscription by coordinating BeaconProcessor and rayon
  • Implement spawn_multi_worker() with scoped rayon thread pools
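
For illustration only, here is a minimal std-only sketch of the worker accounting idea. It is not the code in this PR and not the `spawn_multi_worker()` implementation: `WorkerSlots`, `SlotGuard`, `reconstruction_workers` and `run_multi_worker` are hypothetical names, and `std::thread::scope` stands in for a scoped rayon pool (with rayon, a pool built with the same thread count would play that role). The point is that a task reserves N slots up front, runs its parallel work on exactly N threads, and hands the slots back when it finishes.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical tracker for the processor's idle worker slots.
#[derive(Clone)]
pub struct WorkerSlots {
    idle: Arc<Mutex<usize>>,
}

/// Holds `count` reserved slots and returns them to the tracker on drop.
pub struct SlotGuard {
    idle: Arc<Mutex<usize>>,
    count: usize,
}

impl WorkerSlots {
    pub fn new(max_workers: usize) -> Self {
        Self { idle: Arc::new(Mutex::new(max_workers)) }
    }

    pub fn idle(&self) -> usize {
        *self.idle.lock().unwrap()
    }

    /// Reserve `wanted` slots, or return `None` if too few are idle.
    pub fn try_acquire(&self, wanted: usize) -> Option<SlotGuard> {
        let mut idle = self.idle.lock().unwrap();
        if wanted == 0 || *idle < wanted {
            return None;
        }
        *idle -= wanted;
        Some(SlotGuard { idle: Arc::clone(&self.idle), count: wanted })
    }
}

impl Drop for SlotGuard {
    fn drop(&mut self) {
        *self.idle.lock().unwrap() += self.count;
    }
}

/// Default worker count for column reconstruction: min(4, max_workers).
pub fn reconstruction_workers(max_workers: usize) -> usize {
    max_workers.min(4)
}

/// Sum `items` using at most `guard.count` scoped threads, then release the slots.
/// The scoped threads are an assumption standing in for a scoped rayon pool.
pub fn run_multi_worker(guard: SlotGuard, items: &[u64]) -> u64 {
    // Chunk size chosen so that the number of chunks never exceeds the slot count.
    let chunk_len = items.len().div_ceil(guard.count).max(1);
    let total: u64 = thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    });
    drop(guard); // slots go back to the tracker here
    total
}
```

Because the thread count is tied to the number of reserved slots, the processor's own worker limit and the parallel work inside a task cannot oversubscribe the CPU together.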

NOTE: This hasn't been reviewed / tested. Please do NOT merge.

@jimmygchen jimmygchen added the do-not-merge and optimization labels Jul 9, 2025
@jimmygchen
Member Author

Closing this - will write a new impl.

I think there's a fundamental problem with both this solution and the existing beacon processor: async and blocking tasks are handled the same way, so some async tasks that wait on network I/O may hold a worker slot unnecessarily.

@jimmygchen jimmygchen closed this Jul 13, 2025
@michaelsproul
Member

The async task issue is interesting. Maybe we should allow async tasks to go unbounded and let Tokio's thread pool sort them out?

Alternatively, we could decrement the number of active workers every time a task yields, but I don't think that sounds easy to do.

@jimmygchen
Member Author

jimmygchen commented Jul 16, 2025

Yeah, I think we could potentially try that.

We should also keep in mind @paulhauner's comment below: there are some expensive async tasks (e.g. reconstruction, although we might be able to split this one into two tasks: blocking reconstruction + async block import).

I think there's two types of async tasks:

  1. Async tasks with very little computational load (e.g. waiting for a response from a server)
  2. Async tasks which have high computational load (e.g. those that might spawn a few blocking threads and await them)

I think (1) doesn't need to take a worker slot, however I think it makes sense that (2) takes a slot since we don't want too many of those running simultaneously.
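
A rough sketch of that split, assuming tasks were tagged by kind up front. `TaskKind`, `takes_worker_slot` and `can_start` are hypothetical names for illustration, not existing beacon processor types:

```rust
/// Hypothetical classification of beacon processor tasks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    /// Type (1): async, mostly waiting on I/O.
    AsyncLight,
    /// Type (2): async, but spawns and awaits heavy blocking work.
    AsyncHeavy,
    /// Plain blocking task.
    Blocking,
}

/// Only light async tasks run without holding a worker slot.
pub fn takes_worker_slot(kind: TaskKind) -> bool {
    match kind {
        TaskKind::AsyncLight => false,
        TaskKind::AsyncHeavy | TaskKind::Blocking => true,
    }
}

/// A task may start if it needs no slot, or if a slot is still free.
pub fn can_start(kind: TaskKind, active_workers: usize, max_workers: usize) -> bool {
    !takes_worker_slot(kind) || active_workers < max_workers
}
```

With all workers busy, a light async task could still start under this policy, while heavy async and blocking tasks would wait in their queues.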

@michaelsproul
Member

Yeah, perhaps we could make the blocking tasks started by async tasks go through the beacon processor so that they are tracked and bounded.

@jimmygchen
Member Author

Yes, so to summarise:

BeaconProcessor

Async task

  • Keep computation light
  • Any blocking computation goes through the beacon processor
  • Not bounded by max workers (num_cpus), but probably still needs a queuing system to make sure it doesn't grow unbounded and cause memory issues

Blocking task

  • Gets allocated 1 thread by default
  • For heavy tasks that require rayon, allow WorkTypes to acquire more than 1 worker (N): when executing, create a scoped rayon pool, run the parallel tasks within the scope, and release the workers after it completes

CPU allocation

  • tokio runtime: num_cpus threads (default)
  • BeaconProcessor: runs inside the tokio runtime and maintains the blocking workers, with max_blocking_workers = num_cpus
  • rayon: scoped per task

Memory

  • Both sync and async tasks are bounded by queues of each individual work type.
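
To make the memory bound concrete, here is a minimal sketch of per-work-type bounded queues. Everything in it is an assumption for illustration: the `BoundedQueues` type, the two `WorkType` variants, a single shared capacity, and rejecting new items when a queue is full (a real implementation might size each queue separately or drop older items instead).

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical work types; the variants are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum WorkType {
    NetworkFetch,
    Reconstruction,
}

/// One FIFO queue per work type, each capped at `capacity` items.
pub struct BoundedQueues<T> {
    capacity: usize,
    queues: HashMap<WorkType, VecDeque<T>>,
}

impl<T> BoundedQueues<T> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, queues: HashMap::new() }
    }

    /// Enqueue `item`, or hand it back as `Err` if that work type's queue is full.
    pub fn push(&mut self, work_type: WorkType, item: T) -> Result<(), T> {
        let queue = self.queues.entry(work_type).or_default();
        if queue.len() >= self.capacity {
            return Err(item);
        }
        queue.push_back(item);
        Ok(())
    }

    /// Dequeue the oldest item for `work_type`, if any.
    pub fn pop(&mut self, work_type: WorkType) -> Option<T> {
        self.queues.get_mut(&work_type)?.pop_front()
    }

    /// Number of items currently queued for `work_type`.
    pub fn len(&self, work_type: WorkType) -> usize {
        self.queues.get(&work_type).map_or(0, VecDeque::len)
    }
}
```

Since the cap applies per work type, a flood of one kind of task (sync or async) cannot exhaust memory or crowd out the queues of the others.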

2 participants