perf(engine): reduce proof worker count for small blocks#22074

Open
gakonst wants to merge 13 commits into main from yk/lazy-proof-workers-small-blocks

Conversation

@gakonst
Member

@gakonst gakonst commented Feb 11, 2026

Summary

Reduce proof worker pool size for blocks with ≤30 transactions to cut per-block spawn overhead.

Motivation

ProofWorkerHandle::new() dominates spawn_payload_processor cost: ~1.5ms of ~2.5ms total per block. On a 31-core machine it spawns 62 workers (31 storage + 31 account), each opening an MDBX read txn + creating trie cursors.

Profiling shows each worker uses only 3.9μs CPU per block — 0.0005% utilization. They are almost entirely idle for blocks under 30 txs.

We cannot skip SRT entirely (PR #22129 tried, regressed +34%) because PreservedSparseTrie is reused block-to-block via state_root anchoring. Switching to Parallel forces full trie recomputation.

Changes

  • Save transaction_count from ExecutionEnv before it is moved
  • Externalize storage_worker_count and account_worker_count as params to ProofWorkerHandle::new() instead of reading from runtime
  • When tx count ≤30, cap workers at min(pool_size, 16) — preserves SRT cache chain while reducing spawn overhead

Expected: MDBX read txns per block 62→16 for small blocks, spawn overhead 1.5ms→0.4ms.
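
The capping rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `capped_worker_count`, `WorkerCounts`, and both constants are hypothetical names, the cap of 16 is taken from this description (a later commit message mentions 32), and applying the cap to each pool separately is an assumption.

```rust
/// Blocks with at most this many transactions are treated as "small"
/// (threshold taken from the PR description).
const SMALL_BLOCK_TX_THRESHOLD: usize = 30;

/// Cap applied to each worker pool for small blocks.
/// Assumption: the description says 16; treat it as a tunable value.
const SMALL_BLOCK_MAX_WORKERS: usize = 16;

/// Hypothetical helper: number of workers to spawn for one pool
/// (storage or account), given the pool size and the block's tx count.
fn capped_worker_count(pool_size: usize, transaction_count: usize) -> usize {
    if transaction_count <= SMALL_BLOCK_TX_THRESHOLD {
        // Small block: never spawn more than the cap, and never more
        // than the pool actually has.
        pool_size.min(SMALL_BLOCK_MAX_WORKERS)
    } else {
        // Larger block: use the full pool.
        pool_size
    }
}

/// Stand-in for a handle that receives its worker counts as parameters
/// instead of reading them from a runtime. Not reth's actual type.
#[derive(Debug, PartialEq, Eq)]
struct WorkerCounts {
    storage: usize,
    account: usize,
}

impl WorkerCounts {
    /// The caller decides the counts at the call site, where the
    /// transaction count saved from the execution environment is known.
    fn for_block(pool_size: usize, transaction_count: usize) -> Self {
        let n = capped_worker_count(pool_size, transaction_count);
        Self { storage: n, account: n }
    }

    /// Total workers across both pools.
    fn total(&self) -> usize {
        self.storage + self.account
    }
}
```

Under these assumptions, a pool size of 31 and a 10-transaction block gives 16 workers per pool, while a 100-transaction block keeps all 31 per pool. Pools smaller than the cap are unaffected.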

Bench

Results < 30

<30 txns:

Metric    Baseline    Feature    Change
Mean       16.08ms    15.68ms    -2.51%
StdDev      6.13ms     6.10ms
P50        18.55ms    17.93ms    -3.38%
P90        20.95ms    19.87ms    -5.16%
P99        28.77ms    29.38ms    +2.15%
Mgas/s      638.86     655.30    +2.57%

normal:

Metric    Baseline    Feature    Change
Mean       32.85ms    32.47ms    -1.15%
StdDev     16.97ms    16.44ms
P50        27.94ms    27.84ms    -0.36%
P90        53.93ms     53.5ms     -0.8%
P99        91.54ms    91.46ms    -0.08%
Mgas/s       928.8     939.59    +1.16%

Profiling data: Samply baseline | Samply feature

@github-actions
Contributor

github-actions bot commented Feb 11, 2026

⚠️ Changelog not found.

A changelog entry is required before merging. We've generated a suggested changelog based on your changes:

Preview
---
reth-engine-tree: patch
reth-trie-parallel: patch
---

Added adaptive proof worker pool sizing that halves worker count for blocks with 30 or fewer transactions to reduce idle overhead when fewer state changes are generated.

@gakonst gakonst force-pushed the yk/lazy-proof-workers-small-blocks branch 2 times, most recently from 7f38920 to 9c323c0 on February 12, 2026 10:22
@DaniPopes
Member

overhead is reduced with rayon, can we rebench?

@yongkangc
Member

@DaniPopes thanks for the note, rebenching

@yongkangc yongkangc marked this pull request as ready for review February 12, 2026 18:16
@yongkangc yongkangc marked this pull request as draft February 12, 2026 18:18
gakonst and others added 3 commits February 12, 2026 20:18

For blocks with ≤30 transactions, cap proof workers at 32 each
(storage + account) instead of the full rayon pool size. Fewer
transactions produce fewer state changes, making most workers
idle overhead.

Adds ProofWorkerHandle::with_max_workers() to support capping
worker count while still using the dedicated rayon pools.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c515e-d52f-77df-95a3-f5e213a81aa4

Remove with_max_workers/new_inner indirection: callers now pass
worker counts directly and cap them at the call site for small blocks.

Amp-Thread-ID: https://ampcode.com/threads/T-019c5312-9cb4-752a-a05b-0e3bce585ed9

In the spawn_payload_processor path, transaction_count comes from
the actual block envelope, so it's always the real count. Empty blocks
have even fewer state changes, so capping workers is justified.

@yongkangc yongkangc force-pushed the yk/lazy-proof-workers-small-blocks branch from bf8eadb to 9203128 on February 12, 2026 20:18
@yongkangc yongkangc marked this pull request as ready for review February 12, 2026 21:04
@yongkangc yongkangc enabled auto-merge February 12, 2026 21:35
@yongkangc yongkangc added the C-perf A change motivated by improving speed, memory usage or disk footprint label Feb 12, 2026
@yongkangc yongkangc added the A-engine Related to the engine implementation label Feb 12, 2026
Labels

A-engine: Related to the engine implementation
C-perf: A change motivated by improving speed, memory usage or disk footprint

Projects

Status: Backlog

3 participants