Initialize TLS for idle task #2227

Open
jounathaen wants to merge 1 commit into main from idle-task-tls

jounathaen (Member) commented Feb 5, 2026

Fixes #2225

I'm not 100% sure this is the best way to do it, and so far it is only implemented for x86_64, as I haven't yet looked into how TLS is implemented on Arm and RISC-V.

Questions:

  • Shall we only do this in combination with the instrument feature?
  • Shall we initialize a smaller TLS? I think we only need 8 bytes for rftrace.
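For the second question, it may help to recall how an x86_64 TLS block is laid out under the ELF TLS ABI ("variant II"): the TLS data sits directly below the thread pointer, and the first word at the thread pointer is a pointer to itself. The following is a minimal user-space sketch of that layout, not the code from this PR. `TlsBlock` and its fields are hypothetical names, and the 8-byte size is only the guess from the question above.

```rust
use std::mem::size_of;

/// Hypothetical model of a variant II TLS block: TLS data first, then the TCB.
struct TlsBlock {
    buf: Vec<u8>,
    /// Offset of the thread pointer (start of the TCB) inside `buf`.
    tp_offset: usize,
}

impl TlsBlock {
    /// `image` is the initialized part (.tdata); `mem_size` also covers the
    /// zero-initialized part (.tbss). `align` must be a power of two.
    fn new(image: &[u8], mem_size: usize, align: usize) -> Self {
        assert!(align.is_power_of_two());
        assert!(image.len() <= mem_size);
        // The data starts at tp - round_up(mem_size, align).
        let tp_offset = (mem_size + align - 1) & !(align - 1);
        // Zeroed buffer: TLS data, padding, then one pointer-sized TCB word.
        let mut buf = vec![0u8; tp_offset + size_of::<usize>()];
        buf[..image.len()].copy_from_slice(image);
        // The first TCB word holds the address of the TCB itself.
        let tp = buf.as_ptr() as usize + tp_offset;
        buf[tp_offset..].copy_from_slice(&tp.to_ne_bytes());
        Self { buf, tp_offset }
    }

    /// The value a kernel would load into the thread pointer register.
    fn thread_pointer(&self) -> usize {
        self.buf.as_ptr() as usize + self.tp_offset
    }
}

fn main() {
    // Assumed sizes: 8 bytes of TLS, 3 of them initialized, 8-byte alignment.
    let block = TlsBlock::new(&[1, 2, 3], 8, 8);
    println!(
        "thread pointer at offset {} of a {}-byte block ({:#x})",
        block.tp_offset,
        block.buf.len(),
        block.thread_pointer()
    );
}
```

Under this layout, even a minimal TLS needs the self-pointer word in addition to the data itself, so a "smaller TLS" is bounded below by the data size rounded up to its alignment plus one pointer.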

jounathaen (Member, Author) commented

OK, on Arm the TLS pointer apparently needs to be stored in tpidr_el0; on RISC-V, in the tp register.
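As a quick reference for the per-architecture work, here is a small sketch that maps a target architecture to the register conventionally holding the thread pointer. The Arm and RISC-V entries come from the comment above. The x86_64 entry (the FS segment base, per the System V ABI convention) is not named in this PR and is an assumption here. `thread_pointer_register` is a hypothetical helper, not part of the kernel.

```rust
/// Hypothetical lookup: where the thread pointer conventionally lives.
/// Architecture names follow `std::env::consts::ARCH`.
fn thread_pointer_register(arch: &str) -> Option<&'static str> {
    match arch {
        // Assumption: System V x86_64 ABI convention, not stated in the PR.
        "x86_64" => Some("FS segment base"),
        // From the comment above.
        "aarch64" => Some("TPIDR_EL0"),
        // From the comment above; tp is an alias for x4.
        "riscv64" => Some("tp (x4)"),
        _ => None,
    }
}

fn main() {
    let arch = std::env::consts::ARCH;
    match thread_pointer_register(arch) {
        Some(reg) => println!("{arch}: thread pointer lives in {reg}"),
        None => println!("{arch}: not covered by this sketch"),
    }
}
```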

github-actions bot (Contributor) left a comment

Benchmark Results

| Benchmark | Current: ec9652b | Previous: af77eb8 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 100.83 s | 96.76 s | 1.04 |
| startup_benchmark File Size | 0.86 MB | 0.82 MB | 1.06 |
| Startup Time - 1 core | 0.94 s (±0.03 s) | 0.93 s (±0.03 s) | 1.01 |
| Startup Time - 2 cores | 0.96 s (±0.03 s) | 0.93 s (±0.03 s) | 1.03 |
| Startup Time - 4 cores | 0.97 s (±0.03 s) | 0.95 s (±0.04 s) | 1.02 |
| multithreaded_benchmark Build Time | 97.43 s | 94.96 s | 1.03 |
| multithreaded_benchmark File Size | 0.96 MB | 0.96 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 91.53 % (±8.28 %) | 88.34 % (±8.83 %) | 1.04 |
| Multithreaded Pi Efficiency - 4 Threads | 44.59 % (±3.54 %) | 43.96 % (±2.87 %) | 1.01 |
| Multithreaded Pi Efficiency - 8 Threads | 26.20 % (±1.76 %) | 25.67 % (±1.88 %) | 1.02 |
| micro_benchmarks Build Time | 107.38 s | 107.03 s | 1.00 |
| micro_benchmarks File Size | 0.97 MB | 0.97 MB | 1.00 |
| Scheduling time - 1 thread | 68.01 ticks (±3.46 ticks) | 68.40 ticks (±2.80 ticks) | 0.99 |
| Scheduling time - 2 threads | 39.51 ticks (±5.17 ticks) | 36.75 ticks (±3.77 ticks) | 1.08 |
| Micro - Time for syscall (getpid) | 3.01 ticks (±0.25 ticks) | 3.72 ticks (±0.30 ticks) | 0.81 |
| Memcpy speed - (built_in) block size 4096 | 67624.66 MByte/s (±48174.20 MByte/s) | 65936.06 MByte/s (±46962.77 MByte/s) | 1.03 |
| Memcpy speed - (built_in) block size 1048576 | 29642.26 MByte/s (±24468.98 MByte/s) | 29711.39 MByte/s (±24468.80 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 28463.72 MByte/s (±23700.48 MByte/s) | 28572.31 MByte/s (±23801.85 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 4096 | 68500.32 MByte/s (±48761.11 MByte/s) | 66224.66 MByte/s (±47148.82 MByte/s) | 1.03 |
| Memset speed - (built_in) block size 1048576 | 30405.49 MByte/s (±24905.59 MByte/s) | 30455.87 MByte/s (±24906.53 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 29099.21 MByte/s (±24018.80 MByte/s) | 29305.64 MByte/s (±24204.32 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 4096 | 58999.27 MByte/s (±43346.09 MByte/s) | 59811.41 MByte/s (±43763.94 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 1048576 | 29540.25 MByte/s (±24439.40 MByte/s) | 29629.02 MByte/s (±24504.10 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 28315.82 MByte/s (±23599.19 MByte/s) | 28401.51 MByte/s (±23669.15 MByte/s) | 1.00 |
| Memset speed - (rust) block size 4096 | 59830.04 MByte/s (±43865.04 MByte/s) | 60851.37 MByte/s (±44540.62 MByte/s) | 0.98 |
| Memset speed - (rust) block size 1048576 | 30292.03 MByte/s (±24868.53 MByte/s) | 30426.48 MByte/s (±24959.19 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 29089.30 MByte/s (±24040.80 MByte/s) | 29175.93 MByte/s (±24111.52 MByte/s) | 1.00 |
| alloc_benchmarks Build Time | 104.28 s | 102.72 s | 1.02 |
| alloc_benchmarks File Size | 0.94 MB | 0.89 MB | 1.05 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 9458.16 Ticks (±847.78 Ticks) | 8547.70 Ticks (±113.80 Ticks) | 1.11 |
| Allocations - Average Allocation time (no fail) | 9458.16 Ticks (±847.78 Ticks) | 8547.70 Ticks (±113.80 Ticks) | 1.11 |
| Allocations - Average Deallocation time | 1765.18 Ticks (±1155.59 Ticks) | 1117.79 Ticks (±513.75 Ticks) | 1.58 |
| mutex_benchmark Build Time | 110.73 s | 111.82 s | 0.99 |
| mutex_benchmark File Size | 0.97 MB | 0.97 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 12.80 ns (±0.72 ns) | 13.38 ns (±0.85 ns) | 0.96 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 15.68 ns (±0.90 ns) | 15.68 ns (±0.79 ns) | 1 |

This comment was automatically generated by workflow using github-action-benchmark.

jounathaen force-pushed the idle-task-tls branch 9 times, most recently from 0403ac4 to 6abc538, on February 5, 2026 19:13
mkroening self-assigned this on Feb 5, 2026
Development

Successfully merging this pull request may close these issues.

Thread local storage is not initialized for the idle/root task -> rftrace failing with infinite recursion in idle task

2 participants