
gf256: precompute full multiplication tables #33

Open
mplekh wants to merge 1 commit into itzmeanjan:main from mplekh:gf256-precomputed-mul-table

Conversation

mplekh commented Jan 29, 2026

use table lookup in scalar fallback

Replace per-call GF(256) multiplication with a compile-time generated 256×256 lookup table
and use it in the scalar gf256_mul_vec_by_scalar_then_add_into_vec fallback path.

This removes runtime table construction and per-element GF arithmetic, reducing the hot loop to indexed loads and XORs.
SIMD dispatch remains unchanged; SIMD backends continue to use their specialized implementations.
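
For illustration, here is a minimal sketch of the approach: a const-generated full product table plus the scalar fallback loop. It assumes the AES reduction polynomial 0x11b (the one GF2p8 SIMD instructions use); the actual table generation and function signature in the PR may differ.

```rust
// Sketch only: compile-time 256x256 GF(256) product table over the
// assumed polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).

/// Shift-and-reduce GF(2^8) multiply, evaluable in const context.
const fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc: u8 = 0;
    let mut i = 0;
    while i < 8 {
        if b & 1 != 0 {
            acc ^= a;
        }
        let carry = (a & 0x80) != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b; // fold the dropped x^8 term back in (0x11b without its x^8 bit)
        }
        b >>= 1;
        i += 1;
    }
    acc
}

/// Full 64 kB product table, built once at compile time.
static GF256_MUL_TABLE: [[u8; 256]; 256] = {
    let mut t = [[0u8; 256]; 256];
    let mut a = 0;
    while a < 256 {
        let mut b = 0;
        while b < 256 {
            t[a][b] = gf256_mul(a as u8, b as u8);
            b += 1;
        }
        a += 1;
    }
    t
};

/// Scalar fallback: dst[i] ^= scalar * src[i]. One indexed load plus one
/// XOR per byte; the selected 256-byte row stays resident in L1.
fn gf256_mul_vec_by_scalar_then_add_into_vec(dst: &mut [u8], src: &[u8], scalar: u8) {
    let row = &GF256_MUL_TABLE[scalar as usize];
    for (d, &s) in dst.iter_mut().zip(src.iter()) {
        *d ^= row[s as usize];
    }
}
```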

Performance impact:
Significant improvements on non-SIMD targets (e.g. WASM, legacy CPUs):
AMD Phenom II X6:
encode/1.0MB/16-pieces time: [1.1116 ms 1.1245 ms 1.1343 ms]
thrpt: [936.70 MiB/s 944.93 MiB/s 955.83 MiB/s]
change:
time: [−57.581% −56.812% −56.132%] (p = 0.00 < 0.05)
thrpt: [+127.96% +131.55% +135.74%]
Performance has improved.

WASM (Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz):
cargo bench --target wasm32-wasip1 --bench full_rlnc_encoder
before:
encode/1.0MB/16-pieces time: [2.1627 ms 2.1706 ms 2.1798 ms]
thrpt: [487.44 MiB/s 489.52 MiB/s 491.29 MiB/s]
encode/1.0MB/32-pieces time: [2.3975 ms 2.5320 ms 2.6811 ms]
thrpt: [384.66 MiB/s 407.31 MiB/s 430.16 MiB/s]
after:
encode/1.0MB/16-pieces time: [1.1017 ms 1.1120 ms 1.1242 ms]
thrpt: [945.11 MiB/s 955.56 MiB/s 964.42 MiB/s]
encode/1.0MB/32-pieces time: [1.1202 ms 1.1278 ms 1.1372 ms]
thrpt: [906.85 MiB/s 914.45 MiB/s 920.67 MiB/s]

Commit: gf256: precompute full multiplication tables, use table lookup in scalar fallback

Replace per-call GF(256) multiplication with a compile-time generated 256×256 lookup table and use it in the scalar gf256_mul_vec_by_scalar_then_add_into_vec fallback path.

This removes runtime table construction and per-element GF arithmetic, reducing the hot loop to indexed loads and XORs. SIMD dispatch remains unchanged; SIMD backends continue to use their specialized implementations.

Performance impact:
Significant improvements on non-SIMD targets (e.g. WASM, legacy CPUs):

Benchmark on AMD Phenom II X6 (encode, 1 MB):
16 pieces: −56–58% time, +128–136% throughput
32 pieces: −55–56% time, +123–126% throughput
64 pieces: −55–56% time, +124–130% throughput
itzmeanjan (Owner) commented

Hello @mplekh, thanks for the PR.

I see we are not doing GF2p8 multiplication in the non-SIMD case; we are actually looking up from two tables, and those tables are much smaller. With your optimization we need to store an extra 64 kB of constants. Anyway, good speedup.
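
For context, a minimal sketch of one common two-table scheme (log/exp tables, again assuming polynomial 0x11b); the actual pair of tables in the repo may be laid out differently:

```rust
// Sketch only: log/exp tables for GF(256), ~0.75 kB total versus 64 kB.

/// Multiply by x in GF(2^8), reducing by the assumed polynomial 0x11b.
const fn xtime(a: u8) -> u8 {
    let r = a << 1;
    if a & 0x80 != 0 { r ^ 0x1b } else { r }
}

/// log[v] = discrete log of v base 0x03 (log[0] is unused);
/// exp holds two periods so index sums never need a modulo.
const fn build_tables() -> ([u8; 256], [u8; 510]) {
    let mut log = [0u8; 256];
    let mut exp = [0u8; 510];
    let mut x: u8 = 1;
    let mut i = 0;
    while i < 255 {
        exp[i] = x;
        exp[i + 255] = x;
        log[x as usize] = i as u8;
        x ^= xtime(x); // x *= 0x03, a generator of the multiplicative group
        i += 1;
    }
    (log, exp)
}

static TABLES: ([u8; 256], [u8; 510]) = build_tables();

fn gf256_mul_log_exp(a: u8, b: u8) -> u8 {
    if a == 0 || b == 0 {
        return 0;
    }
    // Two dependent loads per multiply, versus one with the full table.
    TABLES.1[TABLES.0[a as usize] as usize + TABLES.0[b as usize] as usize]
}
```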

How did you do this benchmark evaluation? I'm curious to check it myself.

mplekh (Author) commented Jan 29, 2026

Hi Anjan,
I benchmarked on an old PC (AMD Phenom II X6, no SIMD) by running "make bench" on the clean repo and again after applying this change. For WASM, I ran the benchmark on a newer PC with SIMD just to make sure WASM does not benefit from the SIMD specializations and uses the fallback; the command used was "cargo bench --target wasm32-wasip1 --bench full_rlnc_encoder". The results are very similar to the benchmark on the legacy CPU.
Valgrind instruction counts also give insight into the performance change; for profiling I used "valgrind --tool=callgrind --dump-instr=yes ./target/optimized/examples/full_rlnc". Before the change: 22M IR; after: 12M.
I'll look into the SIMD half-order tables. Currently I assume they will need two loads per byte, while the full table needs only one load, and it is cache-local: only 256 bytes per call are touched, which fits comfortably in L1.
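
For reference, this is the shape of the nibble-split ("half") table scheme behind my two-loads-per-byte estimate; table names and layout here are hypothetical:

```rust
// Sketch only: GF(256) multiplication is linear over GF(2), so
// s * x = s * (x & 0x0f) ^ s * (x & 0xf0). Two 256x16 tables (8 kB total)
// then replace the 64 kB full table at the cost of a second load per byte.
fn mul_via_nibble_tables(
    mul_lo: &[[u8; 16]; 256], // mul_lo[s][n] = s * n        (n = low nibble)
    mul_hi: &[[u8; 16]; 256], // mul_hi[s][n] = s * (n << 4) (n = high nibble)
    scalar: u8,
    x: u8,
) -> u8 {
    let s = scalar as usize;
    mul_lo[s][(x & 0x0f) as usize] ^ mul_hi[s][(x >> 4) as usize]
}
```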

