perf: various optimizations to eliminate branch misprediction in hash_utils by notashes · Pull Request #20168 · apache/datafusion

notashes · 2026-02-05T12:58:18Z

Which issue does this PR close?

Closes Optimize rehash/null branching in hash util functions #20152

Rationale for this change

Compile time monomorphization helps bring rehash outside the hot loop where it's not required.

What changes are included in this PR?

Currently the PR adds a specialized hash_dictionary_inner() function with const generic parameters that check for nulls in keys, values. It also handles specific edge cases of just nulls in keys or values.

Are these changes tested?

There are no additional tests yet. But I will add 'em as I continue. The benchmark results seem promising.
here's cargo bench --bench with_hashes -- dictionary for

origin/main

Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0461 s (470k iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [10.668 µs 10.700 µs 10.734 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0428 s (409k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [12.269 µs 12.293 µs 12.322 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0864 s (162k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [31.357 µs 31.426 µs 31.506 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0842 s (141k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [36.060 µs 36.135 µs 36.220 µs]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

feat/brunch-prediction

Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0176 s (1.1M iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [4.7186 µs 4.7496 µs 4.7821 µs]
                        change: [−55.829% −55.537% −55.240%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0295 s (712k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [6.9647 µs 7.0426 µs 7.1281 µs]
                        change: [−43.806% −43.445% −42.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  10 (10.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0600 s (348k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [13.365 µs 13.384 µs 13.404 µs]
                        change: [−57.610% −57.464% −57.313%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0569 s (242k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [20.785 µs 20.962 µs 21.173 µs]
                        change: [−42.370% −42.001% −41.579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  14 (14.00%) high severe

Are there any user-facing changes?

adriangb · 2026-02-05T14:28:18Z

it would be great to first merge a benchmark into main (if one doesn't already exist) to show an improvement

notashes · 2026-02-05T18:08:23Z

hey @adriangb, the benchmarks to test changes already exist in main. It was merged with the PR #19373
The benchmark (with_hashes) already measure two key scenarios:

single vs multiple columns (for rehash/combine logic)
with nulls vs without nulls (in keys)

here are some numbers:

group                                                main                                   optimized
-----                                                ----                                   ---------
dictionary_utf8_int32: multiple, no nulls            2.73     31.1±0.46µs        ? ?/sec    1.00     11.4±0.83µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls               1.92     39.4±0.61µs        ? ?/sec    1.00     20.5±0.56µs        ? ?/sec
dictionary_utf8_int32: single, no nulls              2.48     10.5±0.19µs        ? ?/sec    1.00      4.2±0.57µs        ? ?/sec
dictionary_utf8_int32: single, nulls                 1.74     12.1±0.18µs        ? ?/sec    1.00      7.0±0.31µs        ? ?/sec

But I'm curious do we also want to benchmark cases where the values contain nulls? let me know what you think

Dandandan · 2026-02-05T18:30:05Z

run benchmark with_hashes

alamb-ghbot · 2026-02-05T18:30:08Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/brunch-prediction (2f35e0e) to 639971a diff
BENCH_NAME=with_hashes
BENCH_COMMAND=cargo bench --features=parquet --bench with_hashes
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_brunch-prediction
Results will be posted here when complete

alamb-ghbot · 2026-02-05T18:40:49Z

🤖: Benchmark completed

Details

group                                        feat_brunch-prediction                 main
-----                                        ----------------------                 ----
dictionary_utf8_int32: multiple, no nulls    1.00     25.1±0.05µs        ? ?/sec    2.95     74.0±0.61µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls       1.00     66.5±0.12µs        ? ?/sec    1.63    108.4±0.29µs        ? ?/sec
dictionary_utf8_int32: single, no nulls      1.00      8.2±0.03µs        ? ?/sec    3.48     28.6±0.10µs        ? ?/sec
dictionary_utf8_int32: single, nulls         1.00     23.1±0.07µs        ? ?/sec    1.64     38.0±0.59µs        ? ?/sec
int64: multiple, no nulls                    1.00     38.7±0.06µs        ? ?/sec    1.00     38.8±0.39µs        ? ?/sec
int64: multiple, nulls                       1.24     74.2±0.47µs        ? ?/sec    1.00     59.8±0.61µs        ? ?/sec
int64: single, no nulls                      1.00     11.3±0.03µs        ? ?/sec    1.00     11.4±0.22µs        ? ?/sec
int64: single, nulls                         1.08     23.9±0.19µs        ? ?/sec    1.00     22.2±0.64µs        ? ?/sec
large_utf8: multiple, no nulls               1.00    221.9±1.42µs        ? ?/sec    1.01    223.8±5.79µs        ? ?/sec
large_utf8: multiple, nulls                  1.00   270.8±15.07µs        ? ?/sec    1.02   276.6±11.15µs        ? ?/sec
large_utf8: single, no nulls                 1.00     67.8±2.11µs        ? ?/sec    1.01     68.5±5.63µs        ? ?/sec
large_utf8: single, nulls                    1.00     77.8±0.27µs        ? ?/sec    1.00     78.0±0.26µs        ? ?/sec
utf8: multiple, no nulls                     1.00    236.5±3.39µs        ? ?/sec    1.02    240.9±1.63µs        ? ?/sec
utf8: multiple, nulls                        1.00    258.0±0.45µs        ? ?/sec    1.06    274.7±2.21µs        ? ?/sec
utf8: single, no nulls                       1.00     68.0±0.22µs        ? ?/sec    1.00     68.1±0.63µs        ? ?/sec
utf8: single, nulls                          1.00     76.9±0.28µs        ? ?/sec    1.02     78.5±3.26µs        ? ?/sec
utf8_view (small): multiple, no nulls        1.00     47.6±0.27µs        ? ?/sec    1.00     47.7±1.26µs        ? ?/sec
utf8_view (small): multiple, nulls           1.00     78.4±0.88µs        ? ?/sec    1.00     78.2±0.33µs        ? ?/sec
utf8_view (small): single, no nulls          1.00     13.9±0.25µs        ? ?/sec    1.00     13.9±0.44µs        ? ?/sec
utf8_view (small): single, nulls             1.00     23.7±0.19µs        ? ?/sec    1.00     23.7±0.43µs        ? ?/sec
utf8_view: multiple, no nulls                1.00    229.9±3.76µs        ? ?/sec    1.01    232.9±1.88µs        ? ?/sec
utf8_view: multiple, nulls                   1.00    225.7±1.43µs        ? ?/sec    1.00    224.7±1.63µs        ? ?/sec
utf8_view: single, no nulls                  1.01     74.8±6.15µs        ? ?/sec    1.00     74.2±0.90µs        ? ?/sec
utf8_view: single, nulls                     1.01     72.0±0.21µs        ? ?/sec    1.00     71.3±0.29µs        ? ?/sec

perf: optimize for better branch prediction within hash_dictionary

5cebcd2

notashes marked this pull request as draft February 5, 2026 12:58

github-actions bot added the common Related to common crate label Feb 5, 2026

notashes marked this pull request as ready for review February 5, 2026 16:17

notashes marked this pull request as draft February 5, 2026 16:49

Merge branch 'main' into feat/brunch-prediction

2f35e0e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: various optimizations to eliminate branch misprediction in hash_utils#20168

perf: various optimizations to eliminate branch misprediction in hash_utils#20168
notashes wants to merge 2 commits intoapache:mainfrom
notashes:feat/brunch-prediction

notashes commented Feb 5, 2026 •

edited

Loading

Uh oh!

adriangb commented Feb 5, 2026

Uh oh!

notashes commented Feb 5, 2026 •

edited

Loading

Uh oh!

Dandandan commented Feb 5, 2026

Uh oh!

alamb-ghbot commented Feb 5, 2026

Uh oh!

alamb-ghbot commented Feb 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

notashes commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

adriangb commented Feb 5, 2026

Uh oh!

notashes commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dandandan commented Feb 5, 2026

Uh oh!

alamb-ghbot commented Feb 5, 2026

Uh oh!

alamb-ghbot commented Feb 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

notashes commented Feb 5, 2026 •

edited

Loading

notashes commented Feb 5, 2026 •

edited

Loading