Skip to content

Use XxHash3 for calculating page checksums#28116

Open
wendigo wants to merge 6 commits intomasterfrom
user/serafin/xxhash-3
Open

Use XxHash3 for calculating page checksums#28116
wendigo wants to merge 6 commits intomasterfrom
user/serafin/xxhash-3

Conversation

@wendigo
Copy link
Contributor

@wendigo wendigo commented Feb 4, 2026

Description

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Feb 4, 2026
@starburstdata-automation
Copy link

starburstdata-automation commented Feb 4, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: failure
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: Found regressions for:<br/>(presto/tpcds, q23, duration, over by 33.1%)
Benchmark Comparison to the closest run from Master: Report

@starburstdata-automation
Copy link

starburstdata-automation commented Feb 4, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_unpart.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

@wendigo wendigo force-pushed the user/serafin/xxhash-3 branch from 6d64b99 to d328c05 Compare February 4, 2026 19:55
@starburstdata-automation
Copy link

starburstdata-automation commented Feb 4, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: failure
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: Found regressions for:<br/>(presto/tpcds, q04, duration, over by 28.1%)<br/>(presto/tpcds, q23, duration, over by 45.8%)
Benchmark Comparison to the closest run from Master: Report

@starburstdata-automation
Copy link

starburstdata-automation commented Feb 5, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

findepi
findepi previously approved these changes Feb 5, 2026
@findepi
Copy link
Member

findepi commented Feb 5, 2026

Description

Please fill the PR description:

  • what changes
  • why

@starburstdata-automation
Copy link

starburstdata-automation commented Feb 5, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

@starburstdata-automation
Copy link

starburstdata-automation commented Feb 5, 2026

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_unpart.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

hasher.update(page.byteArray(), page.byteArrayOffset(), page.length());
int size = pages.stream().mapToInt(Slice::length).sum();
long checksum;
if (size > 16384) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that for small slices the cost of calling native outweights the benefits of a faster hashing function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the magic live in aircompressor ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if we cannot move this switch there, the code as it stands now absolutely requires code comment

@wendigo
Copy link
Contributor Author

wendigo commented Feb 5, 2026

cc @dain

@findepi findepi dismissed their stale review February 6, 2026 21:39

approved initial version, not current

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants