
Support lz4 compression in hdfs #18982

Open
FrankChen021 wants to merge 10 commits into apache:master from FrankChen021:lz4-compression-cherrypick

Conversation

@FrankChen021
Member

Description

Currently zip (default level 6) is used to compress all files when uploading to HDFS.
The zip compression takes CPU resources and time when uploading large files. Another consequence is that it significantly increases task duration and may introduce task pending during task rollover, which also introduces high lag in Kafka.
This PR adds support for lz4 compression (fastest level) when uploading to HDFS and for decompression when downloading from HDFS.
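
For illustration only, a minimal Java sketch of the idea (not the PR's actual code): the archive stream written to or read from HDFS is wrapped with the lz4-java block streams that Druid already bundles for column compression. Whether the PR uses the block or frame format, and the class names below, are assumptions.

```java
import net.jpountz.lz4.LZ4BlockInputStream;
import net.jpountz.lz4.LZ4BlockOutputStream;

import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: lz4-compress the segment archive on push and
// transparently decompress it on pull.
public class Lz4StreamingSketch
{
  // Push path: compress while writing the archive to deep storage.
  // The default constructor uses lz4-java's fast compressor, matching the
  // "fastest level" trade-off described above.
  public static OutputStream compressing(OutputStream rawHdfsOut)
  {
    return new LZ4BlockOutputStream(rawHdfsOut);
  }

  // Pull path: decompress while reading the archive back.
  public static InputStream decompressing(InputStream rawHdfsIn)
  {
    return new LZ4BlockInputStream(rawHdfsIn);
  }
}
```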

Metrics Comparison

  • Because task duration is reduced, we no longer observed task pending.
(screenshot)
  • Push duration is significantly reduced, from up to 2 minutes down to about 30 seconds, while the pull duration (pull + decompression) stays the same.
(screenshot)
  • File size
    Although lz4 may increase the size of the segments stored in HDFS, by roughly 10%-20%, the extra storage cost is acceptable.

  • Task hand-off
    Hand-off time was reduced because tasks spend less time compressing the data.

(screenshot)
  • CPU resources
    During the push phase of ingestion tasks, lz4 does not consume more CPU than zip.
(screenshot)

For Historicals, we did not observe any increase in CPU usage in a cluster where lz4-formatted segments are frequently downloaded.

New Configuration

  • druid.storage.compressionFormat is added; if not set, it defaults to zip to keep the current behaviour (see the configuration sketch below).
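
A minimal configuration sketch, assuming lz4 is the accepted value for the new property (only the property name and the zip default are stated above; the other lines are the usual HDFS deep storage settings):

```properties
# common.runtime.properties, HDFS deep storage
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

# Added by this PR: compression format for pushed segment archives.
# Defaults to zip when unset; lz4 (assumed value name) uses the fastest level.
druid.storage.compressionFormat=lz4
```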

Metrics

Added the following metrics:

  • hdfs/pull/size
  • hdfs/pull/duration
  • hdfs/push/size
  • hdfs/push/duration

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@clintropolis
Member

quite a lot of the segment already uses lz4 by default, so I'm curious how effective running the whole thing through lz4 again would be. Did you do any experiments to compare with just not 'externally' compressing at all in deep storage?

S3 support for honoring druid.storage.zip was added in #18544, and we've been using it in some of our clusters alongside the virtual storage functionality introduced in #18176, since not having to decompress the segments to load them speeds things up quite a lot. It trades some extra space, but has been worth it for our use case.

Also, the new 'V10' segment format introduced in #18880 was built around some future ideas I have about improving the virtual storage functionality: start by downloading only the V10 metadata, to enable partial downloads of just the parts of the segment that are needed to take part in a query. That will require the segment not be 'externally' compressed in deep storage (columns inside the segment can obviously still be compressed).

Fwiw, I'm not necessarily opposed to making this 'external' segment compression stuff more configurable (but it is certainly a bit tedious with the current interfaces since it needs to be handled by each implementation of segment pusher/puller separately).

@FrankChen021
Member Author

FrankChen021 commented Feb 4, 2026

@clintropolis thanks for the comments.

quite a lot of the segment already uses lz4 by default, so I'm curious how effective running the whole thing through lz4 again would be. Did you do any experiments to compare with just not 'externally' compressing at all in deep storage?

I think you're asking two questions here.
The first is: after lz4 compression of the columns, is there still a gain from compressing again? Here's a table showing the raw segment size and the sizes after zip/lz4 compression:

(screenshot of the size comparison table)

You can see that the decompressed segment is 11 GB, while in HDFS (with the current default zip) it is 2.48 GB.

The other question I think you're asking is whether we did experiments uploading the raw files to deep storage.
No.
First, the table above shows that compressing the whole directory still gives a gain.
Second, HDFS deep storage currently does not support uploading the segment file by file.

As for V10, I do know that it can support partial download, but HDFS as deep storage is different from object storage:
HDFS cannot handle as many files as object storage, nor can it provide the same level of concurrent access, so querying data directly from HDFS is much slower than querying data from object storage. For us, HDFS is the main storage and I don't see a way for us to migrate from HDFS to object storage; we will stay on HDFS for a very long time.

since it needs to be handled by each implementation of segment pusher/puller

No need to worry about that. The compression configuration is only provided for HDFS and is implemented inside the HDFS extension, not at a higher level that would require every deep storage implementation to support it.
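
A hypothetical sketch of what that per-extension handling could look like (the class, enum, and helper names are illustrative, not the PR's actual code):

```java
import net.jpountz.lz4.LZ4BlockOutputStream;

import java.io.File;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical: the compressionFormat switch lives entirely inside the HDFS
// pusher, so other deep-storage implementations keep their existing behaviour.
public class HdfsPushSketch
{
  public enum CompressionFormat { ZIP, LZ4 }

  public long push(File segmentDir, OutputStream rawHdfsOut, CompressionFormat format) throws IOException
  {
    if (format == CompressionFormat.LZ4) {
      // New path: fastest-level lz4 over the archived segment directory.
      try (OutputStream lz4Out = new LZ4BlockOutputStream(rawHdfsOut)) {
        return archiveDirectory(segmentDir, lz4Out);
      }
    }
    // Default path: keep today's zip behaviour.
    return zipDirectory(segmentDir, rawHdfsOut);
  }

  // Hypothetical stand-ins for the real archiving/zipping utilities.
  private long archiveDirectory(File dir, OutputStream out) throws IOException
  {
    return 0; // elided
  }

  private long zipDirectory(File dir, OutputStream out) throws IOException
  {
    return 0; // elided
  }
}
```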

@clintropolis
Member

You can see that the decompressed segment is 11 GB, while in HDFS (with the current default zip) it is 2.48 GB.

interesting, that is a lot bigger of a difference than I expected, though perhaps if there are a lot of complex columns that are not using compression (off by default; the option was added in #16863) then a difference of that size makes sense. Basically, where I'm thinking things are heading is moving away from generic compression in favor of all of the contents of the segment file being compressed.

The other question I think you're asking is whether we did experiments uploading the raw files to deep storage.

Ah, this was mostly just me wondering about uncompressed sizing, though I expect most of the perf stuff would look better without any extra compression/decompression. For your segments at least, it seems like some additional stuff would need to happen to make that viable.

As for V10, I do know that it can support partial download, but HDFS as deep storage is different from object storage:
HDFS cannot handle as many files as object storage, nor can it provide the same level of concurrent access, so querying data directly from HDFS is much slower than querying data from object storage. For us, HDFS is the main storage and I don't see a way for us to migrate from HDFS to object storage; we will stay on HDFS for a very long time.

V10 segments store everything in a single file, druid.segment, so in terms of file count it should be no different than the single .zip (or whatever) there is today with externally compressed V9 segments. It is fair that partial downloads would potentially increase concurrent access, though with smaller fetches, so maybe not quite as bad. None of the partial-download stuff exists yet though, and virtual storage mode is optional.
