Add configurable input buffer size for decompression (--ibuf-size) #4537
Joy-Majumder wants to merge 1 commit into facebook:dev
This feature addresses performance issues when decompressing large files on mechanical hard drives with excessive disk seek operations.

Changes:
- Add 'inputBufferSize' field to FIO_prefs_t in fileio_types.h
- Implement FIO_setInputBufferSize() setter function
- Use configured buffer size in FIO_createDResources()
- Add --ibuf-size command-line option with help text

The option allows users to specify input buffer size (default: 0 for automatic). For large files on slow drives, using --ibuf-size=1024M to --ibuf-size=5120M can significantly improve throughput by reducing disk seek operations.

Backward compatible: defaults to ZSTD_DStreamInSize when not specified.
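The fallback behavior described in the commit message (0 means automatic, otherwise use the configured size) can be sketched as follows. This is a minimal illustration, not the PR's actual diff: `demo_prefs_t`, `demo_setInputBufferSize()` and `demo_resolveInputBufferSize()` are hypothetical stand-ins for `FIO_prefs_t`, `FIO_setInputBufferSize()` and the logic in `FIO_createDResources()`, and the default is passed in as a parameter where the real code would presumably call `ZSTD_DStreamInSize()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of FIO_prefs_t. */
typedef struct {
    size_t inputBufferSize;   /* 0 means "automatic" (use the library default) */
} demo_prefs_t;

/* Hypothetical setter playing the role of FIO_setInputBufferSize(). */
static void demo_setInputBufferSize(demo_prefs_t* prefs, size_t size)
{
    prefs->inputBufferSize = size;
}

/* Choose the input buffer size to allocate:
 * the configured value when non-zero, otherwise the supplied default.
 * In zstd the default would come from ZSTD_DStreamInSize() (assumption
 * about how the PR wires it in). */
static size_t demo_resolveInputBufferSize(const demo_prefs_t* prefs,
                                          size_t defaultSize)
{
    return prefs->inputBufferSize != 0 ? prefs->inputBufferSize : defaultSize;
}
```

Keeping 0 as the "unset" sentinel is what makes the change backward compatible: callers that never touch the new field get the same buffer size as before.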
Hi @Joy-Majumder! Thank you for your pull request and welcome to our community.

Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Description
This PR adds a new command-line option to zstd to allow users to configure the input buffer size during decompression.
Problem
When decompressing large files (1TB+) on mechanical hard drives, reading and writing occur simultaneously, resulting in excessive disk seek operations and reduced throughput.
Solution
This feature enables users to specify a larger input buffer size (1-5GB), allowing sequential disk reads to fill the buffer before decompression begins. This reduces disk seek operations and improves overall throughput.
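To make the effect of buffer size on the read pattern concrete, here is a small sketch that reads a stream to EOF with a caller-chosen buffer and counts how many read requests were needed. `demo_countReads()` is a hypothetical helper, not part of zstd, and it only counts `fread()` calls; the actual number of disk seeks depends on stdio buffering, the OS page cache, and the drive itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `f` from its current position to EOF using a buffer of `bufSize`
 * bytes. Returns the number of fread() calls that delivered data and
 * stores the total byte count in *totalBytes.
 * Returns 0 (with *totalBytes == 0) if the buffer cannot be allocated. */
static size_t demo_countReads(FILE* f, size_t bufSize, size_t* totalBytes)
{
    char* buf = malloc(bufSize);
    size_t calls = 0;
    size_t n;

    *totalBytes = 0;
    if (buf == NULL) return 0;

    while ((n = fread(buf, 1, bufSize, f)) > 0) {
        calls++;               /* one request per filled (or partial) buffer */
        *totalBytes += n;
    }

    free(buf);
    return calls;
}
```

For a 1000-byte input, a 100-byte buffer needs 10 requests while a 4096-byte buffer needs one. The same proportion is the intuition behind multi-gigabyte buffers on a seek-bound drive, though real gains have to be measured.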
Usage
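The commit message gives values such as `--ibuf-size=1024M` and `--ibuf-size=5120M`. The sketch below shows one way such a suffixed size could be turned into a byte count; `demo_parseSize()` is a hypothetical helper and is not the parser zstd's CLI uses. It works in 64 bits because 5120M (5,368,709,120 bytes) does not fit in a 32-bit integer; the `G` suffix is included here only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "123", "64K", "1024M" or "5G" into bytes.
 * Returns 0 on success and stores the result in *out;
 * returns -1 on malformed input or overflow (leaving *out untouched). */
static int demo_parseSize(const char* s, uint64_t* out)
{
    char* end;
    unsigned long long v;
    unsigned shift = 0;

    /* Require a leading digit so that "", "-1" and " 5" are rejected. */
    if (s == NULL || *s < '0' || *s > '9') return -1;

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE) return -1;

    switch (*end) {
        case 'K': shift = 10; end++; break;   /* x 1024 */
        case 'M': shift = 20; end++; break;   /* x 1024^2 */
        case 'G': shift = 30; end++; break;   /* x 1024^3 */
        default: break;
    }
    if (*end != '\0') return -1;              /* trailing garbage */
    if (v > (UINT64_MAX >> shift)) return -1; /* would overflow 64 bits */

    *out = (uint64_t)v << shift;
    return 0;
}
```

A value of 0 parses successfully, which fits the option's stated default of 0 for automatic sizing.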
Changes
Testing
Performance Impact
Expected improvements for the scenario described (12TB file on mechanical drive):
Actual improvements depend on disk characteristics, available RAM, compression ratio, and CPU speed.