compress-utils aims to simplify data compression by offering a unified interface for various algorithms and languages, while maintaining best-in-class performance.
- 6 built-in data compression algorithms
- 3 languages supported
- Standardized API across all algorithms & languages
- Portable & cross-platform (Linux, macOS, Windows)
- Prebuilt binaries available on major package managers or can be built from source
- Native compression & decompression performance
- Lightweight binary (30 kB with a single algorithm, 4 MB with all)
| Algorithm | Description | Benchmarks |
|---|---|---|
| brotli | General-purpose algorithm with high-to-very-high compression ratios | Benchmarks |
| bzip2 | Very-high compression ratio algorithm | Benchmarks |
| lz4 | Very-high speed compression algorithm | Benchmarks |
| zlib | General-purpose, widely-used algorithm (compatible with gzip) | Benchmarks |
| zstd | High-speed, high-ratio compression algorithm | Benchmarks |
| xz/lzma | Very-high compression ratio algorithm | Benchmarks |
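As a quick illustration of swapping between these algorithms through one interface, here is a minimal sketch using the functional Python API shown later in this README. The algorithm identifier strings are assumed to match the names in the table above; the exact strings the library accepts may differ, so check the Python API docs.

```python
from compress_utils import compress

# Repetitive sample payload - compresses well under any algorithm
data = b"example payload " * 1024

# Compare compressed sizes across several of the bundled algorithms.
# NOTE: these identifiers are assumptions based on the table above;
# consult the Python API docs for the exact names accepted.
for algorithm in ["brotli", "bzip2", "lz4", "zlib", "zstd"]:
    compressed = compress(data, algorithm)
    print(f"{algorithm}: {len(data)} -> {len(compressed)} bytes")
```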
| Language | Package | Code Examples & Docs |
|---|---|---|
| C++ | TBD | C++ API |
| C | TBD | C API |
| Python | compress-utils | Python API |
This project aims to provide a unified interface across all algorithms & languages (within reason). To make this possible across all targeted languages, the compress-utils API is offered in three flavors:
- Object-Oriented (OOP) - For one-shot compression/decompression
- Functional - For one-shot compression/decompression
- Streaming - For processing data in chunks (large files, network streams, etc.)
All three flavors are designed to be dead simple. Here's an OOP example in Python:
```python
from compress_utils import compressor

# Create a 'zstd' compressor object
comp = compressor('zstd')

# Compress data
compressed_data = comp.compress(data)

# Compress data with a compression level (1-10)
compressed_data = comp.compress(data, 5)

# Decompress data
decompressed_data = comp.decompress(compressed_data)
```

Functional usage is similarly simple:
```python
from compress_utils import compress, decompress

# Compress data using 'zstd'
compressed_data = compress(data, 'zstd')

# Compress data with a compression level (1-10)
compressed_data = compress(data, 'zstd', 5)

# Decompress data
decompressed_data = decompress(compressed_data, 'zstd')
```

For processing large data in chunks or when data arrives incrementally (e.g., from network streams, large files, or real-time data), use the streaming API:
```python
from compress_utils import CompressStream, DecompressStream

# Create a compression stream
stream = CompressStream('zstd', level=3)

# Process data in chunks
compressed_chunks = []
for chunk in data_chunks:
    compressed_chunks.append(stream.compress(chunk))

# Finalize compression (important!)
compressed_chunks.append(stream.finish())

# Decompression works similarly
decompress_stream = DecompressStream('zstd')
decompressed_chunks = []
for chunk in compressed_chunks:
    decompressed_chunks.append(decompress_stream.decompress(chunk))
decompressed_chunks.append(decompress_stream.finish())
```

Benefits of streaming (see the file-compression sketch after this list):
- Process files that don't fit in memory
- Handle network data that arrives in chunks
- Real-time compression/decompression
- Reduced memory footprint for large datasets
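As an example of the first point, here is a minimal sketch of compressing a file that may not fit in memory, using the streaming API shown above. The file names, output extension, and chunk size are illustrative assumptions rather than anything prescribed by the library.

```python
from compress_utils import CompressStream

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (illustrative choice)

# Hypothetical file paths - substitute your own input/output
stream = CompressStream('zstd', level=3)
with open('large_input.bin', 'rb') as src, open('large_input.bin.zst', 'wb') as dst:
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        dst.write(stream.compress(chunk))
    # Flush any data still buffered inside the compressor
    dst.write(stream.finish())
```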
Language-specific code examples & docs are linked in the table above.
To install the Python package:

```sh
pip install compress-utils
```

To build from source instead:

- Install pre-requisites
  - CMake
  - Conda (if building Python binding)
- Clone repo

```sh
git clone https://github.com/dupontcyborg/compress-utils.git
cd compress-utils
```

- Activate Conda environment (if building Python binding)
```sh
# If using Conda
conda env create -f environment.yml
conda activate compress-utils

# If using Mamba
mamba env create -f environment.yml
mamba activate compress-utils
```

- Run build script
For Linux/macOS:

```sh
build.sh
```

For Windows:

```powershell
powershell.exe -file build.ps1
```

The built library/libraries will be in `dist/<language>`.
A number of configuration parameters are available for `build.sh`:

- `--clean` - performs a clean rebuild of `compress-utils`
- `--algorithms=` - set which algorithms to include in the build, if not all (e.g., `build.sh --algorithms=brotli,zlib,zstd`)
- `--languages=` - set which language bindings to build, if not all (e.g., `build.sh --languages=python,js`)
- `--release` - build release version (higher optimization level)
- `--skip-tests` - skip building & running unit tests
To be added
This project is distributed under the MIT License. Read more >
This project utilizes several open-source compression algorithms. Read more >