Multi-algorithm compression & decompression library for C, C++ & Python

compress-utils

compress-utils aims to simplify data compression by offering a unified interface for various algorithms and languages, while maintaining best-in-class performance.

Features

Built-in Compression Algorithms

| Algorithm | Description | Benchmarks |
|-----------|-------------|------------|
| brotli | General-purpose with high-to-very-high compression rates | Benchmarks |
| bzip2 | Very-high compression ratio algorithm | Benchmarks |
| lz4 | Very-high speed compression algorithm | Benchmarks |
| zlib | General-purpose, widely-used (compatible with gzip) | Benchmarks |
| zstd | High-speed, high-ratio compression algorithm | Benchmarks |
| xz/lzma | Very-high compression ratio algorithm | Benchmarks |
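
The ratio-versus-speed trade-off described in the table can be observed with a small measurement script. The sketch below does not use compress-utils; it relies on the three listed algorithms that also ship in Python's standard library (zlib, bzip2, xz/lzma), and the `measure` helper and the sample payload are hypothetical. Results vary with the input data and the machine.

```python
import bz2
import lzma
import time
import zlib

# Standard-library stand-ins for three of the algorithms in the table.
CODECS = {
    "zlib": zlib.compress,
    "bzip2": bz2.compress,
    "xz/lzma": lzma.compress,
}


def measure(data):
    """Return {name: (compression_ratio, seconds)} for each codec."""
    results = {}
    for name, compress_fn in CODECS.items():
        start = time.perf_counter()
        compressed = compress_fn(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(data) / len(compressed), elapsed)
    return results


# Highly repetitive sample text; real-world data compresses far less.
sample = b"The quick brown fox jumps over the lazy dog. " * 2000

for name, (ratio, seconds) in measure(sample).items():
    print(f"{name:8s} ratio={ratio:8.1f}x  time={seconds * 1000:.2f} ms")
```

Running the same script over a representative sample of your own data is usually a better guide to choosing an algorithm than any generic description.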

Supported Languages

| Language | Package | Code Examples & Docs |
|----------|---------|----------------------|
| C++ | TBD | C++ API |
| C | TBD | C API |
| Python | compress-utils | Python API |

Usage

This project aims to bring a unified interface across all algorithms & all languages (within reason). To make this possible across all targeted languages, the compress-utils API is made available in three flavors:

  • Object-Oriented (OOP) - For one-shot compression/decompression through a reusable compressor object
  • Functional - For one-shot compression/decompression through standalone functions
  • Streaming - For processing data in chunks (large files, network streams, etc.)

One-Shot Compression

Both one-shot APIs (OOP and functional) are designed to be simple. Here's an OOP example in Python:

from compress_utils import compressor

# Example input (bytes)
data = b'hello world' * 100

# Create a 'zstd' compressor object
comp = compressor('zstd')

# Compress data
compressed_data = comp.compress(data)

# Compress data with a compression level (1-10)
compressed_data = comp.compress(data, 5)

# Decompress data
decompressed_data = comp.decompress(compressed_data)

Functional usage is similarly simple:

from compress_utils import compress, decompress

# Example input (bytes)
data = b'hello world' * 100

# Compress data using `zstd`
compressed_data = compress(data, 'zstd')

# Compress data with a compression level (1-10)
compressed_data = compress(data, 'zstd', 5)

# Decompress data
decompressed_data = decompress(compressed_data, 'zstd')
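
To make the idea of one interface over several algorithms concrete, the sketch below mimics the functional style using only Python's standard library. It is not how compress-utils is implemented: `unified_compress`, `unified_decompress`, `scale_level`, and the linear mapping from the 1-10 level onto each codec's native range are all hypothetical and chosen for illustration.

```python
import bz2
import lzma
import zlib

# Native (min, max) level ranges of the stdlib codecs used as stand-ins.
_NATIVE_LEVELS = {"zlib": (1, 9), "bz2": (1, 9), "xz": (0, 9)}


def scale_level(level, algorithm):
    """Map a unified 1-10 level linearly onto a codec's native range."""
    if algorithm not in _NATIVE_LEVELS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    if not 1 <= level <= 10:
        raise ValueError("level must be between 1 and 10")
    low, high = _NATIVE_LEVELS[algorithm]
    return low + round((level - 1) * (high - low) / 9)


def unified_compress(data, algorithm, level=5):
    """Compress bytes with the named algorithm at a unified 1-10 level."""
    native = scale_level(level, algorithm)
    if algorithm == "zlib":
        return zlib.compress(data, native)
    if algorithm == "bz2":
        return bz2.compress(data, native)
    return lzma.compress(data, preset=native)  # algorithm == "xz"


def unified_decompress(data, algorithm):
    """Decompress bytes produced by unified_compress."""
    if algorithm == "zlib":
        return zlib.decompress(data)
    if algorithm == "bz2":
        return bz2.decompress(data)
    if algorithm == "xz":
        return lzma.decompress(data)
    raise ValueError(f"unsupported algorithm: {algorithm!r}")


payload = b"compress me " * 500
for name in ("zlib", "bz2", "xz"):
    packed = unified_compress(payload, name, level=7)
    assert unified_decompress(packed, name) == payload
```

The caller names an algorithm and a level on a single scale, and the dispatch layer hides each codec's own parameter conventions.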

Streaming API

For processing large data in chunks or when data arrives incrementally (e.g., from network streams, large files, or real-time data), use the streaming API:

from compress_utils import CompressStream, DecompressStream

# Create a compression stream
stream = CompressStream('zstd', level=3)

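# Process data in chunks
# (data_chunks is any iterable of bytes, e.g. pieces read from a file or socket)
compressed_chunks = []
for chunk in data_chunks:
    compressed_chunks.append(stream.compress(chunk))

# Finalize compression (important: emits any data still buffered by the stream)
compressed_chunks.append(stream.finish())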

# Decompression works similarly
decompress_stream = DecompressStream('zstd')
decompressed_chunks = []
for chunk in compressed_chunks:
    decompressed_chunks.append(decompress_stream.decompress(chunk))
decompressed_chunks.append(decompress_stream.finish())

Benefits of streaming:

  • Process files that don't fit in memory
  • Handle network data that arrives in chunks
  • Real-time compression/decompression
  • Reduced memory footprint for large datasets
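
The chunked pattern behind these benefits can be tried without installing anything. The sketch below uses the standard library's `zlib.compressobj` and `zlib.decompressobj` as stand-ins for `CompressStream` and `DecompressStream`; the helper names `iter_chunks`, `compress_stream`, and `decompress_stream` are hypothetical. Only one chunk is held in memory at a time, and the final `flush()` plays the role that `finish()` plays in the example above.

```python
import zlib


def iter_chunks(data, size):
    """Yield successive slices of `data`, each at most `size` bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def compress_stream(chunks, level=6):
    """Compress an iterable of byte chunks, yielding output incrementally."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the codec may buffer input and emit nothing yet
            yield out
    # Without this final flush the stream would be truncated.
    yield compressor.flush()


def decompress_stream(chunks):
    """Decompress an iterable of compressed chunks incrementally."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


payload = bytes(range(256)) * 4096  # 1 MiB of sample data
compressed = list(compress_stream(iter_chunks(payload, 64 * 1024)))
restored = b"".join(decompress_stream(compressed))
assert restored == payload
```

Because both helpers are generators, they can be chained directly onto a file or socket reader so that the full input and output never need to exist in memory together.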

Language-Specific Examples

You can find language-specific code examples below:

Setup

Install From Package Manager

Python

pip install compress-utils

Build From Source

  1. Install pre-requisites
     • CMake
     • Conda (if building Python binding)
  2. Clone repo
     git clone https://github.com/dupontcyborg/compress-utils.git
     cd compress-utils
  3. Activate Conda environment (if building Python binding)
     # If using Conda
     conda env create -f environment.yml
     conda activate compress-utils

     # If using Mamba
     mamba env create -f environment.yml
     mamba activate compress-utils
  4. Run build script

For Linux/macOS:

build.sh

For Windows:

powershell.exe -file build.ps1

The built libraries are placed in dist/<language>.

A number of configuration parameters are available for build.sh:

  • --clean - performs a clean rebuild of compress-utils
  • --algorithms= - set which algorithms to include in the build, if not all (e.g., build.sh --algorithms=brotli,zlib,zstd)
  • --languages= - set which language bindings to build, if not all (e.g., build.sh --languages=python,js)
  • --release - build release version (higher optimization level)
  • --skip-tests - skip building & running unit tests

Benchmarks

To be added

License

This project is distributed under the MIT License. Read more >

Third-Party Code

This project utilizes several open-source compression algorithms. Read more >
