# raypy

A hybrid Rust + Python library that automatically parallelizes Python functions across CPU cores using Rayon.

Raypy lets you write pure Python code, decorate it with `@boost`, and have it execute in parallel via Rust and Rayon — no changes to your function implementation needed.
```python
from raypy import boost

@boost
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

# Run fib(30) across 1000 inputs in parallel
results = fib([30] * 1000)
print(results)
```

## Features

- Zero-copy parallelization: Uses Rust + Rayon for CPU-bound workloads
- GIL management: Properly releases and reacquires Python's Global Interpreter Lock
- Simple decorator: Just add `@boost` to enable parallelization
- Automatic fallback: Falls back to Python execution if Rust is unavailable
- Optimized release builds: LTO and single codegen unit for maximum performance
## Architecture

### Rust core (`lib.rs`)

`run_parallel(py_func, inputs)`: the core function, which:

- Takes a Python callable and a list of integers
- Releases the GIL while running Rayon threads
- Re-acquires GIL in each thread to call the Python function
- Returns list of results
- Compiled as a PyO3 extension module
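As a rough pure-Python reference for these semantics (a sketch, not the actual Rust implementation), `run_parallel` behaves like a thread-pooled map over the inputs. The caveat: in pure Python the GIL serializes the calls, so only the Rust version gets real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_reference(py_func, inputs):
    """Pure-Python stand-in for the Rust run_parallel (illustrative only).

    Maps py_func over inputs on a thread pool. Unlike the Rust/Rayon
    version, the GIL serializes the Python calls here; the result values
    and ordering, however, match what run_parallel returns.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(py_func, inputs))
```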
### Python wrapper (`raypy.py`)

The `@boost` decorator wraps any Python function to:

- Intercept list inputs
- Call the Rust `run_parallel` for parallel execution
- Fall back to sequential Python if needed
- Support both single integers and lists of integers
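The dispatch-and-fallback logic above can be sketched as follows (a sketch only — it assumes the Rust extension is importable as `raypy._native`, which may not be the actual module path):

```python
import functools

def boost_sketch(func):
    """Sketch of the @boost dispatch logic (hypothetical, for illustration).

    Lists are routed to the Rust extension when it is available;
    otherwise, and for single values, plain Python execution is used.
    """
    try:
        from raypy._native import run_parallel  # assumed module path
    except ImportError:
        run_parallel = None  # Rust extension unavailable

    @functools.wraps(func)
    def wrapper(arg):
        if isinstance(arg, list):
            if run_parallel is not None:
                return run_parallel(func, arg)    # parallel Rust path
            return [func(n) for n in arg]         # sequential fallback
        return func(arg)                          # single value: plain call
    return wrapper
```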
## Requirements

- Rust (with `cargo`)
- Python 3.8+
- `maturin` for building wheels
## Installation

1. Clone the repository:

```shell
git clone https://github.com/pro-grammer-SD/raypy.git
cd raypy
```

2. Build the Rust extension:

```shell
# On Windows
set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
maturin build --release

# On Linux/macOS
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
maturin build --release
```

3. Install the wheel:

```shell
pip install target/wheels/raypy*.whl
```

Or develop locally:

```shell
maturin develop
```

## Usage

```python
from raypy import boost

@boost
def square(n):
    return n * n

# Single value (uses Python)
result = square(5)  # Returns 25

# Multiple values (uses Rust + Rayon)
results = square([1, 2, 3, 4, 5])  # Returns [1, 4, 9, 16, 25]
```
## Examples

### Parallel Fibonacci

```python
from raypy import boost

@boost
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

# Run an expensive computation in parallel
nums = [30] * 8
results = fib(nums)
print(results)  # [832040, 832040, 832040, 832040, 832040, 832040, 832040, 832040]
```
### Parallel primality check

```python
from raypy import boost
import math

@boost
def is_prime(n):
    if n < 2:
        return 0
    if n == 2:
        return 1
    if n % 2 == 0:
        return 0
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return 0
    return 1

# Check primality in parallel
numbers = [97, 100, 101, 103, 104, 105, 107]
results = is_prime(numbers)  # [1, 0, 1, 1, 0, 0, 1]
```

## Performance Notes

- Best for CPU-bound functions: Functions that do heavy computation
- Input overhead: Works best with lists of 10+ items (parallelization overhead)
- Function simplicity: Simpler functions show better speedup
- Release mode: Always use `--release` for production builds
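One way to act on the batch-size tip above (a hypothetical helper, not part of raypy): only take the boosted, batched path when the input list is large enough to amortize the parallelization overhead.

```python
def maybe_batch(boosted_func, inputs, threshold=10):
    """Illustrative helper (not part of raypy): dispatch a whole batch to
    a @boost-ed function only when it is big enough to amortize the
    parallelization overhead; otherwise call it per item, sequentially."""
    if len(inputs) >= threshold:
        return boosted_func(inputs)             # one parallel batch
    return [boosted_func(n) for n in inputs]    # small batch: plain calls
```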
## Project Structure

```
raypy/
├── Cargo.toml    # Rust dependencies
├── src/
│   └── lib.rs    # Rust + PyO3 implementation
├── raypy.py      # Python decorator wrapper
└── README.md     # This file
```

## Dependencies

- pyo3 (0.21+): Python bindings
- rayon (1.8+): Data parallelism
- Python 3.8+
## How It Works

1. The user calls a `@boost`-decorated function with a list of integers
2. The decorator calls `run_parallel(func, inputs)` from the Rust module
3. Rust releases the Python GIL with `py.allow_threads()`
4. Rayon spawns threads and maps `py_func` across all inputs in parallel
5. Each thread re-acquires the GIL to call the Python function
6. Results are collected and returned to Python
7. The Python decorator returns the result list
This design allows:
- Full parallelization of Python code
- Proper GIL management (not holding it during parallel work)
- Transparent integration with existing Python code
## Limitations

- The function must accept a single `i32` and return an `i32`
- Works only with lists of integers
- Requires Rust/Cargo toolchain to build
- GIL re-acquisition per thread has overhead for very quick functions
## License

MIT
## Contributing

Contributions welcome! Please ensure:

- Code builds with `maturin build --release`
- Examples work as documented
- Tests pass