A comprehensive toolkit for working with Stable Diffusion VAE models, providing image preprocessing utilities and model loading capabilities.
- 🖼️ Image Processing: Efficient image preprocessing and tensor conversions optimized for VAE models
- 🚀 Model Loading: Easy loading of Stable Diffusion VAE models with automatic device selection
- ⚡ Performance: Built-in caching and optimized transforms for faster processing
- 🔧 Flexible API: Both high-level and low-level APIs for different use cases
- 🛡️ Type Safety: Full type hints for better IDE support and code reliability
- 🔐 Secure: No hardcoded tokens - authentication via environment variables only
Install the base package:

```bash
pip install vae-toolkit
```

For development:

```bash
pip install vae-toolkit[dev]
```

For testing:

```bash
pip install vae-toolkit[test]
```

For all extras:

```bash
pip install vae-toolkit[all]
```

Load and preprocess an image for VAE encoding:

```python
from vae_toolkit import load_and_preprocess_image, tensor_to_pil

# Load and preprocess an image for VAE encoding
tensor, original_pil = load_and_preprocess_image("path/to/image.png", target_size=512)
print(f"Tensor shape: {tensor.shape}")  # [1, 3, 512, 512]
print(f"Value range: [{tensor.min():.2f}, {tensor.max():.2f}]")  # [-1.00, 1.00]

# Convert the tensor back to a PIL image
reconstructed = tensor_to_pil(tensor)
reconstructed.save("reconstructed.png")
```

Load a Stable Diffusion VAE model:

```python
from vae_toolkit import VAELoader

# Initialize the loader
loader = VAELoader()

# Load the Stable Diffusion v1.5 VAE
vae, device = loader.load_sd_vae(
    model_name="sd15",  # or "sd14" for v1.4
    device="auto",      # automatically selects GPU/CPU
)
print(f"Model loaded on: {device}")
```

Encode an image to latent space and decode it back:

```python
import torch

from vae_toolkit import load_and_preprocess_image, VAELoader, tensor_to_pil

# Setup
loader = VAELoader()
vae, device = loader.load_sd_vae("sd14")

# Load and preprocess the image
image_tensor, original = load_and_preprocess_image("input.jpg", target_size=512)
image_tensor = image_tensor.to(device)

# Encode to latent space
with torch.no_grad():
    latent = vae.encode(image_tensor).latent_dist.sample()
print(f"Latent shape: {latent.shape}")  # [1, 4, 64, 64]

# Decode back to an image
with torch.no_grad():
    decoded = vae.decode(latent).sample

# Save the result
output_image = tensor_to_pil(decoded)
output_image.save("output.png")
```

For repeated preprocessing with the same settings, use `ImageProcessor` directly:

```python
from vae_toolkit import ImageProcessor

# Create a processor with custom settings
processor = ImageProcessor(
    target_size=768,
    normalize_mean=(0.5, 0.5, 0.5),
    normalize_std=(0.5, 0.5, 0.5),
)

# Process multiple images with the same settings
for image_path in image_paths:
    tensor, original = processor.load_and_preprocess(image_path)
    # Process tensor...
```

To use models from Hugging Face Hub, set your token as an environment variable:

```bash
export HF_TOKEN="your_huggingface_token"
# or
export HUGGING_FACE_HUB_TOKEN="your_huggingface_token"
```

`load_and_preprocess_image` loads and preprocesses an image for VAE encoding.
Parameters:

- `image_path` (str | Path): Path to the input image
- `target_size` (int): Target size for the square output image

Returns:

- `tuple[torch.Tensor, PIL.Image]`: Preprocessed tensor and the original PIL image

`tensor_to_pil` converts a tensor to PIL Image format.

Parameters:

- `tensor` (torch.Tensor): Input tensor with shape [C, H, W] or [1, C, H, W]

Returns:

- `PIL.Image`: RGB PIL image

A companion helper converts a PIL image to tensor format.

Parameters:

- `pil_image` (PIL.Image): Input PIL image
- `target_size` (int | None): Optional target size for resizing
- `normalize` (bool): Whether to normalize to the [-1, 1] range

Returns:

- `torch.Tensor`: Tensor with shape [3, H, W]
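The [-1, 1] range used throughout is the standard mean/std = 0.5 normalization. A minimal standalone sketch of the arithmetic (using NumPy here, independent of the toolkit; the function names are illustrative):

```python
import numpy as np

def normalize(pixels):
    """Map pixel values from [0, 1] to [-1, 1]: (x - mean) / std with mean = std = 0.5."""
    return (pixels - 0.5) / 0.5

def denormalize(values):
    """Invert the transform, mapping [-1, 1] back to [0, 1]."""
    return values * 0.5 + 0.5

x = np.array([0.0, 0.5, 1.0])
print(normalize(x))               # [-1.  0.  1.]
print(denormalize(normalize(x)))  # [0.  0.5 1. ]
```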
Main class for loading and managing Stable Diffusion VAE models.

Methods:

- `load_sd_vae(model_name="sd14", device="auto", token=None, use_cache=True)`: Loads a Stable Diffusion VAE model. Returns `tuple[AutoencoderKL, torch.device]`.
- `get_optimal_device(preferred_device="auto")`: Determines the best available device. Returns `torch.device`.
- `clear_cache()`: Clears the model cache to free memory.
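The `"auto"` device selection typically resolves in the usual PyTorch preference order. A sketch of that logic (an assumption about how it likely works, not the toolkit's actual code):

```python
import torch

def pick_device(preferred="auto"):
    """Resolve "auto" to the best available backend; pass anything else through."""
    if preferred != "auto":
        return torch.device(preferred)
    if torch.cuda.is_available():  # NVIDIA GPU first
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():  # then Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")  # CPU fallback

print(pick_device("cpu").type)  # cpu
```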
Gets configuration for a specific model.
Lists all available model identifiers.
Adds a custom model configuration.
- `sd14`: Stable Diffusion v1.4 VAE
- `sd15`: Stable Diffusion v1.5 VAE
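The configuration helpers above (get, list, add) amount to a small name-to-config registry. A toy sketch of that shape (all names and fields here are illustrative, not the toolkit's actual schema):

```python
# Illustrative registry; the toolkit's real config fields may differ.
MODEL_CONFIGS = {
    "sd14": {"description": "Stable Diffusion v1.4 VAE"},
    "sd15": {"description": "Stable Diffusion v1.5 VAE"},
}

def get_model_config(name):
    """Look up a model's configuration, failing loudly on unknown names."""
    try:
        return MODEL_CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown model: {name!r}") from None

def list_models():
    """Return all registered model identifiers."""
    return sorted(MODEL_CONFIGS)

def add_model_config(name, config):
    """Register a custom model configuration."""
    MODEL_CONFIGS[name] = config

add_model_config("my-vae", {"description": "Custom fine-tuned VAE"})
print(list_models())  # ['my-vae', 'sd14', 'sd15']
```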
The toolkit includes custom exceptions for better error handling:
```python
from vae_toolkit import ImageProcessingError, load_and_preprocess_image

try:
    tensor, _ = load_and_preprocess_image("invalid_path.jpg")
except ImageProcessingError as e:
    print(f"Failed to process image: {e}")
```

- Use caching: The `VAELoader` caches models by default to avoid reloading
- Batch processing: Process multiple images together when possible
- Device selection: Use `"auto"` for automatic GPU/CPU selection
- Memory management: Call `loader.clear_cache()` when switching between models
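The caching behavior these tips rely on can be pictured with a toy stand-in (purely illustrative, not the `VAELoader` implementation):

```python
class ToyModelCache:
    """Minimal load-once cache mimicking the described behavior."""

    def __init__(self):
        self._cache = {}
        self.expensive_loads = 0  # counts how often we'd actually hit disk/network

    def load(self, name, use_cache=True):
        if use_cache and name in self._cache:
            return self._cache[name]  # cache hit: no reload
        self.expensive_loads += 1
        model = f"weights:{name}"  # stand-in for real VAE weights
        if use_cache:
            self._cache[name] = model
        return model

    def clear_cache(self):
        """Drop cached models to free memory before switching."""
        self._cache.clear()

cache = ToyModelCache()
cache.load("sd14")
cache.load("sd14")            # served from cache
print(cache.expensive_loads)  # 1
cache.clear_cache()           # free memory before switching models
cache.load("sd15")
print(cache.expensive_loads)  # 2
```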
- Python >= 3.8
- PyTorch >= 2.0.0
- torchvision >= 0.15.0
- Pillow >= 9.0.0
- numpy >= 1.20.0
- diffusers >= 0.20.0
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Run tests with pytest:
```bash
# Install test dependencies
pip install vae-toolkit[test]

# Run tests
pytest

# Run with coverage
pytest --cov=vae_toolkit
```

This project is licensed under the MIT License - see the LICENSE file for details.
If you use this toolkit in your research, please cite:
```bibtex
@software{vae-toolkit,
  author = {Yus314},
  title  = {VAE Toolkit: Stable Diffusion VAE utilities},
  year   = {2024},
  url    = {https://github.com/mdipcit/vae-toolkit}
}
```

- Built on top of the amazing diffusers library
- Inspired by the Stable Diffusion community
For issues and questions, please use the GitHub Issues page.