A local, GPU-accelerated tool for creating high-quality training datasets for diffusion models (FLUX.1, SDXL, SD 1.5, etc.).
- Image Upscaling - Spandrel-based upscaling (supports ESRGAN, Real-ESRGAN, SwinIR, DAT, and more) for enhancing low-resolution source images
- Inpainting - Remove watermarks, text, and artifacts with LaMa or Stable Diffusion inpainting
  - Manual rectangle masks, MobileSAM click-to-segment, and watermark preset regions
  - LaMa (fast, automatic), SD 1.5, or SDXL (prompt-guided) backends
- Smart Crop - Face-centric training crops (face_focus, upper_body, full_body) with automatic face detection
- Background Removal - BiRefNet-powered automatic mask generation and transparency
  - Processes individual images or all smart crops in batch
- Auto-Captioning - Multiple model options (see the captioning sketch after this list):
  - Florence-2 (Base/Large) - Fast, detailed captions
  - BLIP (Base/Large) - Lightweight natural language captions
  - JoyCaption - High-quality descriptive captions (BF16 or 8-bit quantized)
  - WD14 Taggers (ONNX) - Booru-style tags via ViT, ConvNext, or SwinV2
- Export & Push to Hub - Export to Kohya_ss, AI-Toolkit, OneTrainer, or HuggingFace formats, and push directly to the HuggingFace Hub
- Non-Destructive Workflow - Separate input/output directories preserve originals
- Local Processing - Runs entirely on your machine, no cloud dependencies
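As referenced in the Auto-Captioning bullet, the sketch below shows roughly what a single caption call looks like with the BLIP base model via `transformers`. It is illustrative only: the image path is hypothetical and this is not the tool's internal API.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load the lightweight BLIP base captioner (see the VRAM table below)
model_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id).to("cuda")

# Caption a single image (hypothetical path)
image = Image.open("my_dataset/0001.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The wizard's Captioning step layers batch processing, prefix/suffix tags, and caption editing on top of model calls like this.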
Screenshots of the Import, Image Tools, Captioning, and Export steps are in `assets/`.
- Python 3.10+
- NVIDIA GPU with CUDA (recommended)
- uv package manager
| Model | VRAM |
|---|---|
| Florence-2 | ~4GB |
| BLIP | ~2-4GB |
| JoyCaption (BF16) | ~17GB (requires 20GB+ GPU) |
| JoyCaption (8-bit) | ~12-16GB (requires 16GB+ GPU) |
| WD14 ONNX | ~2GB |
| BiRefNet | ~4GB |
| Spandrel Upscaler | ~2-4GB |
| MobileSAM | ~1GB |
| LaMa Inpainting | ~2GB |
| SD 1.5 Inpainting | ~6GB |
| SDXL Inpainting | ~10GB |
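To check which models your GPU can hold, you can query total VRAM with PyTorch and compare against the table above (a quick sketch; device index 0 is assumed):

```python
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device detected")

# Report the first GPU's total memory
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB total VRAM")

if total_gb < 16:
    # Per the table above, JoyCaption needs 16GB+ (8-bit) or 20GB+ (BF16)
    print("JoyCaption likely won't fit; Florence-2, BLIP, or WD14 are safer picks")
```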
```bash
# Clone the repository
git clone https://github.com/yourusername/dd-creator.git
cd dd-creator

# Run (uv auto-creates venv and installs dependencies)
uv run app.py
```

Open your browser to http://127.0.0.1:7860
Place upscaler `.pth` or `.safetensors` model files in the `models/` directory. Popular options include ESRGAN, Real-ESRGAN, SwinIR, and DAT models.
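Spandrel detects the architecture from the checkpoint itself, so loading a model dropped into `models/` looks roughly like the sketch below. The filename is hypothetical; the tool's actual upscaling code lives in `src/core/upscaling.py`.

```python
import torch
from spandrel import ImageModelDescriptor, ModelLoader

# Load an ESRGAN-style checkpoint; spandrel infers the architecture automatically
model = ModelLoader().load_from_file("models/4x_example.pth")
assert isinstance(model, ImageModelDescriptor)
model.cuda().eval()

# Upscale a (1, 3, H, W) float tensor with values in [0, 1]
lowres = torch.rand(1, 3, 256, 256, device="cuda")
with torch.no_grad():
    upscaled = model(lowres)
print(model.scale, upscaled.shape)  # e.g. 4 and (1, 3, 1024, 1024) for a 4x model
```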
The wizard guides you through 4 steps:
1. Project Setup - Configure source data (local folder or browser upload) and workspace (new project or continue existing). Scans for existing caption files in both source and output directories so you can pick up where you left off.
2. Image Tools - Per-image editing (resize, upscale, inpaint, smart crop, masks, transparency) or bulk processing with smart resize/upscale routing
3. Captioning - Generate and edit captions with powerful tools:
   - Batch generation with prefix/suffix tags (trigger words, quality tags)
   - Automatic Danbooru rating tag filtering (optional, on by default)
   - Source caption import: existing captions from your source folder appear for review
   - Search/filter images by caption content
   - Hygiene tools: fix formatting, deduplicate tags, undo changes
   - Bulk operations: add/remove tags, search & replace across all captions
   - Validation: ensures all images have saved captions before proceeding
4. Export - Review session stats and export to Kohya_ss, AI-Toolkit, OneTrainer, or HuggingFace formats. Optionally push directly to the HuggingFace Hub (see the sketch below)
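Pushing to the Hub amounts to uploading the exported folder as a dataset repo. A minimal sketch with `huggingface_hub` is shown below; the repo ID and folder path are placeholders, and you need to be logged in via `huggingface-cli login` or an `HF_TOKEN` environment variable.

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN

repo_id = "your-username/my-lora-dataset"  # placeholder
api.create_repo(repo_id, repo_type="dataset", private=True, exist_ok=True)

# Upload the exported images and caption files
api.upload_folder(
    folder_path="exports/my-lora-dataset",  # placeholder path
    repo_id=repo_id,
    repo_type="dataset",
)
```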
```
dd-creator/
├── app.py                    # Gradio application entry point
├── src/
│   ├── core/
│   │   ├── state.py          # Project state management
│   │   ├── captioning.py     # VLM/tagger model wrappers
│   │   ├── segmentation.py   # BiRefNet background removal
│   │   ├── upscaling.py      # Spandrel upscaling
│   │   ├── inpainting.py     # LaMa + SD inpainting backends
│   │   ├── sam_segmenter.py  # MobileSAM click-to-segment
│   │   ├── smart_crop.py     # Face-centric training crops
│   │   └── export.py         # Export formats + HuggingFace Hub push
│   └── ui/
│       ├── wizard.py         # 4-step guided workflow
│       └── dashboard.py      # Advanced tools (WIP)
├── models/                   # User-provided upscaler models
└── assets/                   # README screenshots
```
```bash
# Run with auto-reload (if using gradio dev mode)
uv run gradio app.py
```

License: MIT



