Description
Git commit
Operating System & Version
Arch Linux
GGML backends
HIP
Command-line arguments used
./sd-cli -M img_gen -p "Cinematic, ultra-detailed image of a stylish umbrella girl wearing a bold yellow-and-black themed outfit with racing-inspired accents, standing confidently behind a sleek yellow-and-black NASCAR race car on the track; golden sunlight streaming from a low angle, dramatic high-contrast shadows, glossy reflections on the car’s body, shallow depth of field, dynamic composition, realistic textures, sharp focus, professional motorsport photography style, vibrant colors, 8k realism." --sampling-method euler --steps 9 -W 1024 -H 1024 -b 1 --cfg-scale 1 -s -1 --clip-skip -1 --embd-dir /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/embeddings/ --lora-model-dir /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/loras/ -t 0 --rng cpu --sampler-rng cpu --lora-apply-mode auto -o /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/outputs/txt2img/21.png --diffusion-model /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/unet/z-image-turbo-Q6_K.gguf --vae /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/vae/ae.safetensors --llm /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/text_encoders/Qwen3-4B-Q8_0.gguf --scheduler simple --taesd /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/taesd/taef1-diffusion_pytorch_model.safetensors --offload-to-cpu --fa --color
Steps to reproduce
Build stable-diffusion.cpp with the HIP backend, copy sd-cli and sd-server into the sd.cpp-webui main folder, and use sd.cpp-webui as the frontend to generate an image.
Clone both projects (both updated to the latest commit):
git clone https://github.com/leejet/stable-diffusion.cpp
git clone https://github.com/daniandtheweb/sd.cpp-webui
CMake HIP build commands for stable-diffusion.cpp:
cd stable-diffusion.cpp
git submodule update --init --recursive
mkdir build-rocm
cd build-rocm
cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS="gfx1030" -DAMDGPU_TARGETS="gfx1030" -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON
cmake --build . --config Release -j6
Copy sd-cli and sd-server from stable-diffusion.cpp/build-rocm/bin to the sd.cpp-webui main folder:
cp ./bin/* ../../sd.cpp-webui
Go to the sd.cpp-webui folder, create a Python virtual environment with uv, activate it, and install the packages from requirements.txt:
cd ../../sd.cpp-webui
uv venv --python=3.12 --seed ./venv
source ./venv/bin/activate
uv pip install -r requirements.txt
Run the sd.cpp-webui script and start generating an image:
./sdcpp_webui.sh
What you expected to happen
The image should generate normally (Z-Image-Turbo GGUF Q6_K + Qwen3 4B GGUF Q8_0) on the HIP backend with flash attention (--fa) and CPU offload (--offload-to-cpu) enabled.
What actually happened
No image is generated in the sd.cpp-webui user interface (it reports "Subprocess terminated"). In the terminal, sd-cli aborts with a failed assertion, GGML_ASSERT(max_blocks_per_sm > 0), in ggml-cuda's fattn-common.cuh.
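Since the assert fires in the flash-attention source file (fattn-common.cuh), one way to narrow this down is to rerun the same command with --fa and --offload-to-cpu removed in turn. The sketch below only illustrates how those variants could be generated. The `without_flags` helper and the shortened command string are hypothetical stand-ins, and dropping a flag is a diagnostic step, not a confirmed workaround.

```python
import shlex


def without_flags(command: str, flags: set[str]) -> list[str]:
    """Return the argv of `command` with the given value-less switches removed."""
    return [arg for arg in shlex.split(command) if arg not in flags]


# Shortened stand-in for the full sd-cli command line from this report.
cmd = './sd-cli -M img_gen -p "a prompt" --steps 9 --offload-to-cpu --fa --color'

# Three variants to run one after another; whichever stops asserting
# points at the flag (or flag combination) involved.
variants = {
    "no --fa": without_flags(cmd, {"--fa"}),
    "no --offload-to-cpu": without_flags(cmd, {"--offload-to-cpu"}),
    "neither": without_flags(cmd, {"--fa", "--offload-to-cpu"}),
}

for name, argv in variants.items():
    # shlex.join re-quotes the prompt so the line can be pasted into a shell.
    print(f"{name}: {shlex.join(argv)}")
```

Both switches take no value in the command above, which is why they can be filtered out as single tokens.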
Logs / error messages / stack trace
>$ ./sdcpp_webui.sh
Activating virtual environment...
Requirements are satisfied.
Starting the WebUI...
* Running on local URL: http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
Job submitted! Position in queue: 1.
./sd-cli -M img_gen -p "Cinematic, ultra-detailed image of a stylish umbrella girl wearing a bold yellow-and-black themed outfit with racing-inspired accents, standing confidently behind a sleek yellow-and-black NASCAR race car on the track; golden sunlight streaming from a low angle, dramatic high-contrast shadows, glossy reflections on the car’s body, shallow depth of field, dynamic composition, realistic textures, sharp focus, professional motorsport photography style, vibrant colors, 8k realism." --sampling-method euler --steps 9 -W 1024 -H 1024 -b 1 --cfg-scale 1 -s -1 --clip-skip -1 --embd-dir /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/embeddings/ --lora-model-dir /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/loras/ -t 0 --rng cpu --sampler-rng cpu --lora-apply-mode auto -o /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/outputs/txt2img/21.png --diffusion-model /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/unet/z-image-turbo-Q6_K.gguf --vae /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/vae/ae.safetensors --llm /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/text_encoders/Qwen3-4B-Q8_0.gguf --scheduler simple --taesd /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/taesd/taef1-diffusion_pytorch_model.safetensors --offload-to-cpu --fa --color
[INFO ] ggml_extend.hpp:78 - ggml_cuda_init: found 1 ROCm devices:
[INFO ] ggml_extend.hpp:78 - Device 0: AMD Radeon RX 6600 XT, gfx1030 (0x1030), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:260 - loading diffusion model from '/mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/unet/z-image-turbo-Q6_K.gguf'
[INFO ] model.cpp:370 - load /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/unet/z-image-turbo-Q6_K.gguf using gguf format
[INFO ] stable-diffusion.cpp:307 - loading llm from '/mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/text_encoders/Qwen3-4B-Q8_0.gguf'
[INFO ] model.cpp:370 - load /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/text_encoders/Qwen3-4B-Q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:321 - loading vae from '/mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/vae/ae.safetensors'
[INFO ] model.cpp:373 - load /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/vae/ae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:337 - Version: Z-Image
[INFO ] stable-diffusion.cpp:365 - Weight type stat: f32: 634 | q8_0: 253 | q6_K: 180 | bf16: 28
[INFO ] stable-diffusion.cpp:366 - Conditioner weight type stat: f32: 145 | q8_0: 253
[INFO ] stable-diffusion.cpp:367 - Diffusion model weight type stat: f32: 245 | q6_K: 180 | bf16: 28
[INFO ] stable-diffusion.cpp:368 - VAE weight type stat: f32: 244
[INFO ] stable-diffusion.cpp:721 - Using flash attention
[INFO ] stable-diffusion.cpp:735 - Using flash attention in the diffusion model
|==================================================| 1095/1095 - 769.50it/s
[INFO ] model.cpp:1629 - loading tensors completed, taking 1.42s (process: 0.00s, read: 1.22s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] tae.hpp:570 - loading taesd from '/mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/taesd/taef1-diffusion_pytorch_model.safetensors', decode_only = true
[INFO ] model.cpp:373 - load /mnt/adata-s70-nvme/Downloads/git/sd.cpp-webui/models/taesd/taef1-diffusion_pytorch_model.safetensors using safetensors format
|==================================================| 134/134 - 670.00it/s
[INFO ] model.cpp:1629 - loading tensors completed, taking 0.20s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] tae.hpp:592 - taesd model loaded
[INFO ] stable-diffusion.cpp:876 - total params memory size = 9716.92MB (VRAM 9716.92MB, RAM 0.00MB): text_encoders 4076.43MB(VRAM), diffusion_model 5638.14MB(VRAM), vae 2.35MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:945 - running in FLOW mode
[INFO ] stable-diffusion.cpp:3527 - sampling using Euler method
[INFO ] denoiser.hpp:518 - get_sigmas with Simple scheduler
[INFO ] stable-diffusion.cpp:3654 - TXT2IMG
[INFO ] ggml_extend.hpp:1862 - qwen3 offload params (4076.43 MB, 398 tensors) to runtime backend (ROCm0), taking 0.73s
[INFO ] stable-diffusion.cpp:3271 - get_learned_condition completed, taking 988 ms
[INFO ] stable-diffusion.cpp:3382 - generating image: 1/1 - seed 2104107100
[INFO ] ggml_extend.hpp:1862 - z_image offload params (5638.17 MB, 453 tensors) to runtime backend (ROCm0), taking 0.49s
/mnt/adata-s70-nvme/Downloads/git/stable-diffusion.cpp/ggml/src/ggml-cuda/template-instances/../fattn-common.cuh:919: GGML_ASSERT(max_blocks_per_sm > 0) failed
./sd-cli(+0xa90cc9) [0x651a51f9ccc9]
./sd-cli(+0xa90c95) [0x651a51f9cc95]
./sd-cli(+0x476c9) [0x651a515536c9]
./sd-cli(+0x980615) [0x651a51e8c615]
./sd-cli(+0x9edfbf) [0x651a51ef9fbf]
./sd-cli(+0x36fe4c) [0x651a5187be4c]
./sd-cli(+0x36dd71) [0x651a51879d71]
./sd-cli(+0xaab64c) [0x651a51fb764c]
./sd-cli(+0x16c31b) [0x651a5167831b]
./sd-cli(+0x1ee000) [0x651a516fa000]
./sd-cli(+0x1e897f) [0x651a516f497f]
./sd-cli(+0x265597) [0x651a51771597]
./sd-cli(+0x12234d) [0x651a5162e34d]
./sd-cli(+0x133b1a) [0x651a5163fb1a]
./sd-cli(+0x113acb) [0x651a5161facb]
./sd-cli(+0x1179c6) [0x651a516239c6]
./sd-cli(+0x627ce) [0x651a5156e7ce]
/usr/lib/libc.so.6(+0x27635) [0x7fbfa4627635]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7fbfa46276e9]
./sd-cli(+0x52d45) [0x651a5155ed45]
Subprocess terminated.
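The backtrace above contains only raw offsets into ./sd-cli. A possible way to make it readable is to feed those offsets to binutils' `addr2line`. The sketch below builds such a command from the backtrace text; the `addr2line_argv` helper and the `FRAME` pattern are hypothetical, and only three of the frames are copied in for brevity. It assumes the exact same binary is used. A Release build may resolve to little more than function names or `??`, so rebuilding with -DCMAKE_BUILD_TYPE=RelWithDebInfo would likely be needed for file and line information.

```python
import re

# Matches frames of the form "./sd-cli(+0xa90cc9) [0x651a51f9ccc9]".
FRAME = re.compile(r"^(?P<binary>\S+)\(\+(?P<offset>0x[0-9a-f]+)\) \[0x[0-9a-f]+\]$")


def addr2line_argv(backtrace: str, binary: str = "./sd-cli") -> list[str]:
    """Collect the offsets that belong to `binary` and build an addr2line call."""
    offsets = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        # Frames from other objects (for example libc) are skipped.
        if match and match.group("binary") == binary:
            offsets.append(match.group("offset"))
    # -e: executable, -f: function names, -C: demangle C++, -p: one line per frame
    return ["addr2line", "-e", binary, "-f", "-C", "-p", *offsets]


# First three sd-cli frames and the two libc frames from the log above.
sample = """\
./sd-cli(+0xa90cc9) [0x651a51f9ccc9]
./sd-cli(+0xa90c95) [0x651a51f9cc95]
./sd-cli(+0x476c9) [0x651a515536c9]
/usr/lib/libc.so.6(+0x27635) [0x7fbfa4627635]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7fbfa46276e9]
"""

argv = addr2line_argv(sample)
print(" ".join(argv))
```

The printed command can then be run from the folder that holds the sd-cli binary that produced the crash.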
Additional context / environment details
>$ inxi -bza
System:
Kernel: 6.12.66-1-lts arch: x86_64 bits: 64 compiler: gcc v: 15.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/vmlinuz-linux-lts
root=UUID=a87019d4-bd4a-49da-8d3d-65cfd4eb6683 rw rootfstype=ext4
loglevel=3 quiet
Desktop: GNOME v: 49.2 tk: GTK v: 3.24.51 wm: gnome-shell
tools: gsd-screensaver-proxy dm: GDM v: 49.2 Distro: Arch Linux
Machine:
Type: Desktop Mobo: ASRock model: B550M Pro4 serial: <superuser required>
uuid: <superuser required> Firmware: UEFI vendor: American Megatrends LLC.
v: P3.90 date: 09/30/2025
CPU:
Info: quad core AMD Ryzen 3 3100 [MT MCP] arch: Zen 2 speed (MHz): avg: 3595
min/max: 550/3905
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 23 [Radeon RX 6600/6600
XT/6600M] vendor: Sapphire driver: amdgpu v: kernel arch: RDNA-2
code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
speed: 16 GT/s lanes: 16 ports: active: DP-2 empty: DP-1, DP-3, HDMI-A-1,
Writeback-1 bus-ID: 09:00.0 chip-ID: 1002:73ff class-ID: 0300
Device-2: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-2.1:3
chip-ID: 046d:0825 class-ID: 0102 serial: <filter>
Display: wayland server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9
compositor: gnome-shell driver: gpu: amdgpu resolution: 2560x1440
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.3.3-arch1.3
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6600 XT (radeonsi
navi23 LLVM 21.1.6 DRM 3.61 6.12.66-1-lts) device-ID: 1002:73ff
memory: 7.81 GiB unified: no display-ID: :0.0
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
gpu: amd-smi,amdgpu_top x11: xprop,xrandr
Network:
Device-1: Intel Wi-Fi 6E AX210/AX1675 2x2 [Typhoon Peak] driver: iwlwifi
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 05:00.0
chip-ID: 8086:2725 class-ID: 0280
Device-2: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
vendor: ASRock driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1
port: f000 bus-ID: 06:00.0 chip-ID: 10ec:8168 class-ID: 0200
Drives:
Local Storage: total: 2.55 TiB used: 700.58 GiB (26.9%)
Info:
Memory: total: 64 GiB note: est. available: 62.73 GiB used: 5.59 GiB (8.9%)
Processes: 377 Power: uptime: 16h 23m states: freeze,mem,disk
suspend: deep avail: s2idle wakeups: 3 hibernate: platform avail: shutdown,
reboot, suspend, test_resume image: 25.07 GiB services: gsd-power,upowerd
Init: systemd v: 259 default: graphical tool: systemctl
Packages: 1528 pm: pacman pkgs: 1449 libs: 430 tools: gnome-software,yay
pm: flatpak pkgs: 79 Compilers: clang: 21.1.6 gcc: 15.2.1 Shell: Zsh v: 5.9
running-in: kgx inxi: 3.3.40
>$ rocminfo | grep Name
Name: AMD Ryzen 3 3100 4-Core Processor
Marketing Name: AMD Ryzen 3 3100 4-Core Processor
Vendor Name: CPU
Name: gfx1030
Marketing Name: AMD Radeon RX 6600 XT
Vendor Name: AMD
Name: amdgcn-amd-amdhsa--gfx1030
Name: amdgcn-amd-amdhsa--gfx10-3-generic
>$ rocm-smi
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
==================================================================================================================
0 1 0x73ff, 60385 46.0°C 6.0W N/A, N/A, 0 700Mhz 96Mhz 0% auto 145.0W 13% 1%
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================
>$ amd-smi
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.0+unknown amdgpu version: Linuxver ROCm version: 7.1.1 |
| VBIOS version: 020.003.000.030.000000 |
| Platform: Linux Baremetal |
|-------------------------------------+----------------------------------------|
| BDF GPU-Name | Mem-Uti Temp UEC Power-Usage |
| GPU HIP-ID OAM-ID Partition-Mode | GFX-Uti Fan Mem-Usage |
|=====================================+========================================|
| 0000:09:00.0 AMD Radeon RX 6600 XT | 1 % 45 °C 0 4/145 W |
| 0 0 N/A N/A | 6 % 0.0 % 1064/8176 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes: |
| GPU PID Process Name GTT_MEM VRAM_MEM MEM_USAGE CU % |
|==============================================================================|
| No running processes found |
+------------------------------------------------------------------------------+
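For context on why --offload-to-cpu is in use, the figures reported in the log and in the amd-smi table can be compared directly. This is rough arithmetic only: the numbers are copied from this report, the idle memory usage is assumed to stay unchanged during generation, and compute buffers are not counted, so how stable-diffusion.cpp actually accounts for offloaded weights may differ.

```python
# Parameter sizes in MB, from the "total params memory size" log line.
components_mb = {
    "text_encoders": 4076.43,
    "diffusion_model": 5638.14,
    "vae": 2.35,
}

# From the amd-smi table: "1064/8176 MB" (usage at idle / total VRAM).
vram_total_mb = 8176
vram_in_use_mb = 1064

total_params_mb = sum(components_mb.values())
fits_all_at_once = total_params_mb <= vram_total_mb

# With offloading, only one large component is resident at a time.
largest_name = max(components_mb, key=components_mb.get)
headroom_mb = vram_total_mb - vram_in_use_mb - components_mb[largest_name]

print(f"all weights together: {total_params_mb:.2f} MB, fits: {fits_all_at_once}")
print(f"largest component: {largest_name}, headroom left: {headroom_mb:.2f} MB")
```

Under these assumptions the weights do not fit in VRAM all at once, while each component on its own does, which matches the separate "offload params ... to runtime backend" lines in the log.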
>$ uv pip list
Using Python 3.12.12 environment at: venv
Package Version
------------------ -----------
aiofiles 24.1.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.12.1
brotli 1.2.0
certifi 2026.1.4
charset-normalizer 3.4.4
click 8.3.1
fastapi 0.128.0
ffmpy 1.0.0
filelock 3.20.3
fsspec 2026.1.0
gradio 5.44.1
gradio-client 1.12.1
groovy 0.1.2
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.0
idna 3.11
jinja2 3.1.6
markdown-it-py 4.0.0
markupsafe 3.0.3
mdurl 0.1.2
numpy 2.4.1
orjson 3.11.5
packaging 26.0
pandas 2.3.3
pillow 11.3.0
pip 25.3
pydantic 2.11.10
pydantic-core 2.33.2
pydub 0.25.1
pygments 2.19.2
python-dateutil 2.9.0.post0
python-multipart 0.0.21
pytz 2025.2
pyyaml 6.0.3
requests 2.32.5
rich 14.2.0
ruff 0.14.14
safehttpx 0.1.7
semantic-version 2.10.0
shellingham 1.5.4
six 1.17.0
starlette 0.50.0
tomlkit 0.13.3
tqdm 4.67.1
typer 0.21.1
typing-extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.3
urllib3 2.6.3
uvicorn 0.40.0
websockets 15.0.1
>$ cat ~/.config/environment.d/amd-rocm.conf
# For AMD ROCm and HIP
ROCM_PATH="/opt/rocm"
HIP_PATH="/opt/rocm"
HSA_OVERRIDE_GFX_VERSION=10.3.0
MIOPEN_FIND_MODE=2
#AMD_SERIALIZE_KERNEL=3
# For ComfyUI
COMFYUI_ENABLE_MIOPEN=1
# For Pytorch related
TRITON_ROCM_ARCH=gfx1030
PYTORCH_ROCM_ARCH=gfx1030
#TORCH_USE_HIP_DSA=1
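One detail worth noting from the configuration above: HSA_OVERRIDE_GFX_VERSION=10.3.0 is set, and the log and rocminfo both report the card as gfx1030, although Navi 23 parts such as the RX 6600 XT are commonly identified as gfx1032 without the override. The sketch below shows how the override value relates to the gfx target name. Both helpers are hypothetical, the naming convention (major, then minor and stepping as hex digits) is an assumption stated here rather than something taken from this report, and only part of the file is copied in.

```python
def gfx_target_from_override(version: str) -> str:
    """Map 'major.minor.stepping' to a gfx target name (assumed convention)."""
    major, minor, stepping = (int(part) for part in version.split("."))
    # Minor and stepping are written as single hex digits, e.g. 9.0.10 -> gfx90a.
    return f"gfx{major}{minor:x}{stepping:x}"


def read_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"')
    return env


# Excerpt of ~/.config/environment.d/amd-rocm.conf as shown above.
conf = """\
# For AMD ROCm and HIP
ROCM_PATH="/opt/rocm"
HSA_OVERRIDE_GFX_VERSION=10.3.0
#AMD_SERIALIZE_KERNEL=3
PYTORCH_ROCM_ARCH=gfx1030
"""

env = read_env_file(conf)
override_target = gfx_target_from_override(env["HSA_OVERRIDE_GFX_VERSION"])
print(override_target, override_target == env["PYTORCH_ROCM_ARCH"])
```

The override and the gfx1030 build target (-DGPU_TARGETS="gfx1030") are consistent with each other, so a test run without the override would need a build for the card's native target as well.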