Enables compatibility between diffusers CPU offloading and xFuser parallelism #147

BBuf wants to merge 1 commit into Tencent-Hunyuan:main from
Conversation
Running it produces the following video: sample_video_pr_147.mp4

@feifeibear any suggestions on what is going on? Did you get correct output @BBuf? Thank you! 🙏
You can try turning off CPU offload. If the problem still persists, it indicates an issue with the model itself: it cannot properly generate videos at the 624x832 resolution.
@BBuf thanks for the quick reply, I see. I was just following the "Supported Parallel Configurations" listed here. Without CPU offloading, it produces an OOM error:
@BBuf what parameters did you try (assuming you got a sensible output video)?
I tried the following command on an A800 node:

```shell
torchrun --nproc_per_node=8 sample_video.py --video-size 720 1280 --video-length 129 --infer-steps 30 --prompt "A cute rabbit family eating dinner in their burrow." --use-cpu-offload --flow-reverse --save-path ./results --ring-degree 4 --ulysses-degree 2 --seed 42
```

And the result is normal:
Same thing with |

The previous incompatibility was caused by diffusers not being aware of the local rank in distributed environments, so it always assumed it was rank 0. As a result, the model.to(device) call at line 1174 in pipeline_utils.py copied the DiT model from every rank onto rank 0's device, causing OOM. The fix is to pass the device corresponding to the local_rank to pipeline.enable_sequential_cpu_offload. With that change, diffusers' CPU offloading and xFuser parallelization can be used together.
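The fix described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: it assumes torchrun's `LOCAL_RANK` environment variable and one GPU per rank, and `pipeline` stands in for an already-constructed diffusers pipeline object.

```python
import os

# torchrun sets LOCAL_RANK for each spawned process; default to 0 for
# single-process runs. Assumption: one GPU per rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = f"cuda:{local_rank}"

# The fix, sketched: pass the per-rank device explicitly instead of letting
# enable_sequential_cpu_offload fall back to cuda:0 on every rank.
# pipeline.enable_sequential_cpu_offload(device=device)
print(device)
```

Without the explicit `device` argument, every rank offloads to and reloads from rank 0's GPU, which is what triggered the OOM.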