[core] make flux hidden states contiguous #13068
base: main
Conversation
Pull request overview
This PR addresses performance issues with NVFP4 quantization by making hidden states contiguous after the split operation in the Flux transformer attention processor. Ensuring the tensors have a contiguous memory layout before they are passed to the linear layers enables significant speed improvements when using NVFP4 quantization.
Changes:
- Added .contiguous() calls to hidden_states and encoder_hidden_states after the split_with_sizes operation in FluxAttnProcessor
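To illustrate why the change described above helps, here is a minimal, self-contained sketch (toy shapes and variable names, not the actual Flux code): chunks returned by split_with_sizes along dim=1 are views into the parent tensor and are non-contiguous whenever the batch dimension is greater than 1, which is exactly the layout that slows down quantized linear kernels such as the NVFP4 path.

```python
import torch

# Toy illustration (shapes are made up, not the real Flux dims):
# split_with_sizes returns views that share the parent's storage.
x = torch.randn(2, 6, 8)  # (batch, sequence, channels)
encoder_part, image_part = x.split_with_sizes([2, 4], dim=1)

# Both chunks keep the parent's strides, so neither is contiguous when batch > 1.
print(encoder_part.is_contiguous(), image_part.is_contiguous())  # False False

# The fix in spirit: materialize dense buffers before the
# (possibly NVFP4-quantized) output projections see them.
encoder_part = encoder_part.contiguous()
image_part = image_part.contiguous()
print(encoder_part.is_contiguous(), image_part.is_contiguous())  # True True
```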
  hidden_states = attn.to_out[0](hidden_states.contiguous())
  hidden_states = attn.to_out[1](hidden_states)
- encoder_hidden_states = attn.to_add_out(encoder_hidden_states)
+ encoder_hidden_states = attn.to_add_out(encoder_hidden_states.contiguous())
Copilot AI commented on Feb 3, 2026:
The same contiguous() fix should be applied to the FluxIPAdapterAttnProcessor class. Lines 240-242 have an identical split_with_sizes pattern but are missing the contiguous() calls. For consistency and to ensure NVFP4 quantization benefits are available across all attention processors, please add contiguous() calls at lines 240 and 242 similar to these changes.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu left a comment:
thanks!
What does this PR do?
Fixes pytorch/ao#3783.
NVFP4 offers nice speed benefits, which this PR unlocks:
More results are in https://gist.github.com/sayakpaul/6e6883db921149a87d35cfde4b4dd5d8.
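As a rough usage sketch (not taken from this PR), NVFP4 quantization could be applied to the Flux transformer with torchao roughly as follows; the NVFP4InferenceConfig import path and the pipeline settings are assumptions that may differ across torchao/diffusers versions, so see the gist above and pytorch/ao#3783 for the actual setup behind the reported numbers.

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_
# Assumed import path for the NVFP4 config; check your torchao version.
from torchao.prototype.mx_formats import NVFP4InferenceConfig

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Quantize the transformer's linear layers to NVFP4 in place.
quantize_(pipe.transformer, NVFP4InferenceConfig())

image = pipe("a cat holding a sign", num_inference_steps=28).images[0]
image.save("flux_nvfp4.png")
```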