@sayakpaul
Member

What does this PR do?

Fixes pytorch/ao#3783.

NVFP4 has nice speed benefits, which this PR allows:

[two images: results]

More results are in https://gist.github.com/sayakpaul/6e6883db921149a87d35cfde4b4dd5d8.

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses performance issues with NVFP4 quantization by making the hidden states contiguous after the split operation in the Flux transformer attention processor. Ensuring the tensors have a contiguous memory layout before they are passed to the linear layers enables significant speed improvements with NVFP4 quantization.

Changes:

  • Added .contiguous() calls on hidden_states and encoder_hidden_states after the split_with_sizes operation in FluxAttnProcessor
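
To illustrate why a split can leave tensors non-contiguous, here is a minimal pure-Python sketch of row-major stride arithmetic. It is a simplified model, not PyTorch's implementation, and it assumes the split runs along the sequence dimension (dim=1) of a (batch, sequence, features) tensor; the helper names and the token counts are hypothetical.

```python
def row_major_strides(shape):
    """Strides (in elements) of a freshly allocated row-major tensor."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if (shape, strides) describes a dense row-major block.

    Dimensions of size 1 are skipped, since their stride is never used.
    """
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


def split_dim1(shape, sizes):
    """Model a split along dim=1 as views: new shapes, parent strides."""
    parent = row_major_strides(shape)
    batch, _, dim = shape
    return [((batch, n, dim), parent) for n in sizes]


# Hypothetical sizes: 512 text tokens + 4096 image tokens, width 3072.
for batch in (1, 2):
    views = split_dim1((batch, 512 + 4096, 3072), [512, 4096])
    print(batch, [is_contiguous(s, st) for s, st in views])
# prints:
# 1 [True, True]
# 2 [False, False]
```

Under this model, the split views stay dense only when the batch dimension has size 1. With a larger batch, each view skips over the other chunk's rows between batch entries, so a copy (what .contiguous() performs) is needed to obtain a dense layout.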

Comment on lines +133 to +135
+ hidden_states = attn.to_out[0](hidden_states.contiguous())
  hidden_states = attn.to_out[1](hidden_states)
- encoder_hidden_states = attn.to_add_out(encoder_hidden_states)
+ encoder_hidden_states = attn.to_add_out(encoder_hidden_states.contiguous())

Copilot AI Feb 3, 2026

The same contiguous() fix should be applied to the FluxIPAdapterAttnProcessor class. Lines 240-242 have an identical split_with_sizes pattern but are missing the contiguous() calls. For consistency and to ensure NVFP4 quantization benefits are available across all attention processors, please add contiguous() calls at lines 240 and 242 similar to these changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul requested review from dg845 and yiyixuxu on February 3, 2026 03:50
Collaborator

@yiyixuxu left a comment

thanks!

Development

Successfully merging this pull request may close these issues.

NVFP4 seems to fail when batch size is greater than 1