
NVFP4 seems to fail when batch size is greater than 1 #3783

@sayakpaul


Error:

NotImplementedError: NVFP4Tensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.expand', overload='default')>, types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>,), arg_types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>, <class 'list'>), kwarg_types={}

Code:

from diffusers import FluxPipeline
import torch

from torchao.quantization import quantize_
from torchao.prototype.mx_formats.inference_workflow import (
    NVFP4DynamicActivationNVFP4WeightConfig,
    NVFP4WeightOnlyConfig,
)

config = NVFP4WeightOnlyConfig(
    use_dynamic_per_tensor_scale=True,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

quantize_(pipe.transformer, config=config)
pipe.transformer.compile_repeated_blocks(fullgraph=True)

_ = pipe("a dog", num_images_per_prompt=4)

The same error occurs with NVFP4DynamicActivationNVFP4WeightConfig.

I am using PyTorch 2.10.0 with a nightly build of TorchAO, on a B200 with CUDA 12.9.
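A possible stopgap until `aten.expand` is handled for `NVFP4Tensor`: generate the images one call at a time instead of passing `num_images_per_prompt=4`. This is only a sketch. It assumes that a batch size of 1 works (which the title implies but which is not verified here), and `generate_sequentially` is a hypothetical helper, not part of diffusers or TorchAO.

```python
def generate_sequentially(pipe, prompt, num_images, **kwargs):
    """Call the pipeline once per image and collect the results.

    Hypothetical helper: the intent is to keep each call at one image per
    prompt so the batched code path that triggers the error is avoided.
    """
    images = []
    for _ in range(num_images):
        # One image per call; extra kwargs (e.g. generator) are passed through.
        out = pipe(prompt, num_images_per_prompt=1, **kwargs)
        images.extend(out.images)
    return images
```

With the pipeline from the snippet above, this would be called as `images = generate_sequentially(pipe, "a dog", 4)`. It gives up the throughput benefit of batching, so it is a workaround rather than a fix.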
