
Run OpenFold on Blackwell-architecture GPU #562

@destinypikachu

Description

I installed OpenFold on an RTX 5090 D GPU and got the following errors when executing the run_unit_test script:

[2026-01-04 20:16:46,721] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/cuda/__init__.py:235: UserWarning: 
NVIDIA GeForce RTX 5090 D with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5090 D GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, weight, bias=None):
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
sss.................EEEEEEs.....s/home/destinypikachu/projects/openfold/openfold/utils/precision_utils.py:72: DeprecationWarning: torch.get_autocast_gpu_dtype() is deprecated. Please use torch.get_autocast_dtype('cuda') instead. (Triggered internally at /opt/conda/conda-bld/pytorch_1729647382455/work/torch/csrc/autograd/init.cpp:787.)
  fp16_enabled = torch.get_autocast_gpu_dtype() == torch.float16
..Es.sss.ss.E.EEsssssssss.sss....ssssssEEs.s.s.ss.s....E.s.s..ss...ss.sEEsEE...s........
======================================================================
ERROR: test_compare_evoformer_bf16 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with BF16 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 224, in test_compare_evoformer_bf16
    self.compare_evoformer(dtype=torch.bfloat16, eps=4e-2)
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 176, in compare_evoformer
    "msa": torch.rand(n_seq, n_res, consts.c_m, device='cuda', dtype=dtype),
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_compare_evoformer_fp32 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with FP32 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 228, in test_compare_evoformer_fp32
    self.compare_evoformer(dtype=torch.float32, eps=2e-2)
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 176, in compare_evoformer
    "msa": torch.rand(n_seq, n_res, consts.c_m, device='cuda', dtype=dtype),
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_compare_model (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run full model with and without using DeepSpeed Evoformer attention kernel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 303, in test_compare_model
    batch["aatype"] = batch["aatype"].long()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_compare_template_stack (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare Template Stack output with and without using DeepSpeed Evoformer attention kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 253, in test_compare_template_stack
    model = compare_utils.get_global_pretrained_openfold()
  File "/home/destinypikachu/projects/openfold/tests/compare_utils.py", line 82, in get_global_pretrained_openfold
    raise FileNotFoundError(
FileNotFoundError: Cannot load pretrained parameters. Make sure to run the 
                installation script before running tests.

======================================================================
ERROR: test_ds_kernel_vs_attention_backward (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare backward pass for regular attention vs. DeepSpeed Evoformer kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 95, in test_ds_kernel_vs_attention_backward
    q, kv, mask, biases = random_attention_inputs(batch_size=batch_size,
  File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
    mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_ds_kernel_vs_attention_forward (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare regular attention vs. DeepSpeed Evoformer kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 79, in test_ds_kernel_vs_attention_forward
    self.compare_attention_types(use_flash=False)
  File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 49, in compare_attention_types
    q, kv, mask, biases = random_attention_inputs(batch_size=batch_size,
  File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
    mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_shape (tests.test_evoformer.TestExtraMSAStack)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_evoformer.py", line 266, in test_shape
    m = torch.rand((batch_size, s_t, n_res, c_m), device="cuda")
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_import_jax_weights_ (tests.test_import_weights.TestImportWeights)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_import_weights.py", line 36, in test_import_jax_weights_
    import_jax_weights_(
  File "/home/destinypikachu/projects/openfold/openfold/utils/import_weights.py", line 650, in import_jax_weights_
    data = np.load(npz_path)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 451, in load
    fid = stack.enter_context(open(os.fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/destinypikachu/projects/openfold/tests/../openfold/resources/params/params_model_1_ptm.npz'

======================================================================
ERROR: test_attention_core_backward (tests.test_kernels.TestAttentionCore)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_kernels.py", line 47, in test_attention_core_backward
    mask_bias = (1e9 * mask - 1)[..., None, None, :].to(dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_attention_core_forward (tests.test_kernels.TestAttentionCore)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_kernels.py", line 23, in test_attention_core_forward
    mask_bias = (1e9 * mask - 1)[..., None, None, :].to(dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_dry_run (tests.test_model.TestModel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_model.py", line 103, in test_dry_run
    out = model(batch)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 581, in forward
    outputs, m_1_prev, z_prev, x_prev, early_stop = self.iteration(
  File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 237, in iteration
    pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_dry_run_seqemb_mode (tests.test_model.TestModel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_model.py", line 143, in test_dry_run_seqemb_mode
    out = model(batch)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 581, in forward
    outputs, m_1_prev, z_prev, x_prev, early_stop = self.iteration(
  File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 237, in iteration
    pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_lma_vs_attention (tests.test_primitives.TestLMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_primitives.py", line 31, in test_lma_vs_attention
    q, kv, _, biases = random_attention_inputs(batch_size=consts.batch_size,
  File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
    mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


======================================================================
ERROR: test_tri_mul_in_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 158, in test_tri_mul_in_inference
    self._tri_mul_inplace(incoming=True)
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
    out_stock = module(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
    z = self.layer_norm_in(z)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 255, in forward
    out = nn.functional.layer_norm(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
    return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

======================================================================
ERROR: test_tri_mul_in_inference_bf16 (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 161, in test_tri_mul_in_inference_bf16
    self._tri_mul_inplace(incoming=True, dtype=torch.bfloat16)
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
    out_stock = module(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
    z = self.layer_norm_in(z)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 247, in forward
    out = nn.functional.layer_norm(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
    return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

======================================================================
ERROR: test_tri_mul_out_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 152, in test_tri_mul_out_inference
    self._tri_mul_inplace()
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
    out_stock = module(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
    z = self.layer_norm_in(z)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 255, in forward
    out = nn.functional.layer_norm(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
    return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

======================================================================
ERROR: test_tri_mul_out_inference_bf16 (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 155, in test_tri_mul_out_inference_bf16
    self._tri_mul_inplace(dtype=torch.bfloat16)
  File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
    out_stock = module(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
    z = self.layer_norm_in(z)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 247, in forward
    out = nn.functional.layer_norm(
  File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
    return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

----------------------------------------------------------------------
Ran 121 tests in 8.186s

FAILED (errors=17, skipped=44)

Test(s) failed. Make sure you've installed all Python dependencies.

I have tried updating CUDA to 12.8 and PyTorch to 2.9.1, but other errors occurred.
Are there any environment-setup strategies that work for a Blackwell-architecture GPU?
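For reference, here is a minimal diagnostic I ran in the same conda env to confirm the architecture mismatch behind the "no kernel image is available" errors (the commented values are what I expect on this setup, not verbatim output):

import torch

# Print the PyTorch build, the GPU's compute capability, and the CUDA
# architectures this wheel was compiled for. On an RTX 5090 D the capability
# is (12, 0), i.e. sm_120, which is missing from the wheel's arch list,
# matching the sm_50..sm_90 warning printed at startup.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("device capability:", torch.cuda.get_device_capability(0))
print("compiled arch list:", torch.cuda.get_arch_list())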
