I installed OpenFold on an RTX 5090 D GPU and got the following errors when executing the run_unit_test script:
[2026-01-04 20:16:46,721] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/cuda/__init__.py:235: UserWarning:
NVIDIA GeForce RTX 5090 D with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5090 D GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
sss.................EEEEEEs.....s/home/destinypikachu/projects/openfold/openfold/utils/precision_utils.py:72: DeprecationWarning: torch.get_autocast_gpu_dtype() is deprecated. Please use torch.get_autocast_dtype('cuda') instead. (Triggered internally at /opt/conda/conda-bld/pytorch_1729647382455/work/torch/csrc/autograd/init.cpp:787.)
fp16_enabled = torch.get_autocast_gpu_dtype() == torch.float16
..Es.sss.ss.E.EEsssssssss.sss....ssssssEEs.s.s.ss.s....E.s.s..ss...ss.sEEsEE...s........
======================================================================
ERROR: test_compare_evoformer_bf16 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with BF16 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 224, in test_compare_evoformer_bf16
self.compare_evoformer(dtype=torch.bfloat16, eps=4e-2)
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 176, in compare_evoformer
"msa": torch.rand(n_seq, n_res, consts.c_m, device='cuda', dtype=dtype),
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_compare_evoformer_fp32 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with FP32 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 228, in test_compare_evoformer_fp32
self.compare_evoformer(dtype=torch.float32, eps=2e-2)
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 176, in compare_evoformer
"msa": torch.rand(n_seq, n_res, consts.c_m, device='cuda', dtype=dtype),
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_compare_model (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run full model with and without using DeepSpeed Evoformer attention kernel
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 303, in test_compare_model
batch["aatype"] = batch["aatype"].long()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_compare_template_stack (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare Template Stack output with and without using DeepSpeed Evoformer attention kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 253, in test_compare_template_stack
model = compare_utils.get_global_pretrained_openfold()
File "/home/destinypikachu/projects/openfold/tests/compare_utils.py", line 82, in get_global_pretrained_openfold
raise FileNotFoundError(
FileNotFoundError: Cannot load pretrained parameters. Make sure to run the
installation script before running tests.
======================================================================
ERROR: test_ds_kernel_vs_attention_backward (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare backward pass for regular attention vs. DeepSpeed Evoformer kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 95, in test_ds_kernel_vs_attention_backward
q, kv, mask, biases = random_attention_inputs(batch_size=batch_size,
File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_ds_kernel_vs_attention_forward (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare regular attention vs. DeepSpeed Evoformer kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 79, in test_ds_kernel_vs_attention_forward
self.compare_attention_types(use_flash=False)
File "/home/destinypikachu/projects/openfold/tests/test_deepspeed_evo_attention.py", line 49, in compare_attention_types
q, kv, mask, biases = random_attention_inputs(batch_size=batch_size,
File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_shape (tests.test_evoformer.TestExtraMSAStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_evoformer.py", line 266, in test_shape
m = torch.rand((batch_size, s_t, n_res, c_m), device="cuda")
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_import_jax_weights_ (tests.test_import_weights.TestImportWeights)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_import_weights.py", line 36, in test_import_jax_weights_
import_jax_weights_(
File "/home/destinypikachu/projects/openfold/openfold/utils/import_weights.py", line 650, in import_jax_weights_
data = np.load(npz_path)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 451, in load
fid = stack.enter_context(open(os.fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/destinypikachu/projects/openfold/tests/../openfold/resources/params/params_model_1_ptm.npz'
======================================================================
ERROR: test_attention_core_backward (tests.test_kernels.TestAttentionCore)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_kernels.py", line 47, in test_attention_core_backward
mask_bias = (1e9 * mask - 1)[..., None, None, :].to(dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_attention_core_forward (tests.test_kernels.TestAttentionCore)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_kernels.py", line 23, in test_attention_core_forward
mask_bias = (1e9 * mask - 1)[..., None, None, :].to(dtype)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_dry_run (tests.test_model.TestModel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_model.py", line 103, in test_dry_run
out = model(batch)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 581, in forward
outputs, m_1_prev, z_prev, x_prev, early_stop = self.iteration(
File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 237, in iteration
pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_dry_run_seqemb_mode (tests.test_model.TestModel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_model.py", line 143, in test_dry_run_seqemb_mode
out = model(batch)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 581, in forward
outputs, m_1_prev, z_prev, x_prev, early_stop = self.iteration(
File "/home/destinypikachu/projects/openfold/openfold/model/model.py", line 237, in iteration
pair_mask = seq_mask[..., None] * seq_mask[..., None, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_lma_vs_attention (tests.test_primitives.TestLMA)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_primitives.py", line 31, in test_lma_vs_attention
q, kv, _, biases = random_attention_inputs(batch_size=consts.batch_size,
File "/home/destinypikachu/projects/openfold/tests/data_utils.py", line 140, in random_attention_inputs
mask_bias = inf * (mask - 1)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
======================================================================
ERROR: test_tri_mul_in_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 158, in test_tri_mul_in_inference
self._tri_mul_inplace(incoming=True)
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
out_stock = module(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
z = self.layer_norm_in(z)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 255, in forward
out = nn.functional.layer_norm(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
======================================================================
ERROR: test_tri_mul_in_inference_bf16 (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 161, in test_tri_mul_in_inference_bf16
self._tri_mul_inplace(incoming=True, dtype=torch.bfloat16)
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
out_stock = module(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
z = self.layer_norm_in(z)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 247, in forward
out = nn.functional.layer_norm(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
======================================================================
ERROR: test_tri_mul_out_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 152, in test_tri_mul_out_inference
self._tri_mul_inplace()
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
out_stock = module(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
z = self.layer_norm_in(z)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 255, in forward
out = nn.functional.layer_norm(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
======================================================================
ERROR: test_tri_mul_out_inference_bf16 (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 155, in test_tri_mul_out_inference_bf16
self._tri_mul_inplace(dtype=torch.bfloat16)
File "/home/destinypikachu/projects/openfold/tests/test_triangular_multiplicative_update.py", line 135, in _tri_mul_inplace
out_stock = module(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/triangular_multiplicative_update.py", line 531, in forward
z = self.layer_norm_in(z)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/destinypikachu/projects/openfold/openfold/model/primitives.py", line 247, in forward
out = nn.functional.layer_norm(
File "/home/destinypikachu/miniconda3/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2900, in layer_norm
return torch.layer_norm(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
----------------------------------------------------------------------
Ran 121 tests in 8.186s
FAILED (errors=17, skipped=44)
Test(s) failed. Make sure you've installed all Python dependencies.
I have tried updating CUDA to 12.8 and PyTorch to 2.9.1, but other errors occurred.
Are there any environment-building strategies that work for a Blackwell-architecture GPU?
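For reference, here is a minimal diagnostic sketch I put together (my own script, not part of OpenFold; it assumes the same openfold_env environment and that it is run from the repo root) to confirm the two failure modes in the log above: the sm_120 vs. compiled-arch mismatch and the missing pretrained parameter file.

# Minimal diagnostic sketch: check whether the installed PyTorch build ships
# kernels for this GPU's compute capability, and whether the pretrained
# parameters the tests expect are present.
import os

import torch

# Compute capability of the first GPU, e.g. (12, 0) -> sm_120 on Blackwell.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")

# Architectures this PyTorch wheel was compiled for, e.g. ['sm_50', ..., 'sm_90'].
print("Compiled CUDA archs:", torch.cuda.get_arch_list())
print("PyTorch CUDA runtime:", torch.version.cuda)

# Path reported by test_import_jax_weights_ in the traceback above.
params = "openfold/resources/params/params_model_1_ptm.npz"
print("Pretrained params present:", os.path.exists(params))

On my machine the compiled arch list stops at sm_90 while the GPU reports sm_120, which matches the "no kernel image is available" errors, and the params file is missing, which matches the FileNotFoundError.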