Description
I was looking into building the kernels, but I get RuntimeError: CUDA error: mapping of buffer object failed:
Loading expert weights - 100.0% ..
Syncing with other peers..
[rank4]: Traceback (most recent call last):
[rank4]: File "/workspace/Tutel/llm_moe_tutel.py", line 433, in
[rank4]: sigp = torch.ops.tutel_ops.uncached_exchange(sigp[0], net.simple_all_gather(sigp[1]), world_rank)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1158, in call
[rank4]: return self._op(*args, **(kwargs or {}))
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: RuntimeError: CUDA error: mapping of buffer object failed
[rank4]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank4]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank4]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
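As the message itself suggests, rerunning with CUDA_LAUNCH_BLOCKING=1 should make the failing call surface at the point where it actually happens, so the stack trace can be trusted. Below is a minimal sketch of preparing such an environment from Python. The helper name `debug_env` is hypothetical (it is not part of Tutel or PyTorch), and the launch command in the trailing comment is only a placeholder, since the original command is not shown here.

```python
import os


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    `base` defaults to the current process environment; it is never modified.
    """
    env = dict(os.environ if base is None else base)
    # With this variable set, CUDA kernel launches block until completion,
    # so errors are raised at the call that caused them.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical usage (substitute the real launch command for your setup):
#   import subprocess
#   subprocess.run(["python3", "llm_moe_tutel.py"], env=debug_env(), check=True)
```

The variable has to be set before the process initializes CUDA, which is why the sketch passes it to a fresh subprocess instead of setting it inside an already running script. Exporting it in the shell before launching has the same effect.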