1. Iluvatar released their vllm0.11+corex build, which can serve Qwen/Qwen3-0.6B correctly.
2. Using the vllm0.11+corex release image, I cannot compile vLLM 0.13.0 (which is required by vllm-plugin-fl) because of the NVCC/CUDA version.
3. Fortunately, directly downloading and installing the vLLM 0.13.0 wheel built for NVIDIA with `--no-deps` worked. However, when serving a model such as Qwen/Qwen3-0.6B, an exception was triggered at the very first import, `from vllm.entrypoints.cli.main import main` (see the diagnostic sketch after this list):
File "/usr/local/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 42, in <module>
import torch.distributed._symmetric_memory
File "/usr/local/corex-4.4.0/lib64/python3/dist-packages/torch/distributed/_symmetric_memory/__init__.py", line 16, in <module>
from torch._C._distributed_c10d import _SymmetricMemory, Work as _Work
ImportError: cannot import name '_SymmetricMemory' from 'torch._C._distributed_c10d' (unknown location)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
I think this is a conflict between vLLM 0.13.0 and torch-corex, one which the vllm-corex series repairs.
4. Thus, I tried installing vllm-plugin-fl with FlagCX and FlagGems. A few minor exceptions occurred, but there were straightforward ways to avoid or fix them. After that, I had an environment with:
```
vllm       0.13.0
vllm_fl    0.0.0
flag_gems  4.2.1rc0
flagcx     0.8.0     /home/FlagCX/plugin/torch
```
5. Unfortunately, whether I enable or disable FlagCX and FlagGems, the conflict between vllm/vllm_fl and torch-corex still exists.
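For reference, here is a minimal diagnostic sketch (module and symbol names taken verbatim from the traceback above) that checks whether the installed torch build exposes the `_SymmetricMemory` binding that vLLM 0.13.0's `parallel_state.py` imports. On torch-corex it should reproduce the ImportError without starting vLLM:

```python
# Diagnostic sketch: probe the torch build for the c10d binding that
# vllm/distributed/parallel_state.py imports at module level.
# Names are taken verbatim from the traceback above.
import torch

print("torch version:", torch.__version__)

try:
    # The exact import that fails inside
    # torch/distributed/_symmetric_memory/__init__.py on torch-corex.
    from torch._C._distributed_c10d import _SymmetricMemory  # noqa: F401
    print("_SymmetricMemory is exposed; vLLM's import should succeed.")
except ImportError as exc:
    # A backend build (torch-corex here) may simply not compile this
    # NVIDIA-oriented binding into torch._C._distributed_c10d.
    print("missing binding:", exc)
```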
I wonder if there is any guidance on resolving the mismatch between vllm-plugin-fl and multi-backend torch builds such as torch-npu, torch-metax, and torch-corex. Each torch backend may have its own APIs for cooperating with its SDK/driver stack, beyond the compute operators covered by FlagGems and the communication operators covered by FlagCX.
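In case it helps others hitting the same import, below is a hedged workaround sketch (not a supported fix): it pre-registers a stub module for `torch.distributed._symmetric_memory` so that vLLM's module-level import no longer crashes on a torch build that lacks `_SymmetricMemory`. Any code path that actually uses symmetric memory would still fail; this only unblocks the initial import.

```python
# Hedged workaround sketch, NOT a supported fix: stub out
# torch.distributed._symmetric_memory before vLLM imports it, so the
# module-level "import torch.distributed._symmetric_memory" in
# vllm/distributed/parallel_state.py does not raise on torch-corex.
import sys
import types

import torch  # must be imported before installing the stub

try:
    import torch.distributed._symmetric_memory  # noqa: F401
except ImportError:
    stub = types.ModuleType("torch.distributed._symmetric_memory")
    sys.modules["torch.distributed._symmetric_memory"] = stub
    # Make attribute access (torch.distributed._symmetric_memory) work too.
    torch.distributed._symmetric_memory = stub

# Only after the stub is in place, the import from the report:
from vllm.entrypoints.cli.main import main  # noqa: E402,F401
```

Whether this is viable depends on whether vLLM later touches symmetric-memory APIs on the serving path; a proper fix would need either the backend torch to expose the binding or vLLM/vllm-plugin-fl to guard the import.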