[Bug] Torch conflict when using vllm-plugin-FL on BI-V150 #27

@shh2000

Description

  1. Iluvatar has released vllm0.11+corex, which serves Qwen/Qwen3-0.6B correctly.
  2. Using the release image of vllm0.11+corex, I cannot compile vllm0.13.0, which is required by vllm-plugin-fl, because of the NVCC/CUDA version in that image.
  3. Fortunately, directly downloading and installing the vllm0.13.0 wheel built for NVIDIA with "--no-deps" worked. However, when serving a model such as Qwen/Qwen3-0.6B, it raised an exception at the very beginning, at "from vllm.entrypoints.cli.main import main":
File "/usr/local/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 42, in <module>
    import torch.distributed._symmetric_memory
  File "/usr/local/corex-4.4.0/lib64/python3/dist-packages/torch/distributed/_symmetric_memory/__init__.py", line 16, in <module>
    from torch._C._distributed_c10d import _SymmetricMemory, Work as _Work
ImportError: cannot import name '_SymmetricMemory' from 'torch._C._distributed_c10d' (unknown location)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

I think this is a conflict between vllm0.13.0 and torch-corex, one that is already fixed in the vllm-corex series.
  4. I therefore installed vllm-plugin-fl with FlagCX and FlagGems. A few minor exceptions occurred, but they were easy to avoid or fix. After that, I have an environment with:

vllm                              0.13.0
vllm_fl                           0.0.0
flag_gems                         4.2.1rc0
flagcx                            0.8.0              /home/FlagCX/plugin/torch
  5. Unfortunately, whether I enable or disable FlagCX or FlagGems, the conflict between vllm/vllm-fl and torch-corex still exists.
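
For anyone trying to reproduce this, the mismatch from step 3 can be checked without starting vLLM at all. The following is a minimal diagnostic sketch, not part of vllm or vllm-plugin-fl. The helper names (`installed_version`, `probe_symbol`, `report`) are made up for illustration, and it assumes the distributions are registered under the names shown in the package listing above, with the vendor torch build registered as `torch`.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def probe_symbol(module_name, attr):
    """Report whether `attr` can be found on `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any failure while importing is treated as "not importable".
        return "module not importable"
    return "present" if hasattr(module, attr) else "missing"


def report():
    """Build a short text report of versions and the symbol from the traceback."""
    lines = []
    # Distribution names are assumptions taken from the listing above.
    for dist in ("torch", "vllm", "vllm_fl", "flag_gems", "flagcx"):
        lines.append(f"{dist}: {installed_version(dist) or 'not installed'}")
    status = probe_symbol("torch._C._distributed_c10d", "_SymmetricMemory")
    lines.append(f"torch._C._distributed_c10d._SymmetricMemory: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

If the last line of the report says `missing`, the installed torch build does not expose the symbol that the traceback shows `torch/distributed/_symmetric_memory/__init__.py` trying to import, independent of FlagCX or FlagGems.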

Is there any guidance on resolving the mismatch between vllm-plugin-fl and vendor torch backends such as torch-npu, torch-metax, and torch-corex? Each torch backend may have its own APIs for working with its SDK, driver, and so on, beyond the compute operators covered by FlagGems and the communication operators covered by FlagCX.
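
To make the question concrete, one generic way to tolerate such gaps is to gate optional torch submodules behind a guarded import. The sketch below is purely illustrative and is not something vllm or vllm-plugin-fl is known to do. `optional_import` and `backend_capabilities` are hypothetical helpers, and the traceback above shows vllm 0.13.0 importing `torch.distributed._symmetric_memory` unconditionally.

```python
import importlib


def optional_import(module_name):
    """Import a module if the active backend provides it, else return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing module and a missing symbol inside it,
        # such as the '_SymmetricMemory' failure in the traceback.
        return None


def backend_capabilities(module_names):
    """Map each optional module name to whether it imports on this backend."""
    return {name: optional_import(name) is not None for name in module_names}


# Example (hypothetical usage): decide at startup which code paths are safe.
# caps = backend_capabilities(["torch.distributed._symmetric_memory"])
# if not caps["torch.distributed._symmetric_memory"]:
#     ...  # fall back to a path that does not need symmetric memory
```

A capability map like this only helps where the caller has a fallback path. It does not supply the missing backend feature.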
