Per-node configuration ignored due to flag defaults — prevents mixed MIG and non-MIG node setups #87

@Frk2208

In a heterogeneous cluster (MIG and non-MIG GPU nodes), the Volcano vGPU device plugin ignores per-node settings from the volcano-vgpu-node-config ConfigMap because command-line flag defaults override them.

Environment:
Kubernetes: v1.28.15
Volcano: v1.13.0

Problem:
The defaults for --mig-strategy (none) and --device-split-count (2), defined in main.go, are applied before the per-node configuration is read and end up taking precedence over the settings in the ConfigMap.
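
To make the precedence problem concrete, here is a minimal sketch of one possible resolution order: explicitly passed flag, then per-node config, then flag default. It is not the plugin's actual code. It uses the standard library `flag` package for self-containment (the plugin may use a different CLI library), and the `NodeConfig`, `Settings`, and `resolve` names are hypothetical. Letting an explicit flag win over the ConfigMap is also an assumption; the opposite order could be equally valid.

```go
package main

import (
	"flag"
	"fmt"
)

// NodeConfig holds the per-node values from the ConfigMap (hypothetical type).
// Pointer fields distinguish "not set for this node" from a zero value.
type NodeConfig struct {
	DeviceSplitCount *uint
	MigStrategy      *string
}

// Settings is the effective configuration after merging (hypothetical type).
type Settings struct {
	DeviceSplitCount uint
	MigStrategy      string
}

// resolve parses args and applies the precedence
// explicit flag > per-node config > flag default.
func resolve(args []string, node NodeConfig) (Settings, error) {
	fs := flag.NewFlagSet("device-plugin", flag.ContinueOnError)
	split := fs.Uint("device-split-count", 2, "number of vGPUs per physical GPU")
	mig := fs.String("mig-strategy", "none", "MIG strategy")
	if err := fs.Parse(args); err != nil {
		return Settings{}, err
	}

	// Visit only reports flags that were actually set on the command line,
	// so a default value is never mistaken for an explicit choice.
	explicit := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { explicit[f.Name] = true })

	s := Settings{DeviceSplitCount: *split, MigStrategy: *mig}
	if node.DeviceSplitCount != nil && !explicit["device-split-count"] {
		s.DeviceSplitCount = *node.DeviceSplitCount
	}
	if node.MigStrategy != nil && !explicit["mig-strategy"] {
		s.MigStrategy = *node.MigStrategy
	}
	return s, nil
}

func main() {
	four, mixed := uint(4), "mixed"
	s, err := resolve(nil, NodeConfig{DeviceSplitCount: &four, MigStrategy: &mixed})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

With no flags passed, the per-node values (4 and mixed) are used instead of the defaults (2 and none), which is the behavior this issue asks for.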

Observed Behavior:

Non-MIG node (gpu24042)
Config: "operatingmode": "hami-core", "devicesplitcount": 4

Result: the node advertises volcano.sh/vgpu-number: "2"; the configured devicesplitcount of 4 is ignored.
  allocatable:
    cpu: "4"
    ephemeral-storage: "58801084319"
    hugepages-2Mi: "0"
    memory: 16274732Ki
    pods: "110"
    volcano.sh/vgpu-cores: "100"
    volcano.sh/vgpu-memory: "12288"
    volcano.sh/vgpu-number: "2"
  capacity:
    cpu: "4"
    ephemeral-storage: 63803260Ki
    hugepages-2Mi: "0"
    memory: 16377132Ki
    pods: "110"
    volcano.sh/vgpu-cores: "100"
    volcano.sh/vgpu-memory: "12288"
    volcano.sh/vgpu-number: "2"

MIG node (gracehopper)
Config: "operatingmode": "mig"
Result: MIG mode not activated; advertises zero resources.

Expected Behavior:

Non-MIG node honors devicesplitcount: 4.

allocatable:
  cpu: "72"
  ephemeral-storage: "849546416770"
  hugepages-2Mi: "0"
  hugepages-16Gi: "0"
  hugepages-512Mi: "0"
  memory: 548096704Ki
  nvidia.com/gpu: "0"
  nvidia.com/mig-1g.12gb: "0"
  nvidia.com/mig-3g.48gb: "0"
  pods: "110"
  volcano.sh/vgpu-cores: "0"
  volcano.sh/vgpu-memory: "0"
  volcano.sh/vgpu-number: "0"
capacity:
  cpu: "72"
  ephemeral-storage: 921816860Ki
  hugepages-2Mi: "0"
  hugepages-16Gi: "0"
  hugepages-512Mi: "0"
  memory: 548199104Ki
  nvidia.com/gpu: "0"
  nvidia.com/mig-1g.12gb: "0"
  nvidia.com/mig-3g.48gb: "0"
  pods: "110"
  volcano.sh/vgpu-cores: "0"
  volcano.sh/vgpu-memory: "0"
  volcano.sh/vgpu-number: "4"

Per-node entries in the volcano-vgpu-node-config ConfigMap:

gpu24042: |
  {
      "nodeconfig": [
          {
              "name": "gpu24042",
              "operatingmode": "hami-core",
              "devicememoryscaling": 1,
              "devicesplitcount": 4,
              "migstrategy":"none"
          }
      ]
  }
gracehopper: |
  {
      "nodeconfig": [
          {
              "name": "gracehopper",
              "operatingmode": "mig",
              "migstrategy": "mixed"
          }
      ]
  }
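
For reference, a sketch of how one of these entries could be decoded and matched against the local node name. The JSON keys are taken from the ConfigMap above; the `nodeEntry`, `nodeConfigFile`, and `lookup` names are hypothetical and do not claim to match the plugin's source. Optional fields are pointers so that a key missing from the entry (for example devicesplitcount on gracehopper) can be told apart from an explicit value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeEntry mirrors one element of the "nodeconfig" array (hypothetical type).
type nodeEntry struct {
	Name                string   `json:"name"`
	OperatingMode       string   `json:"operatingmode"`
	DeviceMemoryScaling *float64 `json:"devicememoryscaling"`
	DeviceSplitCount    *uint    `json:"devicesplitcount"`
	MigStrategy         *string  `json:"migstrategy"`
}

// nodeConfigFile mirrors the top-level JSON object of one ConfigMap value.
type nodeConfigFile struct {
	NodeConfig []nodeEntry `json:"nodeconfig"`
}

// lookup returns the entry whose name equals nodeName,
// or nil if the document contains no entry for that node.
func lookup(raw []byte, nodeName string) (*nodeEntry, error) {
	var f nodeConfigFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return nil, fmt.Errorf("decode node config: %w", err)
	}
	for i := range f.NodeConfig {
		if f.NodeConfig[i].Name == nodeName {
			return &f.NodeConfig[i], nil
		}
	}
	return nil, nil
}

func main() {
	raw := []byte(`{"nodeconfig":[{"name":"gracehopper","operatingmode":"mig","migstrategy":"mixed"}]}`)
	e, err := lookup(raw, "gracehopper")
	if err != nil || e == nil {
		fmt.Println("no config for node:", err)
		return
	}
	// DeviceSplitCount is nil here because the entry does not set it.
	fmt.Println(e.OperatingMode, *e.MigStrategy, e.DeviceSplitCount == nil)
}
```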
