fix(mem): correct GPU memory accounting (host vs container) and memory limits accordingly by loiht2 · Pull Request #153 · Project-HAMi/HAMi-core

loiht2 · 2026-01-23T12:21:59Z

Fixes incorrect GPU memory reporting inside the container vs. on the host (as shown by nvidia-smi).
Enforces GPU memory limits using the corrected container-visible memory, preventing incorrect OOM enforcement.

GPU Memory Usage

In container

Command: nvidia-smi

Command: nvidia-smi -a

    FB Memory Usage
        Total                             : 3072 MiB
        Reserved                          : 274 MiB
        Used                              : 2584 MiB
        Free                              : 488 MiB

On host

Command: nvidia-smi

Command: nvidia-smi -a

    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 274 MiB
        Used                              : 2588 MiB
        Free                              : 29907 MiB
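The two FB Memory Usage blocks can be cross-checked with a little arithmetic. The sketch below recomputes Free from the other fields, using only the figures reported above. The helper `free_gap` and the choice of whether to subtract Reserved are assumptions inferred from these numbers. They are not taken from the PR's code.

```python
# Figures copied from the `nvidia-smi -a` output above (all in MiB).
container = {"total": 3072, "reserved": 274, "used": 2584, "free": 488}
host = {"total": 32768, "reserved": 274, "used": 2588, "free": 29907}


def free_gap(fb, subtract_reserved):
    """Return reported Free minus the Free value implied by the other fields.

    Hypothetical helper for illustration only.
    """
    implied = fb["total"] - fb["used"]
    if subtract_reserved:
        implied -= fb["reserved"]
    return fb["free"] - implied


# Assumption: the container view satisfies Free = Total - Used.
container_gap = free_gap(container, subtract_reserved=False)

# Assumption: the host view satisfies Free = Total - Reserved - Used.
host_gap = free_gap(host, subtract_reserved=True)

# Difference between host-side and container-side Used for the same pod.
usage_delta = host["used"] - container["used"]

print(container_gap, host_gap, usage_delta)
```

Under these assumptions the container figures are self-consistent, the host figures agree to within 1 MiB (presumably rounding of each field to whole MiB), and Used differs by only 4 MiB between the two views.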

GPU Memory Limit Enforcement (OOM scenario)

I ran the same pod, which needs ~2588 MiB of GPU memory. In this test, however, the ResourceClaim requests only 2 GiB (2048 MiB) of GPU memory, so the pod hits a GPU OOM.
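The following is a minimal sketch of the kind of limit check this scenario exercises, not the HAMi-core implementation. The names `GpuOutOfMemory` and `check_allocation` are hypothetical, and the pod's memory need is modeled as a single request for simplicity, whereas a real workload allocates incrementally.

```python
MIB = 1024 * 1024


class GpuOutOfMemory(Exception):
    """Raised when an allocation would exceed the container's GPU memory limit."""


def check_allocation(used_bytes, request_bytes, limit_bytes):
    """Hypothetical limit check.

    Returns the new usage if the request fits under the limit,
    otherwise raises GpuOutOfMemory.
    """
    new_usage = used_bytes + request_bytes
    if new_usage > limit_bytes:
        raise GpuOutOfMemory(
            f"need {new_usage // MIB} MiB, limit is {limit_bytes // MIB} MiB"
        )
    return new_usage


limit = 2048 * MIB   # ResourceClaim request from the test above
needed = 2588 * MIB  # approximate memory the pod requires

try:
    check_allocation(0, needed, limit)
    hit_oom = False
except GpuOutOfMemory:
    hit_oom = True

print(hit_oom)
```

With the 3072 MiB limit from the earlier container output, the same request fits, which matches the pod running there without an OOM.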

Pod log showing the OOM:

hami-robot · 2026-01-23T12:22:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: loiht2
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Hoang Thanh Loi added 2 commits January 23, 2026 11:09

fix(mem): correct container vs host gpu memory and enforce memory lim…

1cdc65f

…its accordingly Signed-off-by: Hoang Thanh Loi <loi.hoangthanh.24@gmail.com>

fix(mem): correct container vs host gpu memory and enforce memory lim…

896e22d

…its accordingly (updated) Signed-off-by: Hoang Thanh Loi <loi.hoangthanh.24@gmail.com>

hami-robot bot requested a review from archlitchi January 23, 2026 12:22

hami-robot bot added the dco-signoff: yes label Jan 23, 2026

hami-robot bot requested a review from chaunceyjiang January 23, 2026 12:22

hami-robot bot added the size/M label Jan 23, 2026

loiht2 wants to merge 2 commits into Project-HAMi:main from loiht2:fix/container-memory
