
[BUG]: Nvidia GRID 570 CUDA 12.8.93 #21

@DmitriiKuvshinov

Version

v2.0.0

Describe the bug.

Hey,
I want to launch the NIM on 6x A100 with NVIDIA GRID driver version 570.124.06 (CUDA version 12.8). The container fails at startup with an NVML error:

$ docker logs nemoretriever-ranking-ms
===================================
== NVIDIA NIM for Text Reranking ==
===================================

NVIDIA Release 1.3.1
Model: nvidia/llama-3.2-nv-rerankqa-1b-v2

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/
A copy of this license can be found under /opt/nim/LICENSE.
The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/).
Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Traceback (most recent call last):
  File "/opt/nim/start_server.d/nim_manifest_profile.py", line 166, in <module>
    system = get_info()
Exception: Failed to query NVML device info: an internal driver error occured
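
The traceback only says that get_info() failed somewhere in NVML, not which query hit the driver error. A minimal diagnostic sketch for narrowing that down, assuming the pynvml module (nvidia-ml-py package) is available in the environment where it runs. The helper names run_probes and nvml_probes are hypothetical, and the set of NVML calls is a guess at what a device-info routine might query, not what nim_manifest_profile.py actually does:

```python
from typing import Callable, Dict


def run_probes(probes: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Run each probe and record 'ok: <value>' or 'error: <type>: <message>'."""
    results = {}
    for name, probe in probes.items():
        try:
            results[name] = f"ok: {probe()}"
        except Exception as exc:  # pynvml.NVMLError is a subclass of Exception
            results[name] = f"error: {type(exc).__name__}: {exc}"
    return results


def nvml_probes(index: int = 0) -> Dict[str, Callable[[], object]]:
    """Build a few NVML queries for one GPU (assumes pynvml is installed).

    nvmlInit or nvmlDeviceGetHandleByIndex may themselves raise; that is
    already a useful data point, so it is not caught here.
    """
    import pynvml  # imported lazily so run_probes works without the package

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return {
        "driver_version": pynvml.nvmlSystemGetDriverVersion,
        "device_count": pynvml.nvmlDeviceGetCount,
        "name": lambda: pynvml.nvmlDeviceGetName(handle),
        "memory_total": lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).total,
        "compute_capability": lambda: pynvml.nvmlDeviceGetCudaComputeCapability(handle),
        "mig_mode": lambda: pynvml.nvmlDeviceGetMigMode(handle),
    }


# Usage on a GPU host or inside a GPU container:
#   for name, outcome in run_probes(nvml_probes()).items():
#       print(f"{name}: {outcome}")
```

Each query is isolated so that one failing call does not hide the others. Any line that comes back as an error points at the NVML query that the vGPU/MIG setup rejects.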

As a check, running nvidia-smi in a plain ubuntu container with the same runtime works and returns the usual table:

$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Sat Apr 19 18:42:27 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID A100D-2-20C               On  |   00000000:02:00.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   1  GRID A100D-2-20C               On  |   00000000:02:02.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   2  GRID A100D-2-20C               On  |   00000000:02:03.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   3  GRID A100D-2-20C               On  |   00000000:02:04.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   4  GRID A100D-2-20C               On  |   00000000:02:05.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   5  GRID A100D-2-20C               On  |   00000000:02:06.0 Off |                   On |
| N/A   N/A    P0            N/A  /  N/A  |                  N/A   |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  1    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  2    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  3    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  4    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  5    0   0   0  |               1MiB / 18412MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 0MiB /  4096MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Full env printout

MODEL_DIRECTORY=/data/nvidia/.cache/model-cache
NVIDIA_API_KEY=nvapi-lx..........jcOILb

  • I agree to follow THIS PROJECT's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
