A Prometheus exporter that enhances DCGM (Data Center GPU Manager) metrics with Docker container mapping information. This exporter is designed to work alongside DCGM Exporter, enriching its metrics with container-level information to provide better observability of GPU usage in containerized environments.
- Re-exports DCGM metrics with added container information (primary use case)
- Maps GPU metrics to Docker container names
- Adds container name information to DCGM metrics
- Real-time monitoring of GPU processes and their container associations
- Configurable update intervals and logging levels
- Go toolchain (any recent 1.x release; only needed when building from source)
- NVIDIA GPU(s)
- NVIDIA drivers installed
- DCGM Exporter running in your environment
- nvidia-smi command-line tool
- Docker runtime
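A quick way to sanity-check these prerequisites before installing (the last check assumes DCGM Exporter is on its default port 9400):

```bash
# GPUs and driver are visible
nvidia-smi

# Docker daemon is reachable
docker ps

# DCGM Exporter is already serving metrics (default port 9400 assumed)
curl -s http://localhost:9400/metrics | head -n 5
```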
You can download pre-built binaries for Linux (AMD64 and ARM64) from the releases page.
# Download the latest release for your architecture
# For AMD64:
curl -L -o dcgm-container-mapper "https://github.com/brtnshrdr/dcgm-container-mapper/releases/latest/download/dcgm-container-mapper-linux-amd64"
# For ARM64:
curl -L -o dcgm-container-mapper "https://github.com/brtnshrdr/dcgm-container-mapper/releases/latest/download/dcgm-container-mapper-linux-arm64"
# Make it executable
chmod +x dcgm-container-mapper
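Optionally, if you want the binary available on your PATH, something like the following works (the destination directory is just a suggestion):

```bash
# Move the downloaded binary onto the PATH
sudo mv dcgm-container-mapper /usr/local/bin/dcgm-container-mapper
```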
Alternatively, build from source:

# Clone the repository
git clone https://github.com/brtnshrdr/dcgm-container-mapper.git
cd dcgm-container-mapper
# Install dependencies
go mod tidy
# Build the binaries
./build.sh
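If you prefer not to use the build script, a plain Go build should also work; this sketch assumes the main package sits at the repository root:

```bash
# Build a binary for the current platform
go build -o dcgm-container-mapper .
```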
The most common usage is with DCGM re-export enabled:

./dcgm-container-mapper --reexport-dcgm --dcgm-port 9400

This mode is recommended because:
- It preserves all valuable DCGM metrics (GPU utilization, memory usage, temperature, etc.)
- Adds container context to these metrics (container name, pod name, namespace)
- Maintains compatibility with existing DCGM-based dashboards while adding container visibility
- Enables better correlation between GPU metrics and container performance
Available command-line options:

- --reexport-dcgm: Enable re-exporting of DCGM metrics [default: false] (recommended to enable)
- --dcgm-port: DCGM exporter port to read from [default: "9400"]
- --port: Port to listen on [default: "9100"]
- --listen-address: Address to listen on [default: "localhost"]
- --update-interval: Interval to update GPU information [default: 5s]
- --log-level: Set logging level (debug, info, warn, error) [default: "info"]
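For example, to re-export DCGM metrics, listen on all interfaces, and refresh the GPU/container mapping every 10 seconds (the values here are illustrative, not requirements):

```bash
./dcgm-container-mapper \
  --reexport-dcgm \
  --dcgm-port 9400 \
  --listen-address 0.0.0.0 \
  --port 9100 \
  --update-interval 10s \
  --log-level info
```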
The exporter provides metrics in two modes:
When running with --reexport-dcgm, all DCGM metrics are re-exported with additional container context labels:
- exported_pod
- exported_container
- exported_namespace
Note that exported_pod will always equal exported_container, and exported_namespace will always be "docker"; this is done to align with the "Kubernetes mode" (DCGM_EXPORTER_KUBERNETES=true) of the DCGM exporter.
This enriches the standard DCGM metrics with container information, making it easier to track GPU usage per container/pod.
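As an illustration, a re-exported series might look like the line below. DCGM_FI_DEV_GPU_UTIL is just a representative DCGM metric; the exact metric names, labels, and values depend on your DCGM Exporter configuration:

```
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-xxx",modelName="Tesla V100",exported_container="my-training-job",exported_pod="my-training-job",exported_namespace="docker"} 87
```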
Without --reexport-dcgm, only basic GPU-to-container mapping is provided:
# HELP dcgm_container_mapping Mapping between GPU ID and container and process name
# TYPE dcgm_container_mapping gauge
Metric format:
dcgm_container_mapping{gpu="0",modelName="Tesla V100",UUID="GPU-xxx",container="container_name",process="process_name"} 0
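To list only these mapping series from a running instance, something like this works (9100 is the default listen port):

```bash
curl -s http://localhost:9100/metrics | grep '^dcgm_container_mapping'
```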
- Start the exporter:
./dcgm-container-mapper --port 9100 --log-level debug

- Access metrics:
curl http://localhost:9100/metrics

Add the following to your prometheus.yml:
scrape_configs:
  - job_name: 'dcgm-container-mapper'
    static_configs:
      - targets: ['localhost:9100']
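Once Prometheus has picked up the target, you can spot-check that the series are being ingested; this sketch uses the Prometheus HTTP API and assumes Prometheus itself is reachable at localhost:9090:

```bash
# Query the mapping metric through the Prometheus HTTP API
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=dcgm_container_mapping'
```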