The kubernetes-mcp-server supports distributed tracing and metrics via OpenTelemetry (OTEL). Observability is optional and disabled by default.
The server automatically traces all operations through middleware without requiring any code changes to individual tools:
- MCP Tool Calls - Every tool invocation with details:
  - Tool name
  - Success/failure status
  - Duration
  - Error details (when applicable)
- HTTP Requests - All HTTP endpoints when running in HTTP mode:
  - Request method and path
  - Response status
  - Client information
  - Duration
Note: When running in STDIO mode, only MCP tool calls are traced because there is no HTTP server.
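To make the middleware idea concrete, here is a minimal Python sketch of wrapping a tool handler so that every call records its name, status, duration, and error details without touching the handler body. It does not use the OpenTelemetry SDK and is not the server's actual implementation; `traced`, `SPANS`, and the stand-in `resources_get` function are hypothetical names used only for illustration.

```python
import time
from functools import wraps

# Hypothetical in-memory sink; a real setup would hand spans to an exporter.
SPANS = []

def traced(tool_name):
    """Wrap a tool handler so each call is recorded without changing the handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            span = {"name": f"tools/call {tool_name}", "status": "OK", "error": None}
            try:
                return handler(*args, **kwargs)
            except Exception as exc:
                # Record the failure, then let the caller see the original error.
                span["status"] = "ERROR"
                span["error"] = str(exc)
                raise
            finally:
                span["duration_ms"] = (time.perf_counter() - start) * 1000.0
                SPANS.append(span)
        return wrapper
    return decorator

@traced("resources_get")
def resources_get(kind, name):
    # Stand-in for a real tool; it only echoes its arguments.
    return {"kind": kind, "name": name}
```

Because the wrapper owns timing and error capture, adding a new tool only requires applying the decorator.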
The server collects and exposes metrics through two mechanisms:
- Stats Endpoint (/stats) - JSON endpoint for real-time statistics:
  - Tool call counts by name
  - Tool call errors
  - HTTP request counts by method/path/status
  - Server uptime
- OTLP Export - When an endpoint is configured, metrics are also exported to your OTLP backend every 30 seconds.
Option A: Jaeger (traces only)
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/jaegertracing/all-in-one:latest

Access the Jaeger UI at http://localhost:16686
Note: Jaeger only supports traces, not metrics. To disable metrics export and avoid warnings about MetricsService being unimplemented, set OTEL_METRICS_EXPORTER=none.
Option B: Grafana LGTM Stack (traces + metrics + logs)
For full observability with metrics support:
docker run -d --name lgtm \
-p 3000:3000 \
-p 4317:4317 \
-p 4318:4318 \
docker.io/grafana/otel-lgtm:latest

Access Grafana at http://localhost:3000 (default credentials: admin/admin)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Run the server
npx -y kubernetes-mcp-server@latest

Make some tool calls through your MCP client, then view traces in the Jaeger UI.
When you call resources_get for a Pod, you'll see a trace like this in Jaeger:
Trace ID: abc123def456789
Duration: 145ms
└─ tools/call resources_get [145ms]
   ├─ mcp.method.name: tools/call
   ├─ gen_ai.tool.name: resources_get
   ├─ gen_ai.operation.name: execute_tool
   ├─ rpc.jsonrpc.version: 2.0
   ├─ network.transport: pipe
   └─ Status: OK
In HTTP mode, the tool call span is nested under the span for the HTTP request that carried it:
Trace ID: abc123def456789
Duration: 150ms
├─ POST /message [150ms]
│  ├─ http.request.method: POST
│  ├─ url.path: /message
│  ├─ http.response.status_code: 200
│  ├─ client.address: 192.168.1.100
│  │
│  └─ tools/call resources_get [145ms]
│     ├─ mcp.method.name: tools/call
│     ├─ gen_ai.tool.name: resources_get
│     ├─ gen_ai.operation.name: execute_tool
│     ├─ rpc.jsonrpc.version: 2.0
│     ├─ network.transport: tcp
│     └─ Status: OK
OpenTelemetry can be configured via TOML config file or environment variables. Environment variables take precedence over TOML config values.
Note: Telemetry is automatically enabled when an endpoint is configured. Use enabled = false in TOML to explicitly disable it.
| TOML Field | Environment Variable | Description |
|---|---|---|
| enabled | - | Explicit enable/disable (overrides all) |
| endpoint | OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL |
| protocol | OTEL_EXPORTER_OTLP_PROTOCOL | Protocol: grpc or http/protobuf |
| traces_sampler | OTEL_TRACES_SAMPLER | Sampling strategy |
| traces_sampler_arg | OTEL_TRACES_SAMPLER_ARG | Sampling ratio (0.0-1.0) |
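
The precedence rules can be sketched in Python as follows. This is an illustration rather than the server's code: `resolve_telemetry` is a hypothetical helper, and its treatment of `enabled` (an explicit value wins, otherwise telemetry is on only when an endpoint is present) is an assumption based on the table and note above.

```python
def resolve_telemetry(toml_cfg, env):
    """Merge a [telemetry] TOML table with OTEL_* environment variables."""
    mapping = {
        "endpoint": "OTEL_EXPORTER_OTLP_ENDPOINT",
        "protocol": "OTEL_EXPORTER_OTLP_PROTOCOL",
        "traces_sampler": "OTEL_TRACES_SAMPLER",
        "traces_sampler_arg": "OTEL_TRACES_SAMPLER_ARG",
    }
    resolved = {}
    for field, var in mapping.items():
        # Environment variables win over TOML values; empty strings count as unset.
        value = env.get(var) or toml_cfg.get(field)
        if value is not None:
            resolved[field] = value
    # gRPC is the documented default protocol.
    resolved.setdefault("protocol", "grpc")
    # Assumption: explicit `enabled` wins; otherwise enable only if an endpoint is set.
    if "enabled" in toml_cfg:
        resolved["enabled"] = bool(toml_cfg["enabled"])
    else:
        resolved["enabled"] = "endpoint" in resolved
    return resolved
```

Note that values taken from the environment are strings (for example `"0.05"`), so a real loader would still need to convert `traces_sampler_arg` to a number.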
Add a [telemetry] section to your config file:
[telemetry]
# Optional: explicitly enable/disable (omit to auto-enable when endpoint is set)
enabled = true
endpoint = "http://localhost:4317"
# Protocol: "grpc" (default) or "http/protobuf"
protocol = "grpc"
# Trace sampling strategy
# Options: "always_on", "always_off", "traceidratio", "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio"
traces_sampler = "traceidratio"
# Sampling ratio for ratio-based samplers (0.0 to 1.0)
traces_sampler_arg = 0.1

Enable with endpoint:
[telemetry]
endpoint = "http://localhost:4317"

Production with sampling:
[telemetry]
endpoint = "http://tempo-distributor:4317"
traces_sampler = "traceidratio"
traces_sampler_arg = 0.05 # 5% sampling

Explicitly disable:
[telemetry]
enabled = false

Environment variables take precedence over TOML config. This allows you to override config file settings at runtime.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Note: The server gracefully handles failures. If the endpoint is unreachable, the server logs a warning and continues without tracing.
# Service name (defaults to "kubernetes-mcp-server")
export OTEL_SERVICE_NAME=kubernetes-mcp-server
# Service version (auto-detected from binary, rarely needs manual override)
export OTEL_SERVICE_VERSION=1.0.0
# Additional resource attributes (useful for multi-environment deployments)
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,team=platform"

The server supports both gRPC and HTTP/protobuf protocols:
# gRPC (default, port 4317)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# HTTP/protobuf (port 4318)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Secure endpoints (HTTPS/gRPC with TLS)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-secure.example.com:4317
# Custom CA certificate (for self-signed certificates)
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt

By default, the server uses ParentBased(AlwaysSample) sampling:
- Root spans (no parent): Always sampled (100%)
- Child spans: Inherit parent's sampling decision
This is ideal for development but may generate high trace volumes in production.
For production with high traffic, use ratio-based sampling:
# Sample 10% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

- always_on - Sample everything (default for root spans)
- always_off - Disable tracing entirely
- traceidratio - Sample a percentage (requires OTEL_TRACES_SAMPLER_ARG between 0.0 and 1.0)
- parentbased_always_on - Respect parent span, default to always_on
- parentbased_always_off - Respect parent span, default to always_off
- parentbased_traceidratio - Respect parent span, default to ratio
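
As a rough illustration of how ratio-based and parent-based sampling decide, the Python sketch below derives the decision from the trace ID so that every span in a trace gets the same answer. It mirrors the general idea only; the exact bits and thresholds used by a given OpenTelemetry SDK may differ, and `ratio_sampled` and `parent_based` are hypothetical helper names.

```python
def ratio_sampled(trace_id, ratio):
    """Deterministic ratio decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Compare the low 64 bits of the trace ID against a threshold scaled by the ratio.
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

def parent_based(trace_id, parent_sampled, root_ratio=1.0):
    """Inherit the parent's decision; use the root sampler only when there is no parent."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, root_ratio)
```

With `root_ratio=1.0` this behaves like the default described above: root spans are always sampled and child spans follow their parent.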
# Development: Sample everything
export OTEL_TRACES_SAMPLER=always_on
# Production: 5% sampling (good for high-traffic services)
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
# Temporarily disable tracing
export OTEL_TRACES_SAMPLER=always_off
# Or just unset the endpoint
unset OTEL_EXPORTER_OTLP_ENDPOINT

Add the MCP server to your project's .mcp.json or global ~/.claude/settings.json:
{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "kubernetes-mcp-server@latest"],
      "env": {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
        "OTEL_TRACES_SAMPLER": "always_on"
      }
    }
  }
}

For Jaeger (traces only): Add "OTEL_METRICS_EXPORTER": "none" to disable metrics export.
Note: In STDIO mode, only MCP tool calls are traced (no HTTP request spans).
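
If you manage this client configuration with a script, the environment block can be added programmatically. The following Python sketch assumes the `mcpServers` layout shown above; `add_otel_env` is a hypothetical helper, not part of any MCP client.

```python
import json

def add_otel_env(config, server, endpoint, traces_only=False):
    """Return a copy of an MCP client config with OTEL variables set for one server."""
    updated = json.loads(json.dumps(config))  # cheap deep copy for JSON-only data
    env = updated["mcpServers"][server].setdefault("env", {})
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    if traces_only:
        # Backends such as Jaeger accept traces only, so metrics export is turned off.
        env["OTEL_METRICS_EXPORTER"] = "none"
    return updated

config = {
    "mcpServers": {
        "kubernetes": {"command": "npx", "args": ["-y", "kubernetes-mcp-server@latest"]}
    }
}
print(json.dumps(add_otel_env(config, "kubernetes", "http://localhost:4317", True), indent=2))
```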
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-mcp-server
spec:
  template:
    spec:
      containers:
      - name: kubernetes-mcp-server
        image: quay.io/containers/kubernetes_mcp_server:latest
        env:
        # OTLP endpoint (required to enable tracing)
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://tempo-distributor.observability:4317"
        # Sampling (recommended for production)
        - name: OTEL_TRACES_SAMPLER
          value: "traceidratio"
        - name: OTEL_TRACES_SAMPLER_ARG
          value: "0.1" # 10% sampling
        # Resource attributes (helps identify this deployment)
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "deployment.environment=production,k8s.cluster.name=prod-us-west-2"
        # Kubernetes metadata (optional, helps correlate traces with K8s resources)
        - name: KUBERNETES_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: KUBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName

Note: The Kubernetes metadata environment variables are optional but recommended for production deployments. They help correlate traces with specific pods, namespaces, and nodes.
docker run \
-e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317 \
-e OTEL_TRACES_SAMPLER=always_on \
quay.io/containers/kubernetes_mcp_server:latest

Each tool call creates a span following MCP and OpenTelemetry semantic conventions:
Span Name Format: {mcp.method.name} {target} (e.g., "tools/call resources_get")
Attributes:
- mcp.method.name - MCP protocol method (e.g., "tools/call") [Required]
- gen_ai.tool.name - Name of the tool being called (e.g., "resources_get", "helm_install") [Required for tool calls]
- gen_ai.operation.name - Set to "execute_tool" for tool calls [Recommended]
- rpc.jsonrpc.version - JSON-RPC version (typically "2.0") [Recommended]
- network.transport - Transport protocol: "pipe" for STDIO, "tcp" for HTTP [Recommended]
- error.type - Error classification: "tool_error" for tool failures, "_OTHER" for other errors [Conditional]
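
The naming convention and attribute set above can be expressed as a small Python sketch. `tool_call_span` is a hypothetical helper that only builds a name and a dictionary; it does not create real OpenTelemetry spans.

```python
def tool_call_span(tool_name, transport, error=None):
    """Build the span name and attributes for one tools/call request."""
    method = "tools/call"
    attrs = {
        "mcp.method.name": method,
        "gen_ai.tool.name": tool_name,
        "gen_ai.operation.name": "execute_tool",
        "rpc.jsonrpc.version": "2.0",
        # "pipe" for STDIO, "tcp" for HTTP.
        "network.transport": "pipe" if transport == "stdio" else "tcp",
    }
    if error is not None:
        # Conditional attribute, e.g. "tool_error" or "_OTHER".
        attrs["error.type"] = error
    return f"{method} {tool_name}", attrs
```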
HTTP requests create spans following OpenTelemetry HTTP semantic conventions:
Span Name Format: {METHOD} {path} (e.g., "POST /message")
Attributes:
- http.request.method - Request method (GET, POST, etc.) [Required]
- url.path - URL path [Required]
- url.scheme - URL scheme (http or https) [Required]
- server.address - Server host [Recommended]
- network.protocol.name - Protocol name (http) [Recommended]
- network.protocol.version - Protocol version (HTTP/1.1, HTTP/2) [Recommended]
- client.address - Client IP address [Recommended]
- http.route - Normalized route pattern (when different from path) [Conditional]
- user_agent.original - User agent string (when present) [Conditional]
- http.request.body.size - Request body size (when present) [Conditional]
- http.response.status_code - Response status code [Required]
- error.type - HTTP status code for 4xx/5xx responses [Conditional]
Note: HTTP spans only appear when running in HTTP mode. STDIO mode (Claude Code) only creates MCP tool call spans. The /healthz endpoint is not traced to reduce noise.
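
A compact way to picture these rules is the Python sketch below, which builds the name and a subset of the attributes for an HTTP span and skips /healthz. `http_span` is a hypothetical helper for illustration only and covers just the required and most common attributes.

```python
def http_span(method, path, status, scheme="http", client=None):
    """Return (name, attributes) for an HTTP request span, or None if untraced."""
    if path == "/healthz":
        return None  # health checks are skipped to reduce noise
    attrs = {
        "http.request.method": method,
        "url.path": path,
        "url.scheme": scheme,
        "http.response.status_code": status,
    }
    if client:
        attrs["client.address"] = client
    if status >= 400:
        # For 4xx/5xx responses the status code doubles as the error type.
        attrs["error.type"] = str(status)
    return f"{method} {path}", attrs
```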
When running in HTTP mode, the server exposes a /stats endpoint that returns real-time statistics as JSON:
curl http://localhost:8080/stats

Example response:
{
  "total_tool_calls": 42,
  "tool_call_errors": 2,
  "tool_calls_by_name": {
    "resources_list": 15,
    "pods_get": 12,
    "helm_list": 10,
    "resources_get": 5
  },
  "total_http_requests": 100,
  "http_requests_by_path": {
    "/mcp": 50,
    "/sse": 30,
    "/message": 20
  },
  "uptime_seconds": 3600.5
}

The stats endpoint is useful for:
- Health monitoring and alerting
- Quick debugging without a full observability stack
- Integration with simple monitoring systems
Note: The /stats endpoint is only available in HTTP mode. In STDIO mode, use OTLP export for metrics.
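
For simple alerting, the /stats document can be reduced to a couple of numbers. The Python sketch below assumes the field names shown in the example response; `fetch_stats` and `summarize` are hypothetical helpers, and `fetch_stats` needs a server running in HTTP mode at the given URL.

```python
import json
from urllib.request import urlopen

def fetch_stats(url="http://localhost:8080/stats", timeout=5):
    """Fetch the /stats document; needs a server running in HTTP mode."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

def summarize(stats):
    """Compute the tool error rate and the busiest tool from a /stats document."""
    total = stats.get("total_tool_calls", 0)
    errors = stats.get("tool_call_errors", 0)
    by_name = stats.get("tool_calls_by_name", {})
    busiest = max(by_name, key=by_name.get) if by_name else None
    return {"error_rate": errors / total if total else 0.0, "busiest_tool": busiest}
```

A monitoring script could call `summarize(fetch_stats())` on a schedule and alert when `error_rate` crosses a threshold of your choosing.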
When running in HTTP mode, the server exposes a /metrics endpoint for Prometheus scraping:
curl http://localhost:8080/metrics

This endpoint returns metrics in OpenMetrics/Prometheus text format, suitable for scraping by Prometheus or compatible systems.
| Metric | Type | Description |
|---|---|---|
| k8s_mcp_tool_calls_total | Counter | Total MCP tool calls (labeled by tool_name) |
| k8s_mcp_tool_errors_total | Counter | Total MCP tool errors (labeled by tool_name) |
| k8s_mcp_tool_duration_seconds | Histogram | Tool call duration in seconds |
| k8s_mcp_http_requests_total | Counter | HTTP requests (labeled by http_request_method, url_path, http_response_status_class) |
| k8s_mcp_server_info | Gauge | Server info (labeled by version, go_version) |
scrape_configs:
  - job_name: 'kubernetes-mcp-server'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics

When deployed in Kubernetes with the Helm chart, enable the ServiceMonitor:
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s

Note: The /metrics endpoint is only available in HTTP mode.
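
Without a Prometheus server at hand, the text format returned by /metrics can still be inspected with a few lines of Python. The sketch below is a simplified parser for plain sample lines, not a full implementation of the exposition format; `parse_samples` and `total_by_label` are hypothetical helpers.

```python
import re

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_samples(text):
    """Yield (metric, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)

def total_by_label(text, metric, label):
    """Sum one metric's samples, grouped by the value of a single label."""
    totals = {}
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] = totals.get(labels[label], 0.0) + value
    return totals
```

For example, `total_by_label(body, "k8s_mcp_tool_calls_total", "tool_name")` groups the tool call counter from the table above by tool.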
- Check endpoint is set:
  echo $OTEL_EXPORTER_OTLP_ENDPOINT
- Check server logs (increase verbosity):
  # Look for "OpenTelemetry tracing initialized successfully"
  kubernetes-mcp-server -v 2
  If tracing fails to initialize, you'll see:
  Failed to create OTLP exporter, tracing disabled: <error details>
- Verify OTLP collector is reachable:
  # For gRPC endpoint (port 4317)
  telnet localhost 4317
  # For HTTP endpoint (port 4318)
  curl http://localhost:4318/v1/traces
- Check sampling - you might be sampling at 0% or using always_off:
  echo $OTEL_TRACES_SAMPLER
  echo $OTEL_TRACES_SAMPLER_ARG
- Verify service name:
  echo $OTEL_SERVICE_NAME
  Search for this service name in your tracing UI (defaults to "kubernetes-mcp-server").
- Check backend configuration - ensure your OTLP collector is forwarding to the right backend.
- Verify protocol compatibility:
  - If using HTTP-based backends, ensure you set OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  - Check if you need port 4317 (gRPC) or 4318 (HTTP)
If using HTTPS/secure endpoints:
- Certificate errors:
  # Provide custom CA certificate
  export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt
- Self-signed certificates:
  # For testing only - not recommended for production
  export OTEL_EXPORTER_OTLP_INSECURE=true
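
The reachability check from the troubleshooting steps can also be scripted. This Python sketch only tests whether a TCP connection to the endpoint's host and port succeeds; it does not speak OTLP or validate TLS, and `collector_reachable` is a hypothetical helper name.

```python
import socket
from urllib.parse import urlparse

def collector_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `collector_reachable("http://localhost:4317")` is a rough equivalent of the telnet check above.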
Tracing has minimal performance overhead:
- Middleware tracing: Typically 1-2ms per tool call
- Network overhead: Spans are batched and exported every 5 seconds
- Memory: Approximately 1-5MB for span buffers
- CPU: Negligible (<1% for most workloads)
For production deployments with high traffic, use ratio-based sampling to reduce costs while maintaining observability.
The OpenTelemetry SDK automatically detects and adds resource attributes from the environment:
- Host information: hostname, OS, architecture
- Process information: PID, executable name
- Container information: container ID (when running in containers)
- Kubernetes information: pod name, namespace (when K8s env vars are present)
These are merged with any attributes you set via OTEL_RESOURCE_ATTRIBUTES.
When the kubernetes-mcp-server is part of a distributed system:
- Parent spans are automatically detected and respected
- Trace context is propagated via standard W3C Trace Context headers
- Sampling decisions from parent spans are inherited (via ParentBased sampler)
This means traces can span multiple services seamlessly.
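
For reference, W3C Trace Context carries the parent span in a `traceparent` header of the form `version-traceid-parentid-flags`. The Python sketch below parses the version `00` layout and reads the sampled flag; `parse_traceparent` is a hypothetical helper, not the propagator the server uses.

```python
import re

TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Parse a traceparent header; return None when it is malformed or invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        # Bit 0 of the flags byte is the sampled flag a ParentBased sampler inherits.
        "sampled": bool(int(flags, 16) & 0x01),
    }
```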
Add custom attributes to help identify and filter traces:
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=staging,team=platform,region=us-west-2,version=v1.2.3"

These attributes appear on all spans from this service instance and are useful for:
- Filtering traces by environment (prod vs staging)
- Analyzing performance by region or deployment
- Tracking issues to specific versions or teams
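
To check what a given OTEL_RESOURCE_ATTRIBUTES value will produce, the comma-separated `key=value` string can be split as in this Python sketch. `parse_resource_attributes` is a hypothetical helper and ignores details such as percent-encoding of values.

```python
def parse_resource_attributes(raw):
    """Split a comma-separated key=value string into a dictionary."""
    attrs = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        # Skip fragments without "=" or with an empty key.
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs
```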