This file provides guidance to AI coding assistants (Claude Code, GitHub Copilot, etc.) when working with code in this repository.
- make build - Compiles the cluster-health-analyzer binary
- make test - Runs unit tests
- make test-verbose - Runs unit tests with verbose output
- make precommit - Execute linting and testing (run before submitting PRs)
- make lint - Run golangci-lint with project configuration
- make generate - Run Go code generation
- make proxy - Port-forward to thanos-querier for local development
- make run - Execute cluster-health-analyzer server with disabled authentication
  - Listens on https://localhost:8443/metrics
  - Requires proxy to be running for Thanos access
- make run-mcp - Run the MCP (Model Context Protocol) server locally
- make simulate - Generate test data and create metrics file from CSV
- make deploy - Deploy services to a cluster (requires oc login)
- make undeploy - Remove services from the cluster
The cluster-health-analyzer binary has three main subcommands:
Main server mode (cmd/serve/) that analyzes cluster health in real time:
- Connects to Thanos querier to fetch cluster alerts
- Processes alerts into incident groups
- Maps alerts to high-level components
- Exposes Prometheus metrics at the /metrics endpoint
- Authentication can be disabled for testing with the --disable-auth-for-testing flag
Development tool (cmd/simulate/) for generating test data:
- Reads alert definitions from CSV file
- Creates simulated Prometheus metrics
- Useful for testing without live cluster
- See development.md for CSV format details
Model Context Protocol server (cmd/mcp/) for AI integration:
- Provides structured interface for AI assistants
- Exposes cluster health analysis capabilities
cmd/ - CLI commands and entry points
- cmd/serve/ - Main server implementation
- cmd/simulate/ - Test data generation
- cmd/mcp/ - MCP server implementation
pkg/health/ - Core health analysis logic
- Alert processing and grouping (incident detection)
- Component mapping and ranking
- Kubernetes health checking
- Alert matching rules
pkg/processor/ - Data processing pipeline
- Transforms raw alerts into structured health data
- Applies heuristics for root cause analysis
pkg/prom/ - Prometheus integration
- Metrics exposition
- Thanos querier client
- Alert fetching and parsing
pkg/alertmanager/ - Alertmanager integration
- Alert silence detection and handling
pkg/server/ - HTTP server and API
- Metrics endpoint
- Authentication handling
- TLS configuration (self-signed certificates for local dev)
pkg/mcp/ - Model Context Protocol implementation
- AI assistant integration layer
pkg/common/ - Shared utilities
- Label processing
- Common constants and types
pkg/utils/ - General utilities
- Helper functions
- Testing utilities
pkg/test/ - Test helpers and fixtures
- Shared test data
- Mock implementations
The analyzer exposes two main Prometheus metrics:
- cluster_health_components_map
  - Maps individual alerts to components
  - Labels include: alert name, namespace, severity, silencing status, incident group ID
- cluster_health_components
  - Provides component metadata and ranking
  - Shows component layer (e.g., "compute", "storage")
  - Includes importance ranking for prioritization
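To make the shape of such a series concrete, the sketch below renders one sample in the Prometheus text exposition format using only the standard library. The label names (alertname, namespace, severity, silenced, group_id) and all values are assumptions chosen for illustration; the real label names are defined in the project code and may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders a single sample as one line of Prometheus text
// exposition format. Labels are sorted by name so the output is stable.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q applies Go string quoting, which matches the exposition
		// format for plain ASCII label values like the ones used here.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	// Hypothetical label names and values, for illustration only.
	labels := map[string]string{
		"alertname": "KubePodCrashLooping",
		"namespace": "openshift-monitoring",
		"severity":  "warning",
		"silenced":  "false",
		"group_id":  "example-group",
	}
	fmt.Println(formatSample("cluster_health_components_map", labels, 1))
}
```

In the project itself, metrics are presumably registered through Prometheus client_golang rather than formatted by hand; the sketch only shows what a scraped line could look like.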
Always run before creating pull requests:
make precommit # Runs linting and tests
- Test files follow Go convention: *_test.go
- Use table-driven tests where appropriate
- Mock external dependencies (Kubernetes, Prometheus) for unit tests
- Test utilities available in pkg/test/ and pkg/utils/
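As a reference for the table-driven style mentioned above, here is a minimal sketch of a *_test.go file. The function under test, severityRank, is a hypothetical helper invented for this example and is not part of the repository.

```go
package main

import "testing"

// severityRank is a hypothetical helper used only to illustrate the test
// layout; it maps an alert severity to a numeric rank.
func severityRank(severity string) int {
	switch severity {
	case "critical":
		return 2
	case "warning":
		return 1
	default:
		return 0
	}
}

func TestSeverityRank(t *testing.T) {
	// Each row is one case: a name for the subtest, the input, and the
	// expected result.
	tests := []struct {
		name     string
		severity string
		want     int
	}{
		{name: "critical", severity: "critical", want: 2},
		{name: "warning", severity: "warning", want: 1},
		{name: "unknown falls back to zero", severity: "info", want: 0},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := severityRank(tt.severity); got != tt.want {
				t.Errorf("severityRank(%q) = %d, want %d", tt.severity, got, tt.want)
			}
		})
	}
}
```

Adding a case is a one-line change to the table, and t.Run reports each row as its own subtest.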
- Log in to the OpenShift cluster: oc login
- Start Thanos proxy: make proxy
- In a separate terminal, run the server: make run
- Access metrics at: https://localhost:8443/metrics
For development without a live cluster:
- Create CSV file with alert definitions (see development.md)
- Run: make simulate
- Use promtool to convert metrics to TSDB
- Copy to cluster Prometheus with kubectl cp
- Production: Uses Kubernetes service account via $KUBECONFIG
- Testing: Disable with the --disable-auth-for-testing flag
- Server expects Thanos querier available via port-forward
- Default: http://localhost:9090 (configured via proxy)
- Queries alerts using PromQL
- Local development uses self-signed certificates
- Metrics endpoint always served over HTTPS
- Certificate handling in pkg/server/
- Go version: 1.24
- Main dependencies:
  - Kubernetes client-go and apimachinery
  - Prometheus client_golang and alertmanager
  - OpenShift API libraries
  - Cobra for CLI framework
- Run go mod tidy and go mod vendor to update dependencies
- Follow standard Go formatting (gofmt)
- Import groups: stdlib, external dependencies, current project
- Use golangci-lint rules defined in project configuration
- Keep functions focused and testable