This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Configuration Anomaly Detection (CAD) is a Go-based system that reduces manual SRE effort by pre-investigating alerts, detecting cluster anomalies, and sending relevant communications to cluster owners. It integrates with PagerDuty webhooks and uses Tekton pipelines for automated remediation.
- make build - Build all subprojects (cadctl and interceptor)
- make build-cadctl - Build only the cadctl binary to ./bin/cadctl
- make build-interceptor - Build only the interceptor binary to ./bin/interceptor
- make test - Run all tests for both cadctl and interceptor
- make test-cadctl - Run unit tests for cadctl and pkg modules
- make test-interceptor - Run unit tests for interceptor
- make test-interceptor-e2e - Run e2e tests for interceptor
- make lint - Lint all subprojects
- make lint-cadctl - Lint cadctl using golangci-lint
- make lint-interceptor - Lint interceptor using golangci-lint
- make generate-cadctl - Generate mocks for cadctl using mockgen
For testing against clusters:
- Create a test cluster - Manual tests requiring cluster ID need an actual cluster to be created first
- ./test/generate_incident.sh <alertname> <clusterid> - Create test incident payload with the cluster ID
- source test/set_stage_env.sh - Export required environment variables from vault
- ./bin/cadctl investigate --payload-path payload - Run investigation
Note: Tests that require a cluster ID (like manual tests using shell scripts) need you to create a cluster first and provide its ID. Only then can you trigger the PagerDuty alert for that cluster to have local CAD run an investigation on it.
cadctl - CLI tool implementing alert investigations and remediations
- Entry point: cadctl/main.go
- Commands: cadctl/cmd/
- Investigations registry: pkg/investigations/registry.go
interceptor - Tekton interceptor for webhook filtering
- Entry point: interceptor/main.go
- Filters PagerDuty webhooks and validates signatures
- Determines if alerts have implemented handlers
investigations - Modular alert investigation implementations
- Location: pkg/investigations/
- Each investigation implements the Investigation interface
- Investigations include: chgm, ccam, clustermonitoringerrorbudgetburn, etc.
Investigations follow a consistent pattern:
- Implement Investigation interface from pkg/investigations/investigation/investigation.go
- Include metadata.yaml for RBAC permissions
- Testing directory with manual test procedures
- Auto-registered in pkg/investigations/registry.go
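The pattern above can be pictured with a small Go sketch. Everything in it is illustrative: the method set on Investigation, the Resources and Result types, and the Register/Lookup helpers are assumptions made for this example, not the definitions in pkg/investigations/investigation/investigation.go or pkg/investigations/registry.go.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for the pre-initialized clients
// that an investigation receives; the real type is richer.
type Resources struct {
	ClusterID string
}

// Result is a hypothetical summary of what an investigation found.
type Result struct {
	Notes string
}

// Investigation is an assumed shape for the interface, used only to
// illustrate the pattern.
type Investigation interface {
	Name() string
	Run(r *Resources) (Result, error)
}

// registry maps investigation names to their implementations.
var registry = map[string]Investigation{}

// Register adds an investigation to the registry under its own name.
func Register(inv Investigation) {
	registry[inv.Name()] = inv
}

// Lookup returns the investigation registered under name, or an error
// if no handler has been implemented for it.
func Lookup(name string) (Investigation, error) {
	inv, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("no investigation registered for %q", name)
	}
	return inv, nil
}

// exampleInvestigation is a placeholder implementation.
type exampleInvestigation struct{}

func (exampleInvestigation) Name() string { return "example" }

func (exampleInvestigation) Run(r *Resources) (Result, error) {
	return Result{Notes: "checked cluster " + r.ClusterID}, nil
}

func main() {
	Register(exampleInvestigation{})

	inv, err := Lookup("example")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	res, err := inv.Run(&Resources{ClusterID: "test-cluster"})
	if err != nil {
		fmt.Println("investigation failed:", err)
		return
	}
	fmt.Println(res.Notes)
}
```

Keeping the lookup in one place means a caller only needs an alert-to-name mapping to decide whether a handler exists.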
Pre-initialized clients available in investigation resources:
- AWS (pkg/aws) - Instance info, CloudTrail events
- OCM (pkg/ocm) - Cluster info, service logs, limited support reasons
- PagerDuty (pkg/pagerduty) - Alert info, incident management, notes
- K8s (pkg/k8s) - Kubernetes API client
- osd-network-verifier (pkg/networkverifier) - Network verification
- PagerDuty webhook → Tekton EventListener
- Interceptor validates and filters webhooks
- If handler exists → PipelineRun starts
- Pipeline executes cadctl investigate
- Investigation runs and posts results to PagerDuty
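The validate-and-filter step in this flow can be sketched in Go as follows. This is not the interceptor's code. It assumes PagerDuty's v3 webhook scheme, in which the signature header carries comma-separated v1=<hex HMAC-SHA256 of the body> entries. The function names, the handler map, and the alert name are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign returns a "v1=<hex>" signature for body under secret.
// It is here only so the example can produce a valid header.
func sign(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "v1=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature reports whether any "v1=" entry in header matches the
// HMAC-SHA256 of body computed with secret.
func validSignature(secret string, body []byte, header string) bool {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	expected := mac.Sum(nil)

	for _, part := range strings.Split(header, ",") {
		part = strings.TrimSpace(part)
		if !strings.HasPrefix(part, "v1=") {
			continue
		}
		got, err := hex.DecodeString(strings.TrimPrefix(part, "v1="))
		if err != nil {
			continue
		}
		// hmac.Equal compares in constant time.
		if hmac.Equal(got, expected) {
			return true
		}
	}
	return false
}

// shouldStartPipeline mirrors the flow: the webhook must carry a valid
// signature and the alert must have an implemented handler.
func shouldStartPipeline(secret string, body []byte, header, alert string, handlers map[string]bool) bool {
	return validSignature(secret, body, header) && handlers[alert]
}

func main() {
	secret := "example-secret" // hypothetical; see PD_SIGNATURE for the real setting
	body := []byte(`{"event":{"event_type":"incident.triggered"}}`)
	handlers := map[string]bool{"ExampleAlert": true} // hypothetical alert name

	header := sign(secret, body)
	fmt.Println("known alert:", shouldStartPipeline(secret, body, header, "ExampleAlert", handlers))
	fmt.Println("unknown alert:", shouldStartPipeline(secret, body, header, "OtherAlert", handlers))
}
```

Rejecting unsigned or unhandled webhooks at this stage means no PipelineRun is created for alerts CAD cannot investigate.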
- make bootstrap-investigation - Generates boilerplate code and directory structure
- Implement investigation logic in generated files
- Add test objects/scripts to testing/ directory
- Update investigation-specific README with testing procedures
- Follow progressive deployment: Informing Stage (read-only) → Actioning Stage (read/write)
For local development (available via source test/set_stage_env.sh):
- CAD_OCM_CLIENT_ID, CAD_OCM_CLIENT_SECRET, CAD_OCM_URL - OCM client configuration
- CAD_PD_EMAIL, CAD_PD_PW, CAD_PD_TOKEN, CAD_PD_USERNAME - PagerDuty authentication
- CAD_SILENT_POLICY - PagerDuty silent policy
- PD_SIGNATURE - PagerDuty webhook signature validation
- BACKPLANE_URL, BACKPLANE_INITIAL_ARN - Backplane access
- CAD_PROMETHEUS_PUSHGATEWAY - Metrics endpoint
Optional:
- BACKPLANE_PROXY - Required for local development
- CAD_EXPERIMENTAL_ENABLED=true - Enable experimental investigations
- LOG_LEVEL - Logging level (default: info)
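Before running cadctl locally, it can help to confirm that the non-optional variables listed above are exported. The Go sketch below is a standalone helper written for illustration and is not part of the repository. The missingVars function is hypothetical, and the variable names are copied from the list above.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars lists the non-optional variables named in this document.
var requiredVars = []string{
	"CAD_OCM_CLIENT_ID", "CAD_OCM_CLIENT_SECRET", "CAD_OCM_URL",
	"CAD_PD_EMAIL", "CAD_PD_PW", "CAD_PD_TOKEN", "CAD_PD_USERNAME",
	"CAD_SILENT_POLICY", "PD_SIGNATURE",
	"BACKPLANE_URL", "BACKPLANE_INITIAL_ARN",
	"CAD_PROMETHEUS_PUSHGATEWAY",
}

// missingVars returns the required variables that are unset or empty.
// The lookup function is injected so the check can be exercised without
// touching the real environment.
func missingVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range requiredVars {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingVars(os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("all required environment variables are set")
}
```

If anything is reported missing, sourcing test/set_stage_env.sh again is the first thing to try.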