A Claude Code plugin for monitoring etcd cluster health and analyzing performance in OpenShift environments.
This plugin provides commands to help diagnose and troubleshoot etcd-related issues in OpenShift clusters. Etcd is the critical distributed key-value store that holds all cluster state for Kubernetes/OpenShift, and maintaining its health and performance is essential for cluster stability.
Performs a comprehensive health check of the etcd cluster, examining:
- Etcd pod status and availability
- Cluster health and member status
- Leadership election status
- Database size and fragmentation
- Disk space utilization
- Recent error logs
- Performance metrics (with the --verbose flag)
Usage:
/etcd:health-check [--verbose]
Example:
/etcd:health-check
/etcd:health-check --verbose
Analyzes etcd performance metrics to identify latency issues and bottlenecks, including:
- Disk I/O performance (commit latency, fsync duration)
- Network latency between etcd peers
- Request/response performance by operation type
- Leader stability and proposal metrics
- Database size and fragmentation
- Performance warnings from logs
Usage:
/etcd:analyze-performance [--duration <minutes>]
Example:
/etcd:analyze-performance
/etcd:analyze-performance --duration 15
All commands require:
- OpenShift CLI (oc) - Install from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/
- Active cluster connection - Must be authenticated to an OpenShift cluster
- Cluster admin permissions - Required to access etcd pods and metrics
- Running etcd pods - At least one etcd pod must be running
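The prerequisites above can be verified before running either command. The following is a minimal sketch, not part of the plugin: the function name `etcd_preflight` is hypothetical, the `k8s-app=etcd` label selector is an assumption about how etcd pods are labeled in your cluster, and the permission check only confirms read access to pods in the `openshift-etcd` namespace, which is weaker than full cluster-admin.

```shell
# Hypothetical helper: checks the prerequisites listed above, in order.
etcd_preflight() {
  # 1. The oc binary must be on PATH.
  command -v oc >/dev/null 2>&1 || { echo "oc not found"; return 1; }
  # 2. There must be an authenticated session.
  oc whoami >/dev/null 2>&1 || { echo "not logged in"; return 1; }
  # 3. The current user must be able to read pods in the etcd namespace.
  oc auth can-i get pods -n openshift-etcd >/dev/null 2>&1 \
    || { echo "insufficient permissions"; return 1; }
  # 4. At least one etcd pod must be Running (label selector is an assumption).
  running=$(oc get pods -n openshift-etcd -l k8s-app=etcd \
    --field-selector=status.phase=Running --no-headers 2>/dev/null | grep -c . || true)
  [ "$running" -ge 1 ] || { echo "no running etcd pods"; return 1; }
  echo "ok"
}
```

Calling `etcd_preflight` prints `ok` when all four checks pass, or the first failing prerequisite otherwise.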
# Add the marketplace (if not already added)
/plugin marketplace add openshift-eng/ai-helpers
# Install the etcd plugin
/plugin install etcd@ai-helpers

# Clone the repository
git clone https://github.com/openshift-eng/ai-helpers.git
# Link to your Claude Code plugins directory
ln -s "$(pwd)/ai-helpers/plugins/etcd" ~/.claude/plugins/etcd

When experiencing cluster-wide problems:
- Run /etcd:health-check to verify etcd cluster status
- If issues are found, run /etcd:analyze-performance to identify bottlenecks
- Follow the recommendations provided in the output
For proactive performance monitoring:
- Run /etcd:analyze-performance --duration 30 for comprehensive analysis
- Review disk I/O and network latency metrics
- Compare against recommended thresholds
- Implement suggested optimizations
Before scaling operations:
- Check current database size with /etcd:health-check
- Analyze performance trends with /etcd:analyze-performance
- Identify if hardware upgrades are needed
Problem: Backend commit P99 > 100ms or WAL fsync P99 > 10ms
Solutions:
- Migrate to SSD or NVMe storage
- Use dedicated disks for etcd (not shared with OS)
- Check for competing I/O workloads
Problem: Leader changes > 5
Solutions:
- Check network connectivity between etcd nodes
- Ensure nodes are in same datacenter/availability zone
- Verify no clock skew between nodes
Problem: Database size > 8GB or high fragmentation
Solutions:
- Run etcd defragmentation
- Review event retention policies
- Check for excessive key creation
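One common way to quantify "high fragmentation" is the share of the on-disk database file that is allocated but not in use. The sketch below assumes you already have the two byte counts (for example, the `dbSize` and `dbSizeInUse` fields that recent `etcdctl endpoint status -w json` output reports); the function name `fragmentation_pct` is hypothetical, and the plugin may compute this differently.

```shell
# Hypothetical helper: fragmentation_pct TOTAL_BYTES IN_USE_BYTES
# Prints the percentage of the database file that is allocated but unused,
# rounded down to a whole number.
fragmentation_pct() {
  total=$1
  in_use=$2
  # Avoid division by zero for an empty or unreported database size.
  [ "$total" -gt 0 ] || { echo 0; return 0; }
  echo $(( (total - in_use) * 100 / total ))
}
```

For instance, `fragmentation_pct 8000000000 4000000000` reports that half of the file is reclaimable space, which defragmentation would return to the filesystem.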
Recommended thresholds for healthy etcd:
- Backend commit P99: < 100ms
- WAL fsync P99: < 10ms
- Peer RTT P99: < 50ms
- Leader changes: < 5 total
- Database size: < 8GB
- Disk usage: < 80%
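If you collect these values yourself, comparing them against the thresholds above is straightforward to script. This is a minimal sketch under the assumption that each metric has already been reduced to a single number in the unit shown in the list; `check_threshold` and the metric names passed to it are hypothetical, not something the plugin exposes.

```shell
# Hypothetical helper: check_threshold NAME VALUE LIMIT
# Prints "OK NAME" when VALUE is strictly below LIMIT, "WARN NAME" otherwise.
check_threshold() {
  name=$1
  value=$2
  limit=$3
  # awk is used because plain shell arithmetic cannot compare decimals.
  if awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v + 0 < l + 0) }'; then
    echo "OK $name"
  else
    echo "WARN $name"
  fi
}
```

For example, `check_threshold backend_commit_p99_ms 42 100` falls within the recommended range, while `check_threshold wal_fsync_p99_ms 12.5 10` is flagged. A value exactly equal to the limit is also flagged, because the thresholds are written as strict upper bounds.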
- Commands require cluster-admin or equivalent permissions
- Access to etcd allows viewing all cluster secrets
- Metrics and logs may contain sensitive information
- Performance data should be treated as confidential
- Etcd Documentation: https://etcd.io/docs/
- OpenShift Etcd Docs: https://docs.openshift.com/container-platform/latest/backup_and_restore/control_plane_backup_and_restore/
- Performance Tuning: https://etcd.io/docs/latest/tuning/
To contribute improvements or report issues:
- Visit https://github.com/openshift-eng/ai-helpers
- Open an issue or pull request
- Follow the contribution guidelines in the repository
This plugin is part of the ai-helpers project and follows the same license terms.