A production-ready Kubernetes operator for managing Valkey clusters, built in Rust using kube-rs.
- High Availability: Automatic cluster formation with configurable masters and replicas
- Secure by Default: Required TLS (via cert-manager) and authentication
- Horizontal Scaling: Add/remove masters with automatic slot migration
- Rolling Upgrades: Zero-downtime upgrades via ValkeyUpgrade CRD with coordinated failover
- Observability: Optional Prometheus metrics exporter sidecar
- ACL Support: Fine-grained access control with Valkey 7+ ACL system
- Panic-Free: All production code paths handle errors gracefully
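
To make the slot-migration feature concrete: a Valkey cluster divides its keyspace into 16384 hash slots, and every master owns a subset of them. The sketch below shows one way to compute an even, contiguous split for a given number of masters. The function name `slot_ranges` and the choice to give the remainder to the first ranges are assumptions for illustration, not the operator's actual algorithm.

```rust
/// Total number of hash slots in a Valkey cluster.
const TOTAL_SLOTS: u16 = 16384;

/// Hypothetical helper: splits the slot space into `masters` contiguous,
/// inclusive ranges. The first `TOTAL_SLOTS % masters` ranges receive one
/// extra slot. Returns an empty Vec when `masters` is 0.
fn slot_ranges(masters: u16) -> Vec<(u16, u16)> {
    if masters == 0 {
        return Vec::new();
    }
    let base = TOTAL_SLOTS / masters;
    let extra = TOTAL_SLOTS % masters;
    let mut ranges = Vec::with_capacity(masters as usize);
    let mut start: u16 = 0;
    for i in 0..masters {
        let len = base + if i < extra { 1 } else { 0 };
        if len == 0 {
            // More masters than slots: the remaining masters own nothing.
            break;
        }
        let end = start + len - 1;
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}
```

Comparing `slot_ranges(3)` with `slot_ranges(4)` shows which slots would have to move when a fourth master is added, which is the work that slot migration automates.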

Prerequisites:

- Kubernetes 1.35+
- Rust 1.92+ (for building from source)
- cert-manager installed in the cluster
- kubectl configured

Install the operator with kubectl:

    # Install CRDs
    kubectl apply -f config/crd/
    # Install RBAC
    kubectl apply -f config/rbac/
    # Deploy the operator
    kubectl apply -f config/deploy/

Or using Helm:

    helm install valkey-operator ./charts/valkey-operator

Create a password secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: valkey-auth
    type: Opaque
    stringData:
      password: "your-secure-password"

Create a ValkeyCluster:

    apiVersion: valkey-operator.smoketurner.com/v1alpha1
    kind: ValkeyCluster
    metadata:
      name: my-cluster
    spec:
      masters: 3
      replicasPerMaster: 1
      tls:
        issuerRef:
          name: my-cluster-issuer
          kind: ClusterIssuer
      auth:
        secretRef:
          name: valkey-auth

Watch the cluster come up:

    kubectl get vc -w

| Field | Description | Default |
|---|---|---|
| `masters` | Number of master nodes (min 3) | 3 |
| `replicasPerMaster` | Replicas per master | 1 |
| `tls.issuerRef` | cert-manager issuer reference | Required |
| `auth.secretRef` | Secret containing password | Required |
| `auth.acl` | ACL configuration for fine-grained access | Disabled |
| `persistence.enabled` | Enable persistent storage | true |
| `persistence.size` | PVC size | 10Gi |
| `readService.enabled` | Enable read-only service | false |
| `metricsExporter.enabled` | Enable Prometheus exporter sidecar | false |
| `replication.disklessSync` | Enable diskless replication | false |
| `replication.minReplicasToWrite` | Minimum replicas for write acknowledgment | 1 (when replicas exist) |

See docs/features.md for detailed configuration options.
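
The defaults and required fields in the configuration table can be sketched as a plain Rust type with a validation function. This is a minimal illustration, not the operator's real CRD type: `ClusterSpec`, `SpecError`, `validate`, and `default_min_replicas_to_write` are hypothetical names, only a few fields are modeled, and the value of 0 for `minReplicasToWrite` when no replicas exist is an assumption.

```rust
/// Hypothetical, simplified mirror of a few `ValkeyCluster` spec fields.
#[derive(Debug, Clone, PartialEq)]
struct ClusterSpec {
    masters: u32,
    replicas_per_master: u32,
    tls_issuer: Option<String>,  // stands in for tls.issuerRef.name
    auth_secret: Option<String>, // stands in for auth.secretRef.name
    persistence_enabled: bool,
    persistence_size: String,
}

impl Default for ClusterSpec {
    /// Defaults taken from the configuration table; required fields start unset.
    fn default() -> Self {
        ClusterSpec {
            masters: 3,
            replicas_per_master: 1,
            tls_issuer: None,
            auth_secret: None,
            persistence_enabled: true,
            persistence_size: "10Gi".to_string(),
        }
    }
}

#[derive(Debug, PartialEq)]
enum SpecError {
    TooFewMasters(u32),
    MissingTlsIssuer,
    MissingAuthSecret,
}

/// Checks the constraints the table states: at least 3 masters,
/// and both TLS and authentication configured.
fn validate(spec: &ClusterSpec) -> Result<(), SpecError> {
    if spec.masters < 3 {
        return Err(SpecError::TooFewMasters(spec.masters));
    }
    if spec.tls_issuer.is_none() {
        return Err(SpecError::MissingTlsIssuer);
    }
    if spec.auth_secret.is_none() {
        return Err(SpecError::MissingAuthSecret);
    }
    Ok(())
}

/// Default for replication.minReplicasToWrite: 1 when replicas exist.
/// Returning 0 otherwise is an assumption of this sketch.
fn default_min_replicas_to_write(spec: &ClusterSpec) -> u32 {
    if spec.replicas_per_master > 0 { 1 } else { 0 }
}
```

Returning a typed error instead of silently applying a fallback keeps a misconfigured cluster from starting without TLS or authentication.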

    apiVersion: valkey-operator.smoketurner.com/v1alpha1
    kind: ValkeyCluster
    metadata:
      name: production-cluster
    spec:
      masters: 6
      replicasPerMaster: 2
      tls:
        issuerRef:
          name: letsencrypt-prod
          kind: ClusterIssuer
      auth:
        secretRef:
          name: valkey-auth
        acl:
          enabled: true
          configSecretRef:
            name: valkey-acl
            key: acl.conf
      persistence:
        enabled: true
        size: 50Gi
        storageClassName: ssd
      resources:
        requests:
          cpu: "1"
          memory: "4Gi"
        limits:
          cpu: "2"
          memory: "8Gi"
      readService:
        enabled: true
      metricsExporter:
        enabled: true
        port: 9121
      replication:
        disklessSync: true
        minReplicasToWrite: 1
        minReplicasMaxLag: 10

Create a ValkeyUpgrade resource to perform zero-downtime upgrades:

    apiVersion: valkey-operator.smoketurner.com/v1alpha1
    kind: ValkeyUpgrade
    metadata:
      name: upgrade-to-9.1
    spec:
      clusterRef:
        name: my-cluster
      targetVersion: "9.1.0"

The operator performs per-shard rolling upgrades:

- Upgrade replicas first
- Wait for replication sync
- Execute coordinated failover
- Upgrade old master (now replica)
- Move to next shard
See docs/state-machines.md for detailed upgrade workflow.
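
The per-shard sequence above can be modeled as a small state machine. The sketch below is illustrative only: the names `ShardStep`, `UpgradeProgress`, `advance`, and `plan` are hypothetical, and the real phases are the ones documented in docs/state-machines.md.

```rust
/// Hypothetical steps for upgrading a single shard, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShardStep {
    UpgradeReplicas,
    WaitForSync,
    Failover,
    UpgradeOldMaster,
}

impl ShardStep {
    /// Returns the following step, or None once the shard is finished.
    fn next(self) -> Option<ShardStep> {
        match self {
            ShardStep::UpgradeReplicas => Some(ShardStep::WaitForSync),
            ShardStep::WaitForSync => Some(ShardStep::Failover),
            ShardStep::Failover => Some(ShardStep::UpgradeOldMaster),
            ShardStep::UpgradeOldMaster => None,
        }
    }
}

/// Position of the upgrade: which shard, and which step within it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UpgradeProgress {
    shard: usize,
    step: ShardStep,
}

/// Advances by one step. Moves to the next shard after the last step,
/// and returns None when every shard has been upgraded.
fn advance(p: UpgradeProgress, total_shards: usize) -> Option<UpgradeProgress> {
    match p.step.next() {
        Some(step) => Some(UpgradeProgress { shard: p.shard, step }),
        None if p.shard + 1 < total_shards => Some(UpgradeProgress {
            shard: p.shard + 1,
            step: ShardStep::UpgradeReplicas,
        }),
        None => None,
    }
}

/// Lists every (shard, step) pair in the order it would be executed.
fn plan(total_shards: usize) -> Vec<UpgradeProgress> {
    let mut out = Vec::new();
    if total_shards == 0 {
        return out;
    }
    let mut current = Some(UpgradeProgress { shard: 0, step: ShardStep::UpgradeReplicas });
    while let Some(p) = current {
        out.push(p);
        current = advance(p, total_shards);
    }
    out
}
```

Because only one shard is in flight at a time and the master is upgraded only after failover, the other shards keep serving traffic throughout.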

| CRD | Purpose |
|---|---|
| `ValkeyCluster` | Deploy and manage Valkey cluster topology |
| `ValkeyUpgrade` | Handle rolling upgrades with failover orchestration |

For each ValkeyCluster, the operator creates:
- StatefulSet: Valkey pods with stable identity
- Headless Service: Cluster discovery
- Client Service: Client access endpoint
- Read Service (optional): Read-only traffic distribution
- PodDisruptionBudget: Maintain quorum
- Certificate: TLS certs via cert-manager
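
As a rough sizing aid for the resources listed above, the following sketch computes how many pods a cluster needs and how many masters form a majority. It assumes one pod per master and per replica, and that "quorum" means a strict majority of masters, which is what Valkey cluster failover voting requires. The function names are hypothetical, and how the operator actually configures its PodDisruptionBudget is not stated here.

```rust
/// Pods needed for a cluster: one per master plus its replicas.
/// Saturates instead of overflowing on absurdly large inputs.
fn total_pods(masters: u32, replicas_per_master: u32) -> u32 {
    masters.saturating_mul(replicas_per_master.saturating_add(1))
}

/// Smallest number of masters that forms a strict majority.
fn master_quorum(masters: u32) -> u32 {
    masters / 2 + 1
}

/// Masters that can be unavailable at once while a majority remains.
fn max_unavailable_masters(masters: u32) -> u32 {
    masters.saturating_sub(master_quorum(masters))
}
```

For the production example with 6 masters and 2 replicas per master, this gives 18 pods, a quorum of 4 masters, and at most 2 masters unavailable at the same time.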

Build, test, and run locally:

    make build             # Build release binary
    make docker-build      # Build Docker image
    make docker-push       # Push Docker image

    make test              # Run unit tests
    make test-integration  # Run integration tests
    make lint              # Run clippy lints

    make install           # Install CRD and RBAC
    make run               # Run operator locally

Documentation:

- State Machines (docs/state-machines.md) - Lifecycle phases and transitions
- Features (docs/features.md) - Detailed feature documentation
- CLAUDE.md - AI assistant instructions

| Target | Description |
|---|---|
| `make build` | Build release binary |
| `make run` | Run operator locally |
| `make test` | Run unit tests |
| `make test-integration` | Run integration tests |
| `make lint` | Run clippy lints |
| `make fmt` | Format code |
| `make install` | Install CRD and RBAC |
| `make deploy` | Deploy to cluster |
| `make docker-build` | Build Docker image |

Design principles:

- Panic-Free: All production code paths handle errors gracefully. Clippy lints deny `unwrap()`, `expect()`, and `panic!()`.
- Idempotent: Every reconcile operation can be safely retried. The operator uses server-side apply for atomic updates.
- Secure by Default: TLS and authentication are required, not optional. The operator enforces security best practices.
- State Machine Driven: Resource lifecycle is managed through formal FSMs with defined phase transitions. See docs/state-machines.md.
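
The panic-free principle can be illustrated with a small parser for a `targetVersion` string such as "9.1.0". Every failure becomes a typed error instead of an `unwrap()` or `panic!()`. This is a sketch under assumptions: `Version`, `VersionError`, `parse_version`, and `is_upgrade` are hypothetical names, and the operator's real version handling may differ. The `clippy::unwrap_used`, `clippy::expect_used`, and `clippy::panic` lints named in the attribute are real Clippy lints and only take effect when Clippy runs.

```rust
/// A three-part version; derived ordering compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum VersionError {
    /// The string did not have exactly three dot-separated parts.
    WrongPartCount(usize),
    /// A part was not a valid unsigned integer.
    NotANumber(String),
}

/// Parses "major.minor.patch" without any panicking call.
#[deny(clippy::unwrap_used, clippy::expect_used, clippy::panic)]
fn parse_version(s: &str) -> Result<Version, VersionError> {
    let parts: Vec<&str> = s.trim().split('.').collect();
    let num = |p: &str| {
        p.parse::<u32>()
            .map_err(|_| VersionError::NotANumber(p.to_string()))
    };
    // Slice patterns avoid indexing, which could panic on short input.
    match parts.as_slice() {
        [major, minor, patch] => Ok(Version {
            major: num(major)?,
            minor: num(minor)?,
            patch: num(patch)?,
        }),
        other => Err(VersionError::WrongPartCount(other.len())),
    }
}

/// True when `target` is strictly newer than `current`.
fn is_upgrade(current: Version, target: Version) -> bool {
    target > current
}
```

A reconcile loop built this way can report a malformed version in the resource status and retry later, rather than crashing the operator process.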

License: MIT