A Kubernetes controller that automatically manages the controller.kubernetes.io/pod-deletion-cost annotation on pods. This annotation influences which pods are terminated first during scale-down operations, enabling smarter and more resilient downscaling behavior.
The controller is designed to be extensible with a plugin-based architecture, allowing multiple algorithms for calculating pod deletion costs. Currently, it includes a zone-aware distribution algorithm that ensures even pod termination across availability zones.
When Kubernetes scales down a Deployment or ReplicaSet, it uses a specific algorithm to determine which pods to delete first. The default selection criteria, in order of precedence, are:
- Pods that are unassigned (not scheduled to a node)
- Pods in Pending or Unknown phase
- Pods that are not ready (deleted before ready pods)
- Pods with lower pod-deletion-cost annotation value
- Pods with more recent creation timestamps
- Random selection as a tiebreaker
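As a rough mental model, this ranking can be sketched in Go. The sketch below is a simplification covering only the criteria listed above (the real Kubernetes controller code also weighs factors such as restart counts and pod colocation), and the types are illustrative:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pod is an illustrative stand-in for the fields the ranking looks at.
type pod struct {
	name      string
	scheduled bool // false if not assigned to a node
	ready     bool
	cost      int32 // controller.kubernetes.io/pod-deletion-cost (default 0)
	created   time.Time
}

// deleteBefore reports whether a should be deleted before b,
// following the criteria in the list above.
func deleteBefore(a, b pod) bool {
	if a.scheduled != b.scheduled {
		return !a.scheduled // unscheduled pods go first
	}
	if a.ready != b.ready {
		return !a.ready // not-ready pods go before ready ones
	}
	if a.cost != b.cost {
		return a.cost < b.cost // lower deletion cost goes first
	}
	return a.created.After(b.created) // newer pods go first
}

func main() {
	now := time.Now()
	pods := []pod{
		{"old-ready", true, true, 0, now.Add(-time.Hour)},
		{"new-ready", true, true, 0, now},
		{"not-ready", true, false, 0, now},
	}
	sort.Slice(pods, func(i, j int) bool { return deleteBefore(pods[i], pods[j]) })
	fmt.Println("deletion order:", pods[0].name, pods[1].name, pods[2].name)
	// deletion order: not-ready new-ready old-ready
}
```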
The issue: Without explicit pod-deletion-cost annotations, Kubernetes may delete pods unevenly across availability zones during scale-down. This can lead to:
- Imbalanced zone distribution - One zone might lose significantly more pods than others
- Reduced resilience - Workloads become vulnerable to zone failures
- Topology constraint violations - Even with `topologySpreadConstraints`, scale-down doesn't respect zone distribution
For example, if you have 6 pods spread across 3 zones (2 per zone) and scale down to 3 pods, Kubernetes might delete 2 pods from Zone A and 1 from Zone B, leaving you with an unbalanced distribution (0 in Zone A, 1 in Zone B, 2 in Zone C).
Kubernetes provides the `controller.kubernetes.io/pod-deletion-cost` annotation (beta and enabled by default since v1.22) to influence pod deletion order:
- Lower values = Higher deletion priority (deleted first)
- Higher values = Lower deletion priority (deleted last)
- Valid range: -2147483648 to 2147483647
This controller automatically assigns these values based on configurable algorithms, ensuring predictable and resilient scale-down behavior.
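For a sense of the mechanics, here is a minimal client-go sketch of applying the annotation to a pod. The function name and wiring are assumptions for illustration, not the controller's actual code:

```go
package annotate

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// setDeletionCost patches one pod's controller.kubernetes.io/pod-deletion-cost
// annotation. Annotation values are strings, so the cost is formatted as text.
func setDeletionCost(ctx context.Context, c kubernetes.Interface, namespace, podName string, cost int32) error {
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"controller.kubernetes.io/pod-deletion-cost":"%d"}}}`, cost))
	_, err := c.CoreV1().Pods(namespace).Patch(
		ctx, podName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```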
- Kubernetes ReplicaSet - Pod Deletion Cost
- KEP-2255: Pod Deletion Cost
- Understanding K8s Scale-In Algorithm
- Descheduler - TopologySpreadConstraint
The zone algorithm (default) ensures even pod distribution across availability zones during scale-down.
- Pod Detection - Controller watches for pods belonging to enabled Deployments
- Zone Identification - Determines the pod's zone from its node's `topology.kubernetes.io/zone` label
- Cost Calculation - Assigns unique deletion costs within each zone, starting from MaxInt32 (2147483647) and descending
- Annotation - Applies `controller.kubernetes.io/pod-deletion-cost` to the pod
Within each zone, pods receive descending cost values:
- First pod in zone: `2147483647` (most protected)
- Second pod in zone: `2147483646`
- And so on...
Different zones independently allocate their own cost values. This ensures that during scale-down, Kubernetes removes pods evenly across zones.
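A minimal Go sketch of this cost assignment, assuming pods have already been mapped to zones; the types and function names are illustrative, not the project's actual code:

```go
package main

import (
	"fmt"
	"math"
)

type podInfo struct {
	Name string
	Zone string // from the node's topology.kubernetes.io/zone label
}

// assignCosts returns a deletion cost per pod, allocated independently per
// zone, starting at MaxInt32 and descending.
func assignCosts(pods []podInfo) map[string]int32 {
	costs := make(map[string]int32, len(pods))
	next := map[string]int32{} // next cost to hand out in each zone
	for _, p := range pods {
		if _, ok := next[p.Zone]; !ok {
			next[p.Zone] = math.MaxInt32
		}
		costs[p.Name] = next[p.Zone]
		next[p.Zone]-- // second pod in the zone gets MaxInt32-1, and so on
	}
	return costs
}

func main() {
	pods := []podInfo{
		{"pod1", "zone-a"}, {"pod2", "zone-a"},
		{"pod3", "zone-b"}, {"pod4", "zone-b"},
	}
	for name, cost := range assignCosts(pods) {
		fmt.Printf("%s -> %d\n", name, cost)
	}
}
```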
Initial state: 6 pods across 3 zones

```
Zone A: Pod1 (cost: 2147483647), Pod2 (cost: 2147483646)
Zone B: Pod3 (cost: 2147483647), Pod4 (cost: 2147483646)
Zone C: Pod5 (cost: 2147483647), Pod6 (cost: 2147483646)
```
After scaling down to 3 pods:

Kubernetes deletes the pods with the lowest costs first. Each zone has exactly one pod with cost 2147483646, so one pod is removed from each zone:

```
Zone A: Pod1 (cost: 2147483647)
Zone B: Pod3 (cost: 2147483647)
Zone C: Pod5 (cost: 2147483647)
```
Result: Even distribution maintained across all zones.
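The outcome can also be checked mechanically. A tiny Go sketch, under the simplifying assumption that scale-down removes exactly the N lowest-cost pods:

```go
package main

import (
	"fmt"
	"sort"
)

type pod struct {
	name string
	zone string
	cost int32
}

func main() {
	pods := []pod{
		{"Pod1", "A", 2147483647}, {"Pod2", "A", 2147483646},
		{"Pod3", "B", 2147483647}, {"Pod4", "B", 2147483646},
		{"Pod5", "C", 2147483647}, {"Pod6", "C", 2147483646},
	}
	// Scale down 6 -> 3: the three lowest-cost pods are deleted first.
	sort.Slice(pods, func(i, j int) bool { return pods[i].cost < pods[j].cost })
	for _, p := range pods[3:] { // survivors: one per zone
		fmt.Printf("zone %s keeps %s\n", p.zone, p.name)
	}
}
```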
```bash
VERSION=v0.0.0-alpha.2

# Pull the chart (optional)
helm pull oci://ghcr.io/lablabs/pod-deletion-cost-controller/pod-deletion-cost-controller \
  --version ${VERSION}

# Install
helm upgrade --install pod-deletion-cost-controller \
  oci://ghcr.io/lablabs/pod-deletion-cost-controller/pod-deletion-cost-controller \
  --namespace operations \
  --create-namespace \
  --version ${VERSION}
```

Key configuration options in `values.yaml`:
```yaml
# Algorithms to enable
algorithms:
  - "zone"

# Logging configuration
log:
  devel: false
  encoder: console # or "json"
  level: 3 # 0=debug, 1=info, 2=warn, 3=error

# Health probes
health:
  enabled: true
  port: 8001

# Metrics
metrics:
  enabled: true
  service:
    ports:
      metrics: 9000

# High availability
replicaCount: 1
pdb:
  enabled: false
  maxUnavailable: 1
```

Add the `pod-deletion-cost.lablabs.io/enabled: "true"` annotation to your Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    pod-deletion-cost.lablabs.io/enabled: "true"
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:latest
```

| Annotation | Required | Default | Description |
|---|---|---|---|
| `pod-deletion-cost.lablabs.io/enabled` | Yes | - | Set to `"true"` to enable the controller |
| `pod-deletion-cost.lablabs.io/type` | No | `zone` | Algorithm type to use |
| `pod-deletion-cost.lablabs.io/spread-by` | No | `topology.kubernetes.io/zone` | Node label key for topology spreading |
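To make the defaults concrete, here is a hypothetical Go sketch of how these annotations could be resolved; the constant and type names are illustrative, not the project's actual code:

```go
package main

import "fmt"

// Annotation keys from the table above; defaults per that table.
const (
	annoEnabled  = "pod-deletion-cost.lablabs.io/enabled"
	annoType     = "pod-deletion-cost.lablabs.io/type"
	annoSpreadBy = "pod-deletion-cost.lablabs.io/spread-by"
)

type config struct {
	Algorithm string // which cost algorithm to run
	SpreadBy  string // node label key used for grouping
}

// resolve reports whether the Deployment opted in, and with what settings.
func resolve(annotations map[string]string) (config, bool) {
	if annotations[annoEnabled] != "true" {
		return config{}, false // not opted in; the controller leaves it alone
	}
	cfg := config{Algorithm: "zone", SpreadBy: "topology.kubernetes.io/zone"}
	if v, ok := annotations[annoType]; ok {
		cfg.Algorithm = v
	}
	if v, ok := annotations[annoSpreadBy]; ok {
		cfg.SpreadBy = v
	}
	return cfg, true
}

func main() {
	cfg, enabled := resolve(map[string]string{annoEnabled: "true"})
	fmt.Println(enabled, cfg) // true {zone topology.kubernetes.io/zone}
}
```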
For on-premises or custom environments, you can specify a different node label for topology spreading:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    pod-deletion-cost.lablabs.io/enabled: "true"
    pod-deletion-cost.lablabs.io/spread-by: "topology.kubernetes.io/rack"
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/rack
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:latest
```

While `zone` is the default algorithm, you can explicitly specify it:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    pod-deletion-cost.lablabs.io/enabled: "true"
    pod-deletion-cost.lablabs.io/type: "zone"
```

The controller uses an extensible plugin-based architecture, making it easy to add new algorithms for different use cases. We welcome contributions!
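As a rough illustration, an algorithm plugin could be modeled as a small Go interface like the one below; this is a hypothetical sketch, not the project's actual API (see CONTRIBUTING.md for the real extension points):

```go
package algorithm

import corev1 "k8s.io/api/core/v1"

// Algorithm computes deletion costs for the pods of one enabled Deployment.
// Interface and method names here are illustrative only.
type Algorithm interface {
	// Name is the value matched against pod-deletion-cost.lablabs.io/type.
	Name() string
	// Costs returns the desired deletion cost for each pod, keyed by pod name.
	Costs(pods []corev1.Pod) (map[string]int32, error)
}
```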
See CONTRIBUTING.md for:
- Architecture overview and key components
- Step-by-step guide for adding new algorithms
- Development setup and build instructions
- Testing requirements
- Code style guidelines
Apache License 2.0 - see LICENSE for details.
