Summary
I have 3 nodes: dmhost1, dmhost2, and dmhost3. In this example I've powered down dmhost1.
On MicroK8s 1.34 and 1.35, a powered‑off node can remain Ready=True indefinitely (pods still show Running) even though the node’s Lease renewTime is stale (kubelet offline). The problem resolves immediately after restarting microk8s.daemon-kubelite on a healthy node, suggesting a kubelite/controller‑manager watch or write‑path stall in the node‑lifecycle reconciliation.
This is reproducible and appears to be a regression (not observed in earlier MicroK8s releases, e.g. 1.32).
To me this looks like a kubelite-level stall, or a controller that is stuck and no longer watching Leases.
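A quick way to compare the node's reported Ready condition against its Lease renew time (a minimal sketch; commands and jsonpath expressions assumed, node name taken from this example):

```shell
# Ready condition as reported by the API server
microk8s kubectl get node dmhost1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'

# Last heartbeat written by the kubelet into its node Lease
microk8s kubectl get lease -n kube-node-lease dmhost1 -o jsonpath='{.spec.renewTime}{"\n"}'
```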
Environment
MicroK8s channels: 1.35/stable (also reproducible on 1.34/stable)
Kubernetes: v1.35.0 (server & client)
HA: Yes — dqlite with 3 voters
OS: Ubuntu 22.04.5 LTS (kernel 5.15.0-164-generic)
Container runtime: containerd 2.1.3
CNI: Calico
API service endpoints:
default/kubernetes -> 10.173.128.165:16443, 10.173.128.166:16443
MicroK8s status
high-availability: yes
datastore master nodes:
10.173.128.164:19001
10.173.128.166:19001
10.173.128.165:19001
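For reference, the HA and datastore summary above is taken from the MicroK8s status output:

```shell
# Show high-availability status and datastore (dqlite) voter nodes
microk8s status
```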
Nodes (example):
NAME STATUS ROLES VERSION INTERNAL-IP
dmhost1 Ready v1.35.0 10.173.128.164
dmhost2 Ready v1.35.0 10.173.128.165
dmhost3 Ready v1.35.0 10.173.128.166
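The node table above was gathered with a standard node listing (assumed invocation):

```shell
# List nodes with status, version, and internal IP
microk8s kubectl get nodes -o wide
```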
Log Snippets:
labuser@dmhost2:~$ date
Wed 4 Feb 12:01:18 UTC 2026
labuser@dmhost2:~$ ping dmhost1
PING dmhost1 (10.173.128.164) 56(84) bytes of data.
From dmhost2 (10.173.128.165) icmp_seq=1 Destination Host Unreachable
From dmhost2 (10.173.128.165) icmp_seq=2 Destination Host Unreachable
From dmhost2 (10.173.128.165) icmp_seq=3 Destination Host Unreachable
^C
--- dmhost1 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3029ms
pipe 3
labuser@dmhost2:~$ microk8s kubectl get lease -n kube-node-lease dmhost1 -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2026-01-19T19:03:15Z"
  name: dmhost1
  namespace: kube-node-lease
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: dmhost1
    uid: 9dfa6bcc-5c7b-4405-b648-c4f5edca51b4
  resourceVersion: "7004700"
  uid: 6b13c86b-c57c-4056-a435-760351b52d7f
spec:
  holderIdentity: dmhost1
  leaseDurationSeconds: 40
  renewTime: "2026-02-04T10:46:27.257957Z"
Lease:
  HolderIdentity:  dmhost1
  AcquireTime:
  RenewTime:       Wed, 04 Feb 2026 10:46:27 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  NetworkUnavailable   False   Mon, 02 Feb 2026 18:43:35 +0000   Mon, 02 Feb 2026 18:43:35 +0000   CalicoIsUp                  Calico is running on this node
  MemoryPressure       False   Wed, 04 Feb 2026 10:46:35 +0000   Mon, 02 Feb 2026 17:46:01 +0000   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure         False   Wed, 04 Feb 2026 10:46:35 +0000   Mon, 02 Feb 2026 17:46:01 +0000   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure          False   Wed, 04 Feb 2026 10:46:35 +0000   Mon, 02 Feb 2026 17:46:01 +0000   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready                True    Wed, 04 Feb 2026 10:46:35 +0000   Mon, 02 Feb 2026 17:50:00 +0000   KubeletReady                kubelet is posting ready status
labuser@dmhost2:~$ microk8s kubectl get lease -n kube-system kube-controller-manager -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2026-01-19T19:03:05Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "7021853"
  uid: 9ecf9c97-2af1-4575-92d4-c2bab5144ec4
spec:
  acquireTime: "2026-02-04T10:47:43.452780Z"
  holderIdentity: dmhost3_9cb98c2e-2d0b-4404-99c4-5613e7eed79a
  leaseDurationSeconds: 60
  leaseTransitions: 18
  renewTime: "2026-02-04T12:02:44.917919Z"
#### Observed behavior
Powered‑off node Lease shows stale renewTime:
HolderIdentity: dmhost1
RenewTime: 2026-02-04T10:46:27Z # stale long after shutdown
Node conditions remain Ready=True long after the Lease expired:
Type: Ready Status: True
LastHeartbeatTime: 2026-02-04T10:46:35Z
Reason: KubeletReady
Controller‑manager leader Lease healthy and renewing:
holderIdentity: dmhost3_9cb98c2e-...
leaseDurationSeconds: 60
renewTime: 2026-02-04T11:39:57Z
kubelite logs around the stuck period include repeated timeouts updating Leases and Node status (sample excerpt):
E0204 10:54:11 writers.go:123] "Unhandled Error" err="apiserver was unable to write a JSON response: http: Handler timeout"
E0204 10:54:18 controller.go:251] "Failed to update lease" err="Put \"https://127.0.0.1:16443/apis/coordination.k8s.io/.../leases/kube-controller-manager\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
E0204 10:54:21 kubelet_node_status.go:474] "Error updating node status, will retry" err="failed to patch status ... http: Handler timeout"
E0204 10:54:31 kubelet_node_status.go:461] "Unable to update node status" err="update node status exceeds retry count"
E0204 10:54:18 timeout.go:140] "Post-timeout activity" method="PUT" path="/apis/coordination.k8s.io/v1/.../leases/..."
{"logger":"etcd-client","msg":"retrying of unary invoke"} # repeated (MicroK8s logs label the KV retry layer as etcd; cluster is dqlite-backed)
After:
sudo snap restart microk8s.daemon-kubelite
→ The node promptly becomes NotReady and eviction resumes.
#### Expected behavior
Once a node’s Lease expires (kubelet stopped) and heartbeats are missed beyond the grace period, the node‑lifecycle controller should mark the node NotReady within ~1 minute and begin standard eviction timing (subject to tolerations).
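For context, the relevant timing is governed by the controller-manager's node-monitor-grace-period (alongside the kubelet's 40s Lease duration shown above). On MicroK8s the effective flag can be checked in the component args file, assuming the standard snap layout:

```shell
# Inspect the kube-controller-manager flags bundled into kubelite (path assumed)
grep node-monitor /var/snap/microk8s/current/args/kube-controller-manager
```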
#### Actual behavior
Node remains Ready=True indefinitely (observed >1 hour).
Lease is stale, confirming kubelet is offline.
Controller‑manager leader is healthy, but internal controller actions (Lease/Node status updates) time out and do not progress until kubelite is restarted.
#### Impact
False health state for failed nodes.
Evictions do not trigger; workloads are not rescheduled.
Failure drills and real outages appear healthy when they are not.
Reproduction Steps
Yes, I can reproduce this. Typically I fail a node and that first failure is handled correctly; I then bring it back and wait >30 minutes for the cluster to settle. Subsequent node failures may then produce this issue.
Start with a healthy 3‑node HA MicroK8s cluster on 1.34 or 1.35 (dqlite 3 voters).
Confirm controller‑manager leadership:
```shell
microk8s kubectl get lease -n kube-system kube-controller-manager -o yaml
```
Power off one node at the host level (e.g., dmhost1).
Wait well beyond node-monitor-grace-period (e.g., >60–120s; we observed >60 minutes).
Observe:
The powered‑off node remains Ready=True.
Pods on that node still show Running.
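A quick way to capture both observations (assumed commands; pods filtered by the failed node's name):

```shell
# Node still reports Ready even though the host is powered off
microk8s kubectl get nodes

# Pods scheduled on the failed node still show Running
microk8s kubectl get pods -A -o wide --field-selector spec.nodeName=dmhost1
```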
Check the node Lease:
```shell
microk8s kubectl get lease -n kube-node-lease dmhost1 -o yaml
```
→ renewTime is stale (stops around the power‑off time), confirming kubelet is offline.
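To watch just the renew timestamp, a one-liner sketch (jsonpath expression assumed):

```shell
# Print only the Lease renewTime; it stops advancing once the kubelet is offline
microk8s kubectl get lease -n kube-node-lease dmhost1 -o jsonpath='{.spec.renewTime}{"\n"}'
```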
Check the controller‑manager leader Lease:
```shell
microk8s kubectl get lease -n kube-system kube-controller-manager -o yaml
```
→ holderIdentity present; renewTime fresh (CM is alive and leading).
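The same check in compact form (assumed jsonpath):

```shell
# Leader identity and last renew time; renewTime should keep advancing while CM is leading
microk8s kubectl get lease -n kube-system kube-controller-manager \
  -o jsonpath='{.spec.holderIdentity}{"  "}{.spec.renewTime}{"\n"}'
```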
Workaround: Restart kubelite:
```shell
sudo snap restart microk8s.daemon-kubelite
```
→ Within ~60 seconds, the powered‑off node flips to NotReady and normal eviction behavior resumes.
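To confirm recovery after the restart, the node status can simply be watched (assumed usage):

```shell
# The powered-off node should flip to NotReady within roughly a minute
watch -n 5 microk8s kubectl get nodes
```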