
monitortest failures caused by duplicate interval entries being processed incorrectly by operatorloganalyzer after leader re-election #30351

@lyarwood

Description

We are seeing monitortest failures that appear to be caused by duplicate interval entries being processed incorrectly by the operatorloganalyzer when leader re-elections occur after the initial deployment:

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/69955/rehearse-69955-periodic-ci-openshift-ovn-kubernetes-release-4.21-periodics-e2e-metal-ipi-ovn-bgp-virt-ipv4/1975452464802959360

{  failed during interval construction
missing acquiring stage for namespace/openshift-cnv node/master-2 pod/virt-operator-6bd67758f-w44bv uid/a47fcdac-971d-4c93-bd88-384045b0e0c5 container/virt-operator: all intervals
	Oct 08 04:39:16.932 - 1s    I namespace/openshift-cnv node/master-2 pod/virt-operator-6bd67758f-w44bv uid/a47fcdac-971d-4c93-bd88-384045b0e0c5 container/virt-operator reason/StartedAcquiring I1008 04:39:16.932872       1 leaderelection.go:257] attempting to acquire leader lease openshift-cnv/virt-operator...
	Oct 08 04:39:16.932 - 1s    I namespace/openshift-cnv node/master-2 pod/virt-operator-6bd67758f-w44bv uid/a47fcdac-971d-4c93-bd88-384045b0e0c5 container/virt-operator reason/StartedAcquiring I1008 04:39:16.932872       1 leaderelection.go:257] attempting to acquire leader lease openshift-cnv/virt-operator...
	Oct 08 04:39:16.944 - 1s    I namespace/openshift-cnv node/master-2 pod/virt-operator-6bd67758f-w44bv uid/a47fcdac-971d-4c93-bd88-384045b0e0c5 container/virt-operator reason/Acquired I1008 04:39:16.944721       1 leaderelection.go:271] successfully acquired lease openshift-cnv/virt-operator
	Oct 08 04:39:16.944 - 1s    I namespace/openshift-cnv node/master-2 pod/virt-operator-6bd67758f-w44bv uid/a47fcdac-971d-4c93-bd88-384045b0e0c5 container/virt-operator reason/Acquired I1008 04:39:16.944721       1 leaderelection.go:271] successfully acquired lease openshift-cnv/virt-operator}
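
For illustration, here is one plausible mechanism for the "missing acquiring stage" error: a minimal sketch assuming the analyzer keeps a single pending "acquiring" interval per locator, so a duplicated StartedAcquiring overwrites the first and the duplicated Acquired finds nothing to pair with. This is not the actual operatorloganalyzer code; the interval type and constructStages helper are hypothetical simplifications.

```go
package main

import "fmt"

type interval struct {
	locator string // e.g. "namespace/openshift-cnv ... container/virt-operator"
	reason  string // "StartedAcquiring" or "Acquired"
}

func constructStages(intervals []interval) error {
	pending := map[string]interval{} // single pending acquisition per locator

	for _, iv := range intervals {
		switch iv.reason {
		case "StartedAcquiring":
			pending[iv.locator] = iv // a duplicate silently overwrites the first
		case "Acquired":
			if _, ok := pending[iv.locator]; !ok {
				// the second, duplicated Acquired lands here: the first one
				// already consumed the only pending entry
				return fmt.Errorf("missing acquiring stage for %s", iv.locator)
			}
			delete(pending, iv.locator)
		}
	}
	return nil
}

func main() {
	loc := "namespace/openshift-cnv pod/virt-operator-6bd67758f-w44bv"
	fmt.Println(constructStages([]interval{
		{loc, "StartedAcquiring"},
		{loc, "StartedAcquiring"}, // duplicate
		{loc, "Acquired"},
		{loc, "Acquired"}, // duplicate -> "missing acquiring stage"
	}))
}
```

Without the duplicates, the same sequence pairs cleanly, which would explain why this only surfaces when entries are emitted twice.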

We (KubeVirt) are looking into the cause of these re-elections, specifically the failure to renew the original lease after our client hit a rate limit and timed out, as shown below:

virt-operator: error retrieving resource lock openshift-cnv/virt-operator: client rate limiter Wait returned an error: context deadline exceeded
kubevirt/kubevirt#15835
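
For context, that error string comes from client-go's token-bucket rate limiter: when the shared client is saturated, the limiter's Wait blocks until the renewal's context deadline (derived from RenewDeadline) expires. A self-contained sketch of that failure mode; the QPS/burst values are illustrative defaults, not virt-operator's actual configuration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// client-go's historical defaults are in this ballpark (QPS 5, burst 10)
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

	// other controller traffic drains the burst tokens
	for i := 0; i < 10; i++ {
		limiter.Accept()
	}

	// the lease renewal now has to wait for a token, but its context
	// expires first, reproducing the error seen in the log above
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	if err := limiter.Wait(ctx); err != nil {
		fmt.Printf("client rate limiter Wait returned an error: %v\n", err)
	}
}
```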

That said, the actual leader re-election looks correct in the pod logs and shouldn't trigger the monitortest failure, AFAICT:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-ovn-kubernetes-release-4.21-periodics-e2e-metal-ipi-ovn-bgp-virt-dualstack/1975758174606594048/artifacts/e2e-metal-ipi-ovn-bgp-virt-dualstack/gather-extra/artifacts/pods/openshift-cnv_virt-operator-6bd67758f-w44bv_virt-operator_previous.log | grep leaderelection
I1008 04:17:14.519215       1 leaderelection.go:257] attempting to acquire leader lease openshift-cnv/virt-operator...
I1008 04:17:14.528850       1 leaderelection.go:271] successfully acquired lease openshift-cnv/virt-operator
E1008 04:39:10.071315       1 leaderelection.go:429] Failed to update lock optimistically: Put "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-cnv/leases/virt-operator": context deadline exceeded, falling back to slow path
E1008 04:39:10.071629       1 leaderelection.go:436] error retrieving resource lock openshift-cnv/virt-operator: client rate limiter Wait returned an error: context deadline exceeded
I1008 04:39:10.071699       1 leaderelection.go:297] failed to renew lease openshift-cnv/virt-operator: context deadline exceeded
{"component":"virt-operator","level":"info","msg":"leaderelection lost","pos":"stdlib.go:105","timestamp":"2025-10-08T04:39:10.071890Z","ts":"2025/10/08 04:39:10"}

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-ovn-kubernetes-release-4.21-periodics-e2e-metal-ipi-ovn-bgp-virt-dualstack/1975758174606594048/artifacts/e2e-metal-ipi-ovn-bgp-virt-dualstack/gather-extra/artifacts/pods/openshift-cnv_virt-operator-6bd67758f-w44bv_virt-operator.log | grep leaderelection
I1008 04:39:16.932872       1 leaderelection.go:257] attempting to acquire leader lease openshift-cnv/virt-operator...
I1008 04:39:16.944721       1 leaderelection.go:271] successfully acquired lease openshift-cnv/virt-operator
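
One possible hardening, purely illustrative: if the analyzer deduplicated byte-identical intervals before stage construction, the duplicated entries above would collapse back into a single well-formed StartedAcquiring/Acquired pair. The interval type and dedupe helper below are hypothetical, not a patch against the real openshift/origin code.

```go
package main

import "fmt"

type interval struct {
	locator, reason, message, from string
}

// dedupe keeps the first occurrence of each identical interval.
func dedupe(intervals []interval) []interval {
	seen := map[interval]bool{}
	out := make([]interval, 0, len(intervals))
	for _, iv := range intervals {
		if !seen[iv] {
			seen[iv] = true
			out = append(out, iv)
		}
	}
	return out
}

func main() {
	loc := "namespace/openshift-cnv pod/virt-operator-6bd67758f-w44bv"
	dup := []interval{
		{loc, "StartedAcquiring", "attempting to acquire leader lease...", "04:39:16.932"},
		{loc, "StartedAcquiring", "attempting to acquire leader lease...", "04:39:16.932"},
		{loc, "Acquired", "successfully acquired lease", "04:39:16.944"},
		{loc, "Acquired", "successfully acquired lease", "04:39:16.944"},
	}
	fmt.Println(len(dedupe(dup))) // 2: one acquiring stage, one acquired stage
}
```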
