Skip to content

SPIRE-450: Fix e2e test nodeSelector/tolerations for OpenShift CI#95

Open
sayak-redhat wants to merge 2 commits intoopenshift:mainfrom
sayak-redhat:SPIRE-450
Open

SPIRE-450: Fix e2e test nodeSelector/tolerations for OpenShift CI#95
sayak-redhat wants to merge 2 commits intoopenshift:mainfrom
sayak-redhat:SPIRE-450

Conversation

@sayak-redhat
Copy link
Contributor

@sayak-redhat sayak-redhat commented Feb 25, 2026

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
    • E2E tests updated to allow server scheduling on Kubernetes control‑plane nodes by adding the required toleration.
    • Extended E2E test timeout to reduce flakiness and improve stability for longer-running scenarios.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 25, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Feb 25, 2026

Walkthrough

Added a control-plane toleration to the SPIRE Server pod spec in the end-to-end test and increased the test default timeout from 5 to 8 minutes.

Changes

Cohort / File(s) Summary
E2E Test Configuration
test/e2e/e2e_test.go
Inserted a toleration with key node-role.kubernetes.io/control-plane, operator Exists, and effect NoSchedule into SpireServer.Spec.Tolerations within the e2e test.
E2E Test Constants
test/e2e/utils/constants.go
Updated DefaultTimeout from 5 * time.Minute to 8 * time.Minute to increase the test timeout.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning PR increases DefaultTimeout globally by 60% (5→8 min), affecting 40+ timeout calls across all cluster operations rather than targeting the specific SPIRE Server StatefulSet scheduling issue. Create a dedicated SpireServerReadyTimeout constant instead. Update only the specific WaitForStatefulSetReady call for SPIRE Server to use it while keeping DefaultTimeout at 5 minutes.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly references the main changes: fixing e2e test nodeSelector/tolerations for OpenShift CI, which aligns with the dual toleration addition and timeout increase described in the PR objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Stable And Deterministic Test Names ✅ Passed Pull request changes do not introduce dynamic information in test titles, maintaining stable and deterministic test names across all test runs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link

openshift-ci bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sayak-redhat
Once this PR has been reviewed and has the lgtm label, please assign swghosh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
  • Enhanced end-to-end test configuration for SPIRE Server deployment on Kubernetes control-plane nodes with improved toleration settings.
  • Extended timeout threshold for StatefulSet readiness verification to ensure more reliable test execution and stability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
  • Enhanced end-to-end test configuration for SPIRE Server deployment on Kubernetes control-plane nodes with improved toleration settings.
  • Extended timeout threshold for StatefulSet readiness verification to ensure more reliable test execution and stability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/e2e_test.go`:
- Line 527: Create a new context with an 8-minute timeout and pass that to
utils.WaitForStatefulSetReady instead of the existing testCtx (which uses
DefaultTimeout of 5 minutes); specifically, before calling
WaitForStatefulSetReady(create a context via context.WithTimeout or equivalent)
and use that newCtx when invoking utils.WaitForStatefulSetReady(testCtx ->
newCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace,
8*time.Minute), ensuring you cancel the context when done.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e1faf0a and a50d416.

📒 Files selected for processing (1)
  • test/e2e/e2e_test.go


By("Waiting for SPIRE Server StatefulSet to become Ready")
utils.WaitForStatefulSetReady(testCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, utils.DefaultTimeout)
utils.WaitForStatefulSetReady(testCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, 8*time.Minute)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Check how testCtx is created in e2e test (around Line 75) ==="
sed -n '70,85p' test/e2e/e2e_test.go

echo
echo "=== 2) Find DefaultTimeout definition ==="
rg -n 'DefaultTimeout\s*=' test/e2e/

echo
echo "=== 3) Inspect WaitForStatefulSetReady implementation ==="
rg -n -A20 'func WaitForStatefulSetReady\(' test/e2e/utils/utils.go

echo
echo "=== 4) Check Line 527 and surrounding context ==="
sed -n '520,535p' test/e2e/e2e_test.go

Repository: openshift/zero-trust-workload-identity-manager

Length of output: 3048


Create a dedicated context with 8-minute timeout for WaitForStatefulSetReady at Line 527.

The testCtx has a 5-minute deadline (DefaultTimeout), but this call requests an 8-minute wait. Ginkgo's Eventually respects the context's deadline, so the operation will fail at 5 minutes instead of waiting the requested 8 minutes. Create a separate context with the full 8-minute timeout:

-			utils.WaitForStatefulSetReady(testCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, 8*time.Minute)
+			waitCtx, cancel := context.WithTimeout(context.Background(), 8*time.Minute)
+			DeferCleanup(cancel)
+			utils.WaitForStatefulSetReady(waitCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, 8*time.Minute)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
utils.WaitForStatefulSetReady(testCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, 8*time.Minute)
waitCtx, cancel := context.WithTimeout(context.Background(), 8*time.Minute)
DeferCleanup(cancel)
utils.WaitForStatefulSetReady(waitCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace, 8*time.Minute)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e_test.go` at line 527, Create a new context with an 8-minute
timeout and pass that to utils.WaitForStatefulSetReady instead of the existing
testCtx (which uses DefaultTimeout of 5 minutes); specifically, before calling
WaitForStatefulSetReady(create a context via context.WithTimeout or equivalent)
and use that newCtx when invoking utils.WaitForStatefulSetReady(testCtx ->
newCtx, clientset, utils.SpireServerStatefulSetName, utils.OperatorNamespace,
8*time.Minute), ensuring you cancel the context when done.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
  • E2E tests updated to allow the server to schedule on Kubernetes control-plane nodes by adding the appropriate toleration, improving test coverage for control-plane deployments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
  • E2E tests updated to allow server scheduling on Kubernetes control‑plane nodes by adding the necessary toleration.
  • Improves test coverage and robustness for control‑plane deployments, reducing false failures when control‑plane nodes are used for scheduling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sayak-redhat: This pull request references SPIRE-450 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Add dual tolerations for both node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane to avoid scheduling issues across Kind and OpenShift CI environments
  • Increase WaitForStatefulSetReady timeout from 5m to 8m for this test

Summary by CodeRabbit

  • Tests
  • E2E tests updated to allow server scheduling on Kubernetes control‑plane nodes by adding the required toleration.
  • Extended E2E test timeout to reduce flakiness and improve stability for longer-running scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/utils/constants.go`:
- Line 46: DefaultTimeout was broadened to 8m and impacts many e2e waits; revert
DefaultTimeout back to its original value (5 * time.Minute) in constants.go, add
a new targeted exported constant (e.g., SpireServerStatefulSetReadyTimeout = 8 *
time.Minute) and replace the specific call site that needs the longer wait by
passing SpireServerStatefulSetReadyTimeout to WaitForStatefulSetReady in the e2e
test (the call referencing WaitForStatefulSetReady for the SPIRE Server
StatefulSet in e2e_test.go). Ensure no other call sites use the new constant so
only that single StatefulSet wait is extended.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between bee344f and 1248048.

📒 Files selected for processing (1)
  • test/e2e/utils/constants.go

DefaultInterval = 10 * time.Second
ShortInterval = 5 * time.Second
DefaultTimeout = 5 * time.Minute
DefaultTimeout = 8 * time.Minute
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n --type go 'DefaultTimeout' test/e2e -C 2

Repository: openshift/zero-trust-workload-identity-manager

Length of output: 14952


Global DefaultTimeout change affects 40+ call sites across the e2e test suite, contradicting the PR's stated targeted intent.

The PR description indicates this timeout increase is scoped to a single WaitForStatefulSetReady call for the SPIRE Server StatefulSet. However, DefaultTimeout is an exported package-level constant used throughout the entire e2e test suite—affecting context timeouts, condition waits, StatefulSet readiness checks, DaemonSet availability checks, Deployment updates, and rolling update waits. This 60% timeout increase (5 → 8 minutes) globally extends waits for all e2e tests, potentially masking slow/hanging tests and inflating total CI wall-clock time.

Introduce a dedicated constant instead and use it only at the specific call site:

♻️ Proposed fix: introduce a targeted constant
 	DefaultInterval = 10 * time.Second
 	ShortInterval   = 5 * time.Second
-	DefaultTimeout  = 8 * time.Minute
+	DefaultTimeout             = 5 * time.Minute
+	SpireServerReadyTimeout    = 8 * time.Minute
 	ShortTimeout    = 2 * time.Minute

Then update the specific call site in test/e2e/e2e_test.go:

-WaitForStatefulSetReady(..., utils.DefaultTimeout)
+WaitForStatefulSetReady(..., utils.SpireServerReadyTimeout)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DefaultTimeout = 8 * time.Minute
DefaultInterval = 10 * time.Second
ShortInterval = 5 * time.Second
DefaultTimeout = 5 * time.Minute
SpireServerReadyTimeout = 8 * time.Minute
ShortTimeout = 2 * time.Minute
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/utils/constants.go` at line 46, DefaultTimeout was broadened to 8m
and impacts many e2e waits; revert DefaultTimeout back to its original value (5
* time.Minute) in constants.go, add a new targeted exported constant (e.g.,
SpireServerStatefulSetReadyTimeout = 8 * time.Minute) and replace the specific
call site that needs the longer wait by passing
SpireServerStatefulSetReadyTimeout to WaitForStatefulSetReady in the e2e test
(the call referencing WaitForStatefulSetReady for the SPIRE Server StatefulSet
in e2e_test.go). Ensure no other call sites use the new constant so only that
single StatefulSet wait is extended.

@openshift-ci
Copy link

openshift-ci bot commented Feb 25, 2026

@sayak-redhat: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sayak-redhat
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants