Add RHOBS API integration and HCP osde2e tests #468
rpodishe wants to merge 1 commit into openshift:master from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rpodishe. The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## master #468 +/- ##
=======================================
Coverage 55.91% 55.91%
=======================================
Files 31 31
Lines 2749 2749
=======================================
Hits 1537 1537
Misses 1120 1120
Partials 92 92
1e1697a to 1a6f160
Walkthrough

Adds comprehensive end-to-end test infrastructure in

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Harness
    participant Operator as RouteMonitor Operator
    participant KubeAPI as Kubernetes API Server
    participant MockRHOBS as mockRHOBSServer
    Test->>KubeAPI: Create Route/RouteMonitor/HCP Resources
    KubeAPI-->>Operator: Inform operator of CR changes
    Operator->>KubeAPI: Create Service/ServiceMonitor/PrometheusRule
    Operator->>MockRHOBS: POST /probes (create probe) [with tenant header]
    MockRHOBS-->>Operator: 201 Created + probe metadata
    Operator->>MockRHOBS: GET /probes/{id} (status polling)
    MockRHOBS-->>Operator: 200 OK (probe status)
    Test->>MockRHOBS: GET /dump-headers (verify tenant isolation)
    Test->>Operator: Delete RouteMonitor
    Operator->>MockRHOBS: DELETE /probes/{id}
    Operator->>KubeAPI: Cleanup dependent resources
    Test->>Operator: Exec curl /metrics (via impersonation) -> validate metrics
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.

🔧 golangci-lint (2.5.0)
Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
Actionable comments posted: 3
> [!CAUTION]
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)

test/e2e/route_monitor_operator_tests.go (1)
353-361: ⚠️ Potential issue | 🟠 Major

`DeferCleanup` registered too late — resources leak on early test failure.

`DeferCleanup` is called at the very end of the `It` block. If any assertion fails before reaching Line 353 (e.g., pod creation fails at Line 231, or any intermediate check), the cleanup function is never registered, leaving the pod, service, and route behind. Move cleanup registration immediately after each resource is created.

Proposed fix: register cleanup eagerly after each resource creation. Move cleanup calls right after creation, e.g., after Line 231:

```diff
 err := k8s.Create(ctx, pod)
 Expect(err).ShouldNot(HaveOccurred(), "Unable to create a pod")
+DeferCleanup(func(ctx context.Context) {
+	_ = k8s.Delete(ctx, pod)
+})
```

Do the same for `svc` (after Line 270) and `appRoute` (after Line 288). Then remove the combined `DeferCleanup` block at Lines 353–361.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 353 - 361, The DeferCleanup is registered too late (only at the end) causing resource leaks on early failures; update the test to register a DeferCleanup immediately after each resource is created: after creating pod (register cleanup to delete pod), after creating svc (register cleanup to delete svc), and after creating appRoute (register cleanup to delete appRoute) using the same deletion calls (Expect(k8s.Delete(...)).Should(Succeed(), ...)) and then remove the combined DeferCleanup block that references pod, svc, and appRoute at the end; keep the cleanup logic and messages but attach them right after their respective creations so pod, svc, and appRoute are always cleaned up even if earlier assertions fail.
🧹 Nitpick comments (3)
test/e2e/route_monitor_operator_tests.go (3)
653-655: Misleading skip message when `dynamic.NewForConfig` fails.

`dynamic.NewForConfig` failing is a client configuration error, not "HostedControlPlane CRD not available." This is an unlikely edge case, but the message could mislead debugging. Consider `Skip("failed to create impersonating dynamic client: ...")` instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 653 - 655, The Skip call currently logs "HostedControlPlane CRD not available..." when dynamic.NewForConfig fails; change this to accurately reflect a client creation error by replacing the message with something like "failed to create impersonating dynamic client: <error>" and include the err value, i.e. update the Skip invocation used after dynamic.NewForConfig to log the real failure (reference dynamic.NewForConfig and Skip in the test/e2e/route_monitor_operator_tests.go file).
509-516: Fragile string matching for SLO verification in PrometheusRule.

The check `strings.Contains(ruleJSON, "0.9995") && !strings.Contains(ruleJSON, "0.995)")` relies on exact PromQL formatting in the marshalled JSON. The trailing `)` in `"0.995)"` is needed to avoid a false match (since `"0.9995"` contains `"0.995"`), but any change in PromQL template formatting (whitespace, parenthesization) would silently break this assertion. Consider unmarshalling the PrometheusRule and checking the SLO field value directly, or at least add a comment explaining the substring choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 509 - 516, The current fragile check uses strings.Contains(ruleJSON, "0.9995") && !strings.Contains(ruleJSON, "0.995)") on the marshalled rule (ruleJSON) which can break with formatting changes; instead unmarshal ruleJSON into the PrometheusRule structure (or an Unstructured/structured object) and inspect the specific SLO field or the rule's PromQL expression value directly (e.g., locate the rule in the PrometheusRule.Spec.Groups[].Rules[] and read the Expr string) and assert it contains the numeric 0.9995 (or equals the expected expression) rather than relying on substring matching; update the wait loop that currently logs via GinkgoLogr.Info to perform this parsed-check and remove the fragile substring test.
744-746: Unconditional impersonation — inconsistent with the HCP test pattern.

Line 746 always sets `backplane-cluster-admin` impersonation without first attempting the default credentials. The HCP test (Lines 643–660) correctly tries default creds first, then falls back to impersonation. Applying the same try-then-fallback approach here would make the test work on more cluster configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 744 - 746, The code unconditionally sets cfg.Impersonate to "backplane-cluster-admin"; change it to follow the HCP test pattern: first attempt to use the default credentials from rest.CopyConfig(k8s.GetConfig()) (e.g., try a lightweight API call or client creation with that cfg) and only if that attempt fails, set cfg.Impersonate = rest.ImpersonationConfig{UserName: "backplane-cluster-admin"} and retry; modify the block around the existing cfg := rest.CopyConfig(k8s.GetConfig()) so it tries the default creds, detects failure, then falls back to impersonation for MC access.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/route_monitor_operator_tests.go`:
- Around line 542-550: The cleanup for update test resources (DeferCleanup block
referencing updateTestName, pod, svc, appRoute) is registered too late and can
be skipped if creation/assertion fails; move and register individual
DeferCleanup calls immediately after each resource is created—i.e., right after
the pod creation (register cleanup to delete pod), after the service creation
(register cleanup to delete svc), and after the route/appRoute creation
(register cleanup to delete appRoute)—so each created resource has its own
DeferCleanup that calls k8s.Delete and Expect(...).Should(Succeed(...)) to
ensure removal even on early failures.
- Around line 787-789: The current code calls Skip(fmt.Sprintf(...)) when the
metrics endpoint poll returns err, which hides real regressions; change this to
fail the test (e.g., t.Fatalf or FailNow with a descriptive message) so
unreachable or missing metrics cause CI failures, and only call Skip for the
specific exec-unsupported condition (detect the executor error type/message such
as "exec mechanism unsupported" or a sentinel like ErrExecUnsupported) while
keeping the Skip path limited to that case; update the block around Skip and err
to perform an explicit failure on general errors and reserve Skip for the
unsupported-exec scenario.
- Around line 582-586: The test currently skips the status check if the returned
probe is nil; update the assertion flow to explicitly assert that the result
from client.GetProbe(ctx, "e2e-cluster-123") is non-nil before inspecting
fields. Specifically, after calling GetProbe, add an explicit
Expect(deleted).ShouldNot(BeNil(), "GetProbe should return a probe after
delete") (or equivalent) and only then assert
Expect(deleted.Status).Should(Equal("terminating"), "probe should be terminating
after deletion"); reference the GetProbe call and the deleted variable to locate
where to insert the non-nil assertion.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
test/e2e/route_monitor_operator_tests.go
1a6f160 to dbe5181
♻️ Duplicate comments (2)
test/e2e/route_monitor_operator_tests.go (2)
199-361: ⚠️ Potential issue | 🟠 Major

Register cleanup immediately after each create to avoid leaks on early failures.

If any Create/Expect fails before the final `DeferCleanup` block, the pod/service/route can leak and destabilize later tests. Consider registering cleanup right after each Create (as you did in the update test) and remove the end-of-test cleanup block.

Suggested fix (move cleanup to creation sites):

```diff
 err := k8s.Create(ctx, pod)
 Expect(err).ShouldNot(HaveOccurred(), "Unable to create a pod")
+DeferCleanup(func(ctx context.Context) {
+	By("Deleting test pod " + routeMonitorName)
+	Expect(k8s.Delete(ctx, pod)).Should(Succeed(), "Failed to delete pod")
+})
@@
 err = k8s.Create(ctx, svc)
 Expect(err).ShouldNot(HaveOccurred(), "Unable to create service %s/%s", svc.Namespace, svc.Name)
+DeferCleanup(func(ctx context.Context) {
+	By("Deleting test service " + routeMonitorName)
+	Expect(k8s.Delete(ctx, svc)).Should(Succeed(), "Failed to delete service")
+})
@@
 err = k8s.Create(ctx, appRoute)
 Expect(err).ShouldNot(HaveOccurred(), "Unable to create route %s/%s", appRoute.Namespace, appRoute.Name)
+DeferCleanup(func(ctx context.Context) {
+	By("Deleting test route " + routeMonitorName)
+	Expect(k8s.Delete(ctx, appRoute)).Should(Succeed(), "Failed to delete route")
+})
@@
-DeferCleanup(func(ctx context.Context) {
-	By("Cleaning up setup")
-	By("Deleting test pod " + routeMonitorName)
-	Expect(k8s.Delete(ctx, pod)).Should(Succeed(), "Failed to delete pod")
-	By("Deleting test service " + routeMonitorName)
-	Expect(k8s.Delete(ctx, svc)).Should(Succeed(), "Failed to delete service")
-	By("Deleting test route " + routeMonitorName)
-	Expect(k8s.Delete(ctx, appRoute)).Should(Succeed(), "Failed to delete route")
-})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 199 - 361, Register cleanup immediately after each successful k8s.Create to avoid leaks: after creating pod (k8s.Create(ctx, pod)), service (k8s.Create(ctx, svc)), route (k8s.Create(ctx, appRoute)) and RouteMonitor (k8s.Create(ctx, rmo)) call DeferCleanup with a small closure that deletes that specific resource via k8s.Delete(ctx, <resource>) and asserts success; then remove the final end-of-test DeferCleanup block that deletes all resources at the bottom so cleanup is tied to each creation site and will run even if later Creates/Asserts fail.
637-719: ⚠️ Potential issue | 🟡 Minor

Avoid skipping the HCP test on unexpected errors.

The current flow skips on any list error after impersonation, which can hide real regressions (e.g., API connectivity issues). Consider skipping only on NotFound/Forbidden and failing on other errors.

Suggested guardrails for error handling:

```diff
 hcpList, err := dynamicClient.Resource(hcpGVR).Namespace("").List(ctx, metav1.ListOptions{Limit: 1})
 if err != nil {
-	// Retry with impersonation for clusters where default creds can't list
+	if k8serrors.IsNotFound(err) {
+		Skip(fmt.Sprintf("HostedControlPlane CRD not available on this cluster: %v", err))
+	}
+	if !k8serrors.IsForbidden(err) {
+		Expect(err).ShouldNot(HaveOccurred(), "unexpected error listing HostedControlPlanes")
+	}
+	// Retry with impersonation for clusters where default creds can't list
 	cfg.Impersonate = rest.ImpersonationConfig{UserName: "backplane-cluster-admin"}
 	dynamicClient, err = dynamic.NewForConfig(cfg)
-	if err != nil {
-		Skip(fmt.Sprintf("HostedControlPlane CRD not available on this cluster: %v", err))
-	}
+	Expect(err).ShouldNot(HaveOccurred(), "failed to create dynamic client with impersonation")
 	hcpList, err = dynamicClient.Resource(hcpGVR).Namespace("").List(ctx, metav1.ListOptions{Limit: 1})
 	if err != nil {
-		Skip(fmt.Sprintf("HostedControlPlane CRD not available on this cluster: %v", err))
+		if k8serrors.IsNotFound(err) || k8serrors.IsForbidden(err) {
+			Skip(fmt.Sprintf("HostedControlPlane CRD not available on this cluster: %v", err))
+		}
+		Expect(err).ShouldNot(HaveOccurred(), "unexpected error listing HostedControlPlanes with impersonation")
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/route_monitor_operator_tests.go` around lines 637 - 719, The error handling around the initial HostedControlPlane list (calls to dynamicClient.Resource(hcpGVR).Namespace("").List and the retry after setting cfg.Impersonate) is too broad—change the logic so you only call Skip when the error is a NotFound or Forbidden (use k8serrors.IsNotFound(err) || k8serrors.IsForbidden(err)); for any other error return/fail the test (e.g., use Expect(err).ShouldNot(HaveOccurred() or propagate the error) instead of skipping) so genuine failures (network/API issues) surface; update both the first List error branch and the retry List error branch that currently call Skip with the generic message, keeping references to hcpGVR, cfg.Impersonate, dynamicClient, and hcpList.
@rpodishe: all tests passed! Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary by CodeRabbit