alert-mgmt-01: k8s foundation #787
sradco wants to merge 1 commit into openshift:alerts-management-api from
Conversation
Walkthrough: The PR upgrades core Kubernetes and Prometheus dependencies to v0.34.2+ while introducing a comprehensive Kubernetes client abstraction layer with managers for Prometheus rules, namespace monitoring, alerting health, and alert configuration management.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 8
🧹 Nitpick comments (1)
pkg/k8s/client.go (1)
57-62: routeClientset is created but not stored on the client struct.
The routeClientset is only passed to newPrometheusAlerts. If future methods on client need route access, it won't be available. This is fine if prometheusAlerts is the only consumer, but worth noting for awareness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/k8s/client.go` around lines 57 - 62, The client constructor builds a routeClientset but doesn't store it on the client struct; update the client struct (type client) to include a routeClientset field and assign the created routeClientset to that field in the constructor where c := &client{...} is created, so routeClientset is available for future methods; keep passing routeClientset into newPrometheusAlerts as before but also store it on the client instance (reference symbols: routeClientset, client struct, newPrometheusAlerts, prometheusAlerts).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Around line 12-13: Update the Prometheus endpoint constants in vars.go so they
include the required "/api" prefix: change the PrometheusAlertsPath and
PrometheusRulesPath constants from "/v1/alerts" and "/v1/rules" to
"/api/v1/alerts" and "/api/v1/rules" respectively; locate the symbols
PrometheusAlertsPath and PrometheusRulesPath in vars.go and modify their string
values to the corrected paths.
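As a quick illustration of the finding above, here is a minimal sketch of the corrected constants and how they would be joined onto a base URL. The constant names follow the finding; the base URL and the alertsURL helper are purely illustrative, not code from this PR.

```go
package main

import "fmt"

// The Prometheus HTTP API serves its v1 endpoints under the "/api" prefix,
// so the path constants must carry it (the fix suggested for vars.go).
const (
	PrometheusAlertsPath = "/api/v1/alerts"
	PrometheusRulesPath  = "/api/v1/rules"
)

// alertsURL joins a base URL with the alerts path (helper name is illustrative).
func alertsURL(base string) string {
	return base + PrometheusAlertsPath
}

func main() {
	fmt.Println(alertsURL("https://thanos-querier.example:9091"))
}
```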
In `@pkg/k8s/client.go`:
- Around line 36-94: The Client interface methods ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error) are declared but not implemented on the *client type; add receiver
methods on *client named ConfigMaps and AlertingHealth that return the
appropriate values (for ConfigMaps return the configmap manager/implementation
instance held on the client or create one similarly to namespaceManager; for
AlertingHealth call or create the alert-health checker using existing clients
like monitoringv1clientset or prometheusAlerts and return (AlertingHealth,
error)). Ensure the method signatures exactly match the interface (ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error)) so the compile-time assertion var _ Client = (*client)(nil) succeeds.
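The compile-time assertion mentioned in the finding works like this trimmed-down sketch. The ConfigMapInterface and manager bodies here are stand-ins, not the real pkg/k8s types; the point is that `var _ Client = (*client)(nil)` fails to compile until every interface method exists on *client with a matching signature.

```go
package main

import "fmt"

// Stand-ins for the real pkg/k8s types named in the finding.
type ConfigMapInterface interface{ Name() string }

type configMapManager struct{}

func (configMapManager) Name() string { return "configmap-manager" }

type Client interface {
	ConfigMaps() ConfigMapInterface
}

type client struct {
	configMaps ConfigMapInterface
}

// ConfigMaps returns the manager stored on the client; the signature must
// match the interface exactly for the assertion below to hold.
func (c *client) ConfigMaps() ConfigMapInterface { return c.configMaps }

// Fails at compile time if *client drifts from the Client interface.
var _ Client = (*client)(nil)

func main() {
	c := &client{configMaps: configMapManager{}}
	fmt.Println(c.ConfigMaps().Name())
}
```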
In `@pkg/k8s/namespace.go`:
- Around line 69-75: WaitForNamedCacheSync call's boolean return value is
ignored: after starting nm.informer.Run(ctx.Done()) you must check the result of
cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
nm.informer.HasSynced) and if it returns false, return an error instead of
returning nm nil; update the function that constructs/returns nm to propagate a
descriptive error (e.g., "namespace informer failed to sync") when
WaitForNamedCacheSync returns false so the caller knows the informer never
synced.
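The fail-fast shape the comment asks for can be sketched as follows. Here waitForSync stands in for cache.WaitForNamedCacheSync, which reports false when the informer never syncs (for example, when the context is cancelled first); the manager type is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

type namespaceManager struct{}

// newNamespaceManager returns an error instead of a half-initialized manager
// when the informer cache never syncs.
func newNamespaceManager(waitForSync func() bool) (*namespaceManager, error) {
	if !waitForSync() {
		return nil, errors.New("namespace informer failed to sync")
	}
	return &namespaceManager{}, nil
}

func main() {
	if _, err := newNamespaceManager(func() bool { return false }); err != nil {
		fmt.Println("sync failure surfaced:", err)
	}
}
```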
In `@pkg/k8s/prometheus_rule.go`:
- Around line 21-39: Change newPrometheusRuleManager to return
(*prometheusRuleManager, error), check the boolean result of
cache.WaitForNamedCacheSync and return an error if it returns false (e.g.,
context cancelled or sync failed) instead of discarding it; update the function
signature and its return statements (prometheusRuleManager constructor) and
adjust the caller in pkg/k8s/client.go to handle the returned error (as shown in
other managers like newNamespaceManager/newAlertRelabelConfigManager) so
creation fails fast when the informer never syncs.
- Around line 82-89: The Delete handler in prometheusRuleManager (func Delete)
returns an error that only mentions the resource name; update the error
formatting to include both namespace and name (same pattern used in
Update/AddRule) by changing the fmt.Errorf call to include namespace and name in
the message so logs show "failed to delete PrometheusRule <namespace>/<name>:
%w".
- Around line 45-58: The List method prometheusRuleManager.List currently
ignores the namespace parameter and returns all items from
prm.informer.GetStore().List(); update it to filter the returned PrometheusRule
objects by namespace: iterate prs as before, assert item to
*monitoringv1.PrometheusRule (pr), then skip entries where pr.GetNamespace() !=
namespace when namespace is non-empty (and if you want to preserve previous
behavior, treat an empty namespace as "no filtering" to return all). Ensure the
function still returns a slice of monitoringv1.PrometheusRule built from the
filtered items and returns nil error.
In `@pkg/k8s/types.go`:
- Around line 157-159: The IsClusterMonitoringNamespace method currently returns
only bool which conflates “not cluster-monitoring” with lookup/read errors;
update the NamespaceInterface by changing IsClusterMonitoringNamespace(name
string) bool to IsClusterMonitoringNamespace(name string) (bool, error) and then
update all implementations of that method (and callers) to return (true, nil) or
(false, nil) for a definitive negative result and to return (false, err) when
namespace retrieval/label read fails so errors are propagated rather than
silently treated as false.
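The (bool, error) contract can be sketched over a map-backed lookup; the real implementation would read namespace labels from an informer store, and the label key used here is an assumption for illustration.

```go
package main

import "fmt"

// Assumed label key, for illustration only.
const clusterMonitoringLabel = "openshift.io/cluster-monitoring"

// namespaceStore maps namespace name -> labels, standing in for the informer store.
type namespaceStore map[string]map[string]string

// IsClusterMonitoringNamespace distinguishes a definitive negative answer
// (false, nil) from a lookup failure (false, err).
func (s namespaceStore) IsClusterMonitoringNamespace(name string) (bool, error) {
	labels, ok := s[name]
	if !ok {
		return false, fmt.Errorf("namespace %q not found", name)
	}
	return labels[clusterMonitoringLabel] == "true", nil
}

func main() {
	s := namespaceStore{
		"openshift-monitoring": {clusterMonitoringLabel: "true"},
		"default":              {},
	}
	fmt.Println(s.IsClusterMonitoringNamespace("openshift-monitoring")) // definitive positive
	fmt.Println(s.IsClusterMonitoringNamespace("missing"))              // lookup error, not a silent false
}
```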
- Around line 147-150: RelabeledRulesInterface currently has List(ctx
context.Context) []monitoringv1.Rule and Get(ctx context.Context, id string)
(monitoringv1.Rule, bool) which cannot surface backend or context errors; change
the contract so List returns ([]monitoringv1.Rule, error) and Get returns
(monitoringv1.Rule, bool, error) (or alternatively (*monitoringv1.Rule, error)
if preferred), then update all implementations of RelabeledRulesInterface and
their callers to propagate and handle the error return (including
context/cancellation and backend failure cases) so failures are not silently
swallowed; update unit tests and any call sites that assume the old signatures
accordingly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
go.mod, pkg/k8s/auth_context.go, pkg/k8s/client.go, pkg/k8s/client_factory.go, pkg/k8s/namespace.go, pkg/k8s/prometheus_rule.go, pkg/k8s/prometheus_rules_types.go, pkg/k8s/types.go, pkg/k8s/vars.go
pkg/k8s/prometheus_rule.go
Outdated
func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
	informer := cache.NewSharedIndexInformer(
		prometheusRuleListWatchForAllNamespaces(clientset),
		&monitoringv1.PrometheusRule{},
		0,
		cache.Indexers{},
	)

	go informer.Run(ctx.Done())

	cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
		informer.HasSynced,
	)

	return &prometheusRuleManager{
		clientset: clientset,
		informer:  informer,
	}
}
WaitForNamedCacheSync return value is discarded; no error returned to caller.
WaitForNamedCacheSync returns false if synchronization fails (e.g., context cancelled), but the result is ignored and newPrometheusRuleManager does not return an error. Other managers in this PR (newNamespaceManager, newAlertRelabelConfigManager) return (*manager, error), but this one only returns *prometheusRuleManager. If the cache never syncs, operations will silently work with stale or empty data.
🐛 Proposed fix: return an error on sync failure
-func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
+func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) (*prometheusRuleManager, error) {
informer := cache.NewSharedIndexInformer(
prometheusRuleListWatchForAllNamespaces(clientset),
&monitoringv1.PrometheusRule{},
0,
cache.Indexers{},
)
go informer.Run(ctx.Done())
- cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
+ if !cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
informer.HasSynced,
- )
+ ) {
+ return nil, fmt.Errorf("failed to sync PrometheusRule informer cache")
+ }
return &prometheusRuleManager{
clientset: clientset,
informer: informer,
- }
+ }, nil
}

Note: This change requires updating the call site in pkg/k8s/client.go (Line 64) to handle the error:
c.prometheusRuleManager, err = newPrometheusRuleManager(ctx, monitoringv1clientset)
if err != nil {
return nil, fmt.Errorf("failed to create prometheus rule manager: %w", err)
}
pkg/k8s/prometheus_rule.go
Outdated
func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
	prs := prm.informer.GetStore().List()

	prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
	for _, item := range prs {
		pr, ok := item.(*monitoringv1.PrometheusRule)
		if !ok {
			continue
		}
		prometheusRules = append(prometheusRules, *pr)
	}

	return prometheusRules, nil
}
Bug: List ignores the namespace parameter.
The namespace parameter is accepted but never used. The method returns all PrometheusRule resources from the informer store regardless of namespace. This silently returns a superset of results when callers expect namespace-scoped results.
🐛 Proposed fix to filter by namespace
func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
prs := prm.informer.GetStore().List()
prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
for _, item := range prs {
pr, ok := item.(*monitoringv1.PrometheusRule)
if !ok {
continue
}
+ if namespace != "" && pr.Namespace != namespace {
+ continue
+ }
prometheusRules = append(prometheusRules, *pr)
}
return prometheusRules, nil
}
pkg/k8s/types.go
Outdated
	List(ctx context.Context) []monitoringv1.Rule

	// Get retrieves the relabeled rule for a given id
	Get(ctx context.Context, id string) (monitoringv1.Rule, bool)
Expose failures in RelabeledRulesInterface reads.
At Line [147] and Line [150], the API can’t report failures even though calls are context-aware. This forces implementations to swallow backend/config errors and can present incorrect empty data.
💡 Suggested contract update
type RelabeledRulesInterface interface {
// List retrieves the relabeled rules for a given PrometheusRule
- List(ctx context.Context) []monitoringv1.Rule
+ List(ctx context.Context) ([]monitoringv1.Rule, error)
// Get retrieves the relabeled rule for a given id
- Get(ctx context.Context, id string) (monitoringv1.Rule, bool)
+ Get(ctx context.Context, id string) (monitoringv1.Rule, bool, error)
// Config returns the list of alert relabel configs
Config() []*relabel.Config
}
@sradco: This pull request references mgmt-01 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@sradco: This pull request references mgmt-01 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
@sradco: The following tests failed.

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@jgbernalp, @jan--f, @avlitman, @machacekondra: Please review this PR.
pkg/k8s/namespace.go
Outdated
|
|
go nm.informer.Run(ctx.Done())

cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
Should we handle the error returned here?
…nt API

Add the foundational k8s layer for alert management:
- Client factory for dynamic and typed Kubernetes clients
- Namespace resolution helpers
- Auth context extraction from HTTP requests
- Core types (AlertingRuleSource, sortable slices)
- Shared vars (label names, configmap keys)
- PrometheusRule types and rule parsing/filtering helpers

Signed-off-by: Shirly Radco <sradco@redhat.com>
Signed-off-by: João Vilaça <jvilaca@redhat.com>
Signed-off-by: Aviv Litman <alitman@redhat.com>
Signed-off-by: machadovilaca <machadovilaca@gmail.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
@simonpasquier, @jgbernalp thank you for the reviews. I fixed the issues you raised.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: simonpasquier, sradco. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Alert Management API — Part 1/8: k8s foundation
Summary
- GET /api/v1/alerting/health stub (returns 501 Not Implemented) to make the intended API shape/call-path reviewable early
This PR is part of a stacked series. Please review in order.
Summary by CodeRabbit
Release Notes
Chores
New Features