
alert-mgmt-01: k8s foundation#787

Open
sradco wants to merge 1 commit into openshift:alerts-management-api from sradco:alert-mgmt-01-k8s-foundation

Conversation

@sradco sradco commented Feb 25, 2026

Alert Management API — Part 1/8: k8s foundation

Summary

  • Client factory for dynamic and typed Kubernetes clients
  • Namespace resolution helpers
  • Auth context extraction from HTTP requests
  • Core types (AlertingRuleSource, sortable slices)
  • Shared vars (label names, configmap keys)
  • PrometheusRule types and rule parsing/filtering helpers
  • Feature-flagged API skeleton: GET /api/v1/alerting/health stub (returns 501 Not Implemented) to make the intended API shape/call-path reviewable early

Dependencies

This PR is part of a stacked series. Please review in order.

  1. → This PR — k8s foundation + health stub skeleton
  2. Pending — alert listing/query and filter primitives
  3. Pending — relabel config, relabeled rules, alerting health, AlertingRule CRD
  4. Pending — management read paths (alerts, rules)
  5. Pending — management write paths (create, delete, bulk update)
  6. Pending — management API router + server wiring (replaces the stub with real handlers)
  7. Pending — documentation, CI workflow, e2e tests
  8. Pending — single alert rule update + delete-by-ID

Summary by CodeRabbit

Release Notes

  • Chores

    • Updated core Kubernetes, Prometheus, OpenAPI, and OpenShift dependencies to the latest versions, expanding platform compatibility.
  • New Features

    • Added cluster connectivity verification and comprehensive monitoring health checks.
    • Implemented Prometheus alert and rule management functionality.
    • Added support for namespace-level cluster monitoring configuration.
    • Enhanced alerting system with alert relabeling and rule management capabilities.

@sradco sradco changed the title k8s: add client factory, namespaces, auth context, base types and PrometheusRule parsing alert-mgmt-01-k8s-foundation Feb 25, 2026
coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 094a00dd-f778-450b-8e02-848acef5dc1a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The PR upgrades core Kubernetes and Prometheus dependencies to v0.34.2+ while introducing a comprehensive Kubernetes client abstraction layer with managers for Prometheus rules, namespace monitoring, alerting health, and alert configuration management.

Changes

  • Dependency Upgrades (go.mod): Upgraded core Kubernetes APIs (k8s.io/api, k8s.io/client-go, k8s.io/apiserver) from v0.31.x/v0.30.x to v0.34.2; added Prometheus (v1.23.2), OpenShift (api, client-go), and go-openapi swag modules, and expanded testing/integration dependencies.
  • Core Client Infrastructure (pkg/k8s/auth_context.go, pkg/k8s/client_factory.go, pkg/k8s/client.go): Introduced context-based bearer-token storage, a public client factory NewClient(), and an internal client implementation with clientset wiring and manager initialization for Kubernetes, Prometheus, and OpenShift resources.
  • Type Definitions & Interfaces (pkg/k8s/types.go, pkg/k8s/prometheus_rules_types.go): Defined the public Client interface with subsystem aggregation (alerting, Prometheus, namespaces, config maps) and supporting interfaces (PrometheusAlertsInterface, PrometheusRuleInterface, AlertingRuleInterface, RelabeledRulesInterface); added Prometheus alerting API response types (PrometheusRuleGroup, PrometheusRuleAlert).
  • Manager Implementations (pkg/k8s/namespace.go, pkg/k8s/prometheus_rule.go): Implemented namespaceManager using a SharedIndexInformer to track cluster-monitoring-labeled namespaces, and prometheusRuleManager with List, Get, Update, Delete, and AddRule operations for PrometheusRule resources.
  • Configuration Constants (pkg/k8s/vars.go): Added 27 public constants defining default namespaces, route names, API paths, port numbers, and alert source/backend identifiers for cluster monitoring and alerting infrastructure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 With whiskers twitching, a client takes form,
Kubernetes calls, Prometheus warms,
Alerts and rules in harmony bound,
New managers dance, new interfaces found!
Hop-hop, the cluster now speaks as one.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Title check: ✅ Passed. The title 'alert-mgmt-01: k8s foundation' directly corresponds to the PR's main objective of adding foundational Kubernetes infrastructure for alert management, as evidenced by the additions of the client factory, namespace helpers, auth context, types, and PrometheusRule managers.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
pkg/k8s/client.go (1)

57-62: routeClientset is created but not stored on the client struct.

The routeClientset is only passed to newPrometheusAlerts. If future methods on client need route access, it won't be available. This is fine if prometheusAlerts is the only consumer, but worth noting for awareness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/k8s/client.go` around lines 57 - 62, The client constructor builds a
routeClientset but doesn't store it on the client struct; update the client
struct (type client) to include a routeClientset field and assign the created
routeClientset to that field in the constructor where c := &client{...} is
created, so routeClientset is available for future methods; keep passing
routeClientset into newPrometheusAlerts as before but also store it on the
client instance (reference symbols: routeClientset, client struct,
newPrometheusAlerts, prometheusAlerts).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Around line 12-13: Update the Prometheus endpoint constants in vars.go so they
include the required "/api" prefix: change the PrometheusAlertsPath and
PrometheusRulesPath constants from "/v1/alerts" and "/v1/rules" to
"/api/v1/alerts" and "/api/v1/rules" respectively; locate the symbols
PrometheusAlertsPath and PrometheusRulesPath in vars.go and modify their string
values to the corrected paths.
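
Assuming vars.go defines the two constants named in the finding, the corrected values would look like this sketch (constant names taken from the review; values follow the Prometheus HTTP API's /api/v1 prefix):

```go
package main

import (
	"fmt"
	"strings"
)

// Corrected endpoint constants per the review finding: the Prometheus HTTP
// API serves its v1 endpoints under the /api prefix, so both paths need it.
const (
	PrometheusAlertsPath = "/api/v1/alerts" // was "/v1/alerts"
	PrometheusRulesPath  = "/api/v1/rules"  // was "/v1/rules"
)

func main() {
	for _, p := range []string{PrometheusAlertsPath, PrometheusRulesPath} {
		fmt.Println(strings.HasPrefix(p, "/api/"), p)
	}
}
```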

In `@pkg/k8s/client.go`:
- Around line 36-94: The Client interface methods ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error) are declared but not implemented on the *client type; add receiver
methods on *client named ConfigMaps and AlertingHealth that return the
appropriate values (for ConfigMaps return the configmap manager/implementation
instance held on the client or create one similarly to namespaceManager; for
AlertingHealth call or create the alert-health checker using existing clients
like monitoringv1clientset or prometheusAlerts and return (AlertingHealth,
error)). Ensure the method signatures exactly match the interface (ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error)) so the compile-time assertion var _ Client = (*client)(nil) succeeds.

In `@pkg/k8s/namespace.go`:
- Around line 69-75: WaitForNamedCacheSync call's boolean return value is
ignored: after starting nm.informer.Run(ctx.Done()) you must check the result of
cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
nm.informer.HasSynced) and if it returns false, return an error instead of
returning nm nil; update the function that constructs/returns nm to propagate a
descriptive error (e.g., "namespace informer failed to sync") when
WaitForNamedCacheSync returns false so the caller knows the informer never
synced.

In `@pkg/k8s/prometheus_rule.go`:
- Around line 21-39: Change newPrometheusRuleManager to return
(*prometheusRuleManager, error), check the boolean result of
cache.WaitForNamedCacheSync and return an error if it returns false (e.g.,
context cancelled or sync failed) instead of discarding it; update the function
signature and its return statements (prometheusRuleManager constructor) and
adjust the caller in pkg/k8s/client.go to handle the returned error (as shown in
other managers like newNamespaceManager/newAlertRelabelConfigManager) so
creation fails fast when the informer never syncs.
- Around line 82-89: The Delete handler in prometheusRuleManager (func Delete)
returns an error that only mentions the resource name; update the error
formatting to include both namespace and name (same pattern used in
Update/AddRule) by changing the fmt.Errorf call to include namespace and name in
the message so logs show "failed to delete PrometheusRule <namespace>/<name>:
%w".
- Around line 45-58: The List method prometheusRuleManager.List currently
ignores the namespace parameter and returns all items from
prm.informer.GetStore().List(); update it to filter the returned PrometheusRule
objects by namespace: iterate prs as before, assert item to
*monitoringv1.PrometheusRule (pr), then skip entries where pr.GetNamespace() !=
namespace when namespace is non-empty (and if you want to preserve previous
behavior, treat an empty namespace as "no filtering" to return all). Ensure the
function still returns a slice of monitoringv1.PrometheusRule built from the
filtered items and returns nil error.

In `@pkg/k8s/types.go`:
- Around line 157-159: The IsClusterMonitoringNamespace method currently returns
only bool which conflates “not cluster-monitoring” with lookup/read errors;
update the NamespaceInterface by changing IsClusterMonitoringNamespace(name
string) bool to IsClusterMonitoringNamespace(name string) (bool, error) and then
update all implementations of that method (and callers) to return (true, nil) or
(false, nil) for a definitive negative result and to return (false, err) when
namespace retrieval/label read fails so errors are propagated rather than
silently treated as false.
- Around line 147-150: RelabeledRulesInterface currently has List(ctx
context.Context) []monitoringv1.Rule and Get(ctx context.Context, id string)
(monitoringv1.Rule, bool) which cannot surface backend or context errors; change
the contract so List returns ([]monitoringv1.Rule, error) and Get returns
(monitoringv1.Rule, bool, error) (or alternatively (*monitoringv1.Rule, error)
if preferred), then update all implementations of RelabeledRulesInterface and
their callers to propagate and handle the error return (including
context/cancellation and backend failure cases) so failures are not silently
swallowed; update unit tests and any call sites that assume the old signatures
accordingly.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2ce787b and 2c24949.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (9)
  • go.mod
  • pkg/k8s/auth_context.go
  • pkg/k8s/client.go
  • pkg/k8s/client_factory.go
  • pkg/k8s/namespace.go
  • pkg/k8s/prometheus_rule.go
  • pkg/k8s/prometheus_rules_types.go
  • pkg/k8s/types.go
  • pkg/k8s/vars.go

Comment on lines +21 to +39
func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
	informer := cache.NewSharedIndexInformer(
		prometheusRuleListWatchForAllNamespaces(clientset),
		&monitoringv1.PrometheusRule{},
		0,
		cache.Indexers{},
	)

	go informer.Run(ctx.Done())

	cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
		informer.HasSynced,
	)

	return &prometheusRuleManager{
		clientset: clientset,
		informer:  informer,
	}
}

⚠️ Potential issue | 🟠 Major

WaitForNamedCacheSync return value is discarded; no error returned to caller.

WaitForNamedCacheSync returns false if synchronization fails (e.g., context cancelled), but the result is ignored and newPrometheusRuleManager does not return an error. Other managers in this PR (newNamespaceManager, newAlertRelabelConfigManager) return (*manager, error), but this one only returns *prometheusRuleManager. If the cache never syncs, operations will silently work with stale or empty data.

🐛 Proposed fix: return an error on sync failure
-func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
+func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) (*prometheusRuleManager, error) {
 	informer := cache.NewSharedIndexInformer(
 		prometheusRuleListWatchForAllNamespaces(clientset),
 		&monitoringv1.PrometheusRule{},
 		0,
 		cache.Indexers{},
 	)
 
 	go informer.Run(ctx.Done())
 
-	cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
+	if !cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
 		informer.HasSynced,
-	)
+	) {
+		return nil, fmt.Errorf("failed to sync PrometheusRule informer cache")
+	}
 
 	return &prometheusRuleManager{
 		clientset: clientset,
 		informer:  informer,
-	}
+	}, nil
 }

Note: This change requires updating the call site in pkg/k8s/client.go (Line 64) to handle the error:

c.prometheusRuleManager, err = newPrometheusRuleManager(ctx, monitoringv1clientset)
if err != nil {
    return nil, fmt.Errorf("failed to create prometheus rule manager: %w", err)
}

Comment on lines +45 to +58
func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
	prs := prm.informer.GetStore().List()

	prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
	for _, item := range prs {
		pr, ok := item.(*monitoringv1.PrometheusRule)
		if !ok {
			continue
		}
		prometheusRules = append(prometheusRules, *pr)
	}

	return prometheusRules, nil
}

⚠️ Potential issue | 🔴 Critical

Bug: List ignores the namespace parameter.

The namespace parameter is accepted but never used. The method returns all PrometheusRule resources from the informer store regardless of namespace. This silently returns a superset of results when callers expect namespace-scoped results.

🐛 Proposed fix to filter by namespace
 func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
 	prs := prm.informer.GetStore().List()
 
 	prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
 	for _, item := range prs {
 		pr, ok := item.(*monitoringv1.PrometheusRule)
 		if !ok {
 			continue
 		}
+		if namespace != "" && pr.Namespace != namespace {
+			continue
+		}
 		prometheusRules = append(prometheusRules, *pr)
 	}
 
 	return prometheusRules, nil
 }

pkg/k8s/types.go Outdated
Comment on lines +147 to +150
List(ctx context.Context) []monitoringv1.Rule

// Get retrieves the relabeled rule for a given id
Get(ctx context.Context, id string) (monitoringv1.Rule, bool)

⚠️ Potential issue | 🟠 Major

Expose failures in RelabeledRulesInterface reads.

At lines 147 and 150, the API cannot report failures even though the calls are context-aware. This forces implementations to swallow backend/config errors and can present incorrect empty data.

💡 Suggested contract update
 type RelabeledRulesInterface interface {
 	// List retrieves the relabeled rules for a given PrometheusRule
-	List(ctx context.Context) []monitoringv1.Rule
+	List(ctx context.Context) ([]monitoringv1.Rule, error)

 	// Get retrieves the relabeled rule for a given id
-	Get(ctx context.Context, id string) (monitoringv1.Rule, bool)
+	Get(ctx context.Context, id string) (monitoringv1.Rule, bool, error)

 	// Config returns the list of alert relabel configs
 	Config() []*relabel.Config
 }

@sradco sradco changed the base branch from main to alerts-management-api February 25, 2026 13:21
@sradco sradco changed the title alert-mgmt-01-k8s-foundation alert-mgmt-01: k8s foundation Feb 25, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 25, 2026
openshift-ci-robot commented Feb 25, 2026

@sradco: This pull request references mgmt-01 which is a valid jira issue.

Details

In response to this:

Add the foundational k8s layer for alert management:

  • Client factory for dynamic and typed Kubernetes clients
  • Namespace resolution helpers
  • Auth context extraction from HTTP requests
  • Core types (AlertingRuleSource, sortable slices)
  • Shared vars (label names, configmap keys)
  • PrometheusRule types and rule parsing/filtering helpers

Summary by CodeRabbit

Release Notes

  • Chores

  • Updated core Kubernetes, Prometheus, OpenAPI, and OpenShift dependencies to the latest versions, expanding platform compatibility.

  • New Features

  • Added cluster connectivity verification and comprehensive monitoring health checks.

  • Implemented Prometheus alert and rule management functionality.

  • Added support for namespace-level cluster monitoring configuration.

  • Enhanced alerting system with alert relabeling and rule management capabilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@sradco sradco force-pushed the alert-mgmt-01-k8s-foundation branch from 3f23644 to ef9bb18 Compare February 25, 2026 17:33
@sradco sradco changed the base branch from alerts-management-api to main February 25, 2026 17:40
openshift-ci-robot commented Feb 25, 2026

@sradco: This pull request references mgmt-01 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


@sradco sradco force-pushed the alert-mgmt-01-k8s-foundation branch from ef9bb18 to 0ed1db6 Compare February 25, 2026 17:48
@sradco sradco changed the base branch from main to alerts-management-api February 25, 2026 17:48

@openshift-ci

openshift-ci bot commented Feb 25, 2026

@sradco: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

All of the following required tests failed on commit 0ed1db6:

  • ci/prow/verify-deps: rerun with /test verify-deps
  • ci/prow/images: rerun with /test images
  • ci/prow/okd-scos-images: rerun with /test okd-scos-images
  • ci/prow/translations: rerun with /test translations
  • ci/prow/lint: rerun with /test lint
  • ci/prow/e2e-aws-ovn: rerun with /test e2e-aws-ovn
  • ci/prow/periodics-images: rerun with /test periodics-images



@openshift-ci-robot

openshift-ci-robot commented Feb 25, 2026

@sradco: This pull request references mgmt-01 which is a valid jira issue.

Details

In response to this:

Alert Management API — Part 1/8: k8s foundation

Summary

  • Client factory for dynamic and typed Kubernetes clients
  • Namespace resolution helpers
  • Auth context extraction from HTTP requests
  • Core types (AlertingRuleSource, sortable slices)
  • Shared vars (label names, configmap keys)
  • PrometheusRule types and rule parsing/filtering helpers
  • Feature-flagged API skeleton: GET /api/v1/alerting/health stub (returns 501 Not Implemented) to make the intended API shape/call-path reviewable early

Dependencies

This PR is part of a stacked series. Please review in order.

  1. → This PR — k8s foundation + health stub skeleton
  2. Pending — alert listing/query and filter primitives
  3. Pending — relabel config, relabeled rules, alerting health, AlertingRule CRD
  4. Pending — management read paths (alerts, rules)
  5. Pending — management write paths (create, delete, bulk update)
  6. Pending — management API router + server wiring (replaces the stub with real handlers)
  7. Pending — documentation, CI workflow, e2e tests
  8. Pending — single alert rule update + delete-by-ID

Files

pkg/k8s/client_factory.go, client.go, namespace.go, auth_context.go, types.go, vars.go, prometheus_rules_types.go, prometheus_rule.go, pkg/server.go, cmd/plugin-backend.go, go.mod, go.sum


@sradco sradco force-pushed the alert-mgmt-01-k8s-foundation branch from 2bf65d3 to e38bbf8 Compare February 25, 2026 18:18
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 25, 2026

@sradco: This pull request references mgmt-01 which is a valid jira issue.


@sradco
Author

sradco commented Feb 25, 2026

@jgbernalp, @jan--f, @avlitman, @machacekondra: please review this PR.

@sradco sradco force-pushed the alert-mgmt-01-k8s-foundation branch 4 times, most recently from ab9515e to a82ec2e on February 27, 2026 11:57

go nm.informer.Run(ctx.Done())

cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
Contributor

should we handle the error returned here?

Author

Updated

…nt API

Add the foundational k8s layer for alert management:
- Client factory for dynamic and typed Kubernetes clients
- Namespace resolution helpers
- Auth context extraction from HTTP requests
- Core types (AlertingRuleSource, sortable slices)
- Shared vars (label names, configmap keys)
- PrometheusRule types and rule parsing/filtering helpers

Signed-off-by: Shirly Radco <sradco@redhat.com>
Signed-off-by: João Vilaça <jvilaca@redhat.com>
Signed-off-by: Aviv Litman <alitman@redhat.com>
Signed-off-by: machadovilaca <machadovilaca@gmail.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
@sradco sradco force-pushed the alert-mgmt-01-k8s-foundation branch from 789cbb0 to ea94fbb on March 5, 2026 10:44
@sradco
Author

sradco commented Mar 5, 2026

@simonpasquier, @jgbernalp thank you for the reviews. I fixed the issues you raised.
Is the PR ready to be merged?

Contributor

@simonpasquier left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: simonpasquier, sradco
Once this PR has been reviewed and has the lgtm label, please assign peteryurkovich for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.


4 participants