[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Alcamech. Approval is still needed from an approver in each of the affected files.
|
@Alcamech: all tests passed! If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
| - Invalid PDBs could block node drains | ||
| - Manual interventions detected | ||
|
|
||
| **Alert**: `UpgradeClusterCheckFailedSRE` (paging) |
Do we have this alert?
I remember we didn't implement this alert.
|
|
||
| **Paging Alerts Tracked** (from `pkg/metrics/metrics.go:74-81`): | ||
| - `UpgradeConfigValidationFailedSRE` | ||
| - `UpgradeClusterCheckFailedSRE` |
I don't remember us having this alert.
| - `UpgradeControlPlaneUpgradeTimeoutSRE` | ||
| - `UpgradeNodeUpgradeTimeoutSRE` | ||
| - `UpgradeNodeDrainFailedSRE` | ||
|
|
The `UpgradeStateNotificationFailureSRE` alert is missing.
What type of PR is this?
Documentation
What this PR does / why we need it?
Adds an "Adding New Metrics" guide to docs/metrics.md with step-by-step instructions for defining, registering, and verifying Prometheus metrics locally.
Adds a Metrics Tracing guide to docs/metrics-tracing.md that provides a comprehensive mapping of all Prometheus metrics
Which Jira/Github issue(s) this PR fixes?
OCM-19740