Skip to content

Conversation

@kusumachalasani
Copy link
Contributor

@kusumachalasani kusumachalasani commented Feb 5, 2026

Description

Two prereq scripts for openshift and kind/minikube to generate runtime recommendations.

  1. To enable user workload monitoring on openshift.
  2. To enable kube-state-metrics labels.

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes
  • Script Updates

How has this been tested?

Tested on local environment.

Test Configuration

  • Kubernetes clusters tested on:
    kind, minikube and openshift

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Add helper scripts to prepare Kubernetes clusters for runtime recommendations by enabling user workload monitoring on OpenShift and configuring kube-state-metrics label collection.

New Features:

  • Introduce a script to enable user workload monitoring via the OpenShift cluster monitoring config.
  • Introduce a script to add kube-state-metrics metric-labels-allowlist arguments, with auto-detection of the deployment when not specified.

Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Adds two Bash helper scripts to configure monitoring prerequisites for runtime recommendations by enabling user workload monitoring on OpenShift and updating kube-state-metrics deployments to expose specific pod labels as metrics.

Flow diagram for enable_user_workload_monitoring_openshift.sh script

flowchart TD
  A["Start script"] --> B["Check oc command is available"]
  B -->|missing| Z1["Exit with error about missing oc"]
  B -->|present| C["Check openshift-monitoring namespace exists"]
  C -->|missing| Z2["Exit with error about missing namespace"]
  C -->|exists| D["Get existing cluster-monitoring-config ConfigMap data.config.yaml"]

  D -->|empty or not found| E["Create new cluster-monitoring-config ConfigMap with config.yaml: enableUserWorkload: true"]
  E --> F["Apply ConfigMap using oc apply -f -"]
  F --> G["End script"]

  D -->|exists| H["Check if config.yaml contains enableUserWorkload: true"]
  H -->|already true| I["Print already enabled message"]
  I --> G

  H -->|not enabled| J["Determine if enableUserWorkload key exists"]
  J -->|key exists| K["Replace enableUserWorkload value with true using sed"]
  J -->|key missing| L["Append enableUserWorkload: true to existing config"]

  K --> M["Build updated ConfigMap manifest with preserved settings"]
  L --> M
  M --> N["Apply updated ConfigMap using oc apply -f -"]
  N --> O["Wait for pods in openshift-user-workload-monitoring namespace to become Ready"]
  O --> P["List pods in openshift-user-workload-monitoring namespace"]
  P --> G
Loading

Flow diagram for enable_kube_state_metrics_labels.sh script

flowchart TD
  A["Start script"] --> B["Parse flags: -n namespace, -d deployment, -h help"]
  B -->|help requested| C["Show usage and exit"]
  B -->|continue| D["Check kubectl and jq commands are available"]
  D -->|missing| Z1["Exit with error about missing dependency"]
  D -->|present| E{Are NAMESPACE and DEPLOYMENT both set?}

  E -->|yes| H["Get specified deployment JSON via kubectl -n NAMESPACE get deployment DEPLOYMENT -o json"]
  E -->|no| F["Search all namespaces for deployments with name matching kube-state-metrics"]

  F --> G{Number of matching deployments}
  G -->|0| Z2["Exit with error: no kube-state-metrics deployment found"]
  G -->|>1| Z3["Print list, ask user to specify NAMESPACE and DEPLOYMENT, then exit"]
  G -->|1| H

  H --> I["Use jq to inspect containers named kube-state-metrics and list their args"]
  I --> J{Any arg contains metric-labels-allowlist?}

  J -->|yes| K["Print message: allowlist already present, no change needed"]
  K --> L["End script"]

  J -->|no| M["Patch deployment JSON with jq: append LABEL_ARG to args of kube-state-metrics container"]
  M --> N["Apply modified deployment via kubectl apply -f -"]
  N --> O["Print done message"]
  O --> L
Loading

File-Level Changes

Change Details Files
Introduce an OpenShift script to enable user workload monitoring via the cluster monitoring ConfigMap.
  • Add dependency checks for the oc CLI and presence of the openshift-monitoring namespace.
  • Fetch existing cluster-monitoring-config ConfigMap config.yaml content if present and branch behavior based on its existence.
  • Create the ConfigMap with enableUserWorkload: true when it does not exist.
  • When the ConfigMap exists, update or append enableUserWorkload: true while preserving other YAML content and reapply the ConfigMap.
  • After enabling, wait for pods in the openshift-user-workload-monitoring namespace to become Ready and print their status.
scripts/enable_user_workload_monitoring_openshift.sh
Introduce a Kubernetes-agnostic script to add kube-state-metrics label allowlist arguments to an existing deployment.
  • Parse optional -n (namespace) and -d (deployment) flags and provide a usage helper.
  • Add dependency checks for kubectl and jq.
  • If namespace or deployment not provided, auto-discover a unique kube-state-metrics deployment across all namespaces, erroring on zero or multiple matches.
  • Inspect the kube-state-metrics container args for an existing metric-labels-allowlist flag and exit if already present.
  • Otherwise, patch the deployment JSON via jq to append the desired --metric-labels-allowlist=pods=[app,app.kubernetes.io/layer,version] argument and apply it back with kubectl.
scripts/enable_kube_state_metrics_labels.sh

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In enable_kube_state_metrics_labels.sh, the jq mutation assumes .args already exists on the kube-state-metrics container; consider initializing it (e.g., .| (.args //= []) | .args += [$ARG]) to avoid failures when args are not defined.
  • enable_user_workload_monitoring_openshift.sh fully replaces the cluster-monitoring-config ConfigMap and may drop existing configuration; consider merging with the current config.yaml or at least preserving existing keys instead of overwriting.
  • Both scripts assume kubectl/oc and jq are available; adding explicit checks with clear error messages will make failures easier to understand in environments where those tools are missing.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `enable_kube_state_metrics_labels.sh`, the jq mutation assumes `.args` already exists on the `kube-state-metrics` container; consider initializing it (e.g., `.| (.args //= []) | .args += [$ARG]`) to avoid failures when args are not defined.
- `enable_user_workload_monitoring_openshift.sh` fully replaces the `cluster-monitoring-config` ConfigMap and may drop existing configuration; consider merging with the current `config.yaml` or at least preserving existing keys instead of overwriting.
- Both scripts assume `kubectl`/`oc` and `jq` are available; adding explicit checks with clear error messages will make failures easier to understand in environments where those tools are missing.

## Individual Comments

### Comment 1
<location> `scripts/enable_kube_state_metrics_labels.sh:30-32` </location>
<code_context>
+  echo -n "Adding --metric-labels-allowlist argument..."
+
+  kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o json | \
+  jq --arg ARG "$LABEL_ARG" '
+    .spec.template.spec.containers |=
+      map(if .name=="kube-state-metrics"
+          then .args += [$ARG]
+          else .
</code_context>

<issue_to_address>
**issue (bug_risk):** Make `.args` update null-safe in `jq` to avoid failures when the args array is missing

If the `kube-state-metrics` container has no `args` field or it’s `null`, `.args += [$ARG]` will fail. Default to an empty array instead:

```bash
jq --arg ARG "$LABEL_ARG" '
  .spec.template.spec.containers |=
    map(if .name=="kube-state-metrics"
        then .args = ((.args // []) + [$ARG])
        else .
        end)
'
```
</issue_to_address>

### Comment 2
<location> `scripts/enable_user_workload_monitoring_openshift.sh:11-19` </location>
<code_context>
+else
+  echo "Enabling User Workload Monitoring..."
+
+  oc apply -f - <<EOF
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: cluster-monitoring-config
+  namespace: openshift-monitoring
+data:
+  config.yaml: |
+    enableUserWorkload: true
+EOF
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Avoid overwriting existing cluster-monitoring-config content when enabling user workload monitoring

This approach replaces any existing `cluster-monitoring-config` content, discarding other settings already present in `config.yaml`. Instead, read the current config and patch in `enableUserWorkload: true` only if needed (for example, using `oc get ... -o yaml` piped through `yq`/`sed`/`jq` to merge) rather than replacing the entire `config.yaml` block.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
@kusumachalasani
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In both scripts, consider adding set -euo pipefail and consistently quoting variables (e.g., $NAMESPACE, $CONFIGMAP) to make the scripts more robust against failures and word-splitting issues.
  • The enable_kube_state_metrics_labels.sh script picks the first kube-state-metrics deployment via grep ... | head -n 1, which may be ambiguous if multiple instances exist; it would be safer to either fail when more than one deployment is found or allow the user to specify the namespace/deployment explicitly.
  • The enable_kube_state_metrics_labels.sh script assumes jq is available; adding an explicit check with a clear error message (command -v jq / which jq) would make the dependency clearer and failure mode friendlier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In both scripts, consider adding `set -euo pipefail` and consistently quoting variables (e.g., `$NAMESPACE`, `$CONFIGMAP`) to make the scripts more robust against failures and word-splitting issues.
- The `enable_kube_state_metrics_labels.sh` script picks the first `kube-state-metrics` deployment via `grep ... | head -n 1`, which may be ambiguous if multiple instances exist; it would be safer to either fail when more than one deployment is found or allow the user to specify the namespace/deployment explicitly.
- The `enable_kube_state_metrics_labels.sh` script assumes `jq` is available; adding an explicit check with a clear error message (`command -v jq` / `which jq`) would make the dependency clearer and failure mode friendlier.

## Individual Comments

### Comment 1
<location> `scripts/enable_user_workload_monitoring_openshift.sh:46-47` </location>
<code_context>
+      UPDATED="$EXISTING"$'\n'"enableUserWorkload: true"
+    fi
+
+    oc -n $NAMESPACE patch configmap $CONFIGMAP --type merge -p \
+      "{\"data\":{\"config.yaml\":\"$(echo "$UPDATED" | sed 's/"/\\"/g')\"}}"
+
+    echo "Done!"
</code_context>

<issue_to_address>
**issue (bug_risk):** Newlines in the patched config may produce invalid JSON for the `oc patch` payload.

`$UPDATED` is inserted with only double quotes escaped, so any literal newlines remain and make the JSON payload invalid once `enableUserWorkload` is part of a multi‑line config. Either escape newlines to `\n` before interpolation, or switch to `--patch-file` / `oc apply -f -` with a proper YAML/JSON document instead of constructing JSON in the shell.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
@kusumachalasani
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In enable_user_workload_monitoring_openshift.sh, consider making the openshift-user-workload-monitoring namespace a configurable variable at the top of the script (similar to NAMESPACE and CONFIGMAP) to allow flexibility for different cluster setups.
  • In enable_kube_state_metrics_labels.sh, the container name kube-state-metrics and the hard‑coded label allowlist might not match all deployments; it would be more flexible to expose these as flags or variables so users can target differently named containers or customize the label set.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `enable_user_workload_monitoring_openshift.sh`, consider making the `openshift-user-workload-monitoring` namespace a configurable variable at the top of the script (similar to `NAMESPACE` and `CONFIGMAP`) to allow flexibility for different cluster setups.
- In `enable_kube_state_metrics_labels.sh`, the container name `kube-state-metrics` and the hard‑coded label allowlist might not match all deployments; it would be more flexible to expose these as flags or variables so users can target differently named containers or customize the label set.

## Individual Comments

### Comment 1
<location> `scripts/enable_user_workload_monitoring_openshift.sh:21-22` </location>
<code_context>
+fi
+
+echo -n "Fetching cluster monitoring config..."
+EXISTING=$(oc -n $NAMESPACE get configmap $CONFIGMAP \
+  -o jsonpath='{.data.config\.yaml}' 2>/dev/null)
+
+if [ -z "$EXISTING" ]; then
</code_context>

<issue_to_address>
**suggestion:** Quote variable expansions when calling `oc` to avoid edge cases with special characters in names.

For example:

```bash
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
  -o jsonpath='{.data.config\.yaml}' 2>/dev/null)
```

Please update the remaining `oc` calls in this script similarly.

Suggested implementation:

```
echo -n "Fetching cluster monitoring config..."
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
  -o jsonpath='{.data.config\.yaml}' 2>/dev/null)

```

Scan the rest of `scripts/enable_user_workload_monitoring_openshift.sh` for any other `oc` invocations and:
1. Quote namespace and resource names, e.g. `oc -n "$NAMESPACE" get configmap "$CONFIGMAP"`; `oc -n "$NAMESPACE" get pods`, etc.
2. Quote any other variable expansions passed as arguments to `oc` (labels, selectors, filenames, etc.), e.g. `oc get pods -l "$LABEL_SELECTOR"` to avoid word splitting and glob expansion issues.
3. Leave literals and non-variable arguments as-is (e.g. `-o json`, `apply -f -`).
</issue_to_address>

### Comment 2
<location> `scripts/enable_user_workload_monitoring_openshift.sh:51-53` </location>
<code_context>
+    echo "Not enabled."
+    echo -n "Enabling user workload monitoring (preserving other settings)..."
+
+    # If enableUserWorkload exists but false → replace it
+    if echo "$EXISTING" | grep -q "enableUserWorkload:"; then
+      UPDATED=$(echo "$EXISTING" | sed 's/enableUserWorkload:.*/enableUserWorkload: true/')
+    else
+      # Otherwise append it
</code_context>

<issue_to_address>
**issue (bug_risk):** Using `sed` on raw YAML risks corrupting formatting or nested keys; consider a structured update.

The current pattern will match any `enableUserWorkload:` token, regardless of nesting, indentation, or whether it appears in comments, so it may alter the wrong location or produce invalid YAML. Please use a YAML-aware tool (e.g., `yq` or `oc patch` on `.data["config.yaml"]`) to target only the intended key while preserving structure and formatting.
</issue_to_address>

### Comment 3
<location> `scripts/enable_user_workload_monitoring_openshift.sh:70-72` </location>
<code_context>
+    echo "Done!"
+
+    echo "Waiting for user workload monitoring pods..."
+    oc wait --for=condition=Ready pod --all -n openshift-user-workload-monitoring --timeout=300s
+    oc get pods -n openshift-user-workload-monitoring
+
</code_context>

<issue_to_address>
**suggestion:** Waiting on `--all` pods may hang or fail due to unrelated non-ready pods.

Because this waits on all pods in the namespace, any unrelated failed or intentionally non-ready pod will cause the command to time out. Consider narrowing the wait (e.g., via a label selector for the specific user workload monitoring components) instead of targeting all pods.

```suggestion
    echo "Done!"

    echo "Waiting for user workload monitoring pods to become Ready..."
    # Wait only for the user workload monitoring pods, not all pods in the namespace, to avoid
    # hanging on unrelated failed or intentionally non-ready pods.
    oc wait --for=condition=Ready pod \
      -n openshift-user-workload-monitoring \
      -l 'app.kubernetes.io/part-of=user-workload-monitoring' \
      --timeout=300s
    oc get pods -n openshift-user-workload-monitoring
```
</issue_to_address>

### Comment 4
<location> `scripts/enable_kube_state_metrics_labels.sh:55-58` </location>
<code_context>
+  echo "Found $DEPLOYMENT in $NAMESPACE namespace."
+fi
+
+EXISTS=$(kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o json | \
+jq -r '.spec.template.spec.containers[]
+       | select(.name=="kube-state-metrics")
+       | .args[]? ' | grep -c "metric-labels-allowlist")
+
+if [ "$EXISTS" -gt 0 ]; then
</code_context>

<issue_to_address>
**issue (bug_risk):** The existence check may skip adding the pods allowlist when a different allowlist is already present.

The `EXISTS` check only matches the substring `metric-labels-allowlist`. If another allowlist (e.g., for nodes) exists but no pods entry does, the script will incorrectly skip adding `pods=[app,...]`.

If you need to guarantee the pods allowlist specifically, consider matching the full `LABEL_ARG` (or at least `pods=[`) instead of just `metric-labels-allowlist`, or extend the logic to merge/augment any existing allowlists.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +21 to +22
EXISTING=$(oc -n $NAMESPACE get configmap $CONFIGMAP \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Quote variable expansions when calling oc to avoid edge cases with special characters in names.

For example:

EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
  -o jsonpath='{.data.config\.yaml}' 2>/dev/null)

Please update the remaining oc calls in this script similarly.

Suggested implementation:

echo -n "Fetching cluster monitoring config..."
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
  -o jsonpath='{.data.config\.yaml}' 2>/dev/null)

Scan the rest of scripts/enable_user_workload_monitoring_openshift.sh for any other oc invocations and:

  1. Quote namespace and resource names, e.g. oc -n "$NAMESPACE" get configmap "$CONFIGMAP"; oc -n "$NAMESPACE" get pods, etc.
  2. Quote any other variable expansions passed as arguments to oc (labels, selectors, filenames, etc.), e.g. oc get pods -l "$LABEL_SELECTOR" to avoid word splitting and glob expansion issues.
  3. Leave literals and non-variable arguments as-is (e.g. -o json, apply -f -).

Comment on lines +51 to +53
# If enableUserWorkload exists but false → replace it
if echo "$EXISTING" | grep -q "enableUserWorkload:"; then
UPDATED=$(echo "$EXISTING" | sed 's/enableUserWorkload:.*/enableUserWorkload: true/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Using sed on raw YAML risks corrupting formatting or nested keys; consider a structured update.

The current pattern will match any enableUserWorkload: token, regardless of nesting, indentation, or whether it appears in comments, so it may alter the wrong location or produce invalid YAML. Please use a YAML-aware tool (e.g., yq or oc patch on .data["config.yaml"]) to target only the intended key while preserving structure and formatting.

Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
@kusumachalasani kusumachalasani moved this to Under Review in Monitoring Feb 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Under Review

Development

Successfully merging this pull request may close these issues.

1 participant