-
Notifications
You must be signed in to change notification settings - Fork 62
Prereq scripts for runtime recommendations #1787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: mvp_demo
Are you sure you want to change the base?
Conversation
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
Reviewer's GuideAdds two Bash helper scripts to configure monitoring prerequisites for runtime recommendations by enabling user workload monitoring on OpenShift and updating kube-state-metrics deployments to expose specific pod labels as metrics. Flow diagram for enable_user_workload_monitoring_openshift.sh scriptflowchart TD
A["Start script"] --> B["Check oc command is available"]
B -->|missing| Z1["Exit with error about missing oc"]
B -->|present| C["Check openshift-monitoring namespace exists"]
C -->|missing| Z2["Exit with error about missing namespace"]
C -->|exists| D["Get existing cluster-monitoring-config ConfigMap data.config.yaml"]
D -->|empty or not found| E["Create new cluster-monitoring-config ConfigMap with config.yaml: enableUserWorkload: true"]
E --> F["Apply ConfigMap using oc apply -f -"]
F --> G["End script"]
D -->|exists| H["Check if config.yaml contains enableUserWorkload: true"]
H -->|already true| I["Print already enabled message"]
I --> G
H -->|not enabled| J["Determine if enableUserWorkload key exists"]
J -->|key exists| K["Replace enableUserWorkload value with true using sed"]
J -->|key missing| L["Append enableUserWorkload: true to existing config"]
K --> M["Build updated ConfigMap manifest with preserved settings"]
L --> M
M --> N["Apply updated ConfigMap using oc apply -f -"]
N --> O["Wait for pods in openshift-user-workload-monitoring namespace to become Ready"]
O --> P["List pods in openshift-user-workload-monitoring namespace"]
P --> G
Flow diagram for enable_kube_state_metrics_labels.sh scriptflowchart TD
A["Start script"] --> B["Parse flags: -n namespace, -d deployment, -h help"]
B -->|help requested| C["Show usage and exit"]
B -->|continue| D["Check kubectl and jq commands are available"]
D -->|missing| Z1["Exit with error about missing dependency"]
D -->|present| E{Are NAMESPACE and DEPLOYMENT both set?}
E -->|yes| H["Get specified deployment JSON via kubectl -n NAMESPACE get deployment DEPLOYMENT -o json"]
E -->|no| F["Search all namespaces for deployments with name matching kube-state-metrics"]
F --> G{Number of matching deployments}
G -->|0| Z2["Exit with error: no kube-state-metrics deployment found"]
G -->|>1| Z3["Print list, ask user to specify NAMESPACE and DEPLOYMENT, then exit"]
G -->|1| H
H --> I["Use jq to inspect containers named kube-state-metrics and list their args"]
I --> J{Any arg contains metric-labels-allowlist?}
J -->|yes| K["Print message: allowlist already present, no change needed"]
K --> L["End script"]
J -->|no| M["Patch deployment JSON with jq: append LABEL_ARG to args of kube-state-metrics container"]
M --> N["Apply modified deployment via kubectl apply -f -"]
N --> O["Print done message"]
O --> L
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 2 issues, and left some high level feedback:
- In
enable_kube_state_metrics_labels.sh, the jq mutation assumes.argsalready exists on thekube-state-metricscontainer; consider initializing it (e.g.,.| (.args //= []) | .args += [$ARG]) to avoid failures when args are not defined. enable_user_workload_monitoring_openshift.shfully replaces thecluster-monitoring-configConfigMap and may drop existing configuration; consider merging with the currentconfig.yamlor at least preserving existing keys instead of overwriting.- Both scripts assume
kubectl/ocandjqare available; adding explicit checks with clear error messages will make failures easier to understand in environments where those tools are missing.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `enable_kube_state_metrics_labels.sh`, the jq mutation assumes `.args` already exists on the `kube-state-metrics` container; consider initializing it (e.g., `.| (.args //= []) | .args += [$ARG]`) to avoid failures when args are not defined.
- `enable_user_workload_monitoring_openshift.sh` fully replaces the `cluster-monitoring-config` ConfigMap and may drop existing configuration; consider merging with the current `config.yaml` or at least preserving existing keys instead of overwriting.
- Both scripts assume `kubectl`/`oc` and `jq` are available; adding explicit checks with clear error messages will make failures easier to understand in environments where those tools are missing.
## Individual Comments
### Comment 1
<location> `scripts/enable_kube_state_metrics_labels.sh:30-32` </location>
<code_context>
+ echo -n "Adding --metric-labels-allowlist argument..."
+
+ kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o json | \
+ jq --arg ARG "$LABEL_ARG" '
+ .spec.template.spec.containers |=
+ map(if .name=="kube-state-metrics"
+ then .args += [$ARG]
+ else .
</code_context>
<issue_to_address>
**issue (bug_risk):** Make `.args` update null-safe in `jq` to avoid failures when the args array is missing
If the `kube-state-metrics` container has no `args` field or it’s `null`, `.args += [$ARG]` will fail. Default to an empty array instead:
```bash
jq --arg ARG "$LABEL_ARG" '
.spec.template.spec.containers |=
map(if .name=="kube-state-metrics"
then .args = ((.args // []) + [$ARG])
else .
end)
'
```
</issue_to_address>
### Comment 2
<location> `scripts/enable_user_workload_monitoring_openshift.sh:11-19` </location>
<code_context>
+else
+ echo "Enabling User Workload Monitoring..."
+
+ oc apply -f - <<EOF
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: cluster-monitoring-config
+ namespace: openshift-monitoring
+data:
+ config.yaml: |
+ enableUserWorkload: true
+EOF
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid overwriting existing cluster-monitoring-config content when enabling user workload monitoring
This approach replaces any existing `cluster-monitoring-config` content, discarding other settings already present in `config.yaml`. Instead, read the current config and patch in `enableUserWorkload: true` only if needed (for example, using `oc get ... -o yaml` piped through `yq`/`sed`/`jq` to merge) rather than replacing the entire `config.yaml` block.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- In both scripts, consider adding
set -euo pipefailand consistently quoting variables (e.g.,$NAMESPACE,$CONFIGMAP) to make the scripts more robust against failures and word-splitting issues. - The
enable_kube_state_metrics_labels.shscript picks the firstkube-state-metricsdeployment viagrep ... | head -n 1, which may be ambiguous if multiple instances exist; it would be safer to either fail when more than one deployment is found or allow the user to specify the namespace/deployment explicitly. - The
enable_kube_state_metrics_labels.shscript assumesjqis available; adding an explicit check with a clear error message (command -v jq/which jq) would make the dependency clearer and failure mode friendlier.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In both scripts, consider adding `set -euo pipefail` and consistently quoting variables (e.g., `$NAMESPACE`, `$CONFIGMAP`) to make the scripts more robust against failures and word-splitting issues.
- The `enable_kube_state_metrics_labels.sh` script picks the first `kube-state-metrics` deployment via `grep ... | head -n 1`, which may be ambiguous if multiple instances exist; it would be safer to either fail when more than one deployment is found or allow the user to specify the namespace/deployment explicitly.
- The `enable_kube_state_metrics_labels.sh` script assumes `jq` is available; adding an explicit check with a clear error message (`command -v jq` / `which jq`) would make the dependency clearer and failure mode friendlier.
## Individual Comments
### Comment 1
<location> `scripts/enable_user_workload_monitoring_openshift.sh:46-47` </location>
<code_context>
+ UPDATED="$EXISTING"$'\n'"enableUserWorkload: true"
+ fi
+
+ oc -n $NAMESPACE patch configmap $CONFIGMAP --type merge -p \
+ "{\"data\":{\"config.yaml\":\"$(echo "$UPDATED" | sed 's/"/\\"/g')\"}}"
+
+ echo "Done!"
</code_context>
<issue_to_address>
**issue (bug_risk):** Newlines in the patched config may produce invalid JSON for the `oc patch` payload.
`$UPDATED` is inserted with only double quotes escaped, so any literal newlines remain and make the JSON payload invalid once `enableUserWorkload` is part of a multi‑line config. Either escape newlines to `\n` before interpolation, or switch to `--patch-file` / `oc apply -f -` with a proper YAML/JSON document instead of constructing JSON in the shell.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 4 issues, and left some high level feedback:
- In
enable_user_workload_monitoring_openshift.sh, consider making theopenshift-user-workload-monitoringnamespace a configurable variable at the top of the script (similar toNAMESPACEandCONFIGMAP) to allow flexibility for different cluster setups. - In
enable_kube_state_metrics_labels.sh, the container namekube-state-metricsand the hard‑coded label allowlist might not match all deployments; it would be more flexible to expose these as flags or variables so users can target differently named containers or customize the label set.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `enable_user_workload_monitoring_openshift.sh`, consider making the `openshift-user-workload-monitoring` namespace a configurable variable at the top of the script (similar to `NAMESPACE` and `CONFIGMAP`) to allow flexibility for different cluster setups.
- In `enable_kube_state_metrics_labels.sh`, the container name `kube-state-metrics` and the hard‑coded label allowlist might not match all deployments; it would be more flexible to expose these as flags or variables so users can target differently named containers or customize the label set.
## Individual Comments
### Comment 1
<location> `scripts/enable_user_workload_monitoring_openshift.sh:21-22` </location>
<code_context>
+fi
+
+echo -n "Fetching cluster monitoring config..."
+EXISTING=$(oc -n $NAMESPACE get configmap $CONFIGMAP \
+ -o jsonpath='{.data.config\.yaml}' 2>/dev/null)
+
+if [ -z "$EXISTING" ]; then
</code_context>
<issue_to_address>
**suggestion:** Quote variable expansions when calling `oc` to avoid edge cases with special characters in names.
For example:
```bash
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null)
```
Please update the remaining `oc` calls in this script similarly.
Suggested implementation:
```
echo -n "Fetching cluster monitoring config..."
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null)
```
Scan the rest of `scripts/enable_user_workload_monitoring_openshift.sh` for any other `oc` invocations and:
1. Quote namespace and resource names, e.g. `oc -n "$NAMESPACE" get configmap "$CONFIGMAP"`; `oc -n "$NAMESPACE" get pods`, etc.
2. Quote any other variable expansions passed as arguments to `oc` (labels, selectors, filenames, etc.), e.g. `oc get pods -l "$LABEL_SELECTOR"` to avoid word splitting and glob expansion issues.
3. Leave literals and non-variable arguments as-is (e.g. `-o json`, `apply -f -`).
</issue_to_address>
### Comment 2
<location> `scripts/enable_user_workload_monitoring_openshift.sh:51-53` </location>
<code_context>
+ echo "Not enabled."
+ echo -n "Enabling user workload monitoring (preserving other settings)..."
+
+ # If enableUserWorkload exists but false → replace it
+ if echo "$EXISTING" | grep -q "enableUserWorkload:"; then
+ UPDATED=$(echo "$EXISTING" | sed 's/enableUserWorkload:.*/enableUserWorkload: true/')
+ else
+ # Otherwise append it
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `sed` on raw YAML risks corrupting formatting or nested keys; consider a structured update.
The current pattern will match any `enableUserWorkload:` token, regardless of nesting, indentation, or whether it appears in comments, so it may alter the wrong location or produce invalid YAML. Please use a YAML-aware tool (e.g., `yq` or `oc patch` on `.data["config.yaml"]`) to target only the intended key while preserving structure and formatting.
</issue_to_address>
### Comment 3
<location> `scripts/enable_user_workload_monitoring_openshift.sh:70-72` </location>
<code_context>
+ echo "Done!"
+
+ echo "Waiting for user workload monitoring pods..."
+ oc wait --for=condition=Ready pod --all -n openshift-user-workload-monitoring --timeout=300s
+ oc get pods -n openshift-user-workload-monitoring
+
</code_context>
<issue_to_address>
**suggestion:** Waiting on `--all` pods may hang or fail due to unrelated non-ready pods.
Because this waits on all pods in the namespace, any unrelated failed or intentionally non-ready pod will cause the command to time out. Consider narrowing the wait (e.g., via a label selector for the specific user workload monitoring components) instead of targeting all pods.
```suggestion
echo "Done!"
echo "Waiting for user workload monitoring pods to become Ready..."
# Wait only for the user workload monitoring pods, not all pods in the namespace, to avoid
# hanging on unrelated failed or intentionally non-ready pods.
oc wait --for=condition=Ready pod \
-n openshift-user-workload-monitoring \
-l 'app.kubernetes.io/part-of=user-workload-monitoring' \
--timeout=300s
oc get pods -n openshift-user-workload-monitoring
```
</issue_to_address>
### Comment 4
<location> `scripts/enable_kube_state_metrics_labels.sh:55-58` </location>
<code_context>
+ echo "Found $DEPLOYMENT in $NAMESPACE namespace."
+fi
+
+EXISTS=$(kubectl -n "$NAMESPACE" get deployment "$DEPLOYMENT" -o json | \
+jq -r '.spec.template.spec.containers[]
+ | select(.name=="kube-state-metrics")
+ | .args[]? ' | grep -c "metric-labels-allowlist")
+
+if [ "$EXISTS" -gt 0 ]; then
</code_context>
<issue_to_address>
**issue (bug_risk):** The existence check may skip adding the pods allowlist when a different allowlist is already present.
The `EXISTS` check only matches the substring `metric-labels-allowlist`. If another allowlist (e.g., for nodes) exists but no pods entry does, the script will incorrectly skip adding `pods=[app,...]`.
If you need to guarantee the pods allowlist specifically, consider matching the full `LABEL_ARG` (or at least `pods=[`) instead of just `metric-labels-allowlist`, or extend the logic to merge/augment any existing allowlists.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| EXISTING=$(oc -n $NAMESPACE get configmap $CONFIGMAP \ | ||
| -o jsonpath='{.data.config\.yaml}' 2>/dev/null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Quote variable expansions when calling oc to avoid edge cases with special characters in names.
For example:
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null)Please update the remaining oc calls in this script similarly.
Suggested implementation:
echo -n "Fetching cluster monitoring config..."
EXISTING=$(oc -n "$NAMESPACE" get configmap "$CONFIGMAP" \
-o jsonpath='{.data.config\.yaml}' 2>/dev/null)
Scan the rest of scripts/enable_user_workload_monitoring_openshift.sh for any other oc invocations and:
- Quote namespace and resource names, e.g.
oc -n "$NAMESPACE" get configmap "$CONFIGMAP";oc -n "$NAMESPACE" get pods, etc. - Quote any other variable expansions passed as arguments to
oc(labels, selectors, filenames, etc.), e.g.oc get pods -l "$LABEL_SELECTOR"to avoid word splitting and glob expansion issues. - Leave literals and non-variable arguments as-is (e.g.
-o json,apply -f -).
| # If enableUserWorkload exists but false → replace it | ||
| if echo "$EXISTING" | grep -q "enableUserWorkload:"; then | ||
| UPDATED=$(echo "$EXISTING" | sed 's/enableUserWorkload:.*/enableUserWorkload: true/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Using sed on raw YAML risks corrupting formatting or nested keys; consider a structured update.
The current pattern will match any enableUserWorkload: token, regardless of nesting, indentation, or whether it appears in comments, so it may alter the wrong location or produce invalid YAML. Please use a YAML-aware tool (e.g., yq or oc patch on .data["config.yaml"]) to target only the intended key while preserving structure and formatting.
Signed-off-by: kusuma chalasani <kchalasa@redhat.com>
Description
Two prereq scripts for openshift and kind/minikube to generate runtime recommendations.
Type of change
How has this been tested?
Tested on local environment.
Test Configuration
kind, minikube and openshift
Checklist 🎯
Additional information
Include any additional information such as links, test results, screenshots here
Summary by Sourcery
Add helper scripts to prepare Kubernetes clusters for runtime recommendations by enabling user workload monitoring on OpenShift and configuring kube-state-metrics label collection.
New Features: