Skip to content

Fixes #182

Open
jcpowermac wants to merge 3 commits into redhat-openshift-ecosystem:main from jcpowermac:fixes


@jcpowermac
Collaborator

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

jcpowermac and others added 2 commits November 20, 2025 16:24
This is a workaround for a bug in provider-certification-plugins
where platform.sh uses ${SHARED_DIR} but entrypoint-tests.sh never
defines it, causing "SHARED_DIR: unbound variable" errors.

The proper fix belongs in provider-certification-plugins by adding:
  declare -gr SHARED_DIR="/tmp/shared"
to entrypoint-tests.sh before platform.sh is called.

This workaround sets SHARED_DIR as an environment variable in all
plugin pod templates that use platform.sh.

Affected plugins:
- openshift-kube-conformance
- openshift-conformance-validated
- openshift-conformance-replay
- openshift-cluster-upgrade
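The workaround itself lives in the YAML pod templates. As a minimal Go sketch of the same idea, assuming a simplified, hypothetical `EnvVar` type and `ensureEnv` helper (neither is OPCT or Kubernetes API code), adding SHARED_DIR only when the name is absent keeps the change idempotent across the affected plugins:

```go
package main

import "fmt"

// EnvVar is a simplified, hypothetical stand-in for one container env entry
// (a name/value pair) as it appears in a plugin pod template.
type EnvVar struct {
	Name  string
	Value string
}

// ensureEnv appends name=value to env unless an entry with that name already
// exists; in that case env is returned unchanged (the existing value wins).
func ensureEnv(env []EnvVar, name, value string) []EnvVar {
	for _, e := range env {
		if e.Name == name {
			return env
		}
	}
	return append(env, EnvVar{Name: name, Value: value})
}

func main() {
	// Plugins listed in the commit message as using platform.sh.
	plugins := []string{
		"openshift-kube-conformance",
		"openshift-conformance-validated",
		"openshift-conformance-replay",
		"openshift-cluster-upgrade",
	}
	for _, p := range plugins {
		var env []EnvVar
		env = ensureEnv(env, "SHARED_DIR", "/tmp/shared")
		// Applying the workaround twice does not add a duplicate entry.
		env = ensureEnv(env, "SHARED_DIR", "/tmp/shared")
		fmt.Printf("%s: %v\n", p, env)
	}
}
```

Because the helper never overwrites an existing entry, a template that already defines SHARED_DIR keeps its own value.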

Error seen:
  /tmp/shared/platform.sh: line 97: SHARED_DIR: unbound variable

This commit should be reverted once the fix is merged in
provider-certification-plugins and new plugin images are released.

This commit contains the complete fix for the kubernetes/conformance
suite error in OCP 4.20+. The fix took three iterations to get right
because of a subtle environment-scoping issue in the multi-container
plugin pod.

## Problem
Starting with OCP 4.20, the kubernetes/conformance suite was split
into sub-suites (parallel, serial, minimal variants). The parent
suite no longer exists, causing OPCT to fail with:
  "suite kubernetes/conformance does not exist"

## Root Causes Found
1. The openshift-tests binary reorganized its suites (OCP 4.20+)
2. The plugin Go code hardcoded the suite name instead of reading it from an environment variable
3. **CRITICAL**: DEFAULT_SUITE_NAME was set in the wrong container!

## The Subtle Bug - Container Environment Isolation
Environment variables in multi-container pods are container-scoped.
We initially set DEFAULT_SUITE_NAME in the "tests" container, but
the Go plugin code that reads it runs in the "plugin" container.

The fix required adding DEFAULT_SUITE_NAME to BOTH containers:
- Lines 55-56: tests container (for the bash scripts)
- Lines 89-90: plugin container (for the Go code) ← **CRITICAL FIX**

## Changes in This Commit

### Version Detection
- pkg/run/run.go: Added setKubeConformanceSuiteName() method
  - Detects the cluster version and sets the appropriate suite name:
    - OCP < 4.20: kubernetes/conformance
    - OCP >= 4.20: kubernetes/conformance/parallel
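The version rule above can be sketched as a small pure function. This is a minimal sketch, not the actual setKubeConformanceSuiteName() implementation: the helper name `kubeConformanceSuiteName` is hypothetical, it assumes the cluster version arrives as a string like "4.20.0" or "4.21.0-ec.1", and it assumes any major version above 4 should also use the new suite layout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	legacySuite   = "kubernetes/conformance"          // OCP < 4.20
	parallelSuite = "kubernetes/conformance/parallel" // OCP >= 4.20
)

// kubeConformanceSuiteName maps an OpenShift version string to the suite name.
// Hypothetical helper: it mirrors the rule in the commit message only.
func kubeConformanceSuiteName(version string) (string, error) {
	// Split into at most 3 parts so pre-release suffixes stay in the patch field.
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("unexpected version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", fmt.Errorf("parsing major version of %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("parsing minor version of %q: %w", version, err)
	}
	// Assumption: versions after 4.x keep the split-suite layout.
	if major > 4 || (major == 4 && minor >= 20) {
		return parallelSuite, nil
	}
	return legacySuite, nil
}

func main() {
	for _, v := range []string{"4.9.0", "4.19.12", "4.20.0", "4.21.0-ec.1"} {
		suite, err := kubeConformanceSuiteName(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", v, suite)
	}
}
```

Comparing major and minor as integers matters here: a plain string comparison would rank "4.9" above "4.20".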

### Template Updates
- data/templates/plugins/openshift-kube-conformance.yaml:
  - Updated to use {{ .KubeConformanceSuiteName }}
  - Added DEFAULT_SUITE_NAME to plugin container env (THE FIX!)
  - Added SHARED_DIR workaround for plugin bug

### Documentation
- internal/report/slo.go: Updated suite description
- docs/review/rules.md: Updated with version-specific info
- docs/devel/guide.md: Updated examples
- CHANGES_OCP420_SUITE_FIX.md: Complete fix documentation
- SESSION_LEARNINGS_2025-11-20.md: Debugging journey lessons
- OPCT_ARCHITECTURE.md: Architecture reference

## Testing
Verified with OCP 4.20 cluster that tests now run with:
  /usr/bin/openshift-tests run kubernetes/conformance/parallel

## Dependencies
Requires provider-certification-plugins commit 23f29b7:
- The plugin must read DEFAULT_SUITE_NAME from its environment
- Without that commit, the plugin still hardcodes the suite name

## Related Commits
- This repo workaround (c831c68): SHARED_DIR env var
- Plugin repo (23f29b7): Read DEFAULT_SUITE_NAME - REQUIRED
- Plugin repo (4c7e163): SHARED_DIR variable definition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
openshift-ci bot requested a review from jinhli on November 20, 2025 22:27

openshift-ci bot commented Nov 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jcpowermac for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot requested a review from rvanderp3 on November 20, 2025 22:27
openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Dec 3, 2025
@openshift-merge-robot
Contributor

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

