-
Notifications
You must be signed in to change notification settings - Fork 83
chore: Confirm that a public ConfigMap exists to check version #250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: Confirm that a public ConfigMap exists to check version #250
Conversation
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a non-blocking control-plane version metadata check for the Kubeflow Trainer Kubernetes backend. The check verifies that the kubeflow-trainer-public ConfigMap exists and contains a kubeflow_trainer_api_version field, logging warnings when metadata is unavailable but never failing client initialization. This represents a more lenient approach compared to the superseded PR #223, which enforced strict version compatibility.
Changes:
- Added
verify_backend()method to check for control-plane version metadata availability - Added comprehensive unit tests covering success and failure scenarios
- Imported
osmodule for environment variable access
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
kubeflow/trainer/backends/kubernetes/backend.py |
Implements non-blocking version check via verify_backend() method called during initialization; adds os import for namespace configuration |
kubeflow/trainer/backends/kubernetes/backend_test.py |
Adds test helper function and five test cases covering success, missing ConfigMap, missing data key, None data, and dev version scenarios; includes minor docstring formatting improvement |
|
/ok-to-test |
|
/retest |
| """Verify that the Trainer control plane exposes version metadata. | ||
|
|
||
| This check only ensures that the public control-plane ConfigMap exists | ||
| and contains a ``kubeflow_trainer_api_version`` field. It does not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name is kubeflow_trainer_version: https://github.com/kubeflow/trainer/blob/master/manifests/overlays/manager/kustomization.yaml#L31
| enforce version compatibility and never raises. | ||
| """ | ||
|
|
||
| system_namespace = os.getenv("KUBEFLOW_SYSTEM_NAMESPACE", "kubeflow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default system namespace is:
| system_namespace = os.getenv("KUBEFLOW_SYSTEM_NAMESPACE", "kubeflow") | |
| system_namespace = os.getenv("KUBEFLOW_SYSTEM_NAMESPACE", "kubeflow-system") |
| except Exception as e: | ||
| logger.warning( | ||
| "Trainer control-plane version info is not available: " | ||
| f"failed to read ConfigMap '{config_map_name}' in namespace " | ||
| f"'{system_namespace}', error: {e!r}." | ||
| ) | ||
| return | ||
|
|
||
| data = getattr(config_map, "data", None) | ||
|
|
||
| if not isinstance(data, dict): | ||
| logger.warning( | ||
| "Trainer control-plane version info is not available: " | ||
| f"ConfigMap '{config_map_name}' in namespace " | ||
| f"'{system_namespace}' has no data dictionary." | ||
| ) | ||
| return | ||
|
|
||
| server_version = data.get("kubeflow_trainer_api_version") | ||
|
|
||
| if not server_version: | ||
| logger.warning( | ||
| "Trainer control-plane version info is not available: " | ||
| f"ConfigMap '{config_map_name}' in namespace " | ||
| f"'{system_namespace}' is missing the " | ||
| "'kubeflow_trainer_api_version' data key." | ||
| ) | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just put everything under the same try: condition, and log warning if something fails.
Additionally, you can log the Exeception in the message.
You should expect that configMap has the correct data there.
| if server_version == "dev": | ||
| return | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this
| if server_version == "dev": | |
| return |
| ) | ||
|
|
||
|
|
||
| def test_version_check_success(caplog): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refactor this tests to use TestCases like described here: https://github.com/kubeflow/sdk/blob/aab449ac829ac02c804758e6d72f0b2652a2d783/AGENTS.md
Pull Request Test Coverage Report for Build 21873621460Details
💛 - Coveralls |
|
@andreyvelich , Thanks for the review, addressed all your comments |
|
|
||
| return [record.getMessage() for record in handler.records] | ||
|
|
||
| def test_version_check_success(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor these tests with a single test_verify_backend() function and TestCases, like described here: https://github.com/kubeflow/sdk/blob/main/AGENTS.md#3-testing-requirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich, updated backend_test.py file, This matches the AGENTS.md pattern: a single pytest function, parametrized with the shared TestCase dataclass to cover multiple scenarios.
andreyvelich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @sameerdattav!
/assign @astefanutti @akshaychitneni @kramaranya @szaher @Fiona-Waters
| }, | ||
| ), | ||
| TestCase( | ||
| name="missing data key logs warning", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update your test cases according to your updated implementation?
E.g. we don't check data key for now.
|
@andreyvelich, |
|
/retest |
andreyvelich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @kramaranya @astefanutti @kubeflow/kubeflow-sdk-team @Fiona-Waters
astefanutti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@sameerdattav Can you try to rebase your PR to fix the CI? |
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Surya Sameer Datta Vaddadi <137607947+sameerdattav@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Surya Sameer Datta Vaddadi <137607947+sameerdattav@users.noreply.github.com>
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Surya Sameer Datta Vaddadi <137607947+sameerdattav@users.noreply.github.com>
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
0db01b1 to
a17d830
Compare
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
|
@andreyvelich @astefanutti |
|
@andreyvelich, |
|
Sure, that sounds good, k8s also has the same setting: https://github.com/kubernetes/kubernetes/blob/master/.gitattributes#L2 |
This PR adds a non-blocking control-plane version metadata check to the Kubernetes backend.
The backend now verifies that the public Trainer ConfigMap exists and contains a kubeflow_trainer_api_version field, logging warnings when metadata is unavailable but never failing client initialization. Focused unit tests cover the expected behaviors.
Supersedes #223
Fixes #221