-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-3314: adding beta PRR documentation #5877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @carlbraganza. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
ab59cb0 to
55efa98
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: carlbraganza The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
55efa98 to
0495898
Compare
6f475d4 to
943a783
Compare
|
General comment: Please provide more details when you say "CSI driver specific." For e2e tests, please include links to the test files. |
943a783 to
1d3cb5e
Compare
stlaz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PRR shadow review
| NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`. | ||
| --> | ||
|
|
||
| The feature can be disabled by removing the sidecar from the CSI driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the design text, add these changes:
-
Explain the abbreviation "SP" which is often used throughout the text. I'm assuming it's "Storage Provider".
-
Where it says:
The full specification of the Kubernetes SnapshotMetadata API will be published in the source code repository of the [external-snapshot-metadata sidecar](https://github.com/kubernetes/enhancements/blob/1d3cb5e92e7cb3a815974b172f238e23b957ed96/keps/sig-storage/3314-csi-changed-block-tracking/README.md#the-external-snapshot-metadata-sidecar).
add a link to the actual API spec.
-
In the
The External Snapshot Metadata Sidecarsection, add a link to the repo with the code.
In the same section -The sidecar must be configured with the name of this CR object.<- it's unclear what this means. -
Since you're referring to it from your official docs multiple times. add a concrete RBAC rule definition that would let a backup client scrape the snapshot metadata to the
Risks and Mitigationssection. You can just limit the rule to what you've got in your actual authorizer (thevolumesnapshotsSAR check). -
Also, in the
Risks and Mitigations, explain how you implement rate limiting to address theUncontrolled accessissue raised by sig-auth. -
In the e2e section of the
Test Plan, add detailed explanation how you're going to test your sidecar's authenticator and authorizer. Make these a requirement for beta in theGraduation criteriasection.
In the code of your sidecar:
- Make the authorization test introspect the request so that you actually test that you're sending the correct arguments (https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/pkg/internal/authz/sar_authorizer_test.go#L33)
- See https://github.com/kubernetes/kubernetes/blob/08764697f483b3bd33cad4e8f2878a04674be272/staging/src/k8s.io/apiserver/pkg/authentication/authenticatorfactory/delegating.go#L96-L99 for a more robust token authentication implementation
In your sidecar repo:
- Create an issue where you explain how you plan on improving the e2e testing for the repository. Currently the only e2e tests live in a
.github/workflowsdirectory in a file calledintegration-test.yamlas a long bash script.
Make these improvements a requirement for Beta in theGraduation criteriasection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback.
Explain the abbreviation "SP" ...
I added a link to https://github.com/container-storage-interface/spec/blob/master/spec.md#terminology for terminology in the Proposal section.
... add a link to the repo with the code ...
Added a link to kubernetes-csi/external-snapshot-metadata.
The KEP was written before the associatedexternal-snapshot-repositorywas created so lacks these links to the code.
add a concrete RBAC rule definition that would let a backup client scrape the snapshot metadata to the Risks and Mitigations section...
Added a link to deploy/example at the end of the section.
... explain how you implement rate limiting to address the Uncontrolled access issue raised by sig-auth.
We do not implement rate limiting. Our mitigation of the Uncontrolled access is partially achieved as a side-effect of requiring K8s authentication and authorization (TokenRequest/TokenReview/SAR) in order to use the service. This was implied previously, so I added language to call it out explicititly.
In the e2e section of the Test Plan, add detailed explanation how you're going to test your sidecar's authenticator and authorizer. Make these a requirement for beta in the Graduation criteria section.
Hmm... the Test Plan does not provide that level of detail on anything. I've added a link to the E2E test PR in case anyone wants to see the actual test logic, and I also added an explicit requirement for authentication and authorization checks to the Beta graduation criteria.
Make the authorization test introspect...
I created Issue 206 to track your suggestion for the authorization unit test.
| installed by the CSI driver, or by | ||
| reinstalling the CSI driver without this feature enabled. | ||
|
|
||
| It can also be effectively disabled by modifying the CSI driver specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would still likely have at least one cluster-admin in the cluster that could still use it. I wouldn't say this is a valid way to disable the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll remove this.
| --> | ||
|
|
||
| No. | ||
| The presence of `SnapshotMetadataService` CR objects indicates that this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is about test presence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert to the previous text.
| will rollout across nodes. | ||
| --> | ||
|
|
||
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the CSI driver should be responsible for API presence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as per the KEP.
This feature adds functionality to a CSI driver and follows the established pattern of existing CSI driver sidecars that provide for volume snapshots, etc, in how the sidecar is deployed (https://kubernetes-csi.github.io/docs/deploying.html).
However, unlike the other CSI features, access to this new sidecar bypasses the K8s API server for the most part, as it involves a lot of I/O. Secure access to the sidecar requires that the client know the server's certificate and audience string - the CR exists to convey this information, and how this CR gets created is up to the CSI driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the cluster operator/administrator should be responsible for the APIs that exist in the cluster, correct?
If my assumption is wrong, please explain how conflicts in different CRD versions should be handled by the CSI drivers, e.g.. a field gets added into the CRD but an old version of the CSI driver attempts to reconcile the CRD to its desired version.
| --> | ||
|
|
||
| CSI driver specific. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API is common for all the CSI drivers I believe. Please fill this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No upgrade or rollback was tested.
CSI sidecar's are deployed as part of a CSI driver installation and hence are vendor specific in nature. The CSI specifies how a sidecar container communicates with the CSI driver container, but does not specify how the containers are deployed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance | ||
| of this CR, and deploys the `external-snapshot-metadata` sidecar. | ||
| Each CSI driver vendor independently must explicitly handle failure to construct | ||
| these objects during rollout or rollback of their driver fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain how a backup client should handle rollbacks that would cause the feature to suddenly disappear/stop working for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
Any rollback will cause active operations by a backup application to fail.
Such failure is similar to a network failure that the application ought to be
capable of handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think there might be a typo I missed in the first read.
| these objects during rollout or rollback of their driver fails. | |
| these objects during rollout or rollback if their driver fails. |
| - Event Reason: A `GetMetadataAllocated` Normal Event with a response containing the details of the | ||
| metadata blocks of the snapshot from the CSI driver. | ||
| - Event Reason: A `GetMetadataDelta` Normal Event with a response containing the delta of the metadata | ||
| blocks of the 2 snapshots from the CSI driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't be logged into events but should just be kept as a response to the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. Reverted.
|
|
||
| CSI driver specific. | ||
| Log records from the `external-snapshot-metadata` sidecar should be visible in | ||
| the CSI driver logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which logs would appear in the CSI driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-wrote to be:
Log records from the
external-snapshot-metadatasidecar container should be visible in its logs,
assuming a typical deployment such as that described in
Deploying CSI Driver on Kubernetes.
| deleting the `SnapshotMetadataService` CR or rolling back the feature. | ||
| It is also possible that the operator may be able to disable the use of the | ||
| feature in the backup application concerned. | ||
| - The [integration tests](https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/.github/workflows/integration-test.yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Test Plan section you explicitly claim integration tests are not suitable for the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Thanks for pointing that out! I removed that comment and moved this link to that section.
| - Testing: Are there any tests for failure mode? If not, describe why. | ||
| --> | ||
|
|
||
| - Failure due to RBAC permissions require the operator to update security |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should follow the template from the .md comment above.
carlbraganza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stlaz I hope I've addressed all your feedback.
| deleting the `SnapshotMetadataService` CR or rolling back the feature. | ||
| It is also possible that the operator may be able to disable the use of the | ||
| feature in the backup application concerned. | ||
| - The [integration tests](https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/.github/workflows/integration-test.yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Thanks for pointing that out! I removed that comment and moved this link to that section.
| installed by the CSI driver, or by | ||
| reinstalling the CSI driver without this feature enabled. | ||
|
|
||
| It can also be effectively disabled by modifying the CSI driver specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll remove this.
| NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`. | ||
| --> | ||
|
|
||
| The feature can be disabled by removing the sidecar from the CSI driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback.
Explain the abbreviation "SP" ...
I added a link to https://github.com/container-storage-interface/spec/blob/master/spec.md#terminology for terminology in the Proposal section.
... add a link to the repo with the code ...
Added a link to kubernetes-csi/external-snapshot-metadata.
The KEP was written before the associatedexternal-snapshot-repositorywas created so lacks these links to the code.
add a concrete RBAC rule definition that would let a backup client scrape the snapshot metadata to the Risks and Mitigations section...
Added a link to deploy/example at the end of the section.
... explain how you implement rate limiting to address the Uncontrolled access issue raised by sig-auth.
We do not implement rate limiting. Our mitigation of the Uncontrolled access is partially achieved as a side-effect of requiring K8s authentication and authorization (TokenRequest/TokenReview/SAR) in order to use the service. This was implied previously, so I added language to call it out explicititly.
In the e2e section of the Test Plan, add detailed explanation how you're going to test your sidecar's authenticator and authorizer. Make these a requirement for beta in the Graduation criteria section.
Hmm... the Test Plan does not provide that level of detail on anything. I've added a link to the E2E test PR in case anyone wants to see the actual test logic, and I also added an explicit requirement for authentication and authorization checks to the Beta graduation criteria.
Make the authorization test introspect...
I created Issue 206 to track your suggestion for the authorization unit test.
| --> | ||
|
|
||
| No. | ||
| The presence of `SnapshotMetadataService` CR objects indicates that this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert to the previous text.
| will rollout across nodes. | ||
| --> | ||
|
|
||
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as per the KEP.
This feature adds functionality to a CSI driver and follows the established pattern of existing CSI driver sidecars that provide for volume snapshots, etc, in how the sidecar is deployed (https://kubernetes-csi.github.io/docs/deploying.html).
However, unlike the other CSI features, access to this new sidecar bypasses the K8s API server for the most part, as it involves a lot of I/O. Secure access to the sidecar requires that the client know the server's certificate and audience string - the CR exists to convey this information, and how this CR gets created is up to the CSI driver.
| --> | ||
|
|
||
| CSI driver specific. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No upgrade or rollback was tested.
CSI sidecar's are deployed as part of a CSI driver installation and hence are vendor specific in nature. The CSI specifies how a sidecar container communicates with the CSI driver container, but does not specify how the containers are deployed.
|
|
||
| CSI driver specific. | ||
| Log records from the `external-snapshot-metadata` sidecar should be visible in | ||
| the CSI driver logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-wrote to be:
Log records from the
external-snapshot-metadatasidecar container should be visible in its logs,
assuming a typical deployment such as that described in
Deploying CSI Driver on Kubernetes.
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance | ||
| of this CR, and deploys the `external-snapshot-metadata` sidecar. | ||
| Each CSI driver vendor independently must explicitly handle failure to construct | ||
| these objects during rollout or rollback of their driver fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
Any rollback will cause active operations by a backup application to fail.
Such failure is similar to a network failure that the application ought to be
capable of handling.
| - Event Reason: A `GetMetadataAllocated` Normal Event with a response containing the details of the | ||
| metadata blocks of the snapshot from the CSI driver. | ||
| - Event Reason: A `GetMetadataDelta` Normal Event with a response containing the delta of the metadata | ||
| blocks of the 2 snapshots from the CSI driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. Reverted.
1d3cb5e to
53cbccf
Compare
53cbccf to
8d7844d
Compare
| --> | ||
|
|
||
| *Please refer to | ||
| [CSI Terminology](https://github.com/container-storage-interface/spec/blob/master/spec.md#terminology) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
permanent links everywhere, please
| [TokenReview](https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-review-v1/) and | ||
| [SubjectAccessReview](https://kubernetes.io/docs/reference/kubernetes-api/authorization-resources/subject-access-review-v1/) | ||
| APIs are controlled by Kubernetes security policy. | ||
| A side effect of the use of these APIs is that it partially mitigates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kubernetes/sig-auth-proposals This KEP suggests that an unauthenticated client can cause N authenticated clients to fan out requests to the KAS until KAS starts rate-limiting each of these authenticated clients. N is the number of CSI driver instances installed in the cluster.
Is this ok?
|
|
||
| The proposal does not specify how such a security policy is to be configured. | ||
|
|
||
| An example of such a policy can be seen in [deploy/example](https://github.com/kubernetes-csi/external-snapshot-metadata/tree/main/deploy/example) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline the specific RBAC rule here
| published in the source code repository of the | ||
| [external-snapshot-metadata sidecar](#the-external-snapshot-metadata-sidecar). | ||
| [external-snapshot-metadata sidecar](#the-external-snapshot-metadata-sidecar) | ||
| (see [proto/schema.proto](https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/proto/schema.proto)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
permanent link, please
| No integration tests are required. This feature is better tested with e2e tests. | ||
| [Integration tests](https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/.github/workflows/integration-test.yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Permanent link, please.
Why is this considered an integration test?
| will rollout across nodes. | ||
| --> | ||
|
|
||
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the cluster operator/administrator should be responsible for the APIs that exist in the cluster, correct?
If my assumption is wrong, please explain how conflicts in different CRD versions should be handled by the CSI drivers, e.g.. a field gets added into the CRD but an old version of the CSI driver attempts to reconcile the CRD to its desired version.
| The CSI driver installs the `SnapshotMetadataService` CRD, creates an instance | ||
| of this CR, and deploys the `external-snapshot-metadata` sidecar. | ||
| Each CSI driver vendor independently must explicitly handle failure to construct | ||
| these objects during rollout or rollback of their driver fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think there might be a typo I missed in the first read.
| these objects during rollout or rollback of their driver fails. | |
| these objects during rollout or rollback if their driver fails. |
| Such failure is similar to a network failure that the application ought to be | ||
| capable of handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Such failure is similar to a network failure that the application ought to be | |
| capable of handling. | |
| Such failure is similar to a network failure that the backup client application ought to be | |
| capable of handling gracefully. |
| --> | ||
|
|
||
| CSI driver specific. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| block mode backups should become more efficient in both storage | ||
| space and time utilization. | ||
| The `GetMetadataAllocated` and `GetMetadataDelta` snapshot metadata operations | ||
| should be successful most of time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define "most of the time" with a specific number.
Uh oh!
There was an error while loading. Please reload this page.