SREP-2644: Managed Policy additions for Karpenter/AutoNode on ROSA HCP#2653
SREP-2644: Managed Policy additions for Karpenter/AutoNode on ROSA HCP#2653MitaliBhalla wants to merge 1 commit intoopenshift:masterfrom
Conversation
- Add Karpenter controller credentials policy with least-privilege permissions - Add ec2:DescribeCapacityReservations for capacity reservation support - Add kms:ViaService condition to KMS grants (following CAPA pattern) - Update control plane operator policy with security group tagging - Update installer policy with SQS queue validation Addresses PR review feedback: - EC2 Describe actions: Cannot be tag-conditioned (AWS API limitation) - KMS grants: Added kms:ViaService condition per existing CAPA pattern - PassRole: Already has iam:PassedToService condition - IAM Get/ListInstanceProfiles: Read-only, require Resource '*' per AWS docs Tested and validated on latest hypershift operator release. Signed-off-by: Mitali Bhalla <mbhalla@redhat.com>
|
@MitaliBhalla: This pull request references SREP-2644 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
1 similar comment
|
@MitaliBhalla: This pull request references SREP-2644 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: MitaliBhalla The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@MitaliBhalla: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
joshbranham
left a comment
There was a problem hiding this comment.
Some comments, sorry if they were addressed elsewhere, just foreseeing AWS questions
| "Action": [ | ||
| "ec2:CreateTags" | ||
| ], | ||
| "Resource": "arn:aws:ec2:*:*:security-group/*", |
There was a problem hiding this comment.
I may have missed it, but why does the CPO need this permission now as part of Karpenter?
There was a problem hiding this comment.
In my understanding,The Control Plane Operator adds karpenter.sh/discovery tags to the cluster's security groups when AutoNode is enabled as a Day-2 operation.
Karpenter uses these tags to discover which security groups to attach to provisioned nodes. Without this permission, the CPO cannot tag the existing Red Hat-managed security groups, and Karpenter won't be able to find the correct security groups for node provisioning.
The permission is scoped to only allow tagging on security groups that already have aws:ResourceTag/red-hat-managed: "true"
There was a problem hiding this comment.
I see now in the code where it is tagging the security group so this makes sense, thanks for clarifying. https://github.com/openshift/hypershift/blob/913aaded6c1cdc2b9683ffe20260fe95fbc6d43d/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L2655-L2659
| "Sid": "AllowListSQSQueues", | ||
| "Effect": "Allow", | ||
| "Action": [ | ||
| "sqs:ListQueues" |
There was a problem hiding this comment.
I presume this is primarily for the installer to validate that the queue provided is actually existent?
There was a problem hiding this comment.
Correct. This allows Cluster Service to validate that the customer-provided SQS queue exists before cluster creation, preventing misconfiguration errors.
There was a problem hiding this comment.
@joshbranham / @MitaliBhalla Bala confirmed that this was not a hard requirement as long as we have clear error messages on the Karpenter APIs if the SQS queue is mis-configured. See the requirements in https://issues.redhat.com/browse/AUTOSCALE-354
We can remove this permission request from the installer role.
| "Sid": "SSMReadActions", | ||
| "Effect": "Allow", | ||
| "Action": [ | ||
| "ssm:GetParameter" |
There was a problem hiding this comment.
This a pretty sensitive call, lots of customer data could be accessed. I see we are wildcarding on /aws/service/*, is there any tag conditioning we can do? I presume karpenter looks up something in this path?
There was a problem hiding this comment.
Great catch on the sensitivity concern, However, the resource is already scoped to only AWS-managed service parameters, not customer parameters:
"Resource": "arn:aws:ssm:*::parameter/aws/service/*"
The double colon (::) in the ARN - this means no account ID, indicating AWS-owned public parameters only.
Karpenter uses this to look up official AMI IDs. For example:
/aws/service/bottlerocket/aws-k8s-*/x86_64/latest/image_id/aws/service/ami-amazon-linux-latest/*
These are read-only, publicly available AWS parameters - no customer data is accessible through this path.
| "sqs:DeleteMessage", | ||
| "sqs:GetQueueUrl", | ||
| "sqs:ReceiveMessage" |
There was a problem hiding this comment.
Can these be conditioned or resource scoped?
There was a problem hiding this comment.
Unfortunately, we can't easily scope these to a specific queue ARN because:
- The queue is customer-provided - we don't know the queue name/ARN at policy creation time
sqs:GetQueueUrlis used to resolve the queue name to a URL, which requiresResource: "*"
However, we could potentially add a tag condition if customers are required to tag their interruption queues:
"Condition": { "StringEquals": { "aws:ResourceTag/rosa-karpenter-interruption-queue": "true" } }
Do we want to require customers to tag their SQS queues for this to work?
This adds a customer setup step but improves security scoping.
Alternatively, the current approach matches how other Karpenter deployments handle this - the queue ARN is validated at runtime by the controller.
There was a problem hiding this comment.
I don't see AWS accepting this policy with unrestricted SQS permissions so we would likely have to require tags on the resources. The way we do this for KMS etc is to require red-hat: true to denote the customer made the resource but is granting us access to it.
What type of PR is this?
feature
What this PR does / why we need it?
Adds AWS managed policy support for Karpenter/AutoNode on ROSA HCP:
Improvements over original PR #2581:
Which Jira/Github issue(s) this PR fixes?
SREP-2644
Special notes for your reviewer:
Pre-checks (if applicable):
Tested latest changes against a cluster
Included documentation changes with PR
If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with: