(apigw) update acl role caching key to include namespaces#5140

Open
sujay-hashicorp wants to merge 3 commits into main from sujay/fix/apigw-acl-role-binding

sujay-hashicorp commented Feb 25, 2026

A customer observed intermittent API Gateway outages (Permission denied: lacks service:write) when gateways with the same name were deployed in different Kubernetes namespaces.
The investigation found cross-namespace ACL resource collisions during reconcile/cleanup, where role/policy/binding-rule state for one gateway could affect another and remove required permissions.

Changes proposed in this PR

  • Fix API Gateway ACL cache collisions by keying policy/role/binding-rule caches with gatewayName + namespace instead of only gatewayName.
  • Make managed ACL resource names namespace-scoped to match the cache key (managed-gateway-acl-role-<gateway>-<namespace> and api-gateway-policy-for-<gateway>-<namespace>), and use the same key during cleanup so RemoveRoleBinding only deletes resources for the correct gateway instance.
  • Update TestCache_RemoveRoleBinding to seed/read cache entries using the new namespaced key.
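The keying and naming change can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `gatewayKey`, `roleCache`, `ensure`, and `remove` are hypothetical names, and only the two name patterns are taken from the description above.

```go
package main

import "fmt"

// gatewayKey is a hypothetical cache key. The PR keys on gateway name plus
// namespace; the real type and field names may differ.
type gatewayKey struct {
	Name      string
	Namespace string
}

// roleName follows the managed-gateway-acl-role-<gateway>-<namespace> pattern.
func roleName(k gatewayKey) string {
	return fmt.Sprintf("managed-gateway-acl-role-%s-%s", k.Name, k.Namespace)
}

// policyName follows the api-gateway-policy-for-<gateway>-<namespace> pattern.
func policyName(k gatewayKey) string {
	return fmt.Sprintf("api-gateway-policy-for-%s-%s", k.Name, k.Namespace)
}

// roleCache is an illustrative stand-in for the gateway ACL role cache.
type roleCache struct {
	roles map[gatewayKey]string // key -> managed ACL role name
}

func newRoleCache() *roleCache {
	return &roleCache{roles: make(map[gatewayKey]string)}
}

// ensure records the managed role for one gateway instance.
func (c *roleCache) ensure(k gatewayKey) { c.roles[k] = roleName(k) }

// remove deletes only the entry for this exact gateway instance.
func (c *roleCache) remove(k gatewayKey) { delete(c.roles, k) }

func main() {
	c := newRoleCache()
	a := gatewayKey{Name: "api-gateway-test", Namespace: "consul"}
	b := gatewayKey{Name: "api-gateway-test", Namespace: "test"}
	c.ensure(a)
	c.ensure(b)

	// Cleaning up the gateway in "test" leaves the one in "consul" untouched.
	c.remove(b)
	fmt.Println(c.roles[a])    // managed-gateway-acl-role-api-gateway-test-consul
	fmt.Println(policyName(a)) // api-gateway-policy-for-api-gateway-test-consul
}
```

With a key made of the gateway name alone, `a` and `b` would map to the same entry, so removing one would also drop the other's role.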

Before Fix

(screenshot: apigw_0)

After Fix

(screenshot: apigw_1)

How I've tested this PR

  • cd control-plane && go test ./api-gateway/cache -run TestCache_RemoveRoleBinding -count=1
  • cd control-plane && go test ./api-gateway/cache -count=1
  • Manual validation on the BoFA repro (same API Gateway name across multiple Kubernetes namespaces): verified ACL role/policy names are namespace-specific and role-binding cleanup no longer targets the other namespace's gateway resources.

Test outputs for this branch build

Initial token list
AccessorID:       65e56a0e-c303-f34c-44c4-b5b7f771d0b2
SecretID:         1916b2fd-df19-c154-47e9-4ff88ea7f3a0
Partition:        default
Namespace:        default
Description:      token created via login: {"component":"connect-injector","pod":"consul/consul-connect-injector-84d949bdf9-294tb"}
Local:            true
Auth Method:      consul-k8s-component-auth-method (Namespace: default)
Create Time:      2026-02-25 07:34:13.773217843 +0000 UTC
Roles:
   3a2c6f17-0bc0-c755-bf62-bacdc890d510 - consul-connect-inject-acl-role

AccessorID:       b94c14ad-56a5-8994-68b6-79b379df6bd6
SecretID:         3305e7b1-bcbd-e6ae-3cf8-203180cf72a4
Partition:        default
Namespace:        default
Description:      token created via login: {"gateway":"consul/api-gateway-test"}
Local:            true
Auth Method:      consul-k8s-component-auth-method (Namespace: default)
Create Time:      2026-02-25 07:35:21.39153718 +0000 UTC
Roles:
   5ace5299-89b2-19cc-dffb-e6e3e99279fa - managed-gateway-acl-role-api-gateway-test-consul

AccessorID:       ebbad0fd-4334-7df1-1a34-c81b5cec18e9
SecretID:         67136fac-52ef-0dae-67c9-4bacfabcc440
Partition:        default
Namespace:        default
Description:      Server Token for 10.244.0.6
Local:            false
Create Time:      2026-02-25 07:34:01.20479317 +0000 UTC
Policies:
   db2cca8a-c178-a47f-1127-6ef6b7e346a6 - agent-token

AccessorID:       8b70483a-98e6-3f71-22b0-f93880f8b2b6
SecretID:         763e863c-13d3-5716-94f0-02c26c6e73df
Partition:        default
Namespace:        default
Description:      Bootstrap Token (Global Management)
Local:            false
Create Time:      2026-02-25 07:34:01.19033042 +0000 UTC
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

AccessorID:       07d3833c-0d0c-9754-9fe8-84b5c922b43d
SecretID:         89acb3f5-6296-0dcc-091b-9c900f79e319
Partition:        default
Namespace:        default
Description:      enterprise-license-token Token
Local:            true
Create Time:      2026-02-25 07:34:01.410934712 +0000 UTC
Policies:
   4ab96508-2e3f-b315-c4aa-34dbd2a19a81 - enterprise-license-token

AccessorID:       00000000-0000-0000-0000-000000000002
SecretID:         anonymous
Partition:        default
Namespace:        default
Description:      
Local:            false
Create Time:      2026-02-25 07:33:52.519416055 +0000 UTC
Policies:
   1921421f-bbac-b7de-e8ea-1ed0cb485aa6 - anonymous-token-policy

AccessorID:       3a681a36-b022-79fa-6ce4-f51bac50b7ff
SecretID:         d54f01ca-3b38-3bb3-dcb1-446a598266fc
Partition:        default
Namespace:        default
Description:      token created via login: {"pod":"default/echo-1-5db9795946-xt4mw","pod-uid":"cdeb0447-bfe5-4c60-80b7-49ea07e6f3a0"}
Local:            true
Auth Method:      consul-k8s-auth-method (Namespace: default)
Create Time:      2026-02-25 07:35:19.30093022 +0000 UTC
Service Identities:
   echo-1 (Datacenters: all)

Token list after applying the gateway in the test namespace
AccessorID:       1baf8440-1501-6dda-e8a1-47293d7a14e8
SecretID:         305f84fe-b04e-011d-cbcb-c48af92f8dd4
Partition:        default
Namespace:        default
Description:      token created via login: {"component":"connect-injector","pod":"consul/consul-connect-injector-68b94ff4c-fnccv"}
Local:            true
Auth Method:      consul-k8s-component-auth-method (Namespace: default)
Create Time:      2026-02-25 07:45:38.846190382 +0000 UTC
Roles:
   3a2c6f17-0bc0-c755-bf62-bacdc890d510 - consul-connect-inject-acl-role

AccessorID:       ebbad0fd-4334-7df1-1a34-c81b5cec18e9
SecretID:         67136fac-52ef-0dae-67c9-4bacfabcc440
Partition:        default
Namespace:        default
Description:      Server Token for 10.244.0.6
Local:            false
Create Time:      2026-02-25 07:34:01.20479317 +0000 UTC
Policies:
   db2cca8a-c178-a47f-1127-6ef6b7e346a6 - agent-token

AccessorID:       f6000c29-5e94-be0a-f063-f05c2a2ddc11
SecretID:         6de180d2-b12c-7a91-ccc2-cf72fc6b3d83
Partition:        default
Namespace:        default
Description:      token created via login: {"gateway":"test/api-gateway-test"}
Local:            true
Auth Method:      consul-k8s-component-auth-method (Namespace: default)
Create Time:      2026-02-25 07:47:36.322795214 +0000 UTC
Roles:
   343ce51e-7ced-936b-da00-4fdf4980bcfe - managed-gateway-acl-role-api-gateway-test-test

AccessorID:       8b70483a-98e6-3f71-22b0-f93880f8b2b6
SecretID:         763e863c-13d3-5716-94f0-02c26c6e73df
Partition:        default
Namespace:        default
Description:      Bootstrap Token (Global Management)
Local:            false
Create Time:      2026-02-25 07:34:01.19033042 +0000 UTC
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

AccessorID:       07d3833c-0d0c-9754-9fe8-84b5c922b43d
SecretID:         89acb3f5-6296-0dcc-091b-9c900f79e319
Partition:        default
Namespace:        default
Description:      enterprise-license-token Token
Local:            true
Create Time:      2026-02-25 07:34:01.410934712 +0000 UTC
Policies:
   4ab96508-2e3f-b315-c4aa-34dbd2a19a81 - enterprise-license-token

AccessorID:       00000000-0000-0000-0000-000000000002
SecretID:         anonymous
Partition:        default
Namespace:        default
Description:      
Local:            false
Create Time:      2026-02-25 07:33:52.519416055 +0000 UTC
Policies:
   1921421f-bbac-b7de-e8ea-1ed0cb485aa6 - anonymous-token-policy

AccessorID:       3a681a36-b022-79fa-6ce4-f51bac50b7ff
SecretID:         d54f01ca-3b38-3bb3-dcb1-446a598266fc
Partition:        default
Namespace:        default
Description:      token created via login: {"pod":"default/echo-1-5db9795946-xt4mw","pod-uid":"cdeb0447-bfe5-4c60-80b7-49ea07e6f3a0"}
Local:            true
Auth Method:      consul-k8s-auth-method (Namespace: default)
Create Time:      2026-02-25 07:35:19.30093022 +0000 UTC
Service Identities:
   echo-1 (Datacenters: all)

AccessorID:       f4a13a76-8469-2ed9-541c-d79114a8f63b
SecretID:         ef3db705-dc2d-ab93-0736-d98c09aa169d
Partition:        default
Namespace:        default
Description:      token created via login: {"gateway":"consul/api-gateway-test"}
Local:            true
Auth Method:      consul-k8s-component-auth-method (Namespace: default)
Create Time:      2026-02-25 07:46:58.859929127 +0000 UTC
Roles:
   ee6fe4f4-6525-aa95-6070-f9954d65276f - managed-gateway-acl-role-api-gateway-test-consul

How I expect reviewers to test this PR

  • Run unit tests:
    • cd control-plane && go test ./api-gateway/cache -count=1
  • Run the multi-namespace API Gateway repro:
    • Deploy gateway api-gateway-test in the consul namespace.
    • Deploy a gateway with the same name in the test namespace.
    • Verify that ACL objects are distinct per namespace and that deleting or updating one gateway does not remove the role, policy, or binding rule used by the other.
    • Verify that each gateway token still retains service:write after reconcile events (e.g., route updates or cert rotation).
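For the "distinct per namespace" check, something like the following could scan saved token-list text such as the output above. It is only a sketch: `mismatchedGatewayRoles` is a hypothetical helper, and it assumes the gateway login description has the form `<namespace>/<gateway>`, which is inferred from the test output in this PR rather than from a documented format.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// gatewayDesc matches the login description of a gateway token.
// Assumption: the value is "<namespace>/<gateway>".
var gatewayDesc = regexp.MustCompile(`\{"gateway":"([^/"]+)/([^"]+)"\}`)

// mismatchedGatewayRoles returns role names attached to gateway tokens that do
// not match managed-gateway-acl-role-<gateway>-<namespace>.
func mismatchedGatewayRoles(output string) []string {
	var bad []string
	expected := "" // expected role for the token block being scanned
	inRoles := false
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		switch {
		case strings.HasPrefix(line, "AccessorID:"):
			// A new token block starts; reset per-token state.
			expected, inRoles = "", false
		case strings.HasPrefix(line, "Description:"):
			if m := gatewayDesc.FindStringSubmatch(line); m != nil {
				expected = fmt.Sprintf("managed-gateway-acl-role-%s-%s", m[2], m[1])
			}
		case trimmed == "Roles:":
			inRoles = true
		case inRoles && expected != "" && strings.Contains(trimmed, " - "):
			// Role entries look like "<id> - <name>".
			name := trimmed[strings.Index(trimmed, " - ")+3:]
			if name != expected {
				bad = append(bad, name)
			}
		case trimmed == "" || !strings.HasPrefix(line, " "):
			// A blank line or a new unindented field ends the Roles section.
			inRoles = false
		}
	}
	return bad
}

func main() {
	sample := `AccessorID:       b94c14ad-56a5-8994-68b6-79b379df6bd6
Description:      token created via login: {"gateway":"consul/api-gateway-test"}
Roles:
   5ace5299-89b2-19cc-dffb-e6e3e99279fa - managed-gateway-acl-role-api-gateway-test-consul
`
	fmt.Println(len(mismatchedGatewayRoles(sample))) // 0
}
```

A non-empty result would point at a gateway token whose role is not namespace-scoped, which is the condition this PR is meant to eliminate.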

Checklist

PCI review checklist

  • I have documented a clear reason for, and description of, the change I am making.

  • If applicable, I've documented a plan to revert these changes if they require more than reverting the pull request.
    Revert plan: revert this PR to restore previous keying/naming behavior.

  • If applicable, I've documented the impact of any changes to security controls.
    Impact: no new security controls were added or removed; this change only scopes ACL resource identity to prevent cross-namespace collisions.

sujay-hashicorp requested review from a team as code owners February 25, 2026 08:14
sujay-hashicorp added the pr/no-changelog (PR does not need a corresponding .changelog entry) and pr/no-backport (signals that a PR will not contain a backport label) labels Feb 25, 2026
sujay-hashicorp force-pushed the sujay/fix/apigw-acl-role-binding branch from 7070adf to c86fd0e February 25, 2026 08:31
sujay-hashicorp removed the pr/no-changelog label Feb 25, 2026