Description
I followed these documents:
- Installation: https://aigateway.envoyproxy.io/docs/getting-started/installation
- Usage-based Rate Limiting: https://aigateway.envoyproxy.io/docs/capabilities/traffic/usage-based-ratelimiting/
- Token based ratelimiting example: https://github.com/envoyproxy/ai-gateway/tree/main/examples/token_ratelimit
Install and Deploy
helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
--version v0.0.0-latest \
--namespace envoy-ai-gateway-system \
--create-namespace
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
--version v0.0.0-latest \
--namespace envoy-gateway-system \
--create-namespace \
-f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml \
-f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/examples/token_ratelimit/envoy-gateway-values-addon.yaml
helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
--version v0.0.0-latest \
--namespace envoy-ai-gateway-system \
--create-namespace
kubectl wait --timeout=2m -n envoy-ai-gateway-system deployment/ai-gateway-controller --for=condition=Available
kubectl apply -f redis.yaml
- I slightly modified token_ratelimit.yaml from https://github.com/envoyproxy/ai-gateway/blob/main/examples/token_ratelimit/token_ratelimit.yaml to point at my own endpoint and to use only the llm_input_token limit
kubectl apply -f token_ratelimit.yaml

# Copyright Envoy AI Gateway Authors
# SPDX-License-Identifier: Apache-2.0
# The full text of the Apache license is available in the LICENSE file at
# the root of the repo.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-ai-gateway-token-ratelimit
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: envoy-ai-gateway-token-ratelimit
  namespace: default
spec:
  gatewayClassName: envoy-ai-gateway-token-ratelimit
  listeners:
    - name: http
      protocol: HTTP
      port: 80
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-ai-gateway-token-ratelimit
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: envoy-ai-gateway-token-ratelimit
  namespace: default
spec:
  parentRefs:
    - name: envoy-ai-gateway-token-ratelimit
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: Qwen/Qwen3-0.6B
      backendRefs:
        - name: envoy-ai-gateway-token-ratelimit-testupstream
  # The following metadata keys are used to store the costs from the LLM request.
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: envoy-ai-gateway-token-ratelimit-testupstream
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: envoy-ai-gateway-token-ratelimit-testupstream
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: envoy-ai-gateway-token-ratelimit-testupstream
  namespace: default
spec:
  endpoints:
    - ip:
        address: 172.18.246.74
        port: 8000
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: envoy-ai-gateway-token-ratelimit-policy
  namespace: default
spec:
  # Applies the rate limit policy to the gateway.
  targetRefs:
    - name: envoy-ai-gateway-token-ratelimit
      kind: Gateway
      group: gateway.networking.k8s.io
  rateLimit:
    type: Global
    global:
      rules:
        # This configures the input token limit, and it has a different budget than others,
        # so it will be rate limited separately.
        - clientSelectors:
            - headers:
                # Have the rate limit budget be per unique "x-user-id" header value.
                - name: x-user-id
                  type: Distinct
          limit:
            # Configures the number of "tokens" allowed per hour, per user.
            requests: 1000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              metadata:
                # This is the fixed namespace for the metadata used by AI Gateway.
                namespace: io.envoy.ai_gateway
                # Limit on the input token.
                key: llm_input_token
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-ai-gateway-token-ratelimit
  namespace: default
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: NodePort
      envoyDeployment:
        container:
          resources: {}
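One thing that can be ruled out quickly is a mismatch between the metadata key written by the AIGatewayRoute (llmRequestCosts) and the key read by the BackendTrafficPolicy (cost.response.metadata). The sketch below is only an illustration: the helper name unmatched_cost_keys is hypothetical, and the two dicts are hand-copied from the manifests above rather than fetched from the cluster.

```python
# Hypothetical consistency check; the dicts below are hand-written copies
# of the relevant manifest fields, not data read from a live cluster.
AI_GATEWAY_NAMESPACE = "io.envoy.ai_gateway"


def unmatched_cost_keys(route_spec, policy_spec):
    """Return metadata keys the policy reads that the route never writes."""
    written = {c["metadataKey"] for c in route_spec.get("llmRequestCosts", [])}
    missing = []
    for rule in policy_spec["rateLimit"]["global"]["rules"]:
        response = rule.get("cost", {}).get("response", {})
        if response.get("from") != "Metadata":
            continue  # only metadata-based response costs depend on the route
        meta = response["metadata"]
        if meta["namespace"] != AI_GATEWAY_NAMESPACE or meta["key"] not in written:
            missing.append(meta["key"])
    return missing


route_spec = {
    "llmRequestCosts": [{"metadataKey": "llm_input_token", "type": "InputToken"}],
}
policy_spec = {
    "rateLimit": {
        "type": "Global",
        "global": {
            "rules": [
                {
                    "cost": {
                        "request": {"from": "Number", "number": 0},
                        "response": {
                            "from": "Metadata",
                            "metadata": {
                                "namespace": "io.envoy.ai_gateway",
                                "key": "llm_input_token",
                            },
                        },
                    },
                }
            ]
        },
    },
}

print(unmatched_cost_keys(route_spec, policy_spec))  # prints []
```

For the manifests in this issue the list is empty, so the key names and namespace agree and the cause is more likely elsewhere.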
Issue
- When I send an inference request and receive a successful response, x-ratelimit-remaining does not decrease
- Inference request:
curl -i --noproxy "*" -X POST 192.168.141.23:30995/v1/chat/completions -H "Content-Type: application/json" -H "x-user-id: user123" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{
"role": "user",
"content": "Where is the capital in Japan"
}
],
"stream": false,
"max_tokens": 32
}'
- Response
HTTP/1.1 200 OK
date: Mon, 12 Jan 2026 09:09:11 GMT
server: uvicorn
content-length: 734
content-type: application/json
x-ratelimit-limit: 1000, 1000;w=3600
x-ratelimit-remaining: 1000
x-ratelimit-reset: 3045
{
"id": "chatcmpl-f2927ee4-5245-44fd-b596-2a3f8fb5ac64",
"object": "chat.completion",
"created": 1768208951,
"model": "Qwen/Qwen3-0.6B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<tool_call>\nOkay, the user is asking where the capital of Japan is. I know that Japan's capital is Tokyo. Let me confirm that. Yes, Tokyo",
"refusal": null,
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": [],
"reasoning_content": null
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null,
"token_ids": null
}
],
"service_tier": null,
"system_fingerprint": null,
"usage": {
"prompt_tokens": 14,
"total_tokens": 46,
"completion_tokens": 32,
"prompt_tokens_details": null
},
"prompt_logprobs": null,
"prompt_token_ids": null,
"kv_transfer_params": null
}
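For reference, this is the value I would expect to see, sketched in Python. It rests on an assumption that should be verified against the Envoy Gateway documentation: a response cost is deducted only after the response has finished, so the header on the same response may still show the pre-deduction value and the drop would first be visible on the next request. The helper names are hypothetical, and the header and usage values are copied from the response above.

```python
import json


def parse_remaining(headers):
    """Read x-ratelimit-remaining as an int (header names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return int(lowered["x-ratelimit-remaining"])


def expected_next_remaining(headers, body):
    """Budget expected on the NEXT request if only input tokens are charged.

    Assumption: the response cost equals usage.prompt_tokens and is applied
    after this response completes.
    """
    usage = json.loads(body)["usage"]
    return parse_remaining(headers) - usage["prompt_tokens"]


# Values copied from the response shown in this issue.
headers = {
    "x-ratelimit-limit": "1000, 1000;w=3600",
    "x-ratelimit-remaining": "1000",
    "x-ratelimit-reset": "3045",
}
body = '{"usage": {"prompt_tokens": 14, "total_tokens": 46, "completion_tokens": 32}}'

print(expected_next_remaining(headers, body))  # prints 986
```

If a second request with the same x-user-id still reports 1000, the input token cost is not reaching the rate limit service at all.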
However, when I change the request cost as shown below and repeat the inference, x-ratelimit-remaining decreases as expected.
I don't understand why rate limiting based on the backend LLM's token usage does not take effect.
cost:
  request:
    from: Number
    number: 1 # modify request cost to 1 temporarily
Environment
kubectl get gateway
NAME CLASS ADDRESS PROGRAMMED AGE
envoy-ai-gateway-token-ratelimit envoy-ai-gateway-token-ratelimit 192.168.141.23 True 33m
kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
envoy-ai-gateway-system pod/ai-gateway-controller-7988ccbc8-rpt9k 1/1 Running 0 26h
envoy-gateway-system pod/envoy-default-envoy-ai-gateway-token-ratelimit-e3ed7007-5dqzhkn 3/3 Running 0 21h
envoy-gateway-system pod/envoy-gateway-5d54cdccd6-lmmfg 1/1 Running 0 26h
envoy-gateway-system pod/envoy-ratelimit-9d9985546-7bvx5 1/1 Running 0 26h
redis-system pod/redis-6bdfddfdf4-6f2r7 1/1 Running 0 23h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7d1h
envoy-ai-gateway-system service/ai-gateway-controller ClusterIP 10.108.249.108 <none> 9443/TCP,1063/TCP,9090/TCP 26h
envoy-gateway-system service/envoy-default-envoy-ai-gateway-token-ratelimit-e3ed7007 NodePort 10.102.137.157 <none> 80:30995/TCP 21h
envoy-gateway-system service/envoy-gateway ClusterIP 10.110.173.247 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP,9443/TCP 26h
envoy-gateway-system service/envoy-ratelimit ClusterIP 10.101.212.39 <none> 8081/TCP,19001/TCP 26h
redis-system service/redis ClusterIP 10.100.30.75 <none> 6379/TCP 23h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
envoy-ai-gateway-system deployment.apps/ai-gateway-controller 1/1 1 1 26h
envoy-gateway-system deployment.apps/envoy-default-envoy-ai-gateway-token-ratelimit-e3ed7007 1/1 1 1 21h
envoy-gateway-system deployment.apps/envoy-gateway 1/1 1 1 26h
envoy-gateway-system deployment.apps/envoy-ratelimit 1/1 1 1 26h
redis-system deployment.apps/redis 1/1 1 1 23h
NAMESPACE NAME DESIRED CURRENT READY AGE
envoy-ai-gateway-system replicaset.apps/ai-gateway-controller-7988ccbc8 1 1 1 26h
envoy-gateway-system replicaset.apps/envoy-default-envoy-ai-gateway-token-ratelimit-e3ed7007-5d977bc45f 1 1 1 21h
envoy-gateway-system replicaset.apps/envoy-gateway-5d54cdccd6 1 1 1 26h
envoy-gateway-system replicaset.apps/envoy-ratelimit-9d9985546 1 1 1 26h
redis-system replicaset.apps/redis-6bdfddfdf4 1 1 1 23h