AI model gateway
# 1. Clone repo
git clone https://github.com/hpi/litellm-k8s.git
cd litellm-k8s
# 2. Create secrets (see secrets/README.md)
cp secrets/example-secrets.yaml secrets/secrets.yaml
# Edit secrets/secrets.yaml with your values
# 3. Deploy
./scripts/deploy.sh dev
# 4. Port-forward
kubectl port-forward -n litellm service/litellm-service 4000:4000
# 5. Access UI
open http://localhost:4000/ui/login/Internet -> Ingress -> LiteLLM (Gateway)
|
v
ClusterIP Services
|
v
vLLM Model Pods
See docs/adding-models.md
- Cluster: HPI K8s (40x A30)
- Namespace: litellm
- GPU Scheduling: Uses GPU requests in model deployments
- Logs: kubectl logs -n litellm deployment/litellm-proxy -f
- Restart: kubectl rollout restart -n litellm deployment/litellm-proxy
- Scale: kubectl scale -n litellm deployment/llama-3b --replicas=2
- Added scripts:
scripts/call_qwen_image_edit.py(image edit via LiteLLM/v1/images/edits)scripts/test_octen_embedding.py(embeddings via LiteLLM/v1/embeddings)
- Added
octen-embedding-8bto LiteLLM model list (defaultencoding_format: float). - Added
models/gpt-oss-120b(deployment/service/pvc) with vLLM config mounted frommodels/gpt-oss-120b/configmap.yamlusingGPT-OSS_EAGLE3_Hopper.yaml. (Note: model is not yet added to LiteLLM proxy config.)
Apply model resources:
kubectl apply -k modelsPort-forward in dev or access via your ingress.
kubectl port-forward -n litellm service/litellm-service 4000:4000curl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"llama-3b","messages":[{"role":"user","content":"Hello"}]}' \
http://localhost:4000/v1/chat/completionscurl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"octen-embedding-8b","input":"Hello from octen","encoding_format":"float"}' \
http://localhost:4000/v1/embeddingsOr run:
LITELLM_API_KEY=sk-... python3 scripts/test_octen_embedding.pyLITELLM_API_KEY=sk-... python3 scripts/call_qwen_image_edit.py \
--api-base http://localhost:4000 \
--prompt "Remove the sleeves; keep fabric/lighting unchanged"Default credentials:
- Username: admin
- Password: your LITELLM_MASTER_KEY
- Felix Boelter (@felixboelter)