LiteLLM Kubernetes Deployment

An AI model gateway: a LiteLLM proxy on Kubernetes that routes requests to vLLM model pods.

Quick Start

# 1. Clone repo
git clone https://github.com/aihpi/litellm-k8s.git
cd litellm-k8s

# 2. Create secrets (see secrets/README.md)
cp secrets/example-secrets.yaml secrets/secrets.yaml
# Edit secrets/secrets.yaml with your values

# 3. Deploy
./scripts/deploy.sh dev

# 4. Port-forward
kubectl port-forward -n litellm service/litellm-service 4000:4000

# 5. Access UI (open is macOS-only; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:4000/ui/login/

Architecture

Internet -> Ingress -> LiteLLM (Gateway)
                          |
                          v
                    ClusterIP Services
                          |
                          v
                    vLLM Model Pods

Adding Models

See docs/adding-models.md

Infrastructure

Cluster: HPI Kubernetes cluster (40x NVIDIA A30 GPUs)
Namespace: litellm
GPU Scheduling: each model deployment declares GPU resource requests, so the scheduler places its pods on nodes with free GPUs
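
As an illustration of what a GPU request in a model deployment can look like, here is a minimal sketch that builds a container spec as JSON, which kubectl accepts alongside YAML. It assumes the NVIDIA device plugin, which exposes GPUs under the resource name nvidia.com/gpu. The helper name gpu_container, the container name, and the image are placeholders, not taken from this repo's manifests.

```python
import json


def gpu_container(name, image, gpu_count=1):
    """Return a container spec that asks the scheduler for whole GPUs.

    Hypothetical helper for illustration; the manifests under models/
    may be structured differently.
    """
    gpus = str(gpu_count)
    return {
        "name": name,
        "image": image,
        "resources": {
            # GPUs are an extended resource: they must appear in limits,
            # and if requests are also set they must equal the limits.
            "requests": {"nvidia.com/gpu": gpus},
            "limits": {"nvidia.com/gpu": gpus},
        },
    }


# Placeholder name and image, shown only to render the fragment.
print(json.dumps(gpu_container("vllm", "vllm/vllm-openai:latest"), indent=2))
```

GPUs cannot be shared fractionally through this mechanism, so a pod asking for one GPU stays Pending until a node has a whole free GPU.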

Maintenance

Logs: kubectl logs -n litellm deployment/litellm-proxy -f
Restart: kubectl rollout restart -n litellm deployment/litellm-proxy
Scale: kubectl scale -n litellm deployment/llama-3b --replicas=2

Handoff / Recent Changes

Added scripts:
- scripts/call_qwen_image_edit.py (image edit via LiteLLM /v1/images/edits)
- scripts/test_octen_embedding.py (embeddings via LiteLLM /v1/embeddings)

Model changes:
- Added octen-embedding-8b to the LiteLLM model list (default encoding_format: float).
- Added models/gpt-oss-120b (deployment, service, PVC). Its vLLM config is mounted from models/gpt-oss-120b/configmap.yaml and uses GPT-OSS_EAGLE3_Hopper.yaml. Note: this model is not yet added to the LiteLLM proxy config.

Apply model resources:

kubectl apply -k models

Calling the API (via LiteLLM)

In dev, port-forward the LiteLLM service and call it on localhost; otherwise send the same requests to your ingress host.

kubectl port-forward -n litellm service/litellm-service 4000:4000

Chat/completions (example)

curl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3b","messages":[{"role":"user","content":"Hello"}]}' \
  http://localhost:4000/v1/chat/completions
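
The same call can be made from Python using only the standard library. This is a minimal sketch that assumes the port-forward above is active and that the proxy returns the OpenAI-compatible response shape. The function names build_chat_request and chat are illustrative and not part of this repo.

```python
import json
import urllib.request


def build_chat_request(api_key, prompt, model="llama-3b",
                       api_base="http://localhost:4000"):
    """Build the same POST request as the curl example."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key, prompt, model="llama-3b"):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(api_key, prompt, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    # Assumed OpenAI-compatible shape: choices[0].message.content
    return payload["choices"][0]["message"]["content"]
```

To try it, call chat(os.environ["LITELLM_API_KEY"], "Hello") from a Python shell while the port-forward is running.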

Embeddings (octen-embedding-8b)

curl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"octen-embedding-8b","input":"Hello from octen","encoding_format":"float"}' \
  http://localhost:4000/v1/embeddings

Or run:

LITELLM_API_KEY=sk-... python3 scripts/test_octen_embedding.py
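
For use inside your own code, the sketch below sends a batch of texts and pulls the vectors out of the response. It assumes the OpenAI-compatible response shape (a data list whose items carry index and embedding) and the port-forwarded address. The function names are illustrative, and this is not the contents of scripts/test_octen_embedding.py.

```python
import json
import urllib.request


def embedding_request(api_key, texts, api_base="http://localhost:4000"):
    """Build a POST to /v1/embeddings for one string or a list of strings."""
    body = json.dumps({
        "model": "octen-embedding-8b",
        "input": texts,
        "encoding_format": "float",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_vectors(payload):
    """Return the embeddings ordered by their 'index' field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(api_key, texts):
    """Send the request and return one vector per input text."""
    request = embedding_request(api_key, texts)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_vectors(json.load(response))
```

Sorting by index keeps each vector aligned with its input text even if the server returns items out of order.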

Image edits (qwen-image-edit)

LITELLM_API_KEY=sk-... python3 scripts/call_qwen_image_edit.py \
  --api-base http://localhost:4000 \
  --prompt "Remove the sleeves; keep fabric/lighting unchanged"

UI Login

Default credentials:

Username: admin
Password: the value of your LITELLM_MASTER_KEY

Contributors

Felix Boelter (@felixboelter)
