Commit ca250f9

Merge pull request #34 from AIMOverse/wesley-working-branch
feat: Add LiteLLM Proxy Integration for Multi-Provider LLM Support
2 parents 4a81cbc + 51bb8cb commit ca250f9

File tree

18 files changed: +1483 -43 lines

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -325,4 +325,6 @@ data/prompts/*
 .db
 
 # VS Code
-.vscode/
+.vscode/
+
+.env.litellm

README.md

Lines changed: 94 additions & 2 deletions
@@ -84,6 +84,67 @@ The following environment variables are used by AIMO:
 | `SECRET_KEY` | Secret Key for JWT Tokens | Yes | During running applications |
 | `ADMIN_API_KEY` | Admin Key for manage invitation codes | Yes | During running applications |
 
+## LiteLLM Proxy Service
+
+AIMO integrates with LiteLLM Proxy to provide multi-provider LLM support through a unified interface. This enables routing to different LLM providers (OpenAI, Anthropic, OpenRouter, local models) with automatic fallback capabilities.
+
+### Setup LiteLLM Service
+
+1. **Configure Environment Variables**
+
+   Add the following to your `.env` file:
+
+   ```bash
+   # LiteLLM Proxy Configuration
+   LLM_BASE_URL=http://localhost:4000
+   LLM_API_KEY=sk-litellm-proxy-key
+   LLM_MODEL_DEFAULT=prod_default
+   LLM_TIMEOUT=60
+
+   # LiteLLM Master Key
+   LITELLM_MASTER_KEY=sk-litellm-proxy-key
+
+   # Provider API Keys (add as needed)
+   OPENROUTER_API_KEY=your_openrouter_key_here
+   ```
+
+2. **Start LiteLLM Proxy Service**
+
+   ```bash
+   # Navigate to the LiteLLM directory
+   cd infra/litellm
+
+   # Start the LiteLLM Proxy using Docker Compose
+   docker-compose -f docker-compose.litellm.yml up -d
+   ```
+
+3. **Verify LiteLLM Service**
+
+   ```bash
+   # Check if the LiteLLM Proxy is running
+   curl http://localhost:4000/health
+
+   # Test the new endpoints
+   curl -X GET http://localhost:8000/api/v1.0.0/chat/health
+   curl -X GET http://localhost:8000/api/v1.0.0/chat/models
+   ```
+
+4. **Stop LiteLLM Service**
+
+   ```bash
+   # Stop the LiteLLM Proxy
+   cd infra/litellm
+   docker-compose -f docker-compose.litellm.yml down
+   ```
+
+### LiteLLM Benefits
+
+- **Multi-Provider Support**: Route to different LLM providers through one interface
+- **Automatic Fallbacks**: Fall back to alternative models if the primary fails
+- **Cost Optimization**: Route to cheaper models when appropriate
+- **Local Model Integration**: Support for local models via Ollama
+- **Centralized Configuration**: Manage all LLM configurations in one place
+
 ## Usage
 
 Start the AIMO server using the following command:
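
Since LiteLLM Proxy exposes an OpenAI-compatible API, a quick way to sanity-check the setup above is to point the `openai` Python SDK directly at `LLM_BASE_URL`. This is a minimal sketch: the defaults mirror the `.env` example, and `prod_default` is assumed to be a model alias defined in the LiteLLM config under `infra/litellm`, which is not shown in this diff.

```python
import os

from openai import OpenAI  # pip install openai

# Point the OpenAI SDK at the LiteLLM Proxy instead of api.openai.com.
client = OpenAI(
    base_url=os.getenv("LLM_BASE_URL", "http://localhost:4000"),
    api_key=os.getenv("LLM_API_KEY", "sk-litellm-proxy-key"),
)

# "prod_default" is assumed to be the model alias configured in LiteLLM; the proxy
# decides which underlying provider (OpenRouter, OpenAI, a local model, ...) serves it.
response = client.chat.completions.create(
    model=os.getenv("LLM_MODEL_DEFAULT", "prod_default"),
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

If this returns a completion, the proxy and at least one downstream provider key are wired up correctly.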
@@ -117,9 +178,9 @@ coverage html --title "${@-coverage}"
 ### Version: `1.0.0`
 
 The AIMO backend provides a RESTful API for interaction. The version 1.0.0 of the server has a base url of /api/v1.0.0.
-Below is an example of the main endpoint:
+Below are the main endpoints:
 
-### Endpoint: `/api/v1.0.0/chat/`
+### Original AIMO Endpoint: `/api/v1.0.0/chat/`
 
 #### Method: `POST`
 
@@ -135,6 +196,37 @@ Below is an example of the main endpoint:
 }
 ```
 
+### LiteLLM Proxy Endpoints
+
+#### Chat Completion: `/api/v1.0.0/chat/completions_proxy`
+
+**Method:** `POST`
+
+**Request Body:**
+```json
+{
+  "model": "prod_default",
+  "messages": [
+    {"role": "user", "content": "Hello!"}
+  ],
+  "temperature": 0.7,
+  "max_tokens": 100,
+  "stream": false
+}
+```
+
+#### Available Models: `/api/v1.0.0/chat/models`
+
+**Method:** `GET`
+
+Returns a list of the available LLM models configured in LiteLLM.
+
+#### Health Check: `/api/v1.0.0/chat/health`
+
+**Method:** `GET`
+
+Returns the LiteLLM Proxy connection status.
+
 #### Response:
 ```json
 {
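
For illustration, the request body documented above can be exercised against a locally running AIMO server with plain `requests`; the host/port and the lack of auth headers below are assumptions about a local development setup, not part of this commit.

```python
import requests

# Hypothetical local deployment; adjust host, port, and auth to your setup.
BASE_URL = "http://localhost:8000/api/v1.0.0"

payload = {
    "model": "prod_default",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 100,
    "stream": False,
}

resp = requests.post(f"{BASE_URL}/chat/completions_proxy", json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])

# The helper endpoints return small JSON documents:
print(requests.get(f"{BASE_URL}/chat/models", timeout=10).json())   # {"models": [...]}
print(requests.get(f"{BASE_URL}/chat/health", timeout=10).json())   # {"status": ..., "llm_proxy_connected": ...}
```

Setting `"stream": true` makes the route return Server-Sent Events instead of a single JSON body; see the streaming branch in `app/api/routes/chat.py` below.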

app/api/routes/chat.py

Lines changed: 117 additions & 2 deletions
@@ -1,9 +1,10 @@
 import logging
-from typing import Union
-from fastapi import APIRouter
+from typing import Union, List, Dict, Any
+from fastapi import APIRouter, HTTPException
 from sse_starlette.sse import EventSourceResponse
 
 from app.ai.aimo import AIMO
+from app.clients.llm_client import llm_client
 
 logger = logging.getLogger(__name__)
 from app.models.openai import (
@@ -12,6 +13,7 @@
     ChatChoice,
     Message
 )
+from app.models.llm import LLMChatRequest, LLMChatResponse, LLMChoice, LLMMessage, LLMUsage
 
 """
 Author: Jack Pan, Wesley Xu
@@ -52,3 +54,116 @@ async def create_chat_completion(request: ChatCompletionRequest) -> Union[ChatCo
             max_new_tokens=request.max_tokens
         )
     )
+
+
+@router.post("/completions_proxy", response_model=LLMChatResponse)
+async def create_chat_completion_proxy(request: LLMChatRequest) -> Union[LLMChatResponse, EventSourceResponse]:
+    """
+    LiteLLM Proxy chat completion endpoint.
+
+    This route demonstrates using the new LLM client to interact with LiteLLM Proxy,
+    which can route to multiple LLM providers (OpenAI, Anthropic, local models, etc.).
+    """
+    try:
+        # Convert pydantic models to dict format expected by openai client
+        messages = [{"role": msg.role, "content": msg.content} for msg in request.messages]
+
+        if not request.stream:
+            # Non-streaming response
+            response = await llm_client.chat(
+                messages=messages,
+                model=request.model,
+                temperature=request.temperature,
+                max_tokens=request.max_tokens,
+                tools=request.tools,
+                tool_choice=request.tool_choice,
+                presence_penalty=request.presence_penalty,
+                frequency_penalty=request.frequency_penalty,
+                top_p=request.top_p,
+                user=request.user
+            )
+
+            # Convert OpenAI response to our LLM response format
+            choices = []
+            for choice in response.choices:
+                llm_choice = LLMChoice(
+                    index=choice.index,
+                    message=LLMMessage(
+                        role=choice.message.role,
+                        content=choice.message.content or ""
+                    ),
+                    finish_reason=choice.finish_reason
+                )
+                choices.append(llm_choice)
+
+            usage = None
+            if response.usage:
+                usage = LLMUsage(
+                    prompt_tokens=response.usage.prompt_tokens,
+                    completion_tokens=response.usage.completion_tokens,
+                    total_tokens=response.usage.total_tokens
+                )
+
+            return LLMChatResponse(
+                id=response.id,
+                model=response.model,
+                choices=choices,
+                usage=usage
+            )
+        else:
+            # Streaming response
+            async def stream_generator():
+                async for chunk in await llm_client.chat(
+                    messages=messages,
+                    model=request.model,
+                    temperature=request.temperature,
+                    max_tokens=request.max_tokens,
+                    stream=True,
+                    tools=request.tools,
+                    tool_choice=request.tool_choice,
+                    presence_penalty=request.presence_penalty,
+                    frequency_penalty=request.frequency_penalty,
+                    top_p=request.top_p,
+                    user=request.user
+                ):
+                    if chunk.choices:
+                        choice = chunk.choices[0]
+                        if choice.delta and choice.delta.get('content'):
+                            yield f"data: {choice.delta['content']}\n\n"
+
+                yield "data: [DONE]\n\n"
+
+            return EventSourceResponse(stream_generator())
+
+    except Exception as e:
+        logger.error(f"Error in chat_proxy: {e}")
+        raise HTTPException(status_code=500, detail=f"LLM service error: {str(e)}")
+
+
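The streaming branch above yields chunks of the form `data: <content delta>` and a final `data: [DONE]` through `EventSourceResponse`. Below is a lenient consumer sketch, assuming the `httpx` package and the same hypothetical local server as before; the exact SSE framing depends on how `sse_starlette` wraps the yielded strings, hence the defensive prefix stripping.

```python
import httpx

# Hypothetical local deployment; adjust host/port to your setup.
URL = "http://localhost:8000/api/v1.0.0/chat/completions_proxy"

payload = {
    "model": "prod_default",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
}

with httpx.stream("POST", URL, json=payload, timeout=60) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Strip any "data:" prefixes added by the route and/or sse_starlette.
        while line.startswith("data:"):
            line = line[len("data:"):].lstrip()
        if not line:
            continue
        if line == "[DONE]":
            break
        print(line, end="", flush=True)
print()
```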
+@router.get("/models")
+async def list_available_models() -> Dict[str, List[str]]:
+    """Get available models from LiteLLM Proxy."""
+    try:
+        models = await llm_client.get_available_models()
+        return {"models": models}
+    except Exception as e:
+        logger.error(f"Error getting models: {e}")
+        raise HTTPException(status_code=500, detail=f"Error fetching models: {str(e)}")
+
+
+@router.get("/health")
+async def health_check() -> Dict[str, Any]:
+    """Health check for LiteLLM Proxy connection."""
+    try:
+        is_healthy = await llm_client.health_check()
+        return {
+            "status": "healthy" if is_healthy else "unhealthy",
+            "llm_proxy_connected": is_healthy
+        }
+    except Exception as e:
+        logger.error(f"Health check error: {e}")
+        return {
+            "status": "unhealthy",
+            "llm_proxy_connected": False,
+            "error": str(e)
+        }
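
The routes above depend on `llm_client` from `app.clients.llm_client`, which this commit adds elsewhere but which is not part of the excerpt shown here. Purely to illustrate the interface the routes assume (`chat`, `get_available_models`, `health_check`), here is a rough sketch built on the `openai` SDK's `AsyncOpenAI` client pointed at the LiteLLM Proxy; the names, defaults, and error handling are assumptions, not the actual implementation.

```python
# Hypothetical sketch of app/clients/llm_client.py -- illustrative only.
import os
from typing import Any, Dict, List, Optional

from openai import AsyncOpenAI


class LLMClient:
    """Thin wrapper around an OpenAI-compatible client aimed at LiteLLM Proxy."""

    def __init__(self) -> None:
        self._client = AsyncOpenAI(
            base_url=os.getenv("LLM_BASE_URL", "http://localhost:4000"),
            api_key=os.getenv("LLM_API_KEY", "sk-litellm-proxy-key"),
            timeout=float(os.getenv("LLM_TIMEOUT", "60")),
        )
        self._default_model = os.getenv("LLM_MODEL_DEFAULT", "prod_default")

    async def chat(self, messages: List[Dict[str, str]], model: Optional[str] = None,
                   stream: bool = False, **kwargs: Any):
        """Forward a chat completion request to the proxy (optionally streaming)."""
        # Drop parameters the caller left as None so the proxy applies its defaults.
        params = {k: v for k, v in kwargs.items() if v is not None}
        return await self._client.chat.completions.create(
            model=model or self._default_model,
            messages=messages,
            stream=stream,
            **params,
        )

    async def get_available_models(self) -> List[str]:
        """List the model names exposed by the proxy."""
        models = await self._client.models.list()
        return [m.id for m in models.data]

    async def health_check(self) -> bool:
        """Return True if the proxy answers a model-listing request."""
        try:
            await self._client.models.list()
            return True
        except Exception:
            return False


# Module-level singleton, as imported by app/api/routes/chat.py.
llm_client = LLMClient()
```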
File renamed without changes.
