
Commit 21ec3c6

Add docker compose and llama-cpp-server (#21)

* Add GPU Dockerfile and all-in-one llama-cpp docker compose w/ model downloader (fixed)
* Sets `.env` file in docker compose + uses env values for llama-cpp-server

1 parent 0be7a9d commit 21ec3c6

File tree

6 files changed: +165 −10 lines


.dockerignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+*.env
+models/
+*.gguf
```

.gitignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+.env
+__pycache__/
+.venv/
+venv/
+models/
+*.gguf
```

Dockerfile.gpu

Lines changed: 47 additions & 0 deletions
```diff
@@ -0,0 +1,47 @@
+FROM ubuntu:22.04
+
+# Set non-interactive frontend
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install Python and other dependencies
+RUN apt-get update && apt-get install -y \
+    python3.10 \
+    python3-pip \
+    python3-venv \
+    libsndfile1 \
+    ffmpeg \
+    portaudio19-dev \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# Create non-root user and set up directories
+RUN useradd -m -u 1001 appuser && \
+    mkdir -p /app/outputs /app && \
+    chown -R appuser:appuser /app
+
+USER appuser
+WORKDIR /app
+
+# Copy dependency files
+COPY --chown=appuser:appuser requirements.txt ./requirements.txt
+
+# Create and activate virtual environment
+RUN python3 -m venv /app/venv
+ENV PATH="/app/venv/bin:$PATH"
+
+# Install PyTorch with CUDA support and other dependencies
+RUN pip3 install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 && \
+    pip3 install --no-cache-dir -r requirements.txt
+
+# Copy project files
+COPY --chown=appuser:appuser . .
+
+# Set environment variables
+ENV PYTHONUNBUFFERED=1 \
+    PYTHONPATH=/app \
+    USE_GPU=true
+
+# Expose the port
+EXPOSE 5005
+
+# Run FastAPI server with uvicorn
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5005", "--workers", "1"]
```

README.md

Lines changed: 20 additions & 5 deletions
````diff
@@ -57,6 +57,8 @@ Listen to sample outputs with different voices and emotions:
 ```
 Orpheus-FastAPI/
 ├── app.py                # FastAPI server and endpoints
+├── docker-compose.yml    # Docker compose configuration
+├── Dockerfile.gpu        # GPU-enabled Docker image
 ├── requirements.txt      # Dependencies
 ├── static/               # Static assets (favicon, etc.)
 ├── outputs/              # Generated audio files
@@ -74,9 +76,21 @@ Orpheus-FastAPI/
 
 - Python 3.8-3.11 (Python 3.12 is not supported due to removal of pkgutil.ImpImporter)
 - CUDA-compatible GPU (recommended: RTX series for best performance)
-- Separate LLM inference server running the Orpheus model (e.g., LM Studio or llama.cpp server)
+- Docker compose, or a separate LLM inference server running the Orpheus model (e.g., LM Studio or llama.cpp server)
 
-### Installation
+### 🐳 Docker compose
+
+The docker compose file orchestrates the Orpheus-FastAPI service for audio generation and a llama.cpp inference server for base model token generation. The GGUF model is downloaded by the model-init service.
+
+```bash
+cp .env.example .env  # Nothing needs to be changed, but the file is required
+```
+
+```bash
+docker compose up --build
+```
+
+### FastAPI Service Native Installation
 
 1. Clone the repository:
 ```bash
@@ -271,7 +285,7 @@ You can easily integrate this TTS solution with [OpenWebUI](https://github.com/o
 
 ### External Inference Server
 
-This application requires a separate LLM inference server running the Orpheus model. You can use:
+This application requires a separate LLM inference server running the Orpheus model. For easy setup, use Docker Compose, which handles this automatically. Alternatively, you can use:
 
 - [GPUStack](https://github.com/gpustack/gpustack) - GPU optimised LLM inference server (My pick) - supports LAN/WAN tensor split parallelisation
 - [LM Studio](https://lmstudio.ai/) - Load the GGUF model and start the local server
@@ -291,16 +305,17 @@ The inference server should be configured to expose an API endpoint that this Fa
 
 ### Environment Variables
 
-You can configure the system using environment variables or a `.env` file:
+Configure these in docker compose if using Docker; otherwise, create a `.env` file:
 
-- `ORPHEUS_API_URL`: URL of the LLM inference API (tts_engine/inference.py)
+- `ORPHEUS_API_URL`: URL of the LLM inference API (default in Docker: http://llama-cpp-server:5006/v1/completions)
 - `ORPHEUS_API_TIMEOUT`: Timeout in seconds for API requests (default: 120)
 - `ORPHEUS_MAX_TOKENS`: Maximum tokens to generate (default: 8192)
 - `ORPHEUS_TEMPERATURE`: Temperature for generation (default: 0.6)
 - `ORPHEUS_TOP_P`: Top-p sampling parameter (default: 0.9)
 - `ORPHEUS_SAMPLE_RATE`: Audio sample rate in Hz (default: 24000)
 - `ORPHEUS_PORT`: Web server port (default: 5005)
 - `ORPHEUS_HOST`: Web server host (default: 0.0.0.0)
+- `ORPHEUS_MODEL_NAME`: Model name for inference server
 
 The system now supports loading environment variables from a `.env` file in the project root, making it easier to configure without modifying system-wide environment settings. See `.env.example` for a template.
 
````
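For the non-Docker path, a minimal `.env` sketch using the variables documented above; the values shown are the listed defaults, the localhost API URL and the GGUF filename are assumptions, and `.env.example` in the repository remains the authoritative template:

```bash
# Illustrative .env (not the authoritative template - see .env.example)
ORPHEUS_API_URL=http://127.0.0.1:5006/v1/completions
ORPHEUS_API_TIMEOUT=120
ORPHEUS_MAX_TOKENS=8192
ORPHEUS_TEMPERATURE=0.6
ORPHEUS_TOP_P=0.9
ORPHEUS_SAMPLE_RATE=24000
ORPHEUS_PORT=5005
ORPHEUS_HOST=0.0.0.0
# Assumed GGUF filename - set this to the model you actually serve
ORPHEUS_MODEL_NAME=Orpheus-3b-FT-Q4_K_M.gguf
```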

app.py

Lines changed: 21 additions & 5 deletions
```diff
@@ -11,14 +11,30 @@
 
 # Function to ensure .env file exists
 def ensure_env_file_exists():
-    """Create a default .env file if one doesn't exist"""
+    """Create a .env file from defaults and OS environment variables"""
     if not os.path.exists(".env") and os.path.exists(".env.example"):
         try:
-            # Copy .env.example to .env
+            # 1. Create default env dictionary from .env.example
+            default_env = {}
             with open(".env.example", "r") as example_file:
-                with open(".env", "w") as env_file:
-                    env_file.write(example_file.read())
-            print("✅ Created default configuration file at .env")
+                for line in example_file:
+                    line = line.strip()
+                    if line and not line.startswith("#") and "=" in line:
+                        key = line.split("=")[0].strip()
+                        default_env[key] = line.split("=", 1)[1].strip()
+
+            # 2. Override defaults with Docker environment variables if they exist
+            final_env = default_env.copy()
+            for key in default_env:
+                if key in os.environ:
+                    final_env[key] = os.environ[key]
+
+            # 3. Write dictionary to .env file in env format
+            with open(".env", "w") as env_file:
+                for key, value in final_env.items():
+                    env_file.write(f"{key}={value}\n")
+
+            print("✅ Created default .env file from .env.example and environment variables.")
         except Exception as e:
             print(f"⚠️ Error creating default .env file: {e}")
 
```
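A short sketch of the resulting precedence, assuming the project's dependencies are installed and `ensure_env_file_exists()` is invoked directly (the variable name and the 0.6 default are taken from the README's environment variable list):

```bash
# Start from a clean state: no .env yet, .env.example provides the defaults
rm -f .env

# A value already present in the process environment (e.g. set by docker compose)
# overrides the corresponding .env.example default when the .env file is generated
export ORPHEUS_TEMPERATURE=0.4
python3 -c "from app import ensure_env_file_exists; ensure_env_file_exists()"

grep ORPHEUS_TEMPERATURE .env
# -> ORPHEUS_TEMPERATURE=0.4  (keys not set in the environment keep their .env.example values)
```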

docker-compose.yml

Lines changed: 68 additions & 0 deletions
```diff
@@ -0,0 +1,68 @@
+services:
+  orpheus-fastapi:
+    container_name: orpheus-fastapi
+    build:
+      context: .
+      dockerfile: Dockerfile.gpu
+    ports:
+      - "5005:5005"
+    env_file:
+      - .env
+    environment:
+      - ORPHEUS_API_URL=http://llama-cpp-server:5006/v1/completions
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    restart: unless-stopped
+    depends_on:
+      llama-cpp-server:
+        condition: service_started
+
+  llama-cpp-server:
+    image: ghcr.io/ggml-org/llama.cpp:server-cuda
+    ports:
+      - "5006:5006"
+    volumes:
+      - ./models:/models
+    env_file:
+      - .env
+    depends_on:
+      model-init:
+        condition: service_completed_successfully
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    restart: unless-stopped
+    command: >
+      -m /models/${ORPHEUS_MODEL_NAME}
+      --port 5006
+      --host 0.0.0.0
+      --n-gpu-layers 29
+      --ctx-size ${ORPHEUS_MAX_TOKENS}
+      --n-predict ${ORPHEUS_MAX_TOKENS}
+      --rope-scaling linear
+
+  model-init:
+    image: curlimages/curl:latest
+    user: ${UID}:${GID}
+    volumes:
+      - ./models:/app/models
+    working_dir: /app
+    command: >
+      sh -c '
+      if [ ! -f /app/models/${ORPHEUS_MODEL_NAME} ]; then
+        echo "Downloading model file..."
+        wget -P /app/models https://huggingface.co/lex-au/${ORPHEUS_MODEL_NAME}/resolve/main/${ORPHEUS_MODEL_NAME}
+      else
+        echo "Model file already exists"
+      fi'
+    restart: "no"
```
