This directory contains two services:
- Gateway (FastAPI, HTTP/1.1 + JSON): OpenAI-compatible endpoints, auth, rate limiting, model downloader.
- Streaming Engine (HTTP/3 over QUIC): Low-latency TTS/STT using local binaries (no heavy Python ML deps).
- Models directory: `data/models` (mounted to host)
- Audio data directory: `data/audio` (mounted to host)

Both directories are created on first run.
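Putting together the paths used throughout this README, the on-disk layout looks roughly like this (the `tts/` and `stt/` subfolders hold output from the respective endpoints):

```text
data/
├── models/            # downloaded model files, one subdirectory per model
│   └── <model-name>/
└── audio/
    ├── tts/           # generated speech output
    └── stt/           # uploaded audio and transcripts
```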
- `API_TOKENS`: optional comma-separated tokens for gateway auth
- `STREAM_ENGINE_BASE`: gateway → engine base URL, e.g. `https://localhost:9443`
- `QUIC_INSECURE`: set to `1` to skip TLS verification in dev
- `PIPER_BIN`: absolute path to the Piper binary inside the container/host
- `WHISPER_CPP_BIN`: absolute path to the whisper.cpp binary inside the container/host
Model version details can be found in `models.md`.
| Model | TTS | STT |
|---|---|---|
| parler-tts | ✅ | |
| piper | ✅ | |
| whisper-cpp | | ✅ |
- Start QUIC engine (HTTP/3):
```bash
shabda-quic --host 0.0.0.0 --port 9443 --cert ./quic_cert.pem --key ./quic_key.pem
```

- Start Gateway (HTTP/1.1):

```bash
export STREAM_ENGINE_BASE=https://localhost:9443
export QUIC_INSECURE=1
shabda-gateway
```

Download a model file (GGUF/BIN/ONNX) directly by URL into `data/models/<name>`:
```bash
curl -X POST http://localhost:8000/v1/models/download \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "whisper-large-v3",
        "url": "https://example.com/models/ggml-large-v3.gguf",
        "format": "gguf"
      }'
```

List models:

```bash
curl http://localhost:8000/v1/models
```

Requirements:
- Place a Piper `.onnx` model and its matching `.onnx.json` under `data/models/<piper-voice>/`.
- Provide `PIPER_BIN` pointing to the Piper binary.
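As a sketch, preparing a voice directory might look like the following; the voice name `my-voice` is a placeholder, and `touch` only stands in for copying your real model files:

```shell
# Create a voice directory ("my-voice" is an illustrative name)
mkdir -p data/models/my-voice
# Copy your real .onnx and .onnx.json here; touch only illustrates the expected names
touch data/models/my-voice/my-voice.onnx
touch data/models/my-voice/my-voice.onnx.json
# Point the gateway/engine at the Piper binary (path is an assumption)
export PIPER_BIN=/usr/local/bin/piper
```

The model name passed to the speech endpoint below is this directory name.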
Request:
```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{"text":"Hello world","model":"piper-voice-dir-name"}' \
  --output out.wav
```

Generated audio is also saved under `data/audio/tts/`.
Requirements:
- Place a whisper `.gguf` or `.bin` model under `data/models/<whisper-model>/`.
- Provide `WHISPER_CPP_BIN` pointing to the whisper.cpp main binary.
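For example, a model directory matching the download example above could be laid out like this; `touch` only stands in for the real model file:

```shell
# Directory name matches the "name" used in the download example
mkdir -p data/models/whisper-large-v3
# Copy the real model here; touch only illustrates the expected file name
touch data/models/whisper-large-v3/ggml-large-v3.gguf
# Point the gateway/engine at the whisper.cpp binary (path is an assumption)
export WHISPER_CPP_BIN=/usr/local/bin/whisper-cpp
```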
Request:
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@/path/to/audio.wav" \
  -F "model=whisper-large-v3" \
  -F "response_format=json"
```

Uploaded audio and transcripts are saved under `data/audio/stt/`.
A `docker-compose.yml` is provided at the repo root to run both services. It mounts `./data/models` and `./data/audio` from the host to ensure persistence and sharing between containers.
```bash
# Start both services
docker compose up -d --build

# View logs
docker compose logs -f gateway
```

Notes:
- Provide Linux-compatible binaries for Piper and whisper.cpp inside the containers (via volumes) or bake them into images.
- For local development, QUIC TLS verification is disabled by setting `QUIC_INSECURE=1` on the gateway.
You can set environment variables via `.env` at the repo root (compose loads it for both services):
```env
# Gateway auth tokens (comma-separated). Leave empty to disable auth.
API_TOKENS=

# Gateway -> QUIC base URL and TLS behavior
STREAM_ENGINE_BASE=https://quic:9443
QUIC_INSECURE=1

# Hugging Face access token (optional) for authenticated/rate-limited downloads
HUGGINGFACE_TOKEN=

# QUIC engine binaries (must exist inside container)
PIPER_BIN=/usr/local/bin/piper
WHISPER_CPP_BIN=/usr/local/bin/whisper-cpp
```
Compose also accepts overrides in each service's `environment:` block.
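For instance, a per-service override might look like the following sketch; the `gateway` service name matches the logs command above, but check the actual compose file for the exact service names:

```yaml
services:
  gateway:
    environment:
      QUIC_INSECURE: "0"                     # enforce TLS verification
      STREAM_ENGINE_BASE: https://quic:9443  # engine reachable via the compose network
```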