v0.2.0 - Multimodal: Text, Image, Video & Audio
What's New
vLLM-MLX now supports Text, Image, Video & Audio - all GPU-accelerated on Apple Silicon.
Audio Support (NEW)
- STT (Speech-to-Text): Whisper, Parakeet
- TTS (Text-to-Speech): Kokoro with native multilingual voices
- Native voices: English, Spanish, French, Chinese, Japanese, Italian, Portuguese, Hindi
- Includes a bug fix for multilingual support in mlx-audio 0.2.9
Modular Architecture
| Modality | Library | Install |
|---|---|---|
| Text | mlx-lm | pip install vllm-mlx |
| Image | mlx-vlm | pip install vllm-mlx |
| Video | mlx-vlm | pip install vllm-mlx |
| Audio | mlx-audio | pip install "vllm-mlx[audio]" |
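Because audio is an optional extra, it can help to check which backing libraries are importable before starting a server. The following is a minimal sketch, not part of vLLM-MLX itself: the helper name `available_modalities` is hypothetical, and the import names `mlx_lm` and `mlx_vlm` are assumed from the library names in the table (`mlx_audio` appears in the TTS command in these notes).

```python
import importlib.util

# Assumed import name for each backing library listed in the table.
MODALITY_MODULES = {
    "text": "mlx_lm",
    "image": "mlx_vlm",
    "video": "mlx_vlm",
    "audio": "mlx_audio",
}


def available_modalities(find_spec=importlib.util.find_spec):
    """Return {modality: bool}, True when the backing module can be located.

    `find_spec` is injectable so the check can be exercised without
    installing anything.
    """
    return {
        modality: find_spec(module) is not None
        for modality, module in MODALITY_MODULES.items()
    }


# Report what is installed in the current environment.
for modality, installed in available_modalities().items():
    print(f"{modality:<6} {'ok' if installed else 'missing'}")
```

If `audio` is reported as missing, installing the audio extra from the table should provide it.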
Native TTS Voices
| Language | Voices |
|---|---|
| English | af_heart, am_adam, bf_emma, bm_george + 24 more |
| Spanish | ef_dora, em_alex, em_santa |
| French | ff_siwis |
| Chinese | zf_xiaobei, zm_yunjian + 6 more |
| Japanese | jf_alpha, jm_kumo + 3 more |
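The voice names in the table appear to follow a pattern in which the first letter identifies the language and the second the speaker's gender. The sketch below derives the `--lang_code` value from a voice name and builds the argument list for the `mlx_audio.tts.generate` command shown in the Text-to-Speech example. Only the pairing of `ef_dora` with `e` is confirmed by these notes; the other prefix meanings are assumptions, and both helper names are hypothetical.

```python
# Assumed convention: first letter of the voice prefix = language code.
# Only "e" (Spanish) is confirmed by the release notes.
LANG_BY_PREFIX = {
    "a": "English (American)",
    "b": "English (British)",
    "e": "Spanish",
    "f": "French",
    "z": "Chinese",
    "j": "Japanese",
}


def lang_code_for_voice(voice: str) -> str:
    """Return the one-letter language code implied by a voice name."""
    prefix, sep, _name = voice.partition("_")
    if not sep or len(prefix) != 2 or prefix[0] not in LANG_BY_PREFIX:
        raise ValueError(f"unrecognized voice name: {voice!r}")
    return prefix[0]


def tts_command(text: str, voice: str,
                model: str = "mlx-community/Kokoro-82M-bf16") -> list[str]:
    """Build the argv list for the TTS CLI using flags from the notes."""
    return [
        "python", "-m", "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--voice", voice,
        "--lang_code", lang_code_for_voice(voice),
    ]


print(" ".join(tts_command("Hola, bienvenido", "ef_dora")))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with the text argument.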
Examples
Text (LLM Inference)
from openai import OpenAI

# Point the OpenAI client at the local server; the key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(response.choices[0].message.content)
Image Understanding
response = client.chat.completions.create(
    model="default",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
Video Understanding
response = client.chat.completions.create(
    model="default",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video"},
            {"type": "video_url", "video_url": {"url": "file://video.mp4"}},
        ],
    }],
)
Text-to-Speech (Native Spanish)
python -m mlx_audio.tts.generate \
    --model mlx-community/Kokoro-82M-bf16 \
    --text "Hola, bienvenido" \
    --voice ef_dora --lang_code e
Speech-to-Text
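# Open the file in a context manager so the handle is closed after upload.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )
Requirements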
- Apple Silicon (M1, M2, M3, M4, M5+)
- Python 3.10+
- macOS
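The requirements above can be checked programmatically before installing. This is a minimal sketch using only the standard library; the function name `check_requirements` is hypothetical. It assumes that Apple Silicon reports its architecture as `arm64`, and note that a Python interpreter running under Rosetta would report `x86_64` instead.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None):
    """Return a list of unmet requirements (empty when all are satisfied).

    Arguments default to the running interpreter's values and can be
    overridden for testing.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])

    problems = []
    if system != "Darwin":  # platform.system() reports macOS as "Darwin"
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":  # assumed architecture string for Apple Silicon
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if version < (3, 10):
        found = ".".join(str(part) for part in version)
        problems.append(f"Python 3.10+ required, found {found}")
    return problems


for problem in check_requirements():
    print("unmet:", problem)
```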
Full Changelog: v0.1.0...v0.2.0