Updated README to reflect new features and improvements in Chatterbox TTS Server, including Chatterbox-Turbo support, hot-swappable engines, and paralinguistic tags.
README.md: 69 additions & 3 deletions
@@ -1,12 +1,12 @@
 # Chatterbox TTS Server: OpenAI-Compatible API with Web UI, Large Text Handling & Built-in Voices

-**Self-host the powerful [Chatterbox TTS model](https://github.com/resemble-ai/chatterbox) with this enhanced FastAPI server! Features an intuitive Web UI, a flexible API endpoint, voice cloning, large text processing via intelligent chunking, audiobook generation, and consistent, reproducible voices using built-in ready-to-use voices and a generation seed feature.**
+**Self-host Resemble AI's [Chatterbox](https://github.com/resemble-ai/chatterbox) open-source TTS family (Original + Chatterbox‑Turbo) behind an OpenAI‑compatible API and a modern Web UI. Chatterbox‑Turbo is a streamlined 350M-parameter model with dramatically improved throughput and native paralinguistic tags like `[laugh]`, `[cough]`, and `[chuckle]` for more expressive voice agents and narration. Features voice cloning, large text processing via intelligent chunking, audiobook generation, and consistent, reproducible voices using built-in ready-to-use voices and a generation seed feature.**

 > 🚀 **Try it now!** Test the full TTS server with voice cloning and audiobook generation in Google Colab - no installation required!
 >
 > [Open in Colab](https://colab.research.google.com/github/devnen/Chatterbox-TTS-Server/blob/main/Chatterbox_TTS_Colab_Demo.ipynb)

-This server is based on the architecture and UI of our [Dia-TTS-Server](https://github.com/devnen/Dia-TTS-Server) project but uses the distinct `chatterbox-tts` engine. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU.
+This server is based on the architecture and UI of our [Dia-TTS-Server](https://github.com/devnen/Dia-TTS-Server) project but uses the distinct `chatterbox-tts` engine. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU. Make sure you also check our [Kitten-TTS-Server](https://github.com/devnen/Kitten-TTS-Server) project.
@@ -28,6 +28,45 @@ This server is based on the architecture and UI of our [Dia-TTS-Server](https://

 ---

+## 🆕 What's New
+
+### ⚡ Chatterbox‑Turbo support (new)
+
+- Added full support for **Chatterbox‑Turbo**, Resemble AI's latest efficiency-focused Chatterbox model.
+- Turbo is built on a **streamlined 350M‑parameter architecture**, designed to use less compute/VRAM while keeping high-fidelity output.
+- Turbo distills the speech-token-to-mel "audio diffusion decoder" from **10 steps → 1 step**, removing a major inference bottleneck.
+- Resemble positions Turbo for real-time/agent workflows and highlights significantly faster-than-real-time performance on GPU (performance varies by hardware/settings).
+
+### 🔁 Hot‑swappable TTS engines (UI)
+
+- Added a new **engine selector** dropdown at the top of the Web UI.
+- Instantly hot-swap between **Original Chatterbox** and **Chatterbox‑Turbo**; the backend auto-loads the selected engine.
+- All UI + API requests route through the active engine so you can A/B test quality vs latency without changing client code.
+
+### 🎭 Paralinguistic tags (Turbo)
+
+- Turbo adds **native paralinguistic tags** you can write directly into your text, e.g. `…calling you back [chuckle]…`.
+- Supported tags include `[laugh]`, `[cough]`, and `[chuckle]`, plus text-based prompting for reactions like sigh, gasp, and cough.
+- Added **new presets** in `ui/presets.yaml` demonstrating paralinguistic prompting for agent-style scripts and expressive reads.
+
+### ✅ Original Chatterbox remains first‑class
+
+- The original Chatterbox models remain available (including multilingual), with support for **23 languages**, a **0.5B LLaMA backbone**, **emotion exaggeration control**, and training on **0.5M hours** of cleaned data.
+- Chatterbox outputs are **watermarked** (PerTh) for responsible AI usage.
+
+### 🖥️ New NVIDIA / CUDA support
+
+- Updated to support **NVIDIA CUDA 12.8** and **RTX 5090 / Blackwell** generation GPUs.
+
+### 🧰 Automated launcher + easy updates
+
+- New **Automated Launcher** (Windows + Linux) that creates/activates a venv, installs the right dependencies, downloads model files, starts the server, and opens the Web UI.
+- Easy maintenance commands:
+  - `--upgrade` to update code + dependencies.
+  - `--reinstall` for a clean reinstall when environments get messy.
 The [Chatterbox TTS model by Resemble AI](https://github.com/resemble-ai/chatterbox) provides capabilities for generating high-quality speech. This project builds upon that foundation by providing a robust [FastAPI](https://fastapi.tiangolo.com/) server that makes Chatterbox significantly easier to use and integrate.
@@ -37,6 +76,9 @@ The [Chatterbox TTS model by Resemble AI](https://github.com/resemble-ai/chatter
 The server expects plain text input for synthesis and we solve the complexity of setting up and running the model by offering:

 * A **modern Web UI** for easy experimentation, preset loading, reference audio management, and generation parameter tuning.
+* **Multi-engine support (Original + Turbo):** Choose the TTS engine directly in the Web UI, then generate via the same UI/API surface.
+* **Paralinguistic prompting (Turbo):** Native tags like `[laugh]`, `[cough]`, and `[chuckle]` for natural non-speech reactions inside the same generated voice.
+* **Original Chatterbox strengths:** High quality English output plus unique "emotion exaggeration control" and 0.5B LLaMA backbone.
 * **Multi-Platform Acceleration:** Full support for **NVIDIA (CUDA)**, **AMD (ROCm)**, and **Apple Silicon (MPS)** GPUs, with an automatic fallback to **CPU**, ensuring you can run on any hardware.
 * **Large Text Handling:** Intelligently splits long plain text inputs into manageable chunks based on sentence structure, processes them sequentially, and seamlessly concatenates the audio.
 * **📚 Audiobook Generation:** Perfect for creating complete audiobooks - simply paste an entire book's text and the server automatically processes it into a single, seamless audio file with consistent voice quality throughout.
@@ -56,6 +98,13 @@ This server application enhances the underlying `chatterbox-tts` engine with the

 **🚀 Core Functionality:**

+* **Multi-Engine Support:**
+    * Choose between **Original Chatterbox** and **Chatterbox‑Turbo** via a hot-swappable engine selector in the Web UI.
+    * Turbo offers significantly faster inference with a streamlined 350M-parameter architecture.
+    * Original Chatterbox provides multilingual support (23 languages) and emotion exaggeration control.
+* **Paralinguistic Tags (Turbo):**
+    * Write native tags like `[laugh]`, `[cough]`, and `[chuckle]` directly in your text when using Chatterbox‑Turbo.
+    * New presets demonstrate paralinguistic prompting for agent-style scripts and expressive narration.
 * **Large Text Processing (Chunking):**
     * Automatically handles long plain text inputs by intelligently splitting them into smaller chunks based on sentence boundaries.
     * Processes each chunk individually and seamlessly concatenates the resulting audio, overcoming potential generation limits of the TTS engine.
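
The sentence-boundary chunking described in this hunk can be illustrated with a minimal sketch. This is not the server's actual implementation: the function name `split_into_chunks`, the regex-based sentence splitter, and the reading of `chunk_size` as a character budget are all assumptions made for illustration.

```python
import re
from typing import List

# Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, chunk_size: int = 120) -> List[str]:
    """Group whole sentences into chunks of at most `chunk_size` characters.

    A single sentence longer than `chunk_size` is kept intact as its own chunk.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized in order and the resulting audio segments concatenated, which is the behaviour the README attributes to the server. A production splitter would likely also need to handle abbreviations and quoted speech, which this sketch does not attempt.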
@@ -96,12 +145,16 @@ This server application enhances the underlying `chatterbox-tts` engine with the
 * **Core Chatterbox Capabilities (via [Resemble AI Chatterbox](https://github.com/resemble-ai/chatterbox)):**
     * 🗣️ High-quality single-speaker voice synthesis from plain text.
     * 🎤 Perform voice cloning using reference audio prompts.
+    * ⚡ **Chatterbox‑Turbo** for significantly faster inference with paralinguistic tag support.
+    * 🌍 **Original Chatterbox** with high quality English output and emotion exaggeration control.
 * **Enhanced Server & API:**
     * ⚡ Built with the high-performance **[FastAPI](https://fastapi.tiangolo.com/)** framework.
     * ⚙️ **Custom API Endpoint** (`/tts`) as the primary method for programmatic generation, exposing all key parameters.
     * 📄 Interactive API documentation via Swagger UI (`/docs`).
     * 🩺 Health check endpoint (`/api/ui/initial-data` also serves as a comprehensive status check).
 * **Advanced Generation Features:**
+    * 🔁 **Hot-Swappable Engines:** Switch between Original Chatterbox and Chatterbox‑Turbo directly in the Web UI.
+    * 🎭 **Paralinguistic Tags (Turbo):** Native support for `[laugh]`, `[cough]`, `[chuckle]` and other expressive tags.
     * 📚 **Large Text Handling:** Intelligently splits long plain text inputs into chunks based on sentences, generates audio for each, and concatenates the results seamlessly. Configurable via `split_text` and `chunk_size`.
     * 📖 **Audiobook Creation:** Perfect for generating complete audiobooks from full-length texts with consistent voice quality and automatic chapter handling.
     * 🎤 **Predefined Voices:** Select from curated synthetic voices in the `./voices` directory.
@@ -110,6 +163,7 @@ This server application enhances the underlying `chatterbox-tts` engine with the
     * 🔇 **Audio Post-Processing:** Optional automatic steps to trim silence, fix internal pauses, and remove long unvoiced segments/artifacts (configurable via `config.yaml`).
 * **Intuitive Web User Interface:**
     * 🖱️ Modern, easy-to-use interface.
+    * 🔁 **Engine Selector:** Hot-swap between Original Chatterbox and Chatterbox‑Turbo.
     * 💡 **Presets:** Load example text and settings dynamically from `ui/presets.yaml`.
+* **Engine Selector:** Use the dropdown at the top to switch between **Original Chatterbox** and **Chatterbox‑Turbo**. The backend auto-loads the selected engine.
 * **Text Input:** Enter your plain text script. **For audiobooks:** Simply paste the entire book text - the chunking system will automatically handle long texts and create seamless audio output.
 * **Voice Mode:** Choose:
     * `Predefined Voices`: Select a curated voice from the `./voices` directory.
     * `Voice Cloning`: Select an uploaded reference file from `./reference_audio`.
-* **Presets:** Load examples from `ui/presets.yaml`.
+* **Presets:** Load examples from `ui/presets.yaml`. New presets demonstrate Turbo's paralinguistic tags.
 * **Reference/Predefined Audio Management:** Import new files and refresh lists.
 * **Generation Parameters:** Adjust Temperature, Exaggeration, CFG Weight, Speed Factor, Seed. Save defaults to `config.yaml`.
 * **Chunking Controls:** Toggle "Split text into chunks" and adjust "Chunk Size" for long texts.
 * **Server Configuration:** View/edit parts of `config.yaml` (requires server restart for some changes).
 * **Audio Player:** Play generated audio with waveform visualization.

+### Using Paralinguistic Tags (Turbo)
+
+When the engine selector is set to **Chatterbox‑Turbo**, you can include paralinguistic tags inline:
+
+```
+Hi there [chuckle] — thanks for calling back.
+One moment… [cough] sorry about that. Let's get this fixed.
+```
+
+Turbo supports native tags like `[laugh]`, `[cough]`, and `[chuckle]` for more realistic, expressive speech. These tags are ignored when using Original Chatterbox.
+
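
Because the tags only take effect with Turbo, a client may want to inspect a script before sending it. The following is a minimal client-side sketch, not part of the server: the helper names and the `[name]` regex are hypothetical, and `KNOWN_TAGS` contains only the three tags named in the README (Turbo may accept others).

```python
import re
from typing import List

# Only the tags named in the README; this set is an assumption, not an exhaustive list.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}

# Matches a lowercase word in square brackets, e.g. "[chuckle]".
_TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def find_tags(text: str) -> List[str]:
    """Return every bracketed tag name in order of appearance."""
    return _TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> List[str]:
    """Return tags that are not in KNOWN_TAGS, e.g. to warn before synthesis."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


def strip_tags(text: str) -> str:
    """Remove bracketed tags and collapse the extra whitespace they leave behind."""
    without_tags = _TAG_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags).strip()
```

For example, `unknown_tags` can flag a typo such as `[chuckel]` before a long generation run, and `strip_tags` yields a plain-text version of the script for display or for an engine without tag support.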
 ### API Endpoints (`/docs` for interactive details)

 The primary endpoint for TTS generation is `/tts`, which offers detailed control over the synthesis process.
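
As a rough illustration of calling `/tts` programmatically, the sketch below builds a JSON POST request using only the Python standard library. Several details are assumptions rather than facts from this diff: the base URL and port, the use of a JSON body, and the `text` and `seed` field names. Only `split_text` and `chunk_size` are named in the README; the interactive `/docs` page on a running server is the authoritative source for the request schema.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL; adjust host and port to match your own deployment.
BASE_URL = "http://localhost:8004"


def build_tts_request(
    text: str,
    split_text: bool = True,
    chunk_size: int = 120,
    seed: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /tts endpoint."""
    # "text" is an assumed field name; split_text and chunk_size come from the README.
    payload = {"text": text, "split_text": split_text, "chunk_size": chunk_size}
    if seed is not None:
        payload["seed"] = seed  # assumed field name for the generation seed
    return urllib.request.Request(
        url=f"{BASE_URL}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a server running, the request could be sent with `urllib.request.urlopen(build_tts_request("Hello [chuckle] world"))` and the response bytes written to an audio file. Since requests route through whichever engine is active, the same call should work for both Original Chatterbox and Chatterbox‑Turbo.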