A Vietnamese Text-to-Speech library that provides high-quality speech synthesis with voice cloning capabilities.
- 🎯 High-quality Vietnamese TTS - Natural-sounding speech synthesis
- 🔊 Multiple voice options - Gender, accent, emotion, and style variations
- 🎭 Voice cloning - Clone voices using reference audio
- 📱 Dual interfaces - Both CLI and Python API
- 🔄 Chunk processing - Handle long texts efficiently
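As a rough illustration of the chunk processing listed above, the sketch below splits long text at sentence boundaries so that each piece can be synthesized separately. The function name `split_into_chunks` and the `max_chars` limit are hypothetical; this is not the library's actual splitter, only one plausible approach.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars is kept
    whole rather than cut mid-sentence.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("Xin chào các bạn! Đây là ví dụ cơ bản. Cảm ơn các bạn.", max_chars=40):
    print(chunk)
```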
Try VietVoice TTS online with our interactive Gradio interface before installing the library:
The demo allows you to:
- Test different voice options (gender, accent, emotion, style)
- Try voice cloning with your own reference audio
- Experience the quality and capabilities without any setup
- Generate sample audio files to evaluate the results
NOTE: The demo link is temporary and may change or be disabled at any time. You can also try our Colab notebook, which is more stable.
Since this package is not yet published on PyPI, you need to install it from source:
# Clone the repository
git clone https://github.com/nguyenvulebinh/VietVoice-TTS.git
cd VietVoice-TTS
# Install with GPU support (recommended if you have CUDA)
pip install -e ".[gpu]"
# OR install with CPU support (for systems without GPU)
pip install -e ".[cpu]"

Important: You must choose either [gpu] or [cpu] - the base installation without extras will not include ONNX Runtime and will not work.
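One way to confirm that the chosen extra actually pulled in ONNX Runtime is to look for the `onnxruntime` module after installing. The helper name `describe_onnxruntime` is hypothetical. `onnxruntime.get_available_providers()` is ONNX Runtime's own call, and seeing `CUDAExecutionProvider` in its output typically indicates a GPU build.

```python
import importlib.util


def describe_onnxruntime() -> str:
    """Report whether ONNX Runtime is importable and which providers it offers."""
    if importlib.util.find_spec("onnxruntime") is None:
        return 'onnxruntime not found: reinstall with pip install -e ".[gpu]" or ".[cpu]"'
    # Imported only after the check above, so a missing package never raises here.
    import onnxruntime

    providers = onnxruntime.get_available_providers()
    return "onnxruntime " + onnxruntime.__version__ + " providers: " + ", ".join(providers)


print(describe_onnxruntime())
```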
# Basic usage
python -m vietvoicetts "Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt." output.wav
# With voice options
python -m vietvoicetts "Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt." output.wav --gender female --area northern
# Voice cloning with reference audio
python -m vietvoicetts "Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt." output.wav --reference-audio examples/sample.m4a --reference-text "Xin chào các anh chị và các bạn. Chào mừng các anh chị đến với podcast Hiếu TV. Trước khi bắt đầu, dành cho anh chị nào mới lần đầu đến podcast này."

from vietvoicetts import synthesize
# Simple synthesis
duration = synthesize("Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt.", "greeting.wav")
print(f"Generated audio: {duration:.2f} seconds")

from vietvoicetts import synthesize
# Female voice with northern accent
duration = synthesize(
    "Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt.",
    "welcome.wav",
    gender="female",
    area="northern",
)

from vietvoicetts import synthesize
# Clone voice from reference audio
duration = synthesize(
    "Đây là giọng nói được nhân bản từ tệp âm thanh tham chiếu",
    "cloned_voice.wav",
    reference_audio="examples/sample.m4a",
    reference_text="Xin chào các anh chị và các bạn. Chào mừng các anh chị đến với podcast Hiếu TV. Trước khi bắt đầu, dành cho anh chị nào mới lần đầu đến podcast này."
)

from vietvoicetts import TTSApi, ModelConfig
# Custom model configuration
config = ModelConfig(
    speed=1.2,
    random_seed=12345
)
api = TTSApi(config)
duration = api.synthesize_to_file("Xin chào các bạn! Đây là ví dụ cơ bản về tổng hợp giọng nói tiếng Việt.", "custom.wav")

Gender:
- male - Male voice
- female - Female voice

Area:
- northern - Northern Vietnamese accent
- southern - Southern Vietnamese accent
- central - Central Vietnamese accent

Group:
- story - Storytelling style
- news - News reading style
- audiobook - Audiobook narration style
- interview - Interview/conversation style
- review - Review/commentary style

Emotion:
- neutral - Neutral emotion (default)
- serious - Serious tone
- monotone - Monotone delivery
- sad - Sad emotion
- surprised - Surprised tone
- happy - Happy emotion
- angry - Angry emotion

Positional arguments:
- text - Text to synthesize
- output - Output audio file path

Options:
- --gender - Voice gender (male/female)
- --group - Voice group/style (story/news/audiobook/interview/review)
- --area - Voice area/accent (northern/southern/central)
- --emotion - Voice emotion (neutral/serious/monotone/sad/surprised/happy/angry)
- --reference-audio - Path to reference audio file
- --reference-text - Text corresponding to reference audio
- --speed - Speech speed multiplier (default: 1.0)
- --cross-fade-duration - Cross-fade duration in seconds (default: 0.1)
- --random-seed - Random seed for consistent voice generation (default: 9527)
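When the CLI is called from scripts, it can help to assemble the argument list in one place and reject values that the options above do not list. The sketch below is a hypothetical helper (`build_command`, `CHOICES`, and `KNOWN_OPTIONS` are not part of the library); it only builds the command from the documented flags and does not run it.

```python
import shlex
import sys
from typing import List

# Allowed values, copied from the option descriptions above.
CHOICES = {
    "gender": ("male", "female"),
    "group": ("story", "news", "audiobook", "interview", "review"),
    "area": ("northern", "southern", "central"),
    "emotion": ("neutral", "serious", "monotone", "sad", "surprised", "happy", "angry"),
}
# Every documented option, written as a Python keyword name.
KNOWN_OPTIONS = set(CHOICES) | {
    "reference_audio", "reference_text", "speed", "cross_fade_duration", "random_seed",
}


def build_command(text: str, output: str, **options) -> List[str]:
    """Return an argv list for `python -m vietvoicetts` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "vietvoicetts", text, output]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        if value is None:
            continue  # leave the CLI default in place
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"{name} must be one of {CHOICES[name]}, got {value!r}")
        # reference_audio -> --reference-audio, random_seed -> --random-seed, ...
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


cmd = build_command("Xin chào các bạn!", "output.wav", gender="female", area="northern", speed=1.2)
# Print a shell-quoted version for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the package is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.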
By using VietVoice TTS, you agree to the following terms:
Content Responsibility:
- Users are solely responsible for all generated content and its usage
- Do not use this library to create content that infringes on third-party intellectual property rights
- Do not generate content that violates applicable laws or regulations
Voice Cloning Ethics:
- Only use reference audio that you own or have explicit permission to use
- Respect the rights and consent of individuals whose voices may be cloned
- Clearly indicate when content has been generated using AI voice synthesis
Liability:
- The authors and contributors are not liable for any damages or legal issues arising from the use of this software
- Users assume full responsibility for their use of the generated content
Attribution:
- When sharing AI-generated content, clearly indicate that it was created using VietVoice TTS
- Provide appropriate attribution to this project when redistributing or building upon this work
- Python 3.7+
- ONNX Runtime
- pydub
- soundfile
- numpy
This project is licensed under the MIT License.
For issues and questions, please visit the GitHub repository.