This repository demonstrates the integration between Agent Voice Response (AVR) and ElevenLabs Text-to-Speech (TTS) API, allowing for real-time speech synthesis in an audio format suitable for telephony applications. The project is built with Node.js and leverages ElevenLabs for high-quality voice generation.
- Real-time Text-to-Speech (TTS): Convert text to natural-sounding speech using the ElevenLabs API.
- Streaming Audio: The audio response is streamed back to the client in real time using Node.js streams, allowing for low-latency voice responses.
Before you begin, ensure you have the following:
- Node.js and npm installed.
- An ElevenLabs API key and a voice ID.
To set up the project:

- Clone the repository:

  ```bash
  git clone https://github.com/agentvoiceresponse/avr-tts-elevenlabs.git
  cd avr-tts-elevenlabs
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env` file in the root directory and add your ElevenLabs API key and voice ID:

  ```
  ELEVENLABS_API_KEY=your_elevenlabs_api_key
  ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
  PORT=6007
  ```
To start the application:

```bash
npm start
```

The application will listen on the port specified in the `.env` file (default is 6007).
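The three settings can be read and checked at startup with something like the following TypeScript sketch. It assumes the settings arrive in an environment map such as `process.env` after the `.env` file has been loaded. The `loadConfig` helper, the `TtsConfig` type, and the fail-fast validation are hypothetical and not taken from the repository.

```typescript
// Hypothetical config shape; variable names match the .env example.
interface TtsConfig {
  apiKey: string;
  voiceId: string;
  port: number;
}

// Reads settings from an environment-like map (e.g. process.env).
// Throws if a required value is missing; falls back to port 6007.
function loadConfig(env: Record<string, string | undefined>): TtsConfig {
  const apiKey = env["ELEVENLABS_API_KEY"];
  const voiceId = env["ELEVENLABS_VOICE_ID"];
  if (!apiKey) throw new Error("ELEVENLABS_API_KEY is not set");
  if (!voiceId) throw new Error("ELEVENLABS_VOICE_ID is not set");

  const port = Number(env["PORT"] ?? "6007");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return { apiKey, voiceId, port };
}
```

If you call `loadConfig(process.env)` once at startup, a missing key fails immediately instead of on the first request.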
The `POST /text-to-speech-stream` endpoint accepts a JSON payload containing the text to be converted into speech. The audio is streamed back in WAV format.

- Request body:

  ```json
  { "text": "Hello, how can I assist you today?" }
  ```

- Response: The server streams the audio as `audio/wav` with the following characteristics:
  - Mono channel
  - 8 kHz sample rate
  - 16-bit linear PCM
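The three characteristics above fully determine a standard 44-byte WAV (RIFF) header. Each sample frame is 1 channel × 2 bytes = 2 bytes, so the byte rate is 8000 × 2 = 16,000 bytes per second. The hypothetical `wavHeader` function below sketches that header layout and is not the service's own code. This README does not document how the service fills the size fields when the total length is not known in advance.

```typescript
// Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
// Defaults match the documented response format: 8 kHz, mono, 16-bit.
function wavHeader(
  dataBytes: number,
  sampleRate = 8000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const view = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // RIFF chunk size = file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true); // size of the PCM payload in bytes
  return new Uint8Array(view.buffer);
}
```

For example, one second of audio in this format is 16,000 bytes of PCM data, so `wavHeader(16000)` describes a 16,044-byte file.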
For example, using `curl`:

```bash
curl -X POST http://localhost:6007/text-to-speech-stream \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello, this is a real-time voice response!"}' \
  --output response.wav
```

The request flow works as follows:

- The application receives a text string through an HTTP POST request.
- It sends this text to ElevenLabs' API to synthesize the voice.
- The audio response is streamed back to the client.
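On the client side, you can consume the streamed response chunk by chunk instead of waiting for the whole file. The sketch below assumes the response body is exposed as an async iterable of byte chunks. `collectChunks` and its `onChunk` progress callback are hypothetical names used for illustration.

```typescript
// Gathers streamed audio chunks in arrival order into one buffer.
// onChunk reports the running byte count, e.g. to start playback early.
async function collectChunks(
  chunks: AsyncIterable<Uint8Array>,
  onChunk?: (bytesSoFar: number) => void,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    total += chunk.length;
    onChunk?.(total);
  }
  // Copy every part into a single contiguous array.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

In Node.js 18 or later, the `body` of a `fetch` response from `http://localhost:6007/text-to-speech-stream` is a web `ReadableStream`. Node.js lets you iterate it with `for await`, so you can pass it in directly, although TypeScript's DOM typings may require a cast.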
In more detail:

- ElevenLabs API Call: The text is sent to the ElevenLabs API to generate speech using the configured voice ID. The request includes voice settings such as stability and similarity boost.
- Real-time Streaming: The audio is streamed back to the client in real time.
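The upstream call can be pictured as building a request like the one below. This sketch rests on assumptions. The `/v1/text-to-speech/{voice_id}/stream` path and the `xi-api-key` header come from ElevenLabs' public REST API. The `buildElevenLabsRequest` helper, the `UpstreamRequest` type, and the default `stability` and `similarity_boost` values are placeholders. This README does not state which model, output format, or exact settings the repository uses.

```typescript
// Hypothetical description of the upstream request (built, not sent).
interface UpstreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Placeholder voice settings; the repository's real values are unknown.
function buildElevenLabsRequest(
  text: string,
  voiceId: string,
  apiKey: string,
  settings = { stability: 0.5, similarity_boost: 0.75 },
): UpstreamRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey, // ElevenLabs authenticates with this header
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, voice_settings: settings }),
  };
}
```

You could pass the result to `fetch(req.url, req)`. Building the request separately from sending it keeps the request shape easy to inspect and test.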
The application includes basic error handling:
- A missing `text` field in the request body results in a `400 Bad Request` response.
- Issues with the ElevenLabs API result in a `500 Internal Server Error` response.
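The two error rules above can be written as a small, framework-independent handler. In this sketch, `handleTtsRequest` and `parseTtsBody` are hypothetical names, and the `synthesize` callback stands in for the ElevenLabs call. It treats an empty string the same as a missing `text` (an assumption). It also buffers the audio for simplicity, even though the service itself streams it.

```typescript
type ParseResult =
  | { ok: true; text: string }
  | { ok: false; status: 400; error: string };

// A body without a non-empty string `text` maps to 400 Bad Request.
function parseTtsBody(body: unknown): ParseResult {
  if (typeof body === "object" && body !== null && "text" in body) {
    const text = (body as { text: unknown }).text;
    if (typeof text === "string" && text.length > 0) return { ok: true, text };
  }
  return { ok: false, status: 400, error: "Missing text in request body" };
}

// Any failure from the upstream synthesis call maps to 500.
async function handleTtsRequest(
  body: unknown,
  synthesize: (text: string) => Promise<Uint8Array>,
): Promise<{ status: number; audio?: Uint8Array; error?: string }> {
  const parsed = parseTtsBody(body);
  if (parsed.ok === false) return { status: parsed.status, error: parsed.error };
  try {
    return { status: 200, audio: await synthesize(parsed.text) };
  } catch {
    return { status: 500, error: "Error generating speech" };
  }
}
```

Because the synthesis step is a callback, you can exercise both error paths without calling ElevenLabs.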
- GitHub: https://github.com/agentvoiceresponse - Report issues, contribute code.
- Discord: https://discord.gg/DFTU69Hg74 - Join the community discussion.
- Docker Hub: https://hub.docker.com/u/agentvoiceresponse - Find Docker images.
- NPM: https://www.npmjs.com/~agentvoiceresponse - Browse our packages.
- Wiki: https://wiki.agentvoiceresponse.com/en/home - Project documentation and guides.
AVR is free and open-source. Any support is entirely voluntary and intended as a personal gesture of appreciation. Donations do not provide access to features, services, or special benefits, and the project remains fully available regardless of donations.
MIT License - see the LICENSE file for details.