A real-time voice interaction system built with LiveKit that combines Speech-to-Text, Large Language Model, and Text-to-Speech capabilities to create an interactive voice agent.
- Speech-to-Text (STT) using OpenAI's Whisper
- Large Language Model (LLM) integration with Groq (Llama3-70B model)
- Text-to-Speech (TTS) using ElevenLabs
- Real-time streaming support via LiveKit
- Comprehensive metrics tracking and logging to Excel
- Multi-language support
- Python 3.8 or higher
- Virtual environment (recommended)
```
# Clone or download the project files
git clone https://github.com/allwin107/AI-Voice-Agent.git
cd ai-voice-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the required dependencies:

```
pip install -r requirements.txt
```

Create a `.env` file with your API keys:

```
GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
```
```
# LiveKit Configuration
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
```

Where to get the keys:

- Groq (free LLM): console.groq.com - fast LLM inference
- ElevenLabs: elevenlabs.io - Text-to-Speech
- LiveKit: LiveKit Cloud - real-time communication
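The pipeline components read these keys at startup (the project keeps its settings in `app/config.py`). As a rough illustration of what loading a `.env` file involves, here is a minimal, dependency-free sketch; the actual project may well use a library such as python-dotenv instead:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: copy KEY=VALUE lines into os.environ.

    Illustrative only -- app/config.py may use python-dotenv or
    similar rather than hand-rolled parsing like this.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # setdefault so real environment variables win over the file
            os.environ.setdefault(key.strip(), value.strip())

# After loading, components can read their keys, e.g.:
# groq_key = os.environ["GROQ_API_KEY"]
```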
- `app/pipeline/` - Core pipeline components
  - `stt.py` - Speech-to-Text using Whisper
  - `llm.py` - Language model integration using Groq
  - `tts.py` - Text-to-Speech using ElevenLabs
  - `voice_agent.py` - Main voice agent pipeline
  - `livekit_backend.py` - LiveKit integration
- `app/test/` - Testing scripts
  - `test_stt.py` - Tests the transcription functionality of the STT (Speech-to-Text) module
  - `test_llm.py` - Tests the LLM response generation functionality
  - `test_tts.py` - Tests the text-to-speech functionality of the application
  - `test_agent.py` - Test script for the voice agent pipeline
  - `test_audio` - Test .wav audio file
- `app/config.py` - Configuration settings for the application
- `.env` - Environment variables
- `README.md`
- `requirements.txt`
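Conceptually, `voice_agent.py` chains the three pipeline components: STT output feeds the LLM, whose reply feeds TTS. The sketch below shows that flow with stub components; the class and function names here are illustrative, not the project's actual API (the real modules wrap Whisper, Groq, and ElevenLabs clients and stream audio):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoiceAgent:
    """Toy model of the STT -> LLM -> TTS pipeline (names hypothetical)."""
    stt: Callable[[bytes], str]   # audio bytes -> transcript text
    llm: Callable[[str], str]     # transcript  -> reply text
    tts: Callable[[str], bytes]   # reply text  -> synthesized audio

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt(audio)    # 1. transcribe the caller's speech
        reply = self.llm(text)    # 2. generate a response
        return self.tts(reply)    # 3. synthesize the reply audio

# Example with stub components standing in for Whisper/Groq/ElevenLabs:
agent = VoiceAgent(
    stt=lambda audio: "hello",
    llm=lambda text: f"You said: {text}",
    tts=lambda reply: reply.encode(),
)
```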
Test individual components:

```
python app/test/test_stt.py
python app/test/test_llm.py
python app/test/test_tts.py
python app/test/test_agent.py
```

Run the full voice agent:

```
python app/pipeline/voice_agent.py
```

Make sure you've done this:
- Activated a LiveKit Cloud instance
- Added the following values to `.env`:
- LIVEKIT_WS_URL=wss://.livekit.cloud
- LIVEKIT_API_KEY=...
- LIVEKIT_API_SECRET=...
Run the `livekit_backend.py` script from the terminal:

```
python app/pipeline/livekit_backend.py
```

If it is working correctly, the logs will say:

```
Connected to room your-livekit-room as your-participant-name
```
This means the agent is live and ready to receive audio.
Use the LiveKit Agent Playground: https://agent.livekit.io
This is essential for testing, as the Playground acts as the "other participant" in the room.
Steps:
- Go to the Playground URL
- Input the same Room Name (your-livekit-room)
- Use your LiveKit credentials:
- API Key, API Secret
- Click Join Room
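Under the hood, joining a LiveKit room is authorized by a short-lived JWT signed with your API secret; the Playground builds one from the key/secret you paste in, and the official `livekit-api` SDK provides an `AccessToken` helper that does the same. Purely to illustrate the token's shape, here is a stdlib-only sketch of an HS256 JWT with LiveKit-style claims (claim values below are placeholders from this README, not real credentials):

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def livekit_style_token(api_key, api_secret, room, identity, ttl=3600):
    """Illustrative HS256 JWT in roughly the shape LiveKit expects.

    In practice, use the official SDK's AccessToken helper, which
    handles grants and expiry for you.
    """
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,        # your LIVEKIT_API_KEY
        "sub": identity,       # participant identity
        "exp": now + ttl,
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

token = livekit_style_token("your_api_key", "your_api_secret",
                            "your-livekit-room", "playground-user")
```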
Now when you speak, your local VoiceAgentBot should:
- Detect your voice
- Transcribe it
- Send it to the LLM
- Reply back via audio in real-time
- Log metrics
The system tracks several key metrics:
- EOU (End of Utterance) Delay
- TTFT (Time to First Token)
- TTFB (Time to First Byte)
- Total Latency
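To make the metric names concrete, here is a hedged sketch of timing one voice turn and appending a row to a log file. The project itself logs to Excel (e.g. via a library like openpyxl); CSV is used here only to keep the sketch dependency-free, and the real agent streams tokens/audio, so it would timestamp the *first* token/byte rather than whole-call durations as this simplification does:

```python
import csv, time

def timed_turn(stt, llm, tts, audio, log_path="metrics.csv"):
    """Time one turn and log the metrics listed above (approximations).

    EOU delay is approximated as time spent in STT after the user stops
    speaking; TTFT/TTFB are approximated as whole-call durations around
    the LLM and TTS steps. Function names are illustrative stand-ins.
    """
    t0 = time.monotonic()
    text = stt(audio)
    eou_delay = time.monotonic() - t0   # End of Utterance delay

    t1 = time.monotonic()
    reply = llm(text)
    ttft = time.monotonic() - t1        # Time to First Token (approx.)

    t2 = time.monotonic()
    speech = tts(reply)
    ttfb = time.monotonic() - t2        # Time to First Byte (approx.)

    total = time.monotonic() - t0       # Total latency for the turn
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([eou_delay, ttft, ttfb, total])
    return speech
```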
Planned improvements:

- Smarter language detection
- Improved end-of-utterance (EOU) timing
- Web or mobile interface integration
This project is created for the proPAL AI Backend Engineering Internship assignment.
Built with ❤️ for proPAL