A real-time conversational AI assistant powered by Agora, with RAG (Retrieval-Augmented Generation), emotion detection, live transcripts, and ReSpeaker LED visualization.
- 🎙️ Real-time Voice Conversation - Talk naturally with AI assistant
- 🤖 RAG-Powered Responses - Custom knowledge base for accurate answers
- 🎭 Emotion Detection - AI responses include emotional context
- 💡 LED Visualization - reSpeaker lights up with emotion colors
- 📝 Live Transcripts - Real-time conversation transcription
- 🌐 Web Interface - Beautiful, animated UI
- 🔊 Voice Synthesis - Natural-sounding TTS responses
- Prerequisites
- Setup Instructions
- Running the Application
- Project Structure
- Troubleshooting
- API Reference
- Computer with reSpeaker (built-in or external)
- Internet connection
- Python 3.7 or higher
- pip (Python package manager)
- Modern web browser (Chrome, Firefox, Edge)
- Agora Account
- AssemblyAI Account (for speech recognition)
- Groq Account (for LLM and TTS)
- Go to Agora Console
- Sign up for a free account
- Verify your email
- In the Agora Console, click "Create Project"
- Enter a project name (e.g., "AI Voice Assistant")
- Choose "Secured mode: APP ID + Token"
- Click "Create"
After creating the project, you'll see:
- APP ID - Copy this (looks like: `550749b706214846a1a2eef3612a8cd3`)
- Click "Configure" next to your project
- Find "Primary Certificate" - Copy this

- In the Agora Console, go to RESTful API
- Click "Add a secret" or view existing secrets
- Copy:
  - Customer Key (looks like: `8a598f4690f740c9a8760a10e28cae9d`)
  - Customer Secret (looks like: `0706c45e30b74b7fa4b3c71eae2c2924`)
📚 Reference: Agora RESTful Authentication Guide
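Agora's RESTful API authenticates with HTTP Basic auth built from the Customer Key and Secret above. A minimal sketch of constructing that header (the key/secret values are placeholders):

```python
import base64

def agora_basic_auth(customer_key, customer_secret):
    """Build the HTTP Basic Authorization header value for Agora's RESTful API."""
    credentials = f"{customer_key}:{customer_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("utf-8")

# Pass the result as the "Authorization" header on RESTful requests.
print(agora_basic_auth("my_key", "my_secret"))
```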
- Go to AssemblyAI
- Sign up for a free account
- Go to your Dashboard
- Copy your API Key
- Go to Groq Console
- Sign up for a free account
- Navigate to API Keys
- Create two API keys:
  - One for LLM (text generation)
  - One for TTS (text-to-speech)
- Copy both keys
```bash
git clone https://github.com/KasunThushara/RTM_RTC_TokenGenerator.git
cd RTM_RTC_TokenGenerator
```

Edit the token generator configuration with your Agora credentials:

```python
# In the token generator script
APP_ID = "your_app_id_from_step_1.3"
APP_CERTIFICATE = "your_primary_certificate_from_step_1.3"
```

Generate the AI Agent token (UID 1001):

```bash
python generate_rtc_rtm_token.py --account 1001
```

Copy the generated token - this is for the AI Agent:
Token: 007eJxTYHhx+deOGjf+P58sJG4e...
Generate the Web User token (UID 1002):

```bash
python generate_rtc_rtm_token.py --account 1002
```

Copy the generated token - this is for the Web User.
```bash
git clone https://github.com/KasunThushara/Agora_Convo_AI_reSpeaker.git
cd ai-voice-assistant
pip install -r requirements.txt
```

If you don't have a requirements.txt, install manually:

```bash
pip install fastapi uvicorn requests openai pydantic
```

Create or edit `config.py` with your credentials:
```python
# config.py
# Central configuration file for Agora AI Voice Chat

# ==========================
# AGORA CREDENTIALS
# ==========================
CUSTOMER_KEY = "your_customer_key_from_step_1.4"
CUSTOMER_SECRET = "your_customer_secret_from_step_1.4"
APP_ID = "your_app_id_from_step_1.3"

# ==========================
# CHANNEL SETTINGS
# ==========================
CHANNEL_NAME = "test"
AGORA_TEMP_TOKEN = "your_agent_token_from_step_3.3_uid_1001"

# Agent and User UIDs
AGENT_RTC_UID = "1001"
USER_RTC_UID = "1002"

# ==========================
# 3RD PARTY SERVICES
# ==========================
ASSEMBLY_AI_KEY = "your_assemblyai_key_from_step_2.1"
GROQ_KEY = "your_groq_llm_key_from_step_2.2"
TTS_GROQ_KEY = "your_groq_tts_key_from_step_2.2"

# ==========================
# AGENT SETTINGS
# ==========================
IDLE_TIMEOUT = 120
MAX_HISTORY = 32
SYSTEM_PROMPT = "You are a helpful chatbot."
GREETING_MESSAGE = "Hello, how can I assist you?"
FAILURE_MESSAGE = "Please hold on a second."
LLM_MODEL = "llama-3.3-70b-versatile"
TTS_MODEL = "playai-tts"
TTS_VOICE = "Arista-PlayAI"
ASR_LANGUAGE = "en-US"
```

Edit `index_v5.html` (or your HTML file) in two places:
Location 1: RTM Login Token (around line 950)
```javascript
// Find this line:
await rtmClient.login();

// Replace with:
await rtmClient.login({token: 'your_user_token_from_step_3.4_uid_1002'});
```

Location 2: Configuration Panel Inputs - update the default values in the HTML:
```html
<!-- App ID -->
<input type="text" class="config-input" id="appId" value="your_app_id">

<!-- Token -->
<input type="text" class="config-input" id="token" value="your_user_token_uid_1002">
```

Edit `my_city_info.txt` with your own information:
```
# Example: Replace with your use case
Your Company/Location Information

Ground Floor
- Main entrance and reception
- Coffee shop location
- Facilities

... (customize with your data)
```
💡 Use Cases:
- Shopping mall guide
- Office building directory
- Museum tour guide
- Hotel concierge
- Campus navigation
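However the RAG server implements retrieval internally, the core idea can be sketched as keyword-overlap scoring against your knowledge file. This is an illustrative sketch, not the repo's actual code:

```python
def retrieve(query, knowledge_text, top_k=3):
    """Score each non-empty line of the knowledge base by word overlap with the query."""
    query_words = set(query.lower().split())
    lines = [ln.strip() for ln in knowledge_text.splitlines() if ln.strip()]
    scored = [(len(query_words & set(ln.lower().split())), ln) for ln in lines]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only lines that share at least one word with the query
    return [ln for score, ln in scored[:top_k] if score > 0]

knowledge = """Ground Floor
- Main entrance and reception
- Coffee shop location
- Facilities"""
print(retrieve("Where is the coffee shop?", knowledge))
```

The matched lines would then be prepended to the LLM prompt as context. Production RAG systems typically use embeddings instead of word overlap, but the injection pattern is the same.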
```bash
python rag_server.py
```

You should see:

```
🚀 Starting RAG Server with Emotion Support
✅ Knowledge base found: X bytes
🌐 Service running on http://localhost:8000
```

Test it:

```bash
curl http://localhost:8000/health
```

Why ngrok? Agora's servers need to reach your RAG server, and ngrok creates a public URL for it.
- Install ngrok: download it from ngrok.com, or run `brew install ngrok` (macOS) / `choco install ngrok` (Windows)
- Sign up and authenticate:
  ```bash
  ngrok config add-authtoken <your-auth-token>
  ```
- Start the ngrok tunnel:
  ```bash
  ngrok http 8000
  ```
- Copy the public URL from the output:
  ```
  Forwarding https://abcd1234.ngrok-free.app -> http://localhost:8000
  ```
- Update `join_api.py`:
  ```python
  RAG_SERVER_URL = "https://your-ngrok-url.ngrok-free.app/rag/chat/completions"
  USE_RAG = True
  ```
Only needed if you have a reSpeaker USB Microphone.
Windows:

```bash
pip install pyusb libusb-package
```

macOS:

```bash
brew install libusb
pip install pyusb
```

Linux:

```bash
sudo apt-get install libusb-1.0-0-dev
pip install pyusb
```

Test the connection:

```bash
python test_respeaker.py
```

Expected output:

```
✅ ReSpeaker device found!
Vendor ID: 0x2886
Product ID: 0x001a
```
On Linux, create a udev rule so the device is accessible without root:

```bash
sudo nano /etc/udev/rules.d/99-respeaker.rules
```

Add this line:

```
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
```

Reload the rules:

```bash
sudo udevadm control --reload-rules
sudo udevadm trigger
```

Start the LED service:

```bash
python emotion_led_service.py
```

Test all emotions:
```bash
python test_led_emotions.py
```

You'll need 4 terminals (or 3 if skipping the LED service):
```bash
# Terminal 1: Start LED service
python emotion_led_service.py
```

Wait for:

```
✅ reSpeaker device found!
✅ Device initialized in DoA mode
🌐 Service running on http://localhost:5000
```
```bash
# Terminal 2a: Start RAG Server
python rag_server.py
```

```bash
# Terminal 2b: Start ngrok (separate terminal/tab)
ngrok http 8000
```

Copy the ngrok URL and update `join_api.py`.
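ngrok also exposes a local inspection API (by default at `http://127.0.0.1:4040/api/tunnels`) whose JSON lists active tunnels, which can save copy-pasting the URL by hand. A sketch that picks the HTTPS public URL out of that payload (fetching is left to you, and the port and field names are ngrok defaults that your setup may override):

```python
import json

def pick_https_url(tunnels_payload):
    """Return the first https public_url from ngrok's /api/tunnels JSON, or None."""
    data = json.loads(tunnels_payload)
    for tunnel in data.get("tunnels", []):
        if tunnel.get("proto") == "https":
            return tunnel["public_url"]
    return None

# Example payload shaped like ngrok's response:
sample = '{"tunnels": [{"proto": "https", "public_url": "https://abcd1234.ngrok-free.app"}]}'
print(pick_https_url(sample))
```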
```bash
# Terminal 3: Start the AI agent
python join_api.py
```

Wait for:

```
✅ SUCCESS!
Agent ID: A42AA74LL69CF58MN33AE74ME57KJ86K
⚠️ SAVE THIS AGENT ID FOR STOPPING
```
Simply open index_v5.html in your web browser.
Or use a local server:
```bash
python -m http.server 8080
# Then visit: http://localhost:8080/index_v5.html
```

- Click "▶ Start Conversation"
- Allow microphone access when prompted
- Start talking! Try:
  - "Hello!"
  - "Are there any special offers?"
  - "Where is the washroom?"
  - "What are some hidden features?"
- Watch the magic happen:
  - 🎙️ Your speech is transcribed
  - 🤖 AI responds with emotion
  - 📝 Transcripts appear in the left panel
  - 🎭 An emoji displays at the top
  - 💡 reSpeaker LEDs light up (if connected)
- Stop the conversation: click "⏹ Stop Conversation" in the web UI
- Stop the Agora Agent:
  ```python
  # Edit stop_api.py with your Agent ID
  AGENT_ID = "your_agent_id_from_terminal_3"
  ```
  ```bash
  # Then run:
  python stop_api.py
  ```
- Stop other services: press Ctrl+C in each terminal
```
ai-voice-assistant/
├── config.py                # Main configuration file
├── join_api.py              # Starts Agora AI agent
├── stop_api.py              # Stops Agora AI agent
├── rag_server.py            # RAG server with emotions
├── emotion_led_service.py   # LED control service
├── my_city_info.txt         # Your knowledge base
├── index.html               # Web interface
├── agora-rtm-2.2.3.min.js   # Agora RTM SDK
├── test_respeaker.py        # Device connection test
├── utils/
│   ├── index.ts
│   └── type.ts
└── requirements.txt         # Python dependencies
```
```bash
# Health check
curl http://localhost:8000/health

# Test query
curl -X POST http://localhost:8000/rag/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Where is the coffee shop?"}],
    "stream": false
  }'

# Check device status
curl http://localhost:5000/status

# Test emotion
curl -X POST http://localhost:5000/emotion \
  -H "Content-Type: application/json" \
  -d '{"emotion": "excited", "duration": 1.0}'

# Test color
curl http://localhost:5000/test/yellow
```

- Start all services
- Open web interface
- Start conversation
- Say: "Are there any special offers?"
- Verify:
  - ✅ Transcript appears
  - ✅ Emotion emoji shows
  - ✅ LED lights up (if connected)
reSpeaker LED:
```bash
# Check device connection
lsusb | grep 2886    # Linux/Mac
# or check Device Manager (Windows)

# Verify with the test script
python test_respeaker.py
```

Check:
- Verify all credentials in `config.py`
- Ensure tokens are not expired (regenerate if needed)
- Check Agora Console for account status
- Verify network connectivity
Debug:
```bash
python join_api.py
# Check the error message in the output
```

Check:
- Is `rag_server.py` running? Check Terminal 2
- Is ngrok running? Check the public URL
- Did you update `join_api.py` with the ngrok URL?
Test:
```bash
# Test locally
curl http://localhost:8000/health

# Test via ngrok
curl https://your-ngrok-url.ngrok-free.app/health
```

Check:
- Open the browser console (F12)
- Look for RTM connection messages
- Verify the token in `index_v5.html` (UID 1002)
- Check that `enable_rtm: True` is set in `join_api.py`
Check:
- System prompt includes emotion instructions
- RAG server has `EMOTION_SYSTEM_PROMPT` defined
- Look for `[emotion]` labels in the transcripts
- Check the browser console for emotion detection logs
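If responses carry a leading label such as `[excited] Great news!`, the UI can split it off with a small parser. The exact label format is an assumption here; adjust the pattern to match what your RAG server actually emits:

```python
import re

# Matches a leading [word] label, capturing the label and the remaining text
EMOTION_PATTERN = re.compile(r"^\s*\[(\w+)\]\s*(.*)$", re.DOTALL)

def split_emotion(text, default="neutral"):
    """Split a leading [emotion] label from a response; fall back to the default."""
    match = EMOTION_PATTERN.match(text)
    if match:
        return match.group(1).lower(), match.group(2)
    return default, text

print(split_emotion("[excited] We have special offers today!"))
```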
```bash
# Find the process using a port
lsof -i :5000   # LED service
lsof -i :8000   # RAG server

# Kill the process
kill -9 <PID>
```

- Unplug and replug the reSpeaker
- Restart the LED service
- Manual reset:
  ```bash
  curl -X POST http://localhost:5000/doa
  ```
| Emotion | Color | Hex | Use Case |
|---|---|---|---|
| 😊 happy | Yellow | 0xFFFF00 | Good news, positive responses |
| 🎉 excited | Magenta | 0xFF00FF | Sales, special offers, amazing deals |
| 😲 surprised | Orange | 0xFF8800 | Unexpected facts, hidden features |
| 🤔 thinking | Cyan | 0x00FFFF | Processing, searching information |
| 🙋 helpful | Green | 0x00FF00 | Giving directions, assistance |
| 😐 neutral | Light Blue | 0x8888FF | Standard information, facts |
| 😔 sad | Blue | 0x0000FF | Apologies, closures, bad news |
| 👋 welcoming | Pink | 0xFF69B4 | Greetings, warm welcomes |
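The table above maps directly onto a lookup the LED service can use. A sketch with neutral as the fallback for unrecognized labels (the function name is illustrative):

```python
# Emotion → 24-bit RGB color, taken from the table above
EMOTION_COLORS = {
    "happy":     0xFFFF00,  # Yellow
    "excited":   0xFF00FF,  # Magenta
    "surprised": 0xFF8800,  # Orange
    "thinking":  0x00FFFF,  # Cyan
    "helpful":   0x00FF00,  # Green
    "neutral":   0x8888FF,  # Light blue
    "sad":       0x0000FF,  # Blue
    "welcoming": 0xFF69B4,  # Pink
}

def emotion_color(emotion):
    """Look up the LED color for an emotion, defaulting to neutral."""
    return EMOTION_COLORS.get(emotion.lower(), EMOTION_COLORS["neutral"])

print(hex(emotion_color("excited")))  # 0xff00ff
```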
POST `/rag/chat/completions` - Generate AI response with RAG

Request:

```json
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "user", "content": "Where is the coffee shop?"}
  ],
  "stream": true
}
```

GET `/health` - Check server health

Response:

```json
{
  "status": "healthy",
  "knowledge_base_loaded": true,
  "knowledge_base_size": 12345
}
```

POST `/emotion` - Trigger emotion LED animation

Request:

```json
{
  "emotion": "excited",
  "duration": 1.0,
  "text": "Optional transcript"
}
```

GET `/status` - Check device status

POST `/doa` - Return to Direction of Arrival mode

GET `/test/<color>` - Test a specific color (red, green, blue, yellow, etc.)
- Never commit credentials to Git:
  ```
  # Add to .gitignore
  config.py
  .env
  *.key
  ```
- Use environment variables:
  ```python
  import os
  GROQ_KEY = os.getenv('GROQ_API_KEY')
  ```
- Rotate tokens regularly: Agora tokens expire after 24 hours by default
- Secure ngrok tunnels: use authentication for production
- Keep dependencies updated:
  ```bash
  pip install --upgrade -r requirements.txt
  ```
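The environment-variable tip above can be extended to fail fast at startup instead of silently passing `None` into API clients. A small helper (its name and the variable names are illustrative):

```python
import os

def require_env(name):
    """Read a required credential from the environment, raising if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# e.g. in config.py:
# GROQ_KEY = require_env("GROQ_API_KEY")
```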
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License.
- Agora - Real-time communication platform
- Groq - Fast LLM inference
- AssemblyAI - Speech recognition
- FastAPI - Web framework
- reSpeaker - Smart microphone
If you encounter issues:
- Check the Troubleshooting section
- Review console logs from all services
- Verify all credentials are correct
- Check that all services are running
- Open an issue on GitHub with:
  - Error messages
  - Steps to reproduce
  - System information
- Agora account created
- APP_ID and certificates obtained
- Customer Key & Secret obtained
- AssemblyAI API key obtained
- Groq API keys obtained (LLM + TTS)
- Tokens generated (UID 1001 & 1002)
- `config.py` configured
- `index_v5.html` token updated
- `my_city_info.txt` customized
- Python dependencies installed
- ngrok installed and authenticated
- All services started
- Web interface opened
- Test conversation successful
Ready to go? Start with Setup Instructions! 🚀

