🎭 AI Voice Assistant with Emotion Detection & LED Control

A real-time conversational AI assistant powered by Agora, with RAG (Retrieval-Augmented Generation), emotion detection, live transcripts, and reSpeaker LED visualization.

✨ Features

🎙️ Real-time Voice Conversation - Talk naturally with AI assistant
🤖 RAG-Powered Responses - Custom knowledge base for accurate answers
🎭 Emotion Detection - AI responses include emotional context
💡 LED Visualization - reSpeaker lights up with emotion colors
📝 Live Transcripts - Real-time conversation transcription
🌐 Web Interface - Beautiful, animated UI
🔊 Voice Synthesis - Natural-sounding TTS responses

🔧 Prerequisites

Required Hardware

Computer with a microphone (built-in or external); a reSpeaker USB microphone is optional and only needed for LED visualization
Internet connection

Required Software

Python 3.7 or higher
pip (Python package manager)
Modern web browser (Chrome, Firefox, Edge)

Required Accounts

Agora Account
AssemblyAI Account (for speech recognition)
Groq Account (for LLM and TTS)

🚀 Setup Instructions

1. Agora Account Setup

Step 1.1: Create Agora Account

Go to Agora Console
Sign up for a free account
Verify your email

Step 1.2: Create a Project

In the Agora Console, click "Create Project"
Enter a project name (e.g., "AI Voice Assistant")
Choose "Secured mode: APP ID + Token"
Click "Create"

Step 1.3: Get Your Credentials

After creating the project, you'll see:

APP ID - Copy this (looks like: 550749b706214846a1a2eef3612a8cd3)
Click "Configure" next to your project
Find "Primary Certificate" - Copy this

Step 1.4: Get Customer Key & Customer Secret

In Agora Console, go to RESTful API
Click "Add a secret" or view existing secrets
Copy:
- Customer Key (looks like: 8a598f4690f740c9a8760a10e28cae9d)
- Customer Secret (looks like: 0706c45e30b74b7fa4b3c71eae2c2924)

📚 Reference: Agora RESTful Authentication Guide
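
Agora's RESTful API uses HTTP Basic authentication, where the Customer Key and Customer Secret are joined with a colon and Base64-encoded. The sketch below shows how such a header can be built; the function name `agora_basic_auth_header` is illustrative and not taken from this repository.

```python
import base64


def agora_basic_auth_header(customer_key: str, customer_secret: str) -> dict:
    """Build an HTTP Basic Authorization header from a key/secret pair."""
    # Basic auth encodes "key:secret" as Base64.
    credentials = f"{customer_key}:{customer_secret}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


if __name__ == "__main__":
    # Placeholder values; substitute the credentials from Step 1.4.
    print(agora_basic_auth_header("your_customer_key", "your_customer_secret"))
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests`.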

2. Third-Party API Keys

Step 2.1: Get AssemblyAI API Key

Go to AssemblyAI
Sign up for a free account
Go to your Dashboard
Copy your API Key

Step 2.2: Get Groq API Keys

Go to Groq Console
Sign up for a free account
Navigate to API Keys
Create two API keys:
- One for LLM (text generation)
- One for TTS (text-to-speech)
Copy both keys

3. Generate Tokens

Step 3.1: Clone Token Generator

git clone https://github.com/KasunThushara/RTM_RTC_TokenGenerator.git
cd RTM_RTC_TokenGenerator

Step 3.2: Configure Token Generator

Edit the token generator configuration with your Agora credentials:

# In the token generator script
APP_ID = "your_app_id_from_step_1.3"
APP_CERTIFICATE = "your_primary_certificate_from_step_1.3"

Step 3.3: Generate Token for Agent (UID: 1001)

python generate_rtc_rtm_token.py --account 1001

Copy the generated token - This is for the AI Agent

Token: 007eJxTYHhx+deOGjf+P58sJG4e...

Step 3.4: Generate Token for User (UID: 1002)

python generate_rtc_rtm_token.py --account 1002

Copy the generated token - This is for the Web User

⚠️ Important: Keep both tokens safe. You'll need them in the next steps.

4. Configure Project

Step 4.1: Clone This Repository

git clone https://github.com/KasunThushara/Agora_Convo_AI_reSpeaker.git
cd Agora_Convo_AI_reSpeaker

Step 4.2: Install Python Dependencies

pip install -r requirements.txt

If you don't have a requirements.txt, install manually:

pip install fastapi uvicorn requests openai pydantic

Step 4.3: Configure `config.py`

Create or edit config.py with your credentials:

# config.py
# Central configuration file for Agora AI Voice Chat

# ==========================
# AGORA CREDENTIALS
# ==========================
CUSTOMER_KEY = "your_customer_key_from_step_1.4"
CUSTOMER_SECRET = "your_customer_secret_from_step_1.4"
APP_ID = "your_app_id_from_step_1.3"

# ==========================
# CHANNEL SETTINGS
# ==========================
CHANNEL_NAME = "test"
AGORA_TEMP_TOKEN = "your_agent_token_from_step_3.3_uid_1001"

# Agent and User UIDs
AGENT_RTC_UID = "1001"
USER_RTC_UID = "1002"

# ==========================
# 3RD PARTY SERVICES
# ==========================
ASSEMBLY_AI_KEY = "your_assemblyai_key_from_step_2.1"
GROQ_KEY = "your_groq_llm_key_from_step_2.2"
TTS_GROQ_KEY = "your_groq_tts_key_from_step_2.2"

# ==========================
# AGENT SETTINGS
# ==========================
IDLE_TIMEOUT = 120
MAX_HISTORY = 32

SYSTEM_PROMPT = "You are a helpful chatbot."
GREETING_MESSAGE = "Hello, how can I assist you?"
FAILURE_MESSAGE = "Please hold on a second."

LLM_MODEL = "llama-3.3-70b-versatile"
TTS_MODEL = "playai-tts"
TTS_VOICE = "Arista-PlayAI"
ASR_LANGUAGE = "en-US"
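
Before starting the agent, it can help to confirm that no placeholder values remain in the configuration. The following is a minimal sketch, assuming placeholders keep the `your_` prefix used above; `find_unset_values` is a hypothetical helper, not part of this project.

```python
def find_unset_values(settings: dict) -> list:
    """Return the names whose value is empty or still a 'your_...' placeholder."""
    unset = []
    for name, value in settings.items():
        if not isinstance(value, str):
            continue  # numeric settings such as IDLE_TIMEOUT are skipped
        if value.strip() == "" or value.startswith("your_"):
            unset.append(name)
    return sorted(unset)


if __name__ == "__main__":
    example = {
        "APP_ID": "your_app_id_from_step_1.3",
        "CHANNEL_NAME": "test",
        "IDLE_TIMEOUT": 120,
    }
    print(find_unset_values(example))
```

To check the real file, one option is to pass `{k: v for k, v in vars(config).items() if k.isupper()}` after `import config`.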

Step 4.4: Configure Web Interface

Edit index_v5.html (or your HTML file) in two places:

Location 1: RTM Login Token (around line 950)

// Find this line:
await rtmClient.login();

// Replace with:
await rtmClient.login({token: 'your_user_token_from_step_3.4_uid_1002'});

Location 2: Configuration Panel Inputs

Update the default values in the HTML:

<!-- App ID -->
<input type="text" class="config-input" id="appId" value="your_app_id">

<!-- Token -->
<input type="text" class="config-input" id="token" value="your_user_token_uid_1002">

5. Setup RAG Server

Step 5.1: Customize Knowledge Base

Edit my_city_info.txt with your own information:

# Example: Replace with your use case
Your Company/Location Information

Ground Floor
- Main entrance and reception
- Coffee shop location
- Facilities

... (customize with your data)

💡 Use Cases:

Shopping mall guide
Office building directory
Museum tour guide
Hotel concierge
Campus navigation
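
To illustrate how a knowledge base like this can feed a RAG response, here is a minimal sketch of keyword-based retrieval. It is an assumption for illustration only and not necessarily how `rag_server.py` selects context; the function names and the sample text are hypothetical.

```python
import re


def split_sections(text: str) -> list:
    """Split a knowledge base into sections separated by blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def tokenize(text: str) -> set:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_sections(question: str, text: str, k: int = 1) -> list:
    """Return up to k sections that share the most words with the question."""
    query_words = tokenize(question)
    scored = []
    for section in split_sections(text):
        overlap = len(query_words & tokenize(section))
        if overlap > 0:
            scored.append((overlap, section))
    # Highest overlap first; sections with no shared words are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [section for _, section in scored[:k]]


if __name__ == "__main__":
    # Hypothetical sample data in the style of my_city_info.txt.
    sample = (
        "Ground Floor\n- Main entrance and reception\n- Coffee shop location\n\n"
        "First Floor\n- Meeting rooms\n- Washroom"
    )
    print(top_sections("Where is the coffee shop?", sample))
```

The selected sections would then be placed in the LLM prompt as context. Keeping each topic in its own blank-line-separated block makes this kind of lookup more reliable.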

Step 5.2: Test RAG Server Locally

python rag_server.py

You should see:

🚀 Starting RAG Server with Emotion Support
✅ Knowledge base found: X bytes
🌐 Service running on http://localhost:8000

Test it:

curl http://localhost:8000/health

Step 5.3: Setup ngrok (For Cloud Connectivity)

Why ngrok? Agora's servers need to reach your RAG server. ngrok creates a public URL.

Install ngrok:
- Download from ngrok.com
- Or: brew install ngrok (Mac) / choco install ngrok (Windows)

Sign up and authenticate:

ngrok config add-authtoken <your-auth-token>

Start ngrok tunnel:
```
ngrok http 8000
```

Copy the public URL:

Forwarding   https://abcd1234.ngrok-free.app -> http://localhost:8000

Update join_api.py:

RAG_SERVER_URL = "https://your-ngrok-url.ngrok-free.app/rag/chat/completions"
USE_RAG = True

⚠️ Note: Free ngrok URLs change on every restart, so update RAG_SERVER_URL in join_api.py each time. A static domain (paid plans) avoids this.
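
Because the URL must be updated after each ngrok restart, a small helper can derive the `RAG_SERVER_URL` value from the `Forwarding` line. This is a sketch under the assumption that the line has the format shown above; `rag_url_from_forwarding` is a hypothetical name.

```python
import re

RAG_PATH = "/rag/chat/completions"


def rag_url_from_forwarding(line: str) -> str:
    """Extract the public HTTPS URL from an ngrok 'Forwarding' line and append the RAG path."""
    match = re.search(r"https://\S+", line)
    if match is None:
        raise ValueError("no https URL found in: " + line)
    return match.group(0).rstrip("/") + RAG_PATH


if __name__ == "__main__":
    line = "Forwarding   https://abcd1234.ngrok-free.app -> http://localhost:8000"
    print(rag_url_from_forwarding(line))
```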

6. Setup LED Control (Optional)

Only needed if you have a reSpeaker USB Microphone.

Step 6.1: Install USB Libraries

Windows:

pip install pyusb libusb-package

macOS:

brew install libusb
pip install pyusb

Linux:

sudo apt-get install libusb-1.0-0-dev
pip install pyusb

Step 6.2: Test Device Connection

python test_respeaker.py

Expected output:

✅ ReSpeaker device found!
   Vendor ID: 0x2886
   Product ID: 0x001a

Step 6.3: Linux USB Permissions (if needed)

sudo nano /etc/udev/rules.d/99-respeaker.rules

Add this line:

SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"

Reload rules:

sudo udevadm control --reload-rules
sudo udevadm trigger

Step 6.4: Test LED Service

python emotion_led_service.py

Test all emotions:

python test_led_emotions.py

▶️ Running the Application

Terminal Setup

You'll need 4 terminals (or 3 if skipping LED), plus one extra terminal or tab for ngrok:

Terminal 1: LED Service (Optional)

python emotion_led_service.py

Wait for:

✅ reSpeaker device found!
✅ Device initialized in DoA mode
🌐 Service running on http://localhost:5000

Terminal 2: RAG Server (with ngrok)

# Terminal 2a: Start RAG Server
python rag_server.py

# Terminal 2b: Start ngrok (separate terminal/tab)
ngrok http 8000

Copy the ngrok URL and update join_api.py.

Terminal 3: Agora AI Agent

python join_api.py

Wait for:

✅ SUCCESS!
Agent ID: A42AA74LL69CF58MN33AE74ME57KJ86K
⚠️  SAVE THIS AGENT ID FOR STOPPING

⚠️ Important: Copy the Agent ID - you'll need it to stop the agent later.

Terminal 4: Open Web Interface

Simply open index_v5.html in your web browser.

Or use a local server:

python -m http.server 8080
# Then visit: http://localhost:8080/index_v5.html

Using the Application

Click "▶ Start Conversation"
Allow microphone access when prompted
Start talking! Try:
- "Hello!"
- "Are there any special offers?"
- "Where is the washroom?"
- "What are some hidden features?"
Watch the magic happen:
- 🎙️ Your speech is transcribed
- 🤖 AI responds with emotion
- 📝 Transcripts appear in left panel
- 🎭 Emoji displays at top
- 💡 reSpeaker LEDs light up (if connected)

Stopping the Application

Stop the conversation: Click "⏹ Stop Conversation" in web UI

Stop the Agora Agent:

# Edit stop_api.py with your Agent ID
AGENT_ID = "your_agent_id_from_terminal_3"

# Then run:
python stop_api.py

Stop other services: Press Ctrl+C in each terminal

📁 Project Structure

Agora_Convo_AI_reSpeaker/
├── config.py                    # Main configuration file
├── join_api.py                  # Starts Agora AI agent
├── stop_api.py                  # Stops Agora AI agent
├── rag_server.py                # RAG server with emotions
├── emotion_led_service.py       # LED control service
├── my_city_info.txt             # Your knowledge base
├── index.html                   # Web interface
├── agora-rtm-2.2.3.min.js       # Agora RTM SDK
├── test_respeaker.py            # Device connection test
├── utils
├── requirements.txt             # Python dependencies
├── index.ts
└── type.ts

🧪 Testing

Test RAG Server

# Health check
curl http://localhost:8000/health

# Test query
curl -X POST http://localhost:8000/rag/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Where is the coffee shop?"}],
    "stream": false
  }'

Test LED Control

# Check device status
curl http://localhost:5000/status

# Test emotion
curl -X POST http://localhost:5000/emotion \
  -H "Content-Type: application/json" \
  -d '{"emotion": "excited", "duration": 1.0}'

# Test color
curl http://localhost:5000/test/yellow

Test End-to-End

Start all services
Open web interface
Start conversation
Say: "Are there any special offers?"
Verify:
- ✅ Transcript appears
- ✅ Emotion emoji shows
- ✅ LED lights up (if connected)

🐛 Troubleshooting

Issue: "Device Not Found"

reSpeaker LED:

# Check device connection
lsusb | grep 2886  # Linux
# macOS: run `system_profiler SPUSBDataType` and look for vendor ID 0x2886
# Windows: check Device Manager

# Verify with test script
python test_respeaker.py

Issue: "Agent Join Failed"

Check:

Verify all credentials in config.py
Ensure tokens are not expired (regenerate if needed)
Check Agora Console for account status
Verify network connectivity

Debug:

python join_api.py
# Check the error message in output

Issue: "RAG Server Connection Failed"

Check:

Is rag_server.py running? Check Terminal 2
Is ngrok running? Check the public URL
Did you update join_api.py with ngrok URL?

Test:

# Test local
curl http://localhost:8000/health

# Test ngrok
curl https://your-ngrok-url.ngrok-free.app/health

Issue: "No Transcripts Appearing"

Check:

Open browser console (F12)
Look for RTM connection messages
Verify token in index_v5.html (UID 1002)
Check that enable_rtm is set to True in join_api.py

Issue: "Emotions Not Detected"

Check:

System prompt includes emotion instructions
RAG server has EMOTION_SYSTEM_PROMPT
Look for [emotion] labels in transcripts
Check browser console for emotion detection logs
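
When checking for emotion labels, it may help to see how a label can be separated from the spoken text. The sketch below assumes the label appears at the start of a reply in the form `[emotion]`; the exact format produced by the RAG server may differ, and `split_emotion` is a hypothetical helper.

```python
import re

# Matches a leading tag such as "[excited] " (assumed format).
EMOTION_TAG = re.compile(r"^\s*\[(\w+)\]\s*")


def split_emotion(reply: str) -> tuple:
    """Return (emotion, text); emotion is None when no leading tag is present."""
    match = EMOTION_TAG.match(reply)
    if match is None:
        return None, reply.strip()
    return match.group(1).lower(), reply[match.end():].strip()


if __name__ == "__main__":
    print(split_emotion("[excited] We have a sale today!"))
    print(split_emotion("Hello"))
```

If replies arrive without any tag, the system prompt is the first place to look.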

Issue: "Port Already in Use"

# Find process using port
lsof -i :5000  # LED service
lsof -i :8000  # RAG server

# Kill process
kill -9 <PID>
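
`lsof` is not available on Windows. As a cross-platform alternative, the following sketch checks whether something is already listening on a local port using only the Python standard library; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(5000)` and `port_in_use(8000)` report whether the LED service and RAG server ports are occupied. This only detects the listener; finding and stopping the owning process still requires OS tools.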

Issue: "LED Not Responding"

Unplug and replug reSpeaker
Restart LED service
Manual reset:
```
curl -X POST http://localhost:5000/doa
```

🎨 Emotion Color Reference

| Emotion | Color | Hex | Use Case |
|---|---|---|---|
| 😊 happy | Yellow | `0xFFFF00` | Good news, positive responses |
| 🎉 excited | Magenta | `0xFF00FF` | Sales, special offers, amazing deals |
| 😲 surprised | Orange | `0xFF8800` | Unexpected facts, hidden features |
| 🤔 thinking | Cyan | `0x00FFFF` | Processing, searching information |
| 🙋 helpful | Green | `0x00FF00` | Giving directions, assistance |
| 😐 neutral | Light Blue | `0x8888FF` | Standard information, facts |
| 😔 sad | Blue | `0x0000FF` | Apologies, closures, bad news |
| 👋 welcoming | Pink | `0xFF69B4` | Greetings, warm welcomes |
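
The color reference can be expressed as a lookup table in code. This is a minimal sketch using the hex values listed here; the fallback to `neutral` for unknown labels is an assumption, and the names are not taken from `emotion_led_service.py`.

```python
EMOTION_COLORS = {
    "happy": 0xFFFF00,
    "excited": 0xFF00FF,
    "surprised": 0xFF8800,
    "thinking": 0x00FFFF,
    "helpful": 0x00FF00,
    "neutral": 0x8888FF,
    "sad": 0x0000FF,
    "welcoming": 0xFF69B4,
}


def to_rgb(color: int) -> tuple:
    """Split a 0xRRGGBB integer into (red, green, blue) components."""
    return ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF)


def color_for(emotion: str) -> tuple:
    """Look up an emotion's RGB color, falling back to neutral (assumed behavior)."""
    return to_rgb(EMOTION_COLORS.get(emotion.lower(), EMOTION_COLORS["neutral"]))


if __name__ == "__main__":
    for name in EMOTION_COLORS:
        print(name, color_for(name))
```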

📊 API Reference

RAG Server (`http://localhost:8000`)

`POST /rag/chat/completions`

Generate AI response with RAG

Request:

{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "user", "content": "Where is the coffee shop?"}
  ],
  "stream": true
}

`GET /health`

Check server health

Response:

{
  "status": "healthy",
  "knowledge_base_loaded": true,
  "knowledge_base_size": 12345
}
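
A startup script could use this response to wait until the server is ready. The sketch below assumes the response fields shown above; `is_ready` and `fetch_health` are hypothetical helpers, and `fetch_health` needs the RAG server to be running.

```python
import json
import urllib.request


def is_ready(health_body: str) -> bool:
    """Return True when the body reports a healthy server with a loaded knowledge base."""
    data = json.loads(health_body)
    return data.get("status") == "healthy" and data.get("knowledge_base_loaded") is True


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> str:
    """Fetch the raw /health response body (requires a running server)."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `is_ready(fetch_health())` combines the two steps.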

LED Service (`http://localhost:5000`)

`POST /emotion`

Trigger emotion LED animation

Request:

{
  "emotion": "excited",
  "duration": 1.0,
  "text": "Optional transcript"
}
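
The same request can be sent from Python. This is a minimal sketch using only the standard library and the request body shown above; `build_emotion_request` and `send` are hypothetical names, and `send` requires the LED service to be running.

```python
import json
import urllib.request

LED_SERVICE_URL = "http://localhost:5000"


def build_emotion_request(emotion: str, duration: float = 1.0, text: str = None):
    """Build a POST /emotion request without sending it."""
    payload = {"emotion": emotion, "duration": duration}
    if text is not None:
        payload["text"] = text  # optional transcript field
    return urllib.request.Request(
        LED_SERVICE_URL + "/emotion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout: float = 5.0) -> str:
    """Send the request and return the response body (requires a running LED service)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `send(build_emotion_request("excited", 1.0))` mirrors the curl command in the Testing section.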

`GET /status`

Check device status

`POST /doa`

Return to Direction of Arrival mode

`GET /test/{color}`

Test specific color (red, green, blue, yellow, etc.)

🔐 Security Notes

⚠️ Important Security Considerations:

Never commit credentials to Git:

# Add to .gitignore
config.py
.env
*.key

Use environment variables:

import os
GROQ_KEY = os.getenv('GROQ_API_KEY')
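
`os.getenv` returns `None` when a variable is missing, which can surface later as a confusing authentication error. A small wrapper that fails early is one option; `require_env` below is a hypothetical helper, not part of this project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In `config.py` this could be used as `GROQ_KEY = require_env("GROQ_API_KEY")`.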

Rotate tokens regularly: Agora tokens expire after 24 hours by default
Secure ngrok tunnels: Use authentication for production

Keep dependencies updated:

pip install --upgrade -r requirements.txt

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Agora - Real-time communication platform
Groq - Fast LLM inference
AssemblyAI - Speech recognition
FastAPI - Web framework
reSpeaker - Smart microphone

📞 Support

If you encounter issues:

Check the Troubleshooting section
Review console logs from all services
Verify all credentials are correct
Check that all services are running
Open an issue on GitHub with:
- Error messages
- Steps to reproduce
- System information

🎯 Quick Start Checklist

Ready to go? Start with Setup Instructions! 🚀
