简体中文 | English
This repository demonstrates how to use Seeed Studio reSpeaker XVF3800 as an edge voice device, build a real-time voice link via Agora, and connect to an AI Agent backend service to complete a full voice conversation loop.
The key content is in `ai_agents/`:

- Edge (ESP32): `ai_agents/esp32-client`
- Backend (AI Agent Server): `ai_agents/server`

```
ai_agents/
├── esp32-client/   # XIAO ESP32-S3 edge side: audio capture/playback + Agora connection + conversation interaction
└── server/         # Backend: AI Agent orchestration / LLM / ASR / TTS, etc. (works together with the edge side)
```
- The XIAO ESP32-S3 connects to the network and joins an Agora room
- The edge side captures microphone audio and publishes it (or uploads data)
- `ai_agents/server` receives audio/events and performs ASR → LLM → TTS (or other Agent flows)
- The backend sends the response audio/commands back, and the edge side plays it, enabling real-time voice conversation
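The round trip above can be sketched as follows. Every function in this sketch is a hypothetical stand-in: in the real system, audio travels over Agora RTC and the backend calls actual ASR/LLM/TTS providers.

```python
# Toy sketch of the per-utterance round trip described above.
# All functions are hypothetical placeholders, not the project's API.

def asr(audio: bytes) -> str:
    """Stand-in for speech recognition (e.g. Deepgram in this demo)."""
    return audio.decode("utf-8")

def llm(text: str) -> str:
    """Stand-in for the language model (e.g. OpenAI in this demo)."""
    return f"echo: {text}"

def tts(text: str) -> bytes:
    """Stand-in for speech synthesis (e.g. ElevenLabs in this demo)."""
    return text.encode("utf-8")

def backend_round_trip(mic_audio: bytes) -> bytes:
    """What the server does for each utterance: ASR -> LLM -> TTS."""
    return tts(llm(asr(mic_audio)))

# The edge side would capture the input and play the returned audio.
print(backend_round_trip(b"hello"))  # -> b'echo: hello'
```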
- Hardware: Seeed Studio reSpeaker XVF3800 (plus mic/speaker or the corresponding expansion board)
- Network: Able to access Agora services
- Software: Install according to the requirements in the subdirectories (see the two links below)
Go to: ai_agents/server
Follow the README/docs in that directory:
- Configure environment variables (Agora / LLM / ASR / TTS, etc.)
- Start the service
After that, you should see the backend service start successfully and wait for edge connections or room events.
Applies to: Windows 10/11 (WSL2 is recommended). Run the following commands in PowerShell or Windows Terminal.
A. Install / Configure Docker Desktop (one-time)
- Download and install Docker Desktop: https://www.docker.com/products/docker-desktop/
- During installation, select/enable Use WSL 2 instead of Hyper-V (if available).
- After installation, open Docker Desktop and wait until the tray shows Docker is running.
- (Optional but recommended) In Docker Desktop:
Settings -> Resources -> WSL Integration, and enable your commonly used WSL distribution (e.g., Ubuntu).
B. Clone the repo and prepare environment variables
```
git clone https://github.com/zhannn668/seeed-xiao-agora-client.git
cd seeed-xiao-agora-client
cd ai_agents
```

Copy the example environment variables to `.env` (choose one):

- PowerShell:

  ```powershell
  Copy-Item .env.example .env
  ```

- Or CMD:

  ```
  copy .env.example .env
  ```

Then open `.env` in an editor and fill in your keys/config (Agora / LLM / ASR / TTS, etc.):
How to get API Keys:
Agora:
- Visit https://console.agora.io/
- Register a free account
- Create a new project
- Copy the App ID and App Certificate
Deepgram:
- Visit https://console.deepgram.com/
- Register a free account
- Go to the API Keys page
- Create a new API Key
OpenAI:
- Visit https://platform.openai.com/
- Register and add a payment method
- Go to the API Keys page
- Create a new Secret Key
ElevenLabs:
- Visit https://elevenlabs.io/
- Register a free account
- Go to Profile → API Key
- Copy the API Key
```
# Agora RTC Configuration (Required)
AGORA_APP_ID=your_agora_app_id_here
AGORA_APP_CERTIFICATE=your_agora_certificate_here

# Deepgram ASR Configuration (Required)
DEEPGRAM_API_KEY=your_deepgram_api_key_here

# OpenAI LLM Configuration (Required)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o
OPENAI_PROXY_URL= # Optional: leave empty if not using proxy

# ElevenLabs TTS Configuration (Required)
ELEVENLABS_TTS_KEY=your_elevenlabs_api_key_here

# Optional: Weather API (for weather tool functionality)
WEATHERAPI_API_KEY=your_weatherapi_api_key_here
```
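Before starting the stack, it can help to confirm the required keys are actually filled in. The helper below is not part of the project; it is a hedged sketch that flags keys that are missing, empty, or still carry the `your_...` placeholders from the example above.

```python
# Hypothetical sanity check for .env contents (not part of this repo).
# Key names match the example above; the placeholder test is a heuristic.

REQUIRED = [
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_TTS_KEY",
]

def missing_keys(env_text: str) -> list[str]:
    """Return required keys that are absent, empty, or look like placeholders."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.split("#")[0].strip()
    return [k for k in REQUIRED
            if not values.get(k) or values[k].startswith("your_")]

sample = "AGORA_APP_ID=abc123\nOPENAI_API_KEY=your_openai_api_key_here\n"
print(missing_keys(sample))
```

Run it against the contents of your `.env`; an empty list means every required key has a real-looking value.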
Open `ai_agents/agents/examples/voice-assistant/tenapp/property.json` in an editor, and modify it according to the model you choose. You can refer to: https://docs.agora.io/en/conversational-ai/models/asr/overview
```
......
"llm": {
  "url": "https://api.openai.com/v1/chat/completions",
  "api_key": "<your_llm_key>",
  "system_messages": [
    {
      "role": "system",
      "content": "You are a helpful chatbot."
    }
  ],
  "max_history": 32,
  "greeting_message": "Hello, how can I assist you",
  "failure_message": "Please hold on a second.",
  "params": {
    "model": "gpt-4o-mini"
  }
},
"tts": {
  "vendor": "cartesia",
  "params": {
    "api_key": "<your_cartesia_key>",
    "model_id": "sonic-2",
    "voice": {
      "mode": "id",
      "id": "<voice_id>"
    },
    "output_format": {
      "container": "raw",
      "sample_rate": 16000
    },
    "language": "en"
  }
}
......
```
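An easy mistake when hand-editing `property.json` is a trailing comma, or a missing comma between the `"llm"` and `"tts"` blocks, either of which makes the file invalid JSON. A quick parse check catches this; the fragment below is a trimmed mirror of the excerpt above, not the full file.

```python
# Sanity-check that an edited fragment parses as valid JSON.
# json.loads raises json.JSONDecodeError on trailing/missing commas.
import json

fragment = """
{
  "llm": {
    "url": "https://api.openai.com/v1/chat/completions",
    "api_key": "<your_llm_key>",
    "max_history": 32,
    "params": { "model": "gpt-4o-mini" }
  },
  "tts": {
    "vendor": "cartesia",
    "params": { "model_id": "sonic-2" }
  }
}
"""

cfg = json.loads(fragment)  # raises an error if the JSON is malformed
print(cfg["llm"]["params"]["model"])  # -> gpt-4o-mini
```

The same check works on the real file with `python -m json.tool property.json`.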
C. Start the service (Docker Compose)

```
docker compose up -d
```

Check container status (optional):

```
docker compose ps
```

D. Enter the container and install the sample (Voice Assistant)

Note: The container name may vary depending on the compose configuration. The example below uses `ten_agent_dev`. If yours differs, use the output of `docker compose ps`.

```
docker exec -it ten_agent_dev bash
```

After entering the container, run:

```
cd agents/examples/voice-assistant
task install
task run
```

E. Verify the backend is running

- You can see the relevant containers are Running in Docker Desktop
- Or view logs (optional):

  ```
  docker compose logs -f
  ```

To stop the service:

```
docker compose down
```

Go to: `ai_agents/esp32-client`
Follow the README/docs in that directory:
- Configure Wi-Fi / Agora AppID/Token/Channel (or obtain them from the backend)
- Build and flash to the XIAO ESP32-S3
- Power on and observe the serial logs
- Serial logs show the device successfully joined the channel / connected
- After you speak to the device, the backend receives audio / recognized text
- The device can play back the AI response audio (or execute commands)
- Q: Which one should I read first?
  A: Read `ai_agents/server` first (get the whole pipeline running), then read `ai_agents/esp32-client` (connect the edge device).
- Q: What if I only want to modify the edge side?
  A: Go directly to `ai_agents/esp32-client`. The backend can be started with the default sample.
- Directories outside `ai_agents/` are upstream frameworks/toolchains/sample collections. This demo mainly focuses on the edge + backend linkage.