The first open-source implementation of a Discord Bot utilizing the Google Gemini Multimodal Live API for native Speech-to-Speech interaction.
Most "voice" bots on Discord today utilize a slow, chained pipeline:
Speech-to-Text (Whisper) ➔ LLM (GPT) ➔ Text-to-Speech (ElevenLabs)
This bot is different. It establishes a direct, bi-directional WebSocket connection with Google's Gemini 2.0 model.
- No Transcriptions: The model "hears" the raw audio bytes (tone, emotion, pace).
- No TTS Engine: The model generates raw audio bytes directly.
- Sub-Second Latency: Responses feel almost instantaneous.
- Barge-In Capable: You can interrupt the bot, and it will stop talking and listen (Echo Cancellation).
Connecting Discord's UDP audio stream to Gemini's WebSocket required solving several complex synchronization issues. This repo implements three critical fixes:
Gemini's WebSocket will close the connection with a 1011 error if the client stops sending data. However, when the bot is speaking, we must cut the microphone stream to prevent the bot from hearing itself (Echo).
- Solution: When the bot speaks, we inject Digital Silence (`b'\x00'`) into the upload stream. This "mutes" the mic but keeps the WebSocket heartbeat alive.
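
The silence-injection idea can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function name `outgoing_chunk` and the `bot_is_speaking` flag are hypothetical, and the audio is assumed to be raw PCM where zero bytes mean silence.

```python
def outgoing_chunk(mic_chunk: bytes, bot_is_speaking: bool) -> bytes:
    """Return the bytes to upload: real mic audio, or equal-length silence.

    Hypothetical helper; assumes raw PCM, where all-zero bytes are silent.
    """
    if bot_is_speaking:
        # Keep the chunk length identical so the upload cadence never changes
        # and the server keeps receiving data while the mic is "muted".
        return b"\x00" * len(mic_chunk)
    return mic_chunk


# While the bot talks, the mic content is replaced byte-for-byte with zeros.
muted = outgoing_chunk(b"\x01\x02\x03\x04", bot_is_speaking=True)
live = outgoing_chunk(b"\x01\x02\x03\x04", bot_is_speaking=False)
```

Replacing the audio rather than skipping the send is the point of the design: the stream never pauses, so the connection has no idle gap.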
Discord sends audio in tiny 20ms chunks. Sending these individually to Google causes network congestion and "choppy" audio.
- Solution: We implement an Accumulation Buffer that collects ~150ms of audio (4800 bytes) before sending a single, stable chunk to the API.
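
One way such an accumulation buffer might look is sketched below. The class name `AccumulationBuffer` is hypothetical, and the 640-byte frame size in the usage lines is an assumption (20 ms of 16 kHz, 16-bit mono PCM, which is also the format under which 4800 bytes equals 150 ms).

```python
class AccumulationBuffer:
    """Collects small audio frames and releases fixed-size chunks.

    Hypothetical sketch; 4800 bytes is the chunk size quoted above.
    """

    def __init__(self, target_bytes: int = 4800) -> None:
        self.target_bytes = target_bytes
        self._pending = bytearray()

    def push(self, frame: bytes) -> list[bytes]:
        """Add one frame and return every full chunk that is now ready."""
        self._pending.extend(frame)
        ready = []
        while len(self._pending) >= self.target_bytes:
            ready.append(bytes(self._pending[: self.target_bytes]))
            del self._pending[: self.target_bytes]
        return ready

    def pending_bytes(self) -> int:
        """Number of bytes still waiting for the next chunk."""
        return len(self._pending)


# Assumed frame size: 20 ms at 16 kHz, 16-bit mono = 640 bytes.
buf = AccumulationBuffer()
sent = []
for _ in range(8):
    sent.extend(buf.push(b"\x00" * 640))
```

Leftover bytes stay in the buffer and become the start of the next chunk, so no audio is dropped at chunk boundaries.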
Discord occasionally sends empty or malformed Opus packets, which causes standard decoders to crash.
- Solution: A monkey-patch for `discord.opus.Decoder` that safely returns silence instead of raising an exception.
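
The patching pattern can be illustrated with a stand-in class so the sketch runs without discord.py installed. `make_safe` and `FakeDecoder` are hypothetical names, and the silent frame size assumes 20 ms of 48 kHz, 16-bit stereo PCM (960 samples × 2 channels × 2 bytes = 3840 bytes). The repo's real patch may differ.

```python
from typing import Callable

# Assumed output frame: 20 ms of 48 kHz, 16-bit stereo PCM.
SILENT_FRAME = b"\x00" * 3840


def make_safe(decode: Callable[..., bytes]) -> Callable[..., bytes]:
    """Wrap a decode method so bad packets yield silence, not an exception."""

    def safe_decode(self, data, *args, **kwargs) -> bytes:
        if not data:
            # Empty or missing packet: nothing to decode.
            return SILENT_FRAME
        try:
            return decode(self, data, *args, **kwargs)
        except Exception:
            # Broad on purpose for this sketch: any decode failure is silence.
            return SILENT_FRAME

    return safe_decode


class FakeDecoder:
    """Stand-in for a real Opus decoder, used only for this illustration."""

    def decode(self, data, *, fec=False):
        if data == b"bad":
            raise ValueError("corrupted stream")
        return b"\x01" * 3840


# Monkey-patch: replace the method on the class itself.
# Assumption: with discord.py the same line would target discord.opus.Decoder.
FakeDecoder.decode = make_safe(FakeDecoder.decode)

decoder = FakeDecoder()
```

After the patch, `decoder.decode(b"bad")` and `decoder.decode(b"")` both return a silent frame, while valid packets pass through unchanged.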
- Python 3.10+
- FFmpeg (Required for Discord audio processing)
  - Linux: `sudo apt install ffmpeg`
  - Windows: Download and add to PATH
  - Mac: `brew install ffmpeg`
- Google Gemini API Key (Access to `gemini-2.0-flash-exp` or newer)
- Clone the repository: `git clone https://github.com/yourusername/discord-gemini-live.git`, then `cd discord-gemini-live`
- Create a Virtual Environment: `python -m venv venv`, then `source venv/bin/activate` (On Windows: `venv\Scripts\activate`)
- Install Dependencies: `pip install -r requirements.txt`
- Create your `.env` file: Copy `.env.example` to `.env` and fill in your details: `cp .env.example .env`
| Variable | Description |
|---|---|
| `DISCORD_TOKEN` | Your Discord Bot Token (Get it from Developer Portal). |
| `GEMINI_API_KEY` | Your Google AI Studio API Key. |
| `GEMINI_MODEL_ID` | Default: `gemini-2.5-flash-native-audio-preview-12-2025` |
| `GEMINI_VOICE_NAME` | Voices: Aoede, Puck, Charon, Kore, Fenrir. |
| `BOT_PERSONALITY` | The System Instruction (Prompt) for the bot. |
Example Personality:
You are Skippy, a grumpy otter wizard who hates technology but loves fish.
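
As a rough sketch, the variables in the table above could be read at startup like this. `BotConfig` and `load_config` are hypothetical names, the fallback voice `Puck` is an arbitrary assumption (the table lists the voices but names no default), and loading the `.env` file into the process environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Default model ID as listed in the configuration table.
DEFAULT_MODEL_ID = "gemini-2.5-flash-native-audio-preview-12-2025"


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    gemini_api_key: str
    model_id: str
    voice_name: str
    personality: str


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build a BotConfig from environment-style settings (hypothetical helper)."""
    required = ("DISCORD_TOKEN", "GEMINI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        gemini_api_key=env["GEMINI_API_KEY"],
        model_id=env.get("GEMINI_MODEL_ID") or DEFAULT_MODEL_ID,
        # Assumption: "Puck" as fallback; the source does not state a default.
        voice_name=env.get("GEMINI_VOICE_NAME") or "Puck",
        personality=env.get("BOT_PERSONALITY", ""),
    )


# Example with an explicit mapping instead of the real environment.
config = load_config({"DISCORD_TOKEN": "token", "GEMINI_API_KEY": "key"})
```

Failing fast on the two required secrets gives a clear error at startup instead of an authentication failure later.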