ENE-SYSTEM is an open-source "Voice-to-Voice" Discord bot that runs entirely on local hardware (CPU/GPU). It combines advanced Local LLMs, Neural Voice Cloning, and Speech Recognition to create immersive roleplay experiences with persistent memory and reactive sound effects.
- 🧠 Local Intelligence: Powered by Ollama (Hermes 3 / Llama 3.1 / Qwen 2.5) for uncensored, smart, and context-aware roleplay.
- 🗣️ Voice Cloning (XTTS): Uses Coqui XTTS v2 to clone specific character voices (Rick Sanchez, Anime characters, etc.) with high fidelity and emotion.
- 👂 Speech Recognition: Transcribes voice chat in real-time using OpenAI Whisper.
- 💾 Memento Protocol (Long-Term Memory): The bot remembers user details (names, facts, hobbies) across different sessions using a JSON database.
- 🔊 Reactive Soundboard: Automatically injects SFX (burps, glitches, slams) based on the context of the conversation using FFmpeg mixing.
- 🎭 Multi-Personality Engine: Dynamic switching between different character profiles (prompts + voices) on the fly.
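The Memento Protocol above can be sketched as a small JSON-backed store. This is a minimal illustration only; the actual file name, field layout, and helper names in ENEScript.py may differ:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memoria.json")  # hypothetical filename

def load_memory() -> dict:
    """Load the per-user memory store, or start empty."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text(encoding="utf-8"))
    return {}

def remember(user_id: str, key: str, value: str) -> None:
    """Persist a fact about a user across sessions."""
    memory = load_memory()
    memory.setdefault(user_id, {})[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2), encoding="utf-8")

def recall(user_id: str) -> dict:
    """Return everything known about a user (empty dict if new)."""
    return load_memory().get(user_id, {})
```

Because the store is plain JSON on disk, facts survive bot restarts, which is what makes memory "long-term" rather than per-session.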
The system comes with pre-configured profiles (JSON/TXT):
- Rick (C-137): Nihilistic, cynical, speaks English (auto-generates burps).
- Shiro (NGNL): Logical, gamer, emotionless tone.
- The Commander: Historical parody, paranoid, screams orders.
- The Shadow: "Yandere" entity living in the code (creepy/whispery).
- Anime Girl: Energetic and cheerful assistant.
- Picara: Flirty/Sarcastic Latina personality.
- Language: Python 3.10+
- Discord: discord.py (with experimental voice receive support via discord-ext-voice-recv).
- LLM: Ollama (server-client architecture).
- TTS: Coqui XTTS v2 (PyTorch).
- STT: OpenAI Whisper (Tiny/Base).
- Audio Processing: FFmpeg & NumPy.
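The reactive soundboard works by overlaying an SFX file onto the generated speech with FFmpeg's `amix` filter. A sketch of building such a command (filenames are placeholders, and the script's actual filter graph may differ):

```python
def build_mix_command(speech_wav: str, sfx_wav: str, out_wav: str) -> list[str]:
    """Build an ffmpeg argv that overlays an SFX onto the speech audio."""
    return [
        "ffmpeg", "-y",
        "-i", speech_wav,  # cloned-voice speech
        "-i", sfx_wav,     # triggered sound effect (e.g. a burp)
        "-filter_complex", "amix=inputs=2:duration=first",
        out_wav,
    ]

# The command would then be executed with
# subprocess.run(build_mix_command(...), check=True).
```

Using `duration=first` keeps the output as long as the speech track, so a short SFX never pads the reply with silence.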
Before starting, ensure you have the following installed on your system:
- Python 3.10: This version is crucial for XTTS compatibility.
- FFmpeg: Must be installed and added to your system PATH (or placed in the project root folder).
- Ollama: Installed and running locally.
- Hardware: A dedicated GPU is recommended (NVIDIA for CUDA, or AMD via Vulkan for Ollama inference).
Clone the repository to your local machine:
git clone https://github.com/mandarinoazul/discord-ai-bot.git
cd discord-ai-bot
Using a virtual environment is highly recommended to avoid conflicts with system libraries.
Create a virtual environment with Python 3.10:
py -3.10 -m venv venv_tts
Activate the environment:
- Windows:
.\venv_tts\Scripts\activate
- Linux/Mac:
source venv_tts/bin/activate
With the virtual environment active, install the required packages:
pip install -r requirements.txt
If the requirements.txt file is missing or you encounter issues, you can install dependencies manually:
pip install discord.py ollama openai-whisper scipy discord-ext-voice-recv numpy TTS transformers==4.36.2
Open a new terminal window and pull the brain model via Ollama:
ollama pull hermes3
Alternatively, you can use ollama pull qwen2.5.
You need a Bot Token from the Discord Developer Portal.
- Set it as an environment variable named DISCORD_TOKEN.
- Alternatively, edit ENEScript.py directly (not recommended for public repos):
DISCORD_TOKEN = "YOUR_TOKEN_HERE"
Edit the BASE_DIR variable in ENEScript.py to match your local folder path:
BASE_DIR = r"C:\Path\To\Your\Project\Folder"
Ensure your .wav voice samples (mono, 22050 Hz) are in the root folder. The filenames must match the names defined in the PERFILES dictionary within the script (e.g., voz_rick.wav, voz_shiro.wav).
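A quick sanity check before launch catches the two most common setup mistakes (missing token, missing voice samples). This is a hypothetical helper, not part of ENEScript.py:

```python
import os
from pathlib import Path

def check_setup(base_dir: str, expected_voices: list[str]) -> list[str]:
    """Return a list of setup problems; an empty list means ready to launch."""
    problems = []
    if not os.environ.get("DISCORD_TOKEN"):
        problems.append("DISCORD_TOKEN environment variable is not set")
    for wav in expected_voices:
        if not (Path(base_dir) / wav).exists():
            problems.append(f"missing voice sample: {wav}")
    return problems
```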
To start the bot, run:
python ENEScript.py
- !voice [name]: Switch the active personality. Examples: !voice rick, !voice shadow, !voice anime.
- !listen: The bot joins your voice channel and listens for 5 seconds. Usage: say "Hello Rick", wait for processing, and hear the response.
- !shh: Emergency silence. Stops the bot from speaking immediately.
- !forget: Wipes the bot's long-term memory about the current user.
- !leave: Disconnects the bot from the voice channel.
You can also chat via text. The bot responds if you:
- Mention it (@Bot).
- Say its name ("Ene", "Rick", "Commander").
- Reply to its messages.
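The three triggers above amount to a simple predicate on each incoming message. A stdlib-only sketch (the name list is illustrative, and the real check in ENEScript.py may differ):

```python
TRIGGER_NAMES = ("ene", "rick", "commander")  # illustrative name list

def should_respond(content: str, mentions_bot: bool, is_reply_to_bot: bool) -> bool:
    """Respond if the bot is mentioned, named, or replied to."""
    if mentions_bot or is_reply_to_bot:
        return True
    lowered = content.lower()
    return any(name in lowered for name in TRIGGER_NAMES)
```

In a discord.py `on_message` handler, `mentions_bot` would come from `bot.user in message.mentions` and `is_reply_to_bot` from inspecting `message.reference`.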
To force Ollama to use your GPU instead of CPU/RAM:
- Open Windows Environment Variables.
- Add a new System Variable OLLAMA_VULKAN with value 1.
- Add another System Variable HSA_OVERRIDE_GFX_VERSION with value 10.3.0.
- Restart Ollama completely.
- XTTS runs on CPU by default for stability on Windows AMD systems. It takes approximately 2-3 seconds of processing per second of audio.
- To improve speed, keep the system prompts constrained to short responses (max 20 words), since XTTS generation time scales with the length of the reply.
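The 20-word cap could also be enforced after generation as a safety net, so an LLM that ignores the prompt cannot produce a minute-long TTS job. A hypothetical helper, not part of ENEScript.py:

```python
def cap_words(text: str, max_words: int = 20) -> str:
    """Truncate an LLM reply to at most max_words words before TTS."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "…"
```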
This project is open-source. Feel free to fork, modify, and distribute.
Built with ❤️, Python, and a lot of VRAM.