A real-time voice interaction application built with Python, integrating FunASR speech recognition, Ollama LLM, and advanced TTS capabilities with emotional expression support.
- Real-time Voice Recognition: Powered by FunASR for accurate Chinese speech recognition
- Intelligent Conversation: Local LLM processing using Ollama with Qwen2.5 model
- Emotional TTS: Advanced text-to-speech with Coqui TTS and emotional expression
- Robotic Integration: Support for Reachy Mini robot with head movement and antenna control
- Error Recovery: Comprehensive error handling and automatic recovery mechanisms
- Keyboard Control: Simple R/S key controls for recording start/stop
- Audio Quality Optimization: Built-in noise reduction and audio enhancement
- Modular Architecture: Clean, extensible design with well-defined interfaces
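At its core, the features above compose into a single record → recognize → respond → speak loop. The sketch below stubs out the heavy components (FunASR, Ollama, Coqui TTS) to show only the data flow; all function names here are illustrative, not this project's actual API:

```python
# Minimal sketch of the processing loop with the heavy components stubbed.
# Names below are illustrative, not the project's real module API.

def recognize(audio: bytes) -> str:
    """Stub for FunASR speech recognition."""
    return "你好"  # pretend the user said "hello"

def ask_llm(text: str) -> str:
    """Stub for an Ollama chat request (model: qwen2.5:7b)."""
    return f"Echo: {text}"

def speak(text: str) -> str:
    """Stub for Coqui TTS synthesis; returns the text it would speak."""
    return text

def handle_utterance(audio: bytes) -> str:
    """One pass through the pipeline: audio in, spoken reply out."""
    text = recognize(audio)
    reply = ask_llm(text)
    return speak(reply)
```

In the real application each stage runs against a live backend (microphone capture, a local Ollama server, a loaded TTS model), but the control flow follows this shape.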
```
voice_assistant/
├── voice_assistant/              # Main application package
│   ├── __init__.py
│   ├── app.py                    # Main application controller
│   ├── core/                     # Core modules
│   │   ├── __init__.py
│   │   ├── models.py             # Data models
│   │   ├── interfaces.py         # Interface definitions
│   │   ├── state_manager.py      # System state management
│   │   └── error_recovery.py     # Error recovery system
│   ├── audio/                    # Audio processing modules
│   │   ├── __init__.py
│   │   ├── recorder.py           # Audio recording
│   │   └── recording_controller.py
│   ├── speech/                   # Speech processing modules
│   │   ├── __init__.py
│   │   ├── recognizer.py         # Speech recognition
│   │   └── tts_engine.py         # Text-to-speech engine
│   ├── llm/                      # LLM processing modules
│   │   ├── __init__.py
│   │   ├── processor.py          # LLM request processing
│   │   └── reachy_mini_emotions.py  # Emotion system
│   ├── input/                    # Input handling
│   │   ├── __init__.py
│   │   └── keyboard_listener.py
│   └── ui/                       # User interface modules
│       └── __init__.py
├── tests/                        # Test directory
│   ├── __init__.py
│   └── test_*.py                 # Test files
├── CosyVoice/                    # CosyVoice TTS integration
├── TTS/                          # Coqui TTS models and configs
├── logs/                         # Application logs
├── temp/                         # Temporary files
├── temp_audio/                   # Temporary audio files
├── main.py                       # Main program entry point
├── config.py                     # Configuration file
├── requirements.txt              # Dependencies
├── pytest.ini                    # Pytest configuration
├── conftest.py                   # Test configuration
└── README.md                     # Project documentation
```
- Python 3.8+ (3.10+ recommended)
- Ollama server running locally (for LLM processing)
- Audio hardware (microphone and speakers)
- Linux/macOS/Windows (tested on Linux)
- NVIDIA Jetson (optional, for edge deployment), e.g. reComputer Mini (Jetson AGX Orin series)
```bash
git clone <repository-url>
cd voice-assistant
pip install -r requirements.txt -i https://pypi.jetson-ai-lab.io/
```

```bash
# Install Ollama (visit https://ollama.ai for platform-specific instructions)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull qwen2.5:7b
```

```bash
# Install Coqui TTS if not already installed
pip install TTS

# The application will automatically download required models on first run
```

Create a `.env` file or set environment variables:

```bash
export OLLAMA_HOST="http://localhost:11434"
export OLLAMA_MODEL="qwen2.5:7b"
export COQUI_MODEL_NAME="tts_models/zh-CN/baker/tacotron2-DDC-GST"
export DEFAULT_VOLUME="1.5"
```

```bash
# Start the voice assistant
python main.py

# Start with debug mode
python main.py --debug

# Use custom configuration
python main.py --config custom_config.py

# Check version
python main.py --version
```

- R Key: Start recording
- S Key: Stop recording and process
- Ctrl+C: Exit application
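Key handling of this kind usually reduces to a small pure dispatch function that a `pynput` listener calls on each press. This is a hedged sketch: `dispatch_key` and the action strings are illustrative, not the names actually used in `keyboard_listener.py`.

```python
# Illustrative R/S key dispatcher; pynput wiring is shown in comments below.

def dispatch_key(char: str, recording: bool) -> tuple:
    """Map a key press to an (action, new_recording_state) pair."""
    c = char.lower()
    if c == "r" and not recording:
        return ("start_recording", True)
    if c == "s" and recording:
        return ("stop_and_process", False)
    return ("ignore", recording)

# With pynput, the dispatcher would be wired up roughly like this:
#   from pynput import keyboard
#   def on_press(key):
#       if hasattr(key, "char") and key.char:
#           action, new_state = dispatch_key(key.char, state.recording)
#           ...  # act on the returned action
#   keyboard.Listener(on_press=on_press).start()
```

Keeping the dispatch logic separate from the listener makes it testable without a real keyboard or display.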
```
python main.py [OPTIONS]

Options:
  --debug          Enable debug mode with verbose logging
  --config FILE    Specify custom configuration file
  --log-file FILE  Specify log file path
  --no-audio       Disable audio functionality (test mode)
  --quiet          Reduce console output
  --version        Show version information
  --help           Show help message
```

```bash
# Run the full test suite
pytest

# Test core models
pytest tests/test_core_models.py

# Test with coverage
pytest --cov=voice_assistant

# Test with verbose output
pytest -v

# Run hypothesis-based property tests
pytest tests/ -k "hypothesis"
```

- AudioData: Audio data representation with validation
- RecognitionResult: Speech recognition results with confidence scores
- LLMResponse: LLM responses with emotion data
- StatusUpdate: System status updates with timestamps
- SystemState: System state enumeration (IDLE, RECORDING, PROCESSING, etc.)
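A minimal sketch of what such validated data models might look like as dataclasses; the field names below are assumptions for illustration, not the actual definitions in `core/models.py`:

```python
from dataclasses import dataclass, field
import time

# Hedged sketch of the data models; field names are assumptions.

@dataclass
class AudioData:
    """Raw audio plus its format, validated on construction."""
    samples: bytes
    sample_rate: int = 16000
    channels: int = 1

    def __post_init__(self):
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")

@dataclass
class RecognitionResult:
    """Recognized text with a confidence score in [0.0, 1.0]."""
    text: str
    confidence: float

@dataclass
class StatusUpdate:
    """A system status change, stamped with the current time."""
    state: str
    timestamp: float = field(default_factory=time.time)
```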
- AudioRecorderInterface: Audio recording abstraction
- SpeechRecognizerInterface: Speech recognition abstraction
- LLMProcessorInterface: LLM processing abstraction
- TTSEngineInterface: Text-to-speech abstraction
- StateManagerInterface: State management abstraction
- KeyboardListenerInterface: Keyboard input abstraction
- RecordingControllerInterface: Recording control abstraction
- VoiceAssistantAppInterface: Main application controller abstraction
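Interfaces like these are typically expressed as Python abstract base classes. The method name and signature below are assumptions for illustration, not the actual contents of `core/interfaces.py`:

```python
from abc import ABC, abstractmethod
from typing import Optional

# Illustrative shape of one interface; the method signature is an assumption.

class TTSEngineInterface(ABC):
    @abstractmethod
    def synthesize(self, text: str, emotion: Optional[str] = None) -> bytes:
        """Return raw audio bytes for the given text."""

class DummyTTS(TTSEngineInterface):
    """A trivial concrete engine, useful for tests and --no-audio mode."""
    def synthesize(self, text: str, emotion: Optional[str] = None) -> bytes:
        return text.encode("utf-8")
```

Because the interface is abstract, attempting to instantiate it directly raises `TypeError`, which keeps implementations honest about the contract.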
- Automatic error detection and classification
- Component-specific recovery strategies
- Graceful degradation and fallback mechanisms
- Comprehensive error logging and statistics
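One common shape for such a system is a registry of per-component recovery strategies with a graceful-degradation fallback. A hedged sketch follows; the class and method names are illustrative, not the actual `error_recovery.py` API:

```python
# Illustrative error recovery registry with per-component strategies,
# a degradation fallback, and simple error statistics.

class ErrorRecovery:
    def __init__(self):
        self._strategies = {}
        self.stats = {"total_errors": 0}

    def register(self, component: str, strategy):
        """Attach a recovery strategy (error -> action name) to a component."""
        self._strategies[component] = strategy

    def handle(self, component: str, error: Exception) -> str:
        """Record the error and run the component's strategy, or degrade."""
        self.stats["total_errors"] += 1
        strategy = self._strategies.get(component, lambda e: "degraded")
        return strategy(error)
```

For example, a TTS failure could map to a restart action while an unregistered component simply degrades.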
- 80+ emotional expressions supported
- LLM-driven emotion selection
- Robotic integration with head movements
- Audio quality optimization with noise reduction
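LLM-driven emotion selection generally means the model tags its reply with an emotion name, which is then looked up in an expression table. The sketch below is illustrative only; the real table of 80+ expressions lives in `reachy_mini_emotions.py`, and these names are assumptions:

```python
# Illustrative emotion-tag -> expression lookup with a neutral fallback.

EXPRESSIONS = {
    "happy": "antennas_up",   # hypothetical expression names
    "sad": "head_down",
    "neutral": "rest",
}

def select_expression(emotion_tag: str) -> str:
    """Map an LLM-provided emotion tag to a robot expression,
    falling back to the neutral pose for unknown tags."""
    return EXPRESSIONS.get(emotion_tag.lower(), EXPRESSIONS["neutral"])
```

The fallback matters: the LLM occasionally emits tags outside the supported set, and the robot should default to a neutral pose rather than fail.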
- Real-time system state tracking
- Event-driven status updates
- Thread-safe state transitions
- Comprehensive logging
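Thread-safe state transitions are usually implemented as an enum plus a lock-guarded transition table. This is a sketch under assumptions; the actual states and allowed transitions are defined in `core/state_manager.py`:

```python
import threading
from enum import Enum, auto

class SystemState(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()
    SPEAKING = auto()

# Allowed transitions (an assumption -- the real table may differ).
_TRANSITIONS = {
    SystemState.IDLE: {SystemState.RECORDING},
    SystemState.RECORDING: {SystemState.PROCESSING, SystemState.IDLE},
    SystemState.PROCESSING: {SystemState.SPEAKING, SystemState.IDLE},
    SystemState.SPEAKING: {SystemState.IDLE},
}

class StateManager:
    def __init__(self):
        self._state = SystemState.IDLE
        self._lock = threading.Lock()

    @property
    def state(self) -> SystemState:
        with self._lock:
            return self._state

    def transition(self, new_state: SystemState) -> bool:
        """Atomically apply a transition; return False if it is illegal."""
        with self._lock:
            if new_state in _TRANSITIONS[self._state]:
                self._state = new_state
                return True
            return False
```

Rejecting illegal transitions (e.g. jumping straight from IDLE to SPEAKING) keeps concurrent callers from corrupting the pipeline's state.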
```python
# Audio settings
AUDIO_SAMPLE_RATE = 16000        # Audio sample rate (Hz)
AUDIO_CHANNELS = 1               # Mono audio
MAX_RECORDING_DURATION = 300     # 5 minutes max
MIN_RECORDING_DURATION = 0.5     # 0.5 seconds min

# TTS settings
COQUI_MODEL_NAME = "tts_models/zh-CN/baker/tacotron2-DDC-GST"
DEFAULT_VOLUME = 1.5             # Audio volume multiplier
COQUI_NOISE_SCALE = 0.333        # Noise reduction
COQUI_DENOISER_STRENGTH = 0.005  # Denoiser strength

# LLM settings
OLLAMA_HOST = "http://localhost:11434"
OLLAMA_MODEL = "qwen2.5:7b"
LLM_TIMEOUT = 60                 # Response timeout in seconds
```

The system supports integration with robotic platforms, particularly optimized for NVIDIA Jetson-powered robots:
- Head Movement: Emotional head gestures during speech synthesis
- Antenna Control: Visual feedback through antenna positioning
- Synchronized Actions: Coordinated movement with TTS output
- Jetson Integration: Optimized for Jetson Xavier NX/AGX platforms
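Synchronizing gestures with speech typically means running the head movement on a worker thread while TTS playback occupies the main thread. The sketch below stubs both sides out; the actual Reachy Mini motion API is different, and these names are purely illustrative:

```python
import threading

# Illustrative coordination of a head gesture with TTS playback.
# Both callables are stubs; real robot and audio APIs differ.

def speak_with_gesture(play_audio, move_head) -> list:
    """Run playback and a head gesture concurrently; return the event log."""
    log = []
    worker = threading.Thread(
        target=lambda: (move_head(), log.append("moved"))
    )
    worker.start()          # gesture runs in the background...
    play_audio()            # ...while audio plays in the foreground
    log.append("spoke")
    worker.join()           # wait for the gesture to finish
    return log
```

The join at the end guarantees the gesture completes before the system returns to idle, even if playback finishes first.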
For robotic applications on Jetson:

```bash
# Quick deployment for robots
./deploy_jetson.sh

# Start with robotic features
python main.py --config config_jetson.py

# Monitor robot performance
tegrastats
```

- Reachy Mini (Pollen Robotics) - Full integration
- Custom Jetson Robots - Configurable GPIO/servo control
- ROS Integration - Compatible with ROS/ROS2 (planned)
- Application logs: `logs/voice_assistant.log`
- Error tracking: Comprehensive error statistics
- Performance metrics: Processing time tracking
```python
# Get current system status
status = app.get_system_status()
print(f"State: {status['current_state']}")
print(f"Components: {status['components']}")

# Get error statistics
errors = app.get_error_statistics()
print(f"Total errors: {errors['total_errors']}")
```

- ✅ Core architecture and interfaces
- ✅ Audio recording and playback
- ✅ Speech recognition (FunASR)
- ✅ LLM integration (Ollama)
- ✅ TTS with emotional expressions
- ✅ Error recovery system
- ✅ Robotic integration (Reachy Mini)
- ✅ NVIDIA Jetson optimization
- ✅ Comprehensive testing
- ✅ Production deployment scripts
- ⏳ GUI interface (planned)
- ⏳ Multi-language support (planned)
- ⏳ ROS/ROS2 integration (planned)
- ⏳ Docker containerization (planned)
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
```bash
# Format code
black voice_assistant/

# Lint code
flake8 voice_assistant/

# Type checking
mypy voice_assistant/
```

- Python 3.10+: Main programming language
- FunASR: Speech recognition and basic TTS
- Ollama: Local LLM inference server
- Coqui TTS: Advanced text-to-speech synthesis
- PyTorch: Deep learning framework for TTS models
- sounddevice: Real-time audio I/O
- soundfile: Audio file handling
- pygame: Audio playback
- librosa: Audio analysis and processing
- scipy: Signal processing and filtering
- pynput: Keyboard input handling
- numpy: Numerical computations
- threading: Concurrent processing
- pathlib: Modern path handling
- pytest: Testing framework
- hypothesis: Property-based testing
- pytest-cov: Code coverage analysis
- black: Code formatting
- flake8: Code linting
- mypy: Static type checking
- FunASR: For providing excellent speech recognition capabilities
- Ollama: For local LLM inference infrastructure
- Coqui TTS: For high-quality text-to-speech synthesis
- Reachy Mini: For robotic platform integration
For questions, issues, or contributions:
- Check the Issues page
- Review the documentation
- Submit a detailed bug report or feature request
Note: This is an active development project. Features and APIs may change. Please check the latest documentation and release notes for updates.