Seeed-Projects/reachy-mini-loacl-conversation


Voice Assistant System

A real-time voice interaction application built with Python that integrates FunASR speech recognition, a local Ollama LLM, and advanced TTS with emotional expression support.

πŸš€ Features

  • Real-time Voice Recognition: Powered by FunASR for accurate Chinese speech recognition
  • Intelligent Conversation: Local LLM processing using Ollama with Qwen2.5 model
  • Emotional TTS: Advanced text-to-speech with Coqui TTS and emotional expression
  • Robotic Integration: Support for Reachy Mini robot with head movement and antenna control
  • Error Recovery: Comprehensive error handling and automatic recovery mechanisms
  • Keyboard Control: Simple R/S key controls for recording start/stop
  • Audio Quality Optimization: Built-in noise reduction and audio enhancement
  • Modular Architecture: Clean, extensible design with well-defined interfaces

πŸ—οΈ Project Structure

voice_assistant/
β”œβ”€β”€ voice_assistant/           # Main application package
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ app.py                 # Main application controller
β”‚   β”œβ”€β”€ core/                  # Core modules
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ models.py          # Data models
β”‚   β”‚   β”œβ”€β”€ interfaces.py      # Interface definitions
β”‚   β”‚   β”œβ”€β”€ state_manager.py   # System state management
β”‚   β”‚   └── error_recovery.py  # Error recovery system
β”‚   β”œβ”€β”€ audio/                 # Audio processing modules
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ recorder.py        # Audio recording
β”‚   β”‚   └── recording_controller.py
β”‚   β”œβ”€β”€ speech/                # Speech processing modules
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ recognizer.py      # Speech recognition
β”‚   β”‚   └── tts_engine.py      # Text-to-speech engine
β”‚   β”œβ”€β”€ llm/                   # LLM processing modules
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ processor.py       # LLM request processing
β”‚   β”‚   └── reachy_mini_emotions.py  # Emotion system
β”‚   β”œβ”€β”€ input/                 # Input handling
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── keyboard_listener.py
β”‚   └── ui/                    # User interface modules
β”‚       └── __init__.py
β”œβ”€β”€ tests/                     # Test directory
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── test_*.py             # Test files
β”œβ”€β”€ CosyVoice/                 # CosyVoice TTS integration
β”œβ”€β”€ TTS/                       # Coqui TTS models and configs
β”œβ”€β”€ logs/                      # Application logs
β”œβ”€β”€ temp/                      # Temporary files
β”œβ”€β”€ temp_audio/                # Temporary audio files
β”œβ”€β”€ main.py                    # Main program entry point
β”œβ”€β”€ config.py                  # Configuration file
β”œβ”€β”€ requirements.txt           # Dependencies
β”œβ”€β”€ pytest.ini                # Pytest configuration
β”œβ”€β”€ conftest.py                # Test configuration
└── README.md                  # Project documentation

πŸ“‹ Prerequisites

  • Python 3.8+ (3.10+ recommended)
  • Ollama server running locally (for LLM processing)
  • Audio hardware (microphone and speakers)
  • Linux/macOS/Windows (tested on Linux)
  • NVIDIA Jetson (for edge deployment) - reComputer Mini (Jetson AGX Orin series)

πŸ› οΈ Installation

Standard Installation

1. Clone the Repository

git clone <repository-url>
cd voice-assistant

2. Install Python Dependencies

pip install -r requirements.txt -i https://pypi.jetson-ai-lab.io/

3. Install and Setup Ollama

# Install Ollama (visit https://ollama.ai for platform-specific instructions)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull qwen2.5:7b

4. Setup Coqui TTS (Optional)

# Install Coqui TTS if not already installed
pip install TTS

# The application will automatically download required models on first run

5. Configure Environment (Optional)

Create a .env file or set environment variables:

export OLLAMA_HOST="http://localhost:11434"
export OLLAMA_MODEL="qwen2.5:7b"
export COQUI_MODEL_NAME="tts_models/zh-CN/baker/tacotron2-DDC-GST"
export DEFAULT_VOLUME="1.5"
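Inside config.py, these variables can be read with sensible fallbacks. The snippet below is a sketch of that pattern; the actual variable handling in config.py may differ:

```python
import os

# Read configuration from the environment, falling back to the documented
# defaults. Names mirror the export lines above.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "qwen2.5:7b")
COQUI_MODEL_NAME = os.environ.get(
    "COQUI_MODEL_NAME", "tts_models/zh-CN/baker/tacotron2-DDC-GST"
)
DEFAULT_VOLUME = float(os.environ.get("DEFAULT_VOLUME", "1.5"))
```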

πŸš€ Usage

Basic Usage

# Start the voice assistant
python main.py

# Start with debug mode
python main.py --debug

# Use custom configuration
python main.py --config custom_config.py

# Check version
python main.py --version

Interactive Controls

  • R Key: Start recording
  • S Key: Stop recording and process
  • Ctrl+C: Exit application
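The R/S key handling boils down to a small state-aware dispatch. The real implementation in voice_assistant/input/keyboard_listener.py uses pynput and callbacks into the recording controller; this is a minimal stdlib sketch of the dispatch logic only:

```python
# Minimal sketch of the R/S key dispatch; class and attribute names here
# are illustrative, not the actual keyboard_listener.py API.
class RecordingKeyDispatcher:
    def __init__(self):
        self.recording = False
        self.events = []  # actions taken, recorded for illustration

    def handle_key(self, key: str) -> None:
        key = key.lower()
        if key == "r" and not self.recording:
            self.recording = True
            self.events.append("start_recording")
        elif key == "s" and self.recording:
            self.recording = False
            self.events.append("stop_and_process")

dispatcher = RecordingKeyDispatcher()
for k in ["r", "r", "s"]:  # the second "r" is ignored while recording
    dispatcher.handle_key(k)
print(dispatcher.events)  # ['start_recording', 'stop_and_process']
```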

Command Line Options

python main.py [OPTIONS]

Options:
  --debug              Enable debug mode with verbose logging
  --config FILE        Specify custom configuration file
  --log-file FILE      Specify log file path
  --no-audio           Disable audio functionality (test mode)
  --quiet              Reduce console output
  --version            Show version information
  --help               Show help message

πŸ§ͺ Testing

Run All Tests

pytest

Run Specific Tests

# Test core models
pytest tests/test_core_models.py

# Test with coverage
pytest --cov=voice_assistant

# Test with verbose output
pytest -v

Run Property-Based Tests

# Run hypothesis-based property tests
pytest tests/ -k "hypothesis"

πŸ—οΈ Core Components

Data Models

  • AudioData: Audio data representation with validation
  • RecognitionResult: Speech recognition results with confidence scores
  • LLMResponse: LLM responses with emotion data
  • StatusUpdate: System status updates with timestamps
  • SystemState: System state enumeration (IDLE, RECORDING, PROCESSING, etc.)
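A validated dataclass is the natural shape for these models. The sketch below illustrates the AudioData idea; field names and validation rules are assumptions and may not match voice_assistant/core/models.py exactly:

```python
from dataclasses import dataclass

# Illustrative AudioData model with validation; the actual fields in
# core/models.py may differ.
@dataclass
class AudioData:
    samples: bytes        # raw 16-bit PCM samples
    sample_rate: int = 16000
    channels: int = 1

    def __post_init__(self):
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        if self.channels < 1:
            raise ValueError("channels must be >= 1")

    @property
    def duration(self) -> float:
        # 16-bit PCM: 2 bytes per sample per channel
        return len(self.samples) / (2 * self.channels * self.sample_rate)

clip = AudioData(samples=b"\x00\x00" * 16000)  # one second of silence
print(round(clip.duration, 2))  # 1.0
```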

Interfaces

  • AudioRecorderInterface: Audio recording abstraction
  • SpeechRecognizerInterface: Speech recognition abstraction
  • LLMProcessorInterface: LLM processing abstraction
  • TTSEngineInterface: Text-to-speech abstraction
  • StateManagerInterface: State management abstraction
  • KeyboardListenerInterface: Keyboard input abstraction
  • RecordingControllerInterface: Recording control abstraction
  • VoiceAssistantAppInterface: Main application controller abstraction
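Each interface is an abstract base class that concrete components implement, which is what keeps the architecture modular and testable. A sketch of the pattern (the exact method signatures in core/interfaces.py may differ):

```python
from abc import ABC, abstractmethod

# Sketch of the interface pattern; the real TTSEngineInterface in
# core/interfaces.py may declare different methods.
class TTSEngineInterface(ABC):
    @abstractmethod
    def synthesize(self, text: str, emotion: str = "neutral") -> bytes:
        """Return synthesized audio for the given text."""

class DummyTTSEngine(TTSEngineInterface):
    """A stand-in implementation, useful for tests and --no-audio mode."""
    def synthesize(self, text: str, emotion: str = "neutral") -> bytes:
        return b""  # a real engine would return PCM audio

engine = DummyTTSEngine()
assert isinstance(engine, TTSEngineInterface)
```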

Key Features

Error Recovery System

  • Automatic error detection and classification
  • Component-specific recovery strategies
  • Graceful degradation and fallback mechanisms
  • Comprehensive error logging and statistics
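The retry-then-degrade pattern behind these bullets can be sketched in a few lines. The real core/error_recovery.py is more comprehensive (classification, per-component strategies, statistics); this only shows the shape:

```python
import time

# Hedged sketch of retry-with-fallback; not the actual error_recovery.py API.
def with_recovery(operation, fallback, retries=2, delay=0.0):
    """Try `operation`; retry on failure, then degrade to `fallback`."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    return fallback(last_error)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_recovery(flaky, fallback=lambda e: "degraded"))  # 'ok' on the 3rd try
```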

Emotional TTS

  • 80+ emotional expressions supported
  • LLM-driven emotion selection
  • Robotic integration with head movements
  • Audio quality optimization with noise reduction
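LLM-driven emotion selection means the reply text must carry an emotion label that the TTS and robot layers can act on. As a purely hypothetical illustration, assume the LLM prefixes replies with a bracketed tag; the actual protocol lives in llm/reachy_mini_emotions.py and may differ:

```python
import re

# Hypothetical: assume replies arrive as "[happy] <text>". The real tag
# format in reachy_mini_emotions.py is not reproduced here.
def split_emotion(reply: str, default: str = "neutral"):
    """Split a leading emotion tag off an LLM reply, if present."""
    match = re.match(r"^\[(\w+)\]\s*(.*)", reply, re.DOTALL)
    if match:
        return match.group(1), match.group(2)
    return default, reply

emotion, text = split_emotion("[happy] 你好！很高兴见到你。")
print(emotion)  # happy
```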

State Management

  • Real-time system state tracking
  • Event-driven status updates
  • Thread-safe state transitions
  • Comprehensive logging
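Thread-safe transitions typically mean a lock plus an explicit transition table. A minimal sketch using the SystemState names from above (the transition rules here are illustrative, not copied from core/state_manager.py):

```python
import threading
from enum import Enum, auto

# State names follow the SystemState enumeration above; the transition
# table is an illustrative assumption.
class SystemState(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()
    SPEAKING = auto()

ALLOWED = {
    SystemState.IDLE: {SystemState.RECORDING},
    SystemState.RECORDING: {SystemState.PROCESSING, SystemState.IDLE},
    SystemState.PROCESSING: {SystemState.SPEAKING, SystemState.IDLE},
    SystemState.SPEAKING: {SystemState.IDLE},
}

class StateManager:
    def __init__(self):
        self._state = SystemState.IDLE
        self._lock = threading.Lock()

    def transition(self, new_state: SystemState) -> bool:
        with self._lock:
            if new_state in ALLOWED[self._state]:
                self._state = new_state
                return True
            return False  # illegal transitions are rejected, not raised

    @property
    def state(self) -> SystemState:
        with self._lock:
            return self._state

mgr = StateManager()
assert mgr.transition(SystemState.RECORDING)
assert not mgr.transition(SystemState.SPEAKING)  # RECORDING -> SPEAKING is illegal
```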

βš™οΈ Configuration

Audio Settings

AUDIO_SAMPLE_RATE = 16000      # Audio sample rate
AUDIO_CHANNELS = 1             # Mono audio
MAX_RECORDING_DURATION = 300   # 5 minutes max
MIN_RECORDING_DURATION = 0.5   # 0.5 seconds min
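Clips can be checked against these bounds before being sent to recognition, so that too-short or runaway recordings are rejected early. A sketch of such a check (the helper name is an assumption):

```python
MAX_RECORDING_DURATION = 300   # seconds (5 minutes max)
MIN_RECORDING_DURATION = 0.5   # seconds

def recording_duration_ok(duration_s: float) -> bool:
    """Reject clips too short to recognize or too long to process."""
    return MIN_RECORDING_DURATION <= duration_s <= MAX_RECORDING_DURATION

assert recording_duration_ok(2.0)
assert not recording_duration_ok(0.1)
```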

TTS Settings

COQUI_MODEL_NAME = "tts_models/zh-CN/baker/tacotron2-DDC-GST"
DEFAULT_VOLUME = 1.5           # Audio volume multiplier
COQUI_NOISE_SCALE = 0.333      # Noise reduction
COQUI_DENOISER_STRENGTH = 0.005 # Denoiser strength

LLM Settings

OLLAMA_HOST = "http://localhost:11434"
OLLAMA_MODEL = "qwen2.5:7b"
LLM_TIMEOUT = 60               # Response timeout in seconds
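Ollama exposes a local REST API, so a non-streaming generation request against its /api/generate endpoint is assembled from exactly these three settings. The sketch below only builds the request (no network call); llm/processor.py may use the ollama Python client instead:

```python
import json

def build_ollama_request(host: str, model: str, prompt: str, timeout: int = 60):
    """Assemble a non-streaming request for Ollama's /api/generate endpoint."""
    url = f"{host}/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return url, json.dumps(payload), timeout

url, body, timeout = build_ollama_request(
    "http://localhost:11434", "qwen2.5:7b", "你好"
)
print(url)  # http://localhost:11434/api/generate
```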

πŸ€– Robotic Integration

The system integrates with robotic platforms and is optimized for NVIDIA Jetson-powered robots:

Reachy Mini Robot Support

  • Head Movement: Emotional head gestures during speech synthesis
  • Antenna Control: Visual feedback through antenna positioning
  • Synchronized Actions: Coordinated movement with TTS output
  • Jetson Integration: Optimized for Jetson Xavier NX/AGX platforms
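Synchronizing movement with TTS starts from a mapping of emotion labels to poses. The table below is purely hypothetical; the real control code drives the Reachy Mini SDK and is not reproduced here:

```python
# Hypothetical emotion-to-pose mapping; pose keys and values are
# illustrative, not the Reachy Mini SDK's actual parameters.
EMOTION_POSES = {
    "happy":   {"head_pitch_deg": -10, "antenna": "up"},
    "sad":     {"head_pitch_deg": 15,  "antenna": "down"},
    "neutral": {"head_pitch_deg": 0,   "antenna": "rest"},
}

def pose_for(emotion: str) -> dict:
    """Fall back to the neutral pose for unknown emotion labels."""
    return EMOTION_POSES.get(emotion, EMOTION_POSES["neutral"])

print(pose_for("happy")["antenna"])  # up
```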

Jetson Robot Deployment

For robotic applications on Jetson:

# Quick deployment for robots
./deploy_jetson.sh

# Start with robotic features
python main.py --config config_jetson.py

# Monitor robot performance
tegrastats

Supported Robot Platforms

  • Reachy Mini (Pollen Robotics) - Full integration
  • Custom Jetson Robots - Configurable GPIO/servo control
  • ROS Integration - Compatible with ROS/ROS2 (planned)

πŸ“Š Monitoring and Logging

Log Files

  • Application logs: logs/voice_assistant.log
  • Error tracking: Comprehensive error statistics
  • Performance metrics: Processing time tracking

System Status

# Get current system status
status = app.get_system_status()
print(f"State: {status['current_state']}")
print(f"Components: {status['components']}")

# Get error statistics
errors = app.get_error_statistics()
print(f"Total errors: {errors['total_errors']}")

πŸ”§ Development

Development Status

  • βœ… Core architecture and interfaces
  • βœ… Audio recording and playback
  • βœ… Speech recognition (FunASR)
  • βœ… LLM integration (Ollama)
  • βœ… TTS with emotional expressions
  • βœ… Error recovery system
  • βœ… Robotic integration (Reachy Mini)
  • βœ… NVIDIA Jetson optimization
  • βœ… Comprehensive testing
  • βœ… Production deployment scripts
  • ⏳ GUI interface (planned)
  • ⏳ Multi-language support (planned)
  • ⏳ ROS/ROS2 integration (planned)
  • ⏳ Docker containerization (planned)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Code Quality

# Format code
black voice_assistant/

# Lint code
flake8 voice_assistant/

# Type checking
mypy voice_assistant/

πŸ› οΈ Technology Stack

Core Technologies

  • Python 3.10+: Main programming language
  • FunASR: Speech recognition and basic TTS
  • Ollama: Local LLM inference server
  • Coqui TTS: Advanced text-to-speech synthesis
  • PyTorch: Deep learning framework for TTS models

Audio Processing

  • sounddevice: Real-time audio I/O
  • soundfile: Audio file handling
  • pygame: Audio playback
  • librosa: Audio analysis and processing
  • scipy: Signal processing and filtering

System Integration

  • pynput: Keyboard input handling
  • numpy: Numerical computations
  • threading: Concurrent processing
  • pathlib: Modern path handling

Testing and Quality

  • pytest: Testing framework
  • hypothesis: Property-based testing
  • pytest-cov: Code coverage analysis
  • black: Code formatting
  • flake8: Code linting
  • mypy: Static type checking

🀝 Acknowledgments

  • FunASR: For providing excellent speech recognition capabilities
  • Ollama: For local LLM inference infrastructure
  • Coqui TTS: For high-quality text-to-speech synthesis
  • Reachy Mini: For robotic platform integration

πŸ“ž Support

For questions, issues, or contributions:

  1. Check the Issues page
  2. Review the documentation
  3. Submit a detailed bug report or feature request

Note: This is an active development project. Features and APIs may change. Please check the latest documentation and release notes for updates.
