A real-time voice interaction application built with Python, integrating FunASR speech recognition, Ollama LLM, and advanced TTS capabilities with emotional expression support.
- Real-time Voice Recognition: Powered by FunASR for accurate Chinese speech recognition
- Intelligent Conversation: Local LLM processing using Ollama with Qwen2.5 model
- Emotional TTS: Advanced text-to-speech with Coqui TTS and emotional expression
- Robotic Integration: Support for Reachy Mini robot with head movement and antenna control
- Error Recovery: Comprehensive error handling and automatic recovery mechanisms
- Keyboard Control: Simple R/S key controls for recording start/stop
- Audio Quality Optimization: Built-in noise reduction and audio enhancement
- Modular Architecture: Clean, extensible design with well-defined interfaces
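At its core, the features above compose into a single record → recognize → respond → speak loop. The sketch below stubs out the heavy components (FunASR, Ollama, Coqui TTS) to show only the data flow; all function names here are illustrative, not this project's actual API:

```python
# Minimal sketch of the processing loop with the heavy components stubbed.
# Names below are illustrative, not the project's real module API.

def recognize(audio: bytes) -> str:
    """Stub for FunASR speech recognition."""
    return "你好"  # pretend the user said "hello"

def ask_llm(text: str) -> str:
    """Stub for an Ollama chat request (model: qwen2.5:7b)."""
    return f"Echo: {text}"

def speak(text: str) -> str:
    """Stub for Coqui TTS synthesis; returns the text it would speak."""
    return text

def handle_utterance(audio: bytes) -> str:
    """One pass through the pipeline: audio in, spoken reply out."""
    text = recognize(audio)
    reply = ask_llm(text)
    return speak(reply)
```

In the real application each stage runs against a live backend (microphone capture, a local Ollama server, a loaded TTS model), but the control flow follows this shape.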
```
voice_assistant/
├── voice_assistant/              # Main application package
│   ├── __init__.py
│   ├── app.py                    # Main application controller
│   ├── core/                     # Core modules
│   │   ├── __init__.py
│   │   ├── models.py             # Data models
│   │   ├── interfaces.py         # Interface definitions
│   │   ├── state_manager.py      # System state management
│   │   └── error_recovery.py     # Error recovery system
│   ├── audio/                    # Audio processing modules
│   │   ├── __init__.py
│   │   ├── recorder.py           # Audio recording
│   │   └── recording_controller.py
│   ├── speech/                   # Speech processing modules
│   │   ├── __init__.py
│   │   ├── recognizer.py         # Speech recognition
│   │   └── tts_engine.py         # Text-to-speech engine
│   ├── llm/                      # LLM processing modules
│   │   ├── __init__.py
│   │   ├── processor.py          # LLM request processing
│   │   └── reachy_mini_emotions.py  # Emotion system
│   ├── input/                    # Input handling
│   │   ├── __init__.py
│   │   └── keyboard_listener.py
│   └── ui/                       # User interface modules
│       └── __init__.py
├── tests/                        # Test directory
│   ├── __init__.py
│   └── test_*.py                 # Test files
├── CosyVoice/                    # CosyVoice TTS integration
├── TTS/                          # Coqui TTS models and configs
├── logs/                         # Application logs
├── temp/                         # Temporary files
├── temp_audio/                   # Temporary audio files
├── main.py                       # Main program entry point
├── config.py                     # Configuration file
├── requirements.txt              # Dependencies
├── pytest.ini                    # Pytest configuration
├── conftest.py                   # Test configuration
└── README.md                     # Project documentation
```
- Python 3.8+ (3.10+ recommended)
- Ollama server running locally (for LLM processing)
- Audio hardware (microphone and speakers)
- Linux/macOS/Windows (tested on Linux)
- NVIDIA Jetson (optional, for edge deployment), e.g. reComputer Mini (Jetson AGX Orin series)
```bash
git clone <repository-url>
cd voice-assistant
pip install -r requirements.txt -i https://pypi.jetson-ai-lab.io/
```

```bash
# Install Ollama (visit https://ollama.ai for platform-specific instructions)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull qwen2.5:7b
```

```bash
# Install Coqui TTS if not already installed
pip install TTS

# The application will automatically download required models on first run
```

Create a `.env` file or set environment variables:

```bash
export OLLAMA_HOST="http://localhost:11434"
export OLLAMA_MODEL="qwen2.5:7b"
export COQUI_MODEL_NAME="tts_models/zh-CN/baker/tacotron2-DDC-GST"
export DEFAULT_VOLUME="1.5"
```

```bash
# Start the voice assistant
python main.py

# Start with debug mode
python main.py --debug

# Use custom configuration
python main.py --config custom_config.py

# Check version
python main.py --version
```

- R Key: Start recording
- S Key: Stop recording and process
- Ctrl+C: Exit application
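Key handling of this kind usually reduces to a small pure dispatch function that a `pynput` listener calls on each press. This is a hedged sketch: `dispatch_key` and the action strings are illustrative, not the names actually used in `keyboard_listener.py`.

```python
# Illustrative R/S key dispatcher; pynput wiring is shown in comments below.

def dispatch_key(char: str, recording: bool) -> tuple:
    """Map a key press to an (action, new_recording_state) pair."""
    c = char.lower()
    if c == "r" and not recording:
        return ("start_recording", True)
    if c == "s" and recording:
        return ("stop_and_process", False)
    return ("ignore", recording)

# With pynput, the dispatcher would be wired up roughly like this:
#   from pynput import keyboard
#   def on_press(key):
#       if hasattr(key, "char") and key.char:
#           action, new_state = dispatch_key(key.char, state.recording)
#           ...  # act on the returned action
#   keyboard.Listener(on_press=on_press).start()
```

Keeping the dispatch logic separate from the listener makes it testable without a real keyboard or display.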
```
python main.py [OPTIONS]

Options:
  --debug          Enable debug mode with verbose logging
  --config FILE    Specify custom configuration file
  --log-file FILE  Specify log file path
  --no-audio       Disable audio functionality (test mode)
  --quiet          Reduce console output
  --version        Show version information
  --help           Show help message
```

```bash
# Run the full test suite
pytest

# Test core models
pytest tests/test_core_models.py

# Test with coverage
pytest --cov=voice_assistant

# Test with verbose output
pytest -v

# Run hypothesis-based property tests
pytest tests/ -k "hypothesis"
```

- AudioData: Audio data representation with validation
- RecognitionResult: Speech recognition results with confidence scores
- LLMResponse: LLM responses with emotion data
- StatusUpdate: System status updates with timestamps
- SystemState: System state enumeration (IDLE, RECORDING, PROCESSING, etc.)
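A minimal sketch of what such validated data models might look like as dataclasses; the field names below are assumptions for illustration, not the actual definitions in `core/models.py`:

```python
from dataclasses import dataclass, field
import time

# Hedged sketch of the data models; field names are assumptions.

@dataclass
class AudioData:
    """Raw audio plus its format, validated on construction."""
    samples: bytes
    sample_rate: int = 16000
    channels: int = 1

    def __post_init__(self):
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")

@dataclass
class RecognitionResult:
    """Recognized text with a confidence score in [0.0, 1.0]."""
    text: str
    confidence: float

@dataclass
class StatusUpdate:
    """A system status change, stamped with the current time."""
    state: str
    timestamp: float = field(default_factory=time.time)
```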
- AudioRecorderInterface: Audio recording abstraction
- SpeechRecognizerInterface: Speech recognition abstraction
- LLMProcessorInterface: LLM processing abstraction
- TTSEngineInterface: Text-to-speech abstraction
- StateManagerInterface: State management abstraction
- KeyboardListenerInterface: Keyboard input abstraction
- RecordingControllerInterface: Recording control abstraction
- VoiceAssistantAppInterface: Main application controller abstraction
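Interfaces like these are typically expressed as Python abstract base classes. The method name and signature below are assumptions for illustration, not the actual contents of `core/interfaces.py`:

```python
from abc import ABC, abstractmethod
from typing import Optional

# Illustrative shape of one interface; the method signature is an assumption.

class TTSEngineInterface(ABC):
    @abstractmethod
    def synthesize(self, text: str, emotion: Optional[str] = None) -> bytes:
        """Return raw audio bytes for the given text."""

class DummyTTS(TTSEngineInterface):
    """A trivial concrete engine, useful for tests and --no-audio mode."""
    def synthesize(self, text: str, emotion: Optional[str] = None) -> bytes:
        return text.encode("utf-8")
```

Because the interface is abstract, attempting to instantiate it directly raises `TypeError`, which keeps implementations honest about the contract.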
- Automatic error detection and classification
- Component-specific recovery strategies
- Graceful degradation and fallback mechanisms
- Comprehensive error logging and statistics
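One common shape for such a system is a registry of per-component recovery strategies with a graceful-degradation fallback. A hedged sketch follows; the class and method names are illustrative, not the actual `error_recovery.py` API:

```python
# Illustrative error recovery registry with per-component strategies,
# a degradation fallback, and simple error statistics.

class ErrorRecovery:
    def __init__(self):
        self._strategies = {}
        self.stats = {"total_errors": 0}

    def register(self, component: str, strategy):
        """Attach a recovery strategy (error -> action name) to a component."""
        self._strategies[component] = strategy

    def handle(self, component: str, error: Exception) -> str:
        """Record the error and run the component's strategy, or degrade."""
        self.stats["total_errors"] += 1
        strategy = self._strategies.get(component, lambda e: "degraded")
        return strategy(error)
```

For example, a TTS failure could map to a restart action while an unregistered component simply degrades.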
- 80+ emotional expressions supported
- LLM-driven emotion selection
- Robotic integration with head movements
- Audio quality optimization with noise reduction
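LLM-driven emotion selection generally means the model tags its reply with an emotion name, which is then looked up in an expression table. The sketch below is illustrative only; the real table of 80+ expressions lives in `reachy_mini_emotions.py`, and these names are assumptions:

```python
# Illustrative emotion-tag -> expression lookup with a neutral fallback.

EXPRESSIONS = {
    "happy": "antennas_up",   # hypothetical expression names
    "sad": "head_down",
    "neutral": "rest",
}

def select_expression(emotion_tag: str) -> str:
    """Map an LLM-provided emotion tag to a robot expression,
    falling back to the neutral pose for unknown tags."""
    return EXPRESSIONS.get(emotion_tag.lower(), EXPRESSIONS["neutral"])
```

The fallback matters: the LLM occasionally emits tags outside the supported set, and the robot should default to a neutral pose rather than fail.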
- Real-time system state tracking
- Event-driven status updates
- Thread-safe state transitions
- Comprehensive logging
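Thread-safe state transitions are usually implemented as an enum plus a lock-guarded transition table. This is a sketch under assumptions; the actual states and allowed transitions are defined in `core/state_manager.py`:

```python
import threading
from enum import Enum, auto

class SystemState(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()
    SPEAKING = auto()

# Allowed transitions (an assumption -- the real table may differ).
_TRANSITIONS = {
    SystemState.IDLE: {SystemState.RECORDING},
    SystemState.RECORDING: {SystemState.PROCESSING, SystemState.IDLE},
    SystemState.PROCESSING: {SystemState.SPEAKING, SystemState.IDLE},
    SystemState.SPEAKING: {SystemState.IDLE},
}

class StateManager:
    def __init__(self):
        self._state = SystemState.IDLE
        self._lock = threading.Lock()

    @property
    def state(self) -> SystemState:
        with self._lock:
            return self._state

    def transition(self, new_state: SystemState) -> bool:
        """Atomically apply a transition; return False if it is illegal."""
        with self._lock:
            if new_state in _TRANSITIONS[self._state]:
                self._state = new_state
                return True
            return False
```

Rejecting illegal transitions (e.g. jumping straight from IDLE to SPEAKING) keeps concurrent callers from corrupting the pipeline's state.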
```python
# Audio settings
AUDIO_SAMPLE_RATE = 16000        # Audio sample rate (Hz)
AUDIO_CHANNELS = 1               # Mono audio
MAX_RECORDING_DURATION = 300     # 5 minutes max
MIN_RECORDING_DURATION = 0.5     # 0.5 seconds min

# TTS settings
COQUI_MODEL_NAME = "tts_models/zh-CN/baker/tacotron2-DDC-GST"
DEFAULT_VOLUME = 1.5             # Audio volume multiplier
COQUI_NOISE_SCALE = 0.333        # Noise reduction
COQUI_DENOISER_STRENGTH = 0.005  # Denoiser strength

# LLM settings
OLLAMA_HOST = "http://localhost:11434"
OLLAMA_MODEL = "qwen2.5:7b"
LLM_TIMEOUT = 60                 # Response timeout in seconds
```

The system supports integration with robotic platforms, particularly optimized for NVIDIA Jetson-powered robots:
- Head Movement: Emotional head gestures during speech synthesis
- Antenna Control: Visual feedback through antenna positioning
- Synchronized Actions: Coordinated movement with TTS output
- Jetson Integration: Optimized for Jetson Xavier NX/AGX platforms
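Synchronizing gestures with speech typically means running the head movement on a worker thread while TTS playback occupies the main thread. The sketch below stubs both sides out; the actual Reachy Mini motion API is different, and these names are purely illustrative:

```python
import threading

# Illustrative coordination of a head gesture with TTS playback.
# Both callables are stubs; real robot and audio APIs differ.

def speak_with_gesture(play_audio, move_head) -> list:
    """Run playback and a head gesture concurrently; return the event log."""
    log = []
    worker = threading.Thread(
        target=lambda: (move_head(), log.append("moved"))
    )
    worker.start()          # gesture runs in the background...
    play_audio()            # ...while audio plays in the foreground
    log.append("spoke")
    worker.join()           # wait for the gesture to finish
    return log
```

The join at the end guarantees the gesture completes before the system returns to idle, even if playback finishes first.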
For robotic applications on Jetson:

```bash
# Quick deployment for robots
./deploy_jetson.sh

# Start with robotic features
python main.py --config config_jetson.py

# Monitor robot performance
tegrastats
```

- Reachy Mini (Pollen Robotics) - Full integration
- Custom Jetson Robots - Configurable GPIO/servo control
- ROS Integration - Compatible with ROS/ROS2 (planned)
- Application logs: `logs/voice_assistant.log`
- Error tracking: Comprehensive error statistics
- Performance metrics: Processing time tracking
```python
# Get current system status
status = app.get_system_status()
print(f"State: {status['current_state']}")
print(f"Components: {status['components']}")

# Get error statistics
errors = app.get_error_statistics()
print(f"Total errors: {errors['total_errors']}")
```

- ✅ Core architecture and interfaces
- ✅ Audio recording and playback
- ✅ Speech recognition (FunASR)
- ✅ LLM integration (Ollama)
- ✅ TTS with emotional expressions
- ✅ Error recovery system
- ✅ Robotic integration (Reachy Mini)
- ✅ NVIDIA Jetson optimization
- ✅ Comprehensive testing
- ✅ Production deployment scripts
- ⏳ GUI interface (planned)
- ⏳ Multi-language support (planned)
- ⏳ ROS/ROS2 integration (planned)
- ⏳ Docker containerization (planned)
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
```bash
# Format code
black voice_assistant/

# Lint code
flake8 voice_assistant/

# Type checking
mypy voice_assistant/
```

- Python 3.10+: Main programming language
- FunASR: Speech recognition and basic TTS
- Ollama: Local LLM inference server
- Coqui TTS: Advanced text-to-speech synthesis
- PyTorch: Deep learning framework for TTS models
- sounddevice: Real-time audio I/O
- soundfile: Audio file handling
- pygame: Audio playback
- librosa: Audio analysis and processing
- scipy: Signal processing and filtering
- pynput: Keyboard input handling
- numpy: Numerical computations
- threading: Concurrent processing
- pathlib: Modern path handling
- pytest: Testing framework
- hypothesis: Property-based testing
- pytest-cov: Code coverage analysis
- black: Code formatting
- flake8: Code linting
- mypy: Static type checking
- FunASR: For providing excellent speech recognition capabilities
- Ollama: For local LLM inference infrastructure
- Coqui TTS: For high-quality text-to-speech synthesis
- Reachy Mini: For robotic platform integration
For questions, issues, or contributions:
- Check the Issues page
- Review the documentation
- Submit a detailed bug report or feature request
Note: This is an active development project. Features and APIs may change. Please check the latest documentation and release notes for updates.