A Python-based voice loop system combining Deepgram STT with Fish Speech TTS for offline-capable conversational AI. Designed as a foundation for edge use cases like wearables and accessibility applications.
- Hybrid Online/Offline Mode: Uses Deepgram for high-accuracy streaming STT when online and falls back to local Whisper when offline
- Local TTS: Fish Speech for fully offline text-to-speech with emotion and speed control
- LLM Integration: Support for OpenAI, Anthropic, and local LLMs (Ollama)
- Voice Activity Detection: Silero VAD for accurate speech detection
- Low Latency: Streaming architecture for responsive voice interactions
- Interruption Handling: User can interrupt assistant responses
Microphone → VAD → STT (Deepgram/Whisper) → LLM (optional) → TTS (Fish Speech) → Speaker
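A minimal sketch of how these stages could be wired together (the class and method names below are illustrative, not the project's actual API; the real orchestration lives in `src/pipeline/`):

```python
# Illustrative wiring of the pipeline above; all object names are hypothetical.
async def voice_loop(mic, vad, stt, llm, tts, speaker):
    """Microphone -> VAD -> STT -> LLM -> TTS -> Speaker, one utterance at a time."""
    async for chunk in mic.stream():
        if not vad.is_speech(chunk):           # Silero VAD gates out silence
            continue
        text = await stt.transcribe(chunk)     # Deepgram online, Whisper offline
        reply = await llm.complete(text) if llm else text  # echo mode when llm is None
        async for audio in tts.synthesize(reply):          # Fish Speech, streamed
            if vad.user_speaking():            # interruption: stop mid-response
                break
            await speaker.play(audio)
```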
```bash
# Clone the repository
cd deepgram-fish-speech-workflow

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -e ".[dev,llm]"

# Copy example environment file
cp .env.example .env

# Edit .env with your API keys
# DEEPGRAM_API_KEY=your_key_here
# OPENAI_API_KEY=your_key_here  # Optional, for LLM

# Test audio I/O
python examples/audio_test.py --list-devices
python examples/audio_test.py --tone
```
```bash
# Basic voice loop (echo mode)
python examples/basic_loop.py

# Fully offline mode
python examples/basic_loop.py --offline

# Conversational AI with LLM
python examples/conversational_ai.py --llm openai
```

Configuration can be done via:

- Environment variables (see `.env.example`)
- YAML config files (see `configs/`)
- Programmatic configuration
```python
from src.core.config import VoiceLoopConfig

config = VoiceLoopConfig.from_yaml("configs/low_latency.yaml")
```

| Setting | Description | Default |
|---|---|---|
| `mode` | `online`, `offline`, or `hybrid` | `hybrid` |
| `stt_provider` | `deepgram` or `whisper` | `deepgram` |
| `tts_provider` | `fish_speech` | `fish_speech` |
| `echo_mode` | Echo input without LLM | `false` |
| `streaming_mode` | Stream TTS for lower latency | `true` |
| `interruption_enabled` | Allow user to interrupt | `true` |
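As a concrete illustration, a preset such as `configs/low_latency.yaml` might look like the following (the exact schema is defined by `VoiceLoopConfig`; the values here just mirror the table above):

```yaml
# Hypothetical preset contents; keys mirror the settings table above.
mode: hybrid                # online, offline, or hybrid
stt_provider: deepgram      # local whisper is the offline fallback
tts_provider: fish_speech
echo_mode: false            # true bypasses the LLM and echoes input
streaming_mode: true        # stream TTS chunks for lower latency
interruption_enabled: true  # let the user cut off a response
```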
```
deepgram-fish-speech-workflow/
├── src/
│   ├── core/       # Types, config, protocols
│   ├── stt/        # Speech-to-text providers
│   ├── tts/        # Text-to-speech providers
│   ├── llm/        # LLM providers
│   ├── audio/      # Audio I/O and VAD
│   └── pipeline/   # Voice loop orchestration
├── examples/       # Usage examples
├── configs/        # Configuration presets
└── tests/          # Test suite
```
- Deepgram Nova-2: 95%+ accuracy, streaming, diarization (100+ languages)
- Whisper (local): Offline fallback via faster-whisper
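Hybrid mode comes down to a connectivity check plus provider selection. A rough sketch of that idea (the probe and provider objects below are illustrative; the real logic lives in `src/stt/`):

```python
# Illustrative hybrid STT selection; actual providers live in src/stt/.
import socket


def deepgram_reachable(timeout: float = 1.0) -> bool:
    """Cheap TCP probe; a production check might hit Deepgram's API directly."""
    try:
        socket.create_connection(("api.deepgram.com", 443), timeout=timeout).close()
        return True
    except OSError:
        return False


def pick_stt(deepgram_stt, whisper_stt, mode: str = "hybrid"):
    """Prefer Deepgram for accuracy; use local Whisper when offline."""
    if mode == "offline" or (mode == "hybrid" and not deepgram_reachable()):
        return whisper_stt
    return deepgram_stt
```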
- Fish Speech: Local inference, emotion control, duration control
- OpenAI: GPT-4o, GPT-4o-mini
- Anthropic: Claude 3.5 Sonnet
- Local: Ollama (Llama 3, etc.)
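Because Ollama exposes an OpenAI-compatible endpoint, a single client can cover both cloud and local backends. A sketch assuming the official `openai` package and an Ollama server on its default port:

```python
# Sketch of swapping LLM backends via the OpenAI-compatible interface.
# Assumes `pip install openai` and, for local use, an Ollama server
# running at its default address (http://localhost:11434).
from openai import OpenAI


def make_llm_client(provider: str) -> OpenAI:
    if provider == "ollama":
        return OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    return OpenAI()  # reads OPENAI_API_KEY from the environment


client = make_llm_client("openai")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
```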
This foundation supports edge applications like:
- Real-time Coaching: Continuous listening with context-aware feedback
- Accessibility/AAC: Non-verbal cue synthesis, emotion detection
- Voice Dubbing: Audio-in to translated audio-out with timing preservation
- Wearables: Low-latency voice interfaces for smart devices
```bash
# Run tests
pytest

# Type checking
mypy src

# Linting
ruff check src
```

License: MIT