feat: Add real-time conversational TTS with Ollama#41

Open
AlaeddineMessadi wants to merge 18 commits into supertone-inc:main from AlaeddineMessadi:feature/realtime-conversational-tts

Conversation

@AlaeddineMessadi

🎯 Overview

This PR adds a complete real-time voice-to-voice conversational AI system built on top of Supertonic TTS. Users can have natural conversations with AI through voice input and output, with low-latency streaming and intelligent interruption handling.

✨ Features

Core Functionality

  • 🚀 Real-time Audio Streaming: Streams audio chunks as they're generated for ultra-low latency
  • 🤖 Ollama Integration: Seamless integration with local LLMs for natural conversations
  • 🎤 Voice Input: Browser-based speech recognition using Web Speech API
  • 📡 Multiple Protocols: Supports both Server-Sent Events (SSE) and WebSocket
  • 💬 Conversation History: Maintains context across multiple messages
  • 🎯 User Priority: AI automatically stops speaking when user starts talking

Technical Features

  • Low Latency: First audio chunk available within seconds
  • 🎭 Voice Styles: Support for all voice presets (M1, M2, F1, F2)
  • 🔧 Configurable: Adjustable denoising steps and speech speed
  • 📱 Responsive Design: Modern UI with icons, tooltips, and mobile-friendly layout
  • 🔄 Continuous Listening: Real-time mode for hands-free conversations

📁 What's Included

Server (real-time/server.js)

  • Express server with comprehensive logging and error handling
  • SSE and WebSocket streaming endpoints
  • Ollama API integration with streaming support
  • Smart phrase detection for natural speech breaks
  • Conversation history management
  • System prompt support for AI personality customization
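Ollama streams its chat responses as newline-delimited JSON, so the server must reassemble lines that arrive split across network chunks before it can extract tokens. A minimal sketch of that parsing step (hypothetical helper name `createNdjsonParser`; not the PR's actual code):

```javascript
// Sketch of parsing Ollama's streaming chat output (NDJSON). Each complete
// line is a JSON object whose message.content holds the next token; network
// chunks may split lines mid-JSON, so partial lines are buffered until whole.
function createNdjsonParser(onToken) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const obj = JSON.parse(line);
      if (obj.message && obj.message.content) onToken(obj.message.content);
    }
  };
}
```

The emitted tokens would then feed the phrase-detection step before synthesis.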

Clients

  • test-client.html: Simple test client for basic TTS streaming
  • conversation-client.html: Full-featured client with voice input, model selection, and real-time mode
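To keep streamed audio chunks from overlapping, the conversation client plays them strictly in sequence. One way to sketch that queue (hypothetical `createAudioQueue` helper; the client's real implementation may differ):

```javascript
// Sequential playback queue: each chunk starts only after the previous one
// finishes. The playback function is injected, so in the browser it could
// decode and play a WAV buffer; here it is any function returning a Promise.
function createAudioQueue(play) {
  let tail = Promise.resolve();
  return {
    enqueue(chunk) {
      tail = tail.then(() => play(chunk));
      return tail; // resolves when this chunk has finished playing
    },
  };
}
```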

Documentation

  • Complete README with setup instructions
  • API endpoint documentation
  • Usage examples and feature descriptions

🚀 Quick Start

Install dependencies

cd nodejs && npm install
cd ../real-time && npm install

Start Ollama

ollama serve
ollama pull llama3.2

Start the server

cd real-time
npm start

Open conversation-client.html in your browser to start a voice conversation!

🔧 API Endpoints

  • POST /stream - Basic TTS streaming (SSE)
  • POST /conversation - Conversational AI with Ollama (SSE)
  • WS /ws - WebSocket endpoint for bidirectional streaming
  • GET /health - Health check with Ollama status
  • GET /models - List available Ollama models
  • GET /voices - List available voice styles
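Clients consume the SSE endpoints by splitting the stream into `data:` frames. A rough helper for that step (the exact payload shape below is an assumption, not documented by the PR):

```javascript
// Parse raw SSE text into JSON payloads. SSE separates events with a blank
// line; an event's data may span several "data:" lines, which are joined.
function parseSseEvents(raw) {
  const events = [];
  for (const block of raw.split('\n\n')) {
    const data = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trim())
      .join('\n');
    if (data) events.push(JSON.parse(data));
  }
  return events;
}
```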

📝 Implementation Details

  • Phrase Detection: Intelligent chunking that prioritizes sentence endings, avoids breaking words, and ensures natural speech flow
  • Audio Queue: Sequential playback of audio chunks to prevent overlapping
  • Error Handling: Comprehensive logging and error recovery
  • Self-contained: All code changes are isolated to the real-time/ folder
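The phrase-detection idea can be sketched as follows (illustrative only; the thresholds and helper name are assumptions, not the PR's actual code):

```javascript
// Illustrative phrase chunker for streaming TTS: emit a chunk as soon as a
// natural break appears, prefer sentence endings over clause punctuation,
// and never split in the middle of a word. Thresholds are arbitrary examples.
function extractPhrase(buffer, minLen = 20, maxLen = 120) {
  if (buffer.length < minLen) return null; // not enough text yet
  const slice = buffer.slice(0, maxLen);
  // 1) Prefer sentence endings, then clause punctuation.
  for (const re of [/[.!?](?=\s|$)/g, /[,;:](?=\s|$)/g]) {
    let m;
    while ((m = re.exec(slice)) !== null) {
      if (m.index >= minLen - 1) {
        return {
          phrase: buffer.slice(0, m.index + 1).trim(),
          rest: buffer.slice(m.index + 1).trimStart(),
        };
      }
    }
  }
  if (buffer.length < maxLen) return null; // keep buffering
  // 2) Forced split: cut at the last space so words stay whole.
  const space = slice.lastIndexOf(' ');
  const cut = space > 0 ? space : maxLen;
  return {
    phrase: buffer.slice(0, cut).trim(),
    rest: buffer.slice(cut).trimStart(),
  };
}
```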

🧪 Testing

The implementation includes:

  • Test client for basic TTS functionality
  • Full conversation client with voice input
  • Error handling and edge case management
  • Cross-browser compatibility (Web Speech API)

📋 Requirements

  • Node.js v18+
  • Ollama installed and running
  • Supertonic assets (ONNX models) in parent directory
  • Browser with Web Speech API support (Chrome, Edge, Safari)
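Voice input relies on the browser's Web Speech API, which is vendor-prefixed in Chromium-based browsers. A guarded sketch (the `onResult` callback shape is an assumption for illustration):

```javascript
// Guarded sketch of browser voice input via the Web Speech API. In Node or
// unsupported browsers it returns null instead of throwing. The onResult
// callback receives (text, isFinal) as an illustrative convention.
function startVoiceInput(onResult) {
  const SR =
    typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!SR) return null; // unsupported environment
  const rec = new SR();
  rec.lang = 'en-US';
  rec.interimResults = true; // stream partial transcripts as the user speaks
  rec.onresult = (event) => {
    const text = Array.from(event.results)
      .map((r) => r[0].transcript)
      .join(' ');
    onResult(text, event.results[event.results.length - 1].isFinal);
  };
  rec.start();
  return rec;
}
```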

🎨 UI Features

  • Model selection dropdown
  • Voice style selection
  • Adjustable steps and speed controls
  • Real-time mode toggle
  • Status indicators with tooltips
  • Responsive design for mobile and desktop
  • Chat history with scrollable container

🔄 Git History

This PR includes 18 commits with a clean, logical progression:

  1. Initial setup
  2. Basic TTS streaming server
  3. Ollama integration
  4. WebSocket support
  5. Test client
  6. Conversation client
  7. Documentation updates
  8. Bug fixes and path corrections

All changes are self-contained in the real-time/ directory and don't modify any existing code in the repository.

📜 Commit Messages

- Add package.json with dependencies
- Add basic README with features and quick start
- Implement Express server with logging and error handling
- Add TTS model loading and initialization
- Implement Server-Sent Events (SSE) streaming endpoint
- Add audio chunk generation and WAV buffer conversion
- Add health check and voice styles endpoints
- Support configurable steps and speed parameters
- Add Ollama API integration with streaming support
- Implement conversation history management
- Add system prompt for AI assistant personality
- Implement smart phrase detection for natural speech breaks
- Add /conversation endpoint for voice-to-voice conversations
- Add /models endpoint to list available Ollama models
- Update health check to include Ollama status
- Support conversation history endpoints
- Add WebSocket server with /ws endpoint
- Implement WebSocket-based TTS synthesis
- Add WebSocket-based conversational streaming
- Support both synthesize and conversation message types
- Add connection lifecycle management and error handling
- Add HTML test client with SSE streaming support
- Implement audio playback for streamed chunks
- Add voice, steps, and speed controls
- Add status display and error handling
- Add full-featured conversation client with Ollama integration
- Implement Web Speech API for voice transcription
- Add audio queue for sequential playback
- Add model selection dropdown
- Add conversation history display
- Add real-time text rendering and voice playback
- Support configurable voice, steps, and speed
- Add detailed usage instructions for both clients
- Document API endpoints
- List features and controls
- Add onnxruntime-node to package.json dependencies
- Required by nodejs/helper.js for TTS model loading
- Update README to include npm install in nodejs directory
- Required because server imports from ../nodejs/helper.js which needs onnxruntime-node
- Remove chunkText import from nodejs/helper.js
- Add chunkText function directly to server.js
- Keeps real-time folder self-contained without modifying other repo code
- Change assets path from ../assets to ../../assets
- Assets are in the parent directory, not sibling to real-time
- Update README to reflect correct path
- Document browser-based Web Speech API (default)
- Add information about server-side Whisper integration
- Reference whisper.cpp repository for users who want higher accuracy
- Include example integration code
- Change default model recommendation to llama3.2:1b for lower latency
- Add comment about full llama3.2 for better quality
- Optimize for real-time conversational use case
- Create conversation-client.css with all styles from conversation-client.html
- Create test-client.css with all styles from test-client.html
- Update HTML files to link to external CSS files
- Improves code organization and maintainability
- Create styles/ directory
- Move conversation-client.css and test-client.css into styles/
- Update HTML files to reference new CSS paths
- Better project organization
- Reuse existing media stream instead of requesting new one each time
- Remove enumerateDevices call on page load that triggers permission prompt
- Keep media stream open for reuse between recordings
- Only request permission once when user clicks voice button
- Improves user experience by avoiding repeated permission dialogs
- Remove duplicate const audioContext declaration
- Reuse single audioContext for both testing and monitoring
- Fixes SyntaxError: Identifier 'audioContext' has already been declared
@ANLGBOY (Collaborator) commented Dec 15, 2025

Thank you for sharing your implementation of a conversational chat app. I tested it on my local laptop, and it works very well. However, we try to keep this repository focused on minimal examples for each programming language. Therefore, rather than merging your code into this repository, we encourage you to publish it as a separate repository. We would be happy to feature it in the Built with Supertonic section.
