feat: Add real-time conversational TTS with Ollama#41

Open
AlaeddineMessadi wants to merge 18 commits into supertone-inc:main from AlaeddineMessadi:feature/realtime-conversational-tts

Conversation

@AlaeddineMessadi

🎯 Overview

This PR adds a complete real-time voice-to-voice conversational AI system built on top of Supertonic TTS. Users can have natural conversations with AI through voice input and output, with low-latency streaming and intelligent interruption handling.

✨ Features

Core Functionality

  • 🚀 Real-time Audio Streaming: Streams audio chunks as they're generated for ultra-low latency
  • 🤖 Ollama Integration: Seamless integration with local LLMs for natural conversations
  • 🎤 Voice Input: Browser-based speech recognition using Web Speech API
  • 📡 Multiple Protocols: Supports both Server-Sent Events (SSE) and WebSocket
  • 💬 Conversation History: Maintains context across multiple messages
  • 🎯 User Priority: AI automatically stops speaking when user starts talking

Technical Features

  • Low Latency: First audio chunk available within seconds
  • 🎭 Voice Styles: Support for all voice presets (M1, M2, F1, F2)
  • 🔧 Configurable: Adjustable denoising steps and speech speed
  • 📱 Responsive Design: Modern UI with icons, tooltips, and mobile-friendly layout
  • 🔄 Continuous Listening: Real-time mode for hands-free conversations

📁 What's Included

Server (real-time/server.js)

  • Express server with comprehensive logging and error handling
  • SSE and WebSocket streaming endpoints
  • Ollama API integration with streaming support
  • Smart phrase detection for natural speech breaks
  • Conversation history management
  • System prompt support for AI personality customization
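Ollama streams its chat responses as newline-delimited JSON, so the server must reassemble lines that arrive split across network chunks before it can extract tokens. A minimal sketch of that parsing step (hypothetical helper name `createNdjsonParser`; not the PR's actual code):

```javascript
// Sketch of parsing Ollama's streaming chat output (NDJSON). Each complete
// line is a JSON object whose message.content holds the next token; network
// chunks may split lines mid-JSON, so partial lines are buffered until whole.
function createNdjsonParser(onToken) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const obj = JSON.parse(line);
      if (obj.message && obj.message.content) onToken(obj.message.content);
    }
  };
}
```

The emitted tokens would then feed the phrase-detection step before synthesis.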

Clients

  • test-client.html: Simple test client for basic TTS streaming
  • conversation-client.html: Full-featured client with voice input, model selection, and real-time mode
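To keep streamed audio chunks from overlapping, the conversation client plays them strictly in sequence. One way to sketch that queue (hypothetical `createAudioQueue` helper; the client's real implementation may differ):

```javascript
// Sequential playback queue: each chunk starts only after the previous one
// finishes. The playback function is injected, so in the browser it could
// decode and play a WAV buffer; here it is any function returning a Promise.
function createAudioQueue(play) {
  let tail = Promise.resolve();
  return {
    enqueue(chunk) {
      tail = tail.then(() => play(chunk));
      return tail; // resolves when this chunk has finished playing
    },
  };
}
```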

Documentation

  • Complete README with setup instructions
  • API endpoint documentation
  • Usage examples and feature descriptions

🚀 Quick Start

Install dependencies

cd nodejs && npm install
cd ../real-time && npm install

Start Ollama

ollama serve
ollama pull llama3.2

Start the server

cd real-time
npm start

Open conversation-client.html in your browser to start a voice conversation!

🔧 API Endpoints

  • POST /stream - Basic TTS streaming (SSE)
  • POST /conversation - Conversational AI with Ollama (SSE)
  • WS /ws - WebSocket endpoint for bidirectional streaming
  • GET /health - Health check with Ollama status
  • GET /models - List available Ollama models
  • GET /voices - List available voice styles
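Clients consume the SSE endpoints by splitting the stream into `data:` frames. A rough helper for that step (the exact payload shape below is an assumption, not documented by the PR):

```javascript
// Parse raw SSE text into JSON payloads. SSE separates events with a blank
// line; an event's data may span several "data:" lines, which are joined.
function parseSseEvents(raw) {
  const events = [];
  for (const block of raw.split('\n\n')) {
    const data = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trim())
      .join('\n');
    if (data) events.push(JSON.parse(data));
  }
  return events;
}
```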

📝 Implementation Details

  • Phrase Detection: Intelligent chunking that prioritizes sentence endings, avoids breaking words, and ensures natural speech flow
  • Audio Queue: Sequential playback of audio chunks to prevent overlapping
  • Error Handling: Comprehensive logging and error recovery
  • Self-contained: All code changes are isolated to the real-time/ folder
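The phrase-detection idea can be sketched as follows (illustrative only; the thresholds and helper name are assumptions, not the PR's actual code):

```javascript
// Illustrative phrase chunker for streaming TTS: emit a chunk as soon as a
// natural break appears, prefer sentence endings over clause punctuation,
// and never split in the middle of a word. Thresholds are arbitrary examples.
function extractPhrase(buffer, minLen = 20, maxLen = 120) {
  if (buffer.length < minLen) return null; // not enough text yet
  const slice = buffer.slice(0, maxLen);
  // 1) Prefer sentence endings, then clause punctuation.
  for (const re of [/[.!?](?=\s|$)/g, /[,;:](?=\s|$)/g]) {
    let m;
    while ((m = re.exec(slice)) !== null) {
      if (m.index >= minLen - 1) {
        return {
          phrase: buffer.slice(0, m.index + 1).trim(),
          rest: buffer.slice(m.index + 1).trimStart(),
        };
      }
    }
  }
  if (buffer.length < maxLen) return null; // keep buffering
  // 2) Forced split: cut at the last space so words stay whole.
  const space = slice.lastIndexOf(' ');
  const cut = space > 0 ? space : maxLen;
  return {
    phrase: buffer.slice(0, cut).trim(),
    rest: buffer.slice(cut).trimStart(),
  };
}
```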

🧪 Testing

The implementation includes:

  • Test client for basic TTS functionality
  • Full conversation client with voice input
  • Error handling and edge case management
  • Cross-browser compatibility (Web Speech API)

📋 Requirements

  • Node.js v18+
  • Ollama installed and running
  • Supertonic assets (ONNX models) in parent directory
  • Browser with Web Speech API support (Chrome, Edge, Safari)
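Voice input relies on the browser's Web Speech API, which is vendor-prefixed in Chromium-based browsers. A guarded sketch (the `onResult` callback shape is an assumption for illustration):

```javascript
// Guarded sketch of browser voice input via the Web Speech API. In Node or
// unsupported browsers it returns null instead of throwing. The onResult
// callback receives (text, isFinal) as an illustrative convention.
function startVoiceInput(onResult) {
  const SR =
    typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!SR) return null; // unsupported environment
  const rec = new SR();
  rec.lang = 'en-US';
  rec.interimResults = true; // stream partial transcripts as the user speaks
  rec.onresult = (event) => {
    const text = Array.from(event.results)
      .map((r) => r[0].transcript)
      .join(' ');
    onResult(text, event.results[event.results.length - 1].isFinal);
  };
  rec.start();
  return rec;
}
```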

🎨 UI Features

  • Model selection dropdown
  • Voice style selection
  • Adjustable steps and speed controls
  • Real-time mode toggle
  • Status indicators with tooltips
  • Responsive design for mobile and desktop
  • Chat history with scrollable container

🔄 Git History

This PR includes 18 commits with a clean, logical progression:

  1. Initial setup
  2. Basic TTS streaming server
  3. Ollama integration
  4. WebSocket support
  5. Test client
  6. Conversation client
  7. Documentation updates
  8. Bug fixes and path corrections

All changes are self-contained in the real-time/ directory and don't modify any existing code in the repository.

📜 Commit Messages

- Add package.json with dependencies
- Add basic README with features and quick start
- Implement Express server with logging and error handling
- Add TTS model loading and initialization
- Implement Server-Sent Events (SSE) streaming endpoint
- Add audio chunk generation and WAV buffer conversion
- Add health check and voice styles endpoints
- Support configurable steps and speed parameters
- Add Ollama API integration with streaming support
- Implement conversation history management
- Add system prompt for AI assistant personality
- Implement smart phrase detection for natural speech breaks
- Add /conversation endpoint for voice-to-voice conversations
- Add /models endpoint to list available Ollama models
- Update health check to include Ollama status
- Support conversation history endpoints
- Add WebSocket server with /ws endpoint
- Implement WebSocket-based TTS synthesis
- Add WebSocket-based conversational streaming
- Support both synthesize and conversation message types
- Add connection lifecycle management and error handling
- Add HTML test client with SSE streaming support
- Implement audio playback for streamed chunks
- Add voice, steps, and speed controls
- Add status display and error handling
- Add full-featured conversation client with Ollama integration
- Implement Web Speech API for voice transcription
- Add audio queue for sequential playback
- Add model selection dropdown
- Add conversation history display
- Add real-time text rendering and voice playback
- Support configurable voice, steps, and speed
- Add detailed usage instructions for both clients
- Document API endpoints
- List features and controls
- Add onnxruntime-node to package.json dependencies
- Required by nodejs/helper.js for TTS model loading
- Update README to include npm install in nodejs directory
- Required because server imports from ../nodejs/helper.js which needs onnxruntime-node
- Remove chunkText import from nodejs/helper.js
- Add chunkText function directly to server.js
- Keeps real-time folder self-contained without modifying other repo code
- Change assets path from ../assets to ../../assets
- Assets are in the parent directory, not sibling to real-time
- Update README to reflect correct path
- Document browser-based Web Speech API (default)
- Add information about server-side Whisper integration
- Reference whisper.cpp repository for users who want higher accuracy
- Include example integration code
- Change default model recommendation to llama3.2:1b for lower latency
- Add comment about full llama3.2 for better quality
- Optimize for real-time conversational use case
- Create conversation-client.css with all styles from conversation-client.html
- Create test-client.css with all styles from test-client.html
- Update HTML files to link to external CSS files
- Improves code organization and maintainability
- Create styles/ directory
- Move conversation-client.css and test-client.css into styles/
- Update HTML files to reference new CSS paths
- Better project organization
- Reuse existing media stream instead of requesting new one each time
- Remove enumerateDevices call on page load that triggers permission prompt
- Keep media stream open for reuse between recordings
- Only request permission once when user clicks voice button
- Improves user experience by avoiding repeated permission dialogs
- Remove duplicate const audioContext declaration
- Reuse single audioContext for both testing and monitoring
- Fixes SyntaxError: Identifier 'audioContext' has already been declared
@ANLGBOY (Collaborator) commented Dec 15, 2025

Thank you for sharing your implementation of a conversational chat app. I tested it on my local laptop, and it works very well. However, we try to keep this repository focused on minimal examples for each programming language. Therefore, rather than merging your code into this repository, we encourage you to publish it as a separate repository. We would be happy to feature it in the Built with Supertonic section.
