feat: Add real-time conversational TTS with Ollama #41
Open
AlaeddineMessadi wants to merge 18 commits into supertone-inc:main from
Conversation
- Add package.json with dependencies
- Add basic README with features and quick start

- Implement Express server with logging and error handling
- Add TTS model loading and initialization
- Implement Server-Sent Events (SSE) streaming endpoint
- Add audio chunk generation and WAV buffer conversion
- Add health check and voice styles endpoints
- Support configurable steps and speed parameters

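The WAV buffer conversion mentioned above can be sketched as follows. This is a hypothetical reconstruction, assuming the model emits raw 16-bit mono PCM; the function name, default sample rate, and channel count are assumptions, not the PR's actual helper:

```javascript
// Hypothetical sketch: wrap raw 16-bit mono PCM samples in a minimal 44-byte WAV header
// so browsers can play each streamed chunk. Defaults are assumptions.
function pcmToWav(pcmBuffer, sampleRate = 44100, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  const blockAlign = channels * (bitsPerSample / 8);
  const header = Buffer.alloc(44);

  header.write('RIFF', 0);                        // RIFF chunk ID
  header.writeUInt32LE(36 + pcmBuffer.length, 4); // total chunk size
  header.write('WAVE', 8);                        // format
  header.write('fmt ', 12);                       // fmt subchunk ID
  header.writeUInt32LE(16, 16);                   // fmt subchunk size (PCM)
  header.writeUInt16LE(1, 20);                    // audio format: 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);                       // data subchunk ID
  header.writeUInt32LE(pcmBuffer.length, 40);     // data subchunk size

  return Buffer.concat([header, pcmBuffer]);
}
```

Each streamed chunk can then be base64-encoded and sent as an SSE event payload.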
- Add Ollama API integration with streaming support
- Implement conversation history management
- Add system prompt for AI assistant personality
- Implement smart phrase detection for natural speech breaks
- Add /conversation endpoint for voice-to-voice conversations
- Add /models endpoint to list available Ollama models
- Update health check to include Ollama status
- Support conversation history endpoints

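The "smart phrase detection" step can be sketched as a small buffering function: streamed LLM tokens accumulate, and a phrase is handed to TTS whenever a natural break is reached. The boundary characters and minimum phrase length below are assumptions, not the PR's exact rules:

```javascript
// Hypothetical sketch: split a buffer of streamed LLM text at natural speech
// breaks so TTS can start before the full response arrives.
const PHRASE_BREAKS = /[.!?;:\n]/;
const MIN_PHRASE_LENGTH = 20; // assumed minimum, to avoid synthesizing tiny fragments

function extractPhrases(buffer) {
  const phrases = [];
  let start = 0;
  for (let i = 0; i < buffer.length; i++) {
    if (PHRASE_BREAKS.test(buffer[i]) && i - start + 1 >= MIN_PHRASE_LENGTH) {
      phrases.push(buffer.slice(start, i + 1).trim());
      start = i + 1;
    }
  }
  // Text after the last break stays buffered until more tokens arrive.
  return { phrases, remainder: buffer.slice(start) };
}
```

Calling this on each token delta keeps the audio pipeline fed with complete, natural-sounding phrases.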
- Add WebSocket server with /ws endpoint
- Implement WebSocket-based TTS synthesis
- Add WebSocket-based conversational streaming
- Support both synthesize and conversation message types
- Add connection lifecycle management and error handling

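Supporting both message types typically comes down to a small dispatcher on parsed JSON messages. A minimal sketch, assuming a `type` field and handler names that are illustrative only:

```javascript
// Hypothetical sketch: route incoming WebSocket messages by their `type` field.
// Handler names and message shapes are assumptions, not the PR's exact API.
function routeMessage(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { type: 'error', message: 'Invalid JSON' };
  }
  switch (msg.type) {
    case 'synthesize':   // one-shot TTS, e.g. { type, text, voice, steps, speed }
      return handlers.synthesize(msg);
    case 'conversation': // a voice-to-voice turn routed through Ollama
      return handlers.conversation(msg);
    default:
      return { type: 'error', message: `Unknown message type: ${msg.type}` };
  }
}
```

Returning a structured error for unknown types keeps the connection alive instead of tearing it down on a bad message.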
- Add HTML test client with SSE streaming support
- Implement audio playback for streamed chunks
- Add voice, steps, and speed controls
- Add status display and error handling

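On the client side, consuming the SSE stream reduces to splitting the response body on event boundaries and parsing each `data:` line. A minimal parser sketch following the SSE wire format (blank-line-terminated events); the JSON payload shape is an assumption:

```javascript
// Hypothetical sketch: parse a chunk of an SSE response body into JSON events.
// Each event is framed as "data: <payload>" followed by a blank line.
function parseSseChunk(chunkText) {
  const events = [];
  for (const block of chunkText.split('\n\n')) {
    for (const line of block.split('\n')) {
      if (line.startsWith('data: ')) {
        events.push(JSON.parse(line.slice(6)));
      }
    }
  }
  return events;
}
```

In the browser, `EventSource` or a `fetch` reader would feed chunks into a parser like this before queueing the decoded audio for playback.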
- Add full-featured conversation client with Ollama integration
- Implement Web Speech API for voice transcription
- Add audio queue for sequential playback
- Add model selection dropdown
- Add conversation history display
- Add real-time text rendering and voice playback
- Support configurable voice, steps, and speed

- Add detailed usage instructions for both clients
- Document API endpoints
- List features and controls

- Add onnxruntime-node to package.json dependencies
- Required by nodejs/helper.js for TTS model loading

- Update README to include npm install in nodejs directory
- Required because server imports from ../nodejs/helper.js, which needs onnxruntime-node

- Remove chunkText import from nodejs/helper.js
- Add chunkText function directly to server.js
- Keeps real-time folder self-contained without modifying other repo code

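The inlined chunkText can be sketched roughly as follows: split long input at sentence boundaries into TTS-sized pieces, letting an over-long single sentence become its own chunk. The 300-character limit and the exact splitting rules are assumptions, not the PR's actual implementation:

```javascript
// Hypothetical sketch of chunkText: group whole sentences into chunks no
// longer than maxLength, so each TTS call gets a natural unit of speech.
function chunkText(text, maxLength = 300) {
  // Capture sentences with their trailing punctuation and whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLength) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```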
- Change assets path from ../assets to ../../assets
- Assets are in the parent directory, not sibling to real-time
- Update README to reflect correct path

- Document browser-based Web Speech API (default)
- Add information about server-side Whisper integration
- Reference whisper.cpp repository for users who want higher accuracy
- Include example integration code

- Change default model recommendation to llama3.2:1b for lower latency
- Add comment about full llama3.2 for better quality
- Optimize for real-time conversational use case

- Create conversation-client.css with all styles from conversation-client.html
- Create test-client.css with all styles from test-client.html
- Update HTML files to link to external CSS files
- Improves code organization and maintainability

- Create styles/ directory
- Move conversation-client.css and test-client.css into styles/
- Update HTML files to reference new CSS paths
- Better project organization

- Reuse existing media stream instead of requesting a new one each time
- Remove enumerateDevices call on page load that triggers permission prompt
- Keep media stream open for reuse between recordings
- Only request permission once, when user clicks voice button
- Improves user experience by avoiding repeated permission dialogs

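The media-stream reuse pattern boils down to caching the result of the first call. A generic sketch; in the client this would presumably wrap `navigator.mediaDevices.getUserMedia` so the permission prompt fires only once (the wrapper name is hypothetical):

```javascript
// Hypothetical sketch: memoize the first invocation so later calls reuse it.
function once(fn) {
  let cached;
  let called = false;
  return (...args) => {
    if (!called) {
      called = true;
      cached = fn(...args); // e.g. the getUserMedia promise / media stream
    }
    return cached;          // every later call reuses the same result
  };
}

// Assumed usage in the conversation client (browser-only, shown as a comment):
// const getMicStream = once(() => navigator.mediaDevices.getUserMedia({ audio: true }));
```

Caching the promise (rather than the stream) also means concurrent callers share one pending permission request.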
- Remove duplicate const audioContext declaration
- Reuse single audioContext for both testing and monitoring
- Fixes "SyntaxError: Identifier 'audioContext' has already been declared"

Collaborator

Thank you for sharing your implementation of a conversational chat app. I tested it on my local laptop, and it works very well. However, we try to keep this repository focused on minimal examples for each programming language. Therefore, rather than merging your code into this repository, we encourage you to publish it as a separate repository. We would be happy to feature it in the "Built with Supertonic" section.
🎯 Overview
This PR adds a complete real-time voice-to-voice conversational AI system built on top of Supertonic TTS. Users can have natural conversations with AI through voice input and output, with low-latency streaming and intelligent interruption handling.
✨ Features
Core Functionality
Technical Features
📁 What's Included
Server (real-time/server.js)

Clients

- test-client.html: Simple test client for basic TTS streaming
- conversation-client.html: Full-featured client with voice input, model selection, and real-time mode

Documentation
🚀 Quick Start

Install dependencies:

```bash
cd nodejs && npm install
cd ../real-time && npm install
```

Start Ollama:

```bash
ollama serve
ollama pull llama3.2
```

Start the server:

```bash
cd real-time
npm start
```

Open conversation-client.html in your browser to start a voice conversation!

🔧 API Endpoints
- POST /stream - Basic TTS streaming (SSE)
- POST /conversation - Conversational AI with Ollama (SSE)
- WS /ws - WebSocket endpoint for bidirectional streaming
- GET /health - Health check with Ollama status
- GET /models - List available Ollama models
- GET /voices - List available voice styles

📝 Implementation Details
- Self-contained in the real-time/ folder

🧪 Testing
The implementation includes:
📋 Requirements
🎨 UI Features
🔄 Git History
This PR includes 11 commits with a clean, logical progression:
All changes are self-contained in the real-time/ directory and don't modify any existing code in the repository.