A cross-platform desktop application built with Tauri and React that provides real-time voice-to-text transcription using Deepgram's API. It is a functional clone of Wispr Flow that focuses on the core voice-to-text workflow rather than on replicating the UI.
- Real-time Voice Transcription: Live speech-to-text using Deepgram's API
- Push-to-Talk Interface: Hold to record, release to stop
- Cross-platform: Works on Windows, macOS, and Linux
- Audio Device Detection: Automatic microphone detection and configuration
- Export Options: Copy to clipboard or download as text file
- Clean UI: Modern, responsive interface with visual recording feedback
- Frontend: React with modern hooks and components
- Backend: Rust with Tauri framework
- Audio Processing: cpal for cross-platform audio capture
- API Integration: Deepgram WebSocket streaming API
- Build System: Vite for frontend, Cargo for Rust backend
Before running the application, ensure you have:
- Rust (latest stable version)
  - Install from rustup.rs
  - Verify installation: `rustc --version`
- Node.js (version 16 or higher)
  - Download from nodejs.org
  - Verify installation: `node --version`
- Deepgram API Key
  - Sign up at deepgram.com
  - Create a new project and copy your API key
- Clone the repository
  - `git clone <your-repo-url>`
  - `cd voice-to-text-app`
- Install dependencies
  - `npm install`
- Run the development server
  - `npm run tauri dev`
- For production build
  - `npm run tauri build`
- Launch the application
  - The app will open with an API key input screen
- Enter your Deepgram API key
  - Paste your API key and click "Start App"
- Grant microphone permissions
  - Allow the app to access your microphone when prompted
- Start recording
  - Hold the record button and speak clearly
  - Release the button to stop recording
- View transcription
  - Your speech will appear as text in real-time
  - Use copy or download buttons to save the text
voice-to-text-app/
├── src/ # React frontend
│ ├── components/ # UI components
│ ├── hooks/ # Custom React hooks
│ └── App.jsx # Main application
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── audio/ # Audio capture & streaming
│ │ ├── commands.rs # Tauri command handlers
│ │ └── lib.rs # Main Rust application
│ └── Cargo.toml # Rust dependencies
└── package.json # Node.js dependencies
┌─────────────────────────────────────────┐
│ Frontend │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ UI Controls │ │ Audio Visualizer│ │
│ │ (Push-to- │ │ & Text Display │ │
│ │ Talk) │ └─────────────────┘ │
│ └─────────────┘ │
└─────────────────┬───────────────────────┘
│ Tauri Commands & Events
┌─────────────────▼───────────────────────┐
│ Rust Backend │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Audio │ │ Deepgram │ │
│ │ Capture │ │ Integration │ │
│ │ (cpal) │ │ (WebSocket) │ │
│ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────┘
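The command and event flow in the diagram can be modeled in plain Rust without any Tauri types. The sketch below is only an illustration of the push-to-talk handshake: `UiCommand`, `BackendEvent`, and `PushToTalk` are hypothetical names, not identifiers taken from this project, and the real handlers in `commands.rs` may be structured differently.

```rust
/// Commands the frontend might send (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UiCommand {
    StartRecording,
    StopRecording,
}

/// Events the backend might emit back to the UI (hypothetical names).
#[derive(Debug, Clone, PartialEq, Eq)]
enum BackendEvent {
    RecordingStarted,
    RecordingStopped,
    /// The command did not change state (e.g. a repeated button-down).
    Ignored,
}

/// Minimal push-to-talk state: either idle or recording.
#[derive(Debug, Default)]
struct PushToTalk {
    recording: bool,
}

impl PushToTalk {
    /// Apply one command and report which event the UI should receive.
    fn handle(&mut self, command: UiCommand) -> BackendEvent {
        match (command, self.recording) {
            (UiCommand::StartRecording, false) => {
                self.recording = true;
                BackendEvent::RecordingStarted
            }
            (UiCommand::StopRecording, true) => {
                self.recording = false;
                BackendEvent::RecordingStopped
            }
            // Start while recording, or stop while idle: nothing to do.
            _ => BackendEvent::Ignored,
        }
    }
}
```

Treating a repeated start or stop as `Ignored` keeps the backend tolerant of duplicate button events, which can occur when a key or mouse button auto-repeats.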
- Audio Capture System
  - Uses `cpal` for cross-platform microphone access
  - Handles multiple audio formats (F32, I16, U16)
  - Adapts to device's native configuration
- Deepgram Integration
  - WebSocket streaming for real-time transcription
  - Secure TLS connection with proper headers
  - JSON response parsing and error handling
- Event-Driven Communication
  - Tauri events for real-time UI updates
  - Background thread processing
  - Clean separation between frontend and backend
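Handling the three sample formats listed above usually means normalizing them to one representation before streaming. The following is a minimal sketch, assuming the target is signed 16-bit PCM; the `Samples` enum and function names are hypothetical and use only the standard library rather than `cpal`'s own types.

```rust
/// A borrowed buffer in one of the three sample formats (hypothetical type).
enum Samples<'a> {
    F32(&'a [f32]),
    I16(&'a [i16]),
    U16(&'a [u16]),
}

/// Convert a float sample in [-1.0, 1.0] to signed 16-bit PCM.
fn f32_to_i16(sample: f32) -> i16 {
    // Clamp first so out-of-range input cannot overflow the target type.
    let clamped = sample.clamp(-1.0, 1.0);
    (clamped * i16::MAX as f32).round() as i16
}

/// Convert an unsigned 16-bit sample (silence at 32768) to signed 16-bit PCM.
fn u16_to_i16(sample: u16) -> i16 {
    (sample as i32 - 32768) as i16
}

/// Normalize any supported buffer to signed 16-bit samples.
fn to_linear16(samples: Samples) -> Vec<i16> {
    match samples {
        Samples::F32(buf) => buf.iter().map(|&s| f32_to_i16(s)).collect(),
        Samples::I16(buf) => buf.to_vec(),
        Samples::U16(buf) => buf.iter().map(|&s| u16_to_i16(s)).collect(),
    }
}
```

For example, `to_linear16(Samples::U16(&[0, 32768, 65535]))` yields `[-32768, 0, 32767]`, and float input outside the nominal range is clamped rather than wrapped.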
- Device Compatibility: Uses device's native audio configuration instead of forcing specific formats
- Real-time Streaming: Direct PCM to WebSocket pipeline for minimal latency
- Cross-platform: cpal library ensures compatibility across Windows, macOS, and Linux
- Secure Connection: TLS-enabled WebSocket to Deepgram's API
- Proper Headers: Includes all required WebSocket handshake headers
- Error Recovery: Graceful handling of connection failures
- Simple Architecture: React state with custom hooks
- Event-based Updates: Real-time transcription via Tauri events
- Thread Safety: Arc for shared state in Rust
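The PCM pipeline and the `Arc`-based shared state can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the project's implementation: the capture side is simulated with pre-made chunks, an `mpsc` channel stands in for the WebSocket sender, and `run_pipeline` and `pcm_to_le_bytes` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Serialize 16-bit samples as little-endian bytes for a raw PCM stream.
fn pcm_to_le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Forward simulated capture chunks from a background thread and
/// return every byte the receiving side collected, in order.
fn run_pipeline(chunks: Vec<Vec<i16>>) -> Vec<u8> {
    // Shared flag: a real app would clear it when the button is released.
    let recording = Arc::new(AtomicBool::new(true));
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    let flag = Arc::clone(&recording);
    let capture = thread::spawn(move || {
        for chunk in chunks {
            // Stop early if recording was switched off or the receiver is gone.
            if !flag.load(Ordering::SeqCst) || tx.send(pcm_to_le_bytes(&chunk)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Stand-in for the WebSocket writer: drain until the channel closes.
    let mut sent = Vec::new();
    for bytes in rx {
        sent.extend(bytes);
    }

    recording.store(false, Ordering::SeqCst);
    capture.join().expect("capture thread panicked");
    sent
}
```

Passing chunks over a channel keeps the audio callback free of network I/O, so a slow connection delays sending rather than blocking capture.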
- Internet Dependency: Requires active internet connection for Deepgram API
- API Key Required: Must have valid Deepgram API key to function
- Audio Device: Requires working microphone for voice input
- Language Support: Currently configured for English (en-US)
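The language limitation comes down to a query parameter on the streaming URL, so it could be made configurable. The sketch below assumes the app streams raw 16-bit PCM to Deepgram's `/v1/listen` endpoint; the function names, and the choice of sample rate and channel count at the call site, are illustrative assumptions rather than values taken from this project.

```rust
/// Build a Deepgram live-transcription URL for raw 16-bit PCM audio.
/// Returns an error for language codes that would need URL escaping.
fn deepgram_url(language: &str, sample_rate: u32, channels: u16) -> Result<String, String> {
    let valid = !language.is_empty()
        && language.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !valid {
        return Err(format!("invalid language code: {:?}", language));
    }
    Ok(format!(
        "wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate={}&channels={}&language={}",
        sample_rate, channels, language
    ))
}

/// Value of the `Authorization` header for an API key.
fn auth_header_value(api_key: &str) -> String {
    format!("Token {}", api_key)
}
```

Calling `deepgram_url("en-US", 16000, 1)` reproduces the current English-only behavior, and another supported language code can be passed in its place.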
"No input device available"
- Ensure microphone is connected and working
- Check system audio permissions
- Try restarting the application
"TLS support not compiled in"
- Ensure you're using the latest build with TLS features
- Run `cargo clean` and rebuild if necessary
"WebSocket protocol error"
- Verify your Deepgram API key is correct
- Check internet connection
- Ensure firewall allows WebSocket connections
"Recording session error: stream configuration not supported"
- This should be resolved in the current version
- The app now adapts to your device's native audio format
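The error messages above could also be mapped to hints in code, for example to show a suggestion next to the raw error in the UI. This is a hypothetical helper, not part of the project; it matches on the message substrings listed in this section.

```rust
/// Return a short troubleshooting hint for a known error message, if any.
fn troubleshooting_hint(error: &str) -> Option<&'static str> {
    // (substring to look for, hint to show); taken from the list above.
    const HINTS: [(&str, &str); 4] = [
        (
            "No input device available",
            "Check that a microphone is connected and that audio permissions are granted.",
        ),
        (
            "TLS support not compiled in",
            "Rebuild with TLS features enabled; run cargo clean first if necessary.",
        ),
        (
            "WebSocket protocol error",
            "Verify the Deepgram API key, the internet connection, and firewall rules.",
        ),
        (
            "stream configuration not supported",
            "Update to a build that adapts to the device's native audio format.",
        ),
    ];
    HINTS
        .iter()
        .find(|(needle, _)| error.contains(*needle))
        .map(|(_, hint)| *hint)
}
```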
To see detailed logs, run the development version:
`npm run tauri dev`
Check the console for connection status and error messages.
The application has been tested with:
- ✅ Real-time voice transcription
- ✅ Multiple recording sessions
- ✅ Copy and download functionality
- ✅ Error handling scenarios
- ✅ Audio device compatibility
MIT License - See LICENSE file for details
This is a technical assignment project demonstrating:
- Cross-platform desktop app development
- Real-time audio processing
- API integration with WebSocket streaming
- Clean code architecture
- Modern Rust and React development
For technical questions or issues:
- Check the troubleshooting section above
- Review console logs in development mode
- Verify all prerequisites are installed correctly