Voice-to-Text Desktop App

A cross-platform desktop application built with Tauri and React that provides real-time voice-to-text transcription using Deepgram's API. It is a functional clone of Wispr Flow that focuses on the core voice-to-text workflow rather than on replicating the UI.

🎯 Features

Real-time Voice Transcription: Live speech-to-text using Deepgram's API
Push-to-Talk Interface: Hold to record, release to stop
Cross-platform: Works on Windows, macOS, and Linux
Audio Device Detection: Automatic microphone detection and configuration
Export Options: Copy to clipboard or download as text file
Clean UI: Modern, responsive interface with visual recording feedback

🛠 Tech Stack

Frontend: React with modern hooks and components
Backend: Rust with Tauri framework
Audio Processing: cpal for cross-platform audio capture
API Integration: Deepgram WebSocket streaming API
Build System: Vite for frontend, Cargo for Rust backend

📋 Prerequisites

Before running the application, ensure you have:

Rust (latest stable version)
- Install from rustup.rs
- Verify installation: rustc --version
Node.js (version 16 or higher)
- Download from nodejs.org
- Verify installation: node --version
Deepgram API Key
- Sign up at deepgram.com
- Create a new project and copy your API key

🚀 Installation & Setup

Clone the repository

```
git clone <your-repo-url>
cd voice-to-text-app
```

Install dependencies
```
npm install
```
Run the development server
```
npm run tauri dev
```
For production build
```
npm run tauri build
```

📱 Usage

Launch the application
- The app will open with an API key input screen
Enter your Deepgram API key
- Paste your API key and click "Start App"
Grant microphone permissions
- Allow the app to access your microphone when prompted
Start recording
- Hold the record button and speak clearly
- Release the button to stop recording
View transcription
- Your speech will appear as text in real time
- Use copy or download buttons to save the text

🏗 Architecture

Project Structure

voice-to-text-app/
├── src/                    # React frontend
│   ├── components/         # UI components
│   ├── hooks/              # Custom React hooks
│   └── App.jsx             # Main application
├── src-tauri/              # Rust backend
│   ├── src/
│   │   ├── audio/          # Audio capture & streaming
│   │   ├── commands.rs     # Tauri command handlers
│   │   └── lib.rs          # Main Rust application
│   └── Cargo.toml          # Rust dependencies
└── package.json            # Node.js dependencies

Technical Architecture

┌─────────────────────────────────────────┐
│                Frontend                 │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │ UI Controls │  │ Audio Visualizer│   │
│  │ (Push-to-   │  │ & Text Display  │   │
│  │  Talk)      │  └─────────────────┘   │
│  └─────────────┘                        │
└─────────────────┬───────────────────────┘
                  │ Tauri Commands & Events
┌─────────────────▼───────────────────────┐
│              Rust Backend               │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │ Audio       │  │ Deepgram        │   │
│  │ Capture     │  │ Integration     │   │
│  │ (cpal)      │  │ (WebSocket)     │   │
│  └─────────────┘  └─────────────────┘   │
└─────────────────────────────────────────┘

Key Components

Audio Capture System
- Uses cpal for cross-platform microphone access
- Handles multiple audio formats (F32, I16, U16)
- Adapts to the device's native configuration
Deepgram Integration
- WebSocket streaming for real-time transcription
- Secure TLS connection with proper headers
- JSON response parsing and error handling
Event-Driven Communication
- Tauri events for real-time UI updates
- Background thread processing
- Clean separation between frontend and backend
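
The event-driven flow above can be sketched with only the standard library, using a channel as a stand-in for Tauri's event system. This is an illustration under stated assumptions, not the project's code: `TranscriptEvent`, its variants, `TranscriptView`, and `run_demo` are hypothetical names, and the README does not say whether the app distinguishes interim from final results.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event payloads a backend might send to the UI.
#[derive(Debug, Clone, PartialEq)]
enum TranscriptEvent {
    /// Partial text that may still change.
    Interim(String),
    /// Text the transcription service has finalized.
    Final(String),
    /// A human-readable error to surface in the UI.
    Error(String),
}

/// UI-side state: finalized text plus the current interim fragment.
#[derive(Debug, Default)]
struct TranscriptView {
    finalized: String,
    interim: String,
    last_error: Option<String>,
}

impl TranscriptView {
    /// Fold one event into the view, the way an event listener would.
    fn apply_event(&mut self, event: TranscriptEvent) {
        match event {
            TranscriptEvent::Interim(text) => self.interim = text,
            TranscriptEvent::Final(text) => {
                if !self.finalized.is_empty() {
                    self.finalized.push(' ');
                }
                self.finalized.push_str(&text);
                self.interim.clear();
            }
            TranscriptEvent::Error(message) => self.last_error = Some(message),
        }
    }

    /// Text to display: finalized text followed by any interim fragment.
    fn display_text(&self) -> String {
        match (self.finalized.is_empty(), self.interim.is_empty()) {
            (_, true) => self.finalized.clone(),
            (true, false) => self.interim.clone(),
            (false, false) => format!("{} {}", self.finalized, self.interim),
        }
    }
}

/// Run a background producer and collect its events into a view.
/// The channel here only imitates the backend-to-frontend event path.
fn run_demo(events: Vec<TranscriptEvent>) -> TranscriptView {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for event in events {
            // Sending fails only if the receiver was dropped.
            if tx.send(event).is_err() {
                break;
            }
        }
    });
    let mut view = TranscriptView::default();
    // The loop ends when the producer drops its sender.
    for event in rx {
        view.apply_event(event);
    }
    producer.join().expect("producer thread panicked");
    view
}
```

Keeping the fold logic in one place (`apply_event`) means the same state transitions can be exercised in tests without a window or a microphone.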

🔧 Technical Decisions

Audio Processing

Device Compatibility: Uses the device's native audio configuration instead of forcing a specific format
Real-time Streaming: Direct PCM to WebSocket pipeline for minimal latency
Cross-platform: cpal library ensures compatibility across Windows, macOS, and Linux
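
As a rough illustration of handling the three sample formats, the sketch below normalizes F32, I16, and U16 buffers to signed 16-bit PCM and serializes them as little-endian bytes. It assumes the stream is sent as 16-bit linear PCM, which the README does not state; `SampleBuffer`, `to_pcm16`, and the helper names are hypothetical, and no cpal types are used so the example stays self-contained.

```rust
/// Hypothetical wrapper for a captured buffer in one of three formats.
enum SampleBuffer<'a> {
    F32(&'a [f32]),
    I16(&'a [i16]),
    U16(&'a [u16]),
}

/// Convert a 32-bit float sample in [-1.0, 1.0] to signed 16-bit PCM.
fn f32_to_i16(sample: f32) -> i16 {
    // Clamp first so out-of-range input saturates instead of wrapping.
    (sample.clamp(-1.0, 1.0) * i16::MAX as f32) as i16
}

/// Convert an unsigned 16-bit sample (midpoint 32768) to signed 16-bit PCM.
fn u16_to_i16(sample: u16) -> i16 {
    (sample as i32 - 32768) as i16
}

/// Normalize any supported buffer to signed 16-bit samples.
fn to_pcm16(buffer: SampleBuffer<'_>) -> Vec<i16> {
    match buffer {
        SampleBuffer::F32(s) => s.iter().copied().map(f32_to_i16).collect(),
        SampleBuffer::I16(s) => s.to_vec(),
        SampleBuffer::U16(s) => s.iter().copied().map(u16_to_i16).collect(),
    }
}

/// Serialize samples as little-endian bytes, ready to send as a binary frame.
fn to_le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}
```

Converting at the edge of the capture callback keeps the rest of the pipeline format-agnostic: everything downstream only ever sees `i16` samples or raw bytes.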

WebSocket Implementation

Secure Connection: TLS-enabled WebSocket to Deepgram's API
Proper Headers: Includes all required WebSocket handshake headers
Error Recovery: Graceful handling of connection failures
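
A minimal sketch of preparing the connection request might look like the following. The endpoint, query parameters, and `Token` authorization scheme reflect Deepgram's streaming API as commonly documented, but should be checked against the current Deepgram reference; `StreamConfig`, `build_listen_url`, and `authorization_header` are hypothetical names, and the actual WebSocket client used by this project is not shown in the README.

```rust
/// Hypothetical settings for one streaming session.
struct StreamConfig {
    sample_rate: u32,
    channels: u16,
    language: String,
}

/// Streaming endpoint (assumed; verify against Deepgram's documentation).
const DEEPGRAM_LISTEN_URL: &str = "wss://api.deepgram.com/v1/listen";

/// Build the WebSocket URL describing the raw audio being sent.
/// The language tag is inserted as-is, so it must already be URL-safe.
fn build_listen_url(config: &StreamConfig) -> String {
    format!(
        "{}?encoding=linear16&sample_rate={}&channels={}&language={}",
        DEEPGRAM_LISTEN_URL, config.sample_rate, config.channels, config.language
    )
}

/// Build the value for the `Authorization` header, rejecting keys that
/// would produce a malformed header.
fn authorization_header(api_key: &str) -> Result<String, String> {
    let key = api_key.trim();
    if key.is_empty() {
        return Err("Deepgram API key is empty".to_string());
    }
    if key.chars().any(|c| c.is_whitespace() || c.is_control()) {
        return Err("Deepgram API key contains whitespace".to_string());
    }
    Ok(format!("Token {}", key))
}
```

Validating the key before opening the socket turns a confusing handshake failure into an error message the UI can show directly.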

State Management

Simple Architecture: React state with custom hooks
Event-based Updates: Real-time transcription via Tauri events
Thread Safety: Arc (atomic reference counting) for state shared across Rust threads
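
To illustrate the thread-safety point: `Arc` provides shared ownership only, so mutable shared state also needs interior mutability such as an atomic or a `Mutex`. The sketch below assumes a push-to-talk flag and a chunk buffer shared between a command handler and a capture thread; `RecorderState` and `capture_in_background` are hypothetical names, not taken from the project.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical state shared between command handlers and a capture thread.
#[derive(Default)]
struct RecorderState {
    recording: AtomicBool,
    chunks: Mutex<Vec<Vec<i16>>>,
}

impl RecorderState {
    /// Called when the record button is pressed.
    fn start(&self) {
        self.recording.store(true, Ordering::SeqCst);
    }

    /// Called when the record button is released.
    fn stop(&self) {
        self.recording.store(false, Ordering::SeqCst);
    }

    fn is_recording(&self) -> bool {
        self.recording.load(Ordering::SeqCst)
    }

    /// Store a chunk if recording is active; returns whether it was kept.
    fn push_chunk(&self, chunk: Vec<i16>) -> bool {
        if !self.is_recording() {
            return false;
        }
        self.chunks
            .lock()
            .expect("chunk buffer lock poisoned")
            .push(chunk);
        true
    }

    fn chunk_count(&self) -> usize {
        self.chunks.lock().expect("chunk buffer lock poisoned").len()
    }
}

/// Simulate a capture thread offering `n` chunks through a shared handle.
/// Returns how many chunks were accepted.
fn capture_in_background(state: Arc<RecorderState>, n: usize) -> usize {
    let worker = thread::spawn(move || {
        (0..n).filter(|_| state.push_chunk(vec![0; 4])).count()
    });
    worker.join().expect("capture thread panicked")
}
```

Each thread gets its own `Arc::clone` of the handle, and the state is dropped once the last clone goes away.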

⚠️ Known Limitations

Internet Dependency: Requires active internet connection for Deepgram API
API Key Required: Must have valid Deepgram API key to function
Audio Device: Requires working microphone for voice input
Language Support: Currently configured for English (en-US)

🐛 Troubleshooting

Common Issues

"No input device available"

Ensure microphone is connected and working
Check system audio permissions
Try restarting the application

"TLS support not compiled in"

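This message usually means the Rust WebSocket dependency was built without TLS support; check that a TLS feature is enabled for it in src-tauri/Cargo.toml
Run cargo clean and rebuild after changing features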

"WebSocket protocol error"

Verify your Deepgram API key is correct
Check internet connection
Ensure firewall allows WebSocket connections

"Recording session error: stream configuration not supported"

This should be resolved in the current version
The app now adapts to your device's native audio format

Debug Mode

To see detailed logs, run the development version:

```
npm run tauri dev
```

Check the console for connection status and error messages.

🧪 Testing

The application has been tested with:

✅ Real-time voice transcription
✅ Multiple recording sessions
✅ Copy and download functionality
✅ Error handling scenarios
✅ Audio device compatibility

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

This is a technical assignment project demonstrating:

Cross-platform desktop app development
Real-time audio processing
API integration with WebSocket streaming
Clean code architecture
Modern Rust and React development

📞 Support

For technical questions or issues:

Check the troubleshooting section above
Review console logs in development mode
Verify all prerequisites are installed correctly

Mausumi134/voice-to-text-desktop-app
