# Speech-to-Text Translation App

A production-ready, full-stack application for converting speech to text and translating it into multiple languages. Built with Python (FastAPI), OpenAI Whisper, and Vue.js.
## Features

- 🎤 Audio Recording: Record audio directly from your browser
- 📁 File Upload: Support for multiple audio formats (MP3, WAV, M4A, OGG, FLAC, WebM)
- 🗣️ Speech-to-Text: High-accuracy transcription using OpenAI Whisper
- 🌍 Multi-Language Translation: Translate into 30+ languages
- ⏱️ Timestamped Segments: Get detailed transcription with timestamps
- 💾 Multiple Export Formats: Download as TXT, JSON, or SRT
- 🐳 Docker Ready: Production-ready containerization
- ✅ Fully Tested: Comprehensive test suite with pytest
- 🎨 Modern UI: Beautiful, responsive Vue.js interface
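Since Whisper returns timestamped segments, SRT export is mainly a formatting step. A minimal sketch (the `{'start', 'end', 'text'}` segment shape follows Whisper's output; the helper names are illustrative, not from this repo):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```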
## Project Structure

```
Speech-to-Text-Translation-App/
├── backend/                  # Python FastAPI backend
│   ├── main.py               # FastAPI application
│   ├── models/               # Pydantic schemas
│   └── services/             # Business logic
│       ├── speech_to_text.py # Whisper integration
│       └── translator.py     # Translation service
├── frontend/                 # Vue.js frontend
│   ├── src/
│   │   ├── components/       # Vue components
│   │   ├── services/         # API client
│   │   └── App.vue           # Main app component
│   └── package.json
├── tests/                    # Test suite
├── docker-compose.yml        # Docker orchestration
├── Dockerfile                # Production Docker image
└── requirements.txt          # Python dependencies
```
## Prerequisites

- Python 3.10+
- Node.js 18+
- FFmpeg (for audio processing)
## Quick Start with Docker

```bash
# Clone the repository
git clone https://github.com/pyenthusiasts/Speech-to-Text-Translation-App.git
cd Speech-to-Text-Translation-App

# Start with Docker Compose
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

## Manual Setup

### Backend

```bash
# Install Python dependencies
pip install -r requirements.txt

# Start the backend server
cd backend
python main.py

# Or with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Frontend

```bash
# Install Node dependencies
cd frontend
npm install

# Start the development server
npm run dev

# Access at http://localhost:3000
```
## Usage

1. **Upload or Record Audio**
   - Click to upload an audio file, or
   - click "Start Recording" to record directly
2. **Select Languages** (optional)
   - Choose the source language (or auto-detect)
   - Select target languages for translation
3. **Process**
   - Click "Transcribe & Translate"
   - Wait for processing to complete
4. **Download Results**
   - Download the transcription as TXT, JSON, or SRT
   - Get translations in all selected languages
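The "Transcribe & Translate" step presumably chains the two backend services: transcribe once, then translate the text once per target language. A sketch with the services injected as callables (all names here are illustrative, not the repo's actual API):

```python
def process(audio_path, source_language, target_languages, transcribe, translate):
    """Full pipeline: one transcription, then one translation per target language."""
    transcription = transcribe(audio_path, language=source_language)
    translations = {
        lang: translate(transcription["text"], source=source_language, target=lang)
        for lang in target_languages
    }
    return {"transcription": transcription, "translations": translations}
```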
## API Examples

Transcribe an audio file:

```bash
curl -X POST "http://localhost:8000/api/transcribe" \
  -F "file=@audio.mp3" \
  -F "language=en"
```

Translate text:

```bash
curl -X POST "http://localhost:8000/api/translate" \
  -F "text=Hello, world!" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"
```

Run the full pipeline (transcribe + translate):

```bash
curl -X POST "http://localhost:8000/api/process" \
  -F "file=@audio.mp3" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"
```

Interactive API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
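The same endpoints can be called from Python. A stdlib-only sketch that hand-encodes the multipart upload for `/api/process` (the field names match the curl examples above; a JSON response body is assumed):

```python
import io
import json
import urllib.request
import uuid

API_URL = "http://localhost:8000"  # default backend address

def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body by hand (no third-party deps)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
    )
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def process_audio(path, source_language="en", target_languages="es,fr,de"):
    """POST an audio file to /api/process and return the parsed JSON response."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"source_language": source_language,
             "target_languages": target_languages},
            "file", path.rsplit("/", 1)[-1], f.read(),
        )
    req = urllib.request.Request(
        f"{API_URL}/api/process", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```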
## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run a specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v
```

## Supported Languages

The app supports 30+ languages, including:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh-CN, zh-TW)
- Arabic (ar)
- Hindi (hi)
- And many more...
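Requests that name a target language the service does not support should fail early rather than mid-pipeline. A sketch of that validation (both the helper and the subset of codes are illustrative, not from the repo):

```python
# Illustrative subset of the language codes listed above, not the full set.
SUPPORTED_CODES = {"en", "es", "fr", "de", "it", "pt", "ru",
                   "ja", "ko", "zh-CN", "zh-TW", "ar", "hi"}

def validate_targets(codes):
    """Reject any requested target language outside the supported set."""
    unknown = [c for c in codes if c not in SUPPORTED_CODES]
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return codes
```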
## Whisper Models

Choose from different Whisper model sizes based on your needs:

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~39 MB | ⚡⚡⚡ | ⭐⭐ |
| base | ~74 MB | ⚡⚡ | ⭐⭐⭐ |
| small | ~244 MB | ⚡ | ⭐⭐⭐⭐ |
| medium | ~769 MB | 🐌 | ⭐⭐⭐⭐⭐ |
| large | ~1550 MB | 🐌🐌 | ⭐⭐⭐⭐⭐ |
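Given the sizes in the table above, you can pick the largest model that fits a memory budget programmatically. A sketch (the size map mirrors the table; the helper is illustrative, not part of the app):

```python
# Approximate weight sizes from the table above, in MB.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def pick_model(budget_mb: int) -> str:
    """Return the largest Whisper model whose weights fit in the given budget."""
    candidates = [m for m, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not candidates:
        raise ValueError(f"no model fits in {budget_mb} MB")
    return max(candidates, key=MODEL_SIZES_MB.get)
```

With the sizes above, `pick_model(100)` selects `base`, since `small` (~244 MB) would exceed the budget.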
Configure the model in `.env`:

```bash
WHISPER_MODEL_SIZE=base
```

## Production Deployment

Create a `.env` file:

```bash
# Backend
API_HOST=0.0.0.0
API_PORT=8000
WHISPER_MODEL_SIZE=base

# Frontend
VITE_API_URL=http://your-domain.com

# Environment
ENVIRONMENT=production
```

Build and run the production Docker image:

```bash
# Build the production image
docker build -t speech-translation-app .

# Run the container
docker run -p 8000:8000 \
  -e WHISPER_MODEL_SIZE=base \
  -v $(pwd)/uploads:/app/uploads \
  speech-translation-app
```

Example Nginx reverse proxy configuration:

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

## Environment Variables

- `API_HOST`: Host to bind (default: `0.0.0.0`)
- `API_PORT`: Port to listen on (default: `8000`)
- `WHISPER_MODEL_SIZE`: Whisper model size (default: `base`)
- `VITE_API_URL`: Backend API URL (default: `http://localhost:8000`)
## Tech Stack

**Backend**

- FastAPI: Modern, fast web framework
- OpenAI Whisper: State-of-the-art speech recognition
- Deep Translator: Multi-language translation
- PyTorch: ML framework powering Whisper
- Pydantic: Data validation
- pytest: Testing framework

**Frontend**

- Vue.js 3: Progressive JavaScript framework
- Vite: Next-generation frontend tooling
- Axios: HTTP client
- Pinia: State management
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check |
| GET | `/api/languages` | Get supported languages |
| POST | `/api/transcribe` | Transcribe audio to text |
| POST | `/api/translate` | Translate text |
| POST | `/api/process` | Full pipeline (transcribe + translate) |
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- OpenAI Whisper for speech recognition
- Deep Translator for translation
- FastAPI for the backend framework
- Vue.js for the frontend framework
## Support

For issues and questions:

- Open an issue on GitHub
- Check the documentation
## Roadmap

- Add support for real-time transcription
- Implement user authentication
- Add batch processing for multiple files
- Support video files
- Custom vocabulary and domain-specific models
- WebSocket support for live transcription
- Mobile app (React Native)
Made with ❤️ by the PyEnthusiasts Team