# Speech-to-Text Translation App

A production-ready, full-stack application for converting speech to text and translating it into multiple languages. Built with Python (FastAPI), OpenAI Whisper, and Vue.js.
## Features

- 🎤 Audio Recording: Record audio directly from your browser
- 📁 File Upload: Support for multiple audio formats (MP3, WAV, M4A, OGG, FLAC, WebM)
- 🗣️ Speech-to-Text: High-accuracy transcription using OpenAI Whisper
- 🌍 Multi-Language Translation: Translate into 30+ languages
- ⏱️ Timestamped Segments: Get detailed transcription with timestamps
- 💾 Multiple Export Formats: Download as TXT, JSON, or SRT
- 🐳 Docker Ready: Production-ready containerization
- ✅ Fully Tested: Comprehensive test suite with pytest
- 🎨 Modern UI: Beautiful, responsive Vue.js interface
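Since Whisper returns timestamped segments, SRT export is mainly a formatting step. A minimal sketch (the `{'start', 'end', 'text'}` segment shape follows Whisper's output; the helper names are illustrative, not from this repo):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```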
## Project Structure

```
Speech-to-Text-Translation-App/
├── backend/                  # Python FastAPI backend
│   ├── main.py               # FastAPI application
│   ├── models/               # Pydantic schemas
│   └── services/             # Business logic
│       ├── speech_to_text.py # Whisper integration
│       └── translator.py     # Translation service
├── frontend/                 # Vue.js frontend
│   ├── src/
│   │   ├── components/       # Vue components
│   │   ├── services/         # API client
│   │   └── App.vue           # Main app component
│   └── package.json
├── tests/                    # Test suite
├── docker-compose.yml        # Docker orchestration
├── Dockerfile                # Production Docker image
└── requirements.txt          # Python dependencies
```
## Prerequisites

- Python 3.10+
- Node.js 18+
- FFmpeg (for audio processing)
## Quick Start with Docker

```bash
# Clone the repository
git clone https://github.com/pyenthusiasts/Speech-to-Text-Translation-App.git
cd Speech-to-Text-Translation-App

# Start with Docker Compose
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

## Manual Setup

### Backend

```bash
# Install Python dependencies
pip install -r requirements.txt

# Start the backend server
cd backend
python main.py

# Or with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Frontend

```bash
# Install Node dependencies
cd frontend
npm install

# Start the development server
npm run dev

# Access at http://localhost:3000
```
## Usage

1. **Upload or Record Audio**
   - Click to upload an audio file, or
   - click "Start Recording" to record directly
2. **Select Languages** (optional)
   - Choose the source language (or auto-detect)
   - Select target languages for translation
3. **Process**
   - Click "Transcribe & Translate"
   - Wait for processing to complete
4. **Download Results**
   - Download the transcription as TXT, JSON, or SRT
   - Get translations in all selected languages
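The "Transcribe & Translate" step presumably chains the two backend services: transcribe once, then translate the text once per target language. A sketch with the services injected as callables (all names here are illustrative, not the repo's actual API):

```python
def process(audio_path, source_language, target_languages, transcribe, translate):
    """Full pipeline: one transcription, then one translation per target language."""
    transcription = transcribe(audio_path, language=source_language)
    translations = {
        lang: translate(transcription["text"], source=source_language, target=lang)
        for lang in target_languages
    }
    return {"transcription": transcription, "translations": translations}
```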
## API Examples

Transcribe an audio file:

```bash
curl -X POST "http://localhost:8000/api/transcribe" \
  -F "file=@audio.mp3" \
  -F "language=en"
```

Translate text:

```bash
curl -X POST "http://localhost:8000/api/translate" \
  -F "text=Hello, world!" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"
```

Run the full pipeline (transcribe + translate):

```bash
curl -X POST "http://localhost:8000/api/process" \
  -F "file=@audio.mp3" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"
```

Interactive API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
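The same endpoints can be called from Python. A stdlib-only sketch that hand-encodes the multipart upload for `/api/process` (the field names match the curl examples above; a JSON response body is assumed):

```python
import io
import json
import urllib.request
import uuid

API_URL = "http://localhost:8000"  # default backend address

def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body by hand (no third-party deps)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
    )
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def process_audio(path, source_language="en", target_languages="es,fr,de"):
    """POST an audio file to /api/process and return the parsed JSON response."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"source_language": source_language,
             "target_languages": target_languages},
            "file", path.rsplit("/", 1)[-1], f.read(),
        )
    req = urllib.request.Request(
        f"{API_URL}/api/process", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```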
## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run a specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v
```

## Supported Languages

The app supports 30+ languages, including:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh-CN, zh-TW)
- Arabic (ar)
- Hindi (hi)
- And many more...
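Requests that name a target language the service does not support should fail early rather than mid-pipeline. A sketch of that validation (both the helper and the subset of codes are illustrative, not from the repo):

```python
# Illustrative subset of the language codes listed above, not the full set.
SUPPORTED_CODES = {"en", "es", "fr", "de", "it", "pt", "ru",
                   "ja", "ko", "zh-CN", "zh-TW", "ar", "hi"}

def validate_targets(codes):
    """Reject any requested target language outside the supported set."""
    unknown = [c for c in codes if c not in SUPPORTED_CODES]
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return codes
```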
## Whisper Models

Choose from different Whisper model sizes based on your needs:

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~39 MB | ⚡⚡⚡ | ⭐⭐ |
| base | ~74 MB | ⚡⚡ | ⭐⭐⭐ |
| small | ~244 MB | ⚡ | ⭐⭐⭐⭐ |
| medium | ~769 MB | 🐌 | ⭐⭐⭐⭐⭐ |
| large | ~1550 MB | 🐌🐌 | ⭐⭐⭐⭐⭐ |
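Given the sizes in the table above, you can pick the largest model that fits a memory budget programmatically. A sketch (the size map mirrors the table; the helper is illustrative, not part of the app):

```python
# Approximate weight sizes from the table above, in MB.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def pick_model(budget_mb: int) -> str:
    """Return the largest Whisper model whose weights fit in the given budget."""
    candidates = [m for m, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not candidates:
        raise ValueError(f"no model fits in {budget_mb} MB")
    return max(candidates, key=MODEL_SIZES_MB.get)
```

With the sizes above, `pick_model(100)` selects `base`, since `small` (~244 MB) would exceed the budget.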
Configure the model in `.env`:

```bash
WHISPER_MODEL_SIZE=base
```

## Production Deployment

Create a `.env` file:

```bash
# Backend
API_HOST=0.0.0.0
API_PORT=8000
WHISPER_MODEL_SIZE=base

# Frontend
VITE_API_URL=http://your-domain.com

# Environment
ENVIRONMENT=production
```

Build and run the production Docker image:

```bash
# Build the production image
docker build -t speech-translation-app .

# Run the container
docker run -p 8000:8000 \
  -e WHISPER_MODEL_SIZE=base \
  -v $(pwd)/uploads:/app/uploads \
  speech-translation-app
```

Example Nginx reverse proxy configuration:

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

## Environment Variables

- `API_HOST`: Host to bind (default: `0.0.0.0`)
- `API_PORT`: Port to listen on (default: `8000`)
- `WHISPER_MODEL_SIZE`: Whisper model size (default: `base`)
- `VITE_API_URL`: Backend API URL (default: `http://localhost:8000`)
## Tech Stack

**Backend**

- FastAPI: Modern, fast web framework
- OpenAI Whisper: State-of-the-art speech recognition
- Deep Translator: Multi-language translation
- PyTorch: ML framework powering Whisper
- Pydantic: Data validation
- pytest: Testing framework

**Frontend**

- Vue.js 3: Progressive JavaScript framework
- Vite: Next-generation frontend tooling
- Axios: HTTP client
- Pinia: State management
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check |
| GET | `/api/languages` | Get supported languages |
| POST | `/api/transcribe` | Transcribe audio to text |
| POST | `/api/translate` | Translate text |
| POST | `/api/process` | Full pipeline (transcribe + translate) |
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- OpenAI Whisper for speech recognition
- Deep Translator for translation
- FastAPI for the backend framework
- Vue.js for the frontend framework
## Support

For issues and questions:

- Open an issue on GitHub
- Check the documentation
## Roadmap

- Add support for real-time transcription
- Implement user authentication
- Add batch processing for multiple files
- Support video files
- Custom vocabulary and domain-specific models
- WebSocket support for live transcription
- Mobile app (React Native)
Made with ❤️ by the PyEnthusiasts Team