πŸŽ™οΈ Speech-to-Text & Translation App

A production-ready, full-stack application for converting speech to text and translating it into multiple languages. Built with Python (FastAPI), OpenAI Whisper, and Vue.js.

✨ Features

  • 🎤 Audio Recording: Record audio directly from your browser
  • 📁 File Upload: Support for multiple audio formats (MP3, WAV, M4A, OGG, FLAC, WebM)
  • 🗣️ Speech-to-Text: High-accuracy transcription using OpenAI Whisper
  • 🌍 Multi-Language Translation: Translate to 30+ languages
  • ⏱️ Timestamped Segments: Get detailed transcription with timestamps
  • 💾 Multiple Export Formats: Download as TXT, JSON, or SRT
  • 🐳 Docker Ready: Production-ready containerization
  • ✅ Fully Tested: Comprehensive test suite with pytest
  • 🎨 Modern UI: Beautiful, responsive Vue.js interface

πŸ—οΈ Architecture

Speech-to-Text-Translation-App/
β”œβ”€β”€ backend/                    # Python FastAPI backend
β”‚   β”œβ”€β”€ main.py                # FastAPI application
β”‚   β”œβ”€β”€ models/                # Pydantic schemas
β”‚   └── services/              # Business logic
β”‚       β”œβ”€β”€ speech_to_text.py  # Whisper integration
β”‚       └── translator.py      # Translation service
β”œβ”€β”€ frontend/                   # Vue.js frontend
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/        # Vue components
β”‚   β”‚   β”œβ”€β”€ services/          # API client
β”‚   β”‚   └── App.vue           # Main app component
β”‚   └── package.json
β”œβ”€β”€ tests/                     # Test suite
β”œβ”€β”€ docker-compose.yml         # Docker orchestration
β”œβ”€β”€ Dockerfile                # Production Docker image
└── requirements.txt          # Python dependencies

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • FFmpeg (for audio processing)

Option 1: Docker (Recommended)

# Clone the repository
git clone https://github.com/pyenthusiasts/Speech-to-Text-Translation-App.git
cd Speech-to-Text-Translation-App

# Start with Docker Compose
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Option 2: Local Development

Backend Setup

# Install Python dependencies
pip install -r requirements.txt

# Start the backend server
cd backend
python main.py

# Or with uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

# Install Node dependencies
cd frontend
npm install

# Start development server
npm run dev

# Access at http://localhost:3000

📖 Usage

Web Interface

  1. Upload or Record Audio

    • Click to upload an audio file, or
    • Click "Start Recording" to record directly
  2. Select Languages (Optional)

    • Choose source language (or auto-detect)
    • Select target languages for translation
  3. Process

    • Click "Transcribe & Translate"
    • Wait for processing to complete
  4. Download Results

    • Download transcription as TXT, JSON, or SRT
    • Get translations in all selected languages
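The SRT download can be pictured as a small conversion from timestamped segments to subtitle blocks. The sketch below is illustrative only: it assumes each segment is a mapping with `start` and `end` in seconds plus a `text` field (the shape Whisper's `transcribe` returns under `"segments"`), and the helper names `format_srt_timestamp` and `segments_to_srt` are hypothetical, not taken from this repository.

```python
from typing import Iterable, Mapping


def format_srt_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS,mmm, the form SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from segments (assumed keys: 'start', 'end', 'text')."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each cue: running number, time range, then the subtitle text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Calling `segments_to_srt` on the segment list from a transcription and writing the result to a `.srt` file should give something most video players can load as subtitles.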

API Usage

Transcribe Audio

curl -X POST "http://localhost:8000/api/transcribe" \
  -F "file=@audio.mp3" \
  -F "language=en"

Translate Text

curl -X POST "http://localhost:8000/api/translate" \
  -F "text=Hello, world!" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"

Full Pipeline (Transcribe + Translate)

curl -X POST "http://localhost:8000/api/process" \
  -F "file=@audio.mp3" \
  -F "source_language=en" \
  -F "target_languages=es,fr,de"
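The same full-pipeline call can be made from Python. The following is a minimal sketch using only the standard library. It builds the multipart form that the curl command above sends, with the same field names (`file`, `source_language`, `target_languages`). The function name `build_process_request` and the `application/octet-stream` content type for the file part are assumptions for illustration.

```python
import uuid
from urllib.request import Request


def build_process_request(audio_bytes, filename, source_language,
                          target_languages,
                          base_url="http://localhost:8000"):
    """Build (but do not send) a multipart POST to /api/process."""
    boundary = uuid.uuid4().hex
    fields = {
        "source_language": source_language,
        # The curl example passes targets as one comma-separated value.
        "target_languages": ",".join(target_languages),
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # File part; the content type here is a generic assumption.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return Request(
        f"{base_url}/api/process",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the returned object to `urllib.request.urlopen` while the backend is running. The response schema is not described in this README, so check the interactive API docs before relying on specific fields.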

API Documentation

Interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=backend --cov-report=html

# Run specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v

🌍 Supported Languages

The app supports 30+ languages including:

  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Russian (ru)
  • Japanese (ja)
  • Korean (ko)
  • Chinese (zh-CN, zh-TW)
  • Arabic (ar)
  • Hindi (hi)
  • And many more...
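Since the API takes target languages as a comma-separated string (for example `es,fr,de`), a client may want to validate codes before sending them. The sketch below is one way to do that. `LISTED_CODES` contains only the codes named in the list above, and `parse_target_languages` is a hypothetical helper. A real client would more likely fetch the full set from `GET /api/languages`.

```python
# Only the codes listed in this README; the full set is larger.
LISTED_CODES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "ru",
    "ja", "ko", "zh-CN", "zh-TW", "ar", "hi",
})


def parse_target_languages(raw, supported=LISTED_CODES):
    """Split 'es,fr,de' into a de-duplicated list of known codes."""
    codes = []
    for part in raw.split(","):
        code = part.strip()
        if not code or code in codes:
            continue  # skip empty entries and repeats
        if code not in supported:
            raise ValueError(f"unsupported language code: {code!r}")
        codes.append(code)
    return codes
```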

🎯 Whisper Models

Choose from different Whisper model sizes based on your needs:

Model    Parameters   Speed    Accuracy
tiny     ~39 M        ⚡⚡⚡      ⭐⭐
base     ~74 M        ⚡⚡       ⭐⭐⭐
small    ~244 M       ⚡        ⭐⭐⭐⭐
medium   ~769 M       🐌       ⭐⭐⭐⭐⭐
large    ~1550 M      🐌🐌      ⭐⭐⭐⭐⭐

Configure in .env:

WHISPER_MODEL_SIZE=base

📦 Production Deployment

Environment Variables

Create a .env file:

# Backend
API_HOST=0.0.0.0
API_PORT=8000
WHISPER_MODEL_SIZE=base

# Frontend
VITE_API_URL=http://your-domain.com

# Environment
ENVIRONMENT=production

Docker Production Build

# Build production image
docker build -t speech-translation-app .

# Run container
docker run -p 8000:8000 \
  -e WHISPER_MODEL_SIZE=base \
  -v $(pwd)/uploads:/app/uploads \
  speech-translation-app

Nginx Reverse Proxy

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

🔧 Configuration

Backend Configuration

  • API_HOST: Host to bind to (default: 0.0.0.0)
  • API_PORT: Port to listen on (default: 8000)
  • WHISPER_MODEL_SIZE: Whisper model size (default: base)
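As a rough illustration of how these three variables and their defaults might be read at startup, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, and the backend's actual configuration code may differ. The accepted model sizes are the five named in the Whisper Models table.

```python
import os
from dataclasses import dataclass

# Model sizes listed in the Whisper Models section.
VALID_MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


@dataclass(frozen=True)
class Settings:
    api_host: str
    api_port: int
    whisper_model_size: str


def load_settings(env=None) -> Settings:
    """Read backend settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    model_size = env.get("WHISPER_MODEL_SIZE", "base")
    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(
            f"WHISPER_MODEL_SIZE must be one of {VALID_MODEL_SIZES}, "
            f"got {model_size!r}"
        )
    return Settings(
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),  # env values are strings
        whisper_model_size=model_size,
    )
```

Rejecting an unknown model size at load time surfaces a typo in `.env` immediately, not on the first transcription request.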

Frontend Configuration

  • VITE_API_URL: Backend API URL (default: http://localhost:8000)

πŸ› οΈ Tech Stack

Backend

  • FastAPI: Modern, fast web framework
  • OpenAI Whisper: State-of-the-art speech recognition
  • Deep Translator: Multi-language translation
  • PyTorch: ML framework for Whisper
  • Pydantic: Data validation
  • pytest: Testing framework

Frontend

  • Vue.js 3: Progressive JavaScript framework
  • Vite: Next-generation frontend tooling
  • Axios: HTTP client
  • Pinia: State management

πŸ“ API Endpoints

Method   Endpoint          Description
GET      /                 Health check
GET      /api/languages    Get supported languages
POST     /api/transcribe   Transcribe audio to text
POST     /api/translate    Translate text
POST     /api/process      Full pipeline (transcribe + translate)

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“§ Support

For issues and questions:

πŸ—ΊοΈ Roadmap

  • Add support for real-time transcription
  • Implement user authentication
  • Add batch processing for multiple files
  • Support for video files
  • Custom vocabulary and domain-specific models
  • WebSocket support for live transcription
  • Mobile app (React Native)

Made with ❤️ by the PyEnthusiasts Team
