A Python application for real-time audio transcription and speaker diarization using Faster-Whisper and PyAnnote.
- Real-time audio transcription with Apple Silicon support (MPS)
- Advanced speaker diarization using PyAnnote
- Support for microphone and audio file input
- Multiple Whisper model sizes and quantization options
- Configurable via JSON with text or JSON output formats
- Thread-safe parallel processing of transcription and diarization
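The last point means transcription and diarization run in separate worker threads fed from a shared buffer. As a rough illustration of that producer/consumer pattern (not the actual `whisperize.py` internals; `transcribe_chunk` and `diarize_chunk` are hypothetical stand-ins):

```python
import queue
import threading
import time

audio_chunks = queue.Queue()  # thread-safe hand-off between capture and processing

def transcribe_chunk(chunk):
    """Hypothetical stand-in for the Faster-Whisper transcription step."""
    time.sleep(0.1)

def diarize_chunk(chunk):
    """Hypothetical stand-in for the PyAnnote diarization step."""
    time.sleep(0.1)

def process_loop():
    """Consume buffered audio and run both stages concurrently per chunk."""
    while True:
        chunk = audio_chunks.get()
        if chunk is None:  # sentinel: stop processing
            break
        t = threading.Thread(target=transcribe_chunk, args=(chunk,))
        d = threading.Thread(target=diarize_chunk, args=(chunk,))
        t.start(); d.start()
        t.join(); d.join()
```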
- Python 3.10
- FFmpeg (required for audio processing)
```bash
# On macOS using Homebrew
brew install ffmpeg

# On Ubuntu/Debian
sudo apt-get install ffmpeg

# On Windows using Chocolatey
choco install ffmpeg
```
- Apple Silicon Mac recommended for optimal performance with MPS acceleration
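To confirm that MPS acceleration will actually be used, you can check the PyTorch backend (assuming PyTorch is pulled in via `requirements.txt`):

```python
import torch

# True on Apple Silicon Macs with a Metal-capable PyTorch build
if torch.backends.mps.is_available():
    print("MPS acceleration available")
else:
    print("No MPS backend; transcription will fall back to CPU")
```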
```bash
git clone https://github.com/francescopace/whisperize.git
cd whisperize

python -m venv .venv
source .venv/bin/activate   # On Unix/macOS
# or
.venv\Scripts\activate      # On Windows

pip install -r requirements.txt
```

- Create a HuggingFace account at https://huggingface.co/
- Generate an access token at https://huggingface.co/settings/tokens
- Edit `config.json` and update it with your settings:

```json
{
  "huggingface_token": "your_token_here",
  "output_folder": "transcripts/",
  "output_format": "text",
  "model": "turbo",
  "whisper_force_cpu": false,
  "language": "it",
  "buffer_duration": 4
}
```

- `huggingface_token` (required): Your HuggingFace API token for accessing PyAnnote models
- `output_folder` (required): Directory where transcripts will be saved
- `output_format` (optional): Output format, `"text"` or `"json"` (default: `"text"`)
  - `text`: Creates a human-readable transcript with timestamps
  - `json`: Creates both a text file and a structured JSON file with metadata and word-level timestamps
- `model` (optional): Whisper model size: `"tiny"`, `"base"`, `"small"`, `"medium"`, `"large"`, or `"turbo"` (default: `"base"`)
- `whisper_force_cpu` (optional): Force CPU usage even if GPU/MPS is available (default: `false`)
- `language` (optional): Language code (e.g., `"it"`, `"en"`, `"es"`). If not specified, the language is auto-detected
- `buffer_duration` (optional): Audio buffer duration in seconds (default: `5.0`)
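For reference, here is a minimal sketch of how these options could be read and defaulted in Python; the actual loader in `whisperize.py` may differ:

```python
import json

with open("config.json") as f:
    config = json.load(f)

token = config["huggingface_token"]                  # required
output_folder = config["output_folder"]              # required
output_format = config.get("output_format", "text")
model_size = config.get("model", "base")
force_cpu = config.get("whisper_force_cpu", False)
language = config.get("language")                    # None -> auto-detect
buffer_duration = config.get("buffer_duration", 5.0)
```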
The application uses Faster-Whisper for transcription. Available models:
- `tiny`: Fastest, lowest accuracy
- `base`: Good balance of speed and accuracy
- `small`: Better accuracy, slower
- `medium`: High accuracy
- `large`: Highest accuracy, slowest
- `turbo`: Optimized large model
See Whisper documentation for language support and model details.
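For context, standalone Faster-Whisper usage looks like this; the model name and `compute_type` here are illustrative choices, not the tool's exact call:

```python
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe(
    "path/to/audio.wav",
    language="it",          # omit to auto-detect
    word_timestamps=True,
)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```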
The application uses PyAnnote speaker-diarization-3.1 for speaker identification. This model is automatically loaded and requires a HuggingFace token for access.
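Standalone usage of that pipeline looks roughly like this (the token value is a placeholder; you must also accept the model's terms on huggingface.co):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="your_token_here",  # placeholder: use the token from config.json
)

diarization = pipeline("path/to/audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"[{turn.start:.1f}s - {turn.end:.1f}s] {speaker}")
```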
Microphone Input (default):

```bash
python whisperize.py
# or explicitly
python whisperize.py microphone
```

Audio File Input:

```bash
python whisperize.py path/to/audio.wav
```

Note: Only WAV files (16-bit, mono or stereo) are currently supported.
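If your audio is in another format, FFmpeg (installed above) can convert it. 16 kHz mono is a reasonable choice for Whisper, though any 16-bit WAV should work:

```bash
ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le output.wav
```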
Transcripts are saved in the `output_folder` specified in `config.json`:
Text Format (`output_format: "text"`):

```
# Transcript started at 2025-02-11 18:30:00
[00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
[00:00:06.100-00:00:09.800] [SPEAKER_01]: Yes, I can hear you clearly.
```
JSON Format (`output_format: "json"`):

- Creates both a `.txt` file (for real-time monitoring) and a `.json` file
- JSON includes metadata, speaker labels, timestamps, and word-level details with confidence scores
Example JSON structure:
```json
{
  "metadata": {
    "start_time": "2025-02-11T18:30:00",
    "duration": 120.5,
    "model": "turbo",
    "language": "it",
    "source": "microphone"
  },
  "segments": [
    {
      "speaker": "SPEAKER_00",
      "start": 2.500,
      "end": 5.300,
      "text": "Hello, this is a test transcription.",
      "words": [
        {
          "word": "Hello",
          "start": 2.500,
          "end": 2.800,
          "probability": 0.95
        }
      ]
    }
  ]
}
```
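Because the JSON file follows the structure above, downstream scripts can consume it directly. For example (the output file name is hypothetical):

```python
import json

with open("transcripts/session.json") as f:  # hypothetical output file name
    transcript = json.load(f)

meta = transcript["metadata"]
print(f'{meta["model"]} / {meta["language"]} / {meta["duration"]}s')
for seg in transcript["segments"]:
    print(f'{seg["speaker"]}: {seg["text"]}')
```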