Conversation
force-pushed from 569a38a to f570307
That's great @borgoat. FYI, I just released a new version. Therefore, once you update this PR, you can use:

```shell
whisperx --diarize \
  --diarize_model pyannote/speaker-diarization-precision-2 \
  --hf_token {pyannoteAI-api-key} \
  audio.wav
```

Enjoy!
@borgoat any chance of rebasing your branch and resolving the conflicts?
force-pushed from f570307 to 99a2ed4
I rebased it now; there are just a couple of things to note:
Just got it to work, but I had to upgrade torch and Python. Otherwise it was throwing "std::bad_alloc".
@ErikHeggeli I am mainly switching to 4 to make it work offline. Looking at your test branch, the other main difference is the modified YAML file. What is that about? Is it required, and should it be included here? (Or perhaps you can open a PR for it?)
I can't see the YAML file you are talking about. What is it called?
If it is from the test branch, it is not needed. That was only required to make earlier versions of pyannote work offline.
@ErikHeggeli never mind. I see that your branch ships everything needed to make it work offline, not only the changes for pyannote 4.
Yes, clone the main branch. A few things to check:

- Make sure the models are actually downloaded and not just some reference/pointer; check that the model files aren't 1 kB. (This happened to me, and produced "_pickle.UnpicklingError: invalid load key, 'v'.")
- In diarize.py, provide the full path to "/path/to/directory/pyannote-speaker-diarization-community-1". If the path is wrong you will get "huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'".

Then it should work offline. The only thing I haven't been able to test is GPU offloading; I only have a CPU available at the moment.
wrt the reference pointer, you need
Yes, that was most likely the alignment model (or the Whisper ASR model). You have to download those and provide their paths as well, but in my experience that has always worked offline as intended.
Thanks to @hbredin, I just learned the std::bad_alloc exception is caused by incompatibilities between torch and torchcodec versions. Because of that, we'd better pin specific torchcodec versions depending on which torch version you decide to use. torchcodec's GitHub page has a table showing version compatibility.
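To make that pinning concrete, a requirements fragment might look like this. The version pair below is purely illustrative (an assumption, not taken from the table); check torchcodec's compatibility table for the pair that matches your chosen torch release.

```text
# Illustrative only: verify the pair against torchcodec's compatibility table
torch==2.8.*
torchcodec==0.7.*
```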
I had to edit /mnt/2tb/whisperX/whisperx/diarize.py for whisperx to work.
Any updates on finalizing this? |
force-pushed from 99a2ed4 to cf0b6df
force-pushed from cf0b6df to 741f8de
This is an awaited feature! It would be very good to have it.
Friendly ping on this PR @Barabazs 🙏
It would be good to move this forward @Barabazs
Sorry, I can look into this. Can you resolve the conflicts and I'll merge?
If needed: "chore: upgrade pyannote-audio 4 (rebased)" 👼
```diff
 output = self.model(
     audio_data,
     num_speakers=num_speakers,
     min_speakers=min_speakers,
     max_speakers=max_speakers,
 )

+diarization = output.speaker_diarization
+embeddings = output.speaker_embeddings
+
-diarize_df = pd.DataFrame(diarization.itertracks(yield_label=True), columns=['segment', 'label', 'speaker'])
-diarize_df['start'] = diarize_df['segment'].apply(lambda x: x.start)
-diarize_df['end'] = diarize_df['segment'].apply(lambda x: x.end)
+diarize_df = pd.DataFrame(
+    diarization.itertracks(yield_label=True),
+    columns=["segment", "label", "speaker"],
+)
+diarize_df["start"] = diarize_df["segment"].apply(lambda x: x.start)
+diarize_df["end"] = diarize_df["segment"].apply(lambda x: x.end)

 if return_embeddings and embeddings is not None:
-    speaker_embeddings = {speaker: embeddings[s].tolist() for s, speaker in enumerate(diarization.labels())}
+    speaker_embeddings = {
+        speaker: embeddings[s].tolist()
+        for s, speaker in enumerate(diarization.labels())
+    }
```
what's the reason for this change?
Because pyannote 4.x no longer returns a single `pyannote.core.Annotation` object. It now returns a structured output with `speaker_diarization: Annotation` and `speaker_embeddings: np.ndarray` attributes.
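A minimal sketch of that structural change, using mocked stand-in types rather than the real pyannote classes. The attribute names `speaker_diarization` and `speaker_embeddings` are the ones mentioned above; everything else here is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Annotation:
    """Mock stand-in for pyannote.core.Annotation."""
    tracks: list = field(default_factory=list)


@dataclass
class PipelineOutput:
    """Mock of the pyannote 4.x structured result (assumed shape)."""
    speaker_diarization: Annotation
    speaker_embeddings: Any = None


def unpack(result):
    """Accept both the 3.x return (a bare Annotation) and the 4.x output."""
    if hasattr(result, "speaker_diarization"):  # 4.x structured output
        return result.speaker_diarization, getattr(result, "speaker_embeddings", None)
    return result, None  # 3.x: the pipeline returned the Annotation directly
```

A shim like this would let the DataFrame-building code stay the same across both pyannote major versions.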
Closing in favor of #1349
There are a couple of new pyannote models¹: pyannote/speaker-diarization-community-1 (offline) and pyannote/speaker-diarization-precision-2 (hosted by pyannote). I did a minimal upgrade to pyannote-audio 4.0 here to be able to use them, although I believe that to make it work properly we probably need additional arguments: the token parameter has changed, since one may now have to provide a pyannote AI token to use their cloud model.
Footnotes
1. https://www.pyannote.ai/blog/community-1