Conversation
force-pushed from 569a38a to f570307
That's great @borgoat. FYI, I just released a new version. Therefore, once you update this PR, you can use:

```shell
whisperx --diarize \
  --diarize_model pyannote/speaker-diarization-precision-2 \
  --hf_token {pyannoteAI-api-key} \
  audio.wav
```

Enjoy!
@borgoat any chance of rebasing your branch and resolving the conflicts?
force-pushed from f570307 to 99a2ed4
I rebased it now; there are just a couple of things to note:
Just got it to work, but I had to upgrade torch and Python. Otherwise it was throwing "std::bad_alloc".
@ErikHeggeli I am mainly switching to 4 to make it work offline. Looking at your test branch, the other main difference is the modified YAML file. What is that about? Is it required, and should it be included here? (Or perhaps you can open a PR for it?)
I can't see the YAML file you are talking about. What is it called?
If it is from the test branch, it is not needed. That was only required to make earlier versions of pyannote work offline.
@ErikHeggeli never mind. I see that your branch ships everything needed to make it work offline, not only the changes for pyannote 4.
Yes, clone the main branch. A few things to check:

- Make sure the models are actually downloaded and not just some reference/pointer; check that the model files aren't 1 kB. (This happened to me, and produced "_pickle.UnpicklingError: invalid load key, 'v'.")
- In diarize.py, provide the full path to "/path/to/directory/pyannote-speaker-diarization-community-1". If the path is wrong you will get "huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'".

Then it should work offline. The only thing I haven't been able to test is GPU offloading; I only have a CPU available at the moment.
wrt the reference pointer, you need
Yes, that was most likely the alignment model (or the Whisper ASR model). You have to download those and provide their paths as well, but in my experience that has always worked offline as intended.
Thanks to @hbredin, I just learned the std::bad_alloc exception is caused by incompatibilities between torch and torchcodec versions. Because of that, we'd better pin specific torchcodec versions depending on which torch version you decide to use. torchcodec's GitHub page has a table showing version compatibility.
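To make that pinning concrete, a requirements fragment might look like this. The version pair below is purely illustrative (an assumption, not taken from the table); check torchcodec's compatibility table for the pair that matches your chosen torch release.

```text
# Illustrative only: verify the pair against torchcodec's compatibility table
torch==2.8.*
torchcodec==0.7.*
```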
I had to edit /mnt/2tb/whisperX/whisperx/diarize.py for whisperx to work.
Any updates on finalizing this? |
force-pushed from 99a2ed4 to cf0b6df
force-pushed from cf0b6df to 741f8de
This is an awaited feature! It would be very good to have it.
Friendly ping on this PR @Barabazs 🙏
It would be good to move this forward @Barabazs
Sorry, I can look into this. Can you resolve the conflicts and I'll merge?
If needed: "chore: upgrade pyannote-audio 4 (rebased)" 👼
```diff
 output = self.model(
     audio_data,
     num_speakers=num_speakers,
     min_speakers=min_speakers,
     max_speakers=max_speakers,
 )

+diarization = output.speaker_diarization
+embeddings = output.speaker_embeddings
+
-diarize_df = pd.DataFrame(diarization.itertracks(yield_label=True), columns=['segment', 'label', 'speaker'])
-diarize_df['start'] = diarize_df['segment'].apply(lambda x: x.start)
-diarize_df['end'] = diarize_df['segment'].apply(lambda x: x.end)
+diarize_df = pd.DataFrame(
+    diarization.itertracks(yield_label=True),
+    columns=["segment", "label", "speaker"],
+)
+diarize_df["start"] = diarize_df["segment"].apply(lambda x: x.start)
+diarize_df["end"] = diarize_df["segment"].apply(lambda x: x.end)

 if return_embeddings and embeddings is not None:
-    speaker_embeddings = {speaker: embeddings[s].tolist() for s, speaker in enumerate(diarization.labels())}
+    speaker_embeddings = {
+        speaker: embeddings[s].tolist()
+        for s, speaker in enumerate(diarization.labels())
+    }
```
what's the reason for this change?
Because pyannote 4.x no longer returns a single `pyannote.core.Annotation` object. It now returns a structured output with `speaker_diarization: Annotation` and `speaker_embeddings: np.ndarray` attributes.
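A minimal sketch of that structural change, using mocked stand-in types rather than the real pyannote classes. The attribute names `speaker_diarization` and `speaker_embeddings` are the ones mentioned above; everything else here is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Annotation:
    """Mock stand-in for pyannote.core.Annotation."""
    tracks: list = field(default_factory=list)


@dataclass
class PipelineOutput:
    """Mock of the pyannote 4.x structured result (assumed shape)."""
    speaker_diarization: Annotation
    speaker_embeddings: Any = None


def unpack(result):
    """Accept both the 3.x return (a bare Annotation) and the 4.x output."""
    if hasattr(result, "speaker_diarization"):  # 4.x structured output
        return result.speaker_diarization, getattr(result, "speaker_embeddings", None)
    return result, None  # 3.x: the pipeline returned the Annotation directly
```

A shim like this would let the DataFrame-building code stay the same across both pyannote major versions.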
Closing in favor of #1349
There are a couple of new pyannote models¹: pyannote/speaker-diarization-community-1 (offline) and pyannote/speaker-diarization-precision-2 (hosted by pyannote). I did a minimal upgrade to pyannote-audio 4.0 here to be able to use them, although I believe that to make it work properly we probably need additional arguments: the token parameter has changed, since one may now have to provide a pyannote AI token to use their cloud model.
Footnotes
1. https://www.pyannote.ai/blog/community-1