Fix for crash in alignment when called with no tokens #1081

Open
balooii wants to merge 1 commit into m-bain:main from balooii:main

balooii commented Mar 20, 2025

Fixes #1048

There are probably better ways to do this, but this change restores the behavior that worked before the changes from 65b2332: when called with an empty list of tokens, return an empty tensor.
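
For readers who have not looked at the function, here is a minimal sketch of the behavior described above. It is not the diff from this PR: `get_wildcard_emission_guarded` is a hypothetical name, and the body is an assumption about how the lookup works (-1 marks a wildcard, which takes the best non-blank score). The crash in the linked issue appears to come from `torch.tensor([])` defaulting to float32, which cannot be used as an index.

```python
import torch

def get_wildcard_emission_guarded(frame_emission, tokens, blank_id):
    """Hypothetical guarded variant (illustration only, not the PR's code)."""
    assert 0 <= blank_id < len(frame_emission)

    # Force an integer dtype. torch.tensor([]) would default to float32,
    # and float tensors cannot be used as indices.
    tokens = torch.as_tensor(tokens, dtype=torch.long)

    # No tokens: return an empty tensor with the emission's dtype and device.
    if tokens.numel() == 0:
        return frame_emission.new_empty(0)

    wildcard_mask = tokens == -1
    # Clamp so the -1 wildcard marker never reaches the indexing step.
    regular_scores = frame_emission[tokens.clamp(min=0)]

    # Best score over all non-blank labels, used at wildcard positions.
    non_blank = frame_emission.clone()
    non_blank[blank_id] = float("-inf")
    return torch.where(wildcard_mask, non_blank.max(), regular_scores)

fe = torch.tensor([0.1, 0.9, 0.3])
print(get_wildcard_emission_guarded(fe, [], 0).shape)   # empty result, no crash
print(get_wildcard_emission_guarded(fe, [-1, 2], 0))    # wildcard + regular token
```

With the dtype forced to long, the explicit early return is arguably redundant, but it keeps the empty-input contract visible.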

ZodiacFRA commented Apr 18, 2025

Worked for me, thanks

EDIT: I was using version 3.3.1

SoftwareAndOutsourcing commented

I haven't had this issue since upgrading to v3.3.2. Is it still necessary to address it?

ZodiacFRA commented Apr 20, 2025

Hi @SoftwareAndOutsourcing

I was using version 3.3.1; I did not try 3.3.2.

However, get_wildcard_emission() in alignment.py is the same in both versions, so it might be worth adding this fix (unless the bug has been fixed in another part of the code).

Thank you for your work anyway :)
Have a nice day

bjollans added a commit to bjollans/whisperX that referenced this pull request Jun 7, 2025

uduntuntu commented

I also patched this issue for the upper boundary. Test code below:

# Run (in virtualenv):  python test_alignment_out_of_bounds.py
import torch
from whisperx import alignment as A

def run_case(V, tokens, blank_id=0, label=""):
    # Build a random 1-D emission vector of size V and call the function under test.
    fe = torch.randn(V, dtype=torch.float32)
    t = torch.tensor(tokens, dtype=torch.long)
    try:
        out = A.get_wildcard_emission(fe, t, blank_id)
        ok = bool(torch.isfinite(out).all())
        print(f"[OK ] {label:40s} -> out.shape={tuple(out.shape)} finite={ok}")
    except Exception as e:
        print(f"[ERR] {label:40s} -> {type(e).__name__}: {e}")

print("Testing whisperx.alignment.get_wildcard_emission")
print("Function:", A.get_wildcard_emission.__code__.co_filename)

# OFF-BY-ONE repro: token == vocab size (V)
run_case(34, [34], 0, "off-by-one (V=34, token=V)")

# Mixed case: wildcard + valid + token==V
run_case(34, [-1, 0, 5, 34], 0, "wildcard + in-range + token=V")

# Edge case: empty emissions (should assert / error, but not IndexError from indexing)
run_case(0, [0], 0, "empty emissions (V=0)")
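
The upper-boundary patch itself is not shown in this thread. As a rough sketch of one possible guard, the helper below validates token ids before they are used as indices. The name `check_token_range` is hypothetical, and raising a `ValueError` (rather than clamping or treating the id as a wildcard) is an assumption about the desired policy, not what the patch does.

```python
import torch

def check_token_range(tokens, vocab_size):
    """Hypothetical helper: validate token ids before indexing emissions."""
    tokens = torch.as_tensor(tokens, dtype=torch.long)
    # -1 is the wildcard marker; every other id must lie in [0, vocab_size).
    bad = (tokens != -1) & ((tokens < 0) | (tokens >= vocab_size))
    if bool(bad.any()):
        raise ValueError(
            f"token ids out of range for vocab size {vocab_size}: "
            f"{tokens[bad].tolist()}"
        )
    return tokens

# Same shapes as the test cases above: in-range ids pass, token == V is rejected.
print(check_token_range([-1, 0, 5, 33], 34))
try:
    check_token_range([-1, 0, 5, 34], 34)
except ValueError as e:
    print("rejected:", e)
```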

Development

Successfully merging this pull request may close these issues.

tensors used as indices must be long, int, byte or bool tensors

4 participants