Fix CUDA illegal memory access bug in monotonic_rnnt by SimBe195 · Pull Request #14 · rwth-i6/i6_native_ops

SimBe195 · 2026-02-02T18:25:42Z

The gradient tensor in the monotonic RNN-T loss computation has essentially B * T * (S+1) * V elements. For large vocabulary sizes and long sequences, this number can exceed the maximum value of a signed 32-bit integer (2^31 - 1 = 2,147,483,647). In the current gradient CUDA kernel, the index used to write into the gradient tensor (grads[bts * *V + v] = ...) is computed from bts, which has type int, so the product overflows. In practice the index then wraps around to a negative value, and the write triggers an illegal memory access error. Changing the type of bts to int64_t makes the index computation 64-bit and fixes the issue.
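The following C++ sketch makes the size arithmetic concrete. The helper names grad_numel and fits_in_int32 are hypothetical and not part of i6_native_ops. The sketch computes the element count in 64-bit arithmetic and checks whether the largest flat index still fits into a signed 32-bit int.

```cpp
#include <cstdint>
#include <limits>

// Element count of a gradient tensor with B * T * (S+1) * V entries.
// All factors are int64_t, so the product is evaluated in 64-bit arithmetic.
constexpr int64_t grad_numel(int64_t B, int64_t T, int64_t S, int64_t V) {
  return B * T * (S + 1) * V;
}

// True if the largest flat index (numel - 1) is representable as a signed
// 32-bit int, i.e. if int-typed indexing would be safe for this tensor.
constexpr bool fits_in_int32(int64_t numel) {
  return numel - 1 <= static_cast<int64_t>(std::numeric_limits<int32_t>::max());
}
```

For example, a hypothetical shape of B = 16, T = 500, S = 100, V = 5000 gives 16 * 500 * 101 * 5000 = 4,040,000,000 elements, so fits_in_int32 returns false. The PR does not state the exact shapes at which the original crash appeared.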

Change dtype of bts from int to int64_t in grad kernel

afc1f44

SimBe195 requested review from DanEnergetics, JackTemaki, Stefanwuu and curufinwe February 2, 2026 18:25

Stefanwuu approved these changes Feb 2, 2026


albertz approved these changes Feb 2, 2026


Change to int64_t also for act access index

cc01525
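Both commits hinge on where the 64-bit promotion happens: the index must be computed from an int64_t operand, not merely stored in an int64_t variable. The host-side C++ sketch below illustrates this; it is not the actual kernel. The functions flat_index and flat_index_from_int are hypothetical, V is passed by value instead of through a pointer as in the kernel's *V, and the sketch assumes the activation index uses the same bts * V + v form, which the PR text does not spell out.

```cpp
#include <cstdint>

// Before the fix (for reference, not compiled here):
//   int bts = ...;
//   grads[bts * *V + v] = ...;   // bts * V is evaluated in 32-bit int and can overflow
//
// Pitfall: only widening the result does not help either:
//   int64_t idx = bts * V + v;   // the multiplication has already overflowed in int

// After the fix: bts is int64_t, so the usual arithmetic conversions promote
// V and v to 64 bits and the whole expression is computed without overflow.
constexpr int64_t flat_index(int64_t bts, int V, int v) {
  return bts * V + v;
}

// Equivalent alternative when bts must stay an int: cast one operand
// before multiplying, so the product is formed in 64-bit arithmetic.
constexpr int64_t flat_index_from_int(int bts, int V, int v) {
  return static_cast<int64_t>(bts) * V + v;
}
```

With the hypothetical shape from the size example (808,000 combined (b, t, s) rows, V = 5000), the last element has index 807,999 * 5000 + 4999 = 4,039,999,999, well above the 32-bit limit, yet both functions return it correctly.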

SimBe195 merged commit 422d579 into main Feb 3, 2026
1 check passed

SimBe195 deleted the mono_rnnt_illegal_memory_fix branch February 3, 2026 07:27


SimBe195 merged 2 commits into main from mono_rnnt_illegal_memory_fix

3 participants
