Fix alignment logits when using fusion models in TDT greedy decoder #15311
Conversation
If you have time for a review, @nithinraok, thanks a lot!!
artbataev left a comment:
First, thanks for this contribution.
Unfortunately, I'm working on a concurrent PR with modifications to the decoding algorithm, so I'm asking you to wait until it is merged.
But I would be happy to fix bugs and inconsistencies in this implementation.
Also, it would be great if you could clarify your use-case for alignments: the alignments data structure (which stores all logits) requires an enormous amount of memory and makes decoding slower.
Regarding the PR itself:
- I agree about using logits consistent with LM scores in alignments (currently inconsistent, for both TDT and RNN-T)
- Also, we need to maintain consistency between `tdt_label_looping` and `rnnt_label_looping`
- I disagree with using `-inf` for blank, since it will break some of our use cases of alignments.
If you clarify the core use-case, we could discuss how to implement it.
Concurrent PR with decoding changes: #15315
Hello @artbataev! Thanks a lot for taking the time to review my PR! No worries, I can wait.

Regarding my use-case for the alignments: it is very interesting to see, at each prediction step, which tokens are the most likely according to the model, and to compare that with how the fusion model is trained. This really allows a deep understanding of the models' behaviour and shows which changes could improve the results (fine-tuning, changing how the fusion model is trained, ...).

What I noticed is that the token predicted at each step was not necessarily the one with the highest score in the logits stored for that step. My proposed change fixes that, but I agree that setting `-inf` for the blank logit is not very satisfying. When blank is not the most likely token, keeping the blank logit at its original value while adding the fusion scores to all the other tokens won't guarantee that blank is not the argmax of the stored logits. WDYT? How would you fix this issue with another implementation?

Also, thanks a lot for linking the other PR, I am having a look! And thanks for all the PRs you merged related to TDT, very impressive work 💪

What does this PR do?
Fixes an inconsistency in alignment logits when using `preserve_alignments=True` with TDT models and fusion models (e.g., LM, boosting tree).
Problem
When fusion models are enabled, the label selection logic correctly applies the fusion model scores to the logits before choosing the label, as sketched below.
However, the logits stored in alignments were always the pre-fusion logits. This caused a mismatch where the stored label (correctly chosen with the above logic) did not correspond to the argmax of the stored logits.
Solution
Update the logits sent to alignments to be consistent with the label selection:
This ensures `argmax(logits) == label` for every alignment entry, making the alignments reliable for downstream analysis.

Scope
Implementation in the `torch_impl` method only. The CUDA graphs implementation is not affected.

Collection: ASR
Changelog
- `tdt_label_looping.py`: Updated `torch_impl` to store alignment-consistent logits when fusion models are used (both the outer loop and the inner loop)
- `test_rnnt_decoding.py`: Added `test_tdt_greedy_decoding_preserve_alignments_with_fusion` to verify label/logits consistency with fusion models (a simplified sketch of the check appears below)
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information
None