
Conversation

@Garvys (Contributor) commented Jan 23, 2026

What does this PR do?

Fixes an inconsistency in alignment logits when using preserve_alignments=True with TDT models and fusion models (e.g., LM, boosting tree).

Problem

When fusion models are enabled, the label selection logic correctly applies:

  • If blank is the most likely token before fusion → emit blank
  • Otherwise → emit the non-blank token with the highest logit after fusion

However, the logits stored in alignments were always the pre-fusion logits. This caused a mismatch where the stored label (correctly chosen with the above logic) did not correspond to the argmax of the stored logits.

Solution

Update the logits sent to alignments to be consistent with the label selection:

  • If blank was selected (best before fusion) → store pre-fusion logits (unchanged)
  • If a non-blank token was selected → store post-fusion logits with blank set to -inf

This ensures argmax(logits) == label for every alignment entry, making the alignments reliable for downstream analysis.
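
A minimal sketch of the selection-and-storage logic described above (hypothetical function and variable names, shown for a single frame's label logits and ignoring TDT duration logits; this is not the actual tdt_label_looping.py code):

```python
import torch

def pick_label_and_alignment_logits(
    pre_fusion_logits: torch.Tensor,  # shape: (vocab_size + 1,)
    fused_logits: torch.Tensor,       # pre-fusion logits + LM/boosting scores
    blank_id: int,
):
    if int(pre_fusion_logits.argmax()) == blank_id:
        # Blank wins before fusion: emit blank, store pre-fusion logits.
        return blank_id, pre_fusion_logits
    # Otherwise: best non-blank token after fusion, stored with blank
    # masked out so that argmax(stored) == label.
    stored = fused_logits.clone()
    stored[blank_id] = float("-inf")
    return int(stored.argmax()), stored
```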

Scope

The fix is implemented in the torch_impl method only; the CUDA graphs implementation is not affected.

Collection: ASR

Changelog

  • tdt_label_looping.py: Updated torch_impl to store alignment-consistent logits when fusion models are used (both outer loop and inner loop)
  • test_rnnt_decoding.py: Added test_tdt_greedy_decoding_preserve_alignments_with_fusion to verify label/logits consistency with fusion models (see the sketch below)
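
Schematically, such a consistency check might look like this (hypothetical helper name, assuming each alignment frame holds (logits, label) pairs; not the actual test code):

```python
import torch

def assert_alignment_consistency(alignments):
    """Check that every stored alignment entry satisfies
    argmax(logits) == label (assumed data layout)."""
    for frame in alignments:
        for logits, label in frame:
            assert int(logits.argmax()) == int(label)
```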

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

None

@github-actions github-actions bot added the ASR label Jan 23, 2026
@Garvys Garvys force-pushed the fix_logits_sf_debug_2 branch from e8ac4c9 to 9a90078 Compare January 23, 2026 10:39
@github-actions github-actions bot added CI and removed CI labels Jan 23, 2026
Garvys and others added 8 commits January 23, 2026 11:46
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
* Added datasets filtering to the inference script

New command line argument: --datasets <dataset1,dataset2,...> where
dataset1, dataset2, ... are the names of the datasets to process in the
datasets_json_path file.

If not specified, all datasets in the datasets_json_path will be processed.
If specified, only the datasets in the list will be processed.

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>

* Refined datasets filtering in the inference script

* Correctly handle comma-separated list of dataset names in the --datasets argument.
* Improved help text

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>

* Enable label to force CI tests

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

---------

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Fejgin, Roy <rfejgin@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
@Garvys Garvys force-pushed the fix_logits_sf_debug_2 branch from 9244671 to 6264954 Compare January 23, 2026 10:47
@github-actions github-actions bot added the CI label Jan 23, 2026
@github-actions github-actions bot removed the CI label Jan 23, 2026
@Garvys Garvys marked this pull request as ready for review January 23, 2026 10:49
@Garvys (Contributor, Author) commented Jan 23, 2026

If you have time for a review, @nithinraok. Thanks a lot!!

@Garvys Garvys changed the title Fix alignment logits when using fusion models in TDT decoder Fix alignment logits when using fusion models in TDT greedy decoder Jan 23, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 25, 2026
@nithinraok nithinraok requested a review from artbataev January 26, 2026 15:20
@artbataev (Collaborator) left a comment

First, thanks for this contribution.
Unfortunately, I'm working on a concurrent PR with modifications to the decoding algorithm, so I'm asking you to wait until it is merged.
But I would be happy to fix bugs and inconsistencies in this implementation.

Also, it would be great if you could clarify your use-case for alignments: the alignments data structure (which stores all logits) requires an enormous amount of memory and makes decoding slower.

Regarding the PR itself:

  • I agree about using logits consistent with the LM scores in alignments (they are currently inconsistent, for both TDT and RNN-T)
  • Also, we need to maintain consistency between tdt_label_looping and rnnt_label_looping
  • I disagree with using -inf for blank, since it will break some of our use cases of alignments.

If you clarify the core use-case, we could discuss how to implement it.

@artbataev (Collaborator)

Concurrent PR with decoding changes: #15315

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Jan 26, 2026
@Garvys (Contributor, Author) commented Jan 27, 2026

Hello @artbataev!

Thanks a lot for taking the time to review my PR!

No worries, I can wait.

Regarding my use-case for the alignments data structure: I use it to have access to the top tokens at each decoding step in order to troubleshoot why the model is not behaving as expected. I attached a screenshot of the tool.

[Screenshot of the tool, 2026-01-21 at 16:20:05]

It is very interesting to see, at each prediction step, which tokens are the most likely according to the model, and to compare that with how the fusion model is trained. This really allows a deep understanding of the models' behaviour and shows which changes could improve the results (fine-tuning, changing how the fusion model is trained, ...).

What I noticed is that the token predicted at each prediction step was not necessarily the one with the highest score in the logits of that step. My proposed change fixes that, but indeed I agree that setting the blank logit to -inf is not very satisfying.

When blank is not the most likely token, keeping the blank logit at its pre-fusion value while updating the logits of all the other tokens with the fusion scores won't guarantee that blank is not the most likely token in the stored logits.
It seems to me that the right way to implement this would be to store the blank logits in a separate tensor when shallow fusion is applied.
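
A rough sketch of what I have in mind (hypothetical names, single-frame view; not a concrete API proposal):

```python
import torch
from dataclasses import dataclass

@dataclass
class AlignmentEntry:
    fused_logits: torch.Tensor  # post-fusion scores for all tokens
    blank_logit: torch.Tensor   # pre-fusion blank score, kept separately
    label: int

def make_entry(pre_fusion_logits, fused_logits, blank_id):
    # Label selection as in this PR: blank is emitted only if it wins
    # before fusion; otherwise the best non-blank token after fusion wins.
    if int(pre_fusion_logits.argmax()) == blank_id:
        label = blank_id
    else:
        masked = fused_logits.clone()
        masked[blank_id] = float("-inf")
        label = int(masked.argmax())
    # Keeping the blank score in its own field avoids overwriting it
    # with -inf while still keeping the stored scores self-consistent.
    return AlignmentEntry(
        fused_logits=fused_logits,
        blank_logit=pre_fusion_logits[blank_id].clone(),
        label=label,
    )
```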

WDYT? How would you fix this issue with another implementation?

Also, thanks a lot for linking the other PR. I am having a look!

Thanks a lot for all the PRs you merged related to TDT, very impressive work 💪

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 29, 2026