(fix): refactor audio stage names to be shown after running benchmark by SwekeR-463 · Pull Request #1470 · NVIDIA-NeMo/Curator

SwekeR-463 · 2026-02-07T05:46:31Z

Description

Preserve _stage_perf when stages return new task instances.
Define explicit name fields for audio stages to populate StagePerfStats stage names.
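The naming half of the change can be sketched as follows, assuming a stats record that simply copies a stage's `name` attribute. `PerfRecord`, `BaseStage`, `UnnamedStage`, `stage_label`, and `record` are hypothetical stand-ins, not NeMo Curator's `StagePerfStats` or `ProcessingStage` API. Only the class name `GetAudioDurationStage` comes from this PR.

```python
from dataclasses import dataclass


@dataclass
class PerfRecord:
    # Hypothetical stand-in for a per-stage stats entry such as StagePerfStats.
    stage_name: str
    process_time: float


class BaseStage:
    # Hypothetical default: an empty name means "not set explicitly".
    name: str = ""

    def stage_label(self) -> str:
        # Fall back to a generic (hypothetical) label when `name` is unset.
        return self.name or "ProcessingStage"


@dataclass
class GetAudioDurationStage(BaseStage):
    # Explicit dataclass field, so the reported label is stable and readable.
    name: str = "GetAudioDurationStage"


class UnnamedStage(BaseStage):
    # Never sets `name`, so it only gets the generic fallback label.
    pass


def record(stage: BaseStage, seconds: float) -> PerfRecord:
    # Build a stats entry from whatever label the stage exposes.
    return PerfRecord(stage_name=stage.stage_label(), process_time=seconds)
```

With the explicit field, `record(GetAudioDurationStage(), 0.5).stage_name` is `"GetAudioDurationStage"`, while the unnamed stage only gets the generic fallback.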

Snippet

After re-running python benchmarking/run.py --config benchmarking/nightly-benchmark.yaml --entries audio_fleurs, I got this output.

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-02-07T05:46:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-02-07T05:48:10Z

Greptile Overview

Greptile Summary

Adds explicit name attributes to several audio stages so StagePerfStats / benchmark output shows stable, human-readable stage names.
Updates ProcessingStage.process_batch() to handle None results (filtering) and to preserve _stage_perf when stages return new task instances.
Change is localized to stage metadata and the default batch-processing fallback path; no new APIs introduced.

Confidence Score: 4/5

This PR is likely safe to merge and primarily improves benchmarking/stats reporting, with a small behavior change in the default batch-processing path.
Changes are small and well-scoped. The new None handling aligns process_batch() with the documented contract, and _stage_perf propagation is guarded to only fill missing perf stats. Remaining risk is around assuming _stage_perf is always list-like on both input and outputs, and that all list results contain task objects.
nemo_curator/stages/base.py (process_batch semantics and _stage_perf propagation)
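The propagation guard can be illustrated in isolation. This is a sketch assuming a task type that stores its history in a plain list; `Task` and `propagate_perf` are hypothetical names, and the condition mirrors the one in the diff. Copying with `list(...)` instead of assigning the same list means a later stage appending to the output's history does not also change the input's. Because `list(...)` accepts any iterable, this is also where the "list-like" assumption mentioned above comes in.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task type; `_stage_perf` holds per-stage perf entries.
    data: str
    _stage_perf: list = field(default_factory=list)


def propagate_perf(src: Task, dst: Task) -> None:
    # Same condition as the diff: only a *new* object whose history is empty
    # inherits the input's history.
    if dst is not src and hasattr(dst, "_stage_perf") and not dst._stage_perf:
        # list(...) makes an independent copy instead of sharing one list.
        dst._stage_perf = list(src._stage_perf)


src = Task("in", ["reader"])
fresh = Task("out")
propagate_perf(src, fresh)
fresh._stage_perf.append("converter")  # does not touch src._stage_perf

already_tracked = Task("other", ["own"])
propagate_perf(src, already_tracked)  # non-empty history is left alone
```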

Important Files Changed

Filename	Overview
nemo_curator/stages/base.py	Updates default `ProcessingStage.process_batch()` to (a) skip `None` results and (b) copy `_stage_perf` from input task onto newly-created output tasks when their `_stage_perf` is empty.
nemo_curator/stages/audio/common.py	Adds explicit `name` fields to `GetAudioDurationStage` and `PreserveByValueStage` so benchmark stage perf can show stable stage names.
nemo_curator/stages/audio/io/convert.py	Adds explicit `name = "AudioToDocumentStage"` attribute on the stage class for consistent reporting.
nemo_curator/stages/audio/metrics/get_wer.py	Adds explicit `name` field to `GetPairwiseWerStage` dataclass for consistent stage perf naming.

Sequence Diagram

sequenceDiagram
  autonumber
  participant Exec as Executor/Backend
  participant Stage as ProcessingStage
  participant TaskIn as Input Task
  participant TaskOut as Output Task(s)

  Exec->>Stage: process_batch(tasks)
  loop for each task in tasks
    Stage->>Stage: validate_input(task)
    Stage->>Stage: result = process(task)
    alt result is None
      Stage-->>Exec: skip (filtered out)
    else result is list
      loop for each r in result
        alt r has empty _stage_perf and r is not task
          Stage->>TaskOut: r._stage_perf = copy(task._stage_perf)
        end
      end
      Stage-->>Exec: results.extend(result)
    else result is single task
      alt result has empty _stage_perf and result is not task
        Stage->>TaskOut: result._stage_perf = copy(task._stage_perf)
      end
      Stage-->>Exec: results.append(result)
    end
  end
  Exec-->>Exec: downstream stages consume results
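The flow in the diagram corresponds roughly to the following sketch of a default `process_batch()`. `Task` and `SketchStage` are hypothetical minimal classes, and the toy `process()` rule is invented purely to exercise all three branches (filtered, list, single task). NeMo Curator's real base class carries more state than this.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    # Hypothetical minimal task; real NeMo Curator task types differ.
    data: int
    _stage_perf: list = field(default_factory=list)


class SketchStage:
    """Hypothetical stage illustrating the default batch path in the diagram."""

    def validate_input(self, task: Task) -> None:
        if not isinstance(task, Task):
            raise TypeError(f"expected Task, got {type(task).__name__}")

    def process(self, task: Task) -> Union[None, Task, List[Task]]:
        # Toy rule: drop negatives, split even values into two new tasks,
        # and return odd values as one new task.
        if task.data < 0:
            return None
        if task.data % 2 == 0:
            return [Task(task.data // 2), Task(task.data // 2)]
        return Task(task.data * 10)

    @staticmethod
    def _inherit_perf(src: Task, out: Task) -> None:
        # Copy the input's perf history onto a new output with an empty one.
        if out is not src and hasattr(out, "_stage_perf") and not out._stage_perf:
            out._stage_perf = list(src._stage_perf)

    def process_batch(self, tasks: List[Task]) -> List[Task]:
        results: List[Task] = []
        for task in tasks:
            self.validate_input(task)
            result = self.process(task)
            if result is None:
                continue  # filtered out: nothing to append
            if isinstance(result, list):
                for r in result:
                    self._inherit_perf(task, r)
                results.extend(result)
            else:
                self._inherit_perf(task, result)
                results.append(result)
        return results
```

For the batch `[Task(-1, ["read"]), Task(4, ["read"]), Task(3, ["read"])]`, the negative task is dropped and three new tasks with data 2, 2, and 30 come back, each holding its own copy of `["read"]`.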

greptile-apps

4 files reviewed, 1 comment


greptile-apps · 2026-02-07T05:48:14Z

nemo_curator/stages/base.py

            result = self.process(task)
            if isinstance(result, list):
+                for r in result:
+                    if r is not task and hasattr(r, "_stage_perf") and not r._stage_perf:
+                        r._stage_perf = list(task._stage_perf)
                results.extend(result)
            else:
+                if result is not task and hasattr(result, "_stage_perf") and not result._stage_perf:
+                    result._stage_perf = list(task._stage_perf)
                results.append(result)


None result can crash

ProcessingStage.process() is documented to allow None for filtering, but process_batch() treats any non-list result as a task-like object: it evaluates result is not task and hasattr(result, "_stage_perf") and then calls results.append(result). If process() returns None, those checks do not raise (hasattr(None, "_stage_perf") is simply False), but None is appended to results and handed to downstream stages. This breaks the default batch path for any stage that filters tasks by returning None.

Fix by explicitly handling result is None before the list/non-list logic (skip or continue).
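The reviewer's point can be checked with a stripped-down sketch. `naive_batch`, `guarded_batch`, and `keep_even` are hypothetical helpers operating on plain integers, not NeMo Curator code. The guard expression from the diff does not raise on `None`; the real problem is that `None` ends up in the results that downstream stages receive.

```python
def naive_batch(process, tasks):
    # Shape of the default path in the diff: no None check before append.
    results = []
    for task in tasks:
        result = process(task)
        if isinstance(result, list):
            results.extend(result)
        else:
            # Same guard as the diff; on None, hasattr() is False, so the
            # condition short-circuits without raising.
            if result is not task and hasattr(result, "_stage_perf") and not result._stage_perf:
                result._stage_perf = list(task._stage_perf)
            results.append(result)  # None ends up in the output
    return results


def guarded_batch(process, tasks):
    # The suggested fix: skip None before the list/non-list logic.
    results = []
    for task in tasks:
        result = process(task)
        if result is None:
            continue
        if isinstance(result, list):
            results.extend(result)
        else:
            results.append(result)
    return results


def keep_even(x):
    # Hypothetical filtering "stage": returns None to drop a task.
    return x if x % 2 == 0 else None
```

Here `naive_batch(keep_even, [1, 2, 3, 4])` returns `[None, 2, None, 4]`, while `guarded_batch` returns `[2, 4]`.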

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>

greptile-apps

4 files reviewed, no comments


github-actions bot added the community-request label Feb 7, 2026

SwekeR-463 force-pushed the fix/stage-name-propagate branch from ee5bba1 to 3d7cca0 on February 7, 2026 05:47

greptile-apps bot reviewed Feb 7, 2026


SwekeR-463 added 2 commits February 7, 2026 11:21

Refactor stage names and update paths in configuration files

777d5f6

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>

Fix processing logic to handle None results in ProcessingStage

f6a5132

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>

SwekeR-463 force-pushed the fix/stage-name-propagate branch from 2a05acd to f6a5132 on February 7, 2026 05:51

greptile-apps bot reviewed Feb 7, 2026


SwekeR-463 wants to merge 2 commits into NVIDIA-NeMo:main from SwekeR-463:fix/stage-name-propagate
