add torch profiler plugin and call it in profiler scripts #15299

Open
briancoutinho wants to merge 2 commits into NVIDIA-NeMo:main from briancoutinho:bcoutinho/torch_profiling_plugin

@briancoutinho briancoutinho commented Jan 14, 2026


Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Adds a Torch profiler plugin and updates the Torch profiler callback. The performance scripts are updated so that the Torch profiler is readily available from them.

Changelog

  1. Update the Torch profiler callback so that Chakra execution trace collection can be enabled optionally.
  2. Add a Torch profiler NeMo Run plugin (see nemo/lightning/run/plugins.py).
  3. Update the performance scripts to include the Torch profiler plugin (see nemo/scripts/performance/...).
  4. Add an option to set the Torch profiling output directory through either an environment variable or a command-line flag in the performance scripts.

Usage

Thanks @shengf-nv for help with testing this change.

Plugin usage in NeMo Run (Python example)

We added a new NeMo Run plugin that enables PyTorch profiling.
Add the plugin to the experiment as follows:

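import os

import nemo_run as run
from nemo.lightning.run.plugins import PyTorchProfilerPlugin

plugins = []
...
# Use append() for a single plugin object; `+=` expects an iterable.
plugins.append(
    PyTorchProfilerPlugin(
        start_step=start_iter,
        end_step=end_iter,
        output_path=log_dir,
        profiler_kwargs={
            # Stack capture is opt-in via TORCH_PROFILER_WITH_STACK=1.
            "with_stack": os.environ.get("TORCH_PROFILER_WITH_STACK", "0") == "1",
        },
    )
)
...
with run.Experiment("llama3_8b_nsys_profiling") as exp:
    exp.add(
        recipe,
        executor=executor,
        plugins=plugins,
    )
    exp.run()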

In the NeMo performance scripts (scripts/performance), you can use the following helper function:

    if torch_profiler_plugin := build_torch_profiler_plugin(args):
        plugins.append(torch_profiler_plugin)

Sample output

In the logs, you should see the profiling configuration reported as:

[default0]:[NeMo I 2025-09-25 15:22:14 nemo_logging:393] PyTorch profiling initialized:
[default0]:     - Start Step: 36
[default0]:     - End Step: 40
[default0]:     - Warmup Steps: 2
[default0]:     - Active Steps: 4
[default0]:     - Trace Directory: /home/bcoutinho/nemo_experiments/torch_and_nsys
[default0]:     - Collect Execution Trace: False
[default0]:     - Extra profiler kwargs: {'with_stack': False}
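
The values above are consistent with an active window of end step minus start step (40 - 36 = 4). As a rough illustration only, the sketch below classifies a training step against such a window. It assumes that the warmup steps immediately precede the start step and that the end step is exclusive; the PR description confirms neither, and `profiler_phase` is a hypothetical helper, not part of NeMo or PyTorch.

```python
def profiler_phase(step, start_step, end_step, warmup_steps):
    """Classify a training step relative to the profiling window.

    Assumed semantics (not confirmed by the PR):
      - warmup covers [start_step - warmup_steps, start_step)
      - active covers [start_step, end_step)
    """
    if start_step - warmup_steps <= step < start_step:
        return "warmup"
    if start_step <= step < end_step:
        return "active"
    return "idle"


# Values taken from the sample log: start=36, end=40, warmup=2.
phases = {s: profiler_phase(s, 36, 40, 2) for s in range(33, 42)}
active_steps = [s for s, p in phases.items() if p == "active"]
print(phases)
print(len(active_steps))  # 4, matching "Active Steps: 4" in the log
```

Under these assumptions, steps 34 and 35 would be warmup and steps 36 through 39 would be recorded.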

Once the profiled iterations have run, the logs report where the trace was saved:

[default0]:Training epoch 0, iteration 38/49 | lr: 2.335e-05 | global_batch_size: 32 | global_step: 38 | reduced_train_loss: 10.88 | train_step_timing in s: 3.106 | consumed_samples: 1248
[default0]:[NeMo I 2025-09-25 15:24:58 nemo_logging:393] Kineto trace saved: /home/bcoutinho/nemo_experiments/torch_and_nsys/torch_profiler/rank-0.json.gz

Performance Script Example

When NeMo runs inside a Docker container, the output path differs from the path on the host. Use an environment variable or a command-line flag to set it.

# Set the output directory through the environment. Note that this is the path as mounted inside the Docker container.
export TORCH_PROFILES_DIR='/nemo_run'

# Call perf script
python3 -m scripts.performance.llm.pretrain_nemotronh_....
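
One way a script could resolve the output directory from either source is sketched below. This is not the code from the PR: the flag name `--torch_profiles_dir`, the helper names, and the precedence (flag first, then `TORCH_PROFILES_DIR`) are all assumptions made for illustration. Only the environment variable name comes from the description above.

```python
import argparse
import os


def build_parser():
    """Build a parser with a hypothetical output-directory flag."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the real performance scripts may use another one.
    parser.add_argument("--torch_profiles_dir", default=None)
    return parser


def resolve_profiles_dir(cli_value, env=None):
    """Return the profiling output directory, or None if neither is set.

    Assumed precedence: an explicit flag wins over TORCH_PROFILES_DIR.
    """
    env = os.environ if env is None else env
    return cli_value or env.get("TORCH_PROFILES_DIR")


# Parse an explicit argument list so the example does not read sys.argv.
args = build_parser().parse_args(["--torch_profiles_dir", "/nemo_run"])
print(resolve_profiles_dir(args.torch_profiles_dir, env={}))
```

Passing `env` explicitly keeps the helper easy to test without modifying the real process environment.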

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? There is an existing test
  • Did you add or update any necessary documentation? Not sure if this can be documented.
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc) No
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

N/A

…o#14779)

Signed-off-by: Brian Coutinho <bcoutinho@nvidia.com>

@shengf-nv shengf-nv left a comment

See my inline comments

Signed-off-by: Brian Coutinho <bcoutinho@nvidia.com>
@briancoutinho briancoutinho marked this pull request as ready for review January 15, 2026 23:52

@shengf-nv shengf-nv left a comment

LG2M, thanks
