Add kernel warning system for overflow detection by adenzler-nvidia · Pull Request #1089 · google-deepmind/mujoco_warp

adenzler-nvidia · 2026-01-29T16:00:59Z

Summary

Adds a structured warning system for detecting and reporting overflow conditions in kernels. Previously, wp.printf calls went to stdout and cluttered output. Now warnings are captured via atomic flags and emitted to stderr with actionable messages.

Changes

New Warning System

WarningType enum for categorizing warnings: NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW, CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON, HFIELD_OVERFLOW
check_warnings(d) - reads flags from GPU and emits to stderr with suggested values
get_warnings(d) / clear_warnings(d) - utility functions for programmatic access
Warning arrays added to Data: warning (flags) and warning_info (suggested values)
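
As a rough illustration of the layout described above, the sketch below models the warning types and the two per-type arrays on the host. It is not the code from types.py: the enum base class and the values 0 through 5 are assumptions inferred from the listing order (the PR only states that HFIELD_OVERFLOW is index 6), and plain Python lists stand in for the device arrays.

```python
from enum import IntEnum


class WarningType(IntEnum):
  """Hypothetical layout; only HFIELD_OVERFLOW = 6 is stated in the PR."""

  NEFC_OVERFLOW = 0
  BROADPHASE_OVERFLOW = 1
  NARROWPHASE_OVERFLOW = 2
  CONTACT_MATCH_OVERFLOW = 3
  GJK_ITERATIONS = 4
  EPA_HORIZON = 5
  HFIELD_OVERFLOW = 6


NUM_WARNINGS = len(WarningType)

# Host-side stand-ins for the two Data arrays (lists instead of device arrays).
warning = [0] * NUM_WARNINGS       # 0 = not raised, 1 = raised
warning_info = [0] * NUM_WARNINGS  # suggested value, meaningful only when raised

# Example: a kernel reports that njmax should be at least 32.
warning[WarningType.NEFC_OVERFLOW] = 1
warning_info[WarningType.NEFC_OVERFLOW] = 32
```

Keeping one slot per warning type means the host only has to copy two small fixed-size arrays back from the device to learn everything that went wrong in a step.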

Kernel Changes

All overflow wp.printf calls are now guarded by wp.static(m.opt.warning_printf) for compile-time removal
Kernels set warning flags via wp.atomic_max with suggested parameter values
Affected: forward.py, sensor.py, collision_convex.py, collision_gjk.py
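
One way to see why an atomic max suits this job: several threads can hit the same overflow in one step, each with a different requirement, and a max merge gives the same result regardless of the order in which they write. The sketch below emulates that on the CPU with a lock. The helper `atomic_max`, the slot index, and `report_overflow` are illustrative stand-ins, not the Warp kernels from this PR.

```python
import threading

_lock = threading.Lock()


def atomic_max(arr, index, value):
  """CPU stand-in for an atomic max: arr[index] = max(arr[index], value)."""
  with _lock:
    old = arr[index]
    if value > old:
      arr[index] = value
    return old


NEFC_OVERFLOW = 0  # assumed slot index for this sketch
warning = [0]
warning_info = [0]


def report_overflow(needed_njmax):
  # What an overflowing thread might do: raise the flag, record its requirement.
  atomic_max(warning, NEFC_OVERFLOW, 1)
  atomic_max(warning_info, NEFC_OVERFLOW, needed_njmax)


# Three "threads" overflow in the same step with different requirements.
threads = [threading.Thread(target=report_overflow, args=(n,)) for n in (28, 36, 32)]
for t in threads:
  t.start()
for t in threads:
  t.join()
```

Whatever the scheduling, the flag ends at 1 and the stored suggestion ends at the largest requirement seen, which is the value a user would need to apply to avoid the overflow.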

New Option

opt.warning_printf (default: True) - controls whether printf warnings are compiled into kernels
testspeed and viewer disable this and use check_warnings() instead
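
The idea of choosing the variant at build time rather than branching at run time can be shown with a Warp-free analogy. The sketch below builds one of two reporter functions depending on a boolean and caches the result, so the silent variant contains no print call at all. The function name, its signature, and the message text are hypothetical; the real kernels use wp.static and the codebase's own kernel caching.

```python
import functools


@functools.lru_cache(maxsize=None)
def make_overflow_reporter(warning_printf: bool):
  """Builds one reporter variant per option value; the choice is made at build time."""

  if warning_printf:

    def report(warning, warning_info, slot, suggested):
      warning[slot] = max(warning[slot], 1)
      warning_info[slot] = max(warning_info[slot], suggested)
      # Goes to stdout, mirroring what a printf inside a kernel would do.
      print(f"overflow in slot {slot}: suggested value {suggested}")

  else:

    def report(warning, warning_info, slot, suggested):
      # Silent variant: this function body contains no print call.
      warning[slot] = max(warning[slot], 1)
      warning_info[slot] = max(warning_info[slot], suggested)

  return report


silent = make_overflow_reporter(False)
flags, info = [0], [0]
silent(flags, info, 0, 32)  # sets the flag and suggestion without printing
```

Because the factory is cached on the option value, repeated calls with the same setting return the same built function instead of rebuilding it.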

Cleanup

Removed dead wp.printf calls in smooth.py (unreachable due to io.py validation)

API

Warnings are emitted per-step to stderr:

Warning: nefc overflow - increase njmax to 32
Warning: nefc overflow - increase njmax to 36

Programmatic access:

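import mujoco_warp as mjw

# m (Model) and d (Data) are assumed to have been created already
m.opt.warning_printf = False  # disable printf, use check_warnings instead
mjw.step(m, d)
mjw.check_warnings(d)  # read the flags and emit any warnings to stderr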
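
For readers who want a mental model of the host side, here is a minimal sketch of what a check function could do once the flag arrays have been copied back from the device. It is not the implementation in warning.py: the function name, the `clear` and `stream` parameters, the fallback message, and the use of plain lists are assumptions. Only the nefc wording is taken from the sample output above.

```python
import sys

# Assumed message templates; only the nefc wording appears in the PR's sample output.
MESSAGES = {
  0: "nefc overflow - increase njmax to {}",
}


def check_warnings_sketch(warning, warning_info, clear=True, stream=sys.stderr):
  """Reads host copies of the flag arrays, reports raised warnings, optionally clears."""
  emitted = []
  for slot, raised in enumerate(warning):
    if not raised:
      continue
    template = MESSAGES.get(slot, "warning type %d - suggested value {}" % slot)
    text = "Warning: " + template.format(warning_info[slot])
    print(text, file=stream)
    emitted.append(text)
    if clear:
      # Reset the slot so the next step only reports new events.
      warning[slot] = 0
      warning_info[slot] = 0
  return emitted
```

Clearing by default matches the behavior the tests describe for check_warnings, and it is what makes per-step messages such as the two njmax suggestions above possible.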

Testing

New warning_test.py with 8 tests covering initialization, detection, clearing, graph capture scenarios, and the printf option.

- Add WarningType enum to types.py with overflow warning types: NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW, CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON
- Add warning and warning_info arrays to Data class:
  - warning: flag array set via atomic_max in kernels
  - warning_info: stores suggested values for user action
- Create warning.py with check_warnings(), get_warnings(), clear_warnings() utilities that read warning flags and emit to stderr via warnings module
- Convert forward.py _next_time kernel to factory pattern:
  - _next_time_printf (with wp.printf, default)
  - _next_time_silent (without printf)
  - Both set warning flags via atomic_max
- Convert sensor.py _contact_match kernel to factory pattern
- Remove dead code wp.printf in smooth.py:
  - 'unrecognized joint type' - unreachable (all 4 joint types handled)
  - 'unhandled transmission type' - unreachable (io.py validates)
- Add TODO comments for collision_gjk.py printf calls
- Integrate check_warnings() into testspeed.py and viewer.py
- Export WarningType, check_warnings, get_warnings, clear_warnings from mujoco_warp package

Tests cover:
- Warning arrays initialization (shape, initial values)
- No warnings when simulation runs normally
- Nefc overflow warning triggered when constraints exceed njmax
- check_warnings clears flags by default
- clear_warnings utility
- Multi-step graph captures mid-graph warnings
- Single-step graph only reports warning when event occurs (not before)

Uses optimized test parameters (large timestep, sphere close to ground) for fast execution (~1s total).

Printfs are now disabled by default. The warning flag system still captures all overflow events. Users should call check_warnings(d) to read and report warnings.

- Add warning_printf field to Option class in types.py (default: True)
- Select printf/silent kernel variant based on m.opt.warning_printf
  - When True (default): emit warnings via printf to stdout
  - When False: only set warning flags, use check_warnings(d) to read them
- testspeed and viewer disable printf and use check_warnings instead
- Tests disable printf to avoid stdout spam

- Add warning array parameters to gjk(), _epa(), ccd() in collision_gjk.py
- Replace TODOs with atomic_max calls to set warning flags
- Keep optional printf controlled by warning_printf parameter
- Update collision_convex.py kernels to pass warning arrays through call chain
- Pass m.opt.warning_printf to kernel builders for wp.static compilation

- Add HFIELD_OVERFLOW warning type (index 6)
- Add warning message for heightfield collision overflow
- Convert collision_convex.py heightfield printf to use warning flags

- Move check_warnings call into benchmark loop after wp.synchronize()
- Clear warnings after each check for immediate per-step feedback
- Remove callstack from warning output for cleaner stderr messages
- Remove redundant check_warnings call from testspeed.py

- Add module='unique' to nested kernel factories in forward.py and sensor.py
- Reorder warning parameters to end of function signatures in collision_gjk.py
- Add proper comments (# Data out:, # Out:) for kernel parameters
- Fix whitespace in warning_test.py docstrings

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

- Replace pre-created kernel variants with @cache_kernel pattern
- Simplifies code and follows established codebase patterns

- Add NUM_WARNINGS to sizes dict for automatic allocation
- Update type annotations to use array('NUM_WARNINGS', ...) syntax
- Remove explicit d.warning allocation in make_data/put_data
- Skip warning fields when copying from MuJoCo data (mjd.warning is different)

- Update collision_gjk_test.py to pass warning arrays to ccd function
- Add warning/warning_info documentation to Data class docstring
- Exclude MuJoCo's 'warning' field from union check in types_test.py (our warning field has different semantics than MuJoCo's)

erikfrey · 2026-01-30T21:24:21Z

Nice!

I wouldn't be against defaulting warning_printf to false if we want to discourage this for general use. WDYT @adenzler-nvidia ?

adenzler-nvidia · 2026-01-30T21:28:29Z

I'm torn on that one - I think before we had the printfs we had a lot of silent overflows. Any user not opting in to this warning system would be in that situation again. So I think it's safer to default to True and give everyone using this warning handler or implementing it themselves the option to turn off the printfs.

thowell · 2026-02-02T18:24:10Z

the overflow warnings are very helpful for debugging and my vote would be to keep them on by default. we could note in the performance section of the documentation to set this option field to False for best performance?

thowell · 2026-02-05T11:41:36Z

@adenzler-nvidia is it possible for us to utilize this (in combination with some sort of option) to error / stop simulation when a warning is detected?

adenzler-nvidia · 2026-02-05T11:46:26Z

The error handler can definitely be extended to do something like this. I would say it's up to the application integrating MjWarp; I don't think this should be hardcoded into MjWarp.

I can see an app like MjLab implementing their own error handler to do this. We can do it for the renderer/testspeed as well.

This is also the reason I vote for keeping the printfs in by default - it's a good thing for anyone integrating MjWarp. Once you need more than that, you can turn off the prints and implement an error handler.
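
To make the application-level handler idea concrete, here is a minimal sketch of a loop that stops when a step reports any warning. Everything in it is hypothetical: the exception class, `raise_on_warnings`, and `run` are illustrative names, and `collect_fn` stands for whatever the application builds to obtain warning messages per step (for example on top of get_warnings(d), whose return type is not shown in this PR).

```python
class SimulationOverflowError(RuntimeError):
  """Hypothetical application-level error; not part of MjWarp."""


def raise_on_warnings(messages):
  """Stops the caller's loop if any warning message was collected for this step."""
  if messages:
    raise SimulationOverflowError("; ".join(messages))


def run(num_steps, step_fn, collect_fn):
  """step_fn advances the simulation; collect_fn returns this step's warning messages."""
  for _ in range(num_steps):
    step_fn()
    raise_on_warnings(collect_fn())
  return num_steps
```

Keeping the policy in the application, as suggested above, lets one integration abort on the first overflow while another only logs it.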

adenzler-nvidia requested a review from erikfrey January 29, 2026 16:01

adenzler-nvidia mentioned this pull request Jan 29, 2026

[REQ] wp.printf supports stderr NVIDIA/warp#1194

Open

thowell linked an issue Jan 29, 2026 that may be closed by this pull request

wp.printf alternative for reporting overflow #692

Open

adenzler-nvidia force-pushed the warning-system branch from 83caa20 to 987480e Compare January 30, 2026 14:44

adenzler-nvidia added 13 commits January 30, 2026 16:22

Use consistent warnings.warn style with io.py

d0027f7

Default to silent kernels (disable printf, keep warning flags)

614a271


Add heightfield overflow warning to warning system

0d82ae2


missed file

40552ea


Use @cache_kernel for warning kernel factories

bfb3f60


adenzler-nvidia force-pushed the warning-system branch from 15cbaad to 528aa16 Compare January 30, 2026 15:24

Skip graph capture tests on CPU (requires CUDA)

cc1d8e6

Merge branch 'main' into warning-system

a279741

adenzler-nvidia wants to merge 15 commits into google-deepmind:main from adenzler-nvidia:warning-system
