Add kernel warning system for overflow detection #1089

Open
adenzler-nvidia wants to merge 15 commits into google-deepmind:main from adenzler-nvidia:warning-system
@adenzler-nvidia
Collaborator

@adenzler-nvidia adenzler-nvidia commented Jan 29, 2026

Summary

Adds a structured warning system for detecting and reporting overflow conditions in kernels. Previously, wp.printf calls went to stdout and cluttered output. Now warnings are captured via atomic flags and emitted to stderr with actionable messages.

Changes

New Warning System

  • WarningType enum for categorizing warnings: NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW, CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON, HFIELD_OVERFLOW
  • check_warnings(d) - reads flags from GPU and emits to stderr with suggested values
  • get_warnings(d) / clear_warnings(d) - utility functions for programmatic access
  • Warning arrays added to Data: warning (flags) and warning_info (suggested values)
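A minimal host-side sketch of the flag-and-report idea, using plain NumPy arrays in place of the Warp arrays on Data. The enum member names come from this description; their numeric order, the `MESSAGES` table, and the function name `check_warnings_sketch` are assumptions for illustration and are not the PR's implementation.

```python
import sys
from enum import IntEnum

import numpy as np


class WarningType(IntEnum):
    # Names are taken from the PR description. The numeric order is assumed;
    # the commit log only states that HFIELD_OVERFLOW has index 6.
    NEFC_OVERFLOW = 0
    BROADPHASE_OVERFLOW = 1
    NARROWPHASE_OVERFLOW = 2
    CONTACT_MATCH_OVERFLOW = 3
    GJK_ITERATIONS = 4
    EPA_HORIZON = 5
    HFIELD_OVERFLOW = 6


# Hypothetical message table; only the nefc wording appears in the PR.
MESSAGES = {
    WarningType.NEFC_OVERFLOW: "nefc overflow - increase njmax to {}",
}


def check_warnings_sketch(warning, warning_info, clear=True, file=sys.stderr):
    """Print one line per raised flag, then optionally reset both arrays."""
    raised = []
    for wtype in WarningType:
        if warning[wtype]:
            suggested = int(warning_info[wtype])
            template = MESSAGES.get(wtype, wtype.name.lower() + " (suggested value: {})")
            print("Warning: " + template.format(suggested), file=file)
            raised.append((wtype, suggested))
    if clear:
        # Resetting after the read gives per-step feedback.
        warning[:] = 0
        warning_info[:] = 0
    return raised


# Stand-ins for the warning and warning_info arrays on Data.
warning = np.zeros(len(WarningType), dtype=np.int32)
warning_info = np.zeros(len(WarningType), dtype=np.int32)

# Pretend a kernel flagged an nefc overflow and suggested njmax = 32.
warning[WarningType.NEFC_OVERFLOW] = 1
warning_info[WarningType.NEFC_OVERFLOW] = 32

raised = check_warnings_sketch(warning, warning_info)
```

Keeping the flag and the suggested value in two small fixed-size arrays means the host only has to copy a handful of integers back from the device per check.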

Kernel Changes

  • All overflow wp.printf calls are now guarded by wp.static(m.opt.warning_printf) for compile-time removal
  • Kernels set warning flags via wp.atomic_max with suggested parameter values
  • Affected: forward.py, sensor.py, collision_convex.py, collision_gjk.py
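The reason for an atomic max rather than a plain store can be sketched on the host. Here NumPy's `np.maximum.at` stands in for `wp.atomic_max`: many emulated threads report different required sizes, and the largest one survives. The flag index, the helper name `report_overflow`, and the reading of "suggested value" as the required `njmax` are assumptions for illustration.

```python
import numpy as np

NEFC_OVERFLOW = 0  # assumed index of the nefc overflow flag
NUM_WARNINGS = 7   # seven warning types are listed in this PR


def report_overflow(warning, warning_info, required_njmax, njmax):
    """Emulate kernel threads that each detect their own overflow.

    required_njmax holds one entry per emulated thread. Threads whose
    requirement fits within njmax report nothing.
    """
    required = np.asarray(required_njmax)
    over = required[required > njmax]
    idx = np.full(over.shape, NEFC_OVERFLOW)
    # Unbuffered max-reduction: the order of the reports does not matter,
    # and the largest suggested value wins.
    np.maximum.at(warning, idx, 1)
    np.maximum.at(warning_info, idx, over)


warning = np.zeros(NUM_WARNINGS, dtype=np.int32)
warning_info = np.zeros(NUM_WARNINGS, dtype=np.int32)
report_overflow(warning, warning_info, required_njmax=[20, 32, 28, 36], njmax=24)
```

With a plain store the surviving value would depend on which thread wrote last; with a max-reduction the reported suggestion is large enough for every thread that overflowed.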

New Option

  • opt.warning_printf (default: True) - controls whether printf warnings are compiled into kernels
  • testspeed and viewer disable this and use check_warnings() instead
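The effect of the option can be illustrated with a plain-Python analogue of a kernel factory: the print statement is either built into the returned function or left out entirely, so the silent variant never evaluates the flag at run time. `functools.lru_cache` is used here as a stand-in for the codebase's kernel caching, and `build_next_time` is a hypothetical name; none of this is the PR's actual code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # stand-in for kernel caching; one variant per flag value
def build_next_time(warning_printf: bool):
    """Build a step function specialized on warning_printf."""

    def set_flags(nefc, warning, warning_info):
        # Both variants always record the overflow in the flag arrays.
        warning[0] = max(warning[0], 1)
        warning_info[0] = max(warning_info[0], nefc)

    if warning_printf:
        def next_time(nefc, njmax, warning, warning_info, emit=print):
            if nefc > njmax:
                set_flags(nefc, warning, warning_info)
                emit(f"nefc overflow - increase njmax to {nefc}")
    else:
        def next_time(nefc, njmax, warning, warning_info, emit=print):
            if nefc > njmax:
                set_flags(nefc, warning, warning_info)

    return next_time


printf_log, silent_log = [], []
flags_a, info_a = [0], [0]
flags_b, info_b = [0], [0]

build_next_time(True)(32, 24, flags_a, info_a, emit=printf_log.append)
build_next_time(False)(32, 24, flags_b, info_b, emit=silent_log.append)
```

Both variants leave the same flags behind; only the printf variant produces output, which is why callers that disable the option rely on check_warnings() to surface overflows.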

Cleanup

  • Removed dead wp.printf calls in smooth.py (unreachable due to io.py validation)

API

Warnings are emitted per-step to stderr:

Warning: nefc overflow - increase njmax to 32
Warning: nefc overflow - increase njmax to 36

Programmatic access:

m.opt.warning_printf = False  # disable printf, use check_warnings instead
mjw.step(m, d)
mjw.check_warnings(d)

Testing

New warning_test.py with 8 tests covering initialization, detection, clearing, graph capture scenarios, and the printf option.

- Add WarningType enum to types.py with overflow warning types:
  NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW,
  CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON

- Add warning and warning_info arrays to Data class:
  - warning: flag array set via atomic_max in kernels
  - warning_info: stores suggested values for user action

- Create warning.py with check_warnings(), get_warnings(), clear_warnings()
  utilities that read warning flags and emit to stderr via warnings module

- Convert forward.py _next_time kernel to factory pattern:
  - _next_time_printf (with wp.printf, default)
  - _next_time_silent (without printf)
  - Both set warning flags via atomic_max

- Convert sensor.py _contact_match kernel to factory pattern

- Remove dead code wp.printf in smooth.py:
  - 'unrecognized joint type' - unreachable (all 4 joint types handled)
  - 'unhandled transmission type' - unreachable (io.py validates)

- Add TODO comments for collision_gjk.py printf calls

- Integrate check_warnings() into testspeed.py and viewer.py

- Export WarningType, check_warnings, get_warnings, clear_warnings from
  mujoco_warp package
Tests cover:
- Warning arrays initialization (shape, initial values)
- No warnings when simulation runs normally
- Nefc overflow warning triggered when constraints exceed njmax
- check_warnings clears flags by default
- clear_warnings utility
- Multi-step graph captures mid-graph warnings
- Single-step graph only reports warning when event occurs (not before)

Uses optimized test parameters (large timestep, sphere close to ground)
for fast execution (~1s total).
Printfs are now disabled by default. The warning flag system still
captures all overflow events. Users should call check_warnings(d)
to read and report warnings.
- Add warning_printf field to Option class in types.py (default: True)
- Select printf/silent kernel variant based on m.opt.warning_printf
- When True (default): emit warnings via printf to stdout
- When False: only set warning flags, use check_warnings(d) to read them
- testspeed and viewer disable printf and use check_warnings instead
- Tests disable printf to avoid stdout spam
- Add warning array parameters to gjk(), _epa(), ccd() in collision_gjk.py
- Replace TODOs with atomic_max calls to set warning flags
- Keep optional printf controlled by warning_printf parameter
- Update collision_convex.py kernels to pass warning arrays through call chain
- Pass m.opt.warning_printf to kernel builders for wp.static compilation
- Add HFIELD_OVERFLOW warning type (index 6)
- Add warning message for heightfield collision overflow
- Convert collision_convex.py heightfield printf to use warning flags
- Move check_warnings call into benchmark loop after wp.synchronize()
- Clear warnings after each check for immediate per-step feedback
- Remove callstack from warning output for cleaner stderr messages
- Remove redundant check_warnings call from testspeed.py
- Add module='unique' to nested kernel factories in forward.py and sensor.py
- Reorder warning parameters to end of function signatures in collision_gjk.py
- Add proper comments (# Data out:, # Out:) for kernel parameters
- Fix whitespace in warning_test.py docstrings
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
- Replace pre-created kernel variants with @cache_kernel pattern
- Simplifies code and follows established codebase patterns
- Add NUM_WARNINGS to sizes dict for automatic allocation
- Update type annotations to use array('NUM_WARNINGS', ...) syntax
- Remove explicit d.warning allocation in make_data/put_data
- Skip warning fields when copying from MuJoCo data (mjd.warning is different)
- Update collision_gjk_test.py to pass warning arrays to ccd function
- Add warning/warning_info documentation to Data class docstring
- Exclude MuJoCo's 'warning' field from union check in types_test.py
  (our warning field has different semantics than MuJoCo's)
@erikfrey
Collaborator

Nice!

I wouldn't be against defaulting warning_printf to False if we want to discourage this for general use. WDYT @adenzler-nvidia?

@adenzler-nvidia
Collaborator Author

I'm torn on that one. Before we had the printfs, we had a lot of silent overflows, and any user not opting in to this warning system would be in that situation again. So I think it's safer to default to True and give everyone who uses this warning handler, or implements their own, the option to turn off the printfs.

@thowell
Collaborator

thowell commented Feb 2, 2026

the overflow warnings are very helpful for debugging and my vote would be to keep them on by default. we could note in the performance section of the documentation to set this option field to False for best performance?

@thowell
Collaborator

thowell commented Feb 5, 2026

@adenzler-nvidia is it possible for us to utilize this (in combination with some sort of option) to error / stop simulation when a warning is detected?

@adenzler-nvidia
Collaborator Author

The error handler can definitely be extended to do something like this. I would say it's up to the application integrating MjWarp; I don't think this should be hardcoded into MjWarp.

I can see an app like MjLab implementing their own error handler to do this. We can do it for the renderer/testspeed as well.

This is also the reason I vote for keeping the printfs in by default - it's a good thing for anyone integrating MjWarp. Once you need more than that, you can turn off the prints and implement an error handler.
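A minimal sketch of what such an application-side handler might look like. It assumes the application has already converted the result of get_warnings(d) into a dict mapping warning names to suggested values; the return type of get_warnings is not described in this PR, and `SimulationOverflowError` and `raise_on_warnings` are hypothetical names.

```python
class SimulationOverflowError(RuntimeError):
    """Hypothetical exception type; not part of MjWarp."""


def raise_on_warnings(warnings_by_name, ignore=()):
    """Stop the simulation when any non-ignored warning is active.

    warnings_by_name: dict of warning name -> suggested value (assumed shape).
    ignore: names the application chooses to tolerate.
    """
    active = {
        name: value
        for name, value in warnings_by_name.items()
        if name not in ignore
    }
    if active:
        details = ", ".join(
            f"{name} (suggested value {value})"
            for name, value in sorted(active.items())
        )
        raise SimulationOverflowError(f"stopping simulation: {details}")


# Nothing active: the handler is a no-op.
raise_on_warnings({})

# An active warning stops the loop with an actionable message.
try:
    raise_on_warnings({"NEFC_OVERFLOW": 36})
except SimulationOverflowError as err:
    message = str(err)
```

In an application loop this would run after each step, with printfs disabled, so the policy of stopping, logging, or ignoring stays in the integrating application rather than in MjWarp.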

Development

Successfully merging this pull request may close these issues.

wp.printf alternative for reporting overflow

3 participants