Add kernel warning system for overflow detection #1089
adenzler-nvidia wants to merge 15 commits into google-deepmind:main
- Add WarningType enum to types.py with overflow warning types: NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW, CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON
- Add warning and warning_info arrays to Data class:
  - warning: flag array set via atomic_max in kernels
  - warning_info: stores suggested values for user action
- Create warning.py with check_warnings(), get_warnings(), clear_warnings() utilities that read warning flags and emit to stderr via warnings module
- Convert forward.py _next_time kernel to factory pattern:
  - _next_time_printf (with wp.printf, default)
  - _next_time_silent (without printf)
  - Both set warning flags via atomic_max
- Convert sensor.py _contact_match kernel to factory pattern
- Remove dead code wp.printf in smooth.py:
  - 'unrecognized joint type' - unreachable (all 4 joint types handled)
  - 'unhandled transmission type' - unreachable (io.py validates)
- Add TODO comments for collision_gjk.py printf calls
- Integrate check_warnings() into testspeed.py and viewer.py
- Export WarningType, check_warnings, get_warnings, clear_warnings from mujoco_warp package
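The flag mechanism this commit describes can be sketched in plain Python. This is not the mujoco_warp code: plain lists stand in for the device arrays, the atomic_max helper is a host-side stand-in for the kernel-side atomic max, and the enum values, the clear argument, the message text, and the Data stub are all assumptions made for illustration.

```python
import enum
import warnings


class WarningType(enum.IntEnum):
    # Names come from the commit message; the integer values are assumed.
    NEFC_OVERFLOW = 0
    BROADPHASE_OVERFLOW = 1
    NARROWPHASE_OVERFLOW = 2
    CONTACT_MATCH_OVERFLOW = 3
    GJK_ITERATIONS = 4
    EPA_HORIZON = 5


class Data:
    """Hypothetical stand-in for Data: one flag and one info slot per warning type."""

    def __init__(self):
        self.warning = [0] * len(WarningType)
        self.warning_info = [0] * len(WarningType)


def atomic_max(array, index, value):
    # Host-side stand-in for an atomic max: the slot only ever grows.
    array[index] = max(array[index], value)


def get_warnings(d):
    """Return (type, suggested value) pairs for every raised flag."""
    return [(w, d.warning_info[w]) for w in WarningType if d.warning[w]]


def clear_warnings(d):
    """Reset every flag and every suggested value."""
    for w in WarningType:
        d.warning[w] = 0
        d.warning_info[w] = 0


def check_warnings(d, clear=True):
    """Emit raised flags through the warnings module, then optionally clear them."""
    found = get_warnings(d)
    for w, info in found:
        warnings.warn(f"{w.name}: suggested value {info}", RuntimeWarning)
    if clear:
        clear_warnings(d)
    return found


d = Data()
# What a kernel would do on overflow: raise the flag and record the needed size.
atomic_max(d.warning, WarningType.NEFC_OVERFLOW, 1)
atomic_max(d.warning_info, WarningType.NEFC_OVERFLOW, 128)
atomic_max(d.warning_info, WarningType.NEFC_OVERFLOW, 96)  # smaller value is ignored
```

Because every writer uses a max rather than a plain store, concurrent threads can report different sizes and the largest one survives, which is what makes the stored value usable as a suggestion.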
Tests cover:
- Warning arrays initialization (shape, initial values)
- No warnings when simulation runs normally
- Nefc overflow warning triggered when constraints exceed njmax
- check_warnings clears flags by default
- clear_warnings utility
- Multi-step graph captures mid-graph warnings
- Single-step graph only reports warning when event occurs (not before)
Uses optimized test parameters (large timestep, sphere close to ground) for fast execution (~1s total).
Printfs are now disabled by default. The warning flag system still captures all overflow events. Users should call check_warnings(d) to read and report warnings.
- Add warning_printf field to Option class in types.py (default: True)
- Select printf/silent kernel variant based on m.opt.warning_printf
  - When True (default): emit warnings via printf to stdout
  - When False: only set warning flags, use check_warnings(d) to read them
- testspeed and viewer disable printf and use check_warnings instead
- Tests disable printf to avoid stdout spam
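The variant selection can be illustrated without Warp. The sketch below is a plain-Python analogy, not the PR's kernels: build_next_time is a hypothetical name, functools.lru_cache stands in for whatever kernel caching the codebase uses, and the printed message is made up. The point it shows is that the printf decision is taken once when the function is built, while the flag write happens in both variants.

```python
import functools


@functools.lru_cache(maxsize=None)
def build_next_time(warning_printf: bool):
    """Build one step function per flag value; the branch is resolved at build time."""
    if warning_printf:
        def report(nefc):
            # Illustrative message only; the real wording is not given here.
            print(f"nefc overflow: increase njmax to {nefc}")
    else:
        def report(nefc):
            pass  # silent variant: nothing is printed

    def next_time(nefc, njmax, warning, warning_info):
        if nefc > njmax:
            # Both variants set the flag and record the suggested value.
            warning[0] = max(warning[0], 1)
            warning_info[0] = max(warning_info[0], nefc)
            report(nefc)

    return next_time


warning, warning_info = [0], [0]
silent = build_next_time(False)
silent(nefc=40, njmax=32, warning=warning, warning_info=warning_info)
```

After the call, the flag is raised and the suggested value is stored even though nothing was printed, so a later host-side check can still report the overflow.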
- Add warning array parameters to gjk(), _epa(), ccd() in collision_gjk.py
- Replace TODOs with atomic_max calls to set warning flags
- Keep optional printf controlled by warning_printf parameter
- Update collision_convex.py kernels to pass warning arrays through call chain
- Pass m.opt.warning_printf to kernel builders for wp.static compilation
- Add HFIELD_OVERFLOW warning type (index 6)
- Add warning message for heightfield collision overflow
- Convert collision_convex.py heightfield printf to use warning flags
- Move check_warnings call into benchmark loop after wp.synchronize()
- Clear warnings after each check for immediate per-step feedback
- Remove callstack from warning output for cleaner stderr messages
- Remove redundant check_warnings call from testspeed.py
- Add module='unique' to nested kernel factories in forward.py and sensor.py
- Reorder warning parameters to end of function signatures in collision_gjk.py
- Add proper comments (# Data out:, # Out:) for kernel parameters
- Fix whitespace in warning_test.py docstrings
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
- Replace pre-created kernel variants with @cache_kernel pattern
- Simplifies code and follows established codebase patterns
- Add NUM_WARNINGS to sizes dict for automatic allocation
- Update type annotations to use array('NUM_WARNINGS', ...) syntax
- Remove explicit d.warning allocation in make_data/put_data
- Skip warning fields when copying from MuJoCo data (mjd.warning is different)
- Update collision_gjk_test.py to pass warning arrays to ccd function
- Add warning/warning_info documentation to Data class docstring
- Exclude MuJoCo's 'warning' field from union check in types_test.py (our warning field has different semantics than MuJoCo's)
Nice! I wouldn't be against defaulting

I'm torn on that one - I think before we had the printfs we had a lot of silent overflows. Any user not opting in to this warning system would be in that situation again. So I think it's safer to default to True and give everyone using this warning handler or implementing it themselves the option to turn off the printfs.

the overflow warnings are very helpful for debugging and my vote would be to keep them on by default. we could note in the performance section of the documentation to set this option field to

@adenzler-nvidia is it possible for us to utilize this (in combination with some sort of option) to error / stop simulation when a warning is detected?

The error handler can definitely be extended to do something like this. I would say it's up to the application integrating MjWarp; I don't think this should be hardcoded into MjWarp. I can see an app like MjLab implementing their own error handler to do this. We can do it for the renderer/testspeed as well. This is also the reason I vote for keeping the printfs in by default - it's a good thing for anyone integrating MjWarp. Once you need more than that, you can turn off the prints and implement an error handler.
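
One way an integrating application could stop on a warning, sketched under assumptions: since check_warnings is described as emitting through Python's warnings module, the application can escalate those warnings to exceptions with a filter. OverflowWarning, check_warnings_stub, and run are hypothetical names used only for this illustration; the warning category MjWarp actually emits is not stated in this thread.

```python
import warnings


class OverflowWarning(RuntimeWarning):
    """Hypothetical category; the category MjWarp actually uses is not stated here."""


def check_warnings_stub(flags):
    # Stand-in for check_warnings(d): emit one warning per raised flag.
    for name, raised in flags.items():
        if raised:
            warnings.warn(f"{name} detected", OverflowWarning)


def run(steps, overflow_at):
    """Step a fake simulation and stop at the first overflow warning."""
    with warnings.catch_warnings():
        # Turn this category into an exception for the duration of the loop.
        warnings.simplefilter("error", OverflowWarning)
        for step in range(steps):
            flags = {"NEFC_OVERFLOW": step == overflow_at}
            try:
                check_warnings_stub(flags)
            except OverflowWarning as exc:
                return step, str(exc)
    return None, ""


stopped_at, message = run(steps=10, overflow_at=3)
```

The filter is scoped by catch_warnings, so the stop-on-warning policy stays in the application and leaves the library's default behaviour untouched.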
Summary
Adds a structured warning system for detecting and reporting overflow conditions in kernels. Previously, wp.printf calls went to stdout and cluttered output. Now warnings are captured via atomic flags and emitted to stderr with actionable messages.
Changes
New Warning System
- WarningType enum for categorizing warnings: NEFC_OVERFLOW, BROADPHASE_OVERFLOW, NARROWPHASE_OVERFLOW, CONTACT_MATCH_OVERFLOW, GJK_ITERATIONS, EPA_HORIZON, HFIELD_OVERFLOW
- check_warnings(d) - reads flags from GPU and emits to stderr with suggested values
- get_warnings(d) / clear_warnings(d) - utility functions for programmatic access
- New arrays in Data: warning (flags) and warning_info (suggested values)
Kernel Changes
- wp.printf calls are now guarded by wp.static(m.opt.warning_printf) for compile-time removal
- Warning flags are set via wp.atomic_max with suggested parameter values
- Affected files: forward.py, sensor.py, collision_convex.py, collision_gjk.py
New Option
- opt.warning_printf (default: True) - controls whether printf warnings are compiled into kernels
- testspeed and viewer disable this and use check_warnings() instead
Cleanup
- Removed dead wp.printf calls in smooth.py (unreachable due to io.py validation)
API
Warnings are emitted per-step to stderr:
Programmatic access:
Testing
New warning_test.py with 8 tests covering initialization, detection, clearing, graph capture scenarios, and the printf option.