
make_constraint: traverse dof tree when constructing contact constraints and is sparse (#929)

Merged

thowell merged 3 commits into google-deepmind:main from thowell:make_constraint_contact on Feb 1, 2026
Conversation

@thowell (Collaborator) commented Dec 16, 2025

This PR is part of the effort to implement sparse Jacobians (#88).

When is_sparse == True, instead of iterating over all dofs to construct contact constraints, traverse the dof tree for each contact body.

MuJoCo reference: https://github.com/google-deepmind/mujoco/blob/08b4b4144d70c69206f96cf329d5044ae686a1e6/src/engine/engine_core_util.c#L55
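
For reference, here is a minimal Python sketch of the traversal pattern, assuming MuJoCo-style model fields (body_parentid, body_dofnum, body_dofadr, dof_parentid); the merged kernels are written in Warp, so treat this as an illustrative outline rather than the actual implementation:

def contact_body_dofs(m, body):
    # Yield only the dofs that can affect `body`, walking up the dof tree.
    # Skip welded bodies that own no dofs of their own.
    while body > 0 and m.body_dofnum[body] == 0:
        body = m.body_parentid[body]
    if body == 0:
        return  # body is fixed to the world; no dofs affect it

    # Start at the last dof owned by the body, then follow parent dofs to the root.
    dof = m.body_dofadr[body] + m.body_dofnum[body] - 1
    while dof >= 0:
        yield dof
        dof = m.dof_parentid[dof]

Only these dofs receive Jacobian entries for the contact, so the sparse path never needs to touch the rest of the row.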


Humanoid

Performance for dense should be unchanged:

mjwarp-testspeed ./benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64 --nworld=8192 --event_trace=True

This PR:

Summary for 8192 parallel rollouts

Total JIT time: 0.33 s
Total simulation time: 2.96 s
Total steps per second: 2,767,435
Total realtime factor: 13,837.18 x
Total time per step: 361.35 ns
Total converged worlds: 8192 / 8192

step: 359.65
  forward: 357.14
    fwd_position: 89.85
      kinematics: 16.37
      com_pos: 5.85
      camlight: 1.75
      flex: 0.17
      crb: 13.22
      tendon_armature: 0.17
      collision: 9.35
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.57
      make_constraint: 39.00

main (bb81495):

Total JIT time: 0.32 s
Total simulation time: 2.96 s
Total steps per second: 2,767,848
Total realtime factor: 13,839.24 x
Total time per step: 361.29 ns
Total converged worlds: 8192 / 8192

step: 359.55
  forward: 357.04
    fwd_position: 89.84
      kinematics: 16.36
      com_pos: 5.85
      camlight: 1.75
      flex: 0.17
      crb: 13.23
      tendon_armature: 0.17
      collision: 9.35
        nxn_broadphase: 3.71
        convex_narrowphase: 0.18
        primitive_narrowphase: 4.57
      make_constraint: 38.99

Performance for sparse should be improved:

mjwarp-testspeed ./benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64 --nworld=8192 --event_trace=True -o "opt.is_sparse=True"

This PR:

Total JIT time: 0.81 s
Total simulation time: 3.10 s
Total steps per second: 2,641,695
Total realtime factor: 13,208.47 x
Total time per step: 378.54 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 376.86
  forward: 374.33
    fwd_position: 77.87
      kinematics: 16.36
      com_pos: 5.84
      camlight: 1.74
      flex: 0.17
      crb: 9.98
      tendon_armature: 0.17
      collision: 9.34
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.57
      make_constraint: 30.65

main (bb81495):

Total JIT time: 0.25 s
Total simulation time: 3.17 s
Total steps per second: 2,586,971
Total realtime factor: 12,934.85 x
Total time per step: 386.55 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 384.86
  forward: 382.33
    fwd_position: 86.29
      kinematics: 16.36
      com_pos: 5.84
      camlight: 1.74
      flex: 0.17
      crb: 9.99
      tendon_armature: 0.17
      collision: 9.34
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.56
      make_constraint: 38.73

Notes:

  • Performance should be further improved once efc.J is represented in a sparse format, so that it is not necessary to zero memory on each call to make_constraint.
  • Replacing repeated code in contact_pyramidal and contact_elliptic with a wp.func appeared to introduce overhead; as a result, to maintain performance there is duplicated code in the dense and sparse cases for now.

TODO:

  • Improve tree traversal performance with dense Jacobians.

@adenzler-nvidia (Collaborator)

I think even with dense Jacobians the tree traversal strategy could be faster? What do you think? We would need to zero the memory, but I'm sure we can find a good way to do it.

@thowell (Collaborator, Author) commented Dec 19, 2025

@adenzler-nvidia Yes, I think the tree traversal could be faster with dense. Added a TODO for making the dense version performant with tree traversal.

@erikfrey (Collaborator)

@thowell Usually I'm in favor of incremental changes and TODOs, but in this case we're adding some complexity (cache_kernel, wp.static, etc.) that we might remove if it turns out that dof tree traversal makes sense for both dense and sparse.

Would you mind having a go at seeing whether it helps in both cases? If so, we can simplify the changes in this PR.

In general I'm not a huge fan of cache_kernel, nested_kernel - I really try to use them sparingly when there's no other choice.

@thowell (Collaborator, Author) commented Jan 5, 2026

@erikfrey I think ultimately all of the constraint functions will need this complexity in order to support dense and sparse without additional overhead; see #934.

@thowell (Collaborator, Author) commented Jan 10, 2026

One reason why it might not make sense to perform dof traversal for dense is that the elements of the efc.J row not visited by the traversal need to be zeroed. The dof traversal plus zeroing might be more expensive than simply iterating over all dofs. In the sparse case it is not necessary to zero elements.
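
To illustrate the trade-off with hypothetical helpers (not the merged kernels): the dense row must be fully zeroed because the solver reads all nv entries, while the sparse row stores only the dofs actually visited.

def fill_row_dense(J_row, nv, visited_dofs, values):
    # Dense: every one of the nv entries is read by the solver,
    # so dofs not visited by the traversal must be explicitly zeroed.
    for i in range(nv):
        J_row[i] = 0.0
    for i, v in zip(visited_dofs, values):
        J_row[i] = v

def fill_row_sparse(colind, val, visited_dofs, values):
    # Sparse: only visited dofs are stored, so there is nothing to zero.
    for k, (i, v) in enumerate(zip(visited_dofs, values)):
        colind[k] = i
        val[k] = v
    return len(visited_dofs)  # number of nonzeros in this row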

@thowell mentioned this pull request on Jan 23, 2026
@thowell force-pushed the make_constraint_contact branch from 660e2c2 to b857db9 on January 30, 2026 at 22:40
@thowell (Collaborator, Author) commented Jan 30, 2026

Refactored the code so that pyramidal and elliptic constraints each have one path that utilizes dof traversal.

mjwarp-testspeed benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64

This PR:

Loading model from: benchmark/humanoid/humanoid.xml...
  nbody: 17 nv: 27 ngeom: 20 nu: 21 is_sparse: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
  solver: NEWTON cone: PYRAMIDAL iterations: 100 iterative linesearch iterations: 50
  integrator: EULER graph_conditional: True
Data
  nworld: 8192 naconmax: 196608 njmax: 64

Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 32.93 s
Total simulation time: 2.85 s
Total steps per second: 2,873,305
Total realtime factor: 14,366.53 x
Total time per step: 348.03 ns
Total converged worlds: 8192 / 8192

main (bb81495):

Loading model from: benchmark/humanoid/humanoid.xml...
  nbody: 17 nv: 27 ngeom: 20 nu: 21 is_sparse: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
  solver: NEWTON cone: PYRAMIDAL iterations: 100 iterative linesearch iterations: 50
  integrator: EULER graph_conditional: True
Data
  nworld: 8192 naconmax: 196608 njmax: 64

Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 2.81 s
Total simulation time: 2.86 s
Total steps per second: 2,867,562
Total realtime factor: 14,337.81 x
Total time per step: 348.73 ns
Total converged worlds: 8192 / 8192

@thowell force-pushed the make_constraint_contact branch from b857db9 to 79575bf on January 30, 2026 at 22:59
@erikfrey (Collaborator) left a review


Looks good, just one nit.

@thowell merged commit 7bb099c into google-deepmind:main on Feb 1, 2026
9 checks passed
