
make_constraint: traverse dof tree when constructing contact constraints and is sparse (#929)

Merged

thowell merged 3 commits into google-deepmind:main from thowell:make_constraint_contact on Feb 1, 2026
Conversation

@thowell (Collaborator) commented Dec 16, 2025

This PR is part of the effort to implement sparse Jacobians (#88).

When is_sparse == True, instead of iterating over all dofs to construct contact constraints, traverse the dof tree for each contact body.

MuJoCo reference: https://github.com/google-deepmind/mujoco/blob/08b4b4144d70c69206f96cf329d5044ae686a1e6/src/engine/engine_core_util.c#L55
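
For reference, here is a minimal Python sketch of the traversal pattern, assuming MuJoCo-style model fields (body_parentid, body_dofnum, body_dofadr, dof_parentid); the merged kernels are written in Warp, so treat this as an illustrative outline rather than the actual implementation:

def contact_body_dofs(m, body):
    # Yield only the dofs that can affect `body`, walking up the dof tree.
    # Skip welded bodies that own no dofs of their own.
    while body > 0 and m.body_dofnum[body] == 0:
        body = m.body_parentid[body]
    if body == 0:
        return  # body is fixed to the world; no dofs affect it

    # Start at the last dof owned by the body, then follow parent dofs to the root.
    dof = m.body_dofadr[body] + m.body_dofnum[body] - 1
    while dof >= 0:
        yield dof
        dof = m.dof_parentid[dof]

Only these dofs receive Jacobian entries for the contact, so the sparse path never needs to touch the rest of the row.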


Humanoid

Performance for dense should be unchanged:

mjwarp-testspeed ./benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64 --nworld=8192 --event_trace=True

This PR:

Summary for 8192 parallel rollouts

Total JIT time: 0.33 s
Total simulation time: 2.96 s
Total steps per second: 2,767,435
Total realtime factor: 13,837.18 x
Total time per step: 361.35 ns
Total converged worlds: 8192 / 8192

step: 359.65
  forward: 357.14
    fwd_position: 89.85
      kinematics: 16.37
      com_pos: 5.85
      camlight: 1.75
      flex: 0.17
      crb: 13.22
      tendon_armature: 0.17
      collision: 9.35
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.57
      make_constraint: 39.00

main (bb81495):

Total JIT time: 0.32 s
Total simulation time: 2.96 s
Total steps per second: 2,767,848
Total realtime factor: 13,839.24 x
Total time per step: 361.29 ns
Total converged worlds: 8192 / 8192

step: 359.55
  forward: 357.04
    fwd_position: 89.84
      kinematics: 16.36
      com_pos: 5.85
      camlight: 1.75
      flex: 0.17
      crb: 13.23
      tendon_armature: 0.17
      collision: 9.35
        nxn_broadphase: 3.71
        convex_narrowphase: 0.18
        primitive_narrowphase: 4.57
      make_constraint: 38.99

Performance for sparse should be improved:

mjwarp-testspeed ./benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64 --nworld=8192 --event_trace=True -o "opt.is_sparse=True"

This PR:

Total JIT time: 0.81 s
Total simulation time: 3.10 s
Total steps per second: 2,641,695
Total realtime factor: 13,208.47 x
Total time per step: 378.54 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 376.86
  forward: 374.33
    fwd_position: 77.87
      kinematics: 16.36
      com_pos: 5.84
      camlight: 1.74
      flex: 0.17
      crb: 9.98
      tendon_armature: 0.17
      collision: 9.34
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.57
      make_constraint: 30.65

main (bb81495):

Total JIT time: 0.25 s
Total simulation time: 3.17 s
Total steps per second: 2,586,971
Total realtime factor: 12,934.85 x
Total time per step: 386.55 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 384.86
  forward: 382.33
    fwd_position: 86.29
      kinematics: 16.36
      com_pos: 5.84
      camlight: 1.74
      flex: 0.17
      crb: 9.99
      tendon_armature: 0.17
      collision: 9.34
        nxn_broadphase: 3.71
        convex_narrowphase: 0.17
        primitive_narrowphase: 4.56
      make_constraint: 38.73

Notes:

  • Performance should be further improved once efc.J is represented in a sparse format, so that it is not necessary to zero memory on each call to make_constraint.
  • Replacing repeated code in contact_pyramidal and contact_elliptic with a wp.func appeared to introduce overhead; as a result, to maintain performance there is duplicated code in the dense and sparse cases for now.

TODO:

  • Improve tree traversal performance with dense Jacobians.

@adenzler-nvidia (Collaborator)

I think even with dense Jacobians the tree traversal strategy could be faster? What do you think? We would need to zero the memory, but I'm sure we can find a good way to do it.

@thowell (Collaborator, Author) commented Dec 19, 2025

@adenzler-nvidia Yes, I think the tree traversal could be faster with dense. Added a TODO for making the dense version performant with tree traversal.

@erikfrey (Collaborator)

@thowell Usually I'm in favor of incremental changes and TODOs, but in this case we're adding some complexity (cache_kernel, wp.static, etc.) that we might remove if it turns out that dof tree traversal makes sense for both dense and sparse.

Would you mind having a go at seeing whether it helps in both cases? If so, we can simplify the changes in this PR.

In general I'm not a huge fan of cache_kernel, nested_kernel - I really try to use them sparingly when there's no other choice.

@thowell (Collaborator, Author) commented Jan 5, 2026

@erikfrey I think ultimately all of the constraint functions will need this complexity in order to support dense and sparse without additional overhead; see #934.

@thowell (Collaborator, Author) commented Jan 10, 2026

One reason why it might not make sense to perform dof traversal for dense is that the elements of the efc.J row not visited by the traversal need to be zeroed. The dof traversal plus zeroing might be more expensive than simply iterating over all dofs. In the sparse case it is not necessary to zero elements.
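
To illustrate the trade-off with hypothetical helpers (not the merged kernels): the dense row must be fully zeroed because the solver reads all nv entries, while the sparse row stores only the dofs actually visited.

def fill_row_dense(J_row, nv, visited_dofs, values):
    # Dense: every one of the nv entries is read by the solver,
    # so dofs not visited by the traversal must be explicitly zeroed.
    for i in range(nv):
        J_row[i] = 0.0
    for i, v in zip(visited_dofs, values):
        J_row[i] = v

def fill_row_sparse(colind, val, visited_dofs, values):
    # Sparse: only visited dofs are stored, so there is nothing to zero.
    for k, (i, v) in enumerate(zip(visited_dofs, values)):
        colind[k] = i
        val[k] = v
    return len(visited_dofs)  # number of nonzeros in this row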

@thowell mentioned this pull request on Jan 23, 2026
@thowell force-pushed the make_constraint_contact branch from 660e2c2 to b857db9 on January 30, 2026 at 22:40
@thowell (Collaborator, Author) commented Jan 30, 2026

Refactored the code so that pyramidal and elliptic constraints each have one path that utilizes dof traversal.

mjwarp-testspeed benchmark/humanoid/humanoid.xml --nconmax=24 --njmax=64

This PR:

Loading model from: benchmark/humanoid/humanoid.xml...
  nbody: 17 nv: 27 ngeom: 20 nu: 21 is_sparse: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
  solver: NEWTON cone: PYRAMIDAL iterations: 100 iterative linesearch iterations: 50
  integrator: EULER graph_conditional: True
Data
  nworld: 8192 naconmax: 196608 njmax: 64

Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 32.93 s
Total simulation time: 2.85 s
Total steps per second: 2,873,305
Total realtime factor: 14,366.53 x
Total time per step: 348.03 ns
Total converged worlds: 8192 / 8192

main (bb81495):

Loading model from: benchmark/humanoid/humanoid.xml...
  nbody: 17 nv: 27 ngeom: 20 nu: 21 is_sparse: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
  solver: NEWTON cone: PYRAMIDAL iterations: 100 iterative linesearch iterations: 50
  integrator: EULER graph_conditional: True
Data
  nworld: 8192 naconmax: 196608 njmax: 64

Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 2.81 s
Total simulation time: 2.86 s
Total steps per second: 2,867,562
Total realtime factor: 14,337.81 x
Total time per step: 348.73 ns
Total converged worlds: 8192 / 8192

@thowell force-pushed the make_constraint_contact branch from b857db9 to 79575bf on January 30, 2026 at 22:59
@erikfrey (Collaborator) left a review


Looks good, just one nit.

@thowell merged commit 7bb099c into google-deepmind:main on Feb 1, 2026
9 checks passed
