Remove epa_pr #1014

Draft
kbayes wants to merge 2 commits into google-deepmind:main from kbayes:facepr

kbayes (Member) commented Jan 13, 2026

DO NOT MERGE (needs further benchmarking)

mjwarp-testspeed benchmark/aloha_pot/scene.xml --nconmax=24 --njmax=128 --event_trace --memory

Before:

Summary for 8192 parallel rollouts

Total JIT time: 0.89 s
Total simulation time: 4.23 s
Total steps per second: 1,938,721
Total realtime factor: 3,877.44 x
Total time per step: 515.80 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 513.39
  forward: 506.49
    fwd_position: 238.21
      kinematics: 34.60
      com_pos: 8.99
      camlight: 1.86
      flex: 0.18
      crb: 10.23
      tendon_armature: 0.19
      collision: 162.63
        nxn_broadphase: 84.31
        convex_narrowphase: 75.09
        primitive_narrowphase: 2.24
      make_constraint: 16.21
      transmission: 1.36
    sensor_pos: 0.18
    fwd_velocity: 30.68
      com_vel: 8.88
      passive: 1.14
      rne: 12.29
      tendon_bias: 0.19
    sensor_vel: 0.19
    fwd_actuation: 1.46
    fwd_acceleration: 7.54
      xfrc_accumulate: 2.00
    solve: 226.10
      mul_m: 2.56
    sensor_acc: 0.18
  euler: 6.30

Model memory 5.38 MB (0.36% of used memory):
 (no field >= 1% of used memory)
Data memory 427.43 MB (28.46% of used memory):
 geom_xpos: 19.12 MB (1.27%)
 geom_xmat: 57.38 MB (3.82%)
 qM: 18.00 MB (1.20%)
 qLD: 16.53 MB (1.10%)
 efc.J: 96.00 MB (6.39%)
Other memory: 1069.19 MB (71.18% of used memory)
Total memory: 1502.00 MB (3.10% of total device memory)
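
The summary figures above appear to be mutually consistent, which a reader can check with a few lines of arithmetic. The sketch below is not part of `mjwarp-testspeed`; it only re-derives two reported values from "Total steps per second". The 0.002 s timestep is an assumption inferred from the ratio of the reported numbers and is not stated anywhere in the log.

```python
# Value copied from the "Before" summary above.
steps_per_second = 1_938_721

# Time per step is the reciprocal of throughput, expressed in nanoseconds.
time_per_step_ns = 1e9 / steps_per_second

# Assumption: a 0.002 s simulation timestep (inferred, not stated in the log).
assumed_timestep_s = 0.002
realtime_factor = steps_per_second * assumed_timestep_s

print(f"time per step: {time_per_step_ns:.2f} ns")
print(f"realtime factor: {realtime_factor:.2f} x")
```

Under that assumption both derived values match the reported 515.80 ns and 3,877.44 x to two decimal places.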

After:

Summary for 8192 parallel rollouts

Total JIT time: 0.90 s
Total simulation time: 4.22 s
Total steps per second: 1,939,639
Total realtime factor: 3,879.28 x
Total time per step: 515.56 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 513.36
  forward: 506.49
    fwd_position: 239.93
      kinematics: 34.67
      com_pos: 8.99
      camlight: 1.86
      flex: 0.18
      crb: 10.22
      tendon_armature: 0.19
      collision: 164.55
        nxn_broadphase: 84.36
        convex_narrowphase: 76.99
        primitive_narrowphase: 2.24
      make_constraint: 16.02
      transmission: 1.34
    sensor_pos: 0.18
    fwd_velocity: 30.62
      com_vel: 8.87
      passive: 1.13
      rne: 12.28
      tendon_bias: 0.18
    sensor_vel: 0.18
    fwd_actuation: 1.45
    fwd_acceleration: 7.52
      xfrc_accumulate: 2.00
    solve: 224.49
      mul_m: 2.53
    sensor_acc: 0.18
  euler: 6.27

Model memory 5.38 MB (0.50% of used memory):
 (no field >= 1% of used memory)
Data memory 427.43 MB (39.36% of used memory):
 geom_xpos: 19.12 MB (1.76%)
 geom_xmat: 57.38 MB (5.28%)
 qM: 18.00 MB (1.66%)
 qLD: 16.53 MB (1.52%)
 efc.J: 96.00 MB (8.84%)
 efc.quad: 12.00 MB (1.10%)
Other memory: 653.19 MB (60.15% of used memory)
Total memory: 1086.00 MB (2.24% of total device memory)
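
To make the before/after comparison easier to read, here is a minimal sketch that computes relative changes for a few of the numbers reported above. The helper `pct_change` and the dictionary layout are illustrative only; the values are copied from the two logs.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the aloha_pot logs above.
aloha_pot = {
    "steps_per_second": (1_938_721, 1_939_639),
    "convex_narrowphase": (75.09, 76.99),
    "solve": (226.10, 224.49),
    "total_memory_mb": (1502.00, 1086.00),
}

changes = {name: pct_change(b, a) for name, (b, a) in aloha_pot.items()}
for name, delta in changes.items():
    print(f"{name}: {delta:+.2f}%")
```

On this scene, throughput is essentially unchanged (well under 0.1%), `convex_narrowphase` is about 2.5% slower, and total memory drops by roughly 28%.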

thowell (Collaborator) commented Jan 13, 2026

these changes supersede #956?

kbayes (Member, Author) commented Jan 13, 2026

Here's a test of 30 cylinders rolling on each other in a box (worst case, with high CCD iterations):

before

Summary for 8192 parallel rollouts

Total JIT time: 3.74 s
Total simulation time: 210.74 s
Total steps per second: 38,872
Total realtime factor: 77.74 x
Total time per step: 25725.30 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 25717.28
  forward: 25660.35
    fwd_position: 3409.81
      kinematics: 11.80
      com_pos: 31.40
      camlight: 0.19
      flex: 0.18
      crb: 10.82
      tendon_armature: 0.19
      collision: 1882.94
        nxn_broadphase: 11.04
        convex_narrowphase: 1854.81
        primitive_narrowphase: 16.12
      make_constraint: 1470.12
      transmission: 0.19
    sensor_pos: 0.18
    fwd_velocity: 46.15
      com_vel: 21.54
      passive: 4.76
      rne: 18.71
      tendon_bias: 0.18
    sensor_vel: 0.19
    fwd_actuation: 0.61
    fwd_acceleration: 52.49
      xfrc_accumulate: 5.35
    solve: 22148.70
      mul_m: 10.26
    sensor_acc: 0.19
  euler: 56.34

Model memory 0.18 MB (0.00% of used memory):
 (no field >= 1% of used memory)
Data memory 5107.07 MB (27.11% of used memory):
 efc.J: 4224.00 MB (22.42%)
Other memory: 13732.74 MB (72.89% of used memory)
Total memory: 18840.00 MB (38.84% of total device memory)

after

Summary for 8192 parallel rollouts

Total JIT time: 10.47 s
Total simulation time: 245.23 s
Total steps per second: 33,405
Total realtime factor: 66.81 x
Total time per step: 29935.28 ns
Total converged worlds: 8191 / 8192

Event trace:

step: 29929.08
  forward: 29873.35
    fwd_position: 3848.42
      kinematics: 11.57
      com_pos: 30.68
      camlight: 0.19
      flex: 0.19
      crb: 10.63
      tendon_armature: 0.19
      collision: 2327.71
        nxn_broadphase: 10.79
        convex_narrowphase: 2299.73
        primitive_narrowphase: 16.21
      make_constraint: 1465.10
      transmission: 0.19
    sensor_pos: 0.19
    fwd_velocity: 45.96
      com_vel: 21.48
      passive: 4.72
      rne: 18.60
      tendon_bias: 0.19
    sensor_vel: 0.19
    fwd_actuation: 0.61
    fwd_acceleration: 51.91
      xfrc_accumulate: 5.30
    solve: 25923.85
      mul_m: 10.20
    sensor_acc: 0.19
  euler: 55.13

Model memory 0.18 MB (0.00% of used memory):
 (no field >= 1% of used memory)
Data memory 5107.07 MB (36.96% of used memory):
 efc.J: 4224.00 MB (30.57%)
Other memory: 8708.74 MB (63.03% of used memory)
Total memory: 13816.00 MB (28.48% of total device memory)
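
To see where the extra time goes in the cylinder test, the event traces can be parsed and diffed. The following is a rough sketch, not a feature of the benchmark tool: `parse_trace` is a hypothetical helper that assumes the two-space indentation shown in the traces above, and `BEFORE`/`AFTER` are excerpts of those traces rather than the full output.

```python
def parse_trace(text: str) -> dict[str, float]:
    """Map each event's full path (e.g. 'step/forward/solve') to its time."""
    timings: dict[str, float] = {}
    stack: list[tuple[int, str]] = []  # (indent, name) of enclosing events
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, value = line.strip().split(":")
        # Drop events that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        timings["/".join(n for _, n in stack)] = float(value)
    return timings


# Excerpts copied from the cylinder traces above.
BEFORE = """\
step: 25717.28
  forward: 25660.35
    fwd_position: 3409.81
      collision: 1882.94
        convex_narrowphase: 1854.81
      make_constraint: 1470.12
    solve: 22148.70
"""

AFTER = """\
step: 29929.08
  forward: 29873.35
    fwd_position: 3848.42
      collision: 2327.71
        convex_narrowphase: 2299.73
      make_constraint: 1465.10
    solve: 25923.85
"""

before, after = parse_trace(BEFORE), parse_trace(AFTER)
deltas = {path: after[path] - before[path] for path in before}

# Largest absolute changes first.
for path, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{path}: {delta:+.2f} ({delta / before[path]:+.1%})")
```

On these excerpts, `solve` accounts for 3775.15 of the 4211.80 increase in `step` (roughly 90%), and `convex_narrowphase` for another 444.92, while `make_constraint` is essentially unchanged.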

thowell (Collaborator) commented Jan 29, 2026

@kbayes what is the status of this pr?

kbayes (Member, Author) commented Jan 30, 2026

@thowell this is WIP.
