
Add MetricsManager for custom metric logging #596

Open
kevinzakka wants to merge 1 commit into main from feat/metrics-manager

Conversation

@kevinzakka (Collaborator) commented Feb 7, 2026

Summary

Adds a MetricsManager so users can log custom per-step metrics during training without hacking reward functions or adding zero-weight reward terms. Closes #584.

  • New: MetricsManager, MetricsTermCfg, NullMetricsManager in managers/metrics_manager.py
  • Integration: wired into ManagerBasedRlEnv config, step(), and _reset_idx()
  • Terms use the same callable signature as rewards (env, **params) → Tensor[num_envs]
  • No weight, no dt scaling — metrics are observational, not reward signals
  • Episode values are true per-step averages (sum / step_count), so a metric in [0,1] stays in [0,1] in wandb
  • Empty config means zero overhead (NullMetricsManager)

Example usage

  from dataclasses import dataclass

  from mjlab.managers import MetricsTermCfg
  # SceneEntityCfg and ManagerBasedRlEnvCfg also come from mjlab; imports elided here.

  def joint_velocity_magnitude(env, asset_cfg):
    """L1 norm of joint velocities."""
    return env.scene[asset_cfg.name].data.joint_vel.abs().sum(dim=-1)  # (num_envs,)

  @dataclass(kw_only=True)
  class MyEnvCfg(ManagerBasedRlEnvCfg):
    metrics = {
      "joint_vel_mag": MetricsTermCfg(
        func=joint_velocity_magnitude,
        params={"asset_cfg": SceneEntityCfg(name="robot")},
      ),
    }

On episode reset, Episode_Metrics/joint_vel_mag appears in extras["log"] and flows to wandb/tensorboard automatically.
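To make the accumulation concrete, here is a minimal sketch (not the actual implementation; the class and attribute names are made up) of the behavior described above — raw per-step sums divided by step count at reset, then logged under Episode_Metrics/{term_name}:

  import torch

  class MetricsAccumulatorSketch:
    """Per-env running sums, averaged and cleared on reset (illustrative only)."""

    def __init__(self, term_names, num_envs, device):
      self._sums = {name: torch.zeros(num_envs, device=device) for name in term_names}
      self._steps = torch.zeros(num_envs, device=device)

    def step(self, values):
      # values: dict mapping term name -> (num_envs,) tensor; no weight, no dt scaling.
      for name, value in values.items():
        self._sums[name] += value
      self._steps += 1

    def reset(self, env_ids, extras_log):
      # True per-step average (sum / step_count), so a metric in [0, 1] stays in [0, 1].
      steps = self._steps[env_ids].clamp(min=1)
      for name, sums in self._sums.items():
        extras_log[f"Episode_Metrics/{name}"] = (sums[env_ids] / steps).mean().item()
        sums[env_ids] = 0.0
      self._steps[env_ids] = 0.0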

Test plan

  • uv run pytest tests/test_metrics_manager.py — 6 targeted tests
  • uv run pytest tests/test_rewards.py — no regression
  • uv run ty check / uv run pyright — clean
  • uv run ruff check && uv run ruff format — clean

🤖 Generated with Claude Code

@brentyi (Collaborator) left a comment


+100, seems super useful!

I'm wondering if any of these things are possible in a simple/not-overengineered way:

Does it make sense to allow customization of how metrics are "reduced" between steps and before logging? In the extreme case it'd be nice, for example, to be able to specify logging for std of metrics, stds of per-episode means, histograms, histograms of per-episode stds, etc.

Can we implement any of the existing things that are logged (eg rewards) as default terms in the metrics manager?

@kevinzakka (Collaborator, Author)

Thanks for the review @brentyi!

Does it make sense to allow customization of how metrics are "reduced" between steps and before logging?

The rsl_rl logger only supports scalars. Everything in extras["log"] goes through torch.mean() then add_scalar() (logger.py:171). So histograms/distributions aren't possible without changing the logger. We could try submitting a PR to rsl_rl. In the meantime, for std or other reductions, users can just add a second metric term that computes it directly (e.g. joint_vel_std alongside joint_vel_mean). Both log as scalars and it works today with no extra machinery.
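For concreteness, a hypothetical pair of terms along those lines (the function names and asset_cfg wiring are illustrative, not from this PR):

  def joint_vel_mean(env, asset_cfg):
    return env.scene[asset_cfg.name].data.joint_vel.abs().mean(dim=-1)  # (num_envs,)

  def joint_vel_std(env, asset_cfg):
    return env.scene[asset_cfg.name].data.joint_vel.std(dim=-1)  # (num_envs,)

  metrics = {
    "joint_vel_mean": MetricsTermCfg(func=joint_vel_mean, params={"asset_cfg": SceneEntityCfg(name="robot")}),
    "joint_vel_std": MetricsTermCfg(func=joint_vel_std, params={"asset_cfg": SceneEntityCfg(name="robot")}),
  }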

Can we implement any of the existing things that are logged (eg rewards) as default terms in the metrics manager?

In principle, yes. Note, however, that rewards use dt-scaled sums divided by max_episode_length_s, while metrics use raw sums divided by step_count. We'd have to add machinery to the metrics manager to support this.
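Roughly, the two episode reductions side by side (variable names are illustrative, not the actual code):

  def reward_episode_value(weighted_dt_scaled_sum, max_episode_length_s):
    # Rewards (existing): each step accumulates term * weight * dt, then the sum is
    # normalized by the configured episode length in seconds.
    return weighted_dt_scaled_sum / max_episode_length_s

  def metric_episode_value(raw_sum, step_count):
    # Metrics (this PR): raw per-step sum divided by the steps actually taken.
    return raw_sum / step_count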

@brentyi (Collaborator) commented Feb 7, 2026

In the meantime, for std or other reductions, users can just add a second metric term that computes it directly (e.g. joint_vel_std alongside joint_vel_mean). Both log as scalars and it works today with no extra machinery.

Makes sense!

To check my understanding: we wouldn't be able to reuse intermediates, right? And we could compute std across episodes within a single timestep, but not across timesteps within an episode? These are a bit annoying, but fine.

Note however that rewards use dt-scaled sums divided by max_episode_length_s while metrics use raw sums divided by step_count.

Makes sense. It seems kind of nice to consolidate logic but I don't feel strongly about this.

kevinzakka marked this pull request as ready for review on February 7, 2026 19:31
Adds a MetricsManager so users can log custom per-step metrics without
hacking reward functions or adding zero-weight reward terms. Metrics
terms use the same callable signature as rewards (env, **params) but
have no weight, no dt scaling, and no normalization by episode length.
Episode values are true per-step averages (sum / step_count) logged
under "Episode_Metrics/{term_name}".

Closes #584

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kevinzakka force-pushed the feat/metrics-manager branch from b158504 to ac34d2c on February 7, 2026 23:34


Development

Successfully merging this pull request may close these issues.

Approach to inject and inspect new metrics during training

3 participants