Fix NormalizeReward extra return_rms updates during vector env autoreset #1526
Conversation
pseudo-rnd-thoughts
left a comment
Thanks for the PR, sorry it's taken me a couple of days to get to this.
Does this mean that the vector and standard normalize reward wrappers are equivalent?
If so, can we add explicit testing for this?
for _ in range(n_steps):
    vec_env.step(vec_env.action_space.sample())

assert vec_env.return_rms.count == per_env.envs[0].return_rms.count
Shouldn't the vec_env and per_env return_rms be equivalent now?
Updated it, now it asserts `.count`, `.mean`, and `.var`.
Yes, they're equivalent now when comparing a single sub-env. The test now checks all three fields on `return_rms`: `count`, `mean`, and `var`. There's a tiny float difference (~1e-9) since the vector wrapper accumulates in float32 vs float64 in the per-env wrapper. Worth noting that with multiple sub-envs the vector wrapper shares one RunningMeanStd across all of them, so the stats will differ from independent per-env wrappers. That's intentional, though, not a bug.
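For illustration, the extended check could look like the following, continuing from the snippet quoted above; the `rtol` value is an assumption chosen to absorb that float32/float64 gap, not necessarily what the test uses.

```python
import numpy as np

single_rms = per_env.envs[0].return_rms  # per-env statistics, as in the snippet above
assert vec_env.return_rms.count == single_rms.count
# Loose tolerance: the vector wrapper accumulates in float32, the per-env wrapper in float64.
np.testing.assert_allclose(vec_env.return_rms.mean, single_rms.mean, rtol=1e-5)
np.testing.assert_allclose(vec_env.return_rms.var, single_rms.var, rtol=1e-5)
```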
Perfect, thanks for making the update @JonahFSD
Fixes #1451
The vector `NormalizeReward` wrapper was counting phantom autoreset steps from `SyncVectorEnv` as real steps, causing `return_rms` to receive extra updates that diverged from the per-env `NormalizeReward` behavior.

When `SyncVectorEnv` autoresets a sub-env (`AutoresetMode.NEXT_STEP`), it returns `reward=0`, `terminated=False` for that sub-env on the following step. The vector `NormalizeReward` processed this as a normal step, decaying `accumulated_reward` by `gamma` and calling `return_rms.update()` with the stale value.

The fix tracks which sub-envs terminated on the previous step and skips accumulation and RMS updates for those sub-envs on the autoreset step. It also adds a `reset()` override to clear the tracking state.
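As an illustration of that idea, here is a minimal sketch of the skip logic. It is not the PR's actual diff: the class name, the `prev_done` tracking field, the simplified normalization, and the `RunningMeanStd` import path are all assumptions.

```python
import numpy as np
from gymnasium.vector import VectorWrapper
from gymnasium.wrappers.utils import RunningMeanStd  # assumed import path; may vary by version


class NormalizeRewardSketch(VectorWrapper):
    """Sketch only: skip return_rms updates on phantom autoreset steps."""

    def __init__(self, env, gamma: float = 0.99, epsilon: float = 1e-8):
        super().__init__(env)
        self.gamma = gamma
        self.epsilon = epsilon
        self.return_rms = RunningMeanStd(shape=())
        self.accumulated_reward = np.zeros(self.env.num_envs)
        # Hypothetical tracking state: which sub-envs ended an episode on the previous step.
        self.prev_done = np.zeros(self.env.num_envs, dtype=bool)

    def reset(self, **kwargs):
        # Clear the tracking state so a fresh rollout starts without stale flags.
        self.accumulated_reward = np.zeros(self.env.num_envs)
        self.prev_done = np.zeros(self.env.num_envs, dtype=bool)
        return self.env.reset(**kwargs)

    def step(self, actions):
        obs, reward, terminated, truncated, info = self.env.step(actions)
        # With AutoresetMode.NEXT_STEP, sub-envs flagged in prev_done return a
        # placeholder transition (reward=0, terminated=False) on this step.
        real = ~self.prev_done
        self.accumulated_reward = np.where(
            real,
            self.accumulated_reward * self.gamma * (1 - terminated) + reward,
            0.0,  # the new episode's return starts from zero
        )
        if real.any():
            # Only real transitions feed the running statistics.
            self.return_rms.update(self.accumulated_reward[real])
        self.prev_done = np.logical_or(terminated, truncated)
        normalized = reward / np.sqrt(self.return_rms.var + self.epsilon)
        return obs, normalized, terminated, truncated, info
```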
Type of change
Test changes
- Added `test_count_against_wrapper` verifying `return_rms.count` matches between vector and per-env NormalizeReward (the exact discrepancy from [Bug Report] NormalizeReward count error during autoreset in environments vectors #1451); a rough sketch of this check follows the list
- `test_functionality` expected std updated (fewer RMS updates change the normalization statistics)
- `test_against_wrapper` rtol widened from 0.01 to 0.1; the old tolerance only held because the extra updates coincidentally reduced the vector/single variance divergence
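A rough sketch of what that count comparison could look like (illustrative only, not the exact test added in the PR; the environment id, step count, and seed are assumptions):

```python
import gymnasium as gym
from gymnasium.wrappers import NormalizeReward
from gymnasium.wrappers.vector import NormalizeReward as VectorNormalizeReward


def test_count_against_wrapper_sketch(n_steps: int = 200, seed: int = 42):
    # Vector-level wrapper around a plain sub-env...
    vec_env = VectorNormalizeReward(
        gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1")])
    )
    # ...versus the per-env wrapper applied to the sub-env before vectorization.
    per_env = gym.vector.SyncVectorEnv(
        [lambda: NormalizeReward(gym.make("CartPole-v1"))]
    )

    vec_env.reset(seed=seed)
    per_env.reset(seed=seed)
    vec_env.action_space.seed(seed)

    for _ in range(n_steps):
        # Identical actions keep episode boundaries, and therefore autoresets, aligned.
        actions = vec_env.action_space.sample()
        vec_env.step(actions)
        per_env.step(actions)

    # Without the phantom autoreset updates, both counts should agree.
    assert vec_env.return_rms.count == per_env.envs[0].return_rms.count
```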
Checklist:
- `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)