Skip to content

Conversation

@BY571
Copy link
Collaborator

@BY571 BY571 commented Feb 11, 2026

Summary

Fixes #187

  • Track the previous action value in _execute_fractional_action and skip recalculation when the action is unchanged and a position is already open
  • Reset _prev_action_value to None in _reset_position_state so it's cleared on episode boundaries
  • Add parametrized regression test covering spot-long, futures-long, and futures-short repeated actions

Test plan

  • New regression test test_repeated_action_does_not_rebalance verifies position size stays constant across 50 steps of repeated actions (spot and futures)
  • All 1011 existing tests pass (0 failures, 47 skipped)

🤖 Generated with Claude Code

BY571 and others added 2 commits February 11, 2026 12:11
When an agent repeated the same action (e.g. -1.0 = full short), the
environment recalculated the target position from drifting portfolio
value, triggering unintended close_partial/increase trades. Track the
previous action value and skip recalculation when unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `expected_direction` parameter from test parametrization
- Add `test_action_change_after_repeated_holds_still_executes` to verify
  the fix doesn't lock agents into positions they cannot exit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@BY571
Copy link
Collaborator Author

BY571 commented Feb 11, 2026

@claude review

@claude
Copy link

claude bot commented Feb 11, 2026

Claude finished @BY571's task in 1m 38s —— View job


Code Review

  • Read repository guidelines (CLAUDE.md)
  • Review implementation changes in sequential.py
  • Review test changes in test_sequential.py
  • Analyze logic correctness and edge cases
  • Provide comprehensive feedback

Summary

This PR successfully fixes issue #187 by preventing unnecessary rebalancing when agents repeat the same action. The fix is clean, well-tested, and addresses the root cause of the problem.

Implementation Review (sequential.py)

Changes at sequential.py:461 and sequential.py:604-608

The fix adds a simple guard that tracks the previous action value and skips recalculation when the action is unchanged and a position is already open:

# If action hasn't changed and we already have a position, just hold
if action_value == self._prev_action_value and self.position.position_size != 0:
    self.position.hold_counter += 1
    return {"executed": False, "side": None, "fee_paid": 0.0, "liquidated": False}
self._prev_action_value = action_value

Strengths:

  1. Correct placement: The guard is positioned at the start of _execute_fractional_action, before any position calculations occur
  2. Proper state management: _prev_action_value is initialized to None in _reset_position_state (line 461), ensuring clean episode boundaries
  3. Hold counter update: Correctly increments hold_counter when holding a position
  4. Returns early: Avoids unnecessary computation and maintains consistency with the existing tolerance-based hold logic below

Potential edge case to consider:

The guard checks self.position.position_size != 0, which means the first time an agent opens a position (from flat), the guard doesn't trigger. This is correct. However, consider this sequence:

  1. Action 0.5 → Opens position
  2. Action 0.5 → Guard triggers, holds
  3. Price changes significantly
  4. Action 0.5 → Guard triggers, holds ✓ (even though target position may have drifted due to price change)

This is actually the intended behavior according to your PR description - the fix prevents recalculation when the action is unchanged. However, it's worth noting that this introduces a subtle behavioral change: with fractional sizing, the agent's effective exposure can drift with price movements, but the position size (in base units) stays constant when actions repeat.

This is a trade-off:

  • Before fix: Constant rebalancing to maintain portfolio allocation (constant-leverage behavior)
  • After fix: Hold position when action unchanged (constant-quantity behavior)

Both are valid approaches, and your PR correctly chooses the latter. The PR description acknowledges this is for "repeated actions," so I assume this is the desired behavior.

Test Review (test_sequential.py)

test_repeated_action_does_not_rebalance (lines 806-857)

Excellent regression test that:

  • Tests all three relevant modes: spot-long, futures-long, futures-short
  • Opens a position and repeats the same action 49 times
  • Asserts that position_size never changes (trades_executed == 0)
  • Has clear documentation explaining the issue

test_action_change_after_repeated_holds_still_executes (lines 859-904)

Critical complementary test that ensures the fix doesn't break normal functionality:

  • Verifies that after holding for 10 steps, changing the action still executes
  • Tests closing positions after repeated holds
  • Prevents regression where the guard might "lock" agents into positions

Test quality:

  • ✅ Parametrized to cover multiple trading modes
  • ✅ Clear test names and documentation
  • ✅ Verifies both the fix and potential side effects
  • ✅ Uses meaningful assertions with helpful error messages

Minor observations

  1. Float equality comparison (sequential.py:605):

    if action_value == self._prev_action_value and self.position.position_size != 0:

    You're comparing floats with ==. This works because you're comparing the exact same value from self.action_levels[action_idx], not computed values. However, you might consider adding a comment to clarify this is intentional:

    # Direct comparison is safe - both values come from action_levels array
    if action_value == self._prev_action_value and self.position.position_size != 0:
  2. Type annotation: Consider adding a type hint for _prev_action_value:

    self._prev_action_value: Optional[float] = None

    Though this is a minor style preference and not required.

  3. Test coverage: The tests are comprehensive for the primary use case. One additional edge case test could verify behavior when switching between different non-zero actions (e.g., 0.5 → 0.8), but this is already implicitly covered by the tolerance-based logic below the guard.

Verdict

LGTM

This is a clean, well-tested fix that addresses the root cause of issue #187. The implementation is simple, efficient, and maintains backward compatibility for all other behaviors. The test suite provides excellent coverage of both the fix and potential edge cases.

The only suggestions above are minor style improvements - the functionality is correct as-is.


| Branch: fix/187-fractional-rebalancing

@BY571 BY571 merged commit 48b0a40 into main Feb 11, 2026
2 checks passed
@BY571 BY571 deleted the fix/187-fractional-rebalancing branch February 11, 2026 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fractional position sizing: repeating same action causes unintended rebalancing

1 participant