fix(scheduler): correct disk-level storage double counting during replica scheduling (#4533)
…lica scheduling Longhorn: 12653 Signed-off-by: David Cheng <david.cheng@suse.com>
Pull request overview
This PR fixes a critical bug in the replica scheduler where storage was double-counted at the disk level during replica scheduling. When scheduling multiple replicas on the same node but different disks, the scheduler summed replica sizes across all disks on the node instead of only the target disk. This could cause false "insufficient storage" errors even when the target disk had enough space.
Changes:
- Added disk-level filtering (`r.Spec.DiskID == diskUUID`) to storage accounting logic during replica scheduling
- Improved code comment to clarify the purpose of accounting for replicas already assigned during the current scheduling cycle
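To make the effect of the disk-level filter concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the `Replica` and `ReplicaSpec` types are pared-down stand-ins, and `pendingSizeOnNode`, `pendingSizeOnDisk`, and `fits` are hypothetical helpers invented for illustration. Only the `r.Spec.DiskID == diskUUID` comparison is taken from the change summary above. Sizes are in arbitrary units.

```go
package main

import "fmt"

// ReplicaSpec and Replica are simplified, hypothetical stand-ins for the
// scheduler's replica objects; only the fields used below are modeled.
type ReplicaSpec struct {
	NodeID     string
	DiskID     string
	VolumeSize int64
}

type Replica struct {
	Spec ReplicaSpec
}

// pendingSizeOnNode models the pre-fix accounting: it sums every replica
// assigned to the node in the current cycle, regardless of disk.
func pendingSizeOnNode(replicas []Replica, nodeID string) int64 {
	var total int64
	for _, r := range replicas {
		if r.Spec.NodeID == nodeID {
			total += r.Spec.VolumeSize
		}
	}
	return total
}

// pendingSizeOnDisk models the post-fix accounting: it additionally
// requires r.Spec.DiskID == diskUUID, so only the target disk is counted.
func pendingSizeOnDisk(replicas []Replica, nodeID, diskUUID string) int64 {
	var total int64
	for _, r := range replicas {
		if r.Spec.NodeID == nodeID && r.Spec.DiskID == diskUUID {
			total += r.Spec.VolumeSize
		}
	}
	return total
}

// fits reports whether a new replica of size requested can be placed on a
// disk with the given available space once pending replicas are counted.
func fits(available, pending, requested int64) bool {
	return pending+requested <= available
}

func main() {
	// One replica of size 80 was already assigned to disk-a on node-1
	// earlier in this scheduling cycle.
	assigned := []Replica{
		{Spec: ReplicaSpec{NodeID: "node-1", DiskID: "disk-a", VolumeSize: 80}},
	}

	// Now try to place another replica of size 80 on disk-b (100 available).
	const available, requested = int64(100), int64(80)

	nodeLevel := pendingSizeOnNode(assigned, "node-1")
	diskLevel := pendingSizeOnDisk(assigned, "node-1", "disk-b")

	fmt.Println("node-level accounting fits:", fits(available, nodeLevel, requested))
	fmt.Println("disk-level accounting fits:", fits(available, diskLevel, requested))
}
```

With node-level accounting, the replica already placed on `disk-a` is charged against `disk-b`, so the second replica is rejected even though `disk-b` is empty. With the disk-level filter, only replicas on the target disk count and the placement succeeds.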
Which issue(s) this PR fixes:
longhorn/longhorn#12653
What this PR does / why we need it:
Fixes incorrect disk-level storage accounting during replica scheduling.
Previously, when scheduling multiple replicas on the same node but different disks, the scheduler incorrectly counted replica sizes across disks. This could lead to false "insufficient storage" errors even when the target disk had enough available space.
Special notes for your reviewer:
Additional documentation or context