[Data] Include logical memory in resource manager scheduling decisions#60774
[Data] Include logical memory in resource manager scheduling decisions#60774machichima wants to merge 6 commits intoray-project:masterfrom
Conversation
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
|
@bveeramani PTAL, thank you! |
There was a problem hiding this comment.
Code Review
This pull request correctly includes logical memory in the resource manager's scheduling decisions. The changes primarily involve renaming ..._processor_usage methods to ..._logical_usage and updating them to account for memory (memory) alongside CPU and GPU. The implementations in ActorPoolMapOperator, TaskPoolMapOperator, and HashShuffleOperator are updated to calculate memory usage from ray_remote_args. The changes are consistent across the codebase, and the new test test_memory_limit_blocks_task_submission effectively verifies that task submission is blocked when memory limits are exceeded. I have one minor suggestion to improve a docstring for clarity. Overall, the code looks good and the changes are well-implemented.
| This method is called by the executor to decide how to allocate processors | ||
| between different operators. |
There was a problem hiding this comment.
For better clarity and to align with the changes in this PR that add memory to resource calculations, consider updating the term "processors" to "resources" in this docstring.
| This method is called by the executor to decide how to allocate processors | |
| between different operators. | |
| This method is called by the executor to decide how to allocate resources | |
| between different operators. |
| can_submit = allocator.can_submit_new_task(o2) | ||
| assert not can_submit, ( | ||
| "Task should be blocked: requires 15GB but only 10GB available" | ||
| ) |
There was a problem hiding this comment.
Test assertion is conditional and may never execute
Medium Severity
The new test test_memory_limit_blocks_task_submission has its key assertion wrapped in if allocator is not None:. If _op_resource_allocator is None (which happens when op_resource_reservation_enabled is False), the test passes without verifying anything. The assertion that memory limits block task submission simply never executes, defeating the test's stated purpose of verifying memory limiting behavior.


Description
Rename:
current_processor_usagetocurrent_logical_usagepending_processor_usagetopending_logical_usagerunning_processor_usagetorunning_logical_usageand update the methods to include logical memory (
memory) alongside CPU and GPU. This ensures the resource manager accounts for memory when making scheduling decisions.Add new test
test_memory_limit_blocks_task_submissionto ensure task submission is blocked when a task requests more memory than availableRelated issues
Closes #60744
Additional information