[Feat] Support RLDataPacker with several packing strategies #1438
Open
YanhuiDua wants to merge 1 commit into InternLM:main from
Conversation
YanhuiDua (Collaborator, Author):
@copilot generate the description for this PR
Copilot (Contributor):
Pull request overview
This PR adds DataBatchPacker functionality to support efficient batch packing for the RLTrainer. The implementation includes three packing strategies (greedy, balance, native) and utilities for workload calculation and partition balancing using the Karmarkar-Karp algorithm.
Changes:
- Added utility functions for workload calculation and balanced partitioning using the Karmarkar-Karp differencing method
- Implemented DataBatchPacker class with three packing strategies: greedy (maximize pack utilization), balance (token-balanced distribution), and native (simple sample-based splitting)
- Added comprehensive test coverage for all three packing strategies with various edge cases
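For context, here is a minimal sketch of multi-way balanced partitioning with the Karmarkar-Karp (largest differencing) method mentioned above; the function name and shape are illustrative, not the actual API in xtuner/v1/rl/utils.py:

```python
import heapq

def kk_partition(seq_lens, k):
    """Split sequence lengths into k groups with roughly equal sums
    (Karmarkar-Karp largest differencing method). Illustrative sketch."""
    # Each heap entry holds k partial groups, keyed by the (negated)
    # spread between its heaviest and lightest group.
    heap = []
    for idx, length in enumerate(seq_lens):
        groups = [(length, [idx])] + [(0, [])] * (k - 1)
        heapq.heappush(heap, (-length, groups))
    while len(heap) > 1:
        _, a = heapq.heappop(heap)  # entry with the largest spread
        _, b = heapq.heappop(heap)  # entry with the second-largest spread
        a = sorted(a, key=lambda g: g[0], reverse=True)  # heaviest first
        b = sorted(b, key=lambda g: g[0])                # lightest first
        # Pair heavy groups with light ones to shrink the overall spread.
        merged = [(ga[0] + gb[0], ga[1] + gb[1]) for ga, gb in zip(a, b)]
        spread = max(g[0] for g in merged) - min(g[0] for g in merged)
        heapq.heappush(heap, (-spread, merged))
    return [indices for _, indices in heap[0][1]]

# Balance the test lengths from tests/ray/test_pack.py across 2 workers.
print(kk_partition([1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], 2))
```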
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| xtuner/v1/rl/utils.py | Added workload calculation function and Karmarkar-Karp algorithm for balanced partitioning of sequence lengths across workers |
| xtuner/v1/rl/base/pack.py | Implemented DataBatchPacker class with three strategies for packing data batches across data parallel ranks and optimizer steps |
| tests/ray/test_pack.py | Added unit tests validating all three packing strategies with various input scenarios and padding expectations |
The source branch was force-pushed several times: 4ef4b9d → 60b8ec2 → 5af5e13 → 2b5aa84 → 49e481e, then e4600fb → 3e71db0 → 0f8f3bb.
Key Changes
The PR includes three packing strategies: greedy, balance, and native.
Native Strategy: Preserves strict sample order but incurs higher padding overhead. It first splits samples across DP ranks, then divides them into optimizer steps, and finally performs packing and padding.
Balance Strategy: Balances the token load across all GPUs for each mini training step. Like the native strategy, it splits across DP ranks and optimizer steps, then performs packing and padding.
Greedy Strategy: Minimizes the number of padding tokens and matches XTuner's original packing strategy. Unlike the other methods, it packs samples first to fill max_seq_len as tightly as possible (disregarding the original sample order between steps); these dense packs are then distributed across DP ranks and optimizer steps.
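A hypothetical usage sketch of the class; the constructor arguments and method names below are assumptions for illustration, since the actual signature lives in xtuner/v1/rl/base/pack.py and may differ:

```python
# Hypothetical API sketch: argument and method names are assumptions,
# not the PR's verified signature.
from xtuner.v1.rl.base.pack import DataBatchPacker

samples = [{"input_ids": [0] * n} for n in (1500, 1000, 2800, 3000)]  # dummy data

packer = DataBatchPacker(
    max_seq_len=4096,      # assumed capacity of each pack
    dp_size=2,             # assumed number of data-parallel ranks
    optimizer_steps=2,     # assumed optimizer steps per batch
    strategy="balance",    # "greedy" | "balance" | "native"
)
packed = packer.pack(samples)  # assumed entry point: per-rank, per-step packs
```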
Examples
Test Context: test_variable_packs in tests/ray/test_pack.py
Parameters: sample lengths [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
Strategy 1: Native
Naive split by sample count: only the number of samples per rank is balanced; lengths are not considered.
Pre-processing (Padding for Divisibility)
Pad the number of samples so it is divisible by the DP size (a padding item of length 1024 is added).
Split by DP Rank
Split by Optimizer Steps
Pack & Pad (Independent per Step)
Cross-Rank Alignment (Final Result)
Align the number of packs across ranks within the same step (ranks with fewer packs are padded with empty packs).
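A minimal, self-contained sketch of this flow; dp_size=2, optimizer_steps=2, max_seq_len=4096, and padding to a multiple of dp_size * steps are illustrative assumptions, not the PR's exact behavior:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size, steps, max_seq_len, pad_len = 2, 2, 4096, 1024  # assumed parameters

# Pre-processing: pad the sample count so it splits evenly (assumptions:
# divisible by dp_size * steps; every sample fits in one pack).
while len(lengths) % (dp_size * steps):
    lengths.append(pad_len)

def pack_in_order(samples, cap):
    """Pack samples greedily while preserving their order."""
    packs, cur = [], []
    for s in samples:
        if cur and sum(cur) + s > cap:
            packs.append(cur)
            cur = []
        cur.append(s)
    if cur:
        packs.append(cur)
    return packs

# Split by DP rank (contiguous chunks keep sample order), then by step.
per_rank = len(lengths) // dp_size
for r in range(dp_size):
    chunk = lengths[r * per_rank:(r + 1) * per_rank]
    per_step = len(chunk) // steps
    for t in range(steps):
        packs = pack_in_order(chunk[t * per_step:(t + 1) * per_step], max_seq_len)
        print(f"rank {r}, step {t}: {packs}")
```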
Strategy 2: Balance (Sorted Split)
Sort first, then split: long and short samples are distributed evenly across GPUs, reducing padding waste.
Global Sort
Split by DP Rank & Step (Interleaved)
Samples of similar length are dispatched to different GPUs to balance the load.
Pack & Pad
Cross-Rank Alignment
(Skipped as packs are already balanced in this case)
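A sketch of the sort-then-interleave idea under the same assumed dp_size; only the split is shown, and packing then proceeds per rank as in the native flow:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size = 2  # assumed

# Global sort (descending) so neighbours in the order have similar lengths.
order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

# Interleaved split: consecutive, similar-length samples land on different ranks.
ranks = [[order[i] for i in range(r, len(order), dp_size)] for r in range(dp_size)]

for r, idxs in enumerate(ranks):
    load = sum(lengths[i] for i in idxs)
    print(f"rank {r}: sample indices {idxs}, token load {load}")
```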
Strategy 3: Greedy (Global Packing First)
Pack globally first, then split across GPUs, maximizing the fill rate.
Global Packing
Greedily combine all samples into packs:
Split by DP Rank
Split by Optimizer Steps
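A sketch of the greedy flow with first-fit-decreasing as the global packing step; max_seq_len, dp_size, and the round-robin pack distribution are illustrative assumptions:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
max_seq_len, dp_size = 4096, 2  # assumed parameters

# Global packing: first-fit-decreasing fills each pack as tightly as
# possible, ignoring the original sample order.
packs = []
for s in sorted(lengths, reverse=True):
    for pack in packs:
        if sum(pack) + s <= max_seq_len:
            pack.append(s)
            break
    else:
        packs.append([s])

# Distribute the dense packs across DP ranks (round-robin here); each
# rank would then divide its packs into optimizer steps.
for r in range(dp_size):
    print(f"rank {r}: {packs[r::dp_size]}")
```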