design draft for rl components API #1477
Merged
jayhenry merged 5 commits into InternLM:rl_design on Feb 5, 2026
YanhuiDua
reviewed
Feb 4, 2026
class _Buffer: ...
self.buffer = _Buffer(enable_partial_rollout, tail_batch_candidate_step, tail_batch_trigger_size)

async def produce_batch(self, batch_size: int, data_mgr: DataManager, agent: Agent):
Collaborator
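Should the data_manager passed in here be a function or a class? We need to ask users for their opinion. If we pass a function, you cannot jump to its definition; if we pass a class, it bundles a lot of functionality, and users may find it hard to understand.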
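To make the two options concrete, here is a minimal sketch of both call styles. It is not the PR's implementation: `SampleFn`, `Sampler`, `ToyDataManager`, `sample`, and the two `produce_batch_with_*` helpers are hypothetical names introduced only for illustration, and the real `DataManager` body is elided in the draft. The third variant (an object typed against a narrow `Protocol`) is one possible middle ground, assuming the producer only needs a single sampling method.

```python
import asyncio
from typing import Awaitable, Callable, Protocol

# Option A: pass a plain callable. Lightweight, but the producer only sees
# a type alias; navigating from the call site does not reach a concrete method.
SampleFn = Callable[[int], Awaitable[list[str]]]


async def produce_batch_with_fn(batch_size: int, sample_fn: SampleFn) -> list[str]:
    return await sample_fn(batch_size)


# Option B: pass an object, but type it against a narrow Protocol so the
# producer depends only on the one method it actually calls.
class Sampler(Protocol):
    async def sample(self, batch_size: int) -> list[str]: ...


class ToyDataManager:
    """Hypothetical stand-in for DataManager; fields and methods are assumptions."""

    def __init__(self, prompts: list[str]) -> None:
        self._prompts = prompts
        self._cursor = 0

    async def sample(self, batch_size: int) -> list[str]:
        # Return the next slice of prompts and advance the cursor.
        batch = self._prompts[self._cursor:self._cursor + batch_size]
        self._cursor += batch_size
        return batch


async def produce_batch_with_obj(batch_size: int, data_mgr: Sampler) -> list[str]:
    return await data_mgr.sample(batch_size)
```

With this shape, `produce_batch_with_obj(2, ToyDataManager([...]))` and `produce_batch_with_fn(2, mgr.sample)` behave the same; the difference is only in what the signature exposes to the reader.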
YanhuiDua
reviewed
Feb 4, 2026
...

class DataManager:
Collaborator
This needs to cover all data-related functionality: jsonl -> RolloutState -> TrainItem.
Collaborator
Whether data_manager really needs to cover this much functionality should be discussed further; let's ask users for their opinion.
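As a reference point for that discussion, the jsonl -> RolloutState -> TrainItem flow could be sketched as two separate stages. This is only an illustration under assumptions: the fields of `RolloutState` and `TrainItem`, the `"prompt"` key, and the helper names `load_jsonl` and `to_train_item` are hypothetical, since the draft does not show the real definitions.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RolloutState:
    # Hypothetical fields; the real class is not shown in the draft.
    prompt: str
    response: str = ""
    reward: float = 0.0


@dataclass
class TrainItem:
    # Hypothetical fields; the real class is not shown in the draft.
    text: str
    reward: float


def load_jsonl(lines: Iterable[str]) -> list[RolloutState]:
    """Stage 1: jsonl records -> RolloutState (assumes a 'prompt' key)."""
    return [
        RolloutState(prompt=json.loads(line)["prompt"])
        for line in lines
        if line.strip()  # skip blank lines
    ]


def to_train_item(state: RolloutState) -> TrainItem:
    """Stage 2: a finished RolloutState -> TrainItem."""
    return TrainItem(text=state.prompt + state.response, reward=state.reward)
```

Keeping the stages as independent functions like this would let a `DataManager` either own both or delegate one of them, which is the scope question raised above.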
YanhuiDua
reviewed
Feb 4, 2026
design/component_rl.py
Outdated
class Env:
    def __init__(self, rollout_ctl: RolloutController):
        self._agent: Agent = SingleTurnAgent(rollout_ctl)
        self._scheduler: Scheduler = Scheduler()
Collaborator
The name "scheduler" is quite broad and could be confused with the inference scheduler. Would a different name be better?
No description provided.