
design draft for rl components API #1477

Merged
jayhenry merged 5 commits into InternLM:rl_design from jayhenry:rl_design
Feb 5, 2026


@jayhenry (Collaborator) commented Feb 4, 2026

No description provided.

class _Buffer: ...
self.buffer = _Buffer(enable_partial_rollout, tail_batch_candidate_step, tail_batch_trigger_size)

async def produce_batch(self, batch_size: int, data_mgr: DataManager, agent: Agent):
@YanhuiDua (Collaborator) commented Feb 4, 2026

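Should the data_manager passed in here be a function or a class? We should ask users for their opinion. If we pass a function, you can't jump to its definition; if we pass a class, it bundles a lot of functionality, and users may find it hard to understand.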
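To make the two options concrete, here is a minimal sketch of both call styles. Everything in it is hypothetical and only for discussion: `Sample`, `DataManagerLike`, `InMemoryDataManager`, `next_samples`, and the two `produce_batch_*` functions are illustrative names, not the API in this PR, and the `agent` argument is left out to keep the example short.

```python
import asyncio
from typing import Callable, Protocol

# Hypothetical stand-in for whatever a sample looks like in the real design.
Sample = dict


class DataManagerLike(Protocol):
    """Hypothetical narrow interface: only what produce_batch would need."""

    def next_samples(self, n: int) -> list[Sample]: ...


class InMemoryDataManager:
    """Toy data manager that serves samples from a list (assumption, not the PR's class)."""

    def __init__(self, samples: list[Sample]):
        self._samples = list(samples)
        self._cursor = 0

    def next_samples(self, n: int) -> list[Sample]:
        out = self._samples[self._cursor:self._cursor + n]
        self._cursor += len(out)
        return out


# Option A: pass a function. Small surface, but the call site only sees a
# Callable, so "go to definition" cannot reach the concrete implementation.
async def produce_batch_fn(
    batch_size: int, sample_fn: Callable[[int], list[Sample]]
) -> list[Sample]:
    return sample_fn(batch_size)


# Option B: pass an object, typed against a narrow protocol so the caller
# sees one method rather than the full data manager.
async def produce_batch_obj(batch_size: int, data_mgr: DataManagerLike) -> list[Sample]:
    return data_mgr.next_samples(batch_size)
```

A narrow protocol like this might be a middle ground: the argument is still an object with a navigable method, while the part of the class that users have to understand stays small.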

...


class DataManager:

This needs to include all data-related functionality: jsonl -> RolloutState -> TrainItem
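As a rough illustration of that jsonl -> RolloutState -> TrainItem flow, the sketch below shows one possible shape. The fields on `RolloutState` and `TrainItem`, the `"prompt"` key, and the `load` and `to_train_item` methods are all assumptions made for the example; the real definitions are not shown in this thread.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RolloutState:
    # Hypothetical fields; the real RolloutState may look different.
    prompt: str
    response: str = ""
    reward: float = 0.0


@dataclass
class TrainItem:
    # Hypothetical fields; the real TrainItem may look different.
    text: str
    reward: float


class DataManager:
    """Toy version that owns both conversions (assumption, not the PR's class)."""

    def load(self, lines: Iterable[str]) -> Iterator[RolloutState]:
        # jsonl -> RolloutState: one JSON object per non-empty line.
        for line in lines:
            line = line.strip()
            if line:
                yield RolloutState(prompt=json.loads(line)["prompt"])

    def to_train_item(self, state: RolloutState) -> TrainItem:
        # RolloutState -> TrainItem: join prompt and response, carry the reward.
        return TrainItem(text=state.prompt + state.response, reward=state.reward)
```

In this sketch the rollout step that fills in `response` and `reward` sits outside `DataManager`; whether it belongs inside is part of the open question about how much the class should contain.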

Whether data_manager needs to include this much functionality needs further discussion; let's ask users for their opinion.

class Env:
    def __init__(self, rollout_ctl: RolloutController):
        self._agent: Agent = SingleTurnAgent(rollout_ctl)
        self._scheduler: Scheduler = Scheduler()

The name scheduler is quite broad and could be confused with the inference scheduler. Would it be better to pick a different name?

@jayhenry jayhenry merged commit 08df53a into InternLM:rl_design Feb 5, 2026
3 checks passed

2 participants