A few detailed questions about the paper #59

@lck1201

Description

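I have a couple of questions:
Q1. Is GRPO applied at the chunk level or at the trajectory level? If it is chunk-level, have you run an ablation on chunk size?

Q2. Suppose the group number is N. During rollout, do you sample N times at every state S, following the task order?

For example: starting from S0, sample N actions and execute each of them, reaching N S1 states. Then, for each of those N S1 states, sample another N actions, reaching N^2 S2 states, and so on until the episode ends.

Is this understanding correct?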
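
To make the Q2 interpretation concrete, here is a minimal counting sketch. It is not taken from the paper or the repository: the function names are hypothetical, and the "N independent trajectories from S0" scheme is only an assumed alternative for comparison. It shows how many environment steps each reading would need for N samples and T decision steps.

```python
def branching_states_at_depth(group_size: int, depth: int) -> int:
    """States at a given depth if EVERY state is expanded group_size times
    (the tree-style reading described in Q2)."""
    return group_size ** depth


def branching_total_env_steps(group_size: int, num_steps: int) -> int:
    """Total environment steps for the tree-style reading: N + N^2 + ... + N^T."""
    return sum(group_size ** t for t in range(1, num_steps + 1))


def independent_total_env_steps(group_size: int, num_steps: int) -> int:
    """Total environment steps if N full trajectories are rolled out
    independently from S0 (assumed alternative, not confirmed by the authors)."""
    return group_size * num_steps


if __name__ == "__main__":
    n, t = 8, 5  # illustrative values only
    print("states at depth 2 (branching):", branching_states_at_depth(n, 2))
    print("env steps (branching):", branching_total_env_steps(n, t))
    print("env steps (independent):", independent_total_env_steps(n, t))
```

Under the branching reading the cost grows exponentially with the number of steps, which is the reason for asking whether that is really what is done.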
