feat: verifiers / environments hub integration #573
Signed-off-by: Christian Munley <cmunley@nvidia.com>
5197f48 to efbed85
ahmadki left a comment
Tests would also be appreciated.
Signed-off-by: cmunley1 <cmunley@nvidia.com>
I would like someone from Prime Intellect to take a look, and to test more environments so we can provide a longer list of what works today (some envs seem to hit version mismatches or other issues). However, I think this is good to merge. We can always open another PR for more verified environments, or based on PI feedback.
Note that multi-turn does not support on-policy token-id correction with these envs, so training requires disabling this assert. In tests on alphabet-sort with the assert disabled, training seems okay, but we need to ensure the token ids look fine without it, or add support for using replace_prefix_tokens in NeMo RL. Maybe I am missing something in verifiers, e.g. PrimeIntellect-ai/verifiers#626.
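For checking whether the token ids "look fine" with the assert disabled, something like the following could be run over logged rollouts. This is only a rough sketch, not the actual assert in NeMo RL and not an API from verifiers: the function names and the `(prompt_ids, completion_ids)` per-turn layout are assumptions made for illustration. It flags any turn whose prompt token ids do not start with the previous turn's prompt plus completion token ids, which is the kind of drift that re-tokenizing the conversation between turns can introduce.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-turn record: (prompt_ids, completion_ids).
Turn = Tuple[Sequence[int], Sequence[int]]


def first_prefix_mismatch(seen_ids: Sequence[int], next_prompt_ids: Sequence[int]) -> int:
    """Return the first index where next_prompt_ids stops extending seen_ids.

    Returns -1 when seen_ids is a clean prefix of next_prompt_ids.
    """
    for i, (a, b) in enumerate(zip(seen_ids, next_prompt_ids)):
        if a != b:
            return i
    # The next prompt is shorter than what the policy already saw/generated.
    if len(next_prompt_ids) < len(seen_ids):
        return len(next_prompt_ids)
    return -1


def find_token_drift(turns: List[Turn]) -> List[Tuple[int, int]]:
    """Return (turn_index, mismatch_position) for every turn whose prompt
    does not begin with the previous turn's prompt + completion token ids."""
    problems = []
    for t in range(1, len(turns)):
        prev_prompt, prev_completion = turns[t - 1]
        seen = list(prev_prompt) + list(prev_completion)
        pos = first_prefix_mismatch(seen, turns[t][0])
        if pos != -1:
            problems.append((t, pos))
    return problems
```

An empty result from `find_token_drift` would suggest the sampled token ids are carried through unchanged across turns; any reported position marks where the trained-on tokens diverge from what the policy actually generated.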




Enables using Environments Hub envs in NeMo Gym with NeMo RL for training.
#446