diff --git a/docs/environment-tutorials/aviary.md b/docs/environment-tutorials/aviary.md
new file mode 100644
index 000000000..d29d8f2de
--- /dev/null
+++ b/docs/environment-tutorials/aviary.md
@@ -0,0 +1,48 @@
+(environment-aviary)=
+
+# Aviary
+
+Integration with [Future-House/aviary](https://github.com/Future-House/aviary), a gymnasium for defining custom language agent RL environments.
+
+Aviary is a framework for building custom RL environments with tool use and multi-step reasoning. Environments built in Aviary can be run through NeMo Gym for training and inference. The library features pre-existing environments on math, general knowledge, biological sequences, scientific literature search, and protein stability.
+
+---
+
+## Available Environments
+
+The integration includes several pre-built Aviary environments:
+
+- **GSM8K** (`gsm8k_app.py`) - Grade school math problems with calculator tool
+- **HotPotQA** (`hotpotqa_app.py`) - Multi-hop question answering
+- **BixBench** (`notebook_app.py`) - Jupyter notebook execution for scientific tasks
+- **Client/Proxy** (`client_app.py`) - Generic interface to remote Aviary dataset servers
+
+---
+
+## Example Usage
+
+### GSM8K Environment
+
+Run the GSM8K Aviary resources server with a model config:
+
+```bash
+ng_run "+config_paths=[resources_servers/aviary/configs/gsm8k_aviary.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+```
+
+Collect rollouts:
+
+```bash
+ng_collect_rollouts \
+    +agent_name=gsm8k_aviary_agent \
+    +input_jsonl_fpath=resources_servers/aviary/data/example.jsonl \
+    +output_jsonl_fpath=resources_servers/aviary/data/example_rollouts.jsonl
+```
+
+---
+
+## Reference
+
+- [Aviary GitHub](https://github.com/Future-House/aviary) - Official Aviary repository
+- [Aviary Paper](https://arxiv.org/abs/2412.21154) - Training language agents on challenging scientific tasks
+- `resources_servers/aviary/` - NeMo Gym resources server implementations
+- `responses_api_agents/aviary_agent/` - NeMo Gym Aviary agent integration
diff --git a/docs/environment-tutorials/index.md b/docs/environment-tutorials/index.md
index 4f053f1f3..39dc51d86 100644
--- a/docs/environment-tutorials/index.md
+++ b/docs/environment-tutorials/index.md
@@ -125,6 +125,43 @@ Scale environments across machines with containers.
 
 ::::
 
+### Integrations
+
+::::{grid} 1 2 3 3
+:gutter: 2
+
+:::{grid-item-card} {octicon}`light-bulb;1.5em;sd-mr-1` Reasoning Gym
+:link: reasoning-gym
+:link-type: doc
+
+100+ procedurally generated reasoning tasks across multiple domains.
+
++++
+{bdg-secondary}`integration` {bdg-secondary}`15-20 min`
+:::
+
+:::{grid-item-card} {octicon}`beaker;1.5em;sd-mr-1` Aviary
+:link: aviary
+:link-type: doc
+
+Custom language agent environments for scientific and reasoning tasks.
+
++++
+{bdg-secondary}`integration` {bdg-secondary}`10-15 min`
+:::
+
+:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Verifiers
+:link: verifiers
+:link-type: doc
+
+600+ environments from Prime Intellect's Environments Hub.
+
++++
+{bdg-secondary}`integration` {bdg-secondary}`20 min`
+:::
+
+::::
+
 ---
 
 ## Learning Path
@@ -162,8 +199,9 @@ NeMo Gym includes working examples in `resources_servers/`:
 | `calendar/` | Multi-turn | State comparison |
 | `equivalence_llm_judge/` | Single-step | LLM judge with swap check |
 | `math_with_judge/` | Single-step | Library + judge fallback |
-| `aviary/` | Multi-step | Aviary environment integration |
+| `aviary/` | Multi-step | Aviary framework integration |
 | `workplace_assistant/` | Multi-step | Session state, tool routing |
+| `reasoning_gym/` | Single-step | Algorithmic verification with reasoning-gym library |
 
 :::{tip}
 Use `ng_init_resources_server +entrypoint=resources_servers/my_env` to scaffold a new environment from a template.
diff --git a/docs/environment-tutorials/reasoning-gym.md b/docs/environment-tutorials/reasoning-gym.md
new file mode 100644
index 000000000..9697c8a22
--- /dev/null
+++ b/docs/environment-tutorials/reasoning-gym.md
@@ -0,0 +1,109 @@
+(environment-reasoning-gym)=
+
+# Reasoning Gym
+
+Integration with [open-thought/reasoning-gym](https://github.com/open-thought/reasoning-gym), a library of procedural dataset generators and algorithmically verifiable reasoning environments.
+
+Reasoning Gym provides 100+ tasks over many domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and common games. Tasks are procedurally generated with adjustable complexity and algorithmically verified.
+
+---
+
+## Dataset Preparation
+
+The integration includes a helper script for creating datasets from reasoning gym tasks.
+
+**Single task:**
+```bash
+python resources_servers/reasoning_gym/scripts/create_dataset.py \
+    --task knights_knaves \
+    --size 500 \
+    --seed 42 \
+    --output resources_servers/reasoning_gym/data/train_knights_knaves.jsonl
+```
+
+**Multiple tasks (composite):**
+```bash
+python resources_servers/reasoning_gym/scripts/create_dataset.py \
+    --tasks knights_knaves,syllogisms,leg_counting \
+    --size 1000 \
+    --output resources_servers/reasoning_gym/data/train_composite.jsonl
+```
+
+**All tasks in a category:**
+```bash
+python resources_servers/reasoning_gym/scripts/create_dataset.py \
+    --category logic \
+    --size 1000 \
+    --output resources_servers/reasoning_gym/data/train_logic.jsonl
+```
+
+**All available tasks:**
+```bash
+python resources_servers/reasoning_gym/scripts/create_dataset.py \
+    --all-tasks \
+    --size 1000 \
+    --output resources_servers/reasoning_gym/data/train_all.jsonl
+```
+
+**With custom task configuration:**
+```bash
+python resources_servers/reasoning_gym/scripts/create_dataset.py \
+    --task knights_knaves \
+    --size 500 \
+    --config '{"n_people": 3, "depth_constraint": 3}' \
+    --output resources_servers/reasoning_gym/data/train_hard.jsonl
+```
+
+---
+
+## Rollout Collection
+
+### Start vLLM Server
+
+```bash
+pip install -U "vllm>=0.12.0"
+
+wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py
+
+vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
+    --max-num-seqs 8 \
+    --tensor-parallel-size 1 \
+    --max-model-len 262144 \
+    --port 10240 \
+    --trust-remote-code \
+    --tool-call-parser qwen3_coder \
+    --reasoning-parser-plugin nano_v3_reasoning_parser.py \
+    --reasoning-parser nano_v3
+```
+
+### Create env.yaml
+
+```yaml
+policy_base_url: http://localhost:10240/v1
+policy_api_key: EMPTY
+policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
+```
+
+### Launch NeMo Gym Servers
+
+```bash
+ng_run "+config_paths=[resources_servers/reasoning_gym/configs/reasoning_gym.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+```
+
+### Collect Rollouts
+
+```bash
+ng_collect_rollouts \
+    +agent_name=reasoning_gym_simple_agent \
+    +input_jsonl_fpath=resources_servers/reasoning_gym/data/example.jsonl \
+    +output_jsonl_fpath=results/reasoning_gym_rollouts.jsonl \
+    +limit=5
+```
+
+---
+
+## Reference
+
+- [Reasoning Gym GitHub](https://github.com/open-thought/reasoning-gym)
+- [Dataset Gallery](https://github.com/open-thought/reasoning-gym/blob/main/GALLERY.md) - Examples of all available tasks
+- `resources_servers/reasoning_gym/` - NeMo Gym integration implementation
diff --git a/docs/environment-tutorials/verifiers.md b/docs/environment-tutorials/verifiers.md
new file mode 100644
index 000000000..be61d55ac
--- /dev/null
+++ b/docs/environment-tutorials/verifiers.md
@@ -0,0 +1,111 @@
+(environment-verifiers)=
+
+# Verifiers
+
+Integration with [PrimeIntellect-ai/verifiers](https://github.com/PrimeIntellect-ai/verifiers), enabling environments from Prime Intellect's Environments Hub to run in NeMo Gym.
+
+Verifiers provides 600+ environments across reasoning, math, and agent tasks. Environments built for Environments Hub can be deployed through NeMo Gym for training with NeMo RL. Unlike typical NeMo Gym environments, verifiers environments handle state management, verification, and tool execution internally without requiring a separate resource server.
+
+:::{note}
+**Multi-turn environments:** Currently require disabling `enforce_monotonicity` in training configuration until token propagation is fully patched.
+:::
+
+---
+
+## Install Dependencies
+
+Install verifiers and prime tools:
+
+```bash
+# From the Gym repository root
+uv venv
+source .venv/bin/activate
+uv sync
+uv add verifiers
+uv tool install prime
+```
+
+Install an environment:
+
+```bash
+prime env install primeintellect/acereason-math
+```
+
+---
+
+## Create Dataset
+
+Generate example tasks:
+
+```bash
+python3 responses_api_agents/verifiers_agent/scripts/create_dataset.py \
+    --env-id primeintellect/acereason-math \
+    --size 5 \
+    --output responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl
+```
+
+---
+
+## Update Agent Requirements
+
+Add to `responses_api_agents/verifiers_agent/requirements.txt`:
+
+```txt
+-e nemo-gym[dev] @ ../../
+verifiers>=0.1.9
+--extra-index-url https://hub.primeintellect.ai/primeintellect/simple/
+acereason-math
+```
+
+---
+
+## Configure Model Server
+
+Create `env.yaml` at repository root:
+
+```yaml
+policy_base_url: "http://localhost:8000/v1"
+policy_api_key: "dummy"
+policy_model_name: "Qwen/Qwen3-4B-Instruct-2507"
+```
+
+---
+
+## Start Model Server
+
+```bash
+uv add vllm
+vllm serve Qwen/Qwen3-4B-Instruct-2507 \
+    --max-model-len 32768 \
+    --reasoning-parser qwen3 \
+    --enable-auto-tool-choice \
+    --tool-call-parser hermes
+```
+
+---
+
+## Launch NeMo Gym Servers
+
+```bash
+ng_run "+config_paths=[responses_api_agents/verifiers_agent/configs/verifiers_acereason-math.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+```
+
+---
+
+## Collect Rollouts
+
+```bash
+ng_collect_rollouts \
+    +agent_name=verifiers_agent \
+    +input_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl \
+    +output_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example-rollouts.jsonl \
+    +limit=5
+```
+
+---
+
+## Reference
+
+- [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments) - Browse 600+ available environments
+- [Verifiers GitHub](https://github.com/PrimeIntellect-ai/verifiers) - Verifiers library
+- `responses_api_agents/verifiers_agent/` - NeMo Gym agent integration
diff --git a/docs/index.md b/docs/index.md
index a8ac3bab5..275d704e3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -407,6 +407,9 @@ Rollout Collection
 🟡 Multi-Node Docker
 🟡 LLM as Judge
 🟡 RLHF Reward Models
+Reasoning Gym
+Aviary
+Verifiers
 ```
 
 ```{toctree}