Commit c34cfe1

cwing-nvidia, jkyi-nvidia, and cmunley1 authored
Clarify training environment framing and align docs messaging (#438)
## Summary

Revises About page and aligns messaging across docs homepage, README, and Core Components. Addresses #384 - Clarifies "Agents" are server components, not AI agents being trained.

## Changes

- **About page**: Added Motivation + NeMo Gym sections; reframed components as "server components that make up a training environment"
- **Docs homepage + README**: Aligned intro messaging
- **Core Components**: Renamed from `core-abstractions.md`; updated Agents/Resources definitions based on code; added Tasks to examples; added Azure OpenAI model

## Key Alignment

All pages now consistently frame: training environment = Agents + Models + Resources (server components)

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Junkeun Yi <jkyi@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Co-authored-by: jkyi-nvidia <jkyi@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
1 parent 52b1450 commit c34cfe1

6 files changed, with 60 additions and 38 deletions
README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# NeMo Gym
22

3-
NeMo Gym is a framework for building reinforcement learning environments to train large language models.
3+
NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
44

55
NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models.
66

Lines changed: 29 additions & 22 deletions
@@ -1,8 +1,8 @@
1-
(core-abstractions)=
1+
(core-components)=
22

3-
# Core Abstractions
3+
# Core Components
44

5-
Before diving into code, let's understand the three core abstractions in NeMo Gym.
5+
Before diving into code, let's understand the three server components that make up a training environment in NeMo Gym.
66

77
> If you are new to reinforcement learning for LLMs, we recommend you refer to **[Key Terminology](./key-terminology)** first.
88
@@ -20,7 +20,8 @@ Responses API Model servers are stateless model endpoints that perform single-ca
2020

2121
**Available Implementations:**
2222

23-
- `openai_model`: Direct integration with OpenAI's Responses API
23+
- `openai_model`: Integration with OpenAI's Responses API
24+
- `azure_openai_model`: Integration with Azure OpenAI API
2425
- `vllm_model`: Middleware converting local models (using vLLM) to Responses API format
2526

2627
**Configuration:** Models are configured with API endpoints and credentials using YAML files in `responses_api_models/*/configs/`
@@ -29,45 +30,53 @@ Responses API Model servers are stateless model endpoints that perform single-ca
2930

3031
:::{tab-item} Resources
3132

32-
Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
33+
Resource servers host the components and logic of environments, including multi-step state persistence, tool implementations, and reward functions. They return observations (such as tool results or updated environment state) and rewards in response to actions taken by the policy model. Actions can be moves in a game, tool calls, or anything else an agent can do. NeMo Gym includes a variety of NVIDIA and community-contributed resource servers that you can use during training, along with tutorials on how to add your own resource server.
3334

34-
**What Resources Provide**
35+
**Examples of Resources**
3536

36-
Each resource server combines both tools and {term}`verification <Verifier>` logic:
37+
A resource server usually provides tasks, possible actions, and {term}`verification <Verifier>` logic:
3738

38-
- **Tools**: Functions agents can call during task execution
39+
- **Tasks**: Problems or prompts that agents solve during rollouts
40+
- **Actions**: What agents can do during rollouts, including tool calls
3941
- **Verification logic**: Scoring logic that evaluates performance (returns {term}`reward signals <Reward / Reward Signal>` for training)
4042

4143
**Example Resource Servers**
4244

43-
Each example shows what **tools** the agent can use and what **verification logic** measures success:
45+
Each example shows what **task** the agent solves, what **actions** are available, and what **verification logic** measures success:
4446

4547
- **[`google_search`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/google_search)**: Web search with verification
46-
- **Tools**: `search()` queries Google API; `browse()` extracts webpage content
48+
- **Task**: Answer knowledge questions using web search
49+
- **Actions**: `search()` queries Google API; `browse()` extracts webpage content
4750
- **Verification logic**: Checks if final answer matches expected result for MCQA questions
4851

4952
- **[`math_with_code`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/math_with_code)**: Mathematical reasoning with code execution
50-
- **Tool**: `execute_python()` runs Python code with numpy, scipy, pandas
53+
- **Task**: Solve math problems using Python
54+
- **Actions**: `execute_python()` runs Python code with numpy, scipy, pandas
5155
- **Verification logic**: Extracts boxed answer and checks mathematical correctness
5256

5357
- **[`code_gen`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/code_gen)**: Competitive programming problems
54-
- **Tools**: None (agent generates code directly)
58+
- **Task**: Implement solutions to coding problems
59+
- **Actions**: None (agent generates code directly)
5560
- **Verification logic**: Executes generated code against unit test inputs/outputs
5661

5762
- **[`math_with_judge`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/math_with_judge)**: Mathematical problem solving
58-
- **Tools**: None (or can be combined with `math_with_code`)
63+
- **Task**: Solve math problems
64+
- **Actions**: None (or can be combined with `math_with_code`)
5965
- **Verification logic**: Uses math library + LLM judge to verify answer equivalence
6066

6167
- **[`mcqa`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/mcqa)**: Multiple choice question answering
62-
- **Tools**: None (knowledge-based reasoning)
68+
- **Task**: Answer multiple choice questions
69+
- **Actions**: None (knowledge-based reasoning)
6370
- **Verification logic**: Checks if selected option matches ground truth
6471

6572
- **[`instruction_following`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/instruction_following)**: Instruction compliance evaluation
66-
- **Tools**: None (evaluates response format/content)
73+
- **Task**: Follow specified instructions
74+
- **Actions**: None (evaluates response format/content)
6775
- **Verification logic**: Checks if response follows all specified instructions
6876

6977
- **[`simple_weather`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/example_simple_weather)**: Mock weather API
70-
- **Tool**: `get_weather()` returns mock weather data
78+
- **Task**: Report weather information
79+
- **Actions**: `get_weather()` returns mock weather data
7180
- **Verification logic**: Checks if weather tool was called correctly
7281

7382
**Configuration**: Refer to resource-specific config files in `resources_servers/*/configs/`
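
The task / actions / verification split described in this hunk can be sketched in a few lines of Python. This is a toy illustration modeled loosely on the `simple_weather` entry above, not the NeMo Gym API: the `ToolCall` and `SimpleWeatherResource` classes, their fields, and the mock return values are all hypothetical names chosen for the example, and a real resource server runs as an HTTP service rather than an in-process object.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One action taken by the policy model (hypothetical shape)."""
    name: str
    arguments: dict


@dataclass
class SimpleWeatherResource:
    """Toy stand-in for a resource server: task, action, and verification."""
    # Task: the prompt the agent must solve, plus the city we expect it to query.
    prompt: str = "What is the weather in Paris?"
    expected_city: str = "Paris"
    calls: list = field(default_factory=list)

    def get_weather(self, city: str) -> dict:
        """Action: record the call and return mock weather data as the observation."""
        self.calls.append(ToolCall("get_weather", {"city": city}))
        return {"city": city, "temperature_c": 20, "conditions": "sunny"}

    def verify(self) -> float:
        """Verification: reward 1.0 if the tool was called with the expected city."""
        for call in self.calls:
            if call.name == "get_weather" and call.arguments.get("city") == self.expected_city:
                return 1.0
        return 0.0


resource = SimpleWeatherResource()
observation = resource.get_weather("Paris")  # the agent takes an action
reward = resource.verify()                   # the resource scores the rollout
print(observation["conditions"], reward)
```

Keeping the tool and the scoring logic in the same object mirrors the point made above: the component that defines what an agent can do is also the one that knows how to judge whether it was done correctly.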
@@ -76,14 +85,12 @@ Each example shows what **tools** the agent can use and what **verification logi
7685

7786
:::{tab-item} Agents
7887

79-
Responses API Agent servers {term}`orchestrate <Orchestration>` the interaction between models and resources.
88+
Responses API Agent servers {term}`orchestrate <Orchestration>` the rollout lifecycle—the full cycle of task execution and verification.
8089

81-
- Route requests to the right model
82-
- Provide tools to the model
83-
- Handle multi-turn conversations
84-
- Format responses consistently
90+
- Implement multi-step and multi-turn agentic systems
91+
- Orchestrate the model server and resources server(s) to collect complete trajectories
8592

86-
Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
93+
NeMo Gym provides several agent patterns covering multi-step, multi-turn, and user modeling scenarios.
8794

8895
**Examples:**
8996

docs/about/concepts/index.md

Lines changed: 4 additions & 4 deletions
@@ -20,10 +20,10 @@ Each explainer below covers one foundational idea and links to deeper material.
2020
::::{grid} 1 1 1 2
2121
:gutter: 1 1 1 2
2222

23-
:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Core Abstractions
24-
:link: core-abstractions
23+
:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Core Components
24+
:link: core-components
2525
:link-type: ref
26-
Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each abstraction exposes.
26+
Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each component exposes.
2727
:::
2828

2929
:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Configuration System
@@ -52,7 +52,7 @@ Essential vocabulary for agent training, RL workflows, and NeMo Gym. This glossa
5252
:hidden:
5353
:maxdepth: 1
5454
55-
Core Abstractions <core-abstractions>
55+
Core Components <core-components>
5656
Configuration System <configuration-system>
5757
Task Verification <task-verification>
5858
Key Terminology <key-terminology>

docs/about/concepts/task-verification.md

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ reward = await expensive_api_call(predicted, expected)
177177
## What You've Learned
178178

179179
This verification system is what makes NeMo Gym powerful for model training:
180-
- **Resource servers** provide both tools AND scoring systems
180+
- **Resource servers** provide verification logic
181181
- **Verification patterns** vary by domain but follow common principles
182182
- **Reward signals** from verification drive model improvement through RL
183183
- **Good verification** is reliable, meaningful, and scalable

docs/about/index.md

Lines changed: 23 additions & 8 deletions
@@ -5,15 +5,30 @@ orphan: true
55
(about-overview)=
66
# About NVIDIA NeMo Gym
77

8-
[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is an open-source framework that generates training data for reinforcement learning by capturing how AI agents interact with tools and environments.
8+
## Motivation
9+
10+
The agentic AI era has increased both the demand for RL training and the complexity of training environments:
11+
12+
- More complex target model capabilities
13+
- More complex training patterns (e.g., multi-turn tool calling)
14+
- More complex orchestration between models and tools
15+
- More complex integrations with external systems
16+
- More complex integrations between environments and training frameworks
17+
- Scaling to high-throughput, concurrent rollout collection
18+
19+
Embedding custom training environments directly within training frameworks is complex and often conflicts with the training loop design.
20+
21+
## NeMo Gym
22+
23+
[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) decouples environment development from training, letting you build and iterate on environments independently. It provides the infrastructure to develop agentic training environments and scale rollout collection, enabling seamless integration with your preferred training framework.
924

1025
## Core Components
1126

12-
Three components work together to generate and evaluate agent interactions:
27+
A training environment consists of three server components:
1328

14-
- **Agents**: Orchestrate multi-turn interactions between models and resources. Handle conversation flow, tool routing, and response formatting.
15-
- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM). Handle single-turn text generation and tool-calling decisions.
16-
- **Resources**: Provide tools (functions agents call) + verification logic (logic to score performance). Each resource server combines both:
17-
- **Example - Web Search**: Tools = `search()` and `browse()`; Verification logic = checks if answer matches expected result
18-
- **Example - Math with Code**: Tool = `execute_python()`; Verification logic = checks if final answer is mathematically correct
19-
- **Example - Code Generation**: Tools = none (provides problem statement); Verification logic = runs unit tests against generated code
29+
- **Agents**: Orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification.
30+
- **Models**: Stateless text generation using LLM inference endpoints (OpenAI-compatible or vLLM).
31+
- **Resources**: Define tasks, tool implementations, and verification logic. Provide what agents need to run and score rollouts.
32+
- **Example - Web Search**: Task = answer knowledge questions; Tools = `search()` and `browse()`; Verification = checks if answer matches expected result
33+
- **Example - Math with Code**: Task = solve math problems; Tool = `execute_python()`; Verification = checks if final answer is mathematically correct
34+
- **Example - Code Generation**: Task = implement a solution to a coding problem; Tools = none; Verification = runs unit tests against generated code

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
22

33
# NeMo Gym Documentation
44

5-
NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
5+
[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
66

7-
NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
7+
A training environment consists of three server components: **Agents** orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification. **Models** provide stateless text generation using LLM inference endpoints. **Resources** define tasks, tool implementations, and verification logic.
88

99
## Quickstart
1010
