Clarify training environment framing and align docs messaging (#438)
## Summary
Revises About page and aligns messaging across docs homepage, README,
and Core Components.
Addresses #384 - Clarifies "Agents" are server components, not AI agents
being trained.
## Changes
- **About page**: Added Motivation + NeMo Gym sections; reframed
components as "server components that make up a training environment"
- **Docs homepage + README**: Aligned intro messaging
- **Core Components**: Renamed from `core-abstractions.md`; updated
Agents/Resources definitions based on code; added Tasks to examples;
added Azure OpenAI model
## Key Alignment
All pages now consistently frame: training environment = Agents + Models
+ Resources (server components)
---------
Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Junkeun Yi <jkyi@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Co-authored-by: jkyi-nvidia <jkyi@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
README.md: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # NeMo Gym

-NeMo Gym is a framework for building reinforcement learning environments to train large language models.
+NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.

 NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models.
docs/about/concepts/core-components.md: 29 additions & 22 deletions
@@ -1,8 +1,8 @@
-(core-abstractions)=
+(core-components)=

-# Core Abstractions
+# Core Components

-Before diving into code, let's understand the three core abstractions in NeMo Gym.
+Before diving into code, let's understand the three server components that make up a training environment in NeMo Gym.

 > If you are new to reinforcement learning for LLMs, we recommend you refer to **[Key Terminology](./key-terminology)** first.
@@ -20,7 +20,8 @@ Responses API Model servers are stateless model endpoints that perform single-ca

 **Available Implementations:**

-- `openai_model`: Direct integration with OpenAI's Responses API
+- `openai_model`: Integration with OpenAI's Responses API
+- `azure_openai_model`: Integration with Azure OpenAI API
 - `vllm_model`: Middleware converting local models (using vLLM) to Responses API format

 **Configuration:** Models are configured with API endpoints and credentials using YAML files in `responses_api_models/*/configs/`
@@ -29,45 +30,53 @@ Responses API Model servers are stateless model endpoints that perform single-ca

 :::{tab-item} Resources

-Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
+Resource servers host the components and logic of environments, including multi-step state persistence and tool and reward function implementations. Resource servers are responsible for returning observations, such as tool results or updated environment state, and rewards as a result of actions taken by the policy model. Actions can be moves in a game, tool calls, or anything an agent can do. NeMo Gym contains a variety of NVIDIA and community-contributed resource servers that you can use during training. We also have tutorials on how to add your own resource server.

-**What Resources Provide**
+**Examples of Resources**

-Each resource server combines both tools and {term}`verification <Verifier>` logic:
+A resource server usually provides tasks, possible actions, and {term}`verification <Verifier>` logic:

-- **Tools**: Functions agents can call during task execution
+- **Tasks**: Problems or prompts that agents solve during rollouts
+- **Actions**: Actions agents can take during rollouts, including tool calling
 - **Verification logic**: Scoring logic that evaluates performance (returns {term}`reward signals <Reward / Reward Signal>` for training)

 **Example Resource Servers**

-Each example shows what **tools** the agent can use and what **verification logic** measures success:
+Each example shows what **task** the agent solves, what **actions** are available, and what **verification logic** measures success:

 - **[`google_search`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/google_search)**: Web search with verification
-  - **Tools**: `search()` queries Google API; `browse()` extracts webpage content
+  - **Task**: Answer knowledge questions using web search
+  - **Actions**: `search()` queries Google API; `browse()` extracts webpage content
   - **Verification logic**: Checks if final answer matches expected result for MCQA questions

 - **[`math_with_code`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/math_with_code)**: Mathematical reasoning with code execution
-  - **Tool**: `execute_python()` runs Python code with numpy, scipy, pandas
+  - **Task**: Solve math problems using Python
+  - **Actions**: `execute_python()` runs Python code with numpy, scipy, pandas
   - **Verification logic**: Extracts boxed answer and checks mathematical correctness
   - **Verification logic**: Checks if response follows all specified instructions

 - **[`simple_weather`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/example_simple_weather)**: Mock weather API
-  - **Tool**: `get_weather()` returns mock weather data
+  - **Task**: Report weather information
+  - **Actions**: `get_weather()` returns mock weather data
   - **Verification logic**: Checks if weather tool was called correctly

 **Configuration**: Refer to resource-specific config files in `resources_servers/*/configs/`
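
As a rough illustration of the task / actions / verification split described in this hunk, the sketch below models a `math_with_code`-style resource in plain Python. It is not NeMo Gym's API: `ToyMathResource`, the signature of `execute_python()`, and `verify()` are hypothetical names chosen for the example, and an actual resources server exposes this logic over HTTP rather than as in-process method calls.

```python
import contextlib
import io
import re
from dataclasses import dataclass


@dataclass
class ToyMathResource:
    """Hypothetical stand-in for a resource: one task, one action, one verifier."""

    prompt: str           # Task: the problem shown to the policy model
    expected_answer: str  # Ground truth, used only by verification

    def execute_python(self, code: str) -> str:
        """Action: run a snippet and return what it printed (the observation)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # No sandboxing here; a real server must isolate this.
        return buffer.getvalue().strip()

    def verify(self, response_text: str) -> float:
        """Verification: reward 1.0 if the last boxed answer matches exactly."""
        boxed = re.findall(r"\\boxed\{([^{}]*)\}", response_text)
        if not boxed:
            return 0.0
        return 1.0 if boxed[-1].strip() == self.expected_answer else 0.0


resource = ToyMathResource(prompt="What is 12 * 12?", expected_answer="144")
observation = resource.execute_python("print(12 * 12)")
reward = resource.verify(f"The product is \\boxed{{{observation}}}.")
```

Exact string matching is the simplest possible check; the `math_with_code` description above says the verifier "checks mathematical correctness", which presumably tolerates equivalent forms that this sketch would reject.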
@@ -76,14 +85,12 @@ Each example shows what **tools** the agent can use and what **verification logi

 :::{tab-item} Agents

-Responses API Agent servers {term}`orchestrate <Orchestration>` the interaction between models and resources.
+Responses API Agent servers {term}`orchestrate <Orchestration>` the rollout lifecycle: the full cycle of task execution and verification.

-- Route requests to the right model
-- Provide tools to the model
-- Handle multi-turn conversations
-- Format responses consistently
+- Implement multi-step and multi-turn agentic systems
+- Orchestrate the model server and resources server(s) to collect complete trajectories

-Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
+NeMo Gym provides several agent patterns covering multi-step, multi-turn, and user modeling scenarios.
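
To make the orchestration role concrete, here is a minimal sketch of a rollout loop, assuming a scripted model and an in-process resource. Every name in it (`scripted_model`, `WeatherResource`, `run_rollout`, the step dictionaries) is hypothetical; in NeMo Gym the agent, model, and resources run as separate servers, and this sketch does not reproduce their interfaces.

```python
from typing import Callable


class WeatherResource:
    """Hypothetical resource: one tool plus verification of how it was used."""

    def get_weather(self, city: str) -> str:
        return f"Sunny in {city}"  # Mock data, in the spirit of simple_weather

    def verify(self, trajectory: list[dict]) -> float:
        # Reward 1.0 only if the weather tool was called at least once.
        called = any(step["type"] == "tool_call" for step in trajectory)
        return 1.0 if called else 0.0


def scripted_model(history: list[dict]) -> dict:
    """Stand-in for a stateless model call: picks the next step from history."""
    if not any(step["type"] == "tool_result" for step in history):
        return {"type": "tool_call", "name": "get_weather", "args": {"city": "Paris"}}
    return {"type": "message", "text": f"Forecast: {history[-1]['text']}"}


def run_rollout(model: Callable[[list[dict]], dict], resource: WeatherResource,
                task: str, max_steps: int = 4) -> tuple[list[dict], float]:
    """Agent role: alternate model calls and tool execution, then verify."""
    trajectory: list[dict] = [{"type": "task", "text": task}]
    for _ in range(max_steps):
        step = model(trajectory)
        trajectory.append(step)
        if step["type"] == "message":  # A final answer ends the rollout
            break
        tool = getattr(resource, step["name"])
        trajectory.append({"type": "tool_result", "text": tool(**step["args"])})
    return trajectory, resource.verify(trajectory)


trajectory, reward = run_rollout(scripted_model, WeatherResource(), "Weather in Paris?")
```

The model function receives the full history on every call and keeps no state of its own, which mirrors the "stateless" framing of Model servers; the trajectory and the final reward are what a training framework would consume.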
-Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each abstraction exposes.
+Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each component exposes.
 :::

 :::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Configuration System
@@ -52,7 +52,7 @@ Essential vocabulary for agent training, RL workflows, and NeMo Gym. This glossa
docs/about/index.md: 23 additions & 8 deletions
@@ -5,15 +5,30 @@ orphan: true
 (about-overview)=
 # About NVIDIA NeMo Gym

-[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is an open-source framework that generates training data for reinforcement learning by capturing how AI agents interact with tools and environments.
+## Motivation
+
+The agentic AI era has increased both the demand for RL training and the complexity of training environments:
+
+- More complex target model capabilities
+- More complex training patterns (e.g., multi-turn tool calling)
+- More complex orchestration between models and tools
+- More complex integrations with external systems
+- More complex integrations between environments and training frameworks
+- Scaling to high-throughput, concurrent rollout collection
+
+Embedding custom training environments directly within training frameworks is complex and often conflicts with the training loop design.
+
+## NeMo Gym
+
+[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) decouples environment development from training, letting you build and iterate on environments independently. It provides the infrastructure to develop agentic training environments and scale rollout collection, enabling seamless integration with your preferred training framework.

 ## Core Components

-Three components work together to generate and evaluate agent interactions:
+A training environment consists of three server components:

-- **Agents**: Orchestrate multi-turn interactions between models and resources. Handle conversation flow, tool routing, and response formatting.
-- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM). Handle single-turn text generation and tool-calling decisions.
-- **Resources**: Provide tools (functions agents call) + verification logic (logic to score performance). Each resource server combines both:
-  - **Example - Web Search**: Tools = `search()` and `browse()`; Verification logic = checks if answer matches expected result
-  - **Example - Math with Code**: Tool = `execute_python()`; Verification logic = checks if final answer is mathematically correct
-  - **Example - Code Generation**: Tools = none (provides problem statement); Verification logic = runs unit tests against generated code
+- **Agents**: Orchestrate the rollout lifecycle: calling models, executing tool calls via resources, and coordinating verification.
+- **Models**: Stateless text generation using LLM inference endpoints (OpenAI-compatible or vLLM).
+- **Resources**: Define tasks, tool implementations, and verification logic. Provide what agents need to run and score rollouts.
+  - **Example - Web Search**: Task = answer knowledge questions; Tools = `search()` and `browse()`; Verification = checks if answer matches expected result
+  - **Example - Math with Code**: Task = solve math problems; Tool = `execute_python()`; Verification = checks if final answer is mathematically correct
+  - **Example - Code Generation**: Task = implement solution to coding problem; Tools = none; Verification = runs unit tests against generated code
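
The Code Generation example has no tools, so all of its environment logic lives in verification. A minimal sketch of that idea follows, assuming the generated code defines a function named `solve` and that the reward is the fraction of passing test cases. Both are assumptions made for illustration; the docs only say verification "runs unit tests against generated code", and `verify_generated_code` is a hypothetical helper, not a NeMo Gym function.

```python
def verify_generated_code(source: str, test_cases: list[tuple[tuple, object]],
                          function_name: str = "solve") -> float:
    """Return the fraction of test cases the generated function passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # Unsafe outside a sandbox; illustration only.
        candidate = namespace[function_name]
    except Exception:
        return 0.0  # Code that fails to load earns no reward
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # A crashing test case counts as a failure
    return passed / len(test_cases)


generated = "def solve(a, b):\n    return a + b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
reward = verify_generated_code(generated, tests)
```

A binary pass/fail reward is an equally plausible design; the fractional score is used here only because it makes partial progress visible.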
docs/index.md: 2 additions & 2 deletions
@@ -2,9 +2,9 @@

 # NeMo Gym Documentation

-NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
+[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.

-NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
+A training environment consists of three server components: **Agents** orchestrate the rollout lifecycle: calling models, executing tool calls via resources, and coordinating verification. **Models** provide stateless text generation using LLM inference endpoints. **Resources** define tasks, tool implementations, and verification logic.