Clarify training environment framing and align docs messaging (#438)

cwing-nvidia · jkyi-nvidia · cmunley1 · web-flow · commit c34cfe179f46 · 2025-12-04T21:10:01.000-08:00
## Summary Revises About page and aligns messaging across docs homepage, README, and Core Components. Addresses #384 - Clarifies "Agents" are server components, not AI agents being trained. ## Changes - **About page**: Added Motivation + NeMo Gym sections; reframed components as "server components that make up a training environment" - **Docs homepage + README**: Aligned intro messaging - **Core Components**: Renamed from `core-abstractions.md`; updated Agents/Resources definitions based on code; added Tasks to examples; added Azure OpenAI model ## Key Alignment All pages now consistently frame: training environment = Agents + Models + Resources (server components) --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Junkeun Yi <jkyi@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Co-authored-by: jkyi-nvidia <jkyi@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com>
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # NeMo Gym
 
-NeMo Gym is a framework for building reinforcement learning environments to train large language models. 
+NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework. 
 
 NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models.
 
diff --git a/docs/about/concepts/core-components.md b/docs/about/concepts/core-components.md
@@ -1,8 +1,8 @@
-(core-abstractions)=
+(core-components)=
 
-# Core Abstractions
+# Core Components
 
-Before diving into code, let's understand the three core abstractions in NeMo Gym.
+Before diving into code, let's understand the three server components that make up a training environment in NeMo Gym.
 
 > If you are new to reinforcement learning for LLMs, we recommend you refer to **[Key Terminology](./key-terminology)** first.
 
@@ -20,7 +20,8 @@ Responses API Model servers are stateless model endpoints that perform single-ca
 
 **Available Implementations:**
 
-- `openai_model`: Direct integration with OpenAI's Responses API  
+- `openai_model`: Integration with OpenAI's Responses API  
+- `azure_openai_model`: Integration with Azure OpenAI API
 - `vllm_model`: Middleware converting local models (using vLLM) to Responses API format
 
 **Configuration:** Models are configured with API endpoints and credentials using YAML files in `responses_api_models/*/configs/`
@@ -29,45 +30,53 @@ Responses API Model servers are stateless model endpoints that perform single-ca
 
 :::{tab-item} Resources
 
-Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
+Resource servers host the components and logic of environments including multi-step state persistence, tool and reward function implementations. Resource servers are responsible for returning observations, such as tool results or updated environment state, and rewards as a result of actions taken by the policy model. Actions can be moves in a game, tool calls, or anything an agent can do. NeMo Gym contains a variety of NVIDIA and community contributed resource servers that you can use during training. We also have tutorials on how to add your own resource server.
 
-**What Resources Provide**
+**Examples of Resources**
 
-Each resource server combines both tools and {term}`verification <Verifier>` logic:
+A resource server usually provides tasks, possible actions, and {term}`verification <Verifier>` logic:
 
-- **Tools**: Functions agents can call during task execution
+- **Tasks**: Problems or prompts that agents solve during rollouts
+- **Actions**: Actions agents can take during rollouts, including tool calling
 - **Verification logic**: Scoring logic that evaluates performance (returns {term}`reward signals <Reward / Reward Signal>` for training)
 
 **Example Resource Servers**
 
-Each example shows what **tools** the agent can use and what **verification logic** measures success:
+Each example shows what **task** the agent solves, what **actions** are available, and what **verification logic** measures success:
 
 - **[`google_search`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/google_search)**: Web search with verification
-  - **Tools**: `search()` queries Google API; `browse()` extracts webpage content
+  - **Task**: Answer knowledge questions using web search
+  - **Actions**: `search()` queries Google API; `browse()` extracts webpage content
   - **Verification logic**: Checks if final answer matches expected result for MCQA questions
 
 - **[`math_with_code`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/math_with_code)**: Mathematical reasoning with code execution
-  - **Tool**: `execute_python()` runs Python code with numpy, scipy, pandas
+  - **Task**: Solve math problems using Python
+  - **Actions**: `execute_python()` runs Python code with numpy, scipy, pandas
   - **Verification logic**: Extracts boxed answer and checks mathematical correctness
 
 - **[`code_gen`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/code_gen)**: Competitive programming problems
-  - **Tools**: None (agent generates code directly)
+  - **Task**: Implement solutions to coding problems
+  - **Actions**: None (agent generates code directly)
   - **Verification logic**: Executes generated code against unit test inputs/outputs
 
 - **[`math_with_judge`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/math_with_judge)**: Mathematical problem solving
-  - **Tools**: None (or can be combined with `math_with_code`)
+  - **Task**: Solve math problems
+  - **Actions**: None (or can be combined with `math_with_code`)
   - **Verification logic**: Uses math library + LLM judge to verify answer equivalence
 
 - **[`mcqa`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/mcqa)**: Multiple choice question answering
-  - **Tools**: None (knowledge-based reasoning)
+  - **Task**: Answer multiple choice questions
+  - **Actions**: None (knowledge-based reasoning)
   - **Verification logic**: Checks if selected option matches ground truth
 
 - **[`instruction_following`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/instruction_following)**: Instruction compliance evaluation
-  - **Tools**: None (evaluates response format/content)
+  - **Task**: Follow specified instructions
+  - **Actions**: None (evaluates response format/content)
   - **Verification logic**: Checks if response follows all specified instructions
 
 - **[`simple_weather`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/example_simple_weather)**: Mock weather API
-  - **Tool**: `get_weather()` returns mock weather data
+  - **Task**: Report weather information
+  - **Actions**: `get_weather()` returns mock weather data
   - **Verification logic**: Checks if weather tool was called correctly
 
 **Configuration**: Refer to resource-specific config files in `resources_servers/*/configs/`
@@ -76,14 +85,12 @@ Each example shows what **tools** the agent can use and what **verification logi
 
 :::{tab-item} Agents
 
-Responses API Agent servers {term}`orchestrate <Orchestration>` the interaction between models and resources.
+Responses API Agent servers {term}`orchestrate <Orchestration>` the rollout lifecycle—the full cycle of task execution and verification.
 
-- Route requests to the right model
-- Provide tools to the model
-- Handle multi-turn conversations
-- Format responses consistently
+- Implement multi-step and multi-turn agentic systems
+- Orchestrate the model server and resources server(s) to collect complete trajectories
 
-Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
+NeMo Gym provides several agent patterns covering multi-step, multi-turn, and user modeling scenarios.
 
 **Examples:**
 
diff --git a/docs/about/concepts/index.md b/docs/about/concepts/index.md
@@ -20,10 +20,10 @@ Each explainer below covers one foundational idea and links to deeper material.
 ::::{grid} 1 1 1 2
 :gutter: 1 1 1 2
 
-:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Core Abstractions
-:link: core-abstractions
+:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Core Components
+:link: core-components
 :link-type: ref
-Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each abstraction exposes.
+Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each component exposes.
 :::
 
 :::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Configuration System
@@ -52,7 +52,7 @@ Essential vocabulary for agent training, RL workflows, and NeMo Gym. This glossa
 :hidden:
 :maxdepth: 1
 
-Core Abstractions <core-abstractions>
+Core Components <core-components>
 Configuration System <configuration-system>
 Task Verification <task-verification>
 Key Terminology <key-terminology>
diff --git a/docs/about/concepts/task-verification.md b/docs/about/concepts/task-verification.md
@@ -177,7 +177,7 @@ reward = await expensive_api_call(predicted, expected)
 ## What You've Learned
 
 This verification system is what makes NeMo Gym powerful for model training:
-- **Resource servers** provide both tools AND scoring systems
+- **Resource servers** provide verification logic
 - **Verification patterns** vary by domain but follow common principles
 - **Reward signals** from verification drive model improvement through RL
 - **Good verification** is reliable, meaningful, and scalable
diff --git a/docs/about/index.md b/docs/about/index.md
@@ -5,15 +5,30 @@ orphan: true
 (about-overview)=
 # About NVIDIA NeMo Gym
 
-[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is an open-source framework that generates training data for reinforcement learning by capturing how AI agents interact with tools and environments.
+## Motivation
+
+The agentic AI era has increased both the demand for RL training and the complexity of training environments:
+
+- More complex target model capabilities
+- More complex training patterns (e.g., multi-turn tool calling)
+- More complex orchestration between models and tools
+- More complex integrations with external systems
+- More complex integrations between environments and training frameworks
+- Scaling to high-throughput, concurrent rollout collection
+
+Embedding custom training environments directly within training frameworks is complex and often conflicts with the training loop design.
+
+## NeMo Gym
+
+[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) decouples environment development from training, letting you build and iterate on environments independently. It provides the infrastructure to develop agentic training environments and scale rollout collection, enabling seamless integration with your preferred training framework.
 
 ## Core Components
 
-Three components work together to generate and evaluate agent interactions:
+A training environment consists of three server components:
 
-- **Agents**: Orchestrate multi-turn interactions between models and resources. Handle conversation flow, tool routing, and response formatting.
-- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM). Handle single-turn text generation and tool-calling decisions.
-- **Resources**: Provide tools (functions agents call) + verification logic (logic to score performance). Each resource server combines both:
-  - **Example - Web Search**: Tools = `search()` and `browse()`; Verification logic = checks if answer matches expected result
-  - **Example - Math with Code**: Tool = `execute_python()`; Verification logic = checks if final answer is mathematically correct
-  - **Example - Code Generation**: Tools = none (provides problem statement); Verification logic = runs unit tests against generated code
+- **Agents**: Orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification.
+- **Models**: Stateless text generation using LLM inference endpoints (OpenAI-compatible or vLLM).
+- **Resources**: Define tasks, tool implementations, and verification logic. Provide what agents need to run and score rollouts.
+  - **Example - Web Search**: Task = answer knowledge questions; Tools = `search()` and `browse()`; Verification = checks if answer matches expected result
+  - **Example - Math with Code**: Task = solve math problems; Tool = `execute_python()`; Verification = checks if final answer is mathematically correct
+  - **Example - Code Generation**: Task = implement solution to coding problem; Tools = none; Verification = runs unit tests against generated code
diff --git a/docs/index.md b/docs/index.md
@@ -2,9 +2,9 @@
 
 # NeMo Gym Documentation
 
-NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
+[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
 
-NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
+A training environment consists of three server components: **Agents** orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification. **Models** provide stateless text generation using LLM inference endpoints. **Resources** define tasks, tool implementations, and verification logic.
 
 ## Quickstart