
Commit b984b2d

docs: various improvements and fixes (#415)
Should also close #349, #346 and #350.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
1 parent 71834d1 commit b984b2d


10 files changed: +30 -30 lines changed


docs/about/concepts/configuration-system.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ This allows for:
 
 :::{tab-item} 1. Server YAML Config Files
 
-These are your base configurations that define server structures and default values. Later files override earlier files.
+These base configurations define server structures and default values, with later files overriding earlier ones.
 
 Example: Multi-Server Configuration
 ```bash

@@ -91,7 +91,7 @@ ng_run '+config_paths=${simple_weather_config_paths}'
 
 :::{tab-item} 3. Command Line Arguments
 
-**Runtime overrides** using Hydra syntax for maximum flexibility. These runtime command line have the highest priority, meaning they can override any previous setting set in the config.yaml or env.yaml files.
+**Runtime overrides** using Hydra syntax for maximum flexibility. These command line arguments have the highest priority and can override any settings from config.yaml or env.yaml files.
 
 Basic Overrides
 ```bash

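The precedence described in this section (server YAML config files, then env files, then command-line overrides) can be illustrated with a small sketch using OmegaConf, the configuration library that Hydra builds on. The keys below (`policy.model`, `policy.temperature`) are hypothetical placeholders, not the actual NeMo Gym configuration schema.

```python
# Minimal sketch of layered configuration precedence, assuming hypothetical keys.
# Later sources override earlier ones; command-line style overrides win.
from omegaconf import OmegaConf

base = OmegaConf.create({"policy": {"model": "base-model", "temperature": 1.0}})  # server YAML defaults
env = OmegaConf.create({"policy": {"temperature": 0.7}})                          # env.yaml-style override
cli = OmegaConf.from_dotlist(["policy.temperature=0.2"])                          # highest priority

merged = OmegaConf.merge(base, env, cli)
print(OmegaConf.to_yaml(merged))  # policy.temperature resolves to 0.2
```
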
docs/about/concepts/core-abstractions.md

Lines changed: 3 additions & 3 deletions
@@ -16,7 +16,7 @@ Before diving into code, let's understand the three core abstractions in NeMo Gy
 
 :::{tab-item} Model
 
-Responses API Model servers are model endpoints that perform text inference - stateless, single-call text generation without conversation memory or orchestration. You will always have at least one Response API Model server active during training, typically known as the "policy" model.
+Responses API Model servers are stateless model endpoints that perform single-call text generation without conversation memory or orchestration. During training, you will always have at least one active Responses API Model server, typically called the "policy" model.
 
 **Available Implementations:**
 

@@ -29,7 +29,7 @@ Responses API Model servers are model endpoints that perform text inference - st
 
 :::{tab-item} Resources
 
-Resource servers provide tool implementations that can be invoked through tool calling and verification logic that measures task performance. NeMo Gym contains a variety of NVIDIA and community contributed resource servers that you can use during training. We also have tutorials on how to add your own resource server.
+Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
 
 **What Resources Provide**
 

@@ -83,7 +83,7 @@ Responses API Agent servers {term}`orchestrate <Orchestration>` the interaction
 - Handle multi-turn conversations
 - Format responses consistently
 
-An agent can also be referred to as a "training environment." NeMo Gym contains several training environment patterns that cover a variety of scenarios including multi-step, multi-turn, or user modeling scenarios.
+Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
 
 **Examples:**
 
docs/about/concepts/task-verification.md

Lines changed: 4 additions & 4 deletions
@@ -6,17 +6,17 @@
 
 ## What is Verification?
 
-Every resource server in NeMo Gym has a `verify()` function that **measure task performance**. The purpose of this function is to define how to measure how well that task was accomplished.
+Every resource server in NeMo Gym implements a `verify()` function that returns a reward value for task performance.
 
-**The Problem**: When you ran the weather example in the quickstart, it successfully called the tool and gave a response. But was that response *good*? Should the model be rewarded or penalized for that behavior? Without verification, there's no way to measure improvement.
+**The Problem**: When you ran the weather example in the quickstart, the agent successfully called the tool and provided a response. But was that response *good*? Should the model be rewarded or penalized? Without verification, you cannot measure performance or guide improvement.
 
 **The Solution**: Each resource server must define exactly what "good performance" means for its domain.
 
 ## Why Verification Matters
 
 **Tool Execution ≠ Good Performance**
 
-- The right tool call was issued i.e. `get_weather("San Francisco")`
+- The right tool call was issued, e.g., `get_weather("San Francisco")`
 - But was helpful advice given? Was the response accurate? Was it efficient?
 - Verification answers these questions with numerical scores
 

@@ -178,6 +178,6 @@ reward = await expensive_api_call(predicted, expected)
 
 This verification system is what makes NeMo Gym powerful for model training:
 - **Resource servers** provide both tools AND scoring systems
-- **Verification patterns** vary by domain but follow common principles
+- **Verification patterns** vary by domain but follow common principles
 - **Reward signals** from verification drive model improvement through RL
 - **Good verification** is reliable, meaningful, and scalable

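For readers new to verification, here is a minimal sketch of what a reward-returning `verify()`-style function could look like for a simple string-matching task. The signature, arguments, and 0.0-1.0 scoring scheme are illustrative assumptions, not the exact NeMo Gym resource server interface.

```python
# Illustrative sketch only: a hypothetical verify()-style reward function for a
# string-matching task. Signature and 0.0-1.0 reward range are assumptions.
async def verify(predicted: str, expected: str) -> float:
    """Score how well the task was accomplished, as a reward between 0.0 and 1.0."""
    if not predicted:
        return 0.0  # no response earns no reward
    if predicted.strip().lower() == expected.strip().lower():
        return 1.0  # exact match earns full reward
    # Partial credit: fraction of expected tokens that appear in the prediction.
    expected_tokens = expected.lower().split()
    overlap = sum(token in predicted.lower() for token in expected_tokens)
    return overlap / max(1, len(expected_tokens))
```
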
docs/about/ecosystem.md

Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,4 @@ The [NeMo Framework](https://github.com/NVIDIA-NeMo) is NVIDIA's GPU-accelerated
 * **NeMo Guardrails**: Programmable safety guardrails
 * And more...
 
-**NeMo Gym's Role**: Within this ecosystem, Gym focuses specifically on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic, making it practical to generate large-scale, high-quality training data that feeds into NeMo RL and other training frameworks.
+**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.

docs/get-started/index.md

Lines changed: 5 additions & 5 deletions
@@ -4,12 +4,12 @@
 
 **Estimated Time**: 25-30 minutes
 
-This guided tutorial experience is designed for those brand new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
+This guided tutorial is designed for users new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
 
 **By the end of this tutorial series, you will have:**
 
-✅ A working NeMo Gym installation with servers running
-Ability to generate rollouts for RL training
+✅ A working NeMo Gym installation with servers running
+The ability to generate rollouts for RL training
 
 ## Before You Start
 

@@ -23,7 +23,7 @@ Make sure you have these prerequisites ready before beginning the tutorials:
 
 ## Tutorial Path
 
-Follow these four tutorials in sequence to build your first AI agent from scratch:
+Follow these tutorials in sequence to start collecting rollouts with NeMo Gym:
 
 ::::{grid} 1 1 1 1
 :gutter: 3

@@ -51,7 +51,7 @@ Generate your first batch of rollouts and understand how they become training da
 ---
 
 :::{tip}
-**New to reinforcement learning?** Do not worry—these tutorials introduce RL concepts naturally as you build.
+**New to reinforcement learning?** Do not worry—these tutorials introduce RL concepts naturally as you learn rollout collection.
 
 - For deeper conceptual understanding, explore the [About](../about/index.md) section.
 - For quick definitions, refer to the [Glossary](../about/concepts/key-terminology.md).

docs/how-to-faq.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
-# How-To's and FAQ's
+# How-Tos and FAQs
 
 :::{warning}
-This document is a smattering of How-To's and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
+This document is a collection of How-Tos and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
 :::
 
 # How To: Run tests for simple agent

@@ -731,7 +731,7 @@ TODO @bxyu-nvidia: expand on this later.
 
 # FAQ: NeMo Gym what CI/CD do I need to pass?
 
-NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of the mare not.
+NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of them are not.
 
 For the majority of PRs, there are 5 checks that need to pass:
 1. DCO

@@ -745,7 +745,7 @@ Examples of PR checks that most PRs do not need to wait for to pass:
 2. CICD NeMo / Nemo_CICD_Test (push)
 ...
 
-# FAQ: Why aiohttp backend and not httpx/httpcore for async http?
+# FAQ: Why use aiohttp backend instead of httpx/httpcore for async http?
 
 TL;DR: httpx is O(n^2) runtime where n is the number of queued requests (i.e. for each request, we check all other queued requests). This is terribly inefficient and results in major slowdowns.

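For context, the workloads affected by this backend choice look roughly like the sketch below: many requests queued concurrently against a model endpoint. The URL and payload are placeholders; this is not NeMo Gym's actual client code.

```python
# Minimal sketch of the high-concurrency request pattern this backend choice affects.
# The endpoint and payload are placeholders, not NeMo Gym's actual client code.
import asyncio
import aiohttp

async def post_one(session: aiohttp.ClientSession, url: str, payload: dict) -> int:
    async with session.post(url, json=payload) as resp:
        await resp.read()
        return resp.status

async def main() -> None:
    url = "http://localhost:8000/v1/responses"  # placeholder endpoint
    async with aiohttp.ClientSession() as session:
        # Hundreds of queued requests is where per-request queue scans become costly.
        statuses = await asyncio.gather(
            *(post_one(session, url, {"input": f"task {i}"}) for i in range(200))
        )
    print(f"completed {len(statuses)} requests")

if __name__ == "__main__":
    asyncio.run(main())
```
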
docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
 
 # NeMo Gym Documentation
 
-NeMo Gym is a framework for building reinforcement learning (RL) training environments large language models (LLMs). Gym provides training environment development scaffolding and training environment patterns such as multi-step, multi-turn, and user modeling scenarios.
+NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
 
-At the core of NeMo Gym are three server concepts: **Responses API Model servers** are model endpoints, **Resources servers** contain tool implementations and verification logic, and **Response API Agent servers** orchestrate the interaction between models and resources.
+NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
 
 ## Quickstart
 
docs/tutorials/index.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 Hands-on learning experiences that guide you through building, training, and deploying AI agents with NeMo Gym.
 
 :::{tip}
-**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial experience from installation through your first verified agent. Return here after completing those tutorials to learn about advanced topics like additional rollout collection methods and training data generation. You can find the project repository on [GitHub](https://github.com/NVIDIA-NeMo/Gym).
+**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial from installation through your first verified agent. Return here afterward to learn about advanced topics like additional rollout collection methods and training data generation. You can find the project repository on [GitHub](https://github.com/NVIDIA-NeMo/Gym).
 :::
 ---
 

@@ -46,9 +46,9 @@ Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <S
 :::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` RL Training with NeMo RL
 :link: rl-training-with-nemo-rl
 :link-type: doc
-Train a model with NeMo RL. Learn how to set up NeMo Gym + NeMo RL training environment, run tests, prepare data, and launch single and multi-node training runs.
+Train a model with NeMo RL. Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
 +++
-{bdg-secondary}`sft` {bdg-secondary}`dpo`
+{bdg-secondary}`rl` {bdg-secondary}`training`
 :::
 
 ::::

docs/tutorials/offline-training-w-rollouts.md

Lines changed: 5 additions & 5 deletions
@@ -59,11 +59,11 @@ This tutorial is **experimental** and may contain bugs. Proceed with caution.
 
 The offline training pipeline follows this logical flow:
 
-1. Collect rollouts using strategies from [Tutorial 5]
-   - **SFT data**: Use consistent generation (low temperature, single rollout per task)
-   - **DPO data**: Use diverse generation (higher temperature, 2 rollouts per task for comparison)
-1. Filter for quality - Remove poor rollouts before processing
-2. Format for training - Convert to SFT or DPO format based on your goals
+1. Collect rollouts
+   - **SFT data**: Use consistent generation (low temperature, single rollout per task)
+   - **DPO data**: Use diverse generation (higher temperature, 2 rollouts per task for comparison)
+2. Filter for quality - Remove poor rollouts before processing
+3. Format for training - Convert to SFT or DPO format based on your goals
 

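As a rough illustration of the "filter" and "format" steps in this pipeline, the sketch below drops low-reward rollouts and converts the rest into SFT- and DPO-style records. The rollout fields (`messages`, `reward`) and output schemas are assumptions for illustration, not the exact NeMo Gym data format.

```python
# Illustrative sketch only: filtering rollouts and formatting them for SFT/DPO.
# Field names ("messages", "reward") and output schemas are assumptions, not the
# exact NeMo Gym rollout format.
def filter_rollouts(rollouts: list[dict], min_reward: float = 0.5) -> list[dict]:
    """Filter for quality: drop poor rollouts before converting them into training data."""
    return [r for r in rollouts if r["reward"] >= min_reward]

def to_sft_records(rollouts: list[dict]) -> list[dict]:
    """Format for SFT: keep one high-quality conversation per task."""
    return [{"messages": r["messages"]} for r in rollouts]

def to_dpo_records(rollout_pairs: list[tuple[dict, dict]]) -> list[dict]:
    """Format for DPO: pair two rollouts per task; the higher-reward one is 'chosen'."""
    records = []
    for a, b in rollout_pairs:
        chosen, rejected = (a, b) if a["reward"] >= b["reward"] else (b, a)
        records.append({"chosen": chosen["messages"], "rejected": rejected["messages"]})
    return records
```
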
docs/tutorials/rl-training-with-nemo-rl.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This tutorial is **experimental** and may contain bugs. Proceed with caution.
 
 **Goal**: Train a model with NeMo RL. Learn how to set up NeMo Gym + NeMo RL training environment, run tests, prepare data, and launch single and multi-node training runs!
 
-Multinode Slurm script and run command are at the bottom of this document. Do the single node setup first. Do not skip it. Throughout this tutorial, you can see mentions of "Penguin". This refers to Gym's codename before it was fully open-sourced.
+Multinode Slurm script and run command are at the bottom of this document. Complete the single-node setup first before proceeding to multi-node training. Throughout this tutorial, you may see mentions of "Penguin", which refers to Gym's codename before it was fully open-sourced.
 
 ## Single GPU node setup to ensure correctness
