
Commit b984b2d

docs: various improvements and fixes (#415)
Should also close #349, #346 and #350.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
1 parent 71834d1 commit b984b2d


10 files changed: +30 -30 lines changed


docs/about/concepts/configuration-system.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ This allows for:
 
 :::{tab-item} 1. Server YAML Config Files
 
-These are your base configurations that define server structures and default values. Later files override earlier files.
+These base configurations define server structures and default values, with later files overriding earlier ones.
 
 Example: Multi-Server Configuration
 ```bash

@@ -91,7 +91,7 @@ ng_run '+config_paths=${simple_weather_config_paths}'
 
 :::{tab-item} 3. Command Line Arguments
 
-**Runtime overrides** using Hydra syntax for maximum flexibility. These runtime command line have the highest priority, meaning they can override any previous setting set in the config.yaml or env.yaml files.
+**Runtime overrides** using Hydra syntax for maximum flexibility. These command line arguments have the highest priority and can override any settings from config.yaml or env.yaml files.
 
 Basic Overrides
 ```bash

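The precedence described in this section (server YAML config files, then env files, then command-line overrides) can be illustrated with a small sketch using OmegaConf, the configuration library that Hydra builds on. The keys below (`policy.model`, `policy.temperature`) are hypothetical placeholders, not the actual NeMo Gym configuration schema.

```python
# Minimal sketch of layered configuration precedence, assuming hypothetical keys.
# Later sources override earlier ones; command-line style overrides win.
from omegaconf import OmegaConf

base = OmegaConf.create({"policy": {"model": "base-model", "temperature": 1.0}})  # server YAML defaults
env = OmegaConf.create({"policy": {"temperature": 0.7}})                          # env.yaml-style override
cli = OmegaConf.from_dotlist(["policy.temperature=0.2"])                          # highest priority

merged = OmegaConf.merge(base, env, cli)
print(OmegaConf.to_yaml(merged))  # policy.temperature resolves to 0.2
```
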
docs/about/concepts/core-abstractions.md

Lines changed: 3 additions & 3 deletions
@@ -16,7 +16,7 @@ Before diving into code, let's understand the three core abstractions in NeMo Gy
 
 :::{tab-item} Model
 
-Responses API Model servers are model endpoints that perform text inference - stateless, single-call text generation without conversation memory or orchestration. You will always have at least one Response API Model server active during training, typically known as the "policy" model.
+Responses API Model servers are stateless model endpoints that perform single-call text generation without conversation memory or orchestration. During training, you will always have at least one active Responses API Model server, typically called the "policy" model.
 
 **Available Implementations:**
 

@@ -29,7 +29,7 @@ Responses API Model servers are model endpoints that perform text inference - st
 
 :::{tab-item} Resources
 
-Resource servers provide tool implementations that can be invoked through tool calling and verification logic that measures task performance. NeMo Gym contains a variety of NVIDIA and community contributed resource servers that you can use during training. We also have tutorials on how to add your own resource server.
+Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
 
 **What Resources Provide**
 

@@ -83,7 +83,7 @@ Responses API Agent servers {term}`orchestrate <Orchestration>` the interaction
 - Handle multi-turn conversations
 - Format responses consistently
 
-An agent can also be referred to as a "training environment." NeMo Gym contains several training environment patterns that cover a variety of scenarios including multi-step, multi-turn, or user modeling scenarios.
+Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
 
 **Examples:**
 
docs/about/concepts/task-verification.md

Lines changed: 4 additions & 4 deletions
@@ -6,17 +6,17 @@
 
 ## What is Verification?
 
-Every resource server in NeMo Gym has a `verify()` function that **measure task performance**. The purpose of this function is to define how to measure how well that task was accomplished.
+Every resource server in NeMo Gym implements a `verify()` function that returns a reward value for task performance.
 
-**The Problem**: When you ran the weather example in the quickstart, it successfully called the tool and gave a response. But was that response *good*? Should the model be rewarded or penalized for that behavior? Without verification, there's no way to measure improvement.
+**The Problem**: When you ran the weather example in the quickstart, the agent successfully called the tool and provided a response. But was that response *good*? Should the model be rewarded or penalized? Without verification, you cannot measure performance or guide improvement.
 
 **The Solution**: Each resource server must define exactly what "good performance" means for its domain.
 
 ## Why Verification Matters
 
 **Tool Execution ≠ Good Performance**
 
-- The right tool call was issued i.e. `get_weather("San Francisco")`
+- The right tool call was issued, e.g., `get_weather("San Francisco")`
 - But was helpful advice given? Was the response accurate? Was it efficient?
 - Verification answers these questions with numerical scores
 

@@ -178,6 +178,6 @@ reward = await expensive_api_call(predicted, expected)
 
 This verification system is what makes NeMo Gym powerful for model training:
 - **Resource servers** provide both tools AND scoring systems
-- **Verification patterns** vary by domain but follow common principles
+- **Verification patterns** vary by domain but follow common principles
 - **Reward signals** from verification drive model improvement through RL
 - **Good verification** is reliable, meaningful, and scalable

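For readers new to verification, here is a minimal sketch of what a reward-returning `verify()`-style function could look like for a simple string-matching task. The signature, arguments, and 0.0-1.0 scoring scheme are illustrative assumptions, not the exact NeMo Gym resource server interface.

```python
# Illustrative sketch only: a hypothetical verify()-style reward function for a
# string-matching task. Signature and 0.0-1.0 reward range are assumptions.
async def verify(predicted: str, expected: str) -> float:
    """Score how well the task was accomplished, as a reward between 0.0 and 1.0."""
    if not predicted:
        return 0.0  # no response earns no reward
    if predicted.strip().lower() == expected.strip().lower():
        return 1.0  # exact match earns full reward
    # Partial credit: fraction of expected tokens that appear in the prediction.
    expected_tokens = expected.lower().split()
    overlap = sum(token in predicted.lower() for token in expected_tokens)
    return overlap / max(1, len(expected_tokens))
```
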
docs/about/ecosystem.md

Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,4 @@ The [NeMo Framework](https://github.com/NVIDIA-NeMo) is NVIDIA's GPU-accelerated
 * **NeMo Guardrails**: Programmable safety guardrails
 * And more...
 
-**NeMo Gym's Role**: Within this ecosystem, Gym focuses specifically on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic, making it practical to generate large-scale, high-quality training data that feeds into NeMo RL and other training frameworks.
+**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.

docs/get-started/index.md

Lines changed: 5 additions & 5 deletions
@@ -4,12 +4,12 @@
 
 **Estimated Time**: 25-30 minutes
 
-This guided tutorial experience is designed for those brand new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
+This guided tutorial is designed for users new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
 
 **By the end of this tutorial series, you will have:**
 
-✅ A working NeMo Gym installation with servers running
-Ability to generate rollouts for RL training
+✅ A working NeMo Gym installation with servers running
+The ability to generate rollouts for RL training
 
 ## Before You Start
 

@@ -23,7 +23,7 @@ Make sure you have these prerequisites ready before beginning the tutorials:
 
 ## Tutorial Path
 
-Follow these four tutorials in sequence to build your first AI agent from scratch:
+Follow these tutorials in sequence to start collecting rollouts with NeMo Gym:
 
 ::::{grid} 1 1 1 1
 :gutter: 3

@@ -51,7 +51,7 @@ Generate your first batch of rollouts and understand how they become training da
 ---
 
 :::{tip}
-**New to reinforcement learning?** Do not worry—these tutorials introduce RL concepts naturally as you build.
+**New to reinforcement learning?** Do not worry—these tutorials introduce RL concepts naturally as you learn rollout collection.
 
 - For deeper conceptual understanding, explore the [About](../about/index.md) section.
 - For quick definitions, refer to the [Glossary](../about/concepts/key-terminology.md).

docs/how-to-faq.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
-# How-To's and FAQ's
+# How-Tos and FAQs
 
 :::{warning}
-This document is a smattering of How-To's and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
+This document is a collection of How-Tos and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
 :::
 
 # How To: Run tests for simple agent

@@ -731,7 +731,7 @@ TODO @bxyu-nvidia: expand on this later.
 
 # FAQ: NeMo Gym what CI/CD do I need to pass?
 
-NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of the mare not.
+NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of them are not.
 
 For the majority of PRs, there are 5 checks that need to pass:
 1. DCO

@@ -745,7 +745,7 @@ Examples of PR checks that most PRs do not need to wait for to pass:
 2. CICD NeMo / Nemo_CICD_Test (push)
 ...
 
-# FAQ: Why aiohttp backend and not httpx/httpcore for async http?
+# FAQ: Why use aiohttp backend instead of httpx/httpcore for async http?
 
 TL;DR: httpx is O(n^2) runtime where n is the number of queued requests (i.e. for each request, we check all other queued requests). This is terribly inefficient and results in major slowdowns.

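For context, the workloads affected by this backend choice look roughly like the sketch below: many requests queued concurrently against a model endpoint. The URL and payload are placeholders; this is not NeMo Gym's actual client code.

```python
# Minimal sketch of the high-concurrency request pattern this backend choice affects.
# The endpoint and payload are placeholders, not NeMo Gym's actual client code.
import asyncio
import aiohttp

async def post_one(session: aiohttp.ClientSession, url: str, payload: dict) -> int:
    async with session.post(url, json=payload) as resp:
        await resp.read()
        return resp.status

async def main() -> None:
    url = "http://localhost:8000/v1/responses"  # placeholder endpoint
    async with aiohttp.ClientSession() as session:
        # Hundreds of queued requests is where per-request queue scans become costly.
        statuses = await asyncio.gather(
            *(post_one(session, url, {"input": f"task {i}"}) for i in range(200))
        )
    print(f"completed {len(statuses)} requests")

if __name__ == "__main__":
    asyncio.run(main())
```
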
docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
 
 # NeMo Gym Documentation
 
-NeMo Gym is a framework for building reinforcement learning (RL) training environments large language models (LLMs). Gym provides training environment development scaffolding and training environment patterns such as multi-step, multi-turn, and user modeling scenarios.
+NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
 
-At the core of NeMo Gym are three server concepts: **Responses API Model servers** are model endpoints, **Resources servers** contain tool implementations and verification logic, and **Response API Agent servers** orchestrate the interaction between models and resources.
+NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
 
 ## Quickstart
 
docs/tutorials/index.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 Hands-on learning experiences that guide you through building, training, and deploying AI agents with NeMo Gym.
 
 :::{tip}
-**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial experience from installation through your first verified agent. Return here after completing those tutorials to learn about advanced topics like additional rollout collection methods and training data generation. You can find the project repository on [GitHub](https://github.com/NVIDIA-NeMo/Gym).
+**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial from installation through your first verified agent. Return here afterward to learn about advanced topics like additional rollout collection methods and training data generation. You can find the project repository on [GitHub](https://github.com/NVIDIA-NeMo/Gym).
 :::
 ---
 

@@ -46,9 +46,9 @@ Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <S
 :::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` RL Training with NeMo RL
 :link: rl-training-with-nemo-rl
 :link-type: doc
-Train a model with NeMo RL. Learn how to set up NeMo Gym + NeMo RL training environment, run tests, prepare data, and launch single and multi-node training runs.
+Train a model with NeMo RL. Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
 +++
-{bdg-secondary}`sft` {bdg-secondary}`dpo`
+{bdg-secondary}`rl` {bdg-secondary}`training`
 :::
 
 ::::

docs/tutorials/offline-training-w-rollouts.md

Lines changed: 5 additions & 5 deletions
@@ -59,11 +59,11 @@ This tutorial is **experimental** and may contain bugs. Proceed with caution.
 
 The offline training pipeline follows this logical flow:
 
-1. Collect rollouts using strategies from [Tutorial 5]
-   - **SFT data**: Use consistent generation (low temperature, single rollout per task)
-   - **DPO data**: Use diverse generation (higher temperature, 2 rollouts per task for comparison)
-1. Filter for quality - Remove poor rollouts before processing
-2. Format for training - Convert to SFT or DPO format based on your goals
+1. Collect rollouts
+   - **SFT data**: Use consistent generation (low temperature, single rollout per task)
+   - **DPO data**: Use diverse generation (higher temperature, 2 rollouts per task for comparison)
+2. Filter for quality - Remove poor rollouts before processing
+3. Format for training - Convert to SFT or DPO format based on your goals
 

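As a rough illustration of the "filter" and "format" steps in this pipeline, the sketch below drops low-reward rollouts and converts the rest into SFT- and DPO-style records. The rollout fields (`messages`, `reward`) and output schemas are assumptions for illustration, not the exact NeMo Gym data format.

```python
# Illustrative sketch only: filtering rollouts and formatting them for SFT/DPO.
# Field names ("messages", "reward") and output schemas are assumptions, not the
# exact NeMo Gym rollout format.
def filter_rollouts(rollouts: list[dict], min_reward: float = 0.5) -> list[dict]:
    """Filter for quality: drop poor rollouts before converting them into training data."""
    return [r for r in rollouts if r["reward"] >= min_reward]

def to_sft_records(rollouts: list[dict]) -> list[dict]:
    """Format for SFT: keep one high-quality conversation per task."""
    return [{"messages": r["messages"]} for r in rollouts]

def to_dpo_records(rollout_pairs: list[tuple[dict, dict]]) -> list[dict]:
    """Format for DPO: pair two rollouts per task; the higher-reward one is 'chosen'."""
    records = []
    for a, b in rollout_pairs:
        chosen, rejected = (a, b) if a["reward"] >= b["reward"] else (b, a)
        records.append({"chosen": chosen["messages"], "rejected": rejected["messages"]})
    return records
```
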
docs/tutorials/rl-training-with-nemo-rl.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This tutorial is **experimental** and may contain bugs. Proceed with caution.
 
 **Goal**: Train a model with NeMo RL. Learn how to set up NeMo Gym + NeMo RL training environment, run tests, prepare data, and launch single and multi-node training runs!
 
-Multinode Slurm script and run command are at the bottom of this document. Do the single node setup first. Do not skip it. Throughout this tutorial, you can see mentions of "Penguin". This refers to Gym's codename before it was fully open-sourced.
+Multinode Slurm script and run command are at the bottom of this document. Complete the single-node setup first before proceeding to multi-node training. Throughout this tutorial, you may see mentions of "Penguin", which refers to Gym's codename before it was fully open-sourced.
 
 ## Single GPU node setup to ensure correctness
