Commit b486708

docs: Relate sections Get Started and Rollout Collection (#426)
This PR bridges the gap between `setup-installation.md` and (what should be) a continuation of using the same environment in `rollout-collection.md`, in this case `simple_weather` with tool calling.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: L.B. <llane@nvidia.com>
1 parent b984b2d commit b486708

1 file changed

+133
-106
lines changed

@@ -1,157 +1,184 @@
1-
# Rollout Collection
1+
(gs-collecting-rollouts)=
22

3-
A {term}`rollout <Rollout / Trajectory>` is a complete record of a task instance execution that captures:
4-
- What the model was asked to do (input)
5-
- How the model reasoned (internal processing)
6-
- What tools were used (tool calls and tool responses)
7-
- How well the task was achieved (verification scores)
8-
- The final response (output to user)
3+
# Collecting Rollouts
94

5+
In the previous tutorial, you set up NeMo Gym and ran your first agent interaction. But to train an agent with reinforcement learning, you need hundreds or thousands of these interactions—each one scored and saved. That's what rollout collection does.
106

11-
## Generating Your First Rollouts
7+
:::{card}
128

13-
Let's generate rollouts using the **Example Multi Step** resource server, which tests reading comprehension across long documents.
9+
**Goal**: Generate your first batch of rollouts and understand how they become training data.
1410

15-
::::{tab-set}
11+
^^^
12+
13+
**In this tutorial, you will**:
14+
15+
1. Run batch rollout collection
16+
2. Examine results with the rollout viewer
17+
3. Learn key parameters for scaling
18+
19+
:::
20+
21+
:::{button-ref} setup-installation
22+
:color: secondary
23+
:outline:
24+
:ref-type: doc
25+
26+
← Previous: Setup and Installation
27+
:::
28+
29+
---
30+
31+
## Before You Begin
32+
33+
Make sure you have:
34+
35+
- ✅ Completed [Setup and Installation](setup-installation.md)
36+
- ✅ Servers still running (or ready to restart them)
37+
- ✅ `env.yaml` configured with your OpenAI API key
38+
- ✅ Virtual environment activated
39+
40+
**What's in a rollout?** A complete record of a task execution: the input, the model's reasoning and tool calls, the final output, and a verification score.
41+
42+
---
43+
44+
## 1. Inspect the Data
45+
46+
Look at the example dataset included with the Simple Weather resource server:
1647

17-
:::{tab-item} 1. Inspect data
1848
```bash
19-
head -1 resources_servers/example_multi_step/data/example.jsonl | python -m json.tool
49+
head -1 resources_servers/example_simple_weather/data/example.jsonl | python -m json.tool
2050
```
2151

22-
**What this dataset contains**: Complex reading comprehension tasks where agents must find specific information ("needles") within long documents ("haystacks").
23-
24-
Each line in the input JSONL file follows the schema below.
25-
26-
**Key components**:
27-
- **responses_create_params**: Original task and available tools. Required
28-
- **metadata** (e.g. `expected_synonyms`, `minefield_label`, etc): Additional metadata used by the resources server to either setup or perform verification
29-
30-
```json
31-
{
32-
"responses_create_params": {
33-
"input": [
34-
{
35-
"role": "user",
36-
"content": "What factors contribute to a region experiencing extremely high temperatures, and how do these factors interact?"
37-
}
38-
]
39-
},
40-
"expected_synonyms": [
41-
"Blazing",
42-
"Warm"
43-
],
44-
"minefield_label": "Hot"
45-
}
46-
```
52+
Each line contains a `responses_create_params` object with:
53+
54+
- **input**: The conversation messages (user query)
55+
- **tools**: Available tools the agent can use
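
As a rough Python counterpart to the `head -1 ... | python -m json.tool` command, the sketch below parses one JSONL line and summarizes what sits under `responses_create_params`. Only the field names `responses_create_params`, `input`, and `tools` come from the text above. The sample record, the `get_weather` tool name, and the assumption that each tool entry has a `name` field are illustrative, and the real `example.jsonl` may differ.

```python
import json


def summarize_record(line: str) -> dict:
    """Parse one JSONL line and report what its request contains."""
    record = json.loads(line)
    params = record["responses_create_params"]
    return {
        "num_messages": len(params.get("input", [])),
        # Assumption: each tool entry carries a "name" field.
        "tool_names": [tool.get("name") for tool in params.get("tools", [])],
    }


# Hypothetical record for illustration; not copied from the real dataset.
sample_line = json.dumps({
    "responses_create_params": {
        "input": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{"type": "function", "name": "get_weather"}],
    }
})

print(summarize_record(sample_line))
```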
56+
57+
## 2. Verify Servers Are Running
58+
59+
If you still have servers running from the [Setup and Installation](setup-installation.md) tutorial, proceed to the next step.
60+
61+
If not, start them again:
4762

48-
:::
49-
:::{tab-item} 2. Start servers
50-
Start the example_multi_step agent server
5163
```bash
52-
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
53-
resources_servers/example_multi_step/configs/example_multi_step.yaml"
64+
config_paths="resources_servers/example_simple_weather/configs/simple_weather.yaml,\
65+
responses_api_models/openai_model/configs/openai_model.yaml"
5466
ng_run "+config_paths=[${config_paths}]"
5567
```
5668

57-
**✅ Success Check**: You should see 3 servers running including the `example_multi_step_simple_agent`.
58-
59-
:::
69+
**✅ Success Check**: You should see 3 servers running including the `simple_weather_simple_agent`.
6070

61-
:::{tab-item} 3. Generate Rollouts
71+
## 3. Generate Rollouts
6272

6373
In a separate terminal, run:
74+
6475
```bash
65-
ng_collect_rollouts +agent_name=example_multi_step_simple_agent \
66-
+input_jsonl_fpath=resources_servers/example_multi_step/data/example.jsonl \
67-
+output_jsonl_fpath=results/example_multi_step_rollouts.jsonl \
76+
ng_collect_rollouts +agent_name=simple_weather_simple_agent \
77+
+input_jsonl_fpath=resources_servers/example_simple_weather/data/example.jsonl \
78+
+output_jsonl_fpath=results/simple_weather_rollouts.jsonl \
6879
+limit=5 \
6980
+num_repeats=2 \
70-
+num_samples_in_parallel=3 \
71-
+responses_create_params.max_output_tokens=8192
81+
+num_samples_in_parallel=3
82+
```
83+
84+
```{list-table} Parameters
85+
:header-rows: 1
86+
:widths: 35 15 50
87+
88+
* - Parameter
89+
- Type
90+
- Description
91+
* - `+agent_name`
92+
- `str`
93+
- Which agent to use (required)
94+
* - `+input_jsonl_fpath`
95+
- `str`
96+
- Path to input JSONL file (required)
97+
* - `+output_jsonl_fpath`
98+
- `str`
99+
- Path to output JSONL file (required)
100+
* - `+limit`
101+
- `int`
102+
- Max examples to process (default: `null` = all)
103+
* - `+num_repeats`
104+
- `int`
105+
- Rollouts per example (default: `null` = 1)
106+
* - `+num_samples_in_parallel`
107+
- `int`
108+
- Concurrent requests (default: `null` = unlimited)
72109
```
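
To make the interaction between `+limit` and `+num_repeats` concrete, here is a minimal sketch of how many rollouts a run should write, using the defaults from the parameter table (`null` limit means all examples, `null` repeats means 1). The function name and the `dataset_size` argument are hypothetical, and capping `limit` at the dataset size is an assumption rather than documented behavior.

```python
from typing import Optional


def expected_rollouts(dataset_size: int, limit: Optional[int], num_repeats: Optional[int]) -> int:
    """Estimate the number of rollouts written for a given configuration."""
    # Assumption: a limit larger than the dataset simply processes every example.
    examples = dataset_size if limit is None else min(limit, dataset_size)
    repeats = 1 if num_repeats is None else num_repeats
    return examples * repeats


# Hypothetical 100-example dataset with the settings from the command above.
print(expected_rollouts(dataset_size=100, limit=5, num_repeats=2))
```

With `limit=5` and `num_repeats=2`, this works out to 10 rollouts.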
73110

74-
**What's happening**:
75-
- `limit=5`: Process only the first 5 examples (for quick testing)
76-
- `num_repeats=2`: Generate 2 rollouts per example (10 total rollouts)
77-
- `num_samples_in_parallel=3`: Process 3 requests simultaneously
78-
- `max_output_tokens=8192`: Allow longer responses for complex reasoning
111+
**✅ Success Check**: You should see:
79112

80-
:::
113+
```text
114+
Collecting rollouts: 100%|████████████████| 5/5 [00:08<00:00, 1.67s/it]
115+
```
116+
117+
## 4. View Rollouts
81118

82-
:::{tab-item} 4. View rollouts
119+
Launch the rollout viewer:
83120

84-
Launch the rollout viewer
85121
```bash
86-
ng_viewer +jsonl_fpath=results/example_multi_step_rollouts.jsonl
122+
ng_viewer +jsonl_fpath=results/simple_weather_rollouts.jsonl
87123
```
88124

89-
Then visit http://127.0.0.1:7860
90-
91-
**What you'll see**: An interactive viewer showing reasoning, tool calls, and verification scores for each rollout.
92-
93-
**Key components**:
94-
- **{term}`reward <Reward / Reward Signal>`**: Verification score from the resource server. Required on output
95-
- **response**: Complete output conversation including tool calls and responses
96-
- **metadata** (`parsed_synonym_values`, `set_overlap`, etc): Additional metrics for analysis
97-
98-
```json
99-
{
100-
"responses_create_params": {
101-
"input": [
102-
{
103-
"content": "What factors contribute to a region experiencing extremely high temperatures, and how do these factors interact?",
104-
"role": "user",
105-
"type": "message"
106-
}
107-
]
108-
},
109-
"response": {
110-
"output": [
111-
{
112-
"arguments": "{\"synonym\":\"Blazing\"}",
113-
"name": "get_synonym_value",
114-
"type": "function_call",
115-
},
116-
"..."
117-
]
118-
},
119-
"reward": 1.0,
120-
"parsed_synonym_values": [
121-
711,
122-
407
123-
],
124-
"accuracy": true,
125-
"set_overlap": 1.0,
126-
"original_term_minefield_hit": false,
127-
"order_instruction_following_failure": false
128-
}
129-
```
125+
Then visit <http://127.0.0.1:7860>
126+
127+
The viewer shows each rollout with:
128+
129+
- **Input**: The original query and tools
130+
- **Response**: Tool calls and agent output
131+
- **Reward**: Verification score (0.0–1.0)
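
Outside the viewer, the same output file can be summarized with a few lines of Python. This is a sketch that assumes each output line is a JSON object with a top-level `reward` field; the helper name `mean_reward` and the demo data are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def mean_reward(jsonl_path: Path) -> float:
    """Average the top-level "reward" field across all rollouts in a JSONL file."""
    rewards = []
    with jsonl_path.open() as f:
        for line in f:
            if line.strip():  # skip blank lines
                rewards.append(float(json.loads(line)["reward"]))
    if not rewards:
        raise ValueError(f"no rollouts found in {jsonl_path}")
    return sum(rewards) / len(rewards)


# Demo on a throwaway file with made-up rewards. Point the function at
# results/simple_weather_rollouts.jsonl to summarize a real run.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_rollouts.jsonl"
    demo.write_text('{"reward": 1.0}\n{"reward": 0.0}\n{"reward": 0.5}\n')
    demo_mean = mean_reward(demo)

print(demo_mean)
```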
130132

133+
:::{important}
134+
**Where Do Reward Scores Come From?**
135+
136+
Scores come from the `verify()` function in your resource server. Each rollout is automatically sent to the `/verify` endpoint during collection. The default returns 1.0, but you can implement custom logic to score based on tool usage, response quality, or task completion.
131137
:::
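
As an illustration of what "custom logic" based on tool usage might look like, the sketch below scores a response by whether a given tool was called. It is a standalone function, not the actual `verify()` signature of a NeMo Gym resource server: the name `score_tool_usage`, the `get_weather` tool, and the sample responses are all hypothetical. It assumes response output items use `"type": "function_call"` with a `name` field.

```python
def score_tool_usage(response: dict, expected_tool: str) -> float:
    """Hypothetical scorer: 1.0 if the expected tool was called, else 0.0."""
    for item in response.get("output", []):
        if item.get("type") == "function_call" and item.get("name") == expected_tool:
            return 1.0
    return 0.0


# Made-up responses for illustration.
called = {"output": [{"type": "function_call", "name": "get_weather", "arguments": "{}"}]}
not_called = {"output": [{"type": "message", "content": "It is sunny."}]}

print(score_tool_usage(called, "get_weather"), score_tool_usage(not_called, "get_weather"))
```

A real implementation would likely combine several signals, such as tool arguments and final answer quality, into a score between 0.0 and 1.0.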
132-
::::
133138

139+
---
134140

135141
## Rollout Generation Parameters
136142

137-
Essential
143+
::::{tab-set}
144+
145+
:::{tab-item} Essential
146+
138147
```bash
139148
ng_collect_rollouts \
140149
+agent_name=your_agent_name \ # Which agent to use
141150
+input_jsonl_fpath=input/tasks.jsonl \ # Input dataset
142151
+output_jsonl_fpath=output/rollouts.jsonl # Where to save results
143152
```
144153

145-
Data Control
154+
:::
155+
156+
:::{tab-item} Data Control
157+
146158
```bash
147159
+limit=100 \ # Limit examples processed (null = all)
148-
+num_repeats=3 \ # Rollouts per example (null = 1)
160+
+num_repeats=3 \ # Rollouts per example (null = 1)
149161
+num_samples_in_parallel=5 # Concurrent requests (null = default)
150162
```
151163

152-
Model Behavior
164+
:::
165+
166+
:::{tab-item} Model Behavior
167+
153168
```bash
154169
+responses_create_params.max_output_tokens=4096 \ # Response length limit
155170
+responses_create_params.temperature=0.7 \ # Randomness (0-1)
156171
+responses_create_params.top_p=0.9 # Nucleus sampling
157172
```
173+
174+
:::
175+
176+
::::
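
The annotated snippets above are for reading, not for pasting: in bash, a `#` comment placed after a trailing backslash ends the line continuation, so the following lines would run as separate commands. One way to keep the annotations and still get a valid command is to assemble the arguments in Python. The helper name `build_collect_command` is hypothetical; the `+key=value` parameter names are the ones listed in this section.

```python
import shlex
from typing import Mapping, Optional


def build_collect_command(
    agent_name: str,
    input_jsonl_fpath: str,
    output_jsonl_fpath: str,
    overrides: Optional[Mapping[str, object]] = None,
) -> list:
    """Assemble an argv list for ng_collect_rollouts from `+key=value` overrides."""
    argv = [
        "ng_collect_rollouts",
        f"+agent_name={agent_name}",                  # which agent to use
        f"+input_jsonl_fpath={input_jsonl_fpath}",    # input dataset
        f"+output_jsonl_fpath={output_jsonl_fpath}",  # where to save results
    ]
    # Optional settings such as limit, num_repeats, or sampling parameters.
    for key, value in (overrides or {}).items():
        argv.append(f"+{key}={value}")
    return argv


cmd = build_collect_command(
    "simple_weather_simple_agent",
    "resources_servers/example_simple_weather/data/example.jsonl",
    "results/simple_weather_rollouts.jsonl",
    overrides={"limit": 5, "num_repeats": 2, "responses_create_params.temperature": 0.7},
)
print(shlex.join(cmd))  # single-line command, safe to paste into a shell
```

The resulting list can also be passed to `subprocess.run(cmd, check=True)` while the servers are running.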
177+
178+
---
179+
180+
## Next Steps
181+
182+
You've completed the get-started tutorials. Your `simple_weather_rollouts.jsonl` file is training data ready for RL, SFT, or DPO pipelines.
183+
184+
From here, explore the [Tutorials](../tutorials/index.md) for advanced topics or [Concepts](../about/concepts/index.md) for deeper understanding.
