|
1 | | -# Rollout Collection |
| 1 | +(gs-collecting-rollouts)= |
2 | 2 |
|
3 | | -A {term}`rollout <Rollout / Trajectory>` is complete record of a task instance execution that captures: |
4 | | -- What the model was asked to do (input) |
5 | | -- How the model reasoned (internal processing) |
6 | | -- What tools were used (tool calls and tool responses) |
7 | | -- How well the task was achieved (verification scores) |
8 | | -- The final response (output to user) |
| 3 | +# Collecting Rollouts |
9 | 4 |
|
| 5 | +In the previous tutorial, you set up NeMo Gym and ran your first agent interaction. To train an agent with reinforcement learning, however, you need hundreds or thousands of these interactions, each one scored and saved. Rollout collection produces exactly that.
10 | 6 |
|
11 | | -## Generating Your First Rollouts |
| 7 | +:::{card} |
12 | 8 |
|
13 | | -Let's generate rollouts using the **Example Multi Step** resource server, which tests reading comprehension across long documents. |
| 9 | +**Goal**: Generate your first batch of rollouts and understand how they become training data. |
14 | 10 |
|
15 | | -::::{tab-set} |
| 11 | +^^^ |
| 12 | + |
| 13 | +**In this tutorial, you will**: |
| 14 | + |
| 15 | +1. Run batch rollout collection |
| 16 | +2. Examine results with the rollout viewer |
| 17 | +3. Learn key parameters for scaling |
| 18 | + |
| 19 | +::: |
| 20 | + |
| 21 | +:::{button-ref} setup-installation |
| 22 | +:color: secondary |
| 23 | +:outline: |
| 24 | +:ref-type: doc |
| 25 | + |
| 26 | +← Previous: Setup and Installation |
| 27 | +::: |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## Before You Begin |
| 32 | + |
| 33 | +Make sure you have: |
| 34 | + |
| 35 | +- ✅ Completed [Setup and Installation](setup-installation.md) |
| 36 | +- ✅ Servers still running (or ready to restart them) |
| 37 | +- ✅ `env.yaml` configured with your OpenAI API key |
| 38 | +- ✅ Virtual environment activated |
| 39 | + |
| 40 | +**What's in a rollout?** A complete record of a task execution: the input, the model's reasoning and tool calls, the final output, and a verification score. |
| 41 | + |
| 42 | +--- |
| 43 | + |
| 44 | +## 1. Inspect the Data |
| 45 | + |
| 46 | +Look at the example dataset included with the Simple Weather resource server: |
16 | 47 |
|
17 | | -:::{tab-item} 1. Inspect data |
18 | 48 | ```bash |
19 | | -head -1 resources_servers/example_multi_step/data/example.jsonl | python -m json.tool |
| 49 | +head -1 resources_servers/example_simple_weather/data/example.jsonl | python -m json.tool |
20 | 50 | ``` |
21 | 51 |
|
22 | | -**What this dataset contains**: Complex reading comprehension tasks where agents must find specific information ("needles") within long documents ("haystacks"). |
23 | | - |
24 | | -Each line in the input JSONL file follows the schema below. |
25 | | - |
26 | | -**Key components**: |
27 | | -- **responses_create_params**: Original task and available tools. Required |
28 | | -- **metadata** (e.g. `expected_synonyms`, `minefield_label`, etc): Additional metadata used by the resources server to either setup or perform verification |
29 | | - |
30 | | -```json |
31 | | -{ |
32 | | - "responses_create_params": { |
33 | | - "input": [ |
34 | | - { |
35 | | - "role": "user", |
36 | | - "content": "What factors contribute to a region experiencing extremely high temperatures, and how do these factors interact?" |
37 | | - } |
38 | | - ] |
39 | | - }, |
40 | | - "expected_synonyms": [ |
41 | | - "Blazing", |
42 | | - "Warm" |
43 | | - ], |
44 | | - "minefield_label": "Hot" |
45 | | -} |
46 | | -``` |
| 52 | +Each line contains a `responses_create_params` object with: |
| 53 | + |
| 54 | +- **input**: The conversation messages (user query) |
| 55 | +- **tools**: Available tools the agent can use |
| 56 | + |
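The same inspection can be done in Python when you want to check many lines at once. The sketch below is a minimal illustration, assuming each line is a JSON object whose `responses_create_params` holds `input` as a list of messages and `tools` as a list of objects with a `name` field. The helper `summarize_task` and the sample line are hypothetical and not part of NeMo Gym.

```python
import json


def summarize_task(jsonl_line: str) -> dict:
    """Summarize one input line: message roles and tool names (hypothetical helper)."""
    task = json.loads(jsonl_line)
    params = task["responses_create_params"]
    return {
        "roles": [msg.get("role") for msg in params.get("input", [])],
        # Tool entries are assumed to carry a "name" field; missing names become None.
        "tools": [tool.get("name") for tool in params.get("tools", [])],
    }


# Made-up sample line for illustration; real lines come from example.jsonl.
sample = json.dumps({
    "responses_create_params": {
        "input": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{"type": "function", "name": "get_weather"}],
    }
})
print(summarize_task(sample))
```

To run it against the real dataset, pass each line of `example.jsonl` to `summarize_task` instead of the sample.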
| 57 | +## 2. Verify Servers Are Running |
| 58 | + |
| 59 | +If you still have servers running from the [Setup and Installation](setup-installation.md) tutorial, proceed to the next step. |
| 60 | + |
| 61 | +If not, start them again: |
47 | 62 |
|
48 | | -::: |
49 | | -:::{tab-item} 2. Start servers |
50 | | -Start the example_multi_step agent server |
51 | 63 | ```bash |
52 | | -config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\ |
53 | | -resources_servers/example_multi_step/configs/example_multi_step.yaml" |
| 64 | +config_paths="resources_servers/example_simple_weather/configs/simple_weather.yaml,\ |
| 65 | +responses_api_models/openai_model/configs/openai_model.yaml" |
54 | 66 | ng_run "+config_paths=[${config_paths}]" |
55 | 67 | ``` |
56 | 68 |
|
57 | | -**✅ Success Check**: You should see 3 servers running including the `example_multi_step_simple_agent`. |
58 | | - |
59 | | -::: |
| 69 | +**✅ Success Check**: You should see 3 servers running, including `simple_weather_simple_agent`.
60 | 70 |
|
61 | | -:::{tab-item} 3. Generate Rollouts |
| 71 | +## 3. Generate Rollouts |
62 | 72 |
|
63 | 73 | In a separate terminal, run: |
| 74 | + |
64 | 75 | ```bash |
65 | | -ng_collect_rollouts +agent_name=example_multi_step_simple_agent \ |
66 | | - +input_jsonl_fpath=resources_servers/example_multi_step/data/example.jsonl \ |
67 | | - +output_jsonl_fpath=results/example_multi_step_rollouts.jsonl \ |
| 76 | +ng_collect_rollouts +agent_name=simple_weather_simple_agent \ |
| 77 | + +input_jsonl_fpath=resources_servers/example_simple_weather/data/example.jsonl \ |
| 78 | + +output_jsonl_fpath=results/simple_weather_rollouts.jsonl \ |
68 | 79 | +limit=5 \ |
69 | 80 | +num_repeats=2 \ |
70 | | - +num_samples_in_parallel=3 \ |
71 | | - +responses_create_params.max_output_tokens=8192 |
| 81 | + +num_samples_in_parallel=3 |
| 82 | +``` |
| 83 | + |
| 84 | +```{list-table} Parameters |
| 85 | +:header-rows: 1 |
| 86 | +:widths: 35 15 50 |
| 87 | +
|
| 88 | +* - Parameter |
| 89 | + - Type |
| 90 | + - Description |
| 91 | +* - `+agent_name` |
| 92 | + - `str` |
| 93 | + - Which agent to use (required) |
| 94 | +* - `+input_jsonl_fpath` |
| 95 | + - `str` |
| 96 | + - Path to input JSONL file (required) |
| 97 | +* - `+output_jsonl_fpath` |
| 98 | + - `str` |
| 99 | + - Path to output JSONL file (required) |
| 100 | +* - `+limit` |
| 101 | + - `int` |
| 102 | + - Max examples to process (default: `null` = all) |
| 103 | +* - `+num_repeats` |
| 104 | + - `int` |
| 105 | + - Rollouts per example (default: `null` = 1) |
| 106 | +* - `+num_samples_in_parallel` |
| 107 | + - `int` |
| 108 | + - Concurrent requests (default: `null` = unlimited) |
72 | 109 | ``` |
73 | 110 |
|
74 | | -**What's happening**: |
75 | | -- `limit=5`: Process only the first 5 examples (for quick testing) |
76 | | -- `num_repeats=2`: Generate 2 rollouts per example (10 total rollouts) |
77 | | -- `num_samples_in_parallel=3`: Process 3 requests simultaneously |
78 | | -- `max_output_tokens=8192`: Allow longer responses for complex reasoning |
| 111 | +**✅ Success Check**: You should see a progress bar similar to the following (counts and timings will vary):
79 | 112 |
|
80 | | -::: |
| 113 | +```text |
| 114 | +Collecting rollouts: 100%|████████████████| 5/5 [00:08<00:00, 1.67s/it] |
| 115 | +``` |
| 116 | + |
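A quick way to confirm the run wrote what you asked for is to count the records in the output file. With `limit=5` and `num_repeats=2`, the command requests 5 × 2 = 10 rollouts. The sketch below assumes the output file holds one JSON object per line; `count_rollouts` and `expected_rollouts` are hypothetical helpers, not NeMo Gym functions.

```python
import json
from typing import Iterable


def count_rollouts(lines: Iterable[str]) -> int:
    """Count non-empty lines, checking that each one parses as JSON."""
    count = 0
    for line in lines:
        if line.strip():
            json.loads(line)  # raises ValueError if a line is corrupt
            count += 1
    return count


def expected_rollouts(limit: int, num_repeats: int) -> int:
    """Number of rollouts requested: examples processed times repeats per example."""
    return limit * num_repeats


# Illustrative in-memory data standing in for the output file.
fake_output = ['{"reward": 1.0}', "", '{"reward": 0.0}']
print(count_rollouts(fake_output), "records found")
print(expected_rollouts(limit=5, num_repeats=2), "records expected for the command above")
```

For the real file, open `results/simple_weather_rollouts.jsonl` and pass the file object to `count_rollouts`.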
| 117 | +## 4. View Rollouts |
81 | 118 |
|
82 | | -:::{tab-item} 4. View rollouts |
| 119 | +Launch the rollout viewer: |
83 | 120 |
|
84 | | -Launch the rollout viewer |
85 | 121 | ```bash |
86 | | -ng_viewer +jsonl_fpath=results/example_multi_step_rollouts.jsonl |
| 122 | +ng_viewer +jsonl_fpath=results/simple_weather_rollouts.jsonl |
87 | 123 | ``` |
88 | 124 |
|
89 | | -Then visit http://127.0.0.1:7860 |
90 | | - |
91 | | -**What you'll see**: An interactive viewer showing reasoning, tool calls, and verification scores for each rollout. |
92 | | - |
93 | | -**Key components**: |
94 | | -- **{term}`reward <Reward / Reward Signal>`**: Verification score from the resource server. Required on output |
95 | | -- **response**: Complete output conversation including tool calls and responses |
96 | | -- **metadata** (`parsed_synonym_values`, `set_overlap`, etc): Additional metrics for analysis |
97 | | - |
98 | | -```json |
99 | | -{ |
100 | | - "responses_create_params": { |
101 | | - "input": [ |
102 | | - { |
103 | | - "content": "What factors contribute to a region experiencing extremely high temperatures, and how do these factors interact?", |
104 | | - "role": "user", |
105 | | - "type": "message" |
106 | | - } |
107 | | - ] |
108 | | - }, |
109 | | - "response": { |
110 | | - "output": [ |
111 | | - { |
112 | | - "arguments": "{\"synonym\":\"Blazing\"}", |
113 | | - "name": "get_synonym_value", |
114 | | - "type": "function_call", |
115 | | - }, |
116 | | - "..." |
117 | | - ] |
118 | | - }, |
119 | | - "reward": 1.0, |
120 | | - "parsed_synonym_values": [ |
121 | | - 711, |
122 | | - 407 |
123 | | - ], |
124 | | - "accuracy": true, |
125 | | - "set_overlap": 1.0, |
126 | | - "original_term_minefield_hit": false, |
127 | | - "order_instruction_following_failure": false |
128 | | -} |
129 | | -``` |
| 125 | +Then visit <http://127.0.0.1:7860> |
| 126 | + |
| 127 | +The viewer shows each rollout with: |
| 128 | + |
| 129 | +- **Input**: The original query and tools |
| 130 | +- **Response**: Tool calls and agent output |
| 131 | +- **Reward**: Verification score (0.0–1.0) |
130 | 132 |
|
| 133 | +:::{important} |
| 134 | +**Where Do Reward Scores Come From?** |
| 135 | + |
| 136 | +Scores come from the `verify()` function in your resource server. During collection, each rollout is automatically sent to the server's `/verify` endpoint. The default implementation returns 1.0, but you can implement custom logic that scores tool usage, response quality, or task completion.
131 | 137 | ::: |
132 | | -:::: |
133 | 138 |
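Outside the viewer, the reward scores can also be summarized directly from the output file. The sketch below is a minimal illustration, assuming each output line is a JSON object with a top-level numeric `reward` field; `reward_summary` is a hypothetical helper and the sample records are made up.

```python
import json
from statistics import mean
from typing import Iterable


def reward_summary(lines: Iterable[str]) -> dict:
    """Return count, mean, min, and max of the `reward` field across rollouts."""
    rewards = [json.loads(line)["reward"] for line in lines if line.strip()]
    if not rewards:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(rewards),
        "mean": mean(rewards),
        "min": min(rewards),
        "max": max(rewards),
    }


# Made-up records for illustration only.
sample_rollouts = [json.dumps({"reward": r}) for r in (1.0, 0.0, 1.0, 1.0)]
print(reward_summary(sample_rollouts))
```

If every rollout scores 1.0, that may simply reflect the default `verify()` behavior described above rather than real task success.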
|
| 139 | +--- |
134 | 140 |
|
135 | 141 | ## Rollout Generation Parameters |
136 | 142 |
|
137 | | -Essential |
| 143 | +::::{tab-set} |
| 144 | + |
| 145 | +:::{tab-item} Essential |
| 146 | + |
138 | 147 | ```bash |
139 | 148 | ng_collect_rollouts \ |
140 | 149 | +agent_name=your_agent_name \
141 | 150 | +input_jsonl_fpath=input/tasks.jsonl \
142 | 151 | +output_jsonl_fpath=output/rollouts.jsonl
143 | 152 | ``` |
144 | 153 |
|
145 | | -Data Control |
| 154 | +::: |
| 155 | + |
| 156 | +:::{tab-item} Data Control |
| 157 | + |
146 | 158 | ```bash |
147 | 159 | +limit=100 \ # Limit examples processed (null = all) |
148 | | - +num_repeats=3 \ # Rollouts per example (null = 1) |
| 160 | + +num_repeats=3 \ # Rollouts per example (null = 1) |
149 | 161 | +num_samples_in_parallel=5 # Concurrent requests (null = default) |
150 | 162 | ``` |
151 | 163 |
|
152 | | -Model Behavior |
| 164 | +::: |
| 165 | + |
| 166 | +:::{tab-item} Model Behavior |
| 167 | + |
153 | 168 | ```bash |
154 | 169 | +responses_create_params.max_output_tokens=4096 \ # Response length limit |
155 | 170 | +responses_create_params.temperature=0.7 \ # Randomness (0-1) |
156 | 171 | +responses_create_params.top_p=0.9 # Nucleus sampling |
157 | 172 | ``` |
| 173 | + |
| 174 | +::: |
| 175 | + |
| 176 | +:::: |
| 177 | + |
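When launching collection from a script, it can be easier to assemble these overrides programmatically than to maintain a long multi-line shell command. The sketch below only builds the argument list from the parameters shown in the tabs; `build_collect_command` is a hypothetical helper, and it does not validate names against what `ng_collect_rollouts` actually accepts.

```python
import shlex
from typing import Any, Dict, List, Optional


def build_collect_command(
    agent_name: str,
    input_jsonl_fpath: str,
    output_jsonl_fpath: str,
    overrides: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Build an argv list for ng_collect_rollouts (hypothetical helper)."""
    args = [
        "ng_collect_rollouts",
        f"+agent_name={agent_name}",
        f"+input_jsonl_fpath={input_jsonl_fpath}",
        f"+output_jsonl_fpath={output_jsonl_fpath}",
    ]
    for key, value in (overrides or {}).items():
        if value is not None:  # skip unset options so the tool's defaults apply
            args.append(f"+{key}={value}")
    return args


cmd = build_collect_command(
    "your_agent_name",
    "input/tasks.jsonl",
    "output/rollouts.jsonl",
    overrides={"limit": 100, "num_repeats": None, "responses_create_params.temperature": 0.7},
)
print(shlex.join(cmd))  # shell-safe string you can paste into a terminal
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the servers are running.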
| 178 | +--- |
| 179 | + |
| 180 | +## Next Steps |
| 181 | + |
| 182 | +You've completed the get-started tutorials. Your `results/simple_weather_rollouts.jsonl` file now holds scored rollouts that can serve as training data for RL, SFT, or DPO pipelines.
| 183 | + |
| 184 | +From here, explore the [Tutorials](../tutorials/index.md) for advanced topics or [Concepts](../about/concepts/index.md) for deeper understanding. |