diff --git a/docs/tutorials/nemo-rl-grpo/single-node-training.md b/docs/tutorials/nemo-rl-grpo/single-node-training.md
index 574920d5e..11670abff 100644
--- a/docs/tutorials/nemo-rl-grpo/single-node-training.md
+++ b/docs/tutorials/nemo-rl-grpo/single-node-training.md
@@ -64,7 +64,7 @@ The Nemotron Nano 9B v2 model uses a custom chat template that must be modified
 ```bash
 tokenizer_config_path=$(find $PWD/.cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2 -name tokenizer_config.json)
 sed -i 's/enable_thinking=true/enable_thinking=false/g' $tokenizer_config_path
-sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\''\].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path
+sed -i 's/{%- if messages\[-1\]\['\''role'\'\'] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\'\'].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path
 ```
 
 **✅ Success Check**: The `sed` commands complete without errors.
@@ -75,6 +75,10 @@ sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set
 
 **Estimated time**: ~15-30 minutes
 
+:::{note}
+The configuration file and training script referenced below are located in the NeMo RL repository that you cloned during {doc}`Setup `. Make sure you're in the `RL/` directory before running the training command.
+:::
+
 By default, this runs only 3 training steps (`grpo.max_num_steps=3`) as a small test run in preparation for multi-node training. If you are using a single node for the full training run, you can remove this value. The full training will take several hours.
 
 ```bash
@@ -116,4 +120,4 @@ The end of the command above does the following:
 2. `&`: This final ampersand runs the job in the background, which frees up your terminal to do other things. You can view all the background jobs using the `jobs` command. If you need to quit the training run, you can use the `fg` command to bring the job from the background into the foreground and then Ctrl+C like normal.
 :::
 
-**✅ Success Check**: Training completes 3 steps on single node without any issues. Check the logs for errors and verify that training steps are progressing.
+**✅ Success Check**: Training completes 3 steps on single node without any issues. Check the logs for errors and verify that training steps are progressing.
\ No newline at end of file