
Add NewtonBench Resource Server #650

Open
Kelvin0110 wants to merge 29 commits into NVIDIA-NeMo:main from Kelvin0110:cmunley1/newton

@Kelvin0110

Contributing To NeMo-Gym (NewtonBench Resource Server)

1) Basic information

i. Description of the environment

A resource server wrapping the NewtonBench benchmark.

  • Tasks: 324 scientific law discovery tasks across 12 physics domains.
  • Observation Space: Experimental results (numeric or structured dictionaries) returned after tool use.
  • Tools:
    • run_experiment: Query the environment with specific parameters to receive physical observations.
    • execute_python: (Optional) Python code-assisted discovery for complex data analysis.
  • Server: FastAPI resource server following NeMo Gym conventions.

ii. Description of the verification logic

The verifier uses the NewtonBench evaluation suite to score the agent's proposed scientific law:

  • Law Extraction: Attempts to find a law within <final_law> tags in the assistant's final response.
  • Success Criteria: Evaluates both symbolic equivalence (via an LLM judge) and numeric accuracy (Root Mean Square Logarithmic Error - RMSLE).
  • Reward Calculation:
    • $\text{reward} = 0.3 \cdot R_{\text{symbolic}} + 0.7 \cdot R_{\text{numeric}}$.
      • $R_{\text{symbolic}}$ is 1.0 if equivalent, -1.0 otherwise.
      • $R_{\text{numeric}} = 1.0 - 2.0 \cdot \text{RMSLE} / (\text{RMSLE} + 3.0)$, yielding a score in $(-1, 1]$.
  • /verify endpoint processes the agent's submission and returns these detailed performance metrics.
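
The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration and not the code in this PR: the function names are hypothetical, taking the last `<final_law>` block when several appear is an assumption, and the symbolic verdict (which the PR obtains from an LLM judge) is passed in as a plain boolean.

```python
import re
from typing import Optional

# Matches the contents of a <final_law>...</final_law> block, across newlines.
FINAL_LAW_RE = re.compile(r"<final_law>(.*?)</final_law>", re.DOTALL)


def extract_final_law(response_text: str) -> Optional[str]:
    """Return the last <final_law> block in the text, or None if there is none.

    Choosing the last block is an assumption made for this sketch.
    """
    matches = FINAL_LAW_RE.findall(response_text)
    return matches[-1].strip() if matches else None


def symbolic_reward(is_equivalent: bool) -> float:
    """R_symbolic: 1.0 if the judge says the laws are equivalent, else -1.0."""
    return 1.0 if is_equivalent else -1.0


def numeric_reward(rmsle: float) -> float:
    """R_numeric = 1 - 2 * RMSLE / (RMSLE + 3), which lies in (-1, 1]."""
    if rmsle < 0:
        raise ValueError("RMSLE must be non-negative")
    return 1.0 - (2.0 * rmsle / (rmsle + 3.0))


def total_reward(is_equivalent: bool, rmsle: float) -> float:
    """reward = 0.3 * R_symbolic + 0.7 * R_numeric."""
    return 0.3 * symbolic_reward(is_equivalent) + 0.7 * numeric_reward(rmsle)
```

Two reference points follow from the formula: a symbolically equivalent law with RMSLE = 0 scores 1.0, and RMSLE = 3 gives $R_{\text{numeric}} = 0$, so a non-equivalent law with that error scores -0.3.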

iii. Description of the prompts/tasks (source + domain)

Domain: Maths (Scientific Law Discovery).
Source: Tasks and prompts adapted from the NewtonBench benchmark, which instruct the agent to discover a specific shifted scientific law (e.g., Newton's Law of Gravitation, Snell's Law) by performing interactive experiments.
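
To make the idea of a "shifted" law concrete, here is a toy sketch of the discovery loop. Everything in it is an assumption for illustration: the exponent 1.5, the argument names, and the signature of this stand-in `run_experiment` are made up and are not taken from NewtonBench or from this PR. The agent's job is mimicked by a log-log least-squares fit that recovers the exponent on distance from experiment outputs.

```python
import math
from typing import Sequence

G = 6.674e-11  # gravitational constant in SI units


def run_experiment(mass1: float, mass2: float, distance: float,
                   exponent: float = 1.5) -> float:
    """Hypothetical stand-in for the tool: F = G * m1 * m2 / r**exponent.

    The shifted exponent (1.5 instead of 2) is invented for this sketch.
    """
    return G * mass1 * mass2 / distance ** exponent


def fit_distance_exponent(distances: Sequence[float],
                          mass1: float = 1.0e3,
                          mass2: float = 1.0e3) -> float:
    """Estimate d(log F)/d(log r) by ordinary least squares."""
    xs = [math.log(r) for r in distances]
    ys = [math.log(run_experiment(mass1, mass2, r)) for r in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


slope = fit_distance_exponent([1.0, 2.0, 4.0, 8.0, 16.0])
print(f"estimated exponent on r: {slope:.3f}")
```

A slope close to -1.5 rather than -2 is the kind of evidence that would lead an agent to submit the shifted form instead of the textbook law.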

iv. License information

  • Code: Apache 2.0.
  • Data: Apache 2.0.
  • NewtonBench Benchmark: MIT (Copyright (c) 2025 HKUST-KnowComp).

2) Environment validity check

i. Commands used to collect rollouts

# Start NeMo Gym servers (agent + NewtonBench)
config_paths="resources_servers/newton_bench/configs/newton_bench.yaml,\
responses_api_models/vllm_model/configs/vllm_model.yaml"
ng_run "+config_paths=[$config_paths]"

# Collect sample rollouts
ng_collect_rollouts \
    +agent_name=newton_bench_simple_agent \
    +input_jsonl_fpath=resources_servers/newton_bench/data/example.jsonl \
    +output_jsonl_fpath=resources_servers/newton_bench/data/example_rollouts.jsonl \
    +limit=5

# View rollouts
ng_viewer +jsonl_fpath=resources_servers/newton_bench/data/example_rollouts.jsonl

ii. Resulting rollouts (5 examples)

See resources_servers/newton_bench/data/example_rollouts.jsonl
Expected behavior:

  • Agent performs several experiments, analyzes data, and submits a scientific law.
  • Successful discovery $\rightarrow$ positive reward ($\approx$ 1.0).
  • Failed discovery $\rightarrow$ reward $\approx$ 0.0 or negative.

3) Tests

i. Commands used to run the tests

source resources_servers/newton_bench/.venv/bin/activate 
pytest resources_servers/newton_bench/tests/test_app.py

Coverage notes:
Resource server tests provide comprehensive coverage of the following areas:

  • Session Lifecycle: Successful seeding, error handling for invalid modules, session ending, and background cleanup.
  • Experiment Execution: Dynamic handler registration for each module, basic run_experiment execution, and error handling for uninitialized sessions, mismatched module calls, etc.
  • Python Sandbox: Basic execution, session-based code persistence, timeout enforcement, and security validation (restricting dangerous imports/operations).
  • Verification Logic: Law extraction from diverse response structures, and reward calculation via symbolic equivalence (LLM judge) and numeric RMSLE.
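
As a rough illustration of the "security validation" item above, the sketch below statically scans submitted code for disallowed imports and calls before it would be executed. It is not the sandbox implemented in this PR: the allowlist, the blocklist, and the function name are hypothetical, and a static check like this is only one layer that would normally sit alongside timeouts and process isolation.

```python
import ast
from typing import List

# Hypothetical policy for this sketch only.
ALLOWED_IMPORTS = {"math"}
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def find_violations(source: str) -> List[str]:
    """Return a description of each disallowed import or call in the source."""
    violations: List[str] = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # Compare on the top-level package name, e.g. "os" for "os.path".
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root not in ALLOWED_IMPORTS:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to {node.func.id}()")
    return violations
```

A test in the spirit of the coverage notes would assert that `find_violations("import os")` is non-empty while code that only uses `math` passes cleanly.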

4) Reward profiling

Model: Qwen/Qwen3-VL-8B-Thinking

Method:

  • 108 prompts based on version v0 of scientific laws.
  • 4 rollouts per prompt (432 total).
  • run_experiment tool calling is enabled, and the agent loops until it submits a law.

Results:
Overall Metrics

  • Total Rollouts: 432
  • Mean Reward: $\approx$ 0.0675
  • Median Reward: 0.0
  • Min Reward: $\approx$ -0.8786
  • Max Reward: 1.0

Tool Call Statistics

  • Average Tool Calls: 22.95 per rollout
  • Min Tool Calls: 0
  • Max Tool Calls: 1770
  • Correlation (tool calls $\leftrightarrow$ reward): $\approx$ -0.0211 (negligible, slightly negative)
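
For reference, a correlation like the one above can be computed as follows. This sketch assumes the Pearson coefficient; the PR does not say which coefficient was used, and the function name is illustrative.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

Because Pearson correlation is sensitive to outliers, a single rollout with 1770 tool calls against an average of 22.95 can dominate the coefficient; a rank-based measure may be a useful cross-check.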

Reward Distribution (Buckets)

| Reward Range | Count |
| --- | --- |
| [-1.0, -0.8) | 16 |
| [-0.8, -0.6) | 16 |
| [-0.6, -0.4) | 60 |
| [-0.4, -0.2) | 39 |
| [-0.2, 0.0) | 24 |
| [0.0, 0.2) | 150 |
| [0.2, 0.4) | 46 |
| [0.4, 0.6) | 2 |
| [0.6, 0.8) | 1 |
| [0.8, 1.0] | 78 |
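
The bucketing convention used here (half-open intervals, with the last bucket closed at 1.0) can be reproduced with a short helper. This is a sketch with hypothetical names, not the profiling script from the PR; it uses explicit edges with `bisect_right` so that boundary values such as -0.8 land in the bucket that starts at them.

```python
from bisect import bisect_right
from typing import Iterable, List

# Interior edges: bucket 0 is [-1.0, -0.8), bucket 9 is [0.8, 1.0].
EDGES = [-0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8]

# Counts copied from the reward distribution reported in this PR.
REPORTED_COUNTS = [16, 16, 60, 39, 24, 150, 46, 2, 1, 78]


def bucket_counts(rewards: Iterable[float]) -> List[int]:
    """Count rewards per bucket; rewards must lie in [-1.0, 1.0]."""
    counts = [0] * (len(EDGES) + 1)
    for reward in rewards:
        if not -1.0 <= reward <= 1.0:
            raise ValueError(f"reward out of range: {reward}")
        counts[bisect_right(EDGES, reward)] += 1
    return counts


# The reported buckets account for every rollout.
assert sum(REPORTED_COUNTS) == 432
```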

Performance by Tool Call Count Bins

| Tool Call Range | Rollouts (n) | Mean Reward |
| --- | --- | --- |
| 0 | 23 | $\approx$ -0.1112 |
| 1–10 | 329 | $\approx$ 0.0824 |
| 11–50 | 60 | $\approx$ 0.1308 |
| 51–200 | 15 | $\approx$ -0.1959 |
| 201–2000 | 5 | $\approx$ -0.0600 |
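
As a consistency check, the per-bin means should recombine into the overall mean reward when weighted by rollout count. The short script below does that arithmetic with the values reported in this PR; the variable names are illustrative.

```python
# (tool call range, rollouts, mean reward) as reported in this PR.
BINS = [
    ("0", 23, -0.1112),
    ("1-10", 329, 0.0824),
    ("11-50", 60, 0.1308),
    ("51-200", 15, -0.1959),
    ("201-2000", 5, -0.0600),
]

total_rollouts = sum(n for _, n, _ in BINS)
weighted_mean = sum(n * mean for _, n, mean in BINS) / total_rollouts

print(f"rollouts: {total_rollouts}, weighted mean reward: {weighted_mean:.4f}")
```

The bins sum to 432 rollouts and the weighted mean comes out at roughly 0.0675, which agrees with the overall mean reported under Overall Metrics up to rounding of the per-bin values.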

Key observations:

  • Symbolic Accuracy: Approximately 19.7% symbolic accuracy and a wide RMSLE distribution indicate frequent failures to recover exact symbolic forms or precise numeric behavior.
  • Reward Distribution: Rewards cluster near zero (median 0.0, mean ~0.0675): 150 of 432 rollouts fall in [0.0, 0.2), 155 are negative, and 78 reach [0.8, 1.0], reflecting frequent partial or failed discoveries.
  • Tool Usage Sweet Spot: Mean reward is positive only with moderate tool use (1–50 calls) and peaks in the 11–50 range ($\approx$ 0.1308), whereas rollouts with no tool calls average $\approx$ -0.1112, suggesting that tool-driven data collection matters for inducing scientific laws.
  • Diminishing Returns: Mean reward falls to $\approx$ -0.1959 for 51–200 calls and $\approx$ -0.0600 for 201–2000 calls, although these bins contain only 15 and 5 rollouts. This suggests that additional tool calls stop helping and that successful discovery depends on reasoning and hypothesis selection rather than raw data volume.

cmunley1 and others added 27 commits October 29, 2025 11:46
Signed-off-by: cmunley1 <cmunley@nvidia.com>
commit 647d1e5
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Fri Dec 19 18:40:39 2025 -0800

    Remove PlainTextResponse response_class (NVIDIA-NeMo#544)

    https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079

    Initially in NVIDIA-NeMo#290 , the `response_class=PlainTextResponse` was added to
    the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt
    to debug parsing server info for the `ng_status` command. This led to a
    parsing error in `load_from_global_config`. This command now uses its
    own separate endpoint `server_instances`, so this needs to be removed.

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit f250e0c
Author: cmunley1 <cmunley@nvidia.com>
Date:   Fri Dec 19 16:38:29 2025 -0800

    docs: remove trl docs (NVIDIA-NeMo#543)

    remove trl from docs, leaving just unsloth.

    was unclear that they are together.

    will make a trl section when we have a standalone trl notebook, or a
    section on trl's docs too.

    ---------

    Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit 34a2b0f
Author: cmunley1 <cmunley@nvidia.com>
Date:   Fri Dec 19 14:01:56 2025 -0800

    add unsloth and trl to docs  (NVIDIA-NeMo#536)

    adds a section for single-step training with unsloth and trl

    not sure if these should be broken into separate sections. Left as one
    since the same notebook works for both, but could be confusing.

    not sure if we should also add more info about multi-step (hopefully)
    coming soon.

    Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit 146b1a5
Author: cmunley1 <cmunley@nvidia.com>
Date:   Fri Dec 19 12:56:33 2025 -0800

    python flag for colab venv installation (NVIDIA-NeMo#526)

    need to set uv pip install python flag in colab environments when
    launching servers

    usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true `

    defaults to false

    For NVIDIA-NeMo#370

    Needed for notebook here:
    https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym

    ---------

    Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit ba2153a
Author: cmunley1 <cmunley@nvidia.com>
Date:   Fri Dec 19 10:42:44 2025 -0800

    Salesforce xlam-function-calling-60k resources server (NVIDIA-NeMo#262)

    function calling resources server based on
    https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k

    ---------

    Signed-off-by: Christian Munley <cmunley@nvidia.com>
    Signed-off-by: cmunley1 <cmunley@nvidia.com>

commit 29d3511
Author: pjin-nvidia <pjin@nvidia.com>
Date:   Fri Dec 19 10:28:28 2025 -0800

    VLLMModel supports chat template kwargs (NVIDIA-NeMo#538)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Signed-off-by: Peter Jin <pjin@nvidia.com>

commit 7d8fdda
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Wed Dec 17 18:38:18 2025 -0800

    List running server health and status (NVIDIA-NeMo#290)

    This implements the `ng_status` command to list all running servers on
    the system and ping for health check.

    ---------

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 076d002
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Tue Dec 16 10:25:14 2025 -0800

    Debug server package versions (NVIDIA-NeMo#406)

    Adds `ng_pip_list` command to see the underlying uv pip list of the
    specified environment.

    ---------

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit c192ee4
Author: Lawrence Lane <llane@nvidia.com>
Date:   Tue Dec 16 12:19:31 2025 -0500

    docs settings update (NVIDIA-NeMo#525)

    Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 8ca39d6
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Mon Dec 15 19:56:03 2025 -0800

    docs: Miscellaneous GRPO tutorial fixes (NVIDIA-NeMo#512)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 1539b2b
Author: Lawrence Lane <llane@nvidia.com>
Date:   Mon Dec 15 18:28:11 2025 -0500

    docs: redirect setup (NVIDIA-NeMo#513)

    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 96ccdfc
Author: cmunley1 <cmunley@nvidia.com>
Date:   Mon Dec 15 14:31:59 2025 -0800

    reasoning-gym resource server (NVIDIA-NeMo#113)

    single turn tasks across various domains: "Reasoning Gym is a
    community-created Python library of procedural dataset generators and
    algorithmically verifiable reasoning environments for training reasoning
    models with reinforcement learning (RL). The goal is to generate
    virtually infinite training data with adjustable complexity.

    It currently provides more than 100 tasks over many domains, including
    but not limited to algebra, arithmetic, computation, cognition,
    geometry, graph theory, logic, and many common games."

    Tested all 100+ environments for errors, and tested training on many,
    demonstrated convergence.

    This dataset of 100+ environments is also used in ProRL
    (https://arxiv.org/abs/2505.24864)

    ---------

    Signed-off-by: cmunley1 <cmunley@nvidia.com>
    Signed-off-by: Christian Munley <cmunley@nvidia.com>
    Co-authored-by: ARC Bot <arc-bot@example.com>

commit 8c4c5e3
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Sun Dec 14 16:38:21 2025 -0800

    Bump to v0.2.0 (NVIDIA-NeMo#510)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 3897ff4
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Sun Dec 14 16:28:58 2025 -0800

    Change to v0.1.1 release version (NVIDIA-NeMo#509)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit b1bf0f4
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Sun Dec 14 16:24:49 2025 -0800

    Update dataset configs with HuggingFace links (NVIDIA-NeMo#508)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 9a9177e
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Sun Dec 14 16:12:06 2025 -0800

    docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] (NVIDIA-NeMo#481)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Signed-off-by: Frankie Siino <fsiino@nvidia.com>
    Co-authored-by: L.B. <llane@nvidia.com>
    Co-authored-by: Frankie Siino <fsiino@nvidia.com>

commit d3646c5
Author: Chris Wing <cwing@nvidia.com>
Date:   Fri Dec 12 12:20:25 2025 -0800

    Reorder README structure (NVIDIA-NeMo#501)

    move available environments higher up in the README after the quickstart

    Signed-off-by: Chris Wing <cwing@nvidia.com>

commit b9cf8b2
Author: Chris Wing <cwing@nvidia.com>
Date:   Fri Dec 12 08:13:32 2025 -0800

    Simplify contributing.md (NVIDIA-NeMo#500)

    added links to contribute section of docs site and removed redundant
    content.
    links need to be verified after NVIDIA-NeMo#498 is merged to main

    ---------

    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit eabcbcf
Author: Chris Wing <cwing@nvidia.com>
Date:   Fri Dec 12 07:43:04 2025 -0800

    FAQ cleanup (NVIDIA-NeMo#499)

    This PR removes redundant content from the FAQ and better organizes the
    documentation structure.

    **Removed redundant FAQ sections** now covered in dedicated
    documentation:
    - `ng_version` → `docs/reference/cli-commands.md`
    - Config anatomy → `docs/reference/configuration.md` (section was
    incomplete TODO)
    - DCO and commit signing → `CONTRIBUTING.md` and
    `docs/contribute/development-setup.md`
    - Copyright errors → `docs/contribute/development-setup.md`
    - CI/CD requirements → `docs/contribute/development-setup.md`

    **Reorganized FAQ placement:**
    - Moved `docs/how-to-faq.md` → `docs/reference/faq.md` (consistent with
    other reference docs)
    - Repositioned FAQ to bottom of Reference section (after Configuration,
    CLI Commands, API Reference)
    - Updated intro to clarify FAQ provides quick answers while
    comprehensive docs are developed

    ---------

    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit fc59615
Author: Chris Wing <cwing@nvidia.com>
Date:   Fri Dec 12 07:38:48 2025 -0800

    Add environment contribution docs (NVIDIA-NeMo#498)

    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Co-authored-by: Lawrence Lane <llane@nvidia.com>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 39ee39e
Author: Chris Wing <cwing@nvidia.com>
Date:   Thu Dec 11 15:52:13 2025 -0800

    Docs: Contribution Home & Dev Setup (NVIDIA-NeMo#494)

    Added types of contribution to contribution overview and replicated dev
    setup instructions from contributing.md to docs

    ---------

    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit aa48c20
Author: Chris Wing <cwing@nvidia.com>
Date:   Thu Dec 11 14:16:47 2025 -0800

    improve framing of training framework integration guide for contributing (NVIDIA-NeMo#493)

    Make it more clear this guide is for contributing training framework
    integrations

    Signed-off-by: Chris Wing <cwing@nvidia.com>

commit a4cfd5e
Author: pjin-nvidia <pjin@nvidia.com>
Date:   Thu Dec 11 13:31:09 2025 -0800

    Misc rollout fixes (NVIDIA-NeMo#447)

    Signed-off-by: Peter Jin <pjin@nvidia.com>

commit def5fdd
Author: L.B. <llane@nvidia.com>
Date:   Thu Dec 11 15:00:38 2025 -0500

    docs: contribute section (NVIDIA-NeMo#490)

    - move training content into new contribute section
    - create contributing overview page
    - add contributing section on home page with link to RL integrations
    content hub

    ---------

    Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 8f4d638
Author: L.B. <llane@nvidia.com>
Date:   Thu Dec 11 14:17:03 2025 -0500

    docs: move FAQ (NVIDIA-NeMo#489)

    moves how-to-faq to render under "references" and display as FAQ. no
    material changes to the content.

    Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 54b21db
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Thu Dec 11 10:27:28 2025 -0800

    Fix NeMo Gym Pyproject links (NVIDIA-NeMo#486)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 82f0f0c
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Thu Dec 11 10:18:58 2025 -0800

    More single tool call filename updates cont (NVIDIA-NeMo#484)

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 8654ecf
Author: L.B. <llane@nvidia.com>
Date:   Wed Dec 10 22:08:20 2025 -0500

    docs: home pg, quickstart move, gh icon (NVIDIA-NeMo#463)

    - adds GH icon + link to global top nav
    - rebuilds the home page to standard layout
    - adds CTA to quickstart and tutorials
    - moves quickstart into get started
    - clarifies differences between the quickstart and more detailed
    onboarding materials

    ---------

    Signed-off-by: Lawrence Lane <llane@nvidia.com>
    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Co-authored-by: Chris Wing <cwing@nvidia.com>

commit c345e5d
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Wed Dec 10 19:05:20 2025 -0800

    Fix duplicate reference sections (NVIDIA-NeMo#483)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit be25806
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Wed Dec 10 17:24:13 2025 -0800

    docs: Fix wrong count vs actual (NVIDIA-NeMo#482)

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit a3417ce
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Wed Dec 10 16:58:55 2025 -0800

    More single tool call filename updates (NVIDIA-NeMo#480)

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 25808bf
Author: fsiino-nvidia <fsiino@nvidia.com>
Date:   Wed Dec 10 16:36:05 2025 -0800

    Rename examples simple_weather and stateful_counter (NVIDIA-NeMo#479)

    Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit bf0b0c5
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date:   Wed Dec 10 15:44:25 2025 -0800

    Expose server host and port in dataset viewer CLI (NVIDIA-NeMo#476)

    Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/126

    @bxyu-nvidia Per the issue, the PR also changes the default
    `server_host` to `0.0.0.0` (accessible from everywhere). But I would
    advise against this for security reasons. I think keeping the default to
    `127.0.0.1` is the right call even if the user needs to modify the
    command to access the server.

    ---------

    Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit 993543a
Author: pjin-nvidia <pjin@nvidia.com>
Date:   Wed Dec 10 14:38:22 2025 -0800

    Miscellaneous infra improvements/fixes (NVIDIA-NeMo#317)

    should resolve NVIDIA-NeMo#342

    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: Peter Jin <pjin@nvidia.com>

commit 845bf71
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date:   Wed Dec 10 14:15:07 2025 -0800

    pyproject typos and grammar fixes (NVIDIA-NeMo#473)

    Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/132

    Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit 81a0013
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Wed Dec 10 14:11:08 2025 -0800

    docs: Improve server reference info (NVIDIA-NeMo#474)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 1d78f22
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Wed Dec 10 13:50:27 2025 -0800

    Bug: inconsistent documentation around servers running (NVIDIA-NeMo#472)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 9f26473
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Wed Dec 10 13:25:42 2025 -0800

    docs: Training framework integration (NVIDIA-NeMo#439)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit f67fa48
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date:   Wed Dec 10 13:19:24 2025 -0800

    Remove penguin references (NVIDIA-NeMo#469)

    After this PR, the only remaining penguin references are in the NeMo-RL
    tutorial, but these should be fixed with tutorial rewrite.

    Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/131

    Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit eecb93c
Author: L.B. <llane@nvidia.com>
Date:   Wed Dec 10 16:13:44 2025 -0500

    docs(readme): fix Example Resource Servers table - correct Multi Step… (NVIDIA-NeMo#464)

    Update 'Demonstrates' column for Multi Step example:
    - Before: Instruction_Following example
    - After: Multi-step tool calling

    Fixes NVIDIA-NeMo#417

    ---------

    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 0e367c2
Author: Sanjay Kariyappa <sanjaykariyappa@users.noreply.github.com>
Date:   Thu Dec 11 02:38:51 2025 +0530

    add calendar env for multi-turn IF (NVIDIA-NeMo#297)

    This PR introduces the **Calendar Resource Server**, a new training
    environment that challenges models to schedule multiple events on a
    calendar while satisfying complex temporal constraints. The constraints
    are mentioned in a multi-turn conversation format (generated
    synthetically using a role-playing model). Achieving high performance on
    this benchmark requires the model to satisfy constraints mentioned in
    different user turns. When trained on this synthetic dataset, we observe
    an improvement in the model's multi-turn instruction following ability.

    The Calendar environment simulates a realistic scheduling task where an
    AI agent must:
    - Schedule multiple events within a working day time window
    - Satisfy various temporal constraints:
      - **"before"**: Event must end before a specific time
      - **"after"**: Event must start after a specific time
      - **"between"**: Event must start and end within a time window
      - **"at"**: Event must start at an exact time
    - Ensure no time conflicts between events
    - Match exact event durations
    - Stay within global min/max time boundaries

    This environment tests an agent's ability to:
    - Parse and understand natural language constraints.
    - Follow instructions that are mentioned in multiple user messages.
    - Infer scheduling conflicts and satisfy multiple constraints
    simultaneously.
    - Perform temporal reasoning and arithmetic.
    - **4 constraint types**: before, after, between, at
    - **Time window enforcement**: Global min/max boundaries for all events
    - **Conflict detection**: Automatic validation of event overlaps
    - **Duration matching**: Exact duration requirements per event
    The server includes a robust verification pipeline that:
    - Extracts JSON schedules from model responses
    - Validates all temporal constraints
    - Detects overlapping events
    - Returns binary rewards (1 for valid, 0 for invalid)
    - Filters out responses with thinking tags (`<think>`)
    - Script to generate diverse scheduling scenarios
    - Configurable number of events and constraint types
    - Natural language constraint descriptions
    - Validation data included
    - Tests for each constraint type (valid and violation cases)
    - Edge cases: empty schedules, wrong event counts, time conflicts
    - Complex multi-event scenarios
    Qwen3-8b shows steady improvement in rewards when trained with GRPO with
    a dataset of 4K synthetic samples. Wandb logs are below.

    https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/t4v06nbg
    https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/70yc23ew
    https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/1jnwuhi3

    ---------

    Signed-off-by: Sanjay Kariyappa <skariyappa@nvidia.com>
    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit a182171
Author: bxyu-nvidia <bxyu@nvidia.com>
Date:   Wed Dec 10 12:58:07 2025 -0800

    Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material (NVIDIA-NeMo#470)

    Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit d8ecb8b
Author: Chris Wing <cwing@nvidia.com>
Date:   Wed Dec 10 10:56:29 2025 -0800

    Add benefits to About page aligned with README (NVIDIA-NeMo#452)

    Fixes NVIDIA-NeMo#451

    Signed-off-by: Chris Wing <cwing@nvidia.com>

commit e08906c
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date:   Wed Dec 10 10:35:33 2025 -0800

    docs: Moved configuration system under about (NVIDIA-NeMo#420)

    Moved configuration systems under "About" instead of "About>Concepts".
    Also removed configuration mentions and examples from core abstraction
    pages

    Closes NVIDIA-NeMo#392 and
    NVIDIA-NeMo#393

    ---------

    Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
    Signed-off-by: L.B <llane@nvidia.com>
    Signed-off-by: Chris Wing <cwing@nvidia.com>
    Signed-off-by: Brian Yu <bxyu@nvidia.com>
    Co-authored-by: L.B <llane@nvidia.com>
    Co-authored-by: Chris Wing <cwing@nvidia.com>
    Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 7aa8306
Author: Chris Wing <cwing@nvidia.com>
Date:   Wed Dec 10 05:59:38 2025 -0800

    Add Data Designer and links to ecosystem page (NVIDIA-NeMo#462)

    Fixes NVIDIA-NeMo#450

    Signed-off-by: Chris Wing <cwing@nvidia.com>

commit 287d08d
Author: Chris Wing <cwing@nvidia.com>
Date:   Tue Dec 9 12:45:35 2025 -0800

    Change NeMo Gym from framework to library (NVIDIA-NeMo#456)

    Changed description of NeMo Gym from a framework to library for
    consistency across NeMo products

    Signed-off-by: Chris Wing <cwing@nvidia.com>
- Add math to pre-imported libraries
- Implement session TTL and background cleanup for expired sessions
- Update the dataset tool description to reflect available libraries
- Expanded the README with detailed instructions for dataset generation, rollout collection, and testing
- Added example_rollouts.jsonl and updated example.jsonl
- Improved generate_dataset.py to support new CLI options for dataset customization
@Kelvin0110 Kelvin0110 requested a review from a team as a code owner February 5, 2026 07:53
@copy-pr-bot

copy-pr-bot bot commented Feb 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cmunley1 cmunley1 self-requested a review February 5, 2026 22:41
@cmunley1
Contributor

cmunley1 commented Feb 5, 2026

can you please merge main?

@Kelvin0110
Author

Sure, I’ve merged the latest main branch. Please let me know if you’d like me to take any further steps.

@cmunley1
Contributor

cmunley1 commented Feb 6, 2026

have you tried training with NeMo RL (ideally we can test training before merging)? Also, I see you used a vision language model, does anything require vision here (not an issue, just curious) ?

@cmunley1
Contributor

cmunley1 commented Feb 6, 2026

@cmunley1
Contributor

cmunley1 commented Feb 6, 2026
