@tohtana tohtana commented Sep 11, 2025

This PR adds a template that demonstrates how to use Ray Train with DeepSpeed.

At a high level, this template covers:

  • A hands-on example of fine-tuning a language model.
  • Saving and loading model checkpoints with Ray Train and DeepSpeed.
  • Key DeepSpeed configurations (ZeRO stages, offloading, mixed precision).
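
To make the configuration bullet concrete, here is a minimal sketch of how the three knobs it names (ZeRO stage, offloading, mixed precision) might appear in a DeepSpeed config dictionary. The helper `build_deepspeed_config` and its default values are hypothetical and are not taken from the template itself; the dictionary keys (`zero_optimization`, `offload_optimizer`, `fp16`, `bf16`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`) follow DeepSpeed's JSON config schema.

```python
from typing import Any, Dict


def build_deepspeed_config(
    zero_stage: int = 2,
    offload_optimizer: bool = False,
    precision: str = "fp16",
    micro_batch_size: int = 1,
    grad_accum_steps: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a DeepSpeed config dict.

    Illustrative only; the template's actual settings may differ.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")
    if precision not in ("fp16", "bf16", "fp32"):
        raise ValueError("precision must be 'fp16', 'bf16', or 'fp32'")

    zero: Dict[str, Any] = {"stage": zero_stage}
    if offload_optimizer:
        # Assumption for this sketch: offloading only makes sense once
        # ZeRO is partitioning optimizer state (stage 1 or higher).
        if zero_stage < 1:
            raise ValueError("optimizer offload requires ZeRO stage >= 1")
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}

    config: Dict[str, Any] = {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": zero,
    }
    # Enable at most one reduced-precision mode; "fp32" enables neither.
    if precision == "fp16":
        config["fp16"] = {"enabled": True}
    elif precision == "bf16":
        config["bf16"] = {"enabled": True}
    return config


if __name__ == "__main__":
    # T4 GPUs have no native bf16 support, so fp16 is the likely choice there.
    print(build_deepspeed_config(zero_stage=2, offload_optimizer=True))
```

A dictionary like this is typically passed as the `config` argument of `deepspeed.initialize` inside the per-worker training function. Raising the ZeRO stage or enabling offload generally trades throughput for lower GPU memory use.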

Included artifacts:

  • README.md with step-by-step instructions.
  • A Jupyter notebook for interactive experimentation.
  • A standalone Python script for end-to-end runs.

Tested environment: 1 CPU head node + 2 NVIDIA T4 GPU worker nodes.

tohtana and others added 7 commits September 11, 2025 00:15
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>