[RELEASE] Community reimplementation using TRL SFTTrainer

Hi, great work on VLA-0! The simplicity of representing actions as text tokens is really elegant.

I made a minimal reimplementation using Hugging Face TRL's SFTTrainer for my own learning:
https://github.com/MilkClouds/vla0-trl

A few notes:
- ~1,200 lines (relies heavily on TRL abstractions)
- Flash Attention enabled by default
- Tested on LIBERO, gets ~90% avg (slightly below paper results, likely due to hyperparameter differences)

Not intended as a replacement—just a simpler entry point for those already familiar with the HF/TRL ecosystem. All credit goes to the original authors.

If there's any concern about this, please let me know.

Thanks again for open-sourcing the code and weights!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RELEASE] Community reimplementation using TRL SFTTrainer #28

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RELEASE] Community reimplementation using TRL SFTTrainer #28

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions