Replies: 8 comments 4 replies
-
Thank you for your comments! You may find commit
LeRobot (at least at the time of benchmarking for the paper) indexes and stores a float timestamp for every value and synchronizes at run time. This can be bypassed in two ways: (1) an efficient, binary-friendly storage format that encodes relative times, which saves a lot of space on the timestamps mapping video frames to action data; (2) storing a fixed frequency in metadata, which saves further space; the data can then be saved in chunks with Parquet or other formats.
I haven't done further analysis, since that was the scale of the original datasets. It is probably worth redoing with new and larger datasets now, but at the time of writing the paper it was a few classic ones. I do agree with you that RLDS is efficient on large-scale datasets from what I have heard (no experimental data here, but the Octo paper seems to say so); I can't extrapolate further from the data to make any claims. I'm currently rewriting this as a Ray dataset to improve scalability.
The intuition is to keep both the memory-disk throughput and the CPU busy for RoboDM. HDF5 is definitely large and sometimes slow when loading at high throughput, which is what you observed in Fig. 4 in the previous question. RLDS seems to be a strange case for sequential access; it indeed overused a lot of CPU at the time of testing.
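A minimal sketch of the two bypasses described in (1) and (2) above, assuming a fixed recording frequency; all function and field names here are illustrative, not robodm's actual API:

```python
import struct

# Bypass (2): if spacing is regular, keep only a start time and a fixed
# frequency in metadata instead of one absolute float64 timestamp per value.
# Bypass (1): otherwise, fall back to compact relative deltas since start.
# Names are illustrative only, not robodm's actual API.

def encode_timestamps(timestamps, freq_hz):
    start = timestamps[0]
    expected = [start + i / freq_hz for i in range(len(timestamps))]
    if all(abs(a - b) < 1e-6 for a, b in zip(timestamps, expected)):
        # Regular spacing: 2 numbers + a count instead of N float64 values.
        return {"start": start, "freq_hz": freq_hz, "n": len(timestamps)}
    # Irregular spacing: microseconds since start, packed as uint32.
    deltas = [round((t - start) * 1e6) for t in timestamps]
    return {"start": start, "deltas_us": struct.pack(f"<{len(deltas)}I", *deltas)}

def decode_timestamps(meta):
    if "freq_hz" in meta:
        return [meta["start"] + i / meta["freq_hz"] for i in range(meta["n"])]
    deltas = struct.unpack(f"<{len(meta['deltas_us']) // 4}I", meta["deltas_us"])
    return [meta["start"] + d / 1e6 for d in deltas]

ts = [100.0 + i / 30.0 for i in range(300)]   # 10 s of 30 Hz frames
meta = encode_timestamps(ts, 30.0)
assert decode_timestamps(meta) == ts          # round-trips exactly
```

The fixed-frequency case collapses the entire timestamp column to constant-size metadata, which is where most of the space saving over per-value float storage comes from.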
-
@KeplerC May I ask what the difference is between commits a35a695 and 5bbb8b?
-
Thanks for your reply! I encountered another issue. Under 5bbb8b, when using https://github.com/BerkeleyAutomation/robodm/blob/5bbb8bdc8bc8e31b723bb2af319e8ac970beba45/examples/openx_loader.py to convert RLDS to the VLA format in the lossy mode, some trajectories end up with a file size of 0. Do you have an idea of the reason? Thanks!
-
Does this happen with all RLDS datasets or just a specific one? What's the minimal way to reproduce it? Have you tried the official OXE downloaded from Google Research (the loader might be hardcoded for OXE, if that's the issue)? Also, does it happen during the flushing process or before flushing?
Recently I've been maxed out by a few deadlines, so I can only be as responsive as possible, and unfortunately debugging an older commit may not be the highest priority. My priority list after the deadlines is: (1) solidify the current Ray dataset implementation, (2) connect robodm to a more general dataset converter and loader, (3) usability (e.g., partitions, multiprocess recording, ROS 2 integration, visualization) and CI/CD. Feel free to contribute or advise. If you need to use robodm as a baseline, feel free to use the main branch, as it should be more stable. If you enable all the compression options, etc., it should match what the original commit has. Data-loading throughput should also be maxed out as far as I have tested.
… On Nov 6, 2025, at 11:06 PM, Xinyu Zeng ***@***.***> wrote:
gentle ping @KeplerC <https://github.com/KeplerC>
-
I see. A quick test on my machine works. If you still want to see how the old branch behaves, send me your environment (pip/conda list or a Docker image) and I will try to reproduce it when I have time; I haven't touched the older commits for a while. Feel free to nudge me, and you can also email me privately to schedule a debugging session if that would help (after Dec 1).
Migrating the benchmark script over is a good idea. Feel free to open a PR, because I'm also interested to see how it goes with the Ray dataset.
… On Nov 6, 2025, at 11:39 PM, Xinyu Zeng ***@***.***> wrote:
Thanks for the quick response! It is the nyu_door_opening_surprising_effectiveness dataset. I will try to switch to the main branch, but it lacks the original benchmarking scripts (and perhaps also the LeRobot loader). Maybe I can cherry-pick the benchmarking scripts from the mkv branch onto the main branch. Take your time with the deadlines!
-
I made the benchmark work on the latest branch. Code is at https://github.com/XinyuZeng/robodm/tree/add_bench_to_latest; it contains a uv.lock for environment sync. The numbers I got do not match the paper, but I believe that is because of version updates in those formats, configs, etc. Aside from that:
1. Why does batch_size=8 mean loading 8 episodes in the experiment? I thought that in VLA training we should load 8 steps, not 8 full episodes.
2. When batch_size > 1, I think it makes a difference whether we (a) use ds.take(batch_size), as we can with RLDS and LeRobot, or (b) read one item at a time sequentially in a for-loop. I saw the original benchmark code contains both patterns, and I think maybe we should make them consistent.
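As an aside on the two read patterns being compared in this thread (a batched take versus a sequential for-loop), here is a generic sketch over a plain Python iterable; none of this is RLDS's or LeRobot's actual API:

```python
import itertools

# Pattern (a): ask the dataset for batch_size items in one call
# (analogous to ds.take(batch_size)); the loader can pipeline/prefetch
# behind this single request.
def load_batch_take(ds, batch_size):
    return list(itertools.islice(iter(ds), batch_size))

# Pattern (b): a plain for-loop reading one item at a time, which pays
# per-item overhead (decode, indexing) on the caller's thread each step.
def load_batch_loop(ds, batch_size):
    batch = []
    it = iter(ds)
    for _ in range(batch_size):
        batch.append(next(it))
    return batch

ds = range(100)  # stand-in dataset
assert load_batch_take(ds, 8) == load_batch_loop(ds, 8) == list(range(8))
```

Both patterns return the same data, so the benchmark concern is purely about throughput: mixing them across formats measures different per-item overheads rather than the formats themselves.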
-
Feel free to PR the benchmark script if you feel it's ready. I will try to reproduce your environment; it is likely caused by the version of AV, where the frame format doesn't match the nyu dataset, and the common symptom is that it fails silently.
On loading different trajectories: at the time of development, it was mainly inspired by the Octo paper's appendix section on data mixing (which was SOTA at the time), so a batch is a batch of trajectories drawn from different trajectories/datasets, rather than a batch of steps. For smaller-scale DP training, steps are indeed sufficient.
For the second question, just to make sure I understand your question and the context: my understanding of RLDS is that it has a prefetch buffer, so continuous reading makes sense to evaluate as long as the buffer is exhausted quickly; LeRobot (at least per my understanding at the time of benchmarking; I haven't followed it recently) extracts a single frame at a time.
On the side, I agree with you that we should also bring the evaluation up to today's data-loading conventions in policy training. We can work together to survey and figure out how to evaluate what training pipelines need. Do you think we could collaborate on a systematic benchmark for the community?
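The trajectory-versus-step distinction above can be sketched with a toy sampler (illustrative names only; this is not robodm's API, just the shape of the two granularities):

```python
import random

# `datasets` maps dataset name -> list of trajectories, where each
# trajectory is a list of steps. Names are illustrative only.

def sample_trajectory_batch(datasets, batch_size, rng):
    """Octo-style mixing: each batch element is a whole trajectory,
    drawn from a randomly chosen dataset."""
    names = list(datasets)
    return [rng.choice(datasets[rng.choice(names)]) for _ in range(batch_size)]

def sample_step_batch(datasets, batch_size, rng):
    """Step-level batching, as in smaller-scale DP training:
    each batch element is a single step from a random trajectory."""
    names = list(datasets)
    batch = []
    for _ in range(batch_size):
        traj = rng.choice(datasets[rng.choice(names)])
        batch.append(rng.choice(traj))
    return batch

rng = random.Random(0)
datasets = {
    "bridge": [[("obs", t) for t in range(5)] for _ in range(3)],
    "rt1":    [[("obs", t) for t in range(4)] for _ in range(2)],
}
traj_batch = sample_trajectory_batch(datasets, 8, rng)  # 8 whole episodes
step_batch = sample_step_batch(datasets, 8, rng)        # 8 individual steps
assert len(traj_batch) == 8 and isinstance(traj_batch[0], list)
assert len(step_batch) == 8 and isinstance(step_batch[0], tuple)
```

The key difference for a benchmark is I/O shape: trajectory batches read long sequential runs per element, while step batches issue many small random reads, which stresses formats very differently.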
…On Sat, Nov 8, 2025 at 1:07 AM Xinyu Zeng ***@***.***> wrote:
I made the benchmark work on the latest branch. Code at
https://github.com/XinyuZeng/robodm/tree/add_bench_to_latest. It contains
a uv.lock for environment sync. The numbers I got does not match the paper
but I believe it is because the versions update in those formats and
configs etc.
Aside from that,
1. Why in the experiment, batch_size=8 means loading 8 episodes? I
though in VLA training, we should load 8 steps, not 8 full episodes.
2. When batch_size>1, I think it makes a difference if we a) use
ds.take(batch_size) as we can use from RLDS and LeRobot, or b) read
one data at a time sequentially in a for-loop. I saw the original benchmark
code contains both the patterns and I think maybe we should make them
consistent.
-
I'd really appreciate it if you could help clarify those. Thanks! @KeplerC