Conversation
LucasLLC left a comment
This is awesome! Impressed to see so much progress in a short time span.
Some recommended next steps:

- Let's profile the current implementation and see what kind of speedup we're getting on batch put vs. non-batch put. Could be helpful to add some fine-grained logging (e.g. check out latency_tracker); see the timing sketch after this list.
- Next solid step would be to unpack the state dict within the storage volume.
- Once this is done, we can take a look at what it would take to "fetch in batch" as well.
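For the profiling point above, here's a minimal timing sketch. The `store.put` / `store.put_batch` names are placeholders assumed for illustration, not necessarily torchstore's actual API:

```python
import time


def time_puts(store, tensors: dict) -> tuple:
    """Return (per_tensor_secs, batched_secs) for the same payload.

    Assumes hypothetical `store.put(key, tensor)` and
    `store.put_batch(tensors)` methods; swap in the real API.
    """
    start = time.perf_counter()
    for key, tensor in tensors.items():
        store.put(key, tensor)  # one transport round trip per tensor
    per_tensor = time.perf_counter() - start

    start = time.perf_counter()
    store.put_batch(tensors)  # single buffer over the transport
    batched = time.perf_counter() - start
    return per_tensor, batched
```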
On the diff line `def _setup_process_group():`
I wonder if it's worth putting this function in a helper in tests/utils since it's used in multiple places?
https://github.com/meta-pytorch/torchstore/blob/main/tests/utils.py#L105
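For reference, a minimal sketch of what such a shared helper might look like, assuming a single-host gloo setup (the actual helper in tests/utils.py may differ):

```python
import os

import torch.distributed as dist


def _setup_process_group(rank: int = 0, world_size: int = 1) -> None:
    """Hypothetical sketch of a shared test helper; see tests/utils.py
    in the repo for the real implementation."""
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
```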
@daniellepintz can we land this as an experimental feature, even if it's not yet complete E2E? Otherwise let's close this PR. Thanks!
Yes, created a new PR in #113 without merge conflicts.
Concatenate tensors into one blob of bytes for sending across the transport (RDMA, Gloo, etc.) instead of sending them one by one. In theory this should be faster, since per-tensor sends pay the transport-buffer overhead repeatedly while a single blob pays it once.
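A hypothetical sketch of the pack/unpack idea, assuming a dict of tensors (e.g. a state dict); the `pack_tensors`/`unpack_tensors` names are illustrative, not torchstore's API:

```python
import torch


def pack_tensors(tensors: dict):
    """Flatten each tensor to raw bytes and concatenate into one blob."""
    meta, chunks, offset = {}, [], 0
    for key, t in tensors.items():
        raw = t.detach().contiguous().flatten().view(torch.uint8)
        meta[key] = (t.dtype, tuple(t.shape), offset, raw.numel())
        chunks.append(raw)
        offset += raw.numel()
    return torch.cat(chunks), meta  # one buffer for a single send


def unpack_tensors(blob: torch.Tensor, meta: dict):
    """Rebuild the original tensors from the blob using the metadata."""
    out = {}
    for key, (dtype, shape, offset, nbytes) in meta.items():
        raw = blob[offset : offset + nbytes]
        # clone() re-bases the storage at offset 0 so the dtype view is
        # alignment-safe regardless of where the tensor sat in the blob
        out[key] = raw.clone().view(dtype).view(shape)
    return out


# Round-trip check:
tensors = {"w": torch.randn(4, 4), "b": torch.arange(8)}
blob, meta = pack_tensors(tensors)
restored = unpack_tensors(blob, meta)
assert all(torch.equal(tensors[k], restored[k]) for k in tensors)
```

Sending one contiguous buffer amortizes the per-message transport overhead across all tensors, which is where the expected speedup comes from.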