Hi, thank you for such nice work! I wanted to ask whether by any chance you have a TorchScript version of the feature extractor available? It would make it much easier to incorporate into existing pipelines (I would only need to replace I3D with V-JEPA and the Fréchet distance with MMD). I tried to quickly prepare a TorchScript version myself (roughly along the lines of the sketch below the list), but stumbled upon some issues:
- I noticed that the positional embeddings for e.g. `vith16` do not load correctly if I use the V-JEPA model imported from your `vjepa` package, since the positional embeddings are ignored. From the tensor shapes, it looks like only the first-frame positional embeddings are kept (see the quick shape check below the list). Is that the intended behaviour?
RuntimeError: Error(s) in loading state_dict for VisionTransformer:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 1568, 1280]) from checkpoint, the shape in current model is torch.Size([1, 196, 1280]).
size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([1280, 3, 2, 16, 16]) from checkpoint, the shape in current model is torch.Size([1280, 3, 16, 16]).
- I couldn't find the source repository for the `vjepa` package. Is it public? From my understanding, the published PyPI package is different from the jepa repo.
- Judging by Fig 6, the vanilla pretrained version of V-JEPA works better, right?
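
For reference, this is roughly what my export attempt looks like. `VideoEncoderStub` is only a stand-in with the input contract I assume for the V-JEPA backbone (a `[B, C, T, H, W]` clip in, patch tokens out); the real encoder built from the `vjepa` package would replace it once the checkpoint loads correctly.

```python
import torch
import torch.nn as nn

class VideoEncoderStub(nn.Module):
    """Stand-in for the V-JEPA video backbone, just to illustrate the trace."""
    def __init__(self, embed_dim=1280, tubelet_size=2, patch_size=16):
        super().__init__()
        # Conv3d patch embedding matching the checkpoint shape [1280, 3, 2, 16, 16]
        self.proj = nn.Conv3d(3, embed_dim,
                              kernel_size=(tubelet_size, patch_size, patch_size),
                              stride=(tubelet_size, patch_size, patch_size))

    def forward(self, x):                        # x: [B, 3, T, H, W]
        x = self.proj(x)                         # [B, D, T/2, H/16, W/16]
        return x.flatten(2).transpose(1, 2)      # [B, N, D] patch tokens

encoder = VideoEncoderStub().eval()              # swap in the real V-JEPA encoder here
dummy_clip = torch.randn(1, 3, 16, 224, 224)     # 16-frame, 224x224 clip
with torch.no_grad():
    ts_module = torch.jit.trace(encoder, dummy_clip)
ts_module.save("vjepa_vith16_ts.pt")
print(ts_module(dummy_clip).shape)               # torch.Size([1, 1568, 1280])
```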
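
And this is the quick check I used to make sense of the size mismatch above. The checkpoint filename and the `"encoder"` key are my assumptions about how the weights are stored, so adjust as needed.

```python
import torch

# Assumed checkpoint path and nesting under an "encoder" key; adjust to the real file.
ckpt = torch.load("vith16.pth", map_location="cpu")
state = ckpt.get("encoder", ckpt) if isinstance(ckpt, dict) else ckpt

pos_embed = state["pos_embed"]       # torch.Size([1, 1568, 1280]) in the checkpoint
n_spatial = 224 // 16                # 14 patches per side
n_temporal = 16 // 2                 # 8 tubelets from a 16-frame clip (tubelet_size=2)
assert pos_embed.shape[1] == n_temporal * n_spatial ** 2   # 8 * 196 = 1568

# The model I get from the package only allocates 196 = 14 * 14 positions,
# i.e. the spatial grid of a single frame, which is what triggers the mismatch.
```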