
TorchScript availability #5

@universome

Description


Hi, thank you for such nice work! I wanted to ask whether you by any chance have a TorchScript version of the feature extractor available? It would make it much easier to incorporate into existing pipelines (I would only need to replace I3D with V-JEPA and the Fréchet distance with MMD). I tried to quickly prepare a TorchScript version myself (a rough sketch of my attempt is included after the list below), but stumbled upon some issues:

  • I noticed that the positional embeddings for e.g. vith16 do not load correctly if I use the model imported from your vjepa package, since the positional embeddings are ignored. From the shape sizes, it seems like you only keep the first-frame PEs? Is that the intended behaviour?
RuntimeError: Error(s) in loading state_dict for VisionTransformer:
        size mismatch for pos_embed: copying a param with shape torch.Size([1, 1568, 1280]) from checkpoint, the shape in current model is torch.Size([1, 196, 1280]).
        size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([1280, 3, 2, 16, 16]) from checkpoint, the shape in current model is torch.Size([1280, 3, 16, 16]).
  • I couldn't find the source repo for the vjepa package; is it public? (From my understanding, the published PyPI package is different from the jepa repo.)
  • Judging by Fig 6, the vanilla pretrained version of V-JEPA works better, right?
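For context, this is roughly the export I was attempting. It is only a sketch: the factory function, the constructor arguments (num_frames, tubelet_size), and the checkpoint key names follow my reading of the jepa repo and may not match what the vjepa package actually expects.

```python
import torch
from src.models.vision_transformer import vit_huge  # import path as in the jepa repo

# Build the *video* ViT so the shapes match the checkpoint:
# (16 frames / tubelet_size 2) * (224/16)^2 spatial patches = 8 * 196 = 1568 tokens,
# which matches the checkpoint's pos_embed shape [1, 1568, 1280] for vith16.
model = vit_huge(patch_size=16, num_frames=16, tubelet_size=2)

ckpt = torch.load("vith16.pth.tar", map_location="cpu")  # placeholder filename
state_dict = ckpt.get("encoder", ckpt)                   # key name is a guess
state_dict = {k.replace("module.", "").replace("backbone.", ""): v
              for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)
model.eval()

# Trace with a video-shaped input (B, C, T, H, W); scripting should also work
# if the forward pass has no data-dependent control flow.
example = torch.randn(1, 3, 16, 224, 224)
with torch.no_grad():
    ts_model = torch.jit.trace(model, example)
ts_model.save("vjepa_vith16_ts.pt")
```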
