
Feat: Frame-level Extraction and PyTorch API Updates#41

Open
TioSisai wants to merge 2 commits into fschmid56:main from TioSisai:main

Conversation

@TioSisai

This pull request introduces two main sets of changes: a new feature for frame-level embedding extraction and several updates to ensure compatibility with modern PyTorch versions by replacing deprecated APIs.


New Features:

Frame-level Feature Extraction:

  • Added a frame: bool parameter to the forward methods in both MobileNet (MN) and DyMN models.
  • When frame=True, the model preserves the temporal dimension during the final pooling stage, allowing for the extraction of frame-wise embeddings.
  • This enables more fine-grained temporal analysis, while maintaining backward compatibility with the default clip-level feature extraction.
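The pooling switch described above can be sketched as follows. This is a minimal illustration with assumed tensor names and an assumed channel count; it is not the PR's exact code, only the idea of preserving the temporal axis when `frame=True`:

```python
import torch
import torch.nn.functional as F

# Hypothetical backbone output: (batch, channels, freq, time).
x = torch.randn(2, 960, 4, 100)

# frame=True path: pool only the frequency axis, keep the time axis intact.
# (B, C, F, T) -> (B, C, 1, T) -> (B, T, C) frame-wise embeddings.
frame_embeddings = F.adaptive_avg_pool2d(x, (1, x.shape[-1])).squeeze(2).transpose(1, 2)

# frame=False (default) path: pool both axes into one clip-level vector.
clip_embedding = F.adaptive_avg_pool2d(x, (1, 1)).flatten(1)

print(frame_embeddings.shape)  # torch.Size([2, 100, 960])
print(clip_embedding.shape)    # torch.Size([2, 960])
```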

Fixes & Maintenance:

PyTorch API Modernization:

  • Replaced the deprecated ConvNormActivation with the current Conv2dNormActivation.
  • Updated torch.stft to use return_complex=True and calculated the power magnitude with torch.square(torch.abs(x)) to align with modern complex tensor handling.
  • Replaced torch.cuda.amp.autocast with the more general torch.amp.autocast.
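The modernized calls can be sketched as below. STFT parameters and the sample rate are assumptions for illustration, not the repository's actual values:

```python
import torch

waveform = torch.randn(1, 16000)
window = torch.hann_window(400)

# return_complex=True is required by current PyTorch; the old behavior of
# stacking real/imag parts in an extra trailing dimension is deprecated.
spec = torch.stft(waveform, n_fft=512, hop_length=160, win_length=400,
                  window=window, return_complex=True)

# Power magnitude computed from the complex spectrogram.
power = torch.square(torch.abs(spec))

# Device-agnostic autocast, replacing the CUDA-only torch.cuda.amp.autocast.
with torch.amp.autocast(device_type="cpu"):
    _ = power.mean()
```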

TioSisai added 2 commits July 15, 2025 16:14
- Replace the deprecated ConvNormActivation with Conv2dNormActivation
- Update torch.stft to use return_complex=True and compute the power magnitude from the complex-valued spectrogram via torch.square(torch.abs(x))
- Replace torch.cuda.amp.autocast with torch.amp.autocast for better device compatibility

These changes ensure compatibility with newer PyTorch versions while maintaining
backward compatibility and fixing deprecation warnings.
…odels

- Add 'frame' parameter to forward methods in MN and DyMN classes
- Modify _clf_forward and _forward_impl methods to support frame-level feature extraction
- Update adaptive pooling logic to preserve temporal dimension when frame=True
- Maintain backward compatibility with existing clip-level feature extraction
- Enable frame-wise embeddings output alongside classification results

This enhancement allows models to extract features at frame level (preserving temporal dimension)
in addition to the existing clip-level aggregation, enabling more fine-grained temporal analysis.
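The dual output described above can be illustrated with a toy stand-in module. The class below is hypothetical (it is not MN or DyMN), and only demonstrates the pattern of returning frame-wise embeddings alongside classification logits when `frame=True`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Hypothetical stand-in for MN/DyMN; shows the frame=True return pattern."""
    def __init__(self, channels=8, n_classes=5):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, 3, padding=1)
        self.head = nn.Linear(channels, n_classes)

    def forward(self, x, frame=False):
        feats = self.conv(x)                               # (B, C, F, T)
        clip = F.adaptive_avg_pool2d(feats, 1).flatten(1)  # (B, C)
        logits = self.head(clip)
        if frame:
            # Frame-wise embeddings: average out frequency, keep time.
            return logits, feats.mean(dim=2).transpose(1, 2)  # (B, T, C)
        return logits, clip

model = TinyBackbone()
x = torch.randn(2, 1, 64, 50)          # (batch, 1, mels, time)
logits, emb = model(x, frame=True)
print(logits.shape, emb.shape)         # torch.Size([2, 5]) torch.Size([2, 50, 8])
```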
