This repository contains the implementation of a gesture classification pipeline based on segment-based frame sampling and pretrained ResNeXt-101 feature extraction, followed by training lightweight MLP classifiers.
The work was done as part of a Computer Vision assignment at Leiden University.
For a detailed explanation of the methodology, experiments, and results, see `report.pdf`.
- Task: Hand gesture classification using the Jester dataset.
- Approach (a minimal code sketch follows the list below):
  - Videos are divided into temporal segments.
  - A single frame is sampled from each segment.
  - Features are extracted using a ResNeXt-101 CNN pretrained on ImageNet.
  - Features are fed into a Multi-Layer Perceptron (MLP) for classification.
- Key Findings:
  - Smaller MLPs perform better than larger ones.
  - Equidistant (first-frame) sampling outperforms random uniform sampling.
  - Reducing the number of segments from 8 to 4 halves training time with minimal accuracy loss.
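Below is a minimal sketch of the pipeline in PyTorch. It is illustrative only: the specific ResNeXt variant (torchvision's `resnext101_32x8d`), the hidden-layer size, the number of classes (27, the Jester v1 gesture classes), and the choice to concatenate per-segment features are assumptions made for the example rather than details taken from the training scripts.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

def sample_first_frames(frame_paths, num_segments=8):
    """Split a video's frame list into equal segments and take the first frame of each."""
    seg_len = max(1, len(frame_paths) // num_segments)
    return [frame_paths[min(i * seg_len, len(frame_paths) - 1)] for i in range(num_segments)]

# ResNeXt-101 pretrained on ImageNet, used as a frozen feature extractor
backbone = models.resnext101_32x8d(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()  # expose the 2048-d pooled features instead of ImageNet logits
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frame_paths, num_segments=8):
    """One 2048-d feature vector per segment, concatenated into a single input vector."""
    frames = [preprocess(Image.open(p).convert("RGB"))
              for p in sample_first_frames(frame_paths, num_segments)]
    feats = backbone(torch.stack(frames))  # (num_segments, 2048)
    return feats.flatten()                 # (num_segments * 2048,)

class GestureMLP(nn.Module):
    """Lightweight classifier on top of the precomputed frame features."""
    def __init__(self, num_segments=8, hidden=512, num_classes=27):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_segments * 2048, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```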
Six models were trained, varying in MLP architecture, frame sampling, and number of segments.
| Model | Params (MLP) | Frame Selection | Segments | Test Accuracy | Test Loss |
|---|---|---|---|---|---|
| 1 | 75.5M | Random (uniform) | 8 | 0.499 | 1.708 |
| 2 | 257.3M | Random (uniform) | 8 | 0.456 | 1.783 |
| 3 | 8.4M | Random (uniform) | 8 | 0.531 | 1.742 |
| 4 | 8.4M | First frame | 8 | 0.546 | 1.796 |
| 5 | 4.2M | First frame | 4 | 0.537 | 1.714 |
| 6 | 2.1M | First frame | 2 | 0.433 | 1.995 |
Plots of training time, accuracy, and loss can be found in the `plots/` directory.
- `helpers_scripts/`
  - `split_train_val.py` – Script to split the dataset into training and validation sets
- `test.ipynb` – Notebook for evaluating trained models
- `logs/`
  - `train_model1.log` – Training log for Model 1
  - `train_model2.log` – Training log for Model 2
  - `train_model3.log` – Training log for Model 3
  - `train_model4.log` – Training log for Model 4
  - `train_model5.log` – Training log for Model 5
  - `train_model6.log` – Training log for Model 6
- `metrics/`
  - `model_1/`
    - `accuracies.pkl` – Accuracy values across training
    - `losses.pkl` – Loss values across training
  - `model_2/` … `model_6/` – Same structure as `model_1/`
  - `test_acc.pkl` – Final test accuracies
  - `test_loss.pkl` – Final test losses
  - `time_taken.pkl` – Training times (approx., extracted from logs)
- `models_training_code/`
  - `model1.py` – Training script for Model 1
  - `model2.py` … `model6.py` – Training scripts for Models 2–6
  - `model_1.ipynb` – Notebook version of Model 1 training
  - `model_2.ipynb` … `model_6.ipynb` – Notebooks for Models 2–6
- `plots/`
  - `Plots.ipynb` – Notebook to generate result plots
  - `accuracies_plot.png` – Accuracy curves
  - `losses_plot.png` – Loss curves
  - `time_bar.png` – Training time comparison
- `splits/`
  - `jester-v1-labels.csv` – Gesture class labels
  - `jester-v1-train.csv` – Original training split (v1)
  - `jester-v1-validation.csv` – Original validation split (v1)
  - `train.csv` – Training set
  - `val.csv` – Validation set
  - `test.csv` – Test set
- `report.pdf` – Full project report
- `README.md` – Project documentation (this file)
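The pickled metrics can be inspected directly without rerunning training. A minimal sketch, assuming the pickles hold plain Python lists/dicts (adjust to whatever structures the training scripts actually store):

```python
import pickle
from pathlib import Path

metrics_dir = Path("metrics")

# Per-epoch training metrics for Model 1
with open(metrics_dir / "model_1" / "accuracies.pkl", "rb") as f:
    accuracies = pickle.load(f)
with open(metrics_dir / "model_1" / "losses.pkl", "rb") as f:
    losses = pickle.load(f)

# Final test accuracies across the six models
with open(metrics_dir / "test_acc.pkl", "rb") as f:
    test_acc = pickle.load(f)

print(type(accuracies), type(losses), type(test_acc))
```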
- Python 3.8+
- PyTorch
- torchvision
- pandas, numpy, matplotlib
- scikit-learn
Install dependencies:
`pip install -r requirements.txt`