This repository uses PyTorch Lightning to implement the models and training loop, Hydra to define the configurations, and Weights & Biases (wandb) to visualize training.
The experiments are defined as YAML files in the configs/experiments folder. For more detailed information on the
structure of the config files and how to create them, read configs/README.md. Here we cover the basics.
To run an experiment, call the run.py file with the name of the experiment:
```bash
python run.py +experiments=experiment_name  # without the .yaml extension
```

If the experiment name is inside a subfolder of the experiments directory, simply add it:

```bash
python run.py +experiments=folder_experiment/experiment_name
```

A specific command to run an actual experiment in this project would be:

```bash
python run.py +experiments=train/train_finegym/finegym_bbox_triplet_asym
```

which trains our model with bounding box representations for the short-sequence case in FineGym. All the other training configuration files are under the same directory.
To debug, add a debug configuration from the configs/debug folder, such as debug/debug.yaml. It is
recommended to add it after the experiment:

```bash
python run.py +experiments=experiment_name +debug=debug
```

For small changes that do not require creating a new experiment YAML file, we can add the parameters in the same command:
```bash
python run.py +experiments=experiment_name \
    wandb.name=new_name \
    dataset.dataloader_params.batch_size=8 \
    ++trainer.max_epochs=100 \
    +trainer.new_param_trainer=0.001 \
    ~trainer.profiler
```

where `~` removes a parameter from the configuration, `+` adds a non-existing parameter, and no prefix or `++` changes
an existing parameter.
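The prefix semantics can be illustrated with a small Python sketch that applies such overrides to a plain nested dict. This is a toy stand-in for illustration only: `apply_override` is a hypothetical helper, not part of this repository, and Hydra's real override grammar is far richer.

```python
def apply_override(cfg, override):
    """Toy illustration of the Hydra override prefixes, on a nested dict."""
    if override.startswith("~"):
        # ~param: remove the parameter from the configuration
        *parents, leaf = override[1:].split(".")
        node = cfg
        for p in parents:
            node = node[p]
        del node[leaf]
        return
    add = override.startswith("+")  # + adds a new key, ++ adds or overrides
    key, _, value = override.lstrip("+").partition("=")
    *parents, leaf = key.split(".")
    node = cfg
    for p in parents:
        node = node.setdefault(p, {}) if add else node[p]
    if not add and leaf not in node:
        raise KeyError(f"{key} does not exist; use + to add it")
    node[leaf] = value

cfg = {"trainer": {"max_epochs": 10, "profiler": "simple"}}
apply_override(cfg, "++trainer.max_epochs=100")      # change existing
apply_override(cfg, "+trainer.new_param_trainer=0.001")  # add new
apply_override(cfg, "~trainer.profiler")             # remove
```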
There are different options under `resume`:

- `load_all`. Resume the same training run. In this case we want to load the weights, the training state, and the wandb run, and keep the same configuration for the dataset and everything else. We explicitly make sure the whole config is the same, including dataset, dataloader, etc. The `id` has to be defined.
- `load_state`. Pre-train from a previous checkpoint, loading both the training state and the model.
- `load_model`. Pre-train from a previous checkpoint, loading only the model.
The priority is from top to bottom, so if `load_all` is true and `load_state` is false, `load_all` prevails.
If any of these options is set, either the `id` of the experiment we are loading from or the path of the checkpoint
we are loading from has to be set.
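The top-to-bottom priority can be sketched as a small helper. This is an illustrative stand-in, not the repository's actual resume logic; `resolve_resume_mode` is a hypothetical name.

```python
def resolve_resume_mode(resume_cfg):
    """Return the resume mode that wins, honouring top-to-bottom priority.

    resume_cfg is a dict of booleans, e.g. {"load_all": True, "load_state": False}.
    """
    for mode in ("load_all", "load_state", "load_model"):
        if resume_cfg.get(mode):
            return mode
    return None  # no resume option set: train from scratch

# load_all prevails even if load_state is also set
resolve_resume_mode({"load_all": True, "load_state": True})
```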
For `load_state` and `load_model` we do not check the configuration. Parameters like the learning rate will be
overwritten if set, but other configuration changes, such as a different optimizer or a different network size, will break the run because the load
will not work.
Resuming training with different parameters (like the learning rate) under the same wandb run and model folder is not supported, because it is confusing and bad for reproducibility. wandb logs each run separately in the filesystem (not in the web application) even when runs share the same id, which is good and clear, but still not enough for this feature to be supported.
The checkpoints and wandb logs are stored in the `wandb.save_dir` directory (under the `{wandb.project}` and `wandb`
folders, respectively). The checkpoints are stored under the experiment id (e.g. `1234abcd`), and the logs under the run
ID, which has the format `run-{date}_{time}-{id}`. The run name is not necessary; it is derived from the experiment
id.
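The layout described above can be made concrete with a short pathlib sketch. `output_dirs` is a hypothetical helper written for illustration; in practice these paths are created by wandb and Lightning, not by hand.

```python
from pathlib import Path

def output_dirs(save_dir, project, experiment_id, run_dirname):
    """Mirror the directory layout described above.

    Checkpoints: {save_dir}/{project}/{experiment_id}
    wandb logs:  {save_dir}/wandb/{run_dirname}, where run_dirname
                 looks like run-{date}_{time}-{id}.
    """
    ckpt_dir = Path(save_dir) / project / experiment_id
    log_dir = Path(save_dir) / "wandb" / run_dirname
    return ckpt_dir, log_dir

ckpt, log = output_dirs("outputs", "myproject", "1234abcd",
                        "run-20240101_120000-1234abcd")
```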
The logs are also stored online and can be accessed at https://wandb.ai (you will need to create an account). The
experiment ID can be found by opening a specific run, going to the overview ("info" sign, top left), and checking the "run
path". The wandb configuration can be changed in configs/wandb/wandb.yaml.
We use the FineGym (downloadable from this link), Diving48 (this link), and FisV (this link) datasets.
The process to obtain the keypoints that the model uses is described next.
There are three steps to obtain keypoints from data:
- Extract keypoints from either videos or images using OpenPose. The code in `extract_keypoints_images.py` and `extract_keypoints_videos.py` under `data/data_utils` does that. We used OpenPose in a Docker installation.
- For videos that may contain multiple shots, extract the divisions between shots using `shot_detection.py`.
- Post-process keypoints to group them into trajectories (they are initially extracted per frame). This is done automatically during dataset creation when running experiments.
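The trajectory-grouping idea in the last step can be sketched with a simple greedy matcher: link each per-frame detection to the nearest existing trajectory head. This is a simplified illustration under assumed inputs (one `(K, 2)` keypoint array per detected person per frame), not the repository's actual post-processing, and `group_into_trajectories` is a hypothetical name.

```python
import numpy as np

def group_into_trajectories(per_frame_keypoints, max_dist=50.0):
    """Greedily link per-frame keypoint sets into trajectories.

    per_frame_keypoints: list over frames; each frame is a list of
    (K, 2) arrays, one per detected person.
    """
    trajectories = []  # each: {"frames": [...], "keypoints": [...]}
    for t, detections in enumerate(per_frame_keypoints):
        unmatched = list(range(len(trajectories)))  # heads still free this frame
        for det in detections:
            center = det.mean(axis=0)
            best, best_d = None, max_dist
            for i in unmatched:
                prev_center = trajectories[i]["keypoints"][-1].mean(axis=0)
                d = np.linalg.norm(center - prev_center)
                if d < best_d:
                    best, best_d = i, d
            if best is None:
                # too far from every head: start a new trajectory
                trajectories.append({"frames": [t], "keypoints": [det]})
            else:
                trajectories[best]["frames"].append(t)
                trajectories[best]["keypoints"].append(det)
                unmatched.remove(best)
    return trajectories
```

For two people whose detections move only a few pixels between frames, this yields two trajectories, each spanning both frames.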
The configuration of the CUDA environment is in requirements.yml. To create an environment with the same packages,
run:

```bash
conda env create --file requirements.yml
```

The code is divided into different files and folders (all Python):
- `run.py`. Main file to be executed. Loads the configuration; creates the model, trainer, and dataloader; and runs them.
- `losses.py`. Loss functions and other evaluation functions.
- `distances.py`. Distance functions.
- `data`. Dataset and dataloader code. Relies on a `LightningDataModule`, defined in `main_data_module.py`, that manages the dataset. The datasets are defined under `data/datasets`, and all inherit from the `BaseDataset` defined in `base_dataset.py`. There is also a `data_utils` folder with general dataset utils.
- `models`. Under this folder we define the Python modules (`nn.Module`), under `networks`, as well as the trainer, which is implemented using `LightningModule`. The Lightning modules encapsulate the whole training procedure, as well as the model definition. `trajectory_dict.py` is an auxiliary file that defines the state of all input- and latent-space trajectories.
- `utils`. General utils for the project.
Most of the files and methods are described in the code. For more specific comments about how they work and what they do, go directly to the files.