I would like to kindly ask that the training hyperparameters for the semantic segmentation EfficientViT models be released.
The paper only specifies that the AdamW optimizer and a cosine learning rate decay were used, and I am having trouble finding the right settings to train the model well.
For example, it would be nice to know:
- Batch size
- Number of training iterations
- Initial learning rate
- If the learning rate was set differently for the backbone vs the SegHead
- Settings for the cosine LR scheduler (e.g., was a warmup used?)
- If weight decay was used
- Augmentations used
- Etc.
It would also be good to know if any of the hyperparameters differed between training on ADE20K and training on Cityscapes.
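
To make the question concrete, here is a minimal sketch of the optimizer/scheduler setup I am currently guessing at (PyTorch). Every numeric value below is a placeholder of my own, not something taken from the paper; these placeholders are exactly the values I am asking about:

```python
# Sketch of an AdamW + linear-warmup + cosine-decay setup.
# All numbers are placeholder guesses, NOT values from the paper.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, total_iters):
    # Guess: a single LR for both backbone and SegHead. Part of my
    # question is whether the authors used separate LRs for the two.
    optimizer = AdamW(
        model.parameters(),
        lr=6e-5,            # placeholder initial learning rate
        weight_decay=0.01,  # placeholder; unclear if weight decay was used
    )

    warmup_iters = 1500  # placeholder; unclear if a warmup was used at all

    def lr_lambda(step):
        if step < warmup_iters:
            # linear warmup from 0 to the base LR
            return step / max(1, warmup_iters)
        # cosine decay to zero over the remaining iterations
        progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```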
Thank you for sharing this amazing work; it is so cool that such an elegant method can produce such powerful results.