- "Once enabled, the speed of your data will be randomly changed during preprocessing. The speed-change ratio will be embedded into the networks, which allows you to control the frame-level speed or velocity (similar to, but much more flexible than, the VEL parameter in VOCALOID) at inference time. In other words, by applying global time stretching at training time, you gain the ability to apply local time stretching at inference time. This can be used to adjust the texture of consonants and the ratio of different parts of vowels. **Some audio segments will be longer after this augmentation is applied. Keep an eye on your batch size and GPU memory usage.**\n",