v2.5.1: Important bug fix and slight feature enhancement

@yqzhishen yqzhishen released this 08 Jan 09:03
· 13 commits to main since this release

Bug fixes

We are sorry that some features of multi-dictionary support have never worked correctly since their release. The previous preprocessing code had a bug in collecting cross-lingual phonemes: it unexpectedly marked almost all phonemes as "not merged", so all language IDs were zero. As a result, the language embedding, which was designed to distinguish phonemes from different languages within each merged group, was not trained at all. To make matters worse, the code inside the ONNX model handled language IDs correctly, so what it actually embedded were vectors that had never been updated since their random initialization. We cannot assess what negative impact this long-standing bug had on the models, but luckily they "seemed" to work well.

This bug has now been fixed. New models with consistent training and inference showed no problems in internal tests, with some (unconfirmed) improvements in cross-lingual pitch prediction.
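
The mismatch between training and inference described above can be illustrated with a toy embedding table. This is only a sketch, not the project's actual code: the helper names (`make_table`, `sgd_step`, `train`) are hypothetical, the gradient is a dummy value, and it assumes that language ID 0 means "no language" and contributes no embedding.

```python
import random

def make_table(num_rows, dim, seed=0):
    """Randomly initialised embedding table (one row per language ID)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

def sgd_step(table, lang_ids, grads, lr=0.1):
    """Update only the rows that were looked up.

    Assumption: ID 0 is treated as 'no language' and is skipped.
    """
    for lang_id, grad in zip(lang_ids, grads):
        if lang_id == 0:
            continue
        row = table[lang_id]
        for k in range(len(row)):
            row[k] -= lr * grad[k]

def train(table, batches):
    for lang_ids in batches:
        # Dummy gradient of ones for every looked-up position.
        grads = [[1.0] * len(table[0]) for _ in lang_ids]
        sgd_step(table, lang_ids, grads)

initial = make_table(num_rows=3, dim=2)
buggy = [row[:] for row in initial]
fixed = [row[:] for row in initial]

train(buggy, [[0, 0, 0, 0]])  # buggy preprocessing: every phoneme gets ID 0
train(fixed, [[1, 2, 0, 1]])  # fixed preprocessing: merged phonemes keep their IDs

# With the buggy data, rows 1 and 2 are still at their random initial values,
# yet inference with correct IDs would look up exactly those rows.
buggy_unchanged = buggy == initial
fixed_changed = fixed[1] != initial[1] and fixed[2] != initial[2]
print(buggy_unchanged, fixed_changed)
```

In this toy setting the "buggy" table is identical to its initialization after training, which mirrors why correct language IDs at inference time ended up selecting untrained vectors.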

Other small bug fixes:

  • The pitch_r2 metric did not work in 2.5.0
  • A RoPE cache issue related to find_unused_parameters in DDP training (#244)
  • Some variables unexpectedly got float64 dtype with NumPy 2.x
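
The release does not say which variables were affected by the NumPy 2.x dtype change. One common cause, shown here purely as an assumption, is that NumPy 2.x no longer downcasts a `np.float64` scalar when it is combined with a float32 array. The variable names below are hypothetical.

```python
import numpy as np

mel = np.ones(4, dtype=np.float32)
scale = np.float64(0.5)  # e.g. a value returned by another NumPy routine

# NumPy 1.x keeps float32 here; NumPy 2.x promotes the result to float64.
result = mel * scale

# Two version-independent ways to stay in float32:
cast_after = (mel * scale).astype(np.float32)  # cast the result explicitly
cast_before = mel * np.float32(scale)          # cast the scalar first

print(result.dtype, cast_after.dtype, cast_before.dtype)
```

Casting explicitly keeps the dtype stable regardless of the installed NumPy version, which matters when arrays are later written to disk or passed to a model that expects float32.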

Other changes and improvements

  • The binarizers now accept the FLAC format (WAV is still preferred)
  • The default vocoder package in the generated dsconfig.yaml is now pc_nsf_hifigan_44.1k_hop512_128bin_2025.02

See full change log: v2.5.0...v2.5.1