Hi @zhengkw18,
In the paper, you compared rCM with the DMD loss. I am wondering what happens if you train the model with the sCM loss only. Would I need to tune the hyperparameters a lot if I want to use the sCM loss only?
Btw, would you mind sharing your loss curves and grad norms over the course of training?
Thanks in advance!
Meng