Description
- I'd like to know how you can obtain the proposed training objective when the derivation (Appendix A.1, Equation 9) is wrong.
- You mentioned in #4 (Consults in details of training LGD model in Grasp-anything++ database) that the t_embedding is the embedding of the input timestep, but the implementation of LGD in the released code does not perform any encoding operation on the timesteps. How can the model condition its features on the noise level? As far as I know, the timesteps are usually embedded in the model, as in the implementations in the iddpm and guided-diffusion repos. Did I misunderstand?
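For reference, by "the timesteps are usually embedded" I mean a sinusoidal timestep embedding of the kind used in the iddpm and guided-diffusion repos. The sketch below is illustrative only (NumPy instead of PyTorch, and the dimensions are my own choices), not the authors' code:

```python
import math
import numpy as np

def timestep_embedding(timesteps, dim, max_period=10000):
    """Sinusoidal timestep embedding in the style of iddpm /
    guided-diffusion (illustrative sketch, not the released LGD code).

    Returns an array of shape (len(timesteps), dim), where each row
    encodes one diffusion timestep at multiple frequencies so the
    model can condition on the noise level.
    """
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to 1/max_period.
    freqs = np.exp(-math.log(max_period) * np.arange(half) / half)
    args = np.asarray(timesteps, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)

emb = timestep_embedding([0, 10, 500], dim=128)
print(emb.shape)  # (3, 128)
```

In those repos this embedding is then passed through a small MLP and injected into each residual block, which is the conditioning mechanism I could not find in the released code.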
- In your paper, the illustration of the LGD implementation indicates that the text–image representation $z^{*}_{vl} \in \mathbb{R}^{d_{vl}}$ is fed into an MLP. However, I could not find the corresponding implementation in the released code, where $z^{*}_{vl}$ does not appear to be used.
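To be concrete about what I expected from the figure, something like the following minimal sketch of an MLP projection over $z^{*}_{vl}$ (the dimensions, depth, and ReLU activation are my assumptions, not taken from the paper or the released code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_vl, d_hidden = 512, 256  # hypothetical dimensions, not from the paper

# Hypothetical fused text-image representation z*_vl.
z_vl = rng.standard_normal(d_vl)

# A two-layer MLP, as the paper's figure appears to suggest
# (illustrative sketch only; I could not find this in the released code).
W1 = rng.standard_normal((d_hidden, d_vl)) * 0.02
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_hidden, d_hidden)) * 0.02
b2 = np.zeros(d_hidden)

h = np.maximum(W1 @ z_vl + b1, 0.0)  # ReLU hidden layer
out = W2 @ h + b2                    # projected conditioning vector
print(out.shape)  # (256,)
```

My question is where (if anywhere) the released code performs this projection, since $z^{*}_{vl}$ does not seem to be consumed at all.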
I would be most grateful for any clarification you could provide on these matters.
Thank you in advance for your time and consideration.