
Remarks on guidance related lecture notes #12

@burichh

Remark 1

On page 37, the key idea of the Guided Generative Model is described.

Image

Conceptually this description might be correct, since it just refers back to the formulation used in the Fokker-Planck equation:

Image

However, throughout the rest of the lecture notes $$u_t^\theta$$ denotes a vector field that drives an ODE with no stochasticity (i.e. no diffusion term, thus $$\sigma=0$$). In other words, if I train a neural net with the guided conditional flow matching objective $$\mathcal{L}_{CFM}^{guided}$$ and then use the formula above

$$\mathrm{d}X_t = u_t^\theta(X_t \vert y) \mathrm{d}t + \sigma_t \mathrm{d}W_t$$

then, given a large $$\sigma_t$$, I'd get a result that accumulates a lot of noise as $$t \rightarrow 1$$ and does not really follow the intended probability path $$p_t(\cdot \vert y)$$.
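To make this concern concrete, here is a minimal numerical sketch. It is a toy setup of my own, not taken from the notes: a 1D Gaussian target stands in for $$p_1(\cdot \vert y)$$, the path is $$X_t = t z + (1-t)\epsilon$$, and the closed-form marginal vector field and score stand in for trained networks. The names `MU`, `S`, `u_exact`, `score_exact` and `simulate` are all hypothetical. It compares the plain ODE, the SDE with noise simply added on top of $$u_t$$, and the SDE whose drift also includes the score term.

```python
import numpy as np

# Toy assumptions (illustrative only): target p_1(.|y) = N(MU, S^2),
# prior p_0 = N(0, 1), path X_t = t*z + (1-t)*eps,
# hence p_t(.|y) = N(t*MU, t^2*S^2 + (1-t)^2).
MU, S = 2.0, 0.5


def var_t(t):
    """Variance of the marginal p_t(.|y) along the Gaussian path."""
    return (t * S) ** 2 + (1.0 - t) ** 2


def u_exact(x, t):
    """Closed-form marginal vector field; stands in for a trained u_t^theta."""
    dvar = 2.0 * t * S**2 - 2.0 * (1.0 - t)  # d/dt of var_t
    return MU + dvar / (2.0 * var_t(t)) * (x - t * MU)


def score_exact(x, t):
    """Closed-form score of p_t(.|y); stands in for a trained s_t^theta."""
    return -(x - t * MU) / var_t(t)


def simulate(sigma, use_score, n=20_000, steps=500, seed=0):
    """Euler-Maruyama from t=0 to t=1 with a constant diffusion coefficient."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # samples from p_0
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        drift = u_exact(x, t)
        if use_score:
            # Score correction that compensates for the injected noise.
            drift = drift + 0.5 * sigma**2 * score_exact(x, t)
        x = x + h * drift + sigma * np.sqrt(h) * rng.standard_normal(n)
    return x


ode = simulate(sigma=0.0, use_score=False)       # flow model, no noise
naive = simulate(sigma=1.0, use_score=False)     # u_t plus noise, no score
corrected = simulate(sigma=1.0, use_score=True)  # u_t plus score plus noise

for name, x in [("ode", ode), ("naive sde", naive), ("corrected sde", corrected)]:
    print(f"{name:>14}: mean={x.mean():.3f}  std={x.std():.3f}")
```

In this toy case the ODE and the score-corrected SDE should both end up with a standard deviation close to `S`, while the naive SDE ends up roughly twice as wide, which is the accumulation of noise I mean above.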

On page 40, I think the process is summarized more precisely:

Image

while just below, in the section "Guidance for Diffusion Models" the vector field $$u_t^\theta$$ and the score $$s_t^\theta$$ are indicated separately (which feels more descriptive and less ambiguous to me):

Image
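For concreteness, this is the form I have in mind when the two are written separately. It is my own sketch of the guided SDE, assuming the usual construction in which the score term compensates for the injected noise; it is not copied from the notes:

$$\mathrm{d}X_t = \left[ u_t^\theta(X_t \vert y) + \frac{\sigma_t^2}{2} s_t^\theta(X_t \vert y) \right] \mathrm{d}t + \sigma_t \mathrm{d}W_t$$

Here the $$\frac{\sigma_t^2}{2} s_t^\theta$$ term is what keeps the marginals on the path $$p_t(\cdot \vert y)$$ when $$\sigma_t > 0$$, and setting $$\sigma_t = 0$$ recovers the plain ODE.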

So I feel it could be a little confusing to read these formulas when they all use the same $$u_t^\theta$$ notation (or, in some cases, the tilde version $$\tilde{u}_t^\theta$$):

Image

My suggestion would be to follow the pedagogical structure that you have built so far: introduce the idea of guidance with the example of flow matching and flow models, and only then bring in any diffusion-related terms (such as $$\sigma_t$$ or the term "SDE") in the following score matching section. I think it might be even clearer if the Key Idea 5 (Guided Generative Model) box were removed, or if it discussed the ODE case only, leaving out $$\sigma_t$$.

I'm not sure I phrased my concerns accurately, so let me know what you think!

Remark 2

Image

Isn't it the case that these pictures were generated with guidance, just without overscaling, i.e. with $$w=1$$?

Remark 3 (typos)

Typo 1 (page 42, at the top):

Image

I think you wanted to write "CSM" there, standing for "Conditional Score Matching".

Typo 2 (page 48):

Image

Should be "encoded" instead of "encoder".
