Hi,
I noticed that the VQA implementation code isn't shown in the repo, but I'd like to know more details about the image encoder. Here is the situation: MUMC feeds the image through a 12-layer encoder, image -> (B, tokens, D), which then interacts with the text encoder. For most tasks in GPFM, various pathology foundation models are employed as the image encoder, and it seems only the CLS token (B, D) may be used as output. So for the VQA task, do you feed only the CLS token into the multimodal encoder for cross-attention with the text encoder, or do you use the full encoder output (B, tokens, D)? A minimal sketch of the two options follows below.
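To make the two options concrete, here is a minimal PyTorch sketch of what I mean. All shapes and names here are my own assumptions for illustration, not your actual code:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: batch, image tokens, text tokens, hidden dim
B, N_img, N_txt, D = 2, 197, 32, 768

image_tokens = torch.randn(B, N_img, D)  # full image encoder output: (B, tokens, D)
text_tokens = torch.randn(B, N_txt, D)   # text encoder output

cross_attn = nn.MultiheadAttention(embed_dim=D, num_heads=12, batch_first=True)

# Option 1: full token sequence as keys/values in cross-attention
out_full, _ = cross_attn(query=text_tokens, key=image_tokens, value=image_tokens)

# Option 2: only the CLS token (B, 1, D) as keys/values
cls_token = image_tokens[:, :1, :]  # (B, 1, D)
out_cls, _ = cross_attn(query=text_tokens, key=cls_token, value=cls_token)
```

Which of these two corresponds to what you do in the VQA task?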
Thank you!