
Image encoder in VQA task #15


Description

@Tuner12

Hi,
I noticed that the VQA implementation code isn't included in the repo, but I'd like to know more details about the image encoder. Here is the situation: the MUMC model feeds the image into a 12-layer encoder, mapping image -> (B, token, D), which then interacts with the text encoder. I found that most tasks in GPFM employ various pathology foundation models as the image encoder and perhaps use only the CLS token (B, D) as the output. So in the VQA task, do you feed only the CLS token into the multimodal encoder for cross-attention with the text encoder, or do you use the full encoder output (B, token, D)?
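
For concreteness, here is a minimal PyTorch sketch of the two variants I am asking about. The `ImageTextCrossAttention` module and all dimensions below are illustrative assumptions, not code from the MUMC or GPFM repositories:

```python
# Minimal sketch: text tokens cross-attend over image features.
# Module name and dimensions are hypothetical, not from MUMC/GPFM.
import torch
import torch.nn as nn

class ImageTextCrossAttention(nn.Module):
    """Text tokens (queries) attend over image features (keys/values)."""
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_tokens, image_feats):
        # text_tokens: (B, T, D) queries; image_feats: (B, N, D) keys/values
        out, _ = self.attn(text_tokens, image_feats, image_feats)
        return out

B, T, N, D = 2, 16, 197, 768
text_tokens = torch.randn(B, T, D)
image_tokens = torch.randn(B, N, D)      # full image encoder output (B, token, D)

block = ImageTextCrossAttention(dim=D)

# Variant 1: cross-attend over the full token sequence (B, token, D)
full_out = block(text_tokens, image_tokens)   # (B, T, D)

# Variant 2: cross-attend over the CLS token alone, kept as (B, 1, D)
cls_only = image_tokens[:, :1, :]             # (B, 1, D)
cls_out = block(text_tokens, cls_only)        # (B, T, D)
```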

Thank you!
