
Image encoder in VQA task #15


Description

@Tuner12

Hi,
I noticed that the VQA implementation code isn't included in the repo, but I'd like to know more details about the image encoder. Here is the situation: the MUMC model feeds the image into a 12-layer encoder, mapping image -> (B, token, D), which then interacts with the text encoder. I found that most tasks in GPFM employ various pathology foundation models as the image encoder and perhaps use only the CLS token (B, D) as the output. So in the VQA task, do you feed only the CLS token into the multimodal encoder for cross-attention with the text encoder, or do you use the full encoder output (B, token, D)?
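
For concreteness, here is a minimal PyTorch sketch of the two variants I am asking about. The `ImageTextCrossAttention` module and all dimensions below are illustrative assumptions, not code from the MUMC or GPFM repositories:

```python
# Minimal sketch: text tokens cross-attend over image features.
# Module name and dimensions are hypothetical, not from MUMC/GPFM.
import torch
import torch.nn as nn

class ImageTextCrossAttention(nn.Module):
    """Text tokens (queries) attend over image features (keys/values)."""
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_tokens, image_feats):
        # text_tokens: (B, T, D) queries; image_feats: (B, N, D) keys/values
        out, _ = self.attn(text_tokens, image_feats, image_feats)
        return out

B, T, N, D = 2, 16, 197, 768
text_tokens = torch.randn(B, T, D)
image_tokens = torch.randn(B, N, D)      # full image encoder output (B, token, D)

block = ImageTextCrossAttention(dim=D)

# Variant 1: cross-attend over the full token sequence (B, token, D)
full_out = block(text_tokens, image_tokens)   # (B, T, D)

# Variant 2: cross-attend over the CLS token alone, kept as (B, 1, D)
cls_only = image_tokens[:, :1, :]             # (B, 1, D)
cls_out = block(text_tokens, cls_only)        # (B, T, D)
```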

Thank you!
