Description
Hi, thank you for sharing your code and great work.
I am trying to reproduce the results reported in your paper (specifically Table [X]). I followed the instructions in the README.md and used the default configuration, but I am observing a performance gap.
Reported Result: 72.3
My Result: 70.0
Could you please advise whether any specific hyperparameters or configuration values need to be adjusted to match the reported performance? Alternatively, could you provide the exact command/script and config.yaml used for the Show-o experiment? The configuration I trained with and my GenEval output follow.
Thank you for your help.
wandb:
  entity: null
  resume: auto
  run_id: 38iwlp78
experiment:
  project: RecA
  name: show-o-reca-clip
  output_dir: output_512/datapath/reca_5000
  max_train_examples_t2i: 20000000
  max_train_examples_mmu: 40000000
  save_every: 1000
  eval_every: 5000
  generate_every: 1000
  log_every: 5
  log_grad_norm_every: 500
  resume_from_checkpoint: latest
  logging_dir: output_512/datapath/reca_5000/logs
model:
  vq_model:
    type: magvitv2
    vq_model_name: showlab/magvitv2
  showo:
    load_from_showo: true
    pretrained_model_path: showlab/show-o-w-clip-vit-512x512
    w_clip_vit: true
    vocab_size: 58498
    llm_vocab_size: 50295
    llm_model_path: microsoft/phi-1_5
    codebook_size: 8192
    num_vq_tokens: 1024
    num_new_special_tokens: 10
  gradient_checkpointing: true
dataset:
  gen_type: t2i
  recon_type: mid
  und_type: llava_tuning
  combined_loader_mode: min_size
  add_system_prompt: false
  params:
    train_t2i_shards_path_or_url:
    - /root/journeydb/data/train/imgs/{000..009}.tgz
    add_caption_prompt: true
    external_caption_path: ''
    validation_prompts_file: validation_prompts/text2image_prompts.txt
    shuffle_buffer_size: 1000
    num_workers: 0
    resolution: 512
    pin_memory: false
    persistent_workers: false
    wds_seed: 42
  preprocessing:
    max_seq_length: 576
    resolution: 512
    center_crop: false
    random_flip: false
optimizer:
  name: adamw
  params:
    learning_rate: 5.0e-07
    scale_lr: false
    beta1: 0.9
    beta2: 0.999
    weight_decay: 0.01
    epsilon: 1.0e-08
lr_scheduler:
  scheduler: cosine
  params:
    learning_rate: ${optimizer.params.learning_rate}
    warmup_steps: 1000
training:
  gradient_accumulation_steps: 5
  noise_type: mask
  batch_size_t2i: 1
  batch_size_lm: -1
  batch_size_mmu: 2
  batch_size_recon: 2
  mixed_precision: bf16
  enable_tf32: true
  seed: 10086
  max_train_steps: 5000
  overfit_one_batch: false
  cond_dropout_prob: 0.1
  min_masking_rate: 0.0
  label_smoothing: 0.0
  max_grad_norm: null
  guidance_scale: 0.0
  generation_timesteps: 12
  t2i_coeff: 0.0
  lm_coeff: 0.1
  mmu_coeff: 1.0
  recon_coeff: 1.0
config: configs/showo_reca_clip.yaml
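Once a reference configuration is available, a key-by-key comparison against the dump above should make any differing hyperparameter easy to spot. The following is only a minimal sketch using plain dictionaries: `flatten` and `diff_configs` are hypothetical helper names, the `mine` values are a subset copied from the config above, and the `reference` dictionary is a made-up placeholder, not the authors' actual settings.

```python
_MISSING = "<missing>"  # marker for keys present in only one config


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(mine, reference):
    """Return {dotted_key: (my_value, reference_value)} for keys that differ."""
    a, b = flatten(mine), flatten(reference)
    return {
        key: (a.get(key, _MISSING), b.get(key, _MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    }


# Subset of values copied from the config dump above.
mine = {
    "optimizer": {"params": {"learning_rate": 5.0e-07, "weight_decay": 0.01}},
    "training": {"gradient_accumulation_steps": 5, "max_train_steps": 5000, "seed": 10086},
}

# HYPOTHETICAL placeholder, NOT the authors' settings: one value is changed
# on purpose so that the diff has something to report.
reference = {
    "optimizer": {"params": {"learning_rate": 5.0e-07, "weight_decay": 0.01}},
    "training": {"gradient_accumulation_steps": 1, "max_train_steps": 5000, "seed": 10086},
}

differences = diff_configs(mine, reference)
for key, (my_value, ref_value) in differences.items():
    print(f"{key}: mine={my_value} reference={ref_value}")
```

In practice both dictionaries would be loaded from the two YAML files rather than typed by hand. Note that the number of GPUs is not part of the config, so the global batch size can differ even when the diff is empty.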
GenEval metrics
Summary
Total images: 2212
Total prompts: 553
% correct images: 67.09%
% correct prompts: 81.37%
Task breakdown
single_object = 97.81% (313 / 320)
position = 30.00% (120 / 400)
counting = 64.38% (206 / 320)
color_attr = 50.75% (203 / 400)
colors = 77.39% (291 / 376)
two_object = 88.64% (351 / 396)
Overall score (avg. over tasks): 0.68161
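As a sanity check, the summary numbers can be recomputed from the per-task counts. The sketch below assumes the overall score is the unweighted mean of the six task accuracies (as the "avg. over tasks" label suggests) and that "% correct images" pools all images together; the variable names are illustrative only.

```python
# (correct, total) image counts per task, copied from the breakdown above.
task_counts = {
    "single_object": (313, 320),
    "position": (120, 400),
    "counting": (206, 320),
    "color_attr": (203, 400),
    "colors": (291, 376),
    "two_object": (351, 396),
}

# Per-task accuracy, then the unweighted mean over tasks.
task_accuracy = {task: correct / total for task, (correct, total) in task_counts.items()}
overall = sum(task_accuracy.values()) / len(task_accuracy)

# Image-level accuracy pools every image, so larger tasks weigh more.
correct_images = sum(correct for correct, _ in task_counts.values())
total_images = sum(total for _, total in task_counts.values())
image_accuracy = correct_images / total_images

for task, accuracy in task_accuracy.items():
    print(f"{task:>13} = {accuracy:.2%}")
print(f"Overall score (avg. over tasks): {overall:.5f}")
print(f"% correct images: {image_accuracy:.2%} ({correct_images} / {total_images})")
```

Under these assumptions the script reproduces both figures in the summary (0.68161 and 67.09%). The two differ because the task average weights every task equally while the image-level figure does not, so it matters which of them is compared against the paper's number.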