Difficulty reproducing results on Show-o

Hi, thank you for sharing your code and great work.

I am trying to reproduce the results reported in your paper (specifically Table [X]). I followed the instructions in the README.md and used the default configuration, but I am observing a performance gap.

Reported Result: 72.3

My Result: 70.0

Could you advise whether any hyperparameters or configuration options need to be changed to match the reported performance? Alternatively, could you share the exact command and config.yaml used for the Show-o experiment? For reference, my resolved config and GenEval output are included below.

Thank you for your help.

wandb:
  entity: null
  resume: auto
  run_id: 38iwlp78
experiment:
  project: RecA
  name: show-o-reca-clip
  output_dir: output_512/datapath/reca_5000
  max_train_examples_t2i: 20000000
  max_train_examples_mmu: 40000000
  save_every: 1000
  eval_every: 5000
  generate_every: 1000
  log_every: 5
  log_grad_norm_every: 500
  resume_from_checkpoint: latest
  logging_dir: output_512/datapath/reca_5000/logs
model:
  vq_model:
    type: magvitv2
    vq_model_name: showlab/magvitv2
  showo:
    load_from_showo: true
    pretrained_model_path: showlab/show-o-w-clip-vit-512x512
    w_clip_vit: true
    vocab_size: 58498
    llm_vocab_size: 50295
    llm_model_path: microsoft/phi-1_5
    codebook_size: 8192
    num_vq_tokens: 1024
    num_new_special_tokens: 10
  gradient_checkpointing: true
dataset:
  gen_type: t2i
  recon_type: mid
  und_type: llava_tuning
  combined_loader_mode: min_size
  add_system_prompt: false
  params:
    train_t2i_shards_path_or_url:
    - /root/journeydb/data/train/imgs/{000..009}.tgz
    add_caption_prompt: true
    external_caption_path: ''
    validation_prompts_file: validation_prompts/text2image_prompts.txt
    shuffle_buffer_size: 1000
    num_workers: 0
    resolution: 512
    pin_memory: false
    persistent_workers: false
    wds_seed: 42
  preprocessing:
    max_seq_length: 576
    resolution: 512
    center_crop: false
    random_flip: false
optimizer:
  name: adamw
  params:
    learning_rate: 5.0e-07
    scale_lr: false
    beta1: 0.9
    beta2: 0.999
    weight_decay: 0.01
    epsilon: 1.0e-08
lr_scheduler:
  scheduler: cosine
  params:
    learning_rate: ${optimizer.params.learning_rate}
    warmup_steps: 1000
training:
  gradient_accumulation_steps: 5
  noise_type: mask
  batch_size_t2i: 1
  batch_size_lm: -1
  batch_size_mmu: 2
  batch_size_recon: 2
  mixed_precision: bf16
  enable_tf32: true
  seed: 10086
  max_train_steps: 5000
  overfit_one_batch: false
  cond_dropout_prob: 0.1
  min_masking_rate: 0.0
  label_smoothing: 0.0
  max_grad_norm: null
  guidance_scale: 0.0
  generation_timesteps: 12
  t2i_coeff: 0.0
  lm_coeff: 0.1
  mmu_coeff: 1.0
  recon_coeff: 1.0
config: configs/showo_reca_clip.yaml
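
One way to look for drift between the config dumped above and the one used for the paper would be to flatten both into dotted keys and compare them. This is only a minimal sketch: it assumes both configs have already been loaded into nested dicts, and the `reference` values are hypothetical placeholders, not the authors' actual settings. The helper names `flatten` and `diff_configs` are made up for illustration.

```python
MISSING = "<missing>"  # placeholder shown when a key exists on one side only


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(mine, reference):
    """Return {path: (my_value, reference_value)} for every key that differs."""
    a, b = flatten(mine), flatten(reference)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


# A few values copied from the config above.
mine = {
    "optimizer": {"params": {"learning_rate": 5.0e-07, "weight_decay": 0.01}},
    "training": {"gradient_accumulation_steps": 5, "max_train_steps": 5000},
}

# HYPOTHETICAL reference values, for illustration only.
reference = {
    "optimizer": {"params": {"learning_rate": 1.0e-06, "weight_decay": 0.01}},
    "training": {"gradient_accumulation_steps": 5, "max_train_steps": 5000, "seed": 0},
}

differences = diff_configs(mine, reference)
for path, (my_value, ref_value) in differences.items():
    print(f"{path}: mine={my_value!r} reference={ref_value!r}")
```

Keys that match on both sides are omitted, so only the settings worth double-checking are printed.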


GenEval metrics
Summary
Total images: 2212
Total prompts: 553
% correct images: 67.09%
% correct prompts: 81.37%

Task breakdown
single_object    = 97.81% (313 / 320)
position         = 30.00% (120 / 400)
counting         = 64.38% (206 / 320)
color_attr       = 50.75% (203 / 400)
colors           = 77.39% (291 / 376)
two_object       = 88.64% (351 / 396)

Overall score (avg. over tasks): 0.68161
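
As a sanity check on how the numbers above relate, the overall score appears to be the unweighted mean of the six per-task accuracies, while "% correct images" pools all images together. A small sketch that recomputes both from the counts in the task breakdown (the function names are just for illustration):

```python
# Per-task (correct, total) image counts copied from the task breakdown above.
task_counts = {
    "single_object": (313, 320),
    "position": (120, 400),
    "counting": (206, 320),
    "color_attr": (203, 400),
    "colors": (291, 376),
    "two_object": (351, 396),
}


def task_average(counts):
    """Unweighted mean of the per-task accuracies."""
    accuracies = [correct / total for correct, total in counts.values()]
    return sum(accuracies) / len(accuracies)


def pooled_image_accuracy(counts):
    """Fraction of correct images when all tasks are pooled together."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


overall = task_average(task_counts)
pooled = pooled_image_accuracy(task_counts)
print(f"task average:          {overall:.5f}")  # 0.68161
print(f"pooled image accuracy: {pooled:.2%}")   # 67.09%
```

The two differ because tasks with more images (for example position, at 400) weigh more heavily in the pooled figure than in the per-task average.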
