
Unable to get above a 0% win rate with COMA on SMACLite #88

@rob-pitkin

Description


Hi,

I've been trying to train COMA on SMACLite with parameters similar to those in the original COMA paper, but the algorithm consistently gets stuck in a local optimum: return_mean and test_return_mean hover around ~10, and the win rate (train and test) stays at 0%.

Here's my coma.yaml:

# --- COMA specific parameters ---

action_selector: "soft_policies"
mask_before_softmax: True


runner: "parallel"

buffer_size: 100
batch_size_run: 30
batch_size: 30

# update the target network every {} training steps
target_update_interval_or_tau: 150

lr: 0.0005

obs_agent_id: True
obs_last_action: False
obs_individual_obs: False


# use COMA
agent_output_type: "pi_logits"
learner: "coma_learner"
critic_q_fn: "coma"
standardise_returns: False
standardise_rewards: True

hidden_dim: 128

use_rnn: True
critic_baseline_fn: "coma"
critic_train_mode: "seq"
critic_train_reps: 1
entropy_coef: 0.05
q_nstep: 5  # 0 corresponds to default Q, 1 is r + gamma*Q, etc
critic_type: coma_critic

name: "coma"
t_max: 20050000
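
For reference, here's a minimal numpy sketch of what I understand the counterfactual baseline selected by critic_baseline_fn: "coma" to compute (my own illustration, not the epymarl code): the per-agent advantage is the critic's Q for the chosen action minus the policy-weighted average of Q over that agent's actions, with the other agents' actions held fixed.

import numpy as np

def coma_advantage(q_values, pi):
    """q_values: (n_agents, n_actions), where q_values[i, a] is the centralized
    critic's estimate Q(s, (a, u_-i)) with the other agents' actions held fixed.
    pi: (n_agents, n_actions) current per-agent policies.
    Returns A_i(s, a) = Q_i(s, a) - sum_a' pi_i(a'|s) * Q_i(s, a')."""
    baseline = (pi * q_values).sum(axis=-1, keepdims=True)  # counterfactual baseline
    return q_values - baseline

# toy usage: 2 agents, 5 actions each, uniform policies
q = np.random.randn(2, 5)
pi = np.full((2, 5), 0.2)
adv = coma_advantage(q, pi)  # shape (2, 5); take the column of the action actually taken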

The main changes I made from the default args were:

  • bumping the entropy coefficient up to 0.05
  • changing q_nstep from 10 to 5 (see the n-step sketch after this list)
  • increasing hidden_dim from 64 to 128
  • increasing lr from 0.0003 to 0.0005
  • decreasing target_update_interval_or_tau from 200 to 150
  • changing batch_size and batch_size_run to 30 and increasing buffer_size to 100
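
To make the q_nstep change concrete, here's a tiny sketch of the n-step target as I read the comment in coma.yaml (a hypothetical helper, not code from the repo): with n=1 the target is r + gamma*Q, with n=5 it sums five discounted rewards before bootstrapping.

def n_step_target(rewards, bootstrap_q, gamma=0.99, n=5):
    """rewards: r_t, ..., r_{t+n-1}; bootstrap_q: critic estimate at step t+n.
    Returns sum_{k=0}^{n-1} gamma**k * r_{t+k} + gamma**n * bootstrap_q."""
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards[:n]))
    return ret + (gamma ** n) * bootstrap_q

# n=1 reduces to r + gamma * Q, matching the comment on q_nstep above
assert abs(n_step_target([1.0], 2.0, gamma=0.9, n=1) - (1.0 + 0.9 * 2.0)) < 1e-9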

I'm running with seed=1 on the 2s_vs_1sc map, but the same thing happens on 2s3z. I'd appreciate any guidance on getting COMA to a remotely working state (it doesn't have to match the paper's results; I'm just using it as a baseline).
