Hi,
I've been trying to train COMA on SMACLite with parameters similar to those in the original COMA paper, but the algorithm consistently gets stuck in a local optimum: return_mean and test_return_mean sit around 10, and the win rate (both train and test) stays at 0%.
Here's my coma.yaml:
```yaml
# --- COMA specific parameters ---
action_selector: "soft_policies"
mask_before_softmax: True
runner: "parallel"
buffer_size: 100
batch_size_run: 30
batch_size: 30
# update the target network every {} training steps
target_update_interval_or_tau: 150
lr: 0.0005
obs_agent_id: True
obs_last_action: False
obs_individual_obs: False
# use COMA
agent_output_type: "pi_logits"
learner: "coma_learner"
critic_q_fn: "coma"
standardise_returns: False
standardise_rewards: True
hidden_dim: 128
use_rnn: True
critic_baseline_fn: "coma"
critic_train_mode: "seq"
critic_train_reps: 1
entropy_coef: 0.05
q_nstep: 5 # 0 corresponds to default Q, 1 is r + gamma*Q, etc
critic_type: coma_critic
name: "coma"
t_max: 20050000
```
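For clarity on the `q_nstep` comment above, here's a rough sketch of the n-step bootstrapped target I believe it refers to (illustrative only, not the actual learner code; names like `q_values` are mine):

```python
import numpy as np

def nstep_targets(rewards, q_values, gamma, n):
    """Compute n-step bootstrapped targets for one trajectory.

    rewards:  [T] per-step rewards
    q_values: [T + 1] critic estimates; q_values[T] should be 0 if the episode terminated
    n = 0 -> Q(s_t), n = 1 -> r_t + gamma * Q(s_{t+1}), n = 5 -> five rewards then bootstrap
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        if n == 0:
            targets[t] = q_values[t]  # default Q target
            continue
        ret, discount = 0.0, 1.0
        stop = min(t + n, T)  # truncate the n-step sum at the episode end
        for k in range(t, stop):
            ret += discount * rewards[k]
            discount *= gamma
        ret += discount * q_values[stop]  # bootstrap with the critic after n steps
        targets[t] = ret
    return targets
```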
The main changes I made from the default args were:
- bumping the entropy coefficient up to 0.05 (see the loss sketch after this list)
- changing `q_nstep` from 10 to 5
- increasing `hidden_dim` from 64 to 128
- increasing `lr` from 0.0003 to 0.0005
- decreasing `target_update_interval_or_tau` from 200 to 150
- changing `batch_size` and `batch_size_run` to 30 and increasing `buffer_size` to 100
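Since the entropy bump was the biggest change, here's roughly where I understand `entropy_coef` to enter the actor loss (this is the generic entropy-regularised policy gradient, not a copy of the `coma_learner` code, and the tensor shapes are just illustrative):

```python
import torch

def actor_loss(log_pi_taken, advantages, pi, mask, entropy_coef=0.05):
    """log_pi_taken: [B, T, n_agents]            log-prob of the chosen actions
    advantages:      [B, T, n_agents]            counterfactual advantages (treated as constants)
    pi:              [B, T, n_agents, n_actions] full policy distributions
    mask:            [B, T, n_agents]            1 for valid steps, 0 for padding
    """
    entropy = -(pi * torch.log(pi + 1e-10)).sum(dim=-1)              # per-agent policy entropy
    per_step = -(log_pi_taken * advantages.detach() + entropy_coef * entropy)
    return (per_step * mask).sum() / mask.sum()                      # average over valid entries
```

A larger `entropy_coef` pushes the policies toward uniform, which may matter if the advantage signal is weak early in training.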
I'm running with seed=1 on the map 2s_vs_1sc, but the same thing happens on 2s3z. I'd appreciate any guidance on getting COMA to a remotely working state; it doesn't have to match the paper's results, I'm really just using it as a baseline.