Hi,
I've been trying to train COMA on SMACLite with parameters similar to those in the original COMA paper, but the algorithm consistently gets stuck in a local optimum: return_mean and test_return_mean sit around 10, and the win rate (both train and test) stays at 0%.
Here's my coma.yaml:
```yaml
# --- COMA specific parameters ---
action_selector: "soft_policies"
mask_before_softmax: True
runner: "parallel"
buffer_size: 100
batch_size_run: 30
batch_size: 30
# update the target network every {} training steps
target_update_interval_or_tau: 150
lr: 0.0005
obs_agent_id: True
obs_last_action: False
obs_individual_obs: False
# use COMA
agent_output_type: "pi_logits"
learner: "coma_learner"
critic_q_fn: "coma"
standardise_returns: False
standardise_rewards: True
hidden_dim: 128
use_rnn: True
critic_baseline_fn: "coma"
critic_train_mode: "seq"
critic_train_reps: 1
entropy_coef: 0.05
q_nstep: 5 # 0 corresponds to default Q, 1 is r + gamma*Q, etc
critic_type: coma_critic
name: "coma"
t_max: 20050000
```
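For clarity on the `q_nstep` comment above, here's a rough sketch of the n-step bootstrapped target I believe it refers to (illustrative only, not the actual learner code; names like `q_values` are mine):

```python
import numpy as np

def nstep_targets(rewards, q_values, gamma, n):
    """Compute n-step bootstrapped targets for one trajectory.

    rewards:  [T] per-step rewards
    q_values: [T + 1] critic estimates; q_values[T] should be 0 if the episode terminated
    n = 0 -> Q(s_t), n = 1 -> r_t + gamma * Q(s_{t+1}), n = 5 -> five rewards then bootstrap
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        if n == 0:
            targets[t] = q_values[t]  # default Q target
            continue
        ret, discount = 0.0, 1.0
        stop = min(t + n, T)  # truncate the n-step sum at the episode end
        for k in range(t, stop):
            ret += discount * rewards[k]
            discount *= gamma
        ret += discount * q_values[stop]  # bootstrap with the critic after n steps
        targets[t] = ret
    return targets
```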
The main changes I made from the default args were:
- bumping the entropy coefficient up to 0.05 (see the loss sketch after this list)
- changing `q_nstep` from 10 to 5
- increasing `hidden_dim` from 64 to 128
- increasing `lr` from 0.0003 to 0.0005
- decreasing `target_update_interval_or_tau` from 200 to 150
- changing `batch_size` and `batch_size_run` to 30 and increasing `buffer_size` to 100
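Since the entropy bump was the biggest change, here's roughly where I understand `entropy_coef` to enter the actor loss (this is the generic entropy-regularised policy gradient, not a copy of the `coma_learner` code, and the tensor shapes are just illustrative):

```python
import torch

def actor_loss(log_pi_taken, advantages, pi, mask, entropy_coef=0.05):
    """log_pi_taken: [B, T, n_agents]            log-prob of the chosen actions
    advantages:      [B, T, n_agents]            counterfactual advantages (treated as constants)
    pi:              [B, T, n_agents, n_actions] full policy distributions
    mask:            [B, T, n_agents]            1 for valid steps, 0 for padding
    """
    entropy = -(pi * torch.log(pi + 1e-10)).sum(dim=-1)              # per-agent policy entropy
    per_step = -(log_pi_taken * advantages.detach() + entropy_coef * entropy)
    return (per_step * mask).sum() / mask.sum()                      # average over valid entries
```

A larger `entropy_coef` pushes the policies toward uniform, which may matter if the advantage signal is weak early in training.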
I'm running with seed=1 on the map 2s_vs_1sc, but the same thing happens on 2s3z. I'd appreciate any guidance on getting COMA to a remotely working state; it doesn't have to match the paper's results, I'm really just using it as a baseline.