Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
This is the code for the Evolutionary Population Curriculum (EPC) algorithm presented in the paper: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning. It is configured to be run in conjunction with the environments from mpe_local (https://github.com/qian18long/epciclr2020/tree/master/mpe_local). Our GIF results are shown at https://sites.google.com/view/epciclr2020/. Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.
Installation
- Install TensorFlow 1.13.1: `pip install tensorflow==1.13.1`
- Install OpenAI Gym: `pip install gym==0.13.0`
- Install other dependencies: `pip install joblib imageio`
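Optionally, you can confirm that the pinned versions above were installed into the active Python environment (a quick sanity check, not a step required by the repository):

```bash
# Optional: confirm the pinned TensorFlow and Gym versions import correctly.
python -c "import tensorflow as tf, gym; print(tf.__version__, gym.__version__)"
```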
Case study: Multi-Agent Particle Environments
We demonstrate here how the code can be used in conjunction with the mpe_local environments (https://github.com/qian18long/epciclr2020/tree/master/mpe_local), which are based on the OpenAI multi-agent particle environments (https://github.com/openai/multiagent-particle-envs).
Quick start
- See `train_grassland_epc.sh`, `train_adversarial_epc.sh` and `train_food_collect_epc.sh` for running the EPC algorithm on the `grassland`, `adversarial` and `food_collect` scenarios in the example setting presented in our paper. A sample launch command is sketched below.
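For example, assuming the scripts sit at the repository root and the dependencies above are installed, a grassland run can be launched with:

```bash
# Launch EPC training on the grassland scenario
# (assumes the script lives at the repository root).
bash train_grassland_epc.sh
```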
Command-line options
Environment options
- `--scenario`: defines which environment in the MPE is to be used (default: `"grassland"`)
- `--map-size`: the size of the environment; the scale is 1 if `"normal"` and 2 otherwise (default: `"normal"`)
- `--sight`: the agent's visibility radius (default: `100`)
- `--alpha`: reward sharing weight (default: `0.0`)
- `--max-episode-len`: maximum length of each episode in the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `200000`)
- `--num-good`: number of good agents in the scenario (default: `2`)
- `--num-adversaries`: number of adversaries in the environment (default: `2`)
- `--num-food`: number of food (resource) items in the scenario (default: `4`)
- `--good-policy`: algorithm used for the 'good' (non-adversary) policies in the environment (default: `"maddpg"`; options: {`"att-maddpg"`, `"maddpg"`, `"PC"`, `"mean-field"`})
- `--adv-policy`: algorithm used for the adversary policies in the environment (default: `"maddpg"`; options: {`"att-maddpg"`, `"maddpg"`, `"PC"`, `"mean-field"`})
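As a minimal sketch of how these environment options might be combined, assuming the baseline trainer `train_normal.py` (listed under Example scripts below) accepts them directly:

```bash
# Hypothetical invocation of the baseline trainer with the environment
# options above; the entry point and flag handling in your checkout may differ.
python ./maddpg_o/experiments/train_normal.py \
    --scenario grassland \
    --map-size normal \
    --sight 100 \
    --num-good 2 \
    --num-adversaries 2 \
    --num-food 4 \
    --good-policy att-maddpg \
    --adv-policy maddpg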
Core training parameters
- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)
- `--good-num-units`: number of units in the MLP of good agents; defaults to `--num-units` if not provided
- `--adv-num-units`: number of units in the MLP of adversarial agents; defaults to `--num-units` if not provided
- `--n_cpu_per_agent`: CPU usage per agent (default: `1`)
- `--good-share-weights`: good agents share the weights of the agent encoder within the model
- `--adv-share-weights`: adversarial agents share the weights of the agent encoder within the model
- `--use-gpu`: use the GPU for training (default: `False`)
- `--n-envs`: number of environment instances run in parallel
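A sketch combining the core training parameters above with their documented defaults (hypothetical invocation; treating the share-weights options as boolean switches is an assumption):

```bash
# Hypothetical example: set the core training parameters explicitly.
python ./maddpg_o/experiments/train_normal.py \
    --scenario grassland \
    --lr 1e-2 \
    --gamma 0.95 \
    --batch-size 1024 \
    --num-units 64 \
    --n_cpu_per_agent 1 \
    --n-envs 4 \
    --good-share-weights \
    --adv-share-weights
```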
Checkpointing
- `--save-dir`: directory where intermediate training results and the model are saved (default: `"/test/"`)
- `--save-rate`: the model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory from which the training state and model are loaded (default: `"test"`)
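For example, to checkpoint into a dedicated directory every 1000 episodes (a sketch; the directory path is a placeholder):

```bash
# Hypothetical example: checkpoint every 1000 episodes into ./result/grassland/.
python ./maddpg_o/experiments/train_normal.py \
    --scenario grassland \
    --save-dir ./result/grassland/ \
    --save-rate 1000
```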
Evaluation
- `--restore`: restores the previous training state stored in `--load-dir` (or in `--save-dir` if no `--load-dir` has been provided) and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `--load-dir` (or in `--save-dir` if no `--load-dir` has been provided), but does not continue training (default: `False`)
- `--save-gif-data`: save the GIF examples to `--save-dir` (default: `False`)
- `--render-gif`: render the GIF in `--load-dir` (default: `False`)
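A sketch of the two evaluation modes, assuming a finished run was saved under `./result/grassland/` (placeholder path):

```bash
# Hypothetical example: replay the trained policy without further training.
python ./maddpg_o/experiments/train_normal.py \
    --scenario grassland \
    --load-dir ./result/grassland/ \
    --display

# Hypothetical example: resume training from the saved state instead.
python ./maddpg_o/experiments/train_normal.py \
    --scenario grassland \
    --load-dir ./result/grassland/ \
    --restore
```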
EPC options
- `--initial-population`: initial population size in the first stage
- `--num-selection`: size of the population selected for reproduction
- `--num-stages`: number of stages
- `--stage-num-episodes`: number of training episodes in each stage
- `--stage-n-envs`: number of environment instances run in parallel in each stage
- `--test-num-episodes`: number of episodes used for the competition between trained models
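A rough sketch of a staged EPC run using these options; the shipped `train_*_epc.sh` scripts remain the reference for the exact values, and passing one value per stage to the `--stage-*` options is an assumption:

```bash
# Hypothetical staged EPC run: 3 stages, initial population of 4, top 2
# candidates selected for reproduction (all values are placeholders).
python ./maddpg_o/experiments/train_epc.py \
    --scenario grassland \
    --initial-population 4 \
    --num-selection 2 \
    --num-stages 3 \
    --stage-num-episodes 50000 50000 100000 \
    --stage-n-envs 4 4 8 \
    --test-num-episodes 100
```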
Example scripts
- `./maddpg_o/experiments/train_normal.py`: applies `train_helpers.py` for MADDPG, Att-MADDPG and mean-field training
- `./maddpg_o/experiments/train_x2.py`: applies a single population-doubling training step
- `./maddpg_o/experiments/train_mix_match.py`: mixes and matches the good agents in `--sheep-init-load-dirs` and the adversarial agents in `--wolf-init-load-dirs` for model evaluation
- `./maddpg_o/experiments/train_epc.py`: trains with the staged EPC algorithm
- `./maddpg_o/experiments/compete.py`: evaluates different models by competition
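For instance, a mix-and-match evaluation that pairs good agents from one run with adversaries from another might look like this (hypothetical invocation; the result directories are placeholders):

```bash
# Hypothetical mix-and-match evaluation: good agents from run_a paired
# with adversarial agents from run_b.
python ./maddpg_o/experiments/train_mix_match.py \
    --scenario grassland \
    --sheep-init-load-dirs ./result/run_a/ \
    --wolf-init-load-dirs ./result/run_b/
```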
Paper citation
@inproceedings{epciclr2020,
author = {Qian Long and Zihan Zhou and Abhinav Gupta and Fei Fang and Yi Wu and Xiaolong Wang},
title = {Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning},
booktitle = {International Conference on Learning Representations},
year = {2020}
}