LSF-SAC
PyTorch implementation of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients, along with several other state-of-the-art multi-agent reinforcement learning (MARL) algorithms: IQL, QMIX, VDN, COMA, QTRAN (both QTRAN-base and QTRAN-alt), MAVEN, CommNet, DyMA-CL, and G2ANet. The implementation of the paper and of the other algorithms is based on starry-sky6688's QMIX implementation.
Requirements
Acknowledgement
Quick Start
$ python main.py --map=3m
Running main.py directly starts training on map 3m
. Note that CommNet and G2ANet need an external training algorithm, so their names take the form reinforce+commnet
or central_v+g2anet
; all the algorithms we provide are listed in ./common/arguments.py
.
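For example, assuming the algorithm is selected through an --alg command-line flag (the exact flag name lives in ./common/arguments.py, so treat it as an assumption here), training CommNet with REINFORCE might look like:

```shell
# Hypothetical invocation: the --alg flag name is an assumption;
# check ./common/arguments.py for the actual argument names.
$ python main.py --map=3m --alg=reinforce+commnet
```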
If you just want to use this project for a demonstration, set --evaluate=True --load_model=True
.
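Putting the flags above together, a demonstration run on map 3m would be:

```shell
# Evaluate a previously trained model instead of training
# (both flags are documented in this README).
$ python main.py --map=3m --evaluate=True --load_model=True
```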
DyMA-CL runs independently of the others because it requires different environment settings, so we put it in a separate project. For more details, please read the DyMA-CL documentation.
Result
We independently train each algorithm 8 times and report the mean of the 8 independent runs, evaluating for 20 episodes every 100 training steps. All of the results are saved in ./result
. Experiments on other maps are still running; we will update the results later.
1. Mean Win Rate of 8 Independent Runs with --difficulty=7 (VeryHard)
Replay
Check the website here for several replay examples
If you want to see the replay from your own run, make sure replay_dir
is an absolute path, which can be set in ./common/arguments.py
. The replays of each evaluation will then be saved, and you can find them under that path.
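As a sketch, assuming replay_dir is also exposed as a command-line argument (it is defined in ./common/arguments.py, so the flag name below is an assumption), saving replays from an evaluation run might look like:

```shell
# Hypothetical: passing --replay_dir on the command line is an assumption;
# the value must be an absolute path, or replays will not be saved.
$ python main.py --map=3m --evaluate=True --load_model=True --replay_dir=/home/user/replays
```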
Citation
If you find this helpful to your research, please consider citing our paper:
@article{zhou2022value,
title={Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients},
author={Zhou, Hanhan and Lan, Tian and Aggarwal, Vaneet},
journal={arXiv preprint arXiv:2201.01247},
year={2022}
}