Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' (ICLR 2021 spotlight)

Related tags

Deep Learning UPDeT
Overview

UPDeT

Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight)

The framework is inherited from PyMARL. UPDeT is written in PyTorch and uses SMAC as its environment.

Installation instructions

Installing dependencies:

pip install -r requirements.txt

Download SC2 into the 3rdparty/ folder and copy over the maps needed for the experiments:

bash install_sc2.sh

Run an experiment

Before training your own transformer-based multi-agent model, note the following:

  • Currently, this repository supports marine-based battle scenarios, e.g. 3m, 8m, and 5m_vs_6m.
  • If you are interested in training a different unit type, carefully modify the Transformer Parameters block in src/config/default.yaml and revise the _build_input_transformer function in basic_controller.py (see the illustrative sketch after this list).
  • Before running an experiment, check the agent type in the Agent Parameters block in src/config/default.yaml.
  • This repository contains the two new transformer-based agents from the UPDeT paper:
    • Standard UPDeT
    • Aggregation Transformer
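
The transformer consumes each agent's observation as a sequence of per-entity tokens, so token_dim, ally_num and enemy_num must match the chosen map. The following is a minimal, hypothetical sketch of that token layout; the function name and observation layout are assumptions for illustration, not the repository's exact code:

import torch

def build_entity_tokens(flat_obs, ally_num, enemy_num, token_dim):
    # Assumed layout: flat_obs is (batch, (ally_num + enemy_num) * token_dim),
    # i.e. every ally and enemy contributes one token of token_dim features
    # (5 for Marines, per the default config).
    n_entities = ally_num + enemy_num
    return flat_obs.view(-1, n_entities, token_dim)

# Example: 5m_vs_6m with ally_num=5, enemy_num=6, token_dim=5
obs = torch.zeros(1, (5 + 6) * 5)
tokens = build_entity_tokens(obs, ally_num=5, enemy_num=6, token_dim=5)
# tokens.shape == torch.Size([1, 11, 5])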

Training script

python3 src/main.py --config=vdn --env-config=sc2 with env_args.map_name=5m_vs_6m
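
To train the Aggregation Transformer instead of standard UPDeT, the agent key listed in src/config/default.yaml (Options [updet, transformer_aggregation, rnn]) should be overridable from the command line with the same Sacred with key=value syntax shown above; a hedged example:

python3 src/main.py --config=vdn --env-config=sc2 with env_args.map_name=5m_vs_6m agent=transformer_aggregation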

All results will be stored in the Results/ folder.

Performance

Single battle scenario

UPDeT surpasses the GRU baseline on the hard 5m_vs_6m map.

Multiple battle scenarios

UPDeT generalizes zero-shot to different tasks:

  • Result on 7m-5m-3m transfer learning.

Note: Only UPDeT can be deployed to other scenarios without changing the model's architecture.
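
The architecture transfers across team sizes because both the input (one token per entity) and the output (a shared head applied per entity) are independent of the number of agents; only the sequence length changes. A rough, illustrative sketch of this idea, with assumed module and variable names, not the repository's implementation:

import torch
import torch.nn as nn

class PerEntityQHead(nn.Module):
    # One shared linear head maps each entity's transformer embedding to the
    # Q-values of the actions tied to that entity, so adding or removing
    # entities changes the sequence length but never the weights.
    def __init__(self, emb, n_actions_per_entity):
        super().__init__()
        self.head = nn.Linear(emb, n_actions_per_entity)

    def forward(self, entity_embeddings):       # (batch, n_entities, emb)
        return self.head(entity_embeddings)     # (batch, n_entities, n_actions_per_entity)

head = PerEntityQHead(emb=32, n_actions_per_entity=1)
q_5m = head(torch.zeros(1, 11, 32))   # 5m_vs_6m: 11 entity tokens
q_3m = head(torch.zeros(1, 6, 32))    # 3m: 6 entity tokens, same weights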

For more details, please refer to the UPDeT paper.

Bibtex

@article{hu2021updet,
  title={UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers},
  author={Hu, Siyi and Zhu, Fengda and Chang, Xiaojun and Liang, Xiaodan},
  journal={arXiv preprint arXiv:2101.08001},
  year={2021}
}

License

The MIT License

Comments
  • Code debugging issue

    Code debugging issue

    I tried to debug your code. 1. VSCode configuration (I could not pass the argument required by README.md: with env_args.map_name=5m_vs_6m):

    {  
        "version": "0.2.0",  
        "configurations": [  
            {  
                "name": "Python: Current File",   
                "type": "python",  
                "request": "launch",   
                "program": "${file}",  
                "console": "integratedTerminal",  
                "args": ["--config=qmix", "--env-config=sc2"]  
            }  
        ]  
    }  
    

    2. Running on CPU. 3. I found a tensor size mismatch in the _build_inputs_transformer method of basic_controller.py:

    arranged_obs.size()  
    torch.Size([1, 3, 30])  
    
    

    Obviously, arranged_obs has 90 elements when flattened. The following line tries to reshape it to (-1, 11, 5): reshaped_obs = arranged_obs.view(-1, 1 + (self.args.enemy_num - 1) + self.args.ally_num, self.args.token_dim). This clearly cannot work; could you please take a moment to explain? Screenshot attached: Screenshot from 2021-09-17 11-27-20
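
    For reference, a quick shape check (illustrative, not part of the original report) makes the mismatch explicit: the flattened tensor holds 1 x 3 x 30 = 90 elements, while view(-1, 11, 5) needs a multiple of 55, which suggests the configured ally_num, enemy_num and token_dim in default.yaml do not match the map being run:

    import torch

    arranged_obs = torch.zeros(1, 3, 30)       # 90 elements in total
    print(arranged_obs.numel() % (11 * 5))     # 35 -> not divisible by 55,
                                               # so .view(-1, 11, 5) must fail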

    opened by ouyangshixiong 7
  • How to reproduce the transfer learning

    How to reproduce the transfer learning

    Hello, I'm interested in your work and want to reproduce the transfer learning result. As you mentioned, the model can be deployed to other scenarios without changing its architecture, and a figure of this result is given; I would like to reproduce it.

    # --- Defaults ---
    
    # --- pymarl options ---
    runner: "episode" # Runs 1 env for an episode
    mac: "basic_mac" # Basic controller
    env: "sc2" # Environment name
    env_args: {} # Arguments for the environment
    batch_size_run: 1 # Number of environments to run in parallel
    test_nepisode: 20 # Number of episodes to test for
    test_interval: 2000 # Test after {} timesteps have passed
    test_greedy: True # Use greedy evaluation (if False, will set epsilon floor to 0)
    log_interval: 2000 # Log summary of stats after every {} timesteps
    runner_log_interval: 2000 # Log runner stats (not test stats) every {} timesteps
    learner_log_interval: 2000 # Log training stats every {} timesteps
    t_max: 10000 # Stop running after this many timesteps
    use_cuda: True # Use gpu by default unless it isn't available
    buffer_cpu_only: True # If true we won't keep all of the replay buffer in vram
    
    # --- Logging options ---
    use_tensorboard: True # Log results to tensorboard
    save_model: True # Save the models to disk
    save_model_interval: 2000000 # Save models after this many timesteps
    checkpoint_path: "" # Load a checkpoint from this path
    evaluate: False # Evaluate model for test_nepisode episodes and quit (no training)
    load_step: 0 # Load model trained on this many timesteps (0 if choose max possible)
    save_replay: False # Saving the replay of the model loaded from checkpoint_path
    local_results_path: "results" # Path for local results
    
    # --- RL hyperparameters ---
    gamma: 0.99
    batch_size: 32 # Number of episodes to train on
    buffer_size: 32 # Size of the replay buffer
    lr: 0.0005 # Learning rate for agents
    critic_lr: 0.0005 # Learning rate for critics
    optim_alpha: 0.99 # RMSProp alpha
    optim_eps: 0.00001 # RMSProp epsilon
    grad_norm_clip: 10 # Reduce magnitude of gradients above this L2 norm
    
    # --- Agent parameters. Should be set manually. ---
    agent: "updet" # Options [updet, transformer_aggregation, rnn]
    rnn_hidden_dim: 64 # Size of hidden state for default rnn agent
    obs_agent_id: False # Include the agent's one_hot id in the observation
    obs_last_action: False # Include the agent's last action (one_hot) in the observation
    
    # --- Transformer parameters. Should be set manually. ---
    token_dim: 5 # Marines. For other unit types (e.g. Zealot) this number can be different (6).
    emb: 32 # embedding dimension of transformer
    heads: 3 # head number of transformer
    depth: 2 # block number of transformer
    ally_num: 8 # number of ally (5m_vs_6m)
    enemy_num: 8 # number of enemy (5m_vs_6m)
    
    # --- Experiment running params ---
    repeat_id: 1
    label: "default_label"
    

    This is the config I used to train 8m, and I changed ally_num and enemy_num to 5. Should I change checkpoint_path? Does the figure you provided show the win rate during training? How can I get the same one?
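
    For reference, a hedged sketch of how evaluation on another map could be launched with the config keys quoted above (checkpoint_path, evaluate, ally_num, enemy_num) and PyMARL's Sacred with overrides; the checkpoint path is a placeholder, not a real run:

    python3 src/main.py --config=vdn --env-config=sc2 with env_args.map_name=3m checkpoint_path=<path_to_trained_model> evaluate=True ally_num=3 enemy_num=3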

    opened by hellofinch 4
  • In which part do you implement policy decoupling

    In which part do you implement policy decoupling

    Hello, I am very interested in your work! I have studied the code, especially the class "TransformerAggregationAgent", but I have not found where policy decoupling is implemented. The only thing I found is: q_agg = torch.mean(outputs, 1) followed by q = self.q_linear(q_agg)

    I am confused that you calculate the mean along the action dimension and then map the result back to the actions. Can you please explain the motivation for this part? Really looking forward to your reply.

    Thanks!

    opened by donutQQ 3
  • question about 7m-5m-3m transfer learning

    question about 7m-5m-3m transfer learning

    For 7m-5m-3m transfer learning, when I use the UPDeT model combined with QMIX, there is a dimension mismatch problem. Does only the UPDeT model combined with VDN support transfer learning?

    opened by ohoneyd 3
  • TypeError: expected str, bytes or os.PathLike object, not NoneType

    TypeError: expected str, bytes or os.PathLike object, not NoneType

    Traceback (most recent call last):
      File "main.py", line 19, in <module>
        ex = Experiment("pymarl")
      File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/experiment.py", line 75, in __init__
        _caller_globals=caller_globals)
      File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/ingredient.py", line 57, in __init__
        gather_sources_and_dependencies(_caller_globals)
      File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 487, in gather_sources_and_dependencies
        sources = gather_sources(globs, experiment_path)
      File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 440, in get_sources_from_imported_modules
        return get_sources_from_modules(iterate_imported_modules(globs), base_path)
      File "/home/username/anaconda3/lib/python3.7/site-packages/sacred/dependencies.py", line 409, in get_sources_from_modules
        filename = os.path.abspath(mod.__file__)
      File "/home/username/anaconda3/lib/python3.7/posixpath.py", line 371, in abspath
        path = os.fspath(path)
    TypeError: expected str, bytes or os.PathLike object, not NoneType

    Perhaps there is a problem with a path variable; how can this be solved? Thanks.
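
    The traceback shows Sacred's source discovery calling os.path.abspath on a module whose __file__ is None (typically a namespace package). A purely diagnostic sketch (an assumption about the cause, not an official fix) to list which imported modules trigger this:

    import sys

    # Modules with __file__ set to None are exactly what Sacred's
    # gather_sources_and_dependencies trips over.
    for name, mod in list(sys.modules.items()):
        if mod is not None and getattr(mod, "__file__", "") is None:
            print(name)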

    opened by lxqpku 2
  • Unsatisfactory experimental results

    Unsatisfactory experimental results

    Hi, I want to know why, when I run 5m_vs_6m, the win rate is only about 20%-40% with QMIX; the same happens with 'rnn' and 'updet'. I use SC2 version 4.10, run 2M steps, torch==1.4.1, and did not make any changes to the code. Has anyone else seen this? How do I reproduce the reported results? Thanks in advance!

    opened by 775269512 1
  • Shouldn't pygame's version be 2.0.0?

    Shouldn't pygame's version be 2.0.0?

    When I run pip install -r requirements.txt, the following error pops up:

    smac 1.0.0 has requirement pygame>=2.0.0, but you'll have pygame 1.9.4 which is incompatible.
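
    A straightforward workaround, assuming nothing else in the project pins pygame below 2.0.0, is to upgrade pygame explicitly after installing the requirements:

    pip install "pygame>=2.0.0"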

    opened by DH-O 1
Owner
hhhusiyi
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 6, 2022
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight) Demo | Paper [NEW!] Time to play with our interac

Shengyu Zhao 373 Jan 2, 2023
Code for "MetaMorph: Learning Universal Controllers with Transformers", Gupta et al, ICLR 2022

MetaMorph: Learning Universal Controllers with Transformers This is the code for the paper MetaMorph: Learning Universal Controllers with Transformers

Agrim Gupta 50 Jan 3, 2023
This project uses reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can learn to read tape. The project is dedicated to hero in life great Jesse Livermore.

Reinforcement-trading This project uses Reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can

Deepender Singla 1.4k Dec 22, 2022
A Pytorch implementation of the multi agent deep deterministic policy gradients (MADDPG) algorithm

Multi-Agent-Deep-Deterministic-Policy-Gradients A Pytorch implementation of the multi agent deep deterministic policy gradients(MADDPG) algorithm This

Phil Tabor 159 Dec 28, 2022
Code for "The Intrinsic Dimension of Images and Its Impact on Learning" - ICLR 2021 Spotlight

dimensions Estimating the instrinsic dimensionality of image datasets Code for: The Intrinsic Dimensionaity of Images and Its Impact On Learning - Phi

Phil Pope 41 Dec 10, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

RIIT Our open-source code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-AgentReinforcement Learning. We implement and standard

null 405 Jan 6, 2023
Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

Learning to Communicate with Deep Multi-Agent Reinforcement Learning This is a PyTorch implementation of the original Lua code release. Overview This

Minqi 297 Dec 12, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Ilya Kostrikov 3k Dec 31, 2022
Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight)

About Code release for Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy (ICLR 2022 Spotlight)

THUML @ Tsinghua University 221 Dec 31, 2022
This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

NeurIPS 2021 (Spotlight): Task-Adaptive Neural Network Search with Meta-Contrastive Learning This is an official PyTorch implementation of Task-Adapti

Wonyong Jeong 15 Nov 21, 2022
A parallel framework for population-based multi-agent reinforcement learning.

MALib: A parallel framework for population-based multi-agent reinforcement learning MALib is a parallel framework of population-based learning nested

MARL @ SJTU 348 Jan 8, 2023
A library of multi-agent reinforcement learning components and systems

Mava: a research framework for distributed multi-agent reinforcement learning Table of Contents Overview Getting Started Supported Environments System

InstaDeep Ltd 463 Dec 23, 2022
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit).

Salesforce 334 Jan 6, 2023
Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN) This is the implementation of the paper Multi-Age

Future Power Networks 83 Jan 6, 2023
CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

bigAIdream projects 10 Oct 10, 2022
Multi-agent reinforcement learning algorithm and environment

Multi-agent reinforcement learning algorithm and environment [en/cn] Pytorch implements multi-agent reinforcement learning algorithms including IQL, Q

万鲲鹏 7 Sep 20, 2022