Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

Overview

RIIT

Our open-source code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implement and standardize the hyperparameters of numerous QMIX-variant algorithms, which achieve state-of-the-art (SOTA) performance.

Python MARL framework

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

Actor Critic Methods:

PyMARL is written in PyTorch and uses SMAC as its environment.

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy over the maps required to run the experiments.

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment.

They are all located in src/config. --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
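
Hyperparameters defined in these config files can be overridden on the command line through Sacred's with syntax; the values below are only illustrative:

# Override algorithm hyperparameters for a single run
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor epsilon_anneal_time=500000 t_max=2000000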

Run parallel experiments:

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list argument is comma-separated (see the example below).
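
For example, to run QMIX on two maps with a single command (map names and settings are only illustrative):

# 2 threads per map, GPUs 0 and 1, 5 experiments per map
bash run.sh qmix corridor,6h_vs_8z 2 epsilon_anneal_time=500000 0,1 5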

All results will be stored in the Results folder and named after the map_name.

Force all training processes to exit

# All Python and game processes of the current user will quit.
bash clean.sh

Some test results on Super Hard scenarios

Cite

@article{hu2021riit,
      title={RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • Question about exploration action selection

    I compared the code of pymarl and pymarl2.

    I found that in pymarl, this exploration/action-selection step in basic_controller.py https://github.com/oxwhirl/pymarl/blob/c971afdceb34635d31b778021b0ef90d7af51e86/src/controllers/basic_controller.py#L40-L48

    if not test_mode:
        # Epsilon floor
        epsilon_action_num = agent_outs.size(-1)
        if getattr(self.args, "mask_before_softmax", True):
            # With probability epsilon, we will pick an available action uniformly
            epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()
    
        agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                       + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
    

    was moved into action_selectors.py in pymarl2: https://github.com/hijkzzz/pymarl2/blob/d0aaf583605b2b012a1fd080eb6880a00954ed28/src/components/action_selectors.py#L94-L97

    The computation also seems to differ in whether masking is applied. Could you explain why this change was made?
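
    For context, here is a minimal sketch of the epsilon-floor idea restricted to available actions (an illustration of the general technique only, not the pymarl or pymarl2 source):

    import torch

    def epsilon_floor_policy(agent_probs, avail_actions, epsilon):
        """Mix a policy with a uniform distribution over the available actions.

        With probability epsilon an available action is chosen uniformly;
        otherwise the action follows the agent's own policy. Both inputs have
        shape [batch * n_agents, n_actions]; avail_actions is a 0/1 mask.
        """
        avail = avail_actions.float()
        n_avail = avail.sum(dim=-1, keepdim=True)   # number of available actions per agent
        uniform = avail / n_avail                   # uniform only over available actions
        mixed = (1 - epsilon) * agent_probs + epsilon * uniform
        mixed = mixed * avail                       # keep unavailable actions at probability zero
        return torch.distributions.Categorical(probs=mixed).sample()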

    opened by liushunyu 6
  • [Help]pysc2.lib.remote_controller.ConnectError: Failed to connect to the SC2 websocket. Is it up?

    When I run command

    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
    

    I encounter this error.

    Traceback (most recent call last):
      File "src/main.py", line 109, in <module>
        ex.run_commandline(params)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 318, in run_commandline
        options=args,
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/root/pymarl2/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/root/pymarl2/src/run/run.py", line 177, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "/root/pymarl2/src/runners/parallel_runner.py", line 89, in run
        self.reset()
      File "/root/pymarl2/src/runners/parallel_runner.py", line 78, in reset
        data = parent_conn.recv()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    

    This issue (https://github.com/deepmind/pysc2/issues/281) says I have to open the StarCraft II game itself instead of just the Battle.net launcher, but I don't know how to do that.

    Could you give any advice?

    opened by jsrimr 5
  • RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    Hello, the code throws an error when running the following test: main.py --config=coma --env-config=one_step_matrix_game with save_model=True use_tensorboard=True save_model_interval=1000 t_max=50000 runner='episode' batch_size_run=1 use_cuda=False

    Error message:

    Traceback (most recent calls WITHOUT Sacred internals):
      File "D:/sby/RL/pymarl2/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "D:\sby\RL\pymarl2\run\run.py", line 56, in run
        run_sequential(args=args, logger=logger)
      File "D:\sby\RL\pymarl2\run\run.py", line 181, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "D:\sby\RL\pymarl2\runners\episode_runner.py", line 70, in run
        actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\controllers\basic_controller.py", line 23, in select_actions
        t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\components\action_selectors.py", line 105, in select_action
        picked_actions = Categorical(masked_policies).sample().long()
      File "D:\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
        samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
    RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    I looked into the problem: it seems related to x = F.relu(self.fc1(inputs.view(-1, e)), inplace=True) in rnn_agent.py; after many iterations the gradients accumulate and explode, so the outputs contain NaN. Could you please look into fixing it? Thanks. My environment is Windows 10, PyTorch 1.x.
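
    One common mitigation for exploding gradients (a generic sketch, not the repository's actual learner code; PyMARL-style configs usually expose this as grad_norm_clip) is to clip the global gradient norm before every optimizer step:

    import torch
    import torch.nn as nn

    # Minimal sketch: a toy network and one update step with global gradient-norm
    # clipping, which keeps exploding gradients from turning the outputs into NaN.
    net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimiser = torch.optim.RMSprop(net.parameters(), lr=5e-4)

    x = torch.randn(32, 4)
    target = torch.randn(32, 2)
    loss = nn.functional.mse_loss(net(x), target)

    optimiser.zero_grad()
    loss.backward()
    # Rescale all gradients so that their global L2 norm is at most 10.
    grad_norm = torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=10.0)
    optimiser.step()
    print(f"gradient norm before clipping: {grad_norm:.3f}")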

    opened by gingkg 5
  • Question about GRF

    Hi, awesome work! You extended GRF into the pymarl framework. However, when I run it with vdn_gfootball.yaml, there is a lot of debugging information. Could you please help me fix it?

    Detail: absl Dump "episode_done": count limit reached / disabled

    opened by BITminicc 3
  • VMIX algorithm reports NaN

    Traceback (most recent call last):
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline                                    
        return self.run(                                                                                                                                          
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run                                                
        run()                                                                                                                                                     
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__                                                  
        self.result = self.main_function(*args)                                                                                                                   
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function                     
        result = wrapped(*args, **kwargs)                                                                                                                         
      File "src/main.py", line 38, in my_main                                                                                                                     
        run_REGISTRY[_config['run']](_run, config, _log)                                                                                                          
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 54, in run                                                                 
        run_sequential(args=args, logger=logger)                                                                                                                  
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 195, in run_sequential                                                     
        learner.train(episode_sample, runner.t_env, episode)                                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 58, in train                      
        advantages, td_error, targets_taken, log_pi_taken, entropy = self._calculate_advs(batch, rewards, terminated, actions, avail_actions,                     
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 115, in _calculate_advs                                
        entropy = categorical_entropy(pi).reshape(-1)  #[bs, t, n_agents, 1]                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/components/action_selectors.py", line 110, in categorical_entropy                            
        return Categorical(probs=probs).entropy()                                                                                                                 
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/categorical.py", line 64, in __init__                              
        super(Categorical, self).__init__(batch_shape, validate_args=validate_args)                                                  
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
        raise ValueError(                                                                                                                                         
    ValueError: Expected parameter probs (Tensor of shape (8, 54, 10, 18)) of distribution Categorical(probs: torch.Size([8, 54, 10, 18])) to satisfy the constraint Simplex(), but found invalid values:
    

    The remaining part is the tensor data, which I did not paste; the problem is that it contains NaN values.

    opened by Leo-xh 2
  • TensorBoard logger not working

    Hi, thanks for the good work! I installed the dependencies as instructed and successfully started training. However, it seems that the tensorboard logs are not written to the /result directory although I set the use_tensorboard param to "true" in src/config/default.yaml. Could you please help me with this?

    opened by kjyeung 2
  • Set trained model as opponent?

    Hello,

    With state-of-the-art algorithms, the win rate is already close to 1 on many maps. Would the authors be interested in extending pymarl2 so that two models trained with different algorithms can play against each other? I think this could break through the difficulty ceiling of SC2's built-in AI and keep SMAC alive.

    opened by linshi9658 1
  • About the Linux environment

    Hello,

    I would like to run more tests with pymarl2. However, it seems that SC2 4.10 cannot run on CentOS Linux 7.9.2009 because of glibc 2.17, and I got the following:

    /StarCraftII/Versions/Base75689/SC2_x64: /usr/lib64/libc.so.6: version 'GLIBC_2.18' not found (required by/StarCraftII/Libs/libstdc++.so.6)

    Could you please provide your operating environment info? Thanks!

    opened by linshi9658 1
  • Question about policy iterations

    Hello, I saw in your paper that S = EPI, where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how many rounds pass between training updates?
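
    For reference, a purely illustrative reading of the formula with made-up numbers (not values from the paper):

    # Illustrative only: reading S = E * P * I with hypothetical values.
    E = 120        # samples (environment steps) per episode
    P = 8          # parallel rollout processes
    I = 10_416     # policy iterations, i.e. how many batches of rollouts are collected
    S = E * P * I  # total number of environment samples
    print(S)       # 9999360, roughly 10 million samples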

    opened by wubmu 1
  • AttributeError: 'TracebackException' object has no attribute 'exc_traceback' and RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

    When I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor, I get the following error.

    Traceback (most recent call last):
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 38, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 114, in run_sequential
        buffer = ReplayBuffer(scheme, groups, args.buffer_size, env_info["episode_limit"] + 1,
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 209, in __init__
        super(ReplayBuffer, self).__init__(scheme, groups, buffer_size, max_seq_length, preprocess=preprocess, device=device)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 28, in __init__
        self._setup_data(self.scheme, self.groups, batch_size, max_seq_length, self.preprocess)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 75, in _setup_data
        self.data.transition_data[field_key] = th.zeros((batch_size, max_seq_length, *shape), dtype=dtype, device=self.device)
    RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "src/main.py", line 112, in <module>
        ex.run_commandline(params)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
        print_filtered_stacktrace()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
        print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
        return "".join(filtered_traceback_format(tb_exception))
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
        current_tb = tb_exception.exc_traceback
    AttributeError: 'TracebackException' object has no attribute 'exc_traceback'
    

    How can I fix this? Please help!
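
    The traceback shows the episode replay buffer alone needs about 7.5 GB of CPU RAM (the th.zeros allocation in episode_buffer.py). One possible workaround, assuming the buffer_size setting referenced in the traceback is exposed in the algorithm config, is to shrink the buffer via a command-line override, for example:

    # Illustrative override: use a smaller replay buffer so it fits in RAM
    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor buffer_size=2000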

    opened by brownsugar123 1
  • Problem when modifying maps

    Hi,

    I am doing some personal research, and I used one of your maps (1o_10b_vs_1r.SC2Map) because its terrain design suits my tasks. I have modified the map regarding the type and number of agents, but when I tried to change some terrain features the code gives me the following error

    Error

    The only modification I made was to elevate some of the terrain, so I did not change the size of the map.

    Do you know the reason for this error?

    Thanks!

    opened by AlRodA92 1
  • Add NDQ algorithm

    The source code from NDQ's paper is too old and doesn't work with recent PyTorch. I modified the source so that it now works with recent PyTorch and is convenient to compare against other methods. I also added a requirements file describing the package versions in my environment.

    opened by Sud0x67 0