Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

Overview

RIIT

Our open-source code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implement and standardize the hyperparameters of numerous QMIX-variant algorithms, which achieve state-of-the-art (SOTA) performance.

Python MARL framework

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

Actor Critic Methods:

PyMARL is written in PyTorch and uses SMAC as its environment.

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy over the maps required to run the experiments.

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment.

They are all located in src/config. --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
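
Hyperparameters defined in these config files can be overridden on the command line through Sacred's with syntax; the values below are only illustrative:

# Override algorithm hyperparameters for a single run
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor epsilon_anneal_time=500000 t_max=2000000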

Run parallel experiments:

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list argument is comma-separated (see the example below).
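
For example, to run QMIX on two maps with a single command (map names and settings are only illustrative):

# 2 threads per map, GPUs 0 and 1, 5 experiments per map
bash run.sh qmix corridor,6h_vs_8z 2 epsilon_anneal_time=500000 0,1 5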

All results will be stored in the Results folder and named after the map_name.

Force all training processes to exit

# All Python and game processes of the current user will quit.
bash clean.sh

Some test results on Super Hard scenarios

Cite

@article{hu2021riit,
      title={RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • Question about exploration action selection

    I compared the code of pymarl and pymarl2.

    I found that in pymarl, this exploration/action-selection step in basic_controller.py https://github.com/oxwhirl/pymarl/blob/c971afdceb34635d31b778021b0ef90d7af51e86/src/controllers/basic_controller.py#L40-L48

    if not test_mode:
        # Epsilon floor
        epsilon_action_num = agent_outs.size(-1)
        if getattr(self.args, "mask_before_softmax", True):
            # With probability epsilon, we will pick an available action uniformly
            epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()
    
        agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                       + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
    

    was moved into action_selectors.py in pymarl2: https://github.com/hijkzzz/pymarl2/blob/d0aaf583605b2b012a1fd080eb6880a00954ed28/src/components/action_selectors.py#L94-L97

    The computation also seems to differ in whether masking is applied. Could you explain why this change was made?
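
    For context, here is a minimal sketch of the epsilon-floor idea restricted to available actions (an illustration of the general technique only, not the pymarl or pymarl2 source):

    import torch

    def epsilon_floor_policy(agent_probs, avail_actions, epsilon):
        """Mix a policy with a uniform distribution over the available actions.

        With probability epsilon an available action is chosen uniformly;
        otherwise the action follows the agent's own policy. Both inputs have
        shape [batch * n_agents, n_actions]; avail_actions is a 0/1 mask.
        """
        avail = avail_actions.float()
        n_avail = avail.sum(dim=-1, keepdim=True)   # number of available actions per agent
        uniform = avail / n_avail                   # uniform only over available actions
        mixed = (1 - epsilon) * agent_probs + epsilon * uniform
        mixed = mixed * avail                       # keep unavailable actions at probability zero
        return torch.distributions.Categorical(probs=mixed).sample()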

    opened by liushunyu 6
  • [Help]pysc2.lib.remote_controller.ConnectError: Failed to connect to the SC2 websocket. Is it up?

    When I run command

    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
    

    I encounter this error.

    Traceback (most recent call last):
      File "src/main.py", line 109, in <module>
        ex.run_commandline(params)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 318, in run_commandline
        options=args,
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/root/pymarl2/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/root/pymarl2/src/run/run.py", line 177, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "/root/pymarl2/src/runners/parallel_runner.py", line 89, in run
        self.reset()
      File "/root/pymarl2/src/runners/parallel_runner.py", line 78, in reset
        data = parent_conn.recv()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    

    This issue (https://github.com/deepmind/pysc2/issues/281) says I have to open the StarCraft II game itself instead of just the Battle.net launcher, but I don't know how to do that.

    Could you give any advice?

    opened by jsrimr 5
  • RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    Hello, the code throws an error when running the following test: main.py --config=coma --env-config=one_step_matrix_game with save_model=True use_tensorboard=True save_model_interval=1000 t_max=50000 runner='episode' batch_size_run=1 use_cuda=False

    Error message:

    Traceback (most recent calls WITHOUT Sacred internals):
      File "D:/sby/RL/pymarl2/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "D:\sby\RL\pymarl2\run\run.py", line 56, in run
        run_sequential(args=args, logger=logger)
      File "D:\sby\RL\pymarl2\run\run.py", line 181, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "D:\sby\RL\pymarl2\runners\episode_runner.py", line 70, in run
        actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\controllers\basic_controller.py", line 23, in select_actions
        t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\components\action_selectors.py", line 105, in select_action
        picked_actions = Categorical(masked_policies).sample().long()
      File "D:\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
        samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
    RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    I looked into the problem: it seems related to x = F.relu(self.fc1(inputs.view(-1, e)), inplace=True) in rnn_agent.py; after many iterations the gradients accumulate and explode, so the outputs contain NaN. Could you please look into fixing it? Thanks. My environment is Windows 10, PyTorch 1.x.
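
    One common mitigation for exploding gradients (a generic sketch, not the repository's actual learner code; PyMARL-style configs usually expose this as grad_norm_clip) is to clip the global gradient norm before every optimizer step:

    import torch
    import torch.nn as nn

    # Minimal sketch: a toy network and one update step with global gradient-norm
    # clipping, which keeps exploding gradients from turning the outputs into NaN.
    net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimiser = torch.optim.RMSprop(net.parameters(), lr=5e-4)

    x = torch.randn(32, 4)
    target = torch.randn(32, 2)
    loss = nn.functional.mse_loss(net(x), target)

    optimiser.zero_grad()
    loss.backward()
    # Rescale all gradients so that their global L2 norm is at most 10.
    grad_norm = torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=10.0)
    optimiser.step()
    print(f"gradient norm before clipping: {grad_norm:.3f}")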

    opened by gingkg 5
  • Question about GRF

    Hi, awesome work! You extended GRF into the pymarl framework. However, when I run it with vdn_gfootball.yaml, there is a lot of debugging information. Could you please help me fix it?

    Detail: absl Dump "episode_done": count limit reached / disabled

    opened by BITminicc 3
  • VMIX algorithm reports NaN

    Traceback (most recent call last):
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline                                    
        return self.run(                                                                                                                                          
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run                                                
        run()                                                                                                                                                     
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__                                                  
        self.result = self.main_function(*args)                                                                                                                   
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function                     
        result = wrapped(*args, **kwargs)                                                                                                                         
      File "src/main.py", line 38, in my_main                                                                                                                     
        run_REGISTRY[_config['run']](_run, config, _log)                                                                                                          
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 54, in run                                                                 
        run_sequential(args=args, logger=logger)                                                                                                                  
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 195, in run_sequential                                                     
        learner.train(episode_sample, runner.t_env, episode)                                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 58, in train                      
        advantages, td_error, targets_taken, log_pi_taken, entropy = self._calculate_advs(batch, rewards, terminated, actions, avail_actions,                     
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 115, in _calculate_advs                                
        entropy = categorical_entropy(pi).reshape(-1)  #[bs, t, n_agents, 1]                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/components/action_selectors.py", line 110, in categorical_entropy                            
        return Categorical(probs=probs).entropy()                                                                                                                 
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/categorical.py", line 64, in __init__                              
        super(Categorical, self).__init__(batch_shape, validate_args=validate_args)                                                  
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
        raise ValueError(                                                                                                                                         
    ValueError: Expected parameter probs (Tensor of shape (8, 54, 10, 18)) of distribution Categorical(probs: torch.Size([8, 54, 10, 18])) to satisfy the constraint Simplex(), but found invalid values:
    

    The remaining part is the tensor data, which I did not paste; the problem is that it contains NaN values.

    opened by Leo-xh 2
  • TensorBoard logger not working

    Hi, thanks for the good work! I installed the dependencies as instructed and successfully started training. However, it seems that the tensorboard logs are not written to the /result directory although I set the use_tensorboard param to "true" in src/config/default.yaml. Could you please help me with this?

    opened by kjyeung 2
  • Set trained model as opponent?

    Hello,

    With state-of-the-art algorithms, the win rate is already close to 1 on many maps. Would the authors be interested in extending pymarl2 so that two models trained with different algorithms can play against each other? I think this could break through the difficulty ceiling of SC2's built-in AI and keep SMAC alive.

    opened by linshi9658 1
  • About the Linux environment

    Hello,

    I would like to run more tests with pymarl2. However, it seems that SC2 4.10 cannot run on CentOS Linux 7.9.2009 because of glibc 2.17, and I got the following:

    /StarCraftII/Versions/Base75689/SC2_x64: /usr/lib64/libc.so.6: version 'GLIBC_2.18' not found (required by/StarCraftII/Libs/libstdc++.so.6)

    Could you please provide your operating environment info? Thanks!

    opened by linshi9658 1
  • Question about policy iterations

    Hello, I saw in your paper that S = EPI, where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how many rounds pass between training updates?
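
    For reference, a purely illustrative reading of the formula with made-up numbers (not values from the paper):

    # Illustrative only: reading S = E * P * I with hypothetical values.
    E = 120        # samples (environment steps) per episode
    P = 8          # parallel rollout processes
    I = 10_416     # policy iterations, i.e. how many batches of rollouts are collected
    S = E * P * I  # total number of environment samples
    print(S)       # 9999360, roughly 10 million samples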

    opened by wubmu 1
  • AttributeError: 'TracebackException' object has no attribute 'exc_traceback' and RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

    When I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor, I get the following error.

    Traceback (most recent call last):
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 38, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 114, in run_sequential
        buffer = ReplayBuffer(scheme, groups, args.buffer_size, env_info["episode_limit"] + 1,
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 209, in __init__
        super(ReplayBuffer, self).__init__(scheme, groups, buffer_size, max_seq_length, preprocess=preprocess, device=device)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 28, in __init__
        self._setup_data(self.scheme, self.groups, batch_size, max_seq_length, self.preprocess)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 75, in _setup_data
        self.data.transition_data[field_key] = th.zeros((batch_size, max_seq_length, *shape), dtype=dtype, device=self.device)
    RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "src/main.py", line 112, in <module>
        ex.run_commandline(params)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
        print_filtered_stacktrace()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
        print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
        return "".join(filtered_traceback_format(tb_exception))
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
        current_tb = tb_exception.exc_traceback
    AttributeError: 'TracebackException' object has no attribute 'exc_traceback'
    

    How can I fix this? Please help!
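
    The traceback shows the episode replay buffer alone needs about 7.5 GB of CPU RAM (the th.zeros allocation in episode_buffer.py). One possible workaround, assuming the buffer_size setting referenced in the traceback is exposed in the algorithm config, is to shrink the buffer via a command-line override, for example:

    # Illustrative override: use a smaller replay buffer so it fits in RAM
    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor buffer_size=2000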

    opened by brownsugar123 1
  • Problem when modifying maps

    Hi,

    I am doing some personal research, and I used one of your maps (1o_10b_vs_1r.SC2Map) because its terrain design suits my tasks. I have modified the map regarding the type and number of agents, but when I tried to change some terrain features the code gives me the following error

    Error

    The only modification I made was to elevate some of the terrain, so I did not change the size of the map.

    Do you know the reason for this error?

    Thanks!

    opened by AlRodA92 1
  • Add NDQ algorithm

    The source code from NDQ's paper is too old and doesn't work with recent PyTorch. I modified the source so that it now works with recent PyTorch and is convenient to compare against other methods. I also added a requirements file describing the package versions in my environment.

    opened by Sud0x67 0