Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

Overview

MARL Tricks

Our code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implement and standardize the hyperparameters of state-of-the-art (SOTA) MARL algorithms.

Python MARL framework

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

Actor Critic Methods:

PyMARL is written in PyTorch and uses SMAC as its environment.

Installation instructions

Install Python packages

# requires Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II and SMAC:

bash install_sc2.sh

This will download StarCraft II into the 3rdparty folder and copy the maps required to run the experiments.

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment.

They are all located in src/config: --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
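
Any value defined in those YAML files can also be overridden from the command line through Sacred's with syntax, for example (hypothetical values):

# Override some qmix hyperparameters for a single run
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor t_max=2050000 epsilon_anneal_time=500000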

Run parallel experiments:

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list argument is a comma-separated list.
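
For example, the following hypothetical invocation runs qmix on two super-hard maps in parallel (adjust the thread, GPU, and repetition counts to your machine):

bash run.sh qmix corridor,6h_vs_8z 2 epsilon_anneal_time=500000 0,1 3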

All results will be stored in the Results folder and named with map_name.

Force all processes to exit

# All Python and StarCraft II game processes belonging to the current user will be terminated.
bash clean.sh
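
A minimal sketch of what such a cleanup script might look like (hypothetical commands; the repository's actual clean.sh may differ):

# Hypothetical sketch: terminate the current user's training and StarCraft II processes
pkill -u "$USER" -f "src/main.py"
pkill -u "$USER" -f StarCraftII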

Some test results on Super Hard scenarios

Cite

@article{hu2021riit,
      title={RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Haibin Wu and Seth Austin Harding and Siyang Jiang and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • Question about action selection during exploration

    I compared the code of pymarl and pymarl2.

    I noticed that this action-exploration/selection logic in pymarl's basic_controller.py https://github.com/oxwhirl/pymarl/blob/c971afdceb34635d31b778021b0ef90d7af51e86/src/controllers/basic_controller.py#L40-L48

    if not test_mode:
        # Epsilon floor
        epsilon_action_num = agent_outs.size(-1)
        if getattr(self.args, "mask_before_softmax", True):
            # With probability epsilon, we will pick an available action uniformly
            epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()
    
        agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                       + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
    

    was moved into action_selectors.py in pymarl2: https://github.com/hijkzzz/pymarl2/blob/d0aaf583605b2b012a1fd080eb6880a00954ed28/src/components/action_selectors.py#L94-L97

    Moreover, the way it is computed seems to differ with respect to masking. Could you explain why this change was made?
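
    For reference, here is a minimal self-contained sketch of the two mixing orders being discussed (hypothetical tensors; an illustration only, not the exact code of either repository):

    import torch as th
    import torch.nn.functional as F

    # Hypothetical example: 1 agent, 4 actions, the last action unavailable.
    logits = th.tensor([[1.0, 0.5, 0.2, 2.0]])
    avail = th.tensor([[1.0, 1.0, 1.0, 0.0]])
    eps = 0.1
    n_avail = avail.sum(dim=-1, keepdim=True)

    # Order A: mix the epsilon floor into the softmax policy first,
    # then zero out unavailable actions and renormalise.
    probs_a = (1 - eps) * F.softmax(logits, dim=-1) + eps / n_avail
    probs_a = probs_a * avail
    probs_a = probs_a / probs_a.sum(dim=-1, keepdim=True)

    # Order B: mask the logits before the softmax, then mix a uniform
    # distribution over the available actions only.
    masked_logits = logits.masked_fill(avail == 0, -1e10)
    probs_b = (1 - eps) * F.softmax(masked_logits, dim=-1) + eps * avail / n_avail

    print(probs_a)
    print(probs_b)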

    opened by liushunyu 6
  • [Help]pysc2.lib.remote_controller.ConnectError: Failed to connect to the SC2 websocket. Is it up?

    When I run command

    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
    

    I encounter this error.

    Traceback (most recent call last):
      File "src/main.py", line 109, in <module>
        ex.run_commandline(params)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 318, in run_commandline
        options=args,
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/root/pymarl2/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/root/pymarl2/src/run/run.py", line 177, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "/root/pymarl2/src/runners/parallel_runner.py", line 89, in run
        self.reset()
      File "/root/pymarl2/src/runners/parallel_runner.py", line 78, in reset
        data = parent_conn.recv()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    

    This pysc2 issue (https://github.com/deepmind/pysc2/issues/281) says I have to launch the StarCraft II game itself instead of just opening Battle.net, but I don't know how to do that.

    Could you give any advice?

    opened by jsrimr 5
  • RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    Hello, the code reports an error when running the following test: main.py --config=coma --env-config=one_step_matrix_game with save_model=True use_tensorboard=True save_model_interval=1000 t_max=50000 runner='episode' batch_size_run=1 use_cuda=False

    Error message:

    Traceback (most recent calls WITHOUT Sacred internals):
      File "D:/sby/RL/pymarl2/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "D:\sby\RL\pymarl2\run\run.py", line 56, in run
        run_sequential(args=args, logger=logger)
      File "D:\sby\RL\pymarl2\run\run.py", line 181, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "D:\sby\RL\pymarl2\runners\episode_runner.py", line 70, in run
        actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\controllers\basic_controller.py", line 23, in select_actions
        t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\components\action_selectors.py", line 105, in select_action
        picked_actions = Categorical(masked_policies).sample().long()
      File "D:\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
        samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
    RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    I looked into the problem: it seems to come from rnn_agent.py, at x = F.relu(self.fc1(inputs.view(-1, e)), inplace=True). After many iterations the gradients accumulate and explode, so the output contains NaNs. Could you take a look at this? Thanks. My environment is Win10 with PyTorch 1.x.

    opened by gingkg 5
  • Question about GRF

    Hi, awesome work! You extended GRF (Google Research Football) into the pymarl framework. However, when I run it with vdn_gfootball.yaml, a lot of debugging information is printed. Could you please help me fix it?

    Detail: absl Dump "episode_done": count limit reached / disabled

    opened by BITminicc 3
  • VMIX algorithm produces NaN

    Traceback (most recent call last):
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline                                    
        return self.run(                                                                                                                                          
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run                                                
        run()                                                                                                                                                     
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__                                                  
        self.result = self.main_function(*args)                                                                                                                   
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function                     
        result = wrapped(*args, **kwargs)                                                                                                                         
      File "src/main.py", line 38, in my_main                                                                                                                     
        run_REGISTRY[_config['run']](_run, config, _log)                                                                                                          
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 54, in run                                                                 
        run_sequential(args=args, logger=logger)                                                                                                                  
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 195, in run_sequential                                                     
        learner.train(episode_sample, runner.t_env, episode)                                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 58, in train                      
        advantages, td_error, targets_taken, log_pi_taken, entropy = self._calculate_advs(batch, rewards, terminated, actions, avail_actions,                     
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 115, in _calculate_advs                                
        entropy = categorical_entropy(pi).reshape(-1)  #[bs, t, n_agents, 1]                                                                                      
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/components/action_selectors.py", line 110, in categorical_entropy                            
        return Categorical(probs=probs).entropy()                                                                                                                 
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/categorical.py", line 64, in __init__                              
        super(Categorical, self).__init__(batch_shape, validate_args=validate_args)                                                  
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
        raise ValueError(                                                                                                                                         
    ValueError: Expected parameter probs (Tensor of shape (8, 54, 10, 18)) of distribution Categorical(probs: torch.Size([8, 54, 10, 18])) to satisfy the constraint Simplex(), but found invalid values:
    

    The remaining part (the actual tensor values) is not pasted here; the problem is that it contains NaNs.

    opened by Leo-xh 2
  • TensorBoard logger not working

    Hi, thanks for the good work! I installed the dependencies as instructed and successfully started training. However, it seems that the TensorBoard logs are not written to the /result directory even though I set the use_tensorboard param to "true" in src/config/default.yaml. Could you please help me with this?

    opened by kjyeung 2
  • Set trained model as opponent?

    Hello,

    With state-of-the-art algorithms, the win rate is already close to 1 on many maps. Would the authors be interested in extending pymarl2 to support matches between two models trained with different algorithms? I think this could break through the difficulty ceiling of SC2's built-in AI and keep SMAC alive forever.

    opened by linshi9658 1
  • About the Linux environment

    Hello,

    I would like to do more tests with pymarl2. However, it seems that SC2 4.10 cannot run on CentOS Linux 7.9.2009, which only provides glibc 2.17, and I got the following:

    /StarCraftII/Versions/Base75689/SC2_x64: /usr/lib64/libc.so.6: version 'GLIBC_2.18' not found (required by/StarCraftII/Libs/libstdc++.so.6)

    Could you please provide your operating environment info? Thanks!

    opened by linshi9658 1
  • Question about policy iterations

    Hello, in your paper I saw S = EPI, where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how often training is performed (i.e., one training step every how many rounds)?
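
    For illustration only (hypothetical numbers, not taken from the paper): with E = 200 samples per episode, P = 8 rollout processes, and I = 625 policy iterations, S = E × P × I = 200 × 8 × 625 = 1,000,000 total samples.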

    opened by wubmu 1
  • AttributeError: 'TracebackException' object has no attribute 'exc_traceback' and RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

    When I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor, I get the following error.

    Traceback (most recent call last):
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 38, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 114, in run_sequential
        buffer = ReplayBuffer(scheme, groups, args.buffer_size, env_info["episode_limit"] + 1,
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 209, in __init__
        super(ReplayBuffer, self).__init__(scheme, groups, buffer_size, max_seq_length, preprocess=preprocess, device=device)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 28, in __init__
        self._setup_data(self.scheme, self.groups, batch_size, max_seq_length, self.preprocess)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 75, in _setup_data
        self.data.transition_data[field_key] = th.zeros((batch_size, max_seq_length, *shape), dtype=dtype, device=self.device)
    RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "src/main.py", line 112, in <module>
        ex.run_commandline(params)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
        print_filtered_stacktrace()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
        print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
        return "".join(filtered_traceback_format(tb_exception))
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
        current_tb = tb_exception.exc_traceback
    AttributeError: 'TracebackException' object has no attribute 'exc_traceback'
    

    How can I fix this? Please help!

    opened by brownsugar123 1
  • Problem when modifying maps

    Hi,

    I am doing some personal research, and I used one of your maps (1o_10b_vs_1r.SC2Map) because its terrain design suits my tasks. I have modified the map with respect to the type and number of agents, but when I tried to change some terrain features the code gives me the following error:

    Error

    The only modification I made was to elevate some of the terrain, so the size of the map did not change.

    Do you know the reason for this error?

    Thanks!

    opened by AlRodA92 1
  • Add NDQ algorithm

    The source code from NDQ's paper is too old and does not work with recent PyTorch versions. I modified it so that it now works with recent PyTorch and is convenient to compare against other methods. I also added a requirements file describing the package versions of my environment.

    opened by Sud0x67 0