A general-purpose multi-agent training framework.

MARL @ SJTU

Last update: Jan 3, 2023

Related tags

Reinforcement Learning python games reinforcement-learning parallel distributed multiagent ray

Overview

MALib

A general-purpose multi-agent training framework.

Installation

Step 1: build the environment

conda create -n malib python==3.7 -y
conda activate malib
pip install -e .

# for development
pip install -e .[dev]

Step 2: install OpenSpiel

Installation guide: OpenSpiel
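A quick way to confirm that both steps worked is to check that the interpreter can locate the packages. The sketch below is not part of MALib: `check_modules` is a hypothetical helper, and it assumes that OpenSpiel's Python bindings are exposed under the module name `pyspiel`.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Return a mapping of top-level module name -> whether it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "malib" comes from step 1; "pyspiel" (assumed module name) from step 2.
    for name, found in check_modules(["malib", "pyspiel"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `malib` conda environment so that the check uses the same interpreter as the training scripts.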

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
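In the call above, `agent_mapping_func` is the identity function, so every environment agent keeps its own ID on the training side. The sketch below uses plain Python, without MALib, to illustrate how a different mapping function would group agents. The agent names and the `group_agents` helper are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_agents(
    agent_ids: Iterable[str], mapping_func: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group environment agent IDs by the key that mapping_func assigns them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for agent_id in agent_ids:
        groups[mapping_func(agent_id)].append(agent_id)
    return dict(groups)


# Hypothetical agent IDs, not taken from any MALib environment.
agents = ["team_0_player_0", "team_0_player_1", "team_1_player_0"]

# Identity mapping: one group per agent, as in the quick start.
independent = group_agents(agents, lambda agent_id: agent_id)

# Prefix mapping: agents on the same team share a group.
shared = group_agents(agents, lambda agent_id: agent_id[:6])
```

With the identity mapping each agent ends up alone in its group, while the prefix mapping puts both `team_0` players under one key.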

Comments

'List' from 'malib.utils.typing'

Hi, I am trying to run a basic MARL setup using MAPPO.

Here's my YAML config file:

name: "mappo_payload_carry"

training:
  interface:
    type: "centralized"
    population_size: -1
  config:
    # control the frequency of remote parameter update
    update_interval: 1
    saving_interval: 100
    batch_size: 32
    optimizer: "Adam"
    actor_lr: 5.e-4
    critic_lr: 5.e-4
    opti_eps: 1.e-5
    weight_decay: 0.0

rollout:
  type: "async"
  stopper: "simple_rollout"
  stopper_config:
    max_step: 10000
  metric_type: "simple"
  fragment_length: 100
  num_episodes: 4
  episode_seg: 1
  terminate: "any"
  num_env_per_worker: 1
  postprocessor_types:
    - copy_next_frame

env_description:
  #  scenario_name: "simple_spread"
  creator: "Gym"
  config:
    env_id: "urdf-env-v0"

algorithms:
  MAPPO:
    name: "MAPPO"
    model_config:
      initialization:
        use_orthogonal: True
        gain: 1.
      actor:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False
      critic:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False

    # set hyper parameter
    custom_config:
      gamma: 0.99
      use_cuda: False  # enable cuda or not
      use_q_head: False
      ppo_epoch: 4
      num_mini_batch: 1  # the number of mini-batches

      return_mode: gae
      gae:
        gae_lambda: 0.95
      vtrace:
        clip_rho_threshold: 1.0
        clip_pg_rho_threshold: 1.0


      use_rnn: False
      # this is not used, instead it is fixed to last hidden in actor/critic
      rnn_layer_num: 1
      rnn_data_chunk_length: 16

      use_feature_normalization: True
      use_popart: True
      popart_beta: 0.99999

      entropy_coef: 1.e-2



global_evaluator:
  name: "generic"

dataset_config:
  episode_capacity: 100
  fragment_length: 3001

I have a custom environment where I created the env.

env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=render)
possible_agents = env.possible_agents
action_spaces = env.possible_actions
observation_spaces = env.observation_spaces
env_desc = {"creator": env, "possible_agents": possible_agents, "action_spaces": action_spaces, "observation_spaces": observation_spaces}
run(
    group=config["group"],
    name=config["name"],
    env_description=env_desc,
    agent_mapping_func=lambda agent: agent[
                                     :6
                                     ],  # e.g. "team_0_player_0" -> "team_0"
    training=training_config,
    algorithms=config["algorithms"],
    rollout=rollout_config,
    evaluation=config.get("evaluation", {}),
    global_evaluator=config["global_evaluator"],
    dataset_config=config.get("dataset_config", {}),
    parameter_server=config.get("parameter_server", {}),
    # worker_config=config["worker_config"],
    use_init_policy_pool=False,
    task_mode="marl",
)

I checked whether malib.utils.typing provides the List and Dict types, but they don't seem to exist there. How do I fix this?
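Separately from the import error, the `run(...)` call above indexes `config["group"]`, `config["name"]`, `config["algorithms"]`, and `config["global_evaluator"]` directly, and the YAML as posted has no top-level `group` key. A small pre-flight check can surface that before any workers start. This is a sketch under that reading of the snippet; `missing_keys` is a hypothetical helper, not a MALib function.

```python
from typing import Any, Iterable, List, Mapping


def missing_keys(config: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required top-level keys that are absent from config."""
    return [key for key in required if key not in config]


# Top-level keys present in the YAML file above (values elided here).
config = {
    "name": "mappo_payload_carry",
    "training": {},
    "rollout": {},
    "env_description": {},
    "algorithms": {},
    "global_evaluator": {},
    "dataset_config": {},
}

# Keys that the run(...) call reads with config[...] rather than config.get(...).
required = ["group", "name", "algorithms", "global_evaluator"]

absent = missing_keys(config, required)
if absent:
    print(f"config is missing required keys: {absent}")
```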

opened by josyulakrishna 5

3s5z in PyMARL quickly reaches a win rate of 1; where is the SMAC config for malib?
Thanks for this nice repo. I'm interested in MARL for SMAC, and I have some questions about this repo.

On page 18 of your arXiv paper, you mention that "For the scenario 3s5z, however, both of MALib and PyMARL cannot reach 80% win rate." However, when I run the original PyMARL code with its default config (python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=3s5z), the win rate quickly reaches 1 for both SMAC versions (4.10 and 4.6.2).

I would like to use this repo to run SMAC, but I can't find the corresponding config in the examples. Will this part be open-sourced? Thank you.
opened by yifan123 3

[Question] How to debug `malib` (infinite loop when running `ray` in local mode)

Hi, I'm trying to run the PSRO algorithm with the quick start example on https://malib.io (PSRO PPO with leduc_holdem.env). I get the following errors:

2022-03-27 23:04:33,013    ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=185570, ip=127.0.0.1, repr=<malib.rollout.rollout_worker.RolloutWorker object at 0x7f1c52c37490>)
  File "/home/panxuehai/Projects/malib/malib/rollout/base_worker.py", line 441, in simulation
    raw_statistics, num_frames = self.sample(
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_worker.py", line 161, in sample
    for ret in rets:
  File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 65, in map
    yield self.get_next()
  File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 178, in get_next
    return ray.get(future)
ray.exceptions.RayTaskError(TypeError): ray::Stepping.run() (pid=185523, ip=127.0.0.1, repr=<malib.rollout.rollout_func.Stepping object at 0x7f8814a20dc0>)
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 431, in run
    rollout_results = env_runner(
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 243, in env_runner
    rets = env.reset(
  File "/home/panxuehai/Projects/malib/malib/envs/vector_env.py", line 192, in reset
    _ret = env.reset(max_step=max_step, custom_reset_config=custom_reset_config)
TypeError: reset() got an unexpected keyword argument 'max_step'
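The last frame of the traceback shows `vector_env.py` calling `env.reset(max_step=..., custom_reset_config=...)` on an environment whose `reset()` does not accept those keywords. One generic way to tolerate such a mismatch, shown here as a sketch rather than as MALib's fix, is to pass only the keyword arguments that the target signature accepts. `call_with_supported_kwargs` and `PlainEnv` are hypothetical names.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Any:
    """Call func with only the keyword arguments its signature can accept."""
    params = inspect.signature(func).parameters
    # If func declares **kwargs itself, everything can be forwarded.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(**supported)


class PlainEnv:
    """Stand-in environment whose reset() takes no extra keywords."""

    def reset(self):
        return "initial observation"


env = PlainEnv()
# env.reset(max_step=10) would raise TypeError; the filtered call succeeds.
observation = call_with_supported_kwargs(
    env.reset, max_step=10, custom_reset_config=None
)
```

Silently dropping arguments can hide real configuration errors, so logging the discarded keys may be preferable in practice.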

I then tried to debug this myself with ray.init(local_mode=True) in my debugger. All Ray actors run sequentially when "local mode" is on.

https://docs.ray.io/en/latest/ray-core/starting-ray.html#local-mode:

Local Mode

By default, Ray will parallelize its workload and run tasks on multiple processes and multiple nodes. However, if you need to debug your Ray program, it may be easier to do everything on a single process. You can force all Ray functions to occur on a single process by enabling local mode

The program gets stuck in an infinite loop here:

https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/agent/agent_interface.py#L277-L279

I wonder what the best practice is for debugging with malib. Thanks very much!

opened by XuehaiPan 2

How to run on a cluster or across multiple machines?
Dear MALib support,

My questions are as follows:

Can MALib run on a cluster or across multiple machines? How should the config be set?

When running on a single machine, how do I set the number of agents in the config?

Thank you.
opened by xuehui1991 2
QMIX/MADDPG config for SMAC

This is really nice work, and according to your paper, running QMIX/MADDPG is really fast. However, we couldn't find a config for the QMIX/MADDPG algorithms and don't know how to run them. Could you provide a config for QMIX/MADDPG?

opened by Weiyuhong-1998 2
A question about the quick start

Thanks for the nice work. I'd like to ask for some help with the quick start. First, could you please give some instructions for running single-agent RL in a Gym-based environment, such as CartPole? I tried to run the file run_gym.py and load the YAML config, but it doesn't work. Second, do we have to install OpenSpiel, or can we skip it?

Looking forward to hearing from you. Cheers~

Best, Yutong

opened by Yutongamber 2
How to deploy malib on a GPU cluster server

I have a login node and four computing nodes. Does malib need to be deployed on each node? Are jobs submitted from the login node? Is there a detailed example of cluster usage? Looking forward to your help.

opened by IDayday 2

A Minor Error in Quick Start Demo Code

In the Quick Start part of the README:

  env_description={
        "creator": leduc_holdem.env,
        "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
        "possible_agents": env.possible_agents,
    }

A comma is missing after "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
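The effect of the missing comma can be reproduced with the standard `ast` module, without importing MALib. The snippet below is a sketch: it parses the dict literal as posted and again with the comma added. `parses` is a hypothetical helper.

```python
import ast


def parses(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The fragment as it appears in the README (no comma after the "config" entry).
broken = '''env_description = {
    "creator": leduc_holdem.env,
    "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
    "possible_agents": env.possible_agents,
}'''

# The same fragment with the comma restored.
fixed = broken.replace(
    '"env_id": "leduc_holdem"}\n', '"env_id": "leduc_holdem"},\n'
)

print(parses(broken), parses(fixed))
```

Because `ast.parse` only checks syntax, the undefined names `leduc_holdem` and `env` do not matter here.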

opened by hanmochen 2

Turn based env
refactor open_spiel env

remove async vector env implementation

turn-based rollout is supported

PSRO and marl scenarios have been tested

remove third-party replay

refactor episode collecting and sending
opened by KornbergFresnel 1
How to use malib to roll out with a trained model?

Excuse me, once training is done, where is the model saved, and how can I use it for rollouts? Also, how can I replay the training process? Can we render environments such as MPE and MAgent?

opened by zhuerfei 1
Cannot run the examples on GPU

When I run examples such as psro_poker and maddpg_mpe with the config "use_cuda" set to True, I get the following error:

2022-04-20 19:30:20,818 ERROR worker.py:80 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=32314, ip=172.28.78.34, repr=<matf.rollout.rollout_worker.RolloutWorker object at 0x7f083bb10fd0>)
(pid=32320)   File "/home/qianmd/work/test/matf/matf/rollout/base_worker.py", line 447, in simulation
(pid=32320)     role="simulation",
(pid=32320)   File "/home/qianmd/work/test/matf/matf/rollout/rollout_worker.py", line 161, in sample
(pid=32320)     for ret in rets:
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 65, in map
(pid=32320)     yield self.get_next()
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 178, in get_next
(pid=32320)     return ray.get(future)
(pid=32320) ray.exceptions.RayTaskError(RuntimeError): ray::Stepping.run() (pid=32309, ip=172.28.78.34, repr=<matf.rollout.rollout_func.Stepping object at 0x7f243ce38190>)
(pid=32320)   File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 447, in run
(pid=32320)     dataset_server=self._dataset_server if task_type == "rollout" else None,
(pid=32320)   File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 276, in env_runner
(pid=32320)     active_policy_inputs, agent_interfaces, episodes
(pid=32320)   File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 155, in _do_policy_eval
(pid=32320)     ) = interface.compute_action(**inputs) # compute actions from each env_agent_id's situation information, rnn_state, and done
(pid=32320)   File "/home/qianmd/work/test/matf/matf/envs/agent_interface.py", line 268, in compute_action
(pid=32320)     rets = self.policies[policy_id].compute_action(*args, **kwargs)
(pid=32320)   File "/home/qianmd/work/test/matf/matf/algorithm/dqn/policy.py", line 88, in compute_action
(pid=32320)     logits = torch.softmax(self.critic(observation), dim=-1)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
(pid=32320)     result = self.forward(*input, **kwargs)
(pid=32320)   File "/home/qianmd/work/test/matf/matf/algorithm/common/model.py", line 106, in forward
(pid=32320)     pi = self.net(obs)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
(pid=32320)     result = self.forward(*input, **kwargs)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
(pid=32320)     input = module(input)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
(pid=32320)     result = self.forward(*input, **kwargs)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
(pid=32320)     return F.linear(input, self.weight, self.bias)
(pid=32320)   File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
(pid=32320)     ret = torch.addmm(bias, input, weight.t())
(pid=32320) RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
(pid=32314) Exception ignored in: 'ray._raylet.task_execution_handler'

opened by zhuerfei 1
Performance Results

Throughput Comparison

All experiment results listed were obtained with one of the following hardware settings: (1) System #1: a 32-core computing node with dual graphics cards. (2) System #2: a two-node cluster in which each node has 128 cores and a single graphics card. All GPUs mentioned are of the same model (NVIDIA RTX3090).

Throughput comparison among existing RL frameworks and MALib. Due to resource limitations (32 cores, 256G RAM), RLlib fails under heavy loads (CPU case: #workers > 32, GPU case: #workers > 8). MALib outperforms the other frameworks with CPU only and achieves performance comparable to the highly tailored framework Sample-Factory with GPU, despite the higher level of abstraction it introduces. To better illustrate the scalability of MALib, we show the MA-Atari and SC2 throughput on System #2 under different worker settings; the 512-worker group on SC2 fails due to resource limitations.

Additional comparisons between MALib and other distributed RL training frameworks. (Left): System #3 cluster throughput of MALib in 2-player MA-Atari and 3-player SC2. (Middle): 4-player MA-Atari throughput comparison on System #1 without GPU. (Right): 4-player MA-Atari throughput comparison on System #1 with GPU.

Wall-time & Performance of PB-MARL Algorithm

Comparisons of PSRO between MALib and OpenSpiel. (a) indicates that MALib achieves the same exploitability as OpenSpiel; (b) shows that MALib converges 3x faster than OpenSpiel; (c) shows that MALib achieves higher execution efficiency than OpenSpiel, since it takes less time to run the same number of learning steps, which suggests that MALib has the potential to scale to more complex tasks that need many more steps.

Typical MARL Algorithms

Results on Multi-agent Particle Environments

Comparisons of MADDPG in simple adversary under different rollout worker settings. Figures in the top row depict each agent's episode reward w.r.t. the number of sampled episodes, which indicates that MALib converges faster than RLlib for an equal number of sampled episodes. Figures in the bottom row show the average time and average episode reward at the same number of sampled episodes, which indicates that MALib achieves a 5x speedup over RLlib.

Scenario Crypto

Simple Push

Simple Reference

Simple Speaker Listener

Simple Tag

documentation

opened by KornbergFresnel 0

Owner

MARL @ SJTU

Multi-Agent Research at Shanghai Jiao Tong University

GitHub

Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

MARL Tricks Our codes for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implemented and standardiz

404 Dec 25, 2022

Game Agent Framework. Helping you create AIs / Bots that learn to play any game you own!

Serpent.AI - Game Agent Framework (Python) Update: Revival (May 2020) Development work has resumed on the framework with the aim of bringing it into 2

6.4k Jan 5, 2023

A customisable 3D platform for agent-based AI research

DeepMind Lab is a 3D learning environment based on id Software's Quake III Arena via ioquake3 and other open source software. DeepMind Lab provides a

6.8k Jan 5, 2023

An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic

823 Jan 6, 2023

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

SLM Lab Modular Deep Reinforcement Learning framework in PyTorch. Documentation: https://slm-lab.gitbook.io/slm-lab/ BeamRider Breakout KungFuMaster M

1.1k Dec 24, 2022

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grok

10k Jan 7, 2023

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

309 Oct 19, 2022

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection This repository contains implementation of the

Visual Understanding Lab @ Samsung AI Center Moscow

190 Dec 30, 2022

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases.

Vulkan Kompute The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabl

The Institute for Ethical Machine Learning

1k Dec 26, 2022

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

94 Nov 21, 2022

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.

1k Jan 6, 2023

PyFlow is a general purpose visual scripting framework for python

PyFlow is a general purpose visual scripting framework for python. State Base structure of program implemented, such things as packages disco

1.8k Jan 7, 2023

Python bindings for ArrayFire: A general purpose GPU library.

ArrayFire Python Bindings ArrayFire is a high performance library for parallel computing with an easy-to-use API. It enables users to write scientific

402 Dec 20, 2022

Implementation of self-attention mechanisms for general purpose. Focused on computer vision modules. Ongoing repository.

Self-attention building blocks for computer vision applications in PyTorch Implementation of self attention mechanisms for computer vision in PyTorch

962 Dec 23, 2022

Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

Retrying Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

1.9k Dec 29, 2022

a general-purpose Transformer based vision backbone

Swin Transformer By Ze Liu*, Yutong Lin*, Yue Cao*, Han Hu*, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo. This repo is the official implement

9.9k Jan 8, 2023

ArrayFire: a general purpose GPU library.

ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures i

4k Dec 29, 2022

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

160 Jan 4, 2023

A task-agnostic vision-language architecture as a step towards General Purpose Vision

Towards General Purpose Vision Systems By Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, and Derek Hoiem Overview Welcome to the official code base f

79 Dec 23, 2022
