A parallel framework for population-based multi-agent reinforcement learning.

Overview

MALib: A parallel framework for population-based multi-agent reinforcement learning


MALib is a parallel framework for population-based learning nested with (multi-agent) reinforcement learning (RL) methods, such as Policy Space Response Oracles (PSRO), Self-Play, and Neural Fictitious Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enables efficient code reuse and flexible deployment across different distributed computing paradigms. The design of MALib also strives to promote research on other multi-agent learning directions, including multi-agent imitation learning and model-based MARL.

(Figure: MALib architecture overview)
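
To make the population-based training loop concrete, below is a minimal, self-contained sketch of a PSRO-style iteration on a toy matrix game (rock-paper-scissors). It is illustrative only and does not use the MALib API; the uniform meta-solver stands in for the meta-game solvers (e.g., Nash equilibrium or alpha-rank) that PSRO normally employs.

# Minimal PSRO-style loop on a toy matrix game (illustrative sketch, not the MALib API).
import numpy as np

# Row player's payoff for rock-paper-scissors (symmetric zero-sum game).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def uniform_meta(population):
    """PSRO-uniform meta-solver: mix uniformly over the current population."""
    return np.ones(len(population)) / len(population)

def best_response(opponent_population, opponent_meta):
    """Best pure action against the opponent's population mixture."""
    mixture = np.zeros(PAYOFF.shape[1])
    for action, prob in zip(opponent_population, opponent_meta):
        mixture[action] += prob
    return int(np.argmax(PAYOFF @ mixture))

# Both populations start with a single arbitrary policy (action 0 = rock).
populations = [[0], [0]]
for _ in range(5):  # PSRO iterations
    metas = [uniform_meta(p) for p in populations]
    # Each player adds a best response to the other's current mixture
    # (the game is symmetric, so both reuse the same payoff matrix).
    populations[0].append(best_response(populations[1], metas[1]))
    populations[1].append(best_response(populations[0], metas[0]))

print("final populations:", populations)

Each iteration evaluates the current populations, solves the meta-game for a mixture, and trains a best response against that mixture; MALib's abstractions parallelize the rollout, training, and evaluation stages of this loop.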

Installation

Installing MALib is straightforward. We have tested MALib on Python 3.6 and 3.7; this guide assumes Ubuntu 18.04 or above. We strongly recommend using conda to manage your dependencies and avoid version conflicts. The example below builds a Python 3.7 conda environment.

conda create -n malib python==3.7 -y
conda activate malib

# install dependencies
./install_deps.sh

# install malib
pip install -e .

External environments such as StarCraft II and ViZDoom are integrated into MALib; you can install them via pip install -e .[envs]. If you want to contribute to the repository, run pip install -e .[dev] to install the development dependencies as well.

Optional: if you want to use alpha-rank to solve the meta-game, install open-spiel following its installation guide.

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,  # map each environment agent to its own training interface
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
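
The agent_mapping_func above controls how environment agent ids are grouped for training: the identity mapping gives each agent its own training interface, while a prefix mapping (as in the custom-environment example in the comments below) shares one interface per team. A plain-Python illustration with hypothetical agent ids (no MALib calls):

# Two common agent_mapping_func choices (hypothetical agent ids, illustrative only).
identity_mapping = lambda agent_id: agent_id   # one training interface per agent
team_mapping = lambda agent_id: agent_id[:6]   # e.g. "team_0_player_0" -> "team_0"

print(identity_mapping("player_0"))        # -> player_0
print(team_mapping("team_0_player_0"))     # -> team_0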

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@misc{zhou2021malib,
      title={MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning}, 
      author={Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Weinan Zhang and Jun Wang},
      year={2021},
      eprint={2106.07551},
      archivePrefix={arXiv},
      primaryClass={cs.MA}
}
Comments
  • 'List' from 'malib.utils.typing'

    Hi I am trying to run a basic MARL setup using MAPPO.

    Here's my yaml config file

    name: "mappo_payload_carry"
    
    training:
      interface:
        type: "centralized"
        population_size: -1
      config:
        # control the frequency of remote parameter update
        update_interval: 1
        saving_interval: 100
        batch_size: 32
        optimizer: "Adam"
        actor_lr: 5.e-4
        critic_lr: 5.e-4
        opti_eps: 1.e-5
        weight_decay: 0.0
    
    rollout:
      type: "async"
      stopper: "simple_rollout"
      stopper_config:
        max_step: 10000
      metric_type: "simple"
      fragment_length: 100
      num_episodes: 4
      episode_seg: 1
      terminate: "any"
      num_env_per_worker: 1
      postprocessor_types:
        - copy_next_frame
    
    env_description:
      #  scenario_name: "simple_spread"
      creator: "Gym"
      config:
        env_id: "urdf-env-v0"
    
    algorithms:
      MAPPO:
        name: "MAPPO"
        model_config:
          initialization:
            use_orthogonal: True
            gain: 1.
          actor:
            network: mlp
            layers:
              - units: 256
                activation: ReLU
              - units: 128
                activation: ReLU
              - units: 64
                activation: ReLU
            output:
              activation: False
          critic:
            network: mlp
            layers:
              - units: 256
                activation: ReLU
              - units: 128
                activation: ReLU
              - units: 64
                activation: ReLU
            output:
              activation: False
    
        # set hyper parameter
        custom_config:
          gamma: 0.99
          use_cuda: False  # enable cuda or not
          use_q_head: False
          ppo_epoch: 4
          num_mini_batch: 1  # the number of mini-batches
    
          return_mode: gae
          gae:
            gae_lambda: 0.95
          vtrace:
            clip_rho_threshold: 1.0
            clip_pg_rho_threshold: 1.0
    
    
          use_rnn: False
          # this is not used, instead it is fixed to last hidden in actor/critic
          rnn_layer_num: 1
          rnn_data_chunk_length: 16
    
          use_feature_normalization: True
          use_popart: True
          popart_beta: 0.99999
    
          entropy_coef: 1.e-2
    
    
    
    global_evaluator:
      name: "generic"
    
    dataset_config:
      episode_capacity: 100
      fragment_length: 3001
    
    I have a custom environment, which I create as follows:
    
    env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=render)
    possible_agents = env.possible_agents
    action_spaces = env.possible_actions
    observation_spaces = env.observation_spaces
    env_desc = {"creator": env, "possible_agents": possible_agents, "action_spaces": action_spaces, "observation_spaces": observation_spaces}
    run(
        group=config["group"],
        name=config["name"],
        env_description=env_desc,
        agent_mapping_func=lambda agent: agent[
                                         :6
                                         ],  # e.g. "team_0_player_0" -> "team_0"
        training=training_config,
        algorithms=config["algorithms"],
        rollout=rollout_config,
        evaluation=config.get("evaluation", {}),
        global_evaluator=config["global_evaluator"],
        dataset_config=config.get("dataset_config", {}),
        parameter_server=config.get("parameter_server", {}),
        # worker_config=config["worker_config"],
        use_init_policy_pool=False,
        task_mode="marl",
    )
    

    I tried to see if malib.utils.typing had the List and Dict types, but it looks like they are nonexistent there. How do I fix this?
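
    One possible workaround, assuming the failure is an ImportError of List/Dict from malib.utils.typing in user-level code: fall back to the standard-library typing module, which always provides these names (a hedged sketch, not an official fix).

    # Hedged shim: prefer MALib's re-exports if present, otherwise use the standard library.
    try:
        from malib.utils.typing import List, Dict  # assumption: some MALib versions re-export these
    except ImportError:
        from typing import List, Dict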

    opened by josyulakrishna 5
  • 3s5z in PyMARL can quickly reach a win_rate of 1, and where is the SMAC config for malib?

    Thanks for this nice repo. I'm interested in MARL for SMAC and have some questions about this repo.

    1. On page 18 of your arXiv paper, you mention that "For the scenario 3s5z, however, both of MALib and PyMARL cannot reach 80% win rate." However, when I run the original PyMARL code with its default config (python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=3s5z), the win rate quickly reaches 1 for both SMAC versions (4.10 and 4.6.2).

    2. I would like to use this repo to run SMAC, but I can't find the corresponding config in examples. Will this part be open-sourced? Thank you.

    opened by yifan123 3
  • [Question] How to debug `malib` (infinite loop when running `ray` in local mode)

    Hi, I'm trying to run the PSRO algorithm with the quick start example on https://malib.io (PSRO PPO with leduc_holdem.env). I get the following errors:

    2022-03-27 23:04:33,013    ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=185570, ip=127.0.0.1, repr=<malib.rollout.rollout_worker.RolloutWorker object at 0x7f1c52c37490>)
      File "/home/panxuehai/Projects/malib/malib/rollout/base_worker.py", line 441, in simulation
        raw_statistics, num_frames = self.sample(
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_worker.py", line 161, in sample
        for ret in rets:
      File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 65, in map
        yield self.get_next()
      File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 178, in get_next
        return ray.get(future)
    ray.exceptions.RayTaskError(TypeError): ray::Stepping.run() (pid=185523, ip=127.0.0.1, repr=<malib.rollout.rollout_func.Stepping object at 0x7f8814a20dc0>)
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 431, in run
        rollout_results = env_runner(
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 243, in env_runner
        rets = env.reset(
      File "/home/panxuehai/Projects/malib/malib/envs/vector_env.py", line 192, in reset
        _ret = env.reset(max_step=max_step, custom_reset_config=custom_reset_config)
    TypeError: reset() got an unexpected keyword argument 'max_step'
    

    Then I tried to debug this myself with ray.init(local_mode=True) in my debugger. All Ray actors run sequentially when local mode is on.

    https://docs.ray.io/en/latest/ray-core/starting-ray.html#local-mode:

    Local Mode

    By default, Ray will parallelize its workload and run tasks on multiple processes and multiple nodes. However, if you need to debug your Ray program, it may be easier to do everything on a single process. You can force all Ray functions to occur on a single process by enabling local mode

    The program gets stuck in an infinite loop here:

    https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/agent/agent_interface.py#L277-L279

    I wonder what's the best practice for debugging with malib? Thanks very much!
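
    A minimal sketch of the local-mode setup mentioned above (ray.init(local_mode=True) is standard Ray; note that actors then run sequentially, so a busy-wait loop that depends on another actor making progress can spin forever, which may explain the hang at the linked lines):

    # Minimal Ray local-mode debugging setup (standard Ray API; nothing MALib-specific).
    import ray

    ray.init(local_mode=True)  # force tasks/actors to run sequentially in one process
    try:
        pass  # place the MALib run / breakpoints here
    finally:
        ray.shutdown()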

    opened by XuehaiPan 2
  • How to run on a cluster or across multiple machines?

    Dear MALib support,

    My questions are as follows:

    1. Can MALib run on a cluster or across multiple machines? How should the config be set?
    2. When running on a single machine, how do I set the config for the number of agents?

    Thank you.
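
    Since MALib builds on Ray, a hedged sketch of the usual multi-node setup (standard Ray commands and APIs, not MALib-specific configuration): start the head node with ray start --head, join the other machines with ray start --address=<head-ip>:6379, then connect from the driver script:

    # Attach the driver to an already-running Ray cluster (standard Ray API).
    import ray

    ray.init(address="auto")        # connect to the existing cluster instead of starting a local one
    print(ray.cluster_resources())  # verify that all nodes' CPUs/GPUs are visible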

    opened by xuehui1991 2
  • For SMAC Qmix/MADDPG config

    This is really nice work, and according to your paper, running Qmix/MADDPG is very fast. However, we didn't find a config for the Qmix/MADDPG algorithms and don't know how to run your Qmix/MADDPG program, so could you give us a config for the Qmix/MADDPG algorithms?

    opened by Weiyuhong-1998 2
  • A question for quick start

    Thanks for the nice work. I want to ask for some help with the quick start. First, could you please give some instructions for running single-agent RL in a Gym-based env, such as CartPole? I tried to run the file run_gym.py and load the yaml, but it doesn't work. Second, do we have to install open-spiel, or can we skip it?

    Look forward to hearing from you. Cheers~

    Best, Yutong

    opened by Yutongamber 2
  • How to deploy malib on a GPU cluster server

    I have a login node and four computing nodes. Does malib need to be deployed on each node? Is the job submitted from the login node? Is there a detailed example of cluster-server usage? Looking forward to your help.

    opened by IDayday 2
  • A Minor Error in Quick Start Demo Code

    In the README Quick Start part:

      env_description={
            "creator": leduc_holdem.env,
            "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
            "possible_agents": env.possible_agents,
        }
    

    A comma is missing after "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}

    opened by hanmochen 2
  • Turn-based env

    • refactor open_spiel env
    • remove async vector env implementation
    • turn-based rollout is supported
    • PSRO and MARL scenarios have been tested
    • remove third-party replay
    • refactor episode collecting and sending
    opened by KornbergFresnel 1
  • How to use malib to roll out from a trained model?

    Excuse me, when training is done, where is the model saved, and how can we use it to roll out? Also, how can we replay the training process? Can we render envs like MPE and MAgent?

    opened by zhuerfei 1
  • Cannot run the examples on GPU

    When I run examples such as psro_poker and maddpg_mpe with the config "use_cuda" set to True, I get the following error:

    2022-04-20 19:30:20,818 ERROR worker.py:80 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=32314, ip=172.28.78.34, repr=<matf.rollout.rollout_worker.RolloutWorker object at 0x7f083bb10fd0>)
      File "/home/qianmd/work/test/matf/matf/rollout/base_worker.py", line 447, in simulation
        role="simulation",
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_worker.py", line 161, in sample
        for ret in rets:
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 65, in map
        yield self.get_next()
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 178, in get_next
        return ray.get(future)
    ray.exceptions.RayTaskError(RuntimeError): ray::Stepping.run() (pid=32309, ip=172.28.78.34, repr=<matf.rollout.rollout_func.Stepping object at 0x7f243ce38190>)
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 447, in run
        dataset_server=self._dataset_server if task_type == "rollout" else None,
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 276, in env_runner
        active_policy_inputs, agent_interfaces, episodes
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 155, in _do_policy_eval
        ) = interface.compute_action(**inputs)  # compute actions from each env_agent_id's observation, rnn_state and done
      File "/home/qianmd/work/test/matf/matf/envs/agent_interface.py", line 268, in compute_action
        rets = self.policies[policy_id].compute_action(*args, **kwargs)
      File "/home/qianmd/work/test/matf/matf/algorithm/dqn/policy.py", line 88, in compute_action
        logits = torch.softmax(self.critic(observation), dim=-1)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/work/test/matf/matf/algorithm/common/model.py", line 106, in forward
        pi = self.net(obs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
        input = module(input)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
    (pid=32314) Exception ignored in: 'ray._raylet.task_execution_handler'
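
    A generic PyTorch illustration of the mismatch in this traceback (not MALib code): the critic's weights live on CUDA while the observation tensor is still on CPU, so the input has to be moved to the policy's device before the forward pass.

    # Plain-PyTorch sketch of the cuda/cpu mismatch and its fix (illustrative only).
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    critic = torch.nn.Linear(8, 4).to(device)  # stands in for the policy's critic network
    obs = torch.randn(1, 8)                    # rollout observations typically arrive on CPU
    logits = torch.softmax(critic(obs.to(device)), dim=-1)  # moving obs to `device` avoids the RuntimeError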

    opened by zhuerfei 1
  • pickle5 PicklingError

    Environment:
    1. Linux version 3.10.0-1160.71.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC))
    2. conda 4.5.4

    Commands:
    1. conda create -n malib python==3.7 -y
    2. conda activate malib
    3. ./install.sh
    4. python examples/run_psro.py

    bug report:

    KeyError: (<class 'RuntimeError'>, ('numba jitted function aborted due to unresolved symbol',), None)

    During handling of the above exception, another exception occurred:
    RecursionError: maximum recursion depth exceeded while calling a Python object

    The above exception was the direct cause of the following exception:
    pickle5.pickle.PicklingError: Could not pickle object as excessively deep recursion required.

    1.log

    opened by Dixit91 0
  • Performance Results

    Throughput Comparison

    All the experimental results listed were obtained with one of the following hardware settings: (1) System #1: a 32-core computing node with dual graphics cards; (2) System #2: a two-node cluster in which each node has 128 cores and a single graphics card. All GPUs mentioned are of the same model (NVIDIA RTX 3090).

    Throughput comparison between existing RL frameworks and MALib. Due to resource limitations (32 cores, 256 GB RAM), RLlib fails under heavy loads (CPU case: #workers > 32; GPU case: #workers > 8). MALib outperforms the other frameworks in the CPU-only setting and achieves performance comparable to the highly tailored Sample-Factory framework with GPU, despite the higher level of abstraction it introduces. To better illustrate MALib's scalability, we show MA-Atari and SC2 throughput on System #2 under different worker settings; the 512-worker group on SC2 fails due to resource limitations.

    (Figure: merged_throughput_report)

    Additional comparisons between MALib and other distributed RL training frameworks. (Left): System #3 cluster throughput of MALib on 2-player MA-Atari and 3-player SC2. (Middle): 4-player MA-Atari throughput comparison on System #1 without GPU. (Right): 4-player MA-Atari throughput comparison on System #1 with GPU.

    (Figure: merged_throughput_report_4p)

    Wall-time & Performance of PB-MARL Algorithm

    Comparison of PSRO between MALib and OpenSpiel. (a) shows that MALib achieves the same exploitability as OpenSpiel; (b) shows that MALib converges 3x faster than OpenSpiel; (c) shows that MALib achieves higher execution efficiency than OpenSpiel, since it needs less time to iterate the same number of learning steps, suggesting that MALib can scale up to more complex tasks that require many more steps.

    (Figure: pb-marl_wall_time)

    Typical MARL Algorithms

    Results on Multi-agent Particle Environments

    Comparison of MADDPG on simple_adversary under different rollout-worker settings. Figures in the top row depict each agent's episode reward w.r.t. the number of sampled episodes, indicating that MALib converges faster than RLlib for an equal number of sampled episodes. Figures in the bottom row show the average time and average episode reward at the same number of sampled episodes, indicating that MALib achieves a 5x speedup over RLlib.

    (Figure: simple_adversary)

    Scenario Crypto

    (Figure: simple_crypto)

    Simple Push

    (Figure: simple_push)

    Simple Reference

    (Figure: simple_reference)

    Simple Speaker Listener

    (Figure: simple_speaker_listener)

    Simple Tag

    (Figure: simple_tag)

    documentation 
    opened by KornbergFresnel 0
Owner
MARL @ SJTU: Multi-Agent Research at Shanghai Jiao Tong University