Overview

MALib

A general-purpose multi-agent training framework.

Installation

Step 1: Build the environment

conda create -n malib python==3.7 -y
conda activate malib
pip install -e .

# for development
pip install -e .[dev]

Step 2: Install OpenSpiel

Follow the OpenSpiel installation guide: https://github.com/deepmind/open_spiel
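
A quick sanity check that the editable install is importable (the module path below is taken from the quick-start example that follows):

python -c "from malib.runner import run; print('MALib import OK')"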

Quick Start

"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func


# Instantiate the environment once to expose agent ids and observation/action spaces.
env = leduc_holdem.env(fixed_player=True)

run(
    agent_mapping_func=lambda agent_id: agent_id,
    env_description={
        "creator": leduc_holdem.env,
        "config": {"fixed_player": True},
        "id": "leduc_holdem",
        "possible_agents": env.possible_agents,
    },
    training={
        "interface": {
            "type": "independent",
            "observation_spaces": env.observation_spaces,
            "action_spaces": env.action_spaces
        },
    },
    algorithms={
        "PSRO_PPO": {
            "name": "PPO",
            "custom_config": {
                "gamma": 1.0,
                "eps_min": 0,
                "eps_max": 1.0,
                "eps_decay": 100,
            },
        }
    },
    rollout={
        "type": "async",
        "stopper": "simple_rollout",
        "callback": rollout_func.sequential
    }
)
Comments
  • 'List' from 'malib.utils.typing'

    Hi, I am trying to run a basic MARL setup using MAPPO.

    Here's my YAML config file:

    name: "mappo_payload_carry"
    
    training:
      interface:
        type: "centralized"
        population_size: -1
      config:
        # control the frequency of remote parameter update
        update_interval: 1
        saving_interval: 100
        batch_size: 32
        optimizer: "Adam"
        actor_lr: 5.e-4
        critic_lr: 5.e-4
        opti_eps: 1.e-5
        weight_decay: 0.0
    
    rollout:
      type: "async"
      stopper: "simple_rollout"
      stopper_config:
        max_step: 10000
      metric_type: "simple"
      fragment_length: 100
      num_episodes: 4
      episode_seg: 1
      terminate: "any"
      num_env_per_worker: 1
      postprocessor_types:
        - copy_next_frame
    
    env_description:
      #  scenario_name: "simple_spread"
      creator: "Gym"
      config:
        env_id: "urdf-env-v0"
    
    algorithms:
      MAPPO:
        name: "MAPPO"
        model_config:
          initialization:
            use_orthogonal: True
            gain: 1.
          actor:
            network: mlp
            layers:
              - units: 256
                activation: ReLU
              - units: 128
                activation: ReLU
              - units: 64
                activation: ReLU
            output:
              activation: False
          critic:
            network: mlp
            layers:
              - units: 256
                activation: ReLU
              - units: 128
                activation: ReLU
              - units: 64
                activation: ReLU
            output:
              activation: False
    
        # set hyper parameter
        custom_config:
          gamma: 0.99
          use_cuda: False  # enable cuda or not
          use_q_head: False
          ppo_epoch: 4
          num_mini_batch: 1  # the number of mini-batches
    
          return_mode: gae
          gae:
            gae_lambda: 0.95
          vtrace:
            clip_rho_threshold: 1.0
            clip_pg_rho_threshold: 1.0
    
    
          use_rnn: False
          # this is not used, instead it is fixed to last hidden in actor/critic
          rnn_layer_num: 1
          rnn_data_chunk_length: 16
    
          use_feature_normalization: True
          use_popart: True
          popart_beta: 0.99999
    
          entropy_coef: 1.e-2
    
    
    
    global_evaluator:
      name: "generic"
    
    dataset_config:
      episode_capacity: 100
      fragment_length: 3001
    
    I have a custom environment, created as follows:
    
    env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=render)
    possible_agents = env.possible_agents
    action_spaces = env.possible_actions
    observation_spaces = env.observation_spaces
    env_desc = {
        "creator": env,
        "possible_agents": possible_agents,
        "action_spaces": action_spaces,
        "observation_spaces": observation_spaces,
    }
    run(
        group=config["group"],
        name=config["name"],
        env_description=env_desc,
        agent_mapping_func=lambda agent: agent[:6],  # e.g. "team_0_player_0" -> "team_0"
        training=training_config,
        algorithms=config["algorithms"],
        rollout=rollout_config,
        evaluation=config.get("evaluation", {}),
        global_evaluator=config["global_evaluator"],
        dataset_config=config.get("dataset_config", {}),
        parameter_server=config.get("parameter_server", {}),
        # worker_config=config["worker_config"],
        use_init_policy_pool=False,
        task_mode="marl",
    )
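
    For comparison, the quick-start example above passes a callable as "creator" plus a separate "config" dict, rather than an already-constructed environment. A sketch of the same pattern for this custom env (the factory name is hypothetical):

    def make_urdf_env(**env_config):
        # Build the environment lazily so each rollout worker can construct its own copy.
        return gym.make("urdf-env-v0", **env_config)

    env_desc = {
        "creator": make_urdf_env,
        "config": {"dt": 0.01, "robots": robots, "render": render},
        "possible_agents": possible_agents,
        "action_spaces": action_spaces,
        "observation_spaces": observation_spaces,
    }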
    

    I tried to see if malib.utils.typing had the List and Dict types, but it looks like they are not there. How do I fix this?
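
    If those names are only meant as re-exports of standard type aliases, one likely workaround (assuming the importing code needs nothing else from malib.utils.typing) is to take them from the standard library instead:

    # List and Dict are plain aliases in the standard library's typing module,
    # so the failing import can usually be redirected there:
    from typing import Dict, List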

    opened by josyulakrishna 5
  • 3s5z in PyMARL can quickly reach a win_rate of 1; where is the SMAC config for MALib?

    Thanks for this nice repo. I'm interested in MARL for SMAC, and I have some questions about this repo.

    1. On page 18 of your arXiv paper, you mention that "For the scenario 3s5z, however, both of MALib and PyMARL cannot reach 80% win rate." However, when I run the original PyMARL code with its default config (python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=3s5z), the win rate quickly reaches 1 for both SMAC versions (4.10 and 4.6.2).

    2. I would like to use this repo to run SMAC, but I can't find the corresponding config in examples. Will this part be open-sourced? Thank you.

    opened by yifan123 3
  • [Question] How to debug `malib` (infinite loop when running `ray` in local mode)

    Hi, I'm trying to run the PSRO algorithm with the quick-start example on https://malib.io (PSRO PPO with leduc_holdem.env). I get the following error:

    2022-03-27 23:04:33,013    ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=185570, ip=127.0.0.1, repr=<malib.rollout.rollout_worker.RolloutWorker object at 0x7f1c52c37490>)
      File "/home/panxuehai/Projects/malib/malib/rollout/base_worker.py", line 441, in simulation
        raw_statistics, num_frames = self.sample(
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_worker.py", line 161, in sample
        for ret in rets:
      File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 65, in map
        yield self.get_next()
      File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 178, in get_next
        return ray.get(future)
    ray.exceptions.RayTaskError(TypeError): ray::Stepping.run() (pid=185523, ip=127.0.0.1, repr=<malib.rollout.rollout_func.Stepping object at 0x7f8814a20dc0>)
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 431, in run
        rollout_results = env_runner(
      File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 243, in env_runner
        rets = env.reset(
      File "/home/panxuehai/Projects/malib/malib/envs/vector_env.py", line 192, in reset
        _ret = env.reset(max_step=max_step, custom_reset_config=custom_reset_config)
    TypeError: reset() got an unexpected keyword argument 'max_step'
    

    Then I tried to debug this myself with ray.init(local_mode=True) in my debugger, since all Ray actors run sequentially when "local mode" is on.

    https://docs.ray.io/en/latest/ray-core/starting-ray.html#local-mode:

    Local Mode

    By default, Ray will parallelize its workload and run tasks on multiple processes and multiple nodes. However, if you need to debug your Ray program, it may be easier to do everything on a single process. You can force all Ray functions to occur on a single process by enabling local mode
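
    A minimal sketch of that setup (assuming Ray has not been initialized yet when run() is called, so this pre-initialization is the one picked up):

    import ray

    # Start Ray in local mode so every actor and task runs in this single process
    # and breakpoints inside rollout workers are reachable from the debugger.
    ray.init(local_mode=True)

    # ... then launch the quick-start example as usual, e.g. run(...)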

    The program gets stuck in an infinite loop here:

    https://github.com/sjtu-marl/malib/blob/5be07ac00761a34fb095adb2b3018a798ceea256/malib/agent/agent_interface.py#L277-L279

    I wonder what the best practice is for debugging malib. Thanks very much!

    opened by XuehaiPan 2
  • How to run on a cluster or across multiple machines?

    Dear MALib support,

    My questions are as follows:

    1. Can MALib run on a cluster or across multiple machines? How should the config be set?
    2. When running on a single machine, how do I configure the number of agents?

    Thank you.

    opened by xuehui1991 2
  • For SMAC Qmix/MADDPG config

    It's really nice work, and according to your paper running Qmix/MADDPG is really fast. But we didn't find a config for the Qmix/MADDPG algorithms and don't know how to run them, so could you give us a config for the Qmix/MADDPG algorithms?

    opened by Weiyuhong-1998 2
  • A question for quick start

    Thanks for the nice work. I want to ask for some help with the quick start. First, could you please give some instructions for running single-agent RL in a Gym-based env, such as CartPole? I tried to run run_gym.py and load the YAML, but it doesn't work. Second, do we have to install OpenSpiel, or can we skip it?

    Look forward to hearing from you. Cheers~

    Best, Yutong

    opened by Yutongamber 2
  • How to deploy malib on a GPU cluster server

    I have a login node and four computing nodes. Does malib need to be deployed on each node? Is the job submitted on the login node? Is there a detailed example of cluster usage? Looking forward to your help.
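
    Since MALib is built on Ray, one plausible starting point is the standard Ray cluster bootstrap below; whether the MALib launcher then attaches to the existing cluster without extra configuration is an assumption that needs checking against the runner code:

    # on the login/head node
    ray start --head --port=6379

    # on each computing node, pointing at the head node
    ray start --address='<head-node-ip>:6379'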

    opened by IDayday 2
  • A Minor Error in Quick Start Demo Code

    In the README Quick Start section:

      env_description={
            "creator": leduc_holdem.env,
            "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
            "possible_agents": env.possible_agents,
        }
    

    A , is missing after "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}.
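
    With the comma added, the block reads:

    env_description={
        "creator": leduc_holdem.env,
        "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"},
        "possible_agents": env.possible_agents,
    }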

    opened by hanmochen 2
  • Turn based env

    • refactor the open_spiel env
    • remove the async vector env implementation
    • turn-based rollout is supported
    • PSRO and MARL scenarios have been tested
    • remove third-party replay
    • refactor episode collecting and sending
    opened by KornbergFresnel 1
  • How to use malib to roll out with a trained model?

    Excuse me, when the training is done, where is the model saved, and how do we use it to roll out? Also, how do we replay the training process? Can we render the envs like MPE and MAgent?

    opened by zhuerfei 1
  • Cannot run the examples on GPU

    When I run examples such as psro_poker or maddpg_mpe with the config "use_cuda" set to True, I get the following error:

    2022-04-20 19:30:20,818 ERROR worker.py:80 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=32314, ip=172.28.78.34, repr=<matf.rollout.rollout_worker.RolloutWorker object at 0x7f083bb10fd0>)
      File "/home/qianmd/work/test/matf/matf/rollout/base_worker.py", line 447, in simulation
        role="simulation",
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_worker.py", line 161, in sample
        for ret in rets:
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 65, in map
        yield self.get_next()
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 178, in get_next
        return ray.get(future)
    ray.exceptions.RayTaskError(RuntimeError): ray::Stepping.run() (pid=32309, ip=172.28.78.34, repr=<matf.rollout.rollout_func.Stepping object at 0x7f243ce38190>)
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 447, in run
        dataset_server=self._dataset_server if task_type == "rollout" else None,
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 276, in env_runner
        active_policy_inputs, agent_interfaces, episodes
      File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 155, in _do_policy_eval
        ) = interface.compute_action(**inputs)  # compute actions from each env_agent_id's observation, rnn_state and done
      File "/home/qianmd/work/test/matf/matf/envs/agent_interface.py", line 268, in compute_action
        rets = self.policies[policy_id].compute_action(*args, **kwargs)
      File "/home/qianmd/work/test/matf/matf/algorithm/dqn/policy.py", line 88, in compute_action
        logits = torch.softmax(self.critic(observation), dim=-1)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/work/test/matf/matf/algorithm/common/model.py", line 106, in forward
        pi = self.net(obs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
        input = module(input)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
    Exception ignored in: 'ray._raylet.task_execution_handler'
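
    The last frames suggest the policy network sits on the GPU while the observation tensor is still on the CPU. A hedged sketch of the usual PyTorch-side fix (the attribute names mirror the traceback; the helper itself is an assumption, not MALib's actual code):

    import torch

    def compute_action_on_model_device(critic: torch.nn.Module, observation: torch.Tensor) -> torch.Tensor:
        # Move the input onto whatever device the network's parameters occupy
        # before the forward pass, avoiding the cuda/cpu mismatch in addmm.
        device = next(critic.parameters()).device
        observation = observation.to(device)
        return torch.softmax(critic(observation), dim=-1)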

    opened by zhuerfei 1
  • Performance Results

    Throughput Comparison

    All the experiment results listed are obtained with one of the following hardware settings: (1) System #1: a 32-core computing node with dual graphics cards; (2) System #2: a two-node cluster, each node with 128 cores and a single graphics card. All the GPUs mentioned are of the same model (NVIDIA RTX 3090).

    Throughput comparison among existing RL frameworks and MALib. Due to resource limitations (32 cores, 256 GB RAM), RLlib fails under heavy loads (CPU case: #workers > 32; GPU case: #workers > 8). MALib outperforms the other frameworks in the CPU-only setting and achieves comparable performance to the highly tailored framework Sample-Factory with GPU, despite the higher level of abstraction it introduces. To better illustrate the scalability of MALib, we show the MA-Atari and SC2 throughput on System #2 under different worker settings; the 512-worker group on SC2 fails due to resource limitations.

    [Figure: merged_throughput_report]

    Additional comparisons between MALib and other distributed RL training frameworks. (Left): System #3 cluster throughput of MALib in 2-player MA-Atari and 3-player SC2. (Middle): 4-player MA-Atari throughput comparison on System #1 without GPU. (Right): 4-player MA-Atari throughput comparison on System #1 with GPU.

    [Figure: merged_throughput_report_4p]

    Wall-time & Performance of PB-MARL Algorithm

    Comparisons of PSRO between MALib and OpenSpiel. (a) indicates that MALib achieves the same performance on exploitability as OpenSpiel; (b) shows that the convergence rate of MALib is 3x faster than OpenSpiel; (c) shows that MALib achieves higher execution efficiency than OpenSpiel, since it requires less time to iterate the same number of learning steps, which means MALib has the potential to scale up to more complex tasks that need to run for many more steps.

    [Figure: pb-marl_wall_time]

    Typical MARL Algorithms

    Results on Multi-agent Particle Environments

    Comparisons of MADDPG in simple_adversary under different rollout worker settings. Figures in the top row depict each agent's episode reward w.r.t. the number of sampled episodes, indicating that MALib converges faster than RLlib for an equal number of sampled episodes. Figures in the bottom row show the average time and average episode reward at the same number of sampled episodes, indicating that MALib achieves a 5x speedup over RLlib.

    [Figure: simple_adversary]

    Scenario Crypto

    [Figure: simple_crypto]

    Simple Push

    [Figure: simple_push]

    Simple Reference

    [Figure: simple_reference]

    Simple Speaker Listener

    [Figure: simple_speaker_listener]

    Simple Tag

    [Figure: simple_tag]

    documentation 
    opened by KornbergFresnel 0
Owner

MARL @ SJTU
Multi-Agent Research at Shanghai Jiao Tong University