
Overview

PantheonRL

PantheonRL is a package for training and testing multi-agent reinforcement learning environments. The goal of PantheonRL is to provide a modular and extensible framework for training agent policies, fine-tuning agent policies, ad-hoc pairing of agents, and more. PantheonRL also provides a web user interface suitable for lightweight experimentation and prototyping.

PantheonRL is built on top of Stable-Baselines3 (SB3), allowing direct access to many of SB3's standard RL training algorithms such as PPO. PantheonRL currently follows a decentralized training paradigm: each agent is equipped with its own replay buffer and update algorithm. The agent objects are designed to be easily manipulable: they can be saved, loaded, and plugged into different training procedures such as self-play, ad-hoc / cross-play, round-robin training, or fine-tuning.
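
As a rough illustration of this workflow, the sketch below pairs an SB3 PPO ego learner with a PPO-based partner on the bundled Liar's Dice environment. The OnPolicyAgent, getDummyEnv, and add_partner_agent names follow the repo's examples and the paper's listing, but exact import paths and signatures may differ between versions.

import gym
from stable_baselines3 import PPO

# Import path assumed from the repo's examples; may differ across versions.
from pantheonrl.common.agents import OnPolicyAgent

# Two-player turn-based environment bundled with PantheonRL
# (importing pantheonrl is assumed to register the environment IDs with gym).
env = gym.make('LiarsDice-v0')

# The partner keeps its own PPO model and buffer, matching the decentralized
# training paradigm; it is trained through the dummy view of its player slot.
partner = OnPolicyAgent(PPO('MlpPolicy', env.getDummyEnv(0), verbose=1))
env.add_partner_agent(partner)

# The ego agent is an ordinary SB3 learner; stepping the environment during
# .learn() also queries and updates the registered partner.
ego = PPO('MlpPolicy', env, verbose=1)
ego.learn(total_timesteps=100000)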

This package will be presented as a demo at the AAAI-22 Demonstrations Program.

Demo Paper

Demo Video

"PantheonRL: A MARL Library for Dynamic Training Interactions"
Bidipta Sarkar*, Aditi Talati*, Andy Shih*, Dorsa Sadigh
In Proceedings of the 36th AAAI Conference on Artificial Intelligence (Demo Track), 2022

@inproceedings{sarkar2021pantheonRL,
  title={PantheonRL: A MARL Library for Dynamic Training Interactions},
  author={Sarkar, Bidipta and Talati, Aditi and Shih, Andy and Sadigh, Dorsa},
  booktitle = {Proceedings of the 36th AAAI Conference on Artificial Intelligence (Demo Track)},
  year={2022}
}

Installation

# Optionally create conda environments
conda create -n PantheonRL python=3.7
conda activate PantheonRL

# Clone and install PantheonRL
git clone https://github.com/Stanford-ILIAD/PantheonRL.git
cd PantheonRL
pip install -e .

Overcooked Installation

# Optionally install Overcooked environment
git submodule update --init --recursive
pip install -e overcookedgym/human_aware_rl/overcooked_ai

PettingZoo Installation

# Optionally install PettingZoo environments
pip install pettingzoo

# to install a group of pettingzoo environments
pip install "pettingzoo[classic]"

Command Line Invocation

Example

python3 trainer.py LiarsDice-v0 PPO PPO --seed 10 --preset 1
# requires Overcooked installation (see above instructions)
python3 trainer.py OvercookedMultiEnv-v0 PPO PPO --env-config '{"layout_name":"simple"}' --seed 10 --preset 1

For examples on round-robin training followed by partner adaptation, check out these instructions.

For more examples, check out the examples/ directory.
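
As a rough sketch of what a round-robin-style setup looks like in code, the snippet below registers a small pool of partners before training the ego agent. It uses the same assumed OnPolicyAgent / add_partner_agent interface as the earlier sketch; per the paper's Listing 1, multiple partners are registered and one of them plays each episode.

import gym
from stable_baselines3 import PPO

# Import path assumed from the repo's examples, as above.
from pantheonrl.common.agents import OnPolicyAgent

env = gym.make('LiarsDice-v0')

# Register a small pool of partner agents; the environment picks one of them
# to play against the ego agent at the start of each episode.
for _ in range(3):
    partner = OnPolicyAgent(PPO('MlpPolicy', env.getDummyEnv(0), verbose=1))
    env.add_partner_agent(partner)

ego = PPO('MlpPolicy', env, verbose=1)
ego.learn(total_timesteps=500000)

# The trained ego policy can then be saved and later adapted to a new partner
# by loading it and continuing .learn() against a different partner pool.
ego.save('ego-liars-dice')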

Web User Interface

The first time the web interface is run in a new location, the database must be initialized. After that, the init-db command should not be called again, because it will clear all user account data.

Set environment variables and (re)initialize the database

export FLASK_APP=website
export FLASK_ENV=development
flask init-db

Start the web user interface. Make sure that ports 5000 and 5001 (used for Tensorboard) are not taken.

flask run --host=0.0.0.0 --port=5000


Agent selection screen. Users can customize the ego and partner agents.


Training screen. Users can view basic information, or spawn a Tensorboard tab for full monitoring.

Features

General Features | PantheonRL
Documentation | ✔️
Web user interface | ✔️
Built on top of SB3 | ✔️
Supports PettingZoo Envs | ✔️

Environment Features | PantheonRL
Frame stacking (recurrence) | ✔️
Simultaneous multiagent envs | ✔️
Turn-based multiagent envs | ✔️
2-player envs | ✔️
N-player envs | ✔️
Custom environments | ✔️

Training Features | PantheonRL
Self-play | ✔️
Ad-hoc / cross-play | ✔️
Round-robin training | ✔️
Finetune / adapt to new partners | ✔️
Custom policies | ✔️

Current Environments

Name | Environment Type | Reward Type | Players | Visualization
Rock Paper Scissors | SimultaneousEnv | Competitive | 2 |
Liar's Dice | TurnBasedEnv | Competitive | 2 |
Block World [1] | TurnBasedEnv | Cooperative | 2 | ✔️
Overcooked [2] | SimultaneousEnv | Cooperative | 2 | ✔️
PettingZoo [3] | Mixed | Mixed | N | ✔️

[1] Adapted from the block construction task from https://github.com/cogtoolslab/compositional-abstractions

[2] Adapted from the Human_Aware_Rl / Overcooked AI package from https://github.com/HumanCompatibleAI/human_aware_rl

[3] PettingZoo environments from https://github.com/Farama-Foundation/PettingZoo

Comments
  • Apart from PPO, other Stable Baselines3 methods can't be used in the Overcooked environment or learn an extremely bad strategy

    With methods such as DQN and A2C, the result is extremely bad, with a reward of nearly 0, which confuses me. The other Stable Baselines3 methods (e.g. SAC, TD3, ...) do not support discrete action spaces.

    opened by momo-xiaoyi 5
  • About the conflict between Stable Baselines and PantheonRL

    Dear authors, I ran the PettingZoo examples on Colab. However, I still face these problems.

    Using cpu device
    Wrapping the env with a Monitor wrapper
    Wrapping the env in a DummyVecEnv.

    AssertionError                Traceback (most recent call last)
    in
         10 print(env.n_players)
         11 for i in range(env.n_players - 1):
    ---> 12     partner = OnPolicyAgent(PPO('MlpPolicy', env.getDummyEnv(i), verbose=1))
         13
         14     # The second parameter ensures that the partner is assigned to a certain

    5 frames
    /usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/util.py in obs_space_info(obs_space)
         65     subspaces = {i: space for i, space in enumerate(obs_space.spaces)}
         66 else:
    ---> 67     assert not hasattr(obs_space, "spaces"), f"Unsupported structured space '{type(obs_space)}'"
         68     subspaces = {None: obs_space}
         69 keys = []

    AssertionError: Unsupported structured space '<class 'gymnasium.spaces.dict.Dict'>'

    opened by skydvn 2
  • Some problems when running the example with OvercookedMultiEnv-v0 PPO

    Hello, I ran this framework with the PPO training command python3 trainer.py OvercookedMultiEnv-v0 PPO PPO --env-config '{"layout_name": "simple"}' --seed 10 --preset 1. However, in the training log I found that the value-function loss increases as the reward increases, which confuses me; the ep_rew_mean can reach 300 when total-timesteps is 500000. I wonder how to solve this, because it looks like a bug.

    opened by lixiyun98 2
  • Bump minimist from 1.2.5 to 1.2.6 in /overcookedgym/overcooked-flask

    Bumps minimist from 1.2.5 to 1.2.6.

    dependencies 
    opened by dependabot[bot] 1
  • Round-robin implementation in the paper vs. the MultiagentEnv class

    As far as I can see, the MultiagentEnv class randomly samples a partner from the partner list at the beginning of each episode. But in the paper "PantheonRL: A MARL Library for Dynamic Training Interactions", Listing 1, you implement round-robin by simply adding two partners and running the learning. This won't be round-robin though, right? It will randomly sample a partner from the partner list instead of going through each partner in a pre-specified order.

    opened by aenneas 1
  • Bump ejs from 2.7.4 to 3.1.7 in /overcookedgym/overcooked-flask

    Bumps ejs from 2.7.4 to 3.1.7.

    dependencies 
    opened by dependabot[bot] 0
  • Usage of bctrainer.py

    Hi, could you please offer an example of using bctrainer.py? I input the trajectory file collected from the web user interface but get an error:

    Traceback (most recent call last):
      File "/Users/alexwang/phD/PantheonRL/bctrainer.py", line 106, in
        clone.train(n_epochs=args.total_epochs)
      File "/Users/alexwang/phD/PantheonRL/pantheonrl/algos/bc.py", line 352, in train
        batch["obs"], batch["acts"])
      File "/Users/alexwang/phD/PantheonRL/pantheonrl/algos/bc.py", line 291, in _calculate_loss
        _, log_prob, entropy = self.policy.evaluate_actions(obs, acts)
      File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/stable_baselines3/common/policies.py", line 640, in evaluate_actions
        log_prob = distribution.log_prob(actions)
      File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/stable_baselines3/common/distributions.py", line 278, in log_prob
        return self.distribution.log_prob(actions)
      File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/torch/distributions/categorical.py", line 117, in log_prob
        self._validate_sample(value)
      File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/torch/distributions/distribution.py", line 277, in _validate_sample
        raise ValueError('The value argument must be within the support')
    ValueError: The value argument must be within the support

    My torch version is 1.8.1, stable-baselines3 version is 1.2.0a0

    Thanks

    opened by AlexWanghaoming 0
Owner
Stanford Intelligent and Interactive Autonomous Systems Group