Offline Multi-Agent Reinforcement Learning Implementations: Solving the Overcooked Game with a Data-Driven Method

Overview

Overcooked-AI

We aim to apply traditional offline reinforcement learning techniques to multi-agent algorithms.
In this repository, we implemented behavior cloning (BC), offline MADDPG, MADDPG+REM (MADDPG w/ REM), and MADDPG+BCQ (MADDPG w/ BCQ) with PyTorch. BCQ is currently a work in progress and is not fully implemented.

We collected a 0.5M multi-agent offline RL dataset and ran experiments with each of the compared methods. The data was collected with online MADDPG agents and includes exploration trajectories generated with OU noise. The experiments were run on the Asymmetric Advantages layout of the Overcooked environment.
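
For readers unfamiliar with OU (Ornstein-Uhlenbeck) noise, the following is a minimal, framework-free sketch of how temporally correlated exploration noise can be added to a deterministic action. The class name `OUNoise`, the helper `noisy_action`, and the default values of `theta`, `sigma`, and `dt` are illustrative assumptions, not the settings used to collect this dataset.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        # All default values here are assumptions chosen for illustration.
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Start each episode at the long-run mean.
        self.state = [self.mu] * self.size

    def sample(self):
        # Euler-Maruyama step: pull toward mu, then add Gaussian noise.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(action, noise, low=-1.0, high=1.0):
    # Add the correlated noise to a deterministic action, then clip to bounds.
    return [min(high, max(low, a + n)) for a, n in zip(action, noise.sample())]
```

Because each sample depends on the previous one, consecutive perturbations drift in the same direction for a while, which tends to produce more coherent exploration trajectories than independent Gaussian noise.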

We look forward to your contributions!

How to Run

Collect Offline Data

python train_online.py agent=maddpg save_replay_buffer=true

While the agents train for 0.5M steps, the trajectory replay buffer is dumped to your experiment/{date}/{time}_maddpg_{exp_name}/buffer folder.
Please set the path in config/data/local.yaml to that output directory.
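
Before pointing config/data/local.yaml at a buffer directory, it can help to check what a dumped file actually contains. The sketch below assumes the buffer is stored as NumPy .npz archives (the loader log in the Comments section mentions a state.npz file); the function name `summarize_npz` is hypothetical, and the array names inside the archive are not assumed.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    # np.load returns a lazy NpzFile; arrays are only read when indexed.
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary
```

Calling `summarize_npz` on one of the dumped files and printing the result is a quick way to confirm that the file is readable and that the array shapes look as expected before starting a long offline training run.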

Download Dataset

Alternatively, if you want to use our pre-collected dataset, please use this link.
We provide 0.5M trajectories in the Asymmetric Advantages layout.
Please download the dataset to your local machine and update the path in config/data/local.yaml.

Train Offline Models

Behavior Cloning

python train_bc.py agent=bc data=local
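
As a rough illustration of what behavior cloning optimizes, the sketch below computes the mean negative log-likelihood of the dataset actions under a softmax policy. It assumes discrete actions and per-state logits; the function names `softmax` and `bc_loss` are illustrative and are not taken from train_bc.py, whose actual loss may differ.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_loss(batch_logits, batch_actions):
    """Mean negative log-likelihood of the dataset actions under the policy."""
    nll = 0.0
    for logits, action in zip(batch_logits, batch_actions):
        nll -= math.log(softmax(logits)[action])
    return nll / len(batch_actions)
```

In PyTorch, the same quantity is what `torch.nn.functional.cross_entropy` computes when given a batch of logits and integer action labels.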

Offline MADDPG (Vanilla)

python train_offline.py agent=maddpg data=local

Offline MADDPG (w/ REM)

python train_offline.py agent=rem_maddpg data=local
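
The core idea of REM (Random Ensemble Mixture) is to train a critic with several Q-heads and, at each update, combine them with a randomly sampled convex combination. The sketch below shows this for a single transition, assuming the per-head Q-values have already been computed; the function names and the `gamma` default are illustrative, and how this repository wires REM into the MADDPG critic is not shown here.

```python
import random


def sample_simplex_weights(num_heads, rng=random):
    # Draw i.i.d. Uniform(0, 1) values and normalize so they sum to 1.
    raw = [rng.random() for _ in range(num_heads)]
    total = sum(raw)
    return [r / total for r in raw]


def rem_td_error(q_heads, target_q_heads, reward, done, gamma=0.99, rng=random):
    """TD error for one transition using a random convex mix of K Q-heads."""
    # The same mixture weights are applied to the online and target heads.
    alpha = sample_simplex_weights(len(q_heads), rng)
    q_mix = sum(a * q for a, q in zip(alpha, q_heads))
    target_mix = sum(a * q for a, q in zip(alpha, target_q_heads))
    td_target = reward + gamma * (1.0 - done) * target_mix
    return td_target - q_mix
```

Resampling the weights on every update means each head must be consistent with many different mixtures, which acts as a regularizer when no new data can be collected.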

Offline MADDPG (w/ BCQ) (WIP)

python train_offline.py agent=bcq_maddpg data=local

Result

Graph

[Figures: Online MADDPG (online training), Full Offline MADDPG (offline, 0.5M data), Half Offline MADDPG (offline, 0.25M data)]

Video

[Videos: Online MADDPG (online training), BC, Offline REM (offline w/ REM)]

Acknowledgement

Comments
  • Fatal Python error: (pygame parachute) Segmentation Fault Python runtime state: initialized

    Hi, I tried to run the code, but it failed. This is the error:

    pygame 2.0.1 (SDL 2.0.14, Python 3.8.0)
    Hello from the pygame community. https://www.pygame.org/contribute.html
    /home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'offline_rl': Defaults list is missing _self_. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
      warnings.warn(msg, UserWarning)
    Workspace: /home/xubin/Overcooked-AI-offline/experiment/2022.08.31/2038_bcq_maddpg_orl-vanilla
    2022-08-31 20:38:03.522401: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    Computing MediumLevelActionManager
    Loaded MediumLevelActionManager from /home/xubin/Overcooked-AI-offline/overcooked_ai_py/data/planners/asymmetric_advantages_am.pkl
    [2022-08-31 20:38:08,772][replay_buffer][INFO] - Loading data - /home/xubin/Overcooked-AI-offline/49999/state.npz
    Fatal Python error: (pygame parachute) Segmentation Fault
    Python runtime state: initialized

    Thread 0x00007fe948f56700 (most recent call first):
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/threading.py", line 306 in wait
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/queue.py", line 179 in get
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/tensorboard/summary/writer/event_file_writer.py", line 232 in run
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/threading.py", line 932 in _bootstrap_inner
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/threading.py", line 890 in _bootstrap

    Current thread 0x00007fec8aaea700 (most recent call first):
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/zipfile.py", line 710 in _get_decompressor
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/zipfile.py", line 805 in __init__
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/zipfile.py", line 1552 in open
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/numpy/lib/npyio.py", line 248 in __getitem__
      File "/home/xubin/Overcooked-AI-offline/replay_buffer.py", line 100 in loader
      File "/home/xubin/Overcooked-AI-offline/replay_buffer.py", line 104 in append_data
      File "train_offline.py", line 67 in __init__
      File "train_offline.py", line 136 in main
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/core/utils.py", line 160 in run_job
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 98 in run
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378 in
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211 in run_and_report
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377 in _run_hydra
      File "/home/xubin/anaconda3/envs/overcookedoffline/lib/python3.8/site-packages/hydra/main.py", line 48 in decorated_main
      File "train_offline.py", line 141 in
    Aborted (core dumped)

    opened by heresyjj 0
Owner
Baek In-Chang
M.S.-Ph.D. Course Student Interested in Reinforcement Learning, Multi-Agent System