Implementation of MA-Trace - a general-purpose multi-agent RL algorithm for cooperative environments.

Last update: Aug 18, 2022

Overview

Off-Policy Correction For Multi-Agent Reinforcement Learning

This repository is the official implementation of Off-Policy Correction For Multi-Agent Reinforcement Learning. It is based on SEED RL, commit 5f07ba2a072c7a562070b5a0b3574b86cd72980f.
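The `vtrace` agent name used in the training commands refers to V-trace, the off-policy correction introduced with IMPALA. As a rough illustration only, the sketch below computes V-trace value targets for a single trajectory in plain Python. It is not taken from this repository: the function names `vtrace_targets` and `joint_ratios` are hypothetical, and treating the multi-agent importance ratio as the product of per-agent ratios is an assumption that holds when the agents' policies are independent given their inputs.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   discount=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory.

    rewards[t], values[t] and ratios[t] describe step t, where ratios[t] is
    the importance ratio pi(a_t | x_t) / mu(a_t | x_t) between the target
    and behaviour policies. bootstrap_value is V(x_T) after the last step.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # v_{t+1} - V(x_{t+1}); zero past the trajectory end
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])  # truncated weight for the TD error
        c = min(c_bar, ratios[t])      # truncated weight for the trace
        delta = rho * (rewards[t] + discount * next_values[t] - values[t])
        correction = delta + discount * c * correction
        targets[t] = values[t] + correction
    return targets


def joint_ratios(per_agent_ratios):
    """Multiply per-agent ratios at each step (assumed independent policies)."""
    return [math.prod(step) for step in per_agent_ratios]


# Two agents, three steps; the ratios are made up for illustration.
ratios = joint_ratios([[1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
print(vtrace_targets([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.0, ratios))
```

When every ratio equals 1 (on-policy data), the targets reduce to ordinary n-step returns, which is a convenient sanity check for a sketch like this.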

Requirements

Our code runs inside a Docker container, so you must first install Docker according to the instructions provided by its authors. The project-specific requirements are defined in a Dockerfile (docker/Dockerfile.starcraft) and installed inside a container the first time the run script is executed. Before training, build the base image by running:

./docker_base/marlgrid/docker/build_base.sh

Note that to execute docker commands you may need to use sudo or install Docker in rootless mode.

Training

To train an MA-Trace model, run the following command:

./run_local.sh starcraft vtrace [nb of actors] [configuration]

[nb of actors] specifies the number of workers used for training; it should be a positive integer.

[configuration] specifies the training hyperparameters, passed as a single quoted string of --name=value flags.

The most important hyperparameters are:

learning_rate: the learning rate
entropy_cost: the initial entropy cost
target_entropy: the final entropy cost
entropy_cost_adjustment_speed: how fast the entropy cost is adjusted towards the final value
frames_stacked: the number of stacked frames
batch_size: the size of training batches
discounting: the discount factor
full_state_critic: whether to use the full state as input to the critic network; set to False to use only the agents' observations
is_centralized: whether to perform centralized or decentralized training
task_name: the name of the SMAC task to train on, see the section below

Other, less important parameters can also be configured; they are listed in the files.
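As a minimal sketch of how a [configuration] string can be assembled from these hyperparameters, the Python helper below joins names and values in the `--name=value` form used throughout this README. The helpers `build_config` and `build_command` are hypothetical and not part of the repository; only the flag names and the `./run_local.sh starcraft vtrace` invocation come from the text.

```python
def build_config(params):
    """Join hyperparameters into a '--name=value' configuration string.

    Hypothetical helper: only the flag names come from the README.
    """
    return " ".join(f"--{name}={value}" for name, value in params.items())


def build_command(num_actors, params):
    """Return the run_local.sh command line for the given settings."""
    # The README asks for a positive number of actors.
    if not isinstance(num_actors, int) or num_actors < 1:
        raise ValueError("number of actors must be a positive integer")
    # Double quotes keep the configuration a single shell argument, as in
    # the README examples; values containing double quotes are not handled.
    return f'./run_local.sh starcraft vtrace {num_actors} "{build_config(params)}"'


print(build_command(1, {"full_state_critic": False, "frames_stacked": 3}))
```

An empty dictionary yields the default-parameter command with an empty configuration string, so the same helper covers both cases.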

The run script reports evaluation metrics during training. They are displayed in tmux, so consider checking its navigation controls.

For example, to use default parameters and one actor, run:

./run_local.sh starcraft vtrace 1 ""

To train the algorithms specified in the paper:

MA-Trace (obs): ./run_local.sh starcraft vtrace 1 "--full_state_critic=False"
MA-Trace (full): ./run_local.sh starcraft vtrace 1 "--full_state_critic=True"
DecMA-Trace: ./run_local.sh starcraft vtrace 1 "--is_centralized=False"
MA-Trace (obs) with 3 stacked observations: ./run_local.sh starcraft vtrace 1 "--full_state_critic=False --frames_stacked=3"
MA-Trace (full) with 4 stacked observations: ./run_local.sh starcraft vtrace 1 "--full_state_critic=True --frames_stacked=4"

Note that to match the performance presented in the paper you need to use a higher number of actors, e.g. 20.

StarCraft Multi-Agent Challenge

We evaluate our models on the StarCraft Multi-Agent Challenge (SMAC) benchmark (latest version, i.e. 4.10). The challenge consists of 14 tasks: '2s_vs_1sc', '2s3z', '3s5z', '1c3s5z', '10m_vs_11m', '2c_vs_64zg', 'bane_vs_bane', '5m_vs_6m', '3s_vs_5z', '3s5z_vs_3s6z', '6h_vs_8z', '27m_vs_30m', 'MMM2' and 'corridor'.

To train on a chosen task, e.g. 'MMM2', add --task_name='MMM2' to the configuration, e.g.

./run_local.sh starcraft vtrace 1 "--full_state_critic=False --task_name='MMM2'"
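To sweep over the whole benchmark, the command above can be generated for each task. The following Python sketch only prints the command lines and does not launch anything; `SMAC_TASKS` and `task_command` are hypothetical names introduced for illustration, while the task list and command format are copied from this section.

```python
# The 14 SMAC tasks listed in this section.
SMAC_TASKS = [
    "2s_vs_1sc", "2s3z", "3s5z", "1c3s5z", "10m_vs_11m", "2c_vs_64zg",
    "bane_vs_bane", "5m_vs_6m", "3s_vs_5z", "3s5z_vs_3s6z", "6h_vs_8z",
    "27m_vs_30m", "MMM2", "corridor",
]


def task_command(task_name, num_actors=1,
                 extra_flags="--full_state_critic=False"):
    """Build the training command for one SMAC task (hypothetical helper)."""
    if task_name not in SMAC_TASKS:
        raise ValueError(f"unknown SMAC task: {task_name!r}")
    # strip() drops the leading space when extra_flags is empty.
    config = f"{extra_flags} --task_name='{task_name}'".strip()
    return f'./run_local.sh starcraft vtrace {num_actors} "{config}"'


for task in SMAC_TASKS:
    print(task_command(task))
```

Rejecting unknown task names early is a small design choice that avoids starting a container only to fail on a misspelled task.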

Results

Our model achieves the following performance on SMAC:
