Malmo Collaborative AI Challenge - Team Pig Catcher

Kai Arulkumaran

Last update: Jun 29, 2022

Related tags

Overview

The Malmo Collaborative AI Challenge - Team Pig Catcher

Approach

The challenge involves 2 agents who can either cooperate or defect. The optimal policy, based on stag hunt [1], depends on the policy of the other agent. Not knowing the other agent's policy, the optimal solution is then based on modelling the other agent's policy. Similarly, the challenge can be considered a sequential social dilemma [2], as goals could change over time.

By treating the other agent as part of the environment, we can use model-free RL, and simply aim to maximise the reward of our agent. As a baseline we take a DRL algorithm - ACER [3] - and train it against the evaluation agent (which randomly uses a focused or random strategy every episode).

We chose to approach this challenge using hierarchical RL. We assume there are 2 subpolicies, one for each type of partner agent. To do so, we use option heads [4], whereby the agent has shared features, but separate heads for different subpolicies. In this case, ACER with 2 subpolicies has 2 Q-value heads and 2 policy heads. To choose which subpolicy to use at any given time, the agent also has an additional classifier head that is trained (using an oracle) to distinguish which option to use. Therefore, we ask the following questions:

Can the agent distinguish between the two possible behaviours of the evaluation agent?
Does the agent learn qualitatively different subpolicies?

Unfortunately, due to technical difficulties and time restrictions, we were unable to successfully train an agent. Full results and more details can be found in our video.

Design Decisions

For our baseline, we implemented ACER [3] in PyTorch based on reference code [5, 6]. In addition, we augmented the state that the agent receives with the previous action, reward and a step counter [7]. Our challenge entry augments the agent with option heads [4], and we aim to distinguish the different policies of the evaluation agent.

We also introduce a novel contribution - a batch version of ACER - which increases stability. We sample a batch of off-policy trajectories, and then truncate them to match the smallest.

Instructions

Dependencies:

Firstly, build the Malmo Docker image. Secondly, enable running Docker as a non-root user.

Run ACER with OMP_NUM_THREADS=1 python pc_main.py. The code automatically opens up Minecraft (Docker) instances.

Discussion

References

[1] Game Theory of Mind
[2] Multi-agent Reinforcement Learning in Sequential Social Dilemmas
[3] Sample Efficient Actor-Critic with Experience Replay
[4] Classifying Options for Deep Reinforcement Learning
[5] ikostrikov/pytorch-a3c
[6] pfnet/ChainerRL
[7] Learning to Navigate in Complex Environments

This repository contains the task definition and example code for the Malmo Collaborative AI Challenge. This challenge is organized to encourage research in collaborative AI - to work towards AI agents that learn to collaborate to solve problems and achieve goals. You can find additional details, including terms and conditions, prizes and information on how to participate at the Challenge Homepage.

Notes for challenge participants: Once you and your team decide to participate in the challenge, please make sure to register your team at our Registration Page. On the registration form, you need to provide a link to the GitHub repository that will contain your solution. We recommend that you fork this repository (learn how), and provide address of the forked repo. You can then update your submission as you make progress on the challenge task. We will consider the version of the code on branch master at the time of the submission deadline as your challenge submission. Your submission needs to contain code in working order, a 1-page description of your approach, and a 1-minute video that shows off your agent. Please see the challenge terms and conditions for further details.

Jump to:

Installation
Getting started
- Play the challenge task
- Run your first experiment
Next steps
- Run an experiment in Docker on Azure
- Resources

Installation

Prerequisites

Python 2.7+ (recommended) or 3.5+
Project Malmo - we recommend downloading the Malmo-0.21.0 release and installing dependencies for Windows, Linux or MacOS. Test your Malmo installation by launching Minecraft with Malmo and launching an agent.

Minimal installation

pip install -e git+https://github.com/Microsoft/malmo-challenge#egg=malmopy

git clone https://github.com/Microsoft/malmo-challenge
cd malmo-challenge
pip install -e .

Optional extensions

Some of the example code uses additional dependencies to provide 'extra' functionality. These can be installed using:

pip install -e '.[extra1, extra2]'

For example to install gym and chainer:

pip install -e '.[gym]'

Or to install all extras:

pip install -e '.[all]'

The following extras are available:

gym: OpenAI Gym is an interface to a wide range of reinforcement learning environments. Installing this extra enables the Atari example agents in samples/atari to train on the gym environments. Note that OpenAI gym atari environments are currently not available on Windows.
tensorflow: TensorFlow is a popular deep learning framework developed by Google. In our examples it enables visualizations through TensorBoard.

Getting started

Play the challenge task

The challenge task takes the form of a mini game, called Pig Chase. Learn about the game, and try playing it yourself on our Pig Chase Challenge page.

Run your first experiment

See how to run your first baseline experiment on the Pig Chase Challenge page.

Next steps

Run an experiment in Docker on Azure

Docker is a virtualization platform that makes it easy to deploy software with all its dependencies. We use docker to run experiments locally or in the cloud. Details on how to run an example experiment using docker are in the docker README.

Resources

You might also like...

CSAC - Collaborative Semantic Aggregation and Calibration for Separated Domain Generalization

CSAC Introduction This repository contains the implementation code for paper: Co

5 Jul 22, 2022

CKD - Collaborative Knowledge Distillation for Heterogeneous Information Network Embedding

Collaborative Knowledge Distillation for Heterogeneous Information Network Embed

9 Dec 5, 2022

Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

6 Nov 22, 2022

The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

33 Jan 5, 2023

[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

[CVPR'22] Collaborative Transformers for Grounded Situation Recognition Paper | Model Checkpoint This is the official PyTorch implementation of Collab

29 Dec 10, 2022

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab

249 Dec 7, 2022

Malmo Collaborative AI Challenge - Team Pig Catcher

Related tags

Overview

The Malmo Collaborative AI Challenge - Team Pig Catcher

Approach

Design Decisions

Instructions

Discussion

References

Installation

Prerequisites

Minimal installation

Optional extensions

Getting started

Play the challenge task

Run your first experiment

Next steps

Run an experiment in Docker on Azure

Resources

You might also like...

CSAC - Collaborative Semantic Aggregation and Calibration for Separated Domain Generalization

CKD - Collaborative Knowledge Distillation for Heterogeneous Information Network Embedding

Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

Main repository for the HackBio'2021 Virtual Internship Experience for #Team-Greider ❤️

This repo contains research materials released by members of the Google Brain team in Tokyo.

Owner

Kai Arulkumaran

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

Official Implement of CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection .

The implement of papar "Enhanced Graph Learning for Collaborative Filtering via Mutual Information Maximization"

COVINS -- A Framework for Collaborative Visual-Inertial SLAM and Multi-Agent 3D Mapping

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.

Predicting lncRNA–protein interactions based on graph autoencoders and collaborative training