This is the source code of RPG (Reward-Randomized Policy Gradient)

RPG (Reward-Randomized Policy Gradient)

Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu (* equal contribution)

Website: https://sites.google.com/view/staghuntrpg

This is the source code for RPG (Reward-Randomized Policy Gradient), the algorithm proposed in the paper "Discovering Diverse Multi-agent Strategic Behavior via Reward Randomization" (https://arxiv.org/abs/2103.04564).

1. Supported environments

1.1 Agar.io

Agar is a popular multi-player online game. Players control one or more cells in a Petri dish. The goal is to gain as much mass as possible by eating cells smaller than the player's cell while avoiding being eaten by larger ones. Larger cells move slower. Each player starts with one cell but can split a sufficiently large cell into two, allowing them to control multiple cells. The control is performed by mouse motion: all the cells of a player move towards the mouse position.

We transform the Free-For-All (FFA) mode of Agar (https://agar.io/) into a Reinforcement Learning (RL) environment, which we believe can serve as a new multi-agent RL testbed for a wide range of problems, such as cooperation, team formation, and intention modeling. If you want to use Agar.io as your testbed, see the agar repository: https://github.com/staghuntrpg/agar.

1.2 Grid World

  • Monster-Hunt: In Monster-Hunt, there is a monster and two apples. The monster keeps moving towards the closest agent, while the apples are static. When a single agent meets the monster, it incurs a penalty of 2; if two agents catch the monster at the same time, they both earn a bonus of 5. Eating an apple always gives an agent a bonus of 2. Whenever an apple is eaten or the monster meets an agent, the apple or the monster respawns at a random location. The monster may move over an apple during the chase; in that case, an agent that reaches the monster and the apple on the same step receives the sum of both rewards.

  • Escalation: In Escalation, two agents appear at random positions and one grid cell lights up at initialization. If both agents step on the lit cell simultaneously, each gains 1 point, the lit cell goes out, and an adjacent cell lights up. Both agents gain 1 point again if they step on the next lit cell together. If one agent steps off the path, the other agent loses 0.9L points, where L is the current number of consecutive steps taken together, and the game ends. Alternatively, if both agents step off the path simultaneously, neither is penalized and the game continues.
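
The reward rules of the two grid-world games can be sketched in a few lines of Python. This is an illustrative reading of the descriptions above, not the code in the GridWorld folder. The function names, the argument layout, and several details are assumptions: "meeting" the monster or an apple is taken to mean standing on the same cell, every agent standing on an apple cell receives the apple bonus, and in Escalation the length L is left unchanged when both agents step off the path.

```python
from typing import List, Tuple

Pos = Tuple[int, int]


def monster_hunt_rewards(agents: List[Pos], monster: Pos, apples: List[Pos]) -> List[float]:
    """Per-agent reward for one Monster-Hunt step (hypothetical helper)."""
    on_monster = [pos == monster for pos in agents]
    n_on_monster = sum(on_monster)
    rewards = []
    for pos, caught in zip(agents, on_monster):
        r = 0.0
        if caught:
            # Two agents together earn +5 each; a lone agent is penalized by 2.
            r += 5.0 if n_on_monster >= 2 else -2.0
        if pos in apples:
            # Eating an apple always gives +2; it adds to any monster reward.
            r += 2.0
        rewards.append(r)
    return rewards


def escalation_step(on_lit: Tuple[bool, bool], streak: int):
    """One Escalation step (hypothetical helper).

    on_lit: whether each agent stepped on the lit cell.
    streak: L, the current number of consecutive joint steps.
    Returns (rewards, new_streak, done).
    """
    a_on, b_on = on_lit
    if a_on and b_on:
        # Both agents step on the lit cell: +1 each, the path grows.
        return (1.0, 1.0), streak + 1, False
    if a_on != b_on:
        # One agent leaves the path: the agent that stayed loses 0.9 * L.
        penalty = -0.9 * streak
        rewards = (penalty, 0.0) if a_on else (0.0, penalty)
        return rewards, 0, True
    # Both agents step off together: no penalty, the game continues.
    # Assumption: L is left unchanged in this case.
    return (0.0, 0.0), streak, False
```

For example, `monster_hunt_rewards([(2, 2), (2, 2)], (2, 2), [(2, 2), (4, 4)])` covers the case where the monster sits on an apple and both agents arrive together, so each agent receives the sum of the two bonuses.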

2. Usage

git clone https://github.com/staghuntrpg/RPG.git --recursive

Tip: Don't forget the --recursive flag in the command; otherwise the Agar.io environment will be missing from your folder.

This repository is split into two folders, GridWorld and Agar, corresponding to the environments used in the paper "Discovering Diverse Multi-agent Strategic Behavior via Reward Randomization". Installation and training instructions can be found in the subfolder of each environment.

3. Publication

If you find this repository useful, please cite our paper:

@misc{tang2021discovering,
      title={Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization}, 
      author={Zhenggang Tang and Chao Yu and Boyuan Chen and Huazhe Xu and Xiaolong Wang and Fei Fang and Simon Du and Yu Wang and Yi Wu},
      year={2021},
      eprint={2103.04564},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
Comments
  • run error

    Thanks for your great work.

    I'm facing a problem when I run the project.

    the best GPU is  0  with free memories of  6918 
    user name is luuu
    /home/luuu/anaconda3/envs/rpg/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
      warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
    Logging to /tmp/openai-2021-04-08-12-57-01-080091
    Creating dummy env object to get spaces
    Process ForkProcess-2:
    Process ForkProcess-5:
    Process ForkProcess-3:
    Process ForkProcess-6:
    Process ForkProcess-1:
    Process ForkProcess-7:
    Traceback (most recent call last):
    Traceback (most recent call last):
    Traceback (most recent call last):
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/luuu/anaconda3/envs/rpg/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/luuu/baselines/baselines/common/vec_env/shmem_vec_env.py", line 125, in _subproc_worker
        pipe.send(_write_obs(env.reset()))
      File "/home/luuu/baselines/baselines/common/vec_env/shmem_vec_env.py", line 125, in _subproc_worker
        pipe.send(_write_obs(env.reset()))
      File "/home/luuu/baselines/baselines/common/vec_env/shmem_vec_env.py", line 125, in _subproc_worker
        pipe.send(_write_obs(env.reset()))
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 233, in reset
        observations = [self.parse_obs(self.agents[i], i) for i in range(self.num_agents)]
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 233, in reset
        observations = [self.parse_obs(self.agents[i], i) for i in range(self.num_agents)]
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 233, in reset
        observations = [self.parse_obs(self.agents[i], i) for i in range(self.num_agents)]
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 233, in <listcomp>
        observations = [self.parse_obs(self.agents[i], i) for i in range(self.num_agents)]
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 233, in <listcomp>
        observations = [self.parse_obs(self.agents[i], i) for i in range(self.num_agents)]
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 300, in parse_obs
        obs_f[-22] = (self.killed[1 - id] != 0)
      File "/home/luuu/liuqi/RPG/Agar/agar/Env.py", line 300, in parse_obs
        obs_f[-22] = (self.killed[1 - id] != 0)
    IndexError: index 1 is out of bounds for axis 0 with size 1
    

    Can you give me some suggestions? Thanks a lot!

    opened by DashStone 2
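
One plausible reading of the traceback above is that `parse_obs` looks up the other agent's entry with `self.killed[1 - id]`, which only works when there are exactly two agents. The sketch below reproduces that indexing pattern in isolation. It assumes `killed` is a NumPy array with one entry per agent, which the traceback does not show, and `partner_killed_flag` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def partner_killed_flag(killed: np.ndarray, agent_id: int) -> bool:
    """Same indexing as the failing line: look up the *other* agent's entry."""
    return bool(killed[1 - agent_id] != 0)


# With two agents, index 1 - id is always 0 or 1, so the lookup succeeds.
two_agents = np.zeros(2)
flag = partner_killed_flag(two_agents, 0)

# With a single agent, agent 0 asks for index 1 of a length-1 array.
one_agent = np.zeros(1)
try:
    partner_killed_flag(one_agent, 0)
    message = ""
except IndexError as err:
    message = str(err)

print(flag)
print(message)
```

If this reading is right, checking that the environment is created with two agents would be a reasonable first step; the traceback alone does not confirm how the number of agents was configured in this run.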