PAIRED in PyTorch 🔥


This codebase provides a PyTorch implementation of Protagonist Antagonist Induced Regret Environment Design (PAIRED), first introduced in "Emergent Complexity and Zero-Shot Transfer via Unsupervised Environment Design" (Dennis et al., 2020). The implementation comes integrated with custom adversarial maze environments based on the MiniGrid environment (Chevalier-Boisvert et al., 2018), as used in Dennis et al., 2020.

Unsupervised environment design (UED) methods propose a curriculum of tasks or environment instances (levels) that aims to foster more sample-efficient learning and more robust policies. PAIRED performs UED via a three-player game between two student agents, the protagonist and the antagonist, and an adversary. The adversary, which is allied with the antagonist, proposes new levels that aim to maximize the protagonist's regret, estimated as the antagonist's maximum return minus the protagonist's mean return over a batch of rollouts on the proposed levels.
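
As a minimal sketch (with illustrative variable names, not this codebase's actual API), the regret estimate over a batch of episode returns looks like:

import torch

def estimate_regret(antagonist_returns, protagonist_returns):
    # PAIRED's regret estimate (Dennis et al., 2020): the antagonist's
    # best episode return minus the protagonist's mean episode return.
    return antagonist_returns.max() - protagonist_returns.mean()

# Made-up returns from a batch of rollouts on one proposed level:
regret = estimate_regret(torch.tensor([0.2, 0.9, 0.5]),
                         torch.tensor([0.1, 0.3, 0.2]))
print(regret)  # tensor(0.7000)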

PAIRED carries a strong robustness guarantee: at Nash equilibrium, it provably induces a minimax regret policy for the protagonist, meaning the protagonist optimally trades off regret across all levels the adversary can propose.
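
In symbols (a LaTeX sketch with notation following Dennis et al., 2020, where U^theta(pi) is the expected return of policy pi on the level parameterized by theta):

\pi^P \in \arg\min_{\pi \in \Pi} \; \max_{\theta \in \Theta} \Big[ \max_{\pi^A \in \Pi} U^{\theta}(\pi^A) - U^{\theta}(\pi) \Big]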

UED algorithms included

  • PAIRED (Protagonist Antagonist Induced Regret Environment Design)
  • Minimax
  • Domain randomization

Set up

To install the necessary dependencies, run the following commands:

conda create --name paired python=3.8
conda activate paired
pip install -r requirements.txt

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
cd ..
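
As a quick sanity check that the dependencies were installed correctly (the exact versions depend on what requirements.txt pins):

python -c "import torch, gym, baselines; print(torch.__version__, gym.__version__)"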

Configuration

Detailed descriptions of the various command-line arguments for the main training script, train.py, can be found in arguments.py.
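
Assuming train.py parses these arguments with argparse (an assumption based on arguments.py), you can also list every available flag from the command line:

python -m train --help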

Experiments

MiniGrid benchmark results

For convenience, configuration JSON files are provided to generate the commands for the specific experimental settings featured in Dennis et al., 2020. To generate the command that launches one run of the experiment codified by the configuration file config.json in the local folder train_scripts/configs, run the following and copy and paste the output into your command line.

python train_scripts/make_cmd.py --json config --num_trials 1

Alternatively, on macOS you can pipe the output to pbcopy to copy the command directly to your clipboard:

python train_scripts/make_cmd.py --json config --num_trials 1 | pbcopy

By default, each experiment run generates a folder in ~/logs/paired named after the --xpid argument passed into the train command. This folder contains log outputs in logs.csv and periodic screenshots of generated levels in the directory screenshots. Each screenshot uses the naming convention update_<number of PPO updates>.png. The latest model checkpoint is written to model.tar, and archived model checkpoints are saved according to the naming convention model_<number of PPO updates>.tar.
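
For example, a minimal sketch for inspecting a run's logged metrics (the xpid below is a placeholder, and the column names are whatever the logger writes, so we simply print them):

import os
import pandas as pd

log_path = os.path.expanduser('~/logs/paired/<xpid>/logs.csv')  # placeholder xpid
df = pd.read_csv(log_path)
print(df.columns.tolist())  # discover which metrics were logged
print(df.tail())            # last few logged updates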

The JSON config files for reproducing the MiniGrid experiments from Dennis et al., 2020 are listed below:

Method json config
PAIRED minigrid/paired.json
Minimax minigrid/minimax.json
DR minigrid/dr.json

Evaluation

You can use the following command to batch-evaluate all trained models whose output directories share the same <xpid_prefix> before the indexing _[0-9]+ suffix:

python -m eval \
--base_path "~/logs/paired" \
--prefix '<xpid_prefix>' \
--num_processes 2 \
--env_names 'MultiGrid-SixteenRooms-v0,MultiGrid-Labyrinth-v0,MultiGrid-Maze-v0' \
--num_episodes 100 \
--model_tar model
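
For reference, here is a minimal sketch of how run directories matching this prefix convention could be discovered (illustrative only; the matching logic in eval.py may differ):

import os
import re

base_path = os.path.expanduser('~/logs/paired')
prefix = 'paired-minigrid'  # hypothetical <xpid_prefix>
pattern = re.compile(re.escape(prefix) + r'_[0-9]+$')
runs = [d for d in os.listdir(base_path) if pattern.match(d)]
print(runs)  # e.g. ['paired-minigrid_0', 'paired-minigrid_1', ...]
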
Comments
  • Failed to run MiniHack Experiments (Environments not found)

    I am trying to run the MiniHack experiments with the following command:

    python -m train \
    --xpid=ued-MiniHack-GoalLastAdv-WallsLavaMonsterDoor-15x15-v0-paired-lstm256a-lr0.0001-epoch5-mb1-v0.5-henv0.0-ha0.0-tl_0 \
    --env_name=MiniHack-GoalLastAdv-WallsLavaMonsterDoor-15x15-v0 \
    --use_gae=True \
    --gamma=0.995 \
    --gae_lambda=0.95 \
    --seed=88 \
    --recurrent_arch=lstm \
    --recurrent_agent=True \
    --recurrent_adversary_env=False \
    --recurrent_hidden_size=256 \
    --lr=0.0001 \
    --num_steps=256 \
    --num_processes=4 \
    --num_env_steps=100000 \
    --ppo_epoch=5 \
    --num_mini_batch=1 \
    --entropy_coef=0.0 \
    --value_loss_coef=0.5 \
    --clip_param=0.2 \
    --clip_value_loss=True \
    --adv_entropy_coef=0.0 \
    --algo=ppo \
    --ued_algo=paired \
    --log_interval=10 \
    --screenshot_interval=1000 \
    --log_grad_norm=True \
    --handle_timelimits=True \
    --test_env_names=MiniHack-Room-15x15-v0,MiniHack-Room-Monster-15x15-v0,MiniHack-MazeWalk-9x9-v0,MiniHack-MazeWalk-15x15-v0,MiniHack-Labyrinth-Small-v0,MiniHack-LockedMultiRoom-N2-S4-v0,MiniHack-LavaMultiRoom-N2-S4-v0,MiniHack-MonsterMultiRoom-N2-S4-v0,MiniHack-ExtremeMultiRoom-N2-S4-v0,MiniHack-ExtremeMultiRoom-N4-S5-v0 \
    --log_dir=logs/paired/minihack \
    --log_action_complexity=True \
    --checkpoint=True
    

    I am receiving errors of the following kind:

    gym.error.UnregisteredEnv: No registered env with id: MiniHack-LockedMultiRoom-N2-S4-v0 
    

    I searched the codebase but did not find any matches with the string "MiniHack-LockedMultiRoom-N2-S4-v0". Any idea how to solve this?

    The error originates from line 133 of envs/register.py, because no MiniHack environments were registered via EnvRegistry.register().
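
    As a debugging aid (a sketch; gym.envs.registry.env_specs applies to older gym versions, and importing minihack registers its stock MiniHack-* environments as a side effect), you can list what is actually registered:

    import gym
    import minihack  # noqa: F401 -- importing registers the stock MiniHack-* envs

    print(sorted(eid for eid in gym.envs.registry.env_specs if 'MiniHack' in eid))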

    opened by azadsalam 3
  • Is the implementation of final rewards correct?

    As per the original implementation, the final rewards are supposed to replace the reward at the end of each episode in the replay buffer.

    https://github.com/google-research/google-research/blob/901524f4d4ab15ef9d2f5165148347d0f26b32c2/social_rl/adversarial_env/agent_train_package.py#L260-L264

    Whereas in this PyTorch implementation, the final reward only replaces the final return: https://github.com/ucl-dark/paired/blob/c836e868c6cb805012f93590e0ece1bc8461dbcf/algos/storage.py#L201-L202
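
    To make the difference concrete, here is a toy sketch of the two behaviors (illustrative only, with made-up rewards and gamma; this is not the repo's actual storage code):

    import torch

    gamma = 0.99
    rewards = torch.tensor([0.0, 0.0, 1.0])  # per-step rewards of one episode
    final_reward = 5.0                       # adjusted reward at episode end

    def discounted_returns(r):
        out = torch.zeros_like(r)
        acc = 0.0
        for t in reversed(range(len(r))):
            acc = float(r[t]) + gamma * acc
            out[t] = acc
        return out

    # (a) TF implementation: the final reward replaces the last per-step
    # reward, so it propagates back through every earlier return.
    rewards_tf = rewards.clone()
    rewards_tf[-1] = final_reward
    print(discounted_returns(rewards_tf))  # tensor([4.9005, 4.9500, 5.0000])

    # (b) The behavior questioned here: only the final return is
    # overwritten; earlier returns come from the original rewards.
    returns = discounted_returns(rewards)
    returns[-1] = final_reward
    print(returns)  # tensor([0.9801, 0.9900, 5.0000])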

    Did I misunderstand anything in the code?

    opened by nikhilrayaprolu 0
  • Pytorch Code takes longer time than Tensorflow

    On the same hardware and with the same parameters, this PyTorch code is almost 4 times slower than the original TensorFlow implementation. What could be the cause?

    opened by nikhilrayaprolu 1
  • minihack env

    git clone https://github.com/ucl-dark/blob/main/minihack no longer works; is it safe to assume that https://github.com/facebookresearch/minihack will work as well?

    opened by raymond2338 0
  • max_step in rollout and order of training

    Hi there, thank you for contributing this PyTorch version of PAIRED. I have two questions that I hope you can clarify. I really appreciate it.

    1. The num_steps for rolling out the Protagonist's and Antagonist's policies in the grid env is set to 256 by default (https://github.com/ucl-dark/paired/blob/fd49543811dca1177eb34cb846035470c141aac1/envs/runners/adversarial_runner.py#L373). I am not sure whether the env terminates when max_steps=256 is reached. If yes, then the two agents are rolled out in the env for only one episode, which is not enough to produce the max/mean return for the Antagonist/Protagonist. If no, then the two agents are rolled out for several episodes, depending on how many steps they take per episode, but then the Antagonist and Protagonist are not evaluated for the same number of episodes. So I am confused about this.

    2. As stated in the first paragraph of Section 4 of the paper, the env_adversary first generates the env given the Protagonist's fixed policy, and then the Antagonist is trained on this env to optimality. After training the Antagonist, the regret is computed from the trained Antagonist's policy and the pre-trained Protagonist's policy. However, in your implementation, in the run() function (https://github.com/ucl-dark/paired/blob/fd49543811dca1177eb34cb846035470c141aac1/envs/runners/adversarial_runner.py#L356), you run the env_adversary, the Protagonist, and the Antagonist in order. Could you also clarify this?

    Again, thank you so much for your effort.

    opened by wenjunli-0 0
Owner

UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab