[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

Daochen Zha

Last update: Nov 21, 2022

Overview

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning

This is the TensorFlow implementation of the ICLR 2021 paper Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments. We propose RAPID, a simple exploration method that scores previous episodes and reproduces good exploration behaviors with imitation learning.

The implementation is based on OpenAI baselines. For all the experiments, add the option --disable_rapid to see the baseline result. RAPID can achieve better performance and sample efficiency than state-of-the-art exploration methods on MiniGrid environments.
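The idea of ranking past episodes and imitating the best ones can be sketched as a fixed-capacity buffer that keeps only the highest-scoring state-action pairs. This is a minimal illustration under stated assumptions, not the repository's code: the class name RankingBuffer, its methods, and the capacity are hypothetical, and the episode score is assumed to be computed elsewhere.

```python
import heapq
import itertools
import random


class RankingBuffer:
    """Keep the `capacity` highest-scoring (state, action) pairs seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # min-heap keyed on score
        self._counter = itertools.count()  # tie-breaker so states are never compared

    def add_episode(self, states, actions, score):
        # Every pair in an episode receives the same episode-level score.
        for state, action in zip(states, actions):
            item = (score, next(self._counter), state, action)
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, item)
            elif score > self._heap[0][0]:
                # Buffer is full: evict the current lowest-scoring pair.
                heapq.heapreplace(self._heap, item)

    def sample(self, batch_size, rng=random):
        # Uniformly sample pairs to serve as imitation-learning targets.
        batch = rng.sample(self._heap, min(batch_size, len(self._heap)))
        return [(state, action) for _, _, state, action in batch]

    def __len__(self):
        return len(self._heap)
```

A sampled batch would then be fed to a supervised (behavior cloning) update of the policy, which is the part this sketch leaves out.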

Cite This Work

@inproceedings{
zha2021rank,
title={Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments},
author={Daochen Zha and Wenye Ma and Lei Yuan and Xia Hu and Ji Liu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=MtEE0CktZht}
}

Installation

Please make sure that you have Python 3.5+ installed. First, clone the repo with

git clone https://github.com/daochenzha/rapid.git
cd rapid

Then install the dependencies with pip:

pip install -r requirements.txt
pip install -e .

To run the MuJoCo experiments, you need a MuJoCo license. Install mujoco-py with

pip install mujoco-py==1.50.1.68

How to run the code

The entry point is main.py. Some important hyperparameters are as follows.

--env: the environment to use
--num_timesteps: the number of timesteps to run
--w0: the weight of the extrinsic reward score
--w1: the weight of the local score
--w2: the weight of the global score
--sl_until: the timestep until which RAPID updates are performed
--disable_rapid: disable RAPID to compare with the PPO baseline
--log_dir: the directory in which to save logs
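To make the roles of --w0, --w1, and --w2 concrete, the following sketch combines three per-episode scores into one weighted sum. The score definitions are assumptions based on a reading of the paper (extrinsic return, fraction of distinct states within the episode, and mean inverse-square-root visit count accumulated over training). The function names and the hashable state representation are hypothetical and may differ from the repository, which may also normalize the extrinsic return.

```python
import math
from collections import Counter


def local_score(states):
    """Fraction of distinct states within one episode (1.0 means no revisits)."""
    return len(set(states)) / len(states)


def global_score(states, visit_counts):
    """Mean of 1/sqrt(N(s)), where N(s) counts visits over all episodes so far."""
    visit_counts.update(states)  # include the current episode in the counts
    return sum(1.0 / math.sqrt(visit_counts[s]) for s in states) / len(states)


def episode_score(extrinsic_return, states, visit_counts, w0, w1, w2):
    """Weighted sum of the extrinsic, local, and global scores (assumed form)."""
    return (w0 * extrinsic_return
            + w1 * local_score(states)
            + w2 * global_score(states, visit_counts))


if __name__ == "__main__":
    counts = Counter()  # persists across episodes
    # Hypothetical weights and a toy episode that revisits state "a" once.
    print(episode_score(0.0, ["a", "b", "a", "c"], counts, w0=1.0, w1=0.1, w2=0.001))
```

Both helper functions assume a non-empty episode. Setting --w2 0.0, as in the MiniWorld and MuJoCo commands, corresponds to dropping the global term entirely.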

Reproducing the results on MiniGrid environments

For MiniGrid-KeyCorridorS3R2, run

python main.py --env MiniGrid-KeyCorridorS3R2-v0 --sl_until 1200000

For MiniGrid-KeyCorridorS3R3, run

python main.py --env MiniGrid-KeyCorridorS3R3-v0 --sl_until 3000000

For other environments, run

python main.py --env $ENV

where $ENV is the environment name.
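The three cases above can be folded into one small helper that builds the command line for a given environment. This is a convenience sketch, not part of the repository: build_command and SL_UNTIL are hypothetical names, and only flags and values that appear in this README are used.

```python
# --sl_until values taken from the commands above; other environments use the default.
SL_UNTIL = {
    "MiniGrid-KeyCorridorS3R2-v0": 1200000,
    "MiniGrid-KeyCorridorS3R3-v0": 3000000,
}


def build_command(env, disable_rapid=False):
    """Return the argument list for one main.py run."""
    cmd = ["python", "main.py", "--env", env]
    if env in SL_UNTIL:
        cmd += ["--sl_until", str(SL_UNTIL[env])]
    if disable_rapid:
        cmd.append("--disable_rapid")  # baseline run without RAPID
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to launch it.
    print(" ".join(build_command("MiniGrid-KeyCorridorS3R2-v0")))
```

Returning a list rather than a string keeps the arguments safe to hand to subprocess without shell quoting.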

Run MiniWorld Maze environment

Clone the latest master branch of MiniWorld and install it

git clone -b master --single-branch --depth=1 https://github.com/maximecb/gym-miniworld.git
cd gym-miniworld
pip install -e .
cd ..

Start training with

python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

For servers without a display, you may install xvfb with

apt-get install xvfb

Then start training with

xvfb-run -a -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

Run MuJoCo experiments

Run

python main.py --seed 0 --env $ENV --num_timesteps 5000000 --lr 5e-4 --w1 0.001 --w2 0.0 --log_dir logs/$ENV/rapid

where $ENV can be EpisodeSwimmer-v2, EpisodeHopper-v2, EpisodeWalker2d-v2, EpisodeInvertedPendulum-v2, DensityEpisodeSwimmer-v2, or ViscosityEpisodeSwimmer-v2.
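To run all six MuJoCo environments in turn, the command above can be generated per environment. This is a sketch only: MUJOCO_ENVS and mujoco_command are hypothetical names, and the flag values are copied from the command shown in this section.

```python
MUJOCO_ENVS = [
    "EpisodeSwimmer-v2",
    "EpisodeHopper-v2",
    "EpisodeWalker2d-v2",
    "EpisodeInvertedPendulum-v2",
    "DensityEpisodeSwimmer-v2",
    "ViscosityEpisodeSwimmer-v2",
]


def mujoco_command(env, seed=0):
    """Build the main.py argument list for one MuJoCo run."""
    return [
        "python", "main.py",
        "--seed", str(seed),
        "--env", env,
        "--num_timesteps", "5000000",
        "--lr", "5e-4",
        "--w1", "0.001",
        "--w2", "0.0",
        "--log_dir", "logs/{}/rapid".format(env),
    ]


if __name__ == "__main__":
    for env in MUJOCO_ENVS:
        print(" ".join(mujoco_command(env)))
```

Note that the log directory does not include the seed, so runs with different seeds for the same environment would need distinct --log_dir values to avoid sharing a directory.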

Comments

RAPID training is inconsistent across runs that use the same seed

Hi Daochen, when the seed in RAPID is set to 123, we ran 5 experiments on Minigrid-MultiRoom-N1-S10 and found that the training process and results differ from run to run. With the same parameters but with RAPID disabled, the 5 runs on Minigrid-MultiRoom-N1-S10 produce identical training processes and results. Do we need to set other parameters when training with RAPID so that runs with the same seed give the same results? Our environment is Ubuntu 18.04.5 LTS, Python 3.7.11, Anaconda 4.10.1.

opened by mao-xu 6

Questions about how the reward is used and how imitation learning is used in your paper

Dear Authors,

After reading your paper, I have a question about how imitation learning is used in your paper. In your code (https://github.com/daochenzha/rapid/blob/HEAD/rapid/agent.py#L153), I found at the end of each episode, the agent's policy will be updated with imitation learning by using the good episodes. In https://github.com/daochenzha/rapid/blob/HEAD/rapid/agent.py#L272, the agent's model will be updated with RL. Is my understanding right? By the way, is imitation learning necessary and important for the improvement of the rewards?

Line 8 of Algorithm 1 reads: "Give score S_{\tau} to all the state-action pairs in \tau and store them into the buffer." Does this mean that, in this sparse reward setting, every timestep of the episode is reassigned an identical reward?

opened by GoingMyWay 2

Question about EpisodeSwimmerEnv

Hi, thanks for your great work!

I have a question about EpisodeSwimmerEnv. According to your code, EpisodeSwimmerEnv provides dense rewards instead of the sparse rewards mentioned in the paper.

Did I misunderstand something?

opened by CrazySssst 1
