PyTorch implementation of ExORL: Exploratory Data for Offline Reinforcement Learning

Overview


This is an original PyTorch implementation of the ExORL framework from Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

*Equal contribution.

Prerequisites

Install MuJoCo if it is not already installed:

  • Download MuJoCo binaries here.
  • Unzip the downloaded archive into ~/.mujoco/.
  • Append the path to the MuJoCo bin subdirectory to the LD_LIBRARY_PATH environment variable (see the sketch after this list).
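
For example, assuming you downloaded MuJoCo 2.1 and unzipped it to ~/.mujoco/mujoco210 (adjust the directory name to match your version), you would add something like this to your shell profile:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin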

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate exorl

Datasets

We provide exploratory datasets for 6 DeepMind Control Suite domains:

Domain | Dataset name | Available task names
Cartpole | cartpole | cartpole_balance, cartpole_balance_sparse, cartpole_swingup, cartpole_swingup_sparse
Cheetah | cheetah | cheetah_run, cheetah_run_backward
Jaco Arm | jaco | jaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right
Point Mass Maze | point_mass_maze | point_mass_maze_reach_top_left, point_mass_maze_reach_top_right, point_mass_maze_reach_bottom_left, point_mass_maze_reach_bottom_right
Quadruped | quadruped | quadruped_walk, quadruped_run
Walker | walker | walker_stand, walker_walk, walker_run

For each domain we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. Here is the list of algorithms:

Unsupervised RL method | Name | Paper
APS | aps | paper
APT(ICM) | icm_apt | paper
DIAYN | diayn | paper
Disagreement | disagreement | paper
ICM | icm | paper
ProtoRL | proto | paper
Random | random | N/A
RND | rnd | paper
SMM | smm | paper

You can download a dataset by running ./download.sh <domain> <algorithm>. For example, to download the ProtoRL dataset for Walker, run:

./download.sh walker proto

The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).
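
To sanity-check a download, you can inspect one of the episode files. Here is a minimal sketch, assuming the episodes under buffer are stored as .npz archives (the file name below is hypothetical):

import numpy as np

# Load one episode archive from the downloaded buffer (hypothetical file name).
episode = np.load('datasets/walker/proto/buffer/episode_000000.npz')

# List the arrays stored in the episode along with their shapes.
for key in episode.keys():
    print(key, episode[key].shape)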

Offline RL training

We also provide implementations of 5 offline RL algorithms for evaluating the datasets:

Offline RL method | Name | Paper
Behavior Cloning | bc | paper
CQL | cql | paper
CRR | crr | paper
TD3+BC | td3_bc | paper
TD3 | td3 | paper

After downloading the required dataset, you can evaluate it using an offline RL method on a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the walking task using TD3+BC, you can run:

python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
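
The agent, expl_agent, and task arguments follow the names listed in the tables above, so other combinations work the same way. For example, to evaluate the RND dataset on the Cheetah running task with CQL:

python train_offline.py agent=cql expl_agent=rnd task=cheetah_run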

Logs are stored in the output folder. To launch TensorBoard, run:

tensorboard --logdir output

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Yarats, Denis and Brandfonbrener, David and Liu, Hao and Laskin, Michael and Abbeel, Pieter and Lazaric, Alessandro and Pinto, Lerrel},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}

License

The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: code from DeepMind is licensed under the Apache 2.0 license.

