Reinforcement Learning via Supervised Learning
Installation
Run
pip install -e .
in an environment with Python >= 3.7.0, <3.9.
The code depends on MuJoCo 2.1.0 (for mujoco-py) and MuJoCo 2.1.1 (for dm-control). Here are instructions for installing MuJoCo 2.1.0 and instructions for installing MuJoCo 2.1.1.
If you use the provided Dockerfile
, it will automatically handle the MuJoCo dependencies for you. For example:
docker build -t rvs:latest .
docker run -it --rm -v $(pwd):/rvs rvs:latest bash
cd rvs
pip install -e .
Reproducing Experiments
The experiments
directory contains a launch script for each environment suite. For example, to reproduce the RvS-R results in D4RL Gym locomotion, run
bash experiments/launch_gym_rvs_r.sh
Each launch script corresponds to a configuration file in experiments/config
which serves as a reference for the hyperparameters associated with each experiment.
Adding New Environments
To run RvS on an environment of your own, you need to create a suitable dataset class. For example, in src/rvs/dataset.py
, we have a dataset class for the GCSL environments, a dataset class for RvS-R in D4RL, and a dataset class for RvS-G in D4RL. In particular, the D4RLRvSGDataModule
allows for conditioning on arbitrary dimensions of the goal state using the goal_columns
attribute; for AntMaze, we set goal_columns
to (0, 1)
to condition only on the x and y coordinates of the goal state.
Baseline Numbers
We replicated CQL using this codebase, which was recommended to us by the CQL authors. All hyperparameters and logs from our replication runs can be viewed at our CQL-R Weights & Biases project.
We replicated Decision Transformer using our fork of the author's codebase, which we customized to add AntMaze. All hyperparameters and logs from our replication runs can be viewed at our DT Weights & Biases project.
Citing RvS
To cite RvS, you can use the following BibTeX entry:
@misc{emmons2021rvs,
title={RvS: What is Essential for Offline RL via Supervised Learning?},
author={Scott Emmons and Benjamin Eysenbach and Ilya Kostrikov and Sergey Levine},
year={2021},
eprint={2112.10751},
archivePrefix={arXiv},
primaryClass={cs.LG}
}