Reinforcement Learning via Supervised Learning

Scott Emmons

Last update: Nov 28, 2022

Related tags

Deep Learning rvs

Overview

Reinforcement Learning via Supervised Learning

Installation

Run

pip install -e .

in an environment with Python >= 3.7.0, <3.9.

The code depends on MuJoCo 2.1.0 (for mujoco-py) and MuJoCo 2.1.1 (for dm-control). Here are instructions for installing MuJoCo 2.1.0 and instructions for installing MuJoCo 2.1.1.

If you use the provided Dockerfile, it will automatically handle the MuJoCo dependencies for you. For example:

docker build -t rvs:latest .
docker run -it --rm -v $(pwd):/rvs rvs:latest bash
cd rvs
pip install -e .

Reproducing Experiments

The experiments directory contains a launch script for each environment suite. For example, to reproduce the RvS-R results in D4RL Gym locomotion, run

bash experiments/launch_gym_rvs_r.sh

Each launch script corresponds to a configuration file in experiments/config which serves as a reference for the hyperparameters associated with each experiment.

Adding New Environments

To run RvS on an environment of your own, you need to create a suitable dataset class. For example, in src/rvs/dataset.py, we have a dataset class for the GCSL environments, a dataset class for RvS-R in D4RL, and a dataset class for RvS-G in D4RL. In particular, the D4RLRvSGDataModule allows for conditioning on arbitrary dimensions of the goal state using the goal_columns attribute; for AntMaze, we set goal_columns to (0, 1) to condition only on the x and y coordinates of the goal state.

Baseline Numbers

We replicated CQL using this codebase, which was recommended to us by the CQL authors. All hyperparameters and logs from our replication runs can be viewed at our CQL-R Weights & Biases project.

We replicated Decision Transformer using our fork of the author's codebase, which we customized to add AntMaze. All hyperparameters and logs from our replication runs can be viewed at our DT Weights & Biases project.

Citing RvS

To cite RvS, you can use the following BibTeX entry:

@misc{emmons2021rvs,
      title={RvS: What is Essential for Offline RL via Supervised Learning?}, 
      author={Scott Emmons and Benjamin Eysenbach and Ilya Kostrikov and Sergey Levine},
      year={2021},
      eprint={2112.10751},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

TensorFlow implementation of Barlow Twins (Barlow Twins: Self-Supervised Learning via Redundancy Reduction)

Barlow-Twins-TF This repository implements Barlow Twins (Barlow Twins: Self-Supervised Learning via Redundancy Reduction) in TensorFlow and demonstrat

36 Sep 14, 2022

An official PyTorch implementation of the TKDE paper "Self-Supervised Graph Representation Learning via Topology Transformations".

Self-Supervised Graph Representation Learning via Topology Transformations This repository is the official PyTorch implementation of the following pap

2 Oct 31, 2022

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

59 Dec 28, 2022

Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models

Patch-Rotation(PatchRot) Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models Submitted to Neurips2021 To

4 Jul 12, 2021

Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

96 Nov 1, 2022

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Reinforcement Learning via Supervised Learning

Related tags

Overview

Reinforcement Learning via Supervised Learning

Installation

Reproducing Experiments

Adding New Environments

Baseline Numbers

Citing RvS

You might also like...

TensorFlow implementation of Barlow Twins (Barlow Twins: Self-Supervised Learning via Redundancy Reduction)

An official PyTorch implementation of the TKDE paper "Self-Supervised Graph Representation Learning via Topology Transformations".

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models

Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

I-SECRET: Importance-guided fundus image enhancement via semi-supervised contrastive constraining

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

Owner

Scott Emmons

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

Clean and readable code for Decision Transformer: Reinforcement Learning via Sequence Modeling

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Learning to Reach Goals via Iterated Supervised Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Public implementation of "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression" from CoRL'21

Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021