# Self-Supervised Policy Adaptation during Deployment
PyTorch implementation of PAD and evaluation benchmarks from

**Self-Supervised Policy Adaptation during Deployment**

Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
## Citation
If you find our work useful in your research, please consider citing the paper as follows:
```
@article{hansen2020deployment,
  title={Self-Supervised Policy Adaptation during Deployment},
  author={Nicklas Hansen and Rishabh Jangir and Yu Sun and Guillem Alenyà and Pieter Abbeel and Alexei A. Efros and Lerrel Pinto and Xiaolong Wang},
  year={2020},
  eprint={2007.04309},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
## Setup
We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:
```
conda env create -f setup/conda.yml
conda activate pad
sh setup/install_envs.sh
```
## Training & Evaluation
We have prepared training and evaluation scripts that can be run with `sh scripts/train.sh` and `sh scripts/eval.sh`, respectively. Alternatively, you can call the Python scripts directly, e.g. for training:
```
CUDA_VISIBLE_DEVICES=0 python3 src/train.py \
  --domain_name cartpole \
  --task_name swingup \
  --action_repeat 8 \
  --mode train \
  --use_inv \
  --num_shared_layers 8 \
  --seed 0 \
  --work_dir logs/cartpole_swingup/inv/0 \
  --save_model
```
which should give you an output of the form
```
| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000
```
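The `--use_inv` flag enables the inverse-dynamics auxiliary task: a small head predicts the action taken between two consecutive observations from features produced by the layers shared with the policy. As a rough illustration of that objective only, here is a minimal NumPy sketch; the dimensions, linear layers, and function names below are made up for the example and do not correspond to this repository's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the repository's actual sizes)
obs_dim, feat_dim, act_dim = 16, 8, 2

# Shared encoder and inverse-dynamics head, each reduced to one linear map
W_enc = rng.normal(scale=0.1, size=(feat_dim, obs_dim))
W_inv = rng.normal(scale=0.1, size=(act_dim, 2 * feat_dim))

def inverse_dynamics_loss(obs_t, obs_t1, action):
    """MSE between the predicted and the actually taken action,
    given two consecutive observations."""
    h_t, h_t1 = W_enc @ obs_t, W_enc @ obs_t1
    pred = W_inv @ np.concatenate([h_t, h_t1])
    return float(np.mean((pred - action) ** 2))

obs_t, obs_t1 = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
action = rng.normal(size=act_dim)
print(inverse_dynamics_loss(obs_t, obs_t1, action))
```

Because the objective needs no reward signal, the same loss can also be computed from observations gathered at test time, which is what makes adaptation during deployment possible.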
We provide a pre-trained model that can be used for evaluation. To run Policy Adaptation during Deployment, call
```
CUDA_VISIBLE_DEVICES=0 python3 src/eval.py \
  --domain_name cartpole \
  --task_name swingup \
  --action_repeat 8 \
  --mode color_hard \
  --use_inv \
  --num_shared_layers 8 \
  --seed 0 \
  --work_dir logs/cartpole_swingup/inv/0 \
  --pad_checkpoint 500k
```
which should give you an output of the form
```
Evaluating logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
eval reward: 666

Policy Adaptation during Deployment of logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
pad reward: 722
```
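During the PAD evaluation, the agent keeps updating the shared encoder layers with gradient steps on the self-supervised loss as it interacts with the shifted environment, while the policy head itself stays fixed. The toy NumPy loop below shows only that update structure; the linear encoder, the stand-in self-supervised objective, and all sizes are invented for illustration and are not this repository's code:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim = 16, 8

# Shared encoder (adapted online) and a "policy head" frozen at deployment
W_enc = rng.normal(scale=0.1, size=(feat_dim, obs_dim))
w_pol = rng.normal(size=feat_dim)

def ss_loss_and_grad(W, obs):
    """Toy self-supervised objective standing in for the inverse-dynamics
    loss: match the feature norm to the observation norm."""
    h = W @ obs
    err = h @ h - obs @ obs                      # scalar residual
    grad = 4.0 * err * np.outer(h, obs)          # d(err**2)/dW
    return err ** 2, grad

lr = 1e-3
losses = []
for step in range(20):
    obs = rng.normal(size=obs_dim)               # observation from shifted env
    loss, grad = ss_loss_and_grad(W_enc, obs)
    W_enc -= lr * grad                           # adapt the encoder online
    action = w_pol @ (W_enc @ obs)               # act with the frozen head
    losses.append(loss)
```

The key design point mirrored here is that only the encoder receives gradients at deployment; the policy head is untouched, so adaptation cannot directly change what the agent optimizes, only how it perceives the new environment.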
Here are a few samples from the training and test environments of our benchmark:
Please refer to the project page and paper for results and experimental details.
## Acknowledgements
We want to thank the numerous researchers and engineers whose work this implementation is based on. Our SAC implementation is based on this repository, the original DeepMind Control suite is available here, and the gym wrapper for it is available here. Go check them out!