Official codebase for "B-Pref: Benchmarking Preference-Based Reinforcement Learning". It contains scripts to reproduce the experiments.

B-Pref

Install

conda env create -f conda_env.yml
pip install -e .[docs,tests,extra]
cd custom_dmcontrol
pip install -e .
cd ../custom_dmc2gym
pip install -e .
pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld
pip install pybullet

Run experiments using ground-truth (GT) rewards

SAC & SAC + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_sac.sh 
./scripts/[env_name]/run_sac_unsuper.sh

PPO & PPO + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_ppo.sh 
./scripts/[env_name]/run_ppo_unsuper.sh

Run experiments with irrational teachers

To design more realistic models of human teachers, we consider a common stochastic model and systematically manipulate its terms and operators:

teacher_beta: rationality constant of the stochastic preference model (default: -1, which selects the perfectly rational model)
teacher_gamma: discount factor to model myopic behavior (default: 1)
teacher_eps_mistake: probability of making a mistake (default: 0)
teacher_eps_skip: hyperparameter controlling the skip threshold (in [0, 1])
teacher_eps_equal: hyperparameter controlling the equal threshold (in [0, 1])
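
As a rough illustration of how these five parameters could interact, here is a minimal Python sketch of a simulated teacher labeling one pair of segments. It is not the repository's implementation. The names `teacher_label` and `discounted_return`, the return convention (0 or 1 for the preferred segment, 0.5 for "equally preferable", None for a skipped query), and the absolute `skip_threshold` and `equal_threshold` arguments (used here in place of `teacher_eps_skip` and `teacher_eps_equal`) are assumptions made for this example. It also assumes a Bradley-Terry style preference probability whenever `beta > 0`.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Weight the last step by 1 and earlier steps by increasing powers of gamma."""
    horizon = len(rewards)
    return sum(gamma ** (horizon - 1 - t) * r for t, r in enumerate(rewards))


def teacher_label(rewards_0, rewards_1, beta=-1.0, gamma=1.0, eps_mistake=0.0,
                  skip_threshold=0.0, equal_threshold=0.0, rng=random):
    """Hypothetical simulated teacher for a single pair of reward sequences."""
    sum_0, sum_1 = sum(rewards_0), sum(rewards_1)

    # Skip: neither segment reaches the (assumed absolute) skip threshold.
    if skip_threshold > 0 and max(sum_0, sum_1) < skip_threshold:
        return None

    # Equal: the undiscounted returns are too close to tell apart.
    if abs(sum_0 - sum_1) < equal_threshold:
        return 0.5

    # Myopia: gamma < 1 down-weights rewards from early in the segment.
    ret_0 = discounted_return(rewards_0, gamma)
    ret_1 = discounted_return(rewards_1, gamma)

    if beta > 0:
        # Stochastic choice: P[segment 1 preferred] = sigmoid(beta * (ret_1 - ret_0)).
        z = max(-60.0, min(60.0, beta * (ret_1 - ret_0)))  # clamp to avoid overflow
        p_1 = 1.0 / (1.0 + math.exp(-z))
        label = 1 if rng.random() < p_1 else 0
    else:
        # beta <= 0 (e.g. the default -1) is treated as perfectly rational.
        label = 1 if ret_1 > ret_0 else 0

    # Mistake: flip the label with probability eps_mistake.
    if rng.random() < eps_mistake:
        label = 1 - label
    return label


print(teacher_label([0, 0, 1], [1, 1, 1]))              # rational teacher picks segment 1
print(teacher_label([10, 0, 0], [0, 0, 5]))             # gamma=1: segment 0 has the higher return
print(teacher_label([10, 0, 0], [0, 0, 5], gamma=0.1))  # myopic teacher favors the recent reward
```

With `gamma=1` the second and third calls compare returns of 10 and 5, so segment 0 wins. With `gamma=0.1` the early reward of 10 is scaled down to about 0.1, so the same pair is labeled in favor of segment 1.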

In B-Pref, we tried the following teachers:

Oracle teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Mistake teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0.1, teacher_eps_skip=0, teacher_eps_equal=0)

Noisy teacher: (teacher_beta=1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Skip teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0.1, teacher_eps_equal=0)

Myopic teacher: (teacher_beta=-1, teacher_gamma=0.9, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Equal teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0.1)
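
Each teacher above differs from the oracle in exactly one parameter. The following Python sketch makes that explicit. The parameter names and values are copied from the list above, while the lowercase preset keys and the helpers `make_teacher` and `diff_from_oracle` are hypothetical names introduced only for this illustration.

```python
# Oracle settings, copied from the list above.
ORACLE = {
    "teacher_beta": -1,
    "teacher_gamma": 1,
    "teacher_eps_mistake": 0,
    "teacher_eps_skip": 0,
    "teacher_eps_equal": 0,
}


def make_teacher(**overrides):
    """Return the oracle settings with the given parameters overridden."""
    unknown = set(overrides) - set(ORACLE)
    if unknown:
        raise KeyError(f"unknown teacher parameters: {sorted(unknown)}")
    return {**ORACLE, **overrides}


# Preset keys are illustrative labels, not names used by the repository.
TEACHERS = {
    "oracle": make_teacher(),
    "mistake": make_teacher(teacher_eps_mistake=0.1),
    "noisy": make_teacher(teacher_beta=1),
    "skip": make_teacher(teacher_eps_skip=0.1),
    "myopic": make_teacher(teacher_gamma=0.9),
    "equal": make_teacher(teacher_eps_equal=0.1),
}


def diff_from_oracle(name):
    """List only the parameters in which a preset departs from the oracle."""
    return {k: v for k, v in TEACHERS[name].items() if v != ORACLE[k]}


for name in TEACHERS:
    print(name, diff_from_oracle(name))
```

Writing the presets as overrides of a single baseline keeps each irrationality isolated, which matches how the six teachers are listed.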

PEBBLE

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PEBBLE.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

PrefPPO

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PrefPPO.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]
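
The PEBBLE and PrefPPO script paths follow the same pattern, so a sweep over sampling schemes can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The helper `build_command` is hypothetical, and the example values `oracle` and `10000` are placeholders that may not match the directory names actually present under `scripts/`.

```python
# Sampling scheme codes as documented above.
SAMPLING_SCHEMES = {"uniform": 0, "disagreement": 1, "entropy": 2}


def build_command(env_name, teacher_type, max_budget, algo="PEBBLE", sampling="uniform"):
    """Return the script path and sampling-scheme argument as an argv-style list."""
    if algo not in ("PEBBLE", "PrefPPO"):
        raise ValueError(f"unknown algorithm: {algo}")
    if sampling not in SAMPLING_SCHEMES:
        raise ValueError(f"unknown sampling scheme: {sampling}")
    script = f"./scripts/{env_name}/{teacher_type}/{max_budget}/run_{algo}.sh"
    return [script, str(SAMPLING_SCHEMES[sampling])]


# Placeholder values; check the scripts/ directory for the real names.
for scheme in SAMPLING_SCHEMES:
    print(" ".join(build_command("button_press", "oracle", "10000", "PrefPPO", scheme)))
```

The returned list can be passed to `subprocess.run` from the repository root once the placeholder values have been replaced with existing directories.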

Note: full hyperparameters for Meta-World will be updated soon!

Comments
  • GUI for preference-based RL

    Hello, I am currently interested in applying preference-based RL in various robotics tasks in the real world.

    I read the paper "BPref" and noticed that the preference is given to the agent by various simulated teachers, not real humans, to make the training and evaluation process fast. But in your previous paper "PEBBLE", it seems that real human preferences were used.

    I am curious what kind of GUI you used for the humans to interactively give preference feedback. Could you give me some tips?

    cf) For the simulation environment, I am using RaiSim and Isaac gym.

    opened by awesomericky 2
  • Some problem of multi CPUs

    I ran the experiments on an Intel Xeon Gold 5118 and a GTX2080 Ti, but I found that GPU utilization is very low. It takes 10 hours to train Walker_walk/train_ppo.sh for 3 seeds. Is this caused by my experimental parameters, or is it normal?

    opened by Huwenbo-git 2
  • Outdated dependencies for hydra and dm_control

    During the setup instructions, when creating the conda environment, the two dependencies for dm_control and hydra fail.

    You can just manually install dm_control and that seems to work, but installing hydra manually seems to cause some errors downstream.

    Error message during the setup:

    (bpref) kuwajerw@pop-os [12:41:39PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ pip install git+git://github.com/facebookresearch/hydra@0.11_branch
    Collecting git+git://github.com/facebookresearch/hydra@0.11_branch
      Cloning git://github.com/facebookresearch/hydra (to revision 0.11_branch) to /tmp/pip-req-build-g15zht1f
      Running command git clone -q git://github.com/facebookresearch/hydra /tmp/pip-req-build-g15zht1f
      fatal: unable to connect to github.com:
      github.com[0: 140.82.113.4]: errno=Connection timed out
    
    WARNING: Discarding git+git://github.com/facebookresearch/hydra@0.11_branch. Command errored out with exit status 128: git clone -q git://github.com/facebookresearch/hydra /tmp/pip-req-build-g15zht1f Check the logs for full command output.
    ERROR: Command errored out with exit status 128: git clone -q git://github.com/facebookresearch/hydra /tmp/pip-req-build-g15zht1f Check the logs for full command output.
    (bpref) kuwajerw@pop-os [12:43:55PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ pip install git+git://github.com/deepmind/dm_control.git@d1fe1d14fb229d3ad784e40d8c54491d30e1f08a
    Collecting git+git://github.com/deepmind/dm_control.git@d1fe1d14fb229d3ad784e40d8c54491d30e1f08a
      Cloning git://github.com/deepmind/dm_control.git (to revision d1fe1d14fb229d3ad784e40d8c54491d30e1f08a) to /tmp/pip-req-build-fe8j117j
      Running command git clone -q git://github.com/deepmind/dm_control.git /tmp/pip-req-build-fe8j117j
      fatal: unable to connect to github.com:
      github.com[0: 140.82.112.4]: errno=Connection timed out
    
    WARNING: Discarding git+git://github.com/deepmind/dm_control.git@d1fe1d14fb229d3ad784e40d8c54491d30e1f08a. Command errored out with exit status 128: git clone -q git://github.com/deepmind/dm_control.git /tmp/pip-req-build-fe8j117j Check the logs for full command output.
    ERROR: Command errored out with exit status 128: git clone -q git://github.com/deepmind/dm_control.git /tmp/pip-req-build-fe8j117j Check the logs for full command output.
    (bpref) kuwajerw@pop-os [12:52:36PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ 
    

    Error message after manually installing hydra 0.11 (the version that this repo uses I believe)

    So first if I comment out the hydra line during the installation and run the code I obviously get a "hydra not found error"

    /home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
      warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
    Traceback (most recent call last):
      File "train_SAC.py", line 13, in <module>
        import hydra
    ModuleNotFoundError: No module named 'hydra'
    

    Then I install hydra 0.11

    (bpref) kuwajerw@pop-os [01:02:30PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ pip install hydra-core==0.11
    Collecting hydra-core==0.11
      Using cached hydra_core-0.11.0-py3-none-any.whl (71 kB)
    Collecting omegaconf>=1.4.0
      Using cached omegaconf-2.2.3-py3-none-any.whl (79 kB)
    Requirement already satisfied: PyYAML>=5.1.0 in /home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages (from omegaconf>=1.4.0->hydra-core==0.11) (6.0)
    Requirement already satisfied: dataclasses in /home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages (from omegaconf>=1.4.0->hydra-core==0.11) (0.8)
    Collecting antlr4-python3-runtime==4.9.*
      Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
    Installing collected packages: antlr4-python3-runtime, omegaconf, hydra-core
    Successfully installed antlr4-python3-runtime-4.9.3 hydra-core-0.11.0 omegaconf-2.2.3
    

    But then when I try to run one of the demo scripts, I get an error in hydra about not being able to import "Config"

    (bpref) kuwajerw@pop-os [01:02:35PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ ./scripts/button_press/run_sac.sh
    /home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
      warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
    Traceback (most recent call last):
      File "train_SAC.py", line 13, in <module>
        import hydra
      File "/home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/hydra/__init__.py", line 2, in <module>
        from . import utils
      File "/home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/hydra/utils.py", line 7, in <module>
        from hydra.plugins.common.utils import HydraConfig
      File "/home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/hydra/plugins/__init__.py", line 11, in <module>
        from .completion_plugin import CompletionPlugin
      File "/home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/hydra/plugins/completion_plugin.py", line 7, in <module>
        from omegaconf import DictConfig, ListConfig, Config, MissingMandatoryValue
    ImportError: cannot import name 'Config'
    

    Maybe we can use a different hydra version? That also seems to get other errors, I will update this issue.

    Edit: So if I manually use hydra version 1.0, I get this error.

    (bpref) kuwajerw@pop-os [01:20:11PM 10/11/2022]:
    (main) ~/repos/BPref/
    $ ./scripts/button_press/run_sac.sh
    /home/kuwajerw/anaconda3/envs/bpref/lib/python3.6/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
      warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
    
    Key 'name' not in 'HydraConf'
        full_key: hydra.name
        object_type=HydraConf
    
    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
    
    opened by alik-git 0
  • Request: Code for generating rliable plots

    Hi,

    First of all, I want to congratulate you on the excellent work.

    I was wondering if it would be possible to share also the code that was used to get the plots with rliable?

    opened by risufaj 0
  • version.txt is missing in /stable_baseline3

    Hi, it seems that version.txt is missing in /stable_baseline3, which causes an error when installing (running pip install -e .[docs,tests,extra]).

    opened by xuewanqi 1