Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

Overview

This is the official repository for Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP). We provide the commands to reproduce the PETS and PlaNet experiments from the paper. The repository is kept minimal for ease of experimentation.
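
At a high level, CAP plans under a cost estimate that is inflated by the dynamics model's predictive uncertainty, scaled by a coefficient kappa that is adapted during training. The snippet below is a minimal conceptual sketch of this conservative cost, not the repository's exact implementation; the function name, the use of a PETS-style ensemble, and the choice of standard deviation as the uncertainty measure are illustrative assumptions.

import numpy as np

def conservative_cost(cost_preds, kappa):
    # cost_preds: per-ensemble-member cost predictions for one
    # imagined step (shape: [ensemble_size]); illustrative only.
    # kappa: non-negative penalty coefficient.
    mean_cost = np.mean(cost_preds)
    uncertainty = np.std(cost_preds)  # one possible uncertainty proxy
    # Inflate the mean prediction by the ensemble disagreement so that
    # plans visiting uncertain states look more costly.
    return mean_cost + kappa * uncertainty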

Installation

This repository requires Python 3.6 and PyTorch 1.3 or above. Run the following command to create a conda environment (tested with CUDA 10.2):

conda env create -f environment.yml

Experiments

The following commands, run from the repository root, reproduce the PETS experiments on the HalfCheetah environment used in our ablation study:

CAP

python cap-pets/run_cap_pets.py --algo cem --env HalfCheetah-v3 --cost_lim 152 \
--cost_constrained --penalize_uncertainty --learn_kappa --seed 1

CAP with fixed kappa

python cap-pets/run_cap_pets.py --algo cem --env HalfCheetah-v3 --cost_lim 152 \
--cost_constrained --penalize_uncertainty --kappa 1.0 --seed 1

CCEM

python cap-pets/run_cap_pets.py --algo cem --env HalfCheetah-v3 --cost_lim 152 \
--cost_constrained --seed 1

CEM

python cap-pets/run_cap_pets.py --algo cem --env HalfCheetah-v3 --cost_lim 152 \
--seed 1
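
The four commands above ablate CAP's components: --cost_constrained enforces the cost constraint during planning (CCEM), --penalize_uncertainty adds the conservative uncertainty penalty, and --learn_kappa additionally adapts the penalty coefficient online. As a hedged sketch of what adapting kappa can look like (not necessarily the exact update rule in this code), kappa can be increased when the realized episode cost violates the limit and decreased when there is slack, in the spirit of a Lagrange-multiplier update; the learning rate and the projection to non-negative values are assumptions.

def update_kappa(kappa, episode_cost, cost_limit, lr=0.01):
    # Increase kappa on constraint violation (planning becomes more
    # conservative); decrease it when the episode cost is under budget.
    kappa = kappa + lr * (episode_cost - cost_limit)
    return max(kappa, 0.0)  # kappa must stay non-negative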

The commands for the PlaNet experiment on the CarRacing environment are:

CAP

python cap-planet/run_cap_planet.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained --penalize-uncertainty \
--learn-kappa --penalty-kappa 0.1 \
--id CarRacing-cap --seed 1

CAP with fixed kappa

python cap-planet/run_cap_planet.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained --penalize-uncertainty \
--penalty-kappa 1.0 \
--id CarRacing-kappa1 --seed 1

CCEM

python cap-planet/run_cap_planet.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained \
--id CarRacing-ccem --seed 1

CEM

python cap-planet/run_cap_planet.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--id CarRacing-cem --seed 1
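
With --binary-cost and --cost-limit 0, each step's cost is treated as a 0/1 violation indicator (presumably whether the car is skidding, given the environment name) and the budget tolerates no violations. A minimal illustration of this thresholding follows; the function name and threshold are assumptions, not the environment's actual code.

def binary_cost(raw_cost, threshold=0.0):
    # Map a raw per-step cost to a 0/1 violation indicator. With a
    # cost limit of 0, any positive cost marks the step as unsafe.
    return 1.0 if raw_cost > threshold else 0.0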

Contact

If you have any questions regarding the code or paper, feel free to contact [email protected] or open an issue on this repository.

Acknowledgement

This repository contains code adapted from the following repositories: PETS and PlaNet. We thank the authors and contributors for open-sourcing their code.
