Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Overview

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

This repository is the official implementation of

  • Seohong Park, Jaekyeom Kim, Gunhee Kim. Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods. In NeurIPS, 2021.

It contains the implementations for SAR, FiGAR-C and base policy gradient algorithms (PPO, TRPO and A2C).

The code is based on Stable Baselines3 (SB3) for PPO and A2C, and Stable Baselines (SB) for TRPO.

Requirements

Run examples

PPO and A2C (based on Stable Baselines3)

To install requirements:

cd sb3
pip install -r requirements.txt
pip install -e .

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train FiGAR-C-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05

Train PPO on InvertedPendulum-v2 with the original δ:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg

Train SAR-A2C on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=a2c_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Action Noise" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=action --anoise_prob=0.05 --anoise_std=3

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "External Force" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_f --anoise_prob=0.05 --anoise_std=300

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Strong External Force (Perceptible)" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_fpc --anoise_prob=0.05 --anoise_std=1000

TRPO (based on Stable Baselines)

To install requirements:

cd sb
pip install -r requirements.txt
pip install -e .

Train SAR-TRPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

License

This codebase is licensed under the MIT License. See also sb3/LICENSE_SB3 and sb/LICENSE_SB.

You might also like...
Find existing email addresses by nickname using API/SMTP checking methods without user notification. Please, don't hesitate to improve cat's job! 🐱🔎 📬
Find existing email addresses by nickname using API/SMTP checking methods without user notification. Please, don't hesitate to improve cat's job! 🐱🔎 📬

mailcat The only cat who can find existing email addresses by nickname. Usage First install requirements: pip3 install -r requirements.txt Then just

Denial Attacks by Various Methods

Denial Service Attack Denial Attacks by Various Methods IIIIIIIIIIIIIIIIIIII PPPPPPPPPPPPPPPPP VVVVVVVV VVVVVVVV I::

OpenTOTP is yet another time-based, one-time passwords (OTPs) generator/verifier inspired by RFC 6238.

OpenTOTP is yet another time-based, one-time passwords (OTPs) generator/verifier inspired by RFC 6238. It generates and validates OTPs based

Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.
Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

The Recon-ng Framework Recon-ng content now available on Pluralsight! Recon-ng is a full-featured reconnaissance framework designed with the goal of p

LittleBrother is a simple parental control application monitoring specific processes on Linux hosts to monitor and limit the play time of children.
LittleBrother is a simple parental control application monitoring specific processes on Linux hosts to monitor and limit the play time of children.

Parental Control Application LittleBrother Overview LittleBrother is a simple parental control application monitoring specific processes (read "games"

JS Deobfuscation is a Python script that deobfuscate JS code and it's time saver for you.

JS Deobfuscation is a Python script that deobfuscate JS code and it's time saver for you. Although it may not work with high degrees of obfuscation, it's a pretty nice tool to help you even if it's just a bit.

D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation time by modifying IDA Pro microcode.
D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation time by modifying IDA Pro microcode.

Introduction fork from https://gitlab.com/eshard/d810 What is D-810 D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation

NS-Defacer: a auto html injecter, In other words It's a auto defacer to deface a lot of websites in less time
NS-Defacer: a auto html injecter, In other words It's a auto defacer to deface a lot of websites in less time

Overview NS-Defacer is a auto html injecter, In other words It's a auto defacer

Owner
Seohong Park
Seohong Park
Set the draft security HTTP header Permissions-Policy (previously Feature-Policy) on your Django app.

django-permissions-policy Set the draft security HTTP header Permissions-Policy (previously Feature-Policy) on your Django app. Requirements Python 3.

Adam Johnson 76 Nov 30, 2022
Wonk is a tool for combining a set of AWS policy files into smaller compiled policy sets.

Wonk is a tool for combining a set of AWS policy files into smaller compiled policy sets.

Amino, Inc 140 Dec 16, 2022
Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method

Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method Hieu Trung Nguyen, Khang Tran and Ngoc Hoang Luong Setup Clone thi

Evolutionary Learning & Optimization (ELO) Lab 6 Jun 29, 2022
Mad Spammer is a python webhook spammer which is very easy and safe to use.

Mad Spammer ?? Pre-Setup: Open your terminal/console and type: pip install module colorama python MadSpammer.py Setup: After doing that, you should be

null 1 Nov 26, 2021
Dark-Fb No Login 100% safe

Dark-Fb No Login 100% safe TERMUX • pkg install python2 && git -y • pip2 install requests mechanize tqdm • git clone https://github.com/BOT-033/Sensei

Bukan Hamkel 1 Dec 4, 2021
This tool allows to automatically test for Content Security Policy bypass payloads.

CSPass This tool allows to automatically test for Content Security Policy bypass payloads. Usage [cspass]$ ./cspass.py -h usage: cspass.py [-h] [--no-

Ruulian 30 Nov 22, 2022
GitHub Advance Security Compliance Action

advanced-security-compliance This Action was designed to allow users to configure their Risk threshold for security issues reported by GitHub Code Sca

Mathew Payne 121 Dec 14, 2022
A GitHub action for organizations that enables advanced security code scanning on all new repos

Advanced-Security-Enforcer What this repository does This code is for an active GitHub Action written in Python to check (on a schedule) for new repos

Zack Koppert 30 May 17, 2022
Automatically fetch, measure, and merge subscription links on the network, use Github Action

Free Node Merge Introduction Modified from alanbobs999/TopFreeProxies It measures the speed of free nodes on the network and import the stable and hig

null 52 Jul 16, 2022
Small Python library that adds password hashing methods to ORM objects

Password Mixin Mixin that adds some useful methods to ORM objects Compatible with Python 3.5 >= 3.9 Install pip install password-mixin Setup first cre

Joe Gasewicz 5 Nov 22, 2022