Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Seohong Park

Last update: Aug 2, 2022

Related tags

Security related resources SAR

Overview

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

This repository is the official implementation of

Seohong Park, Jaekyeom Kim, Gunhee Kim. Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods. In NeurIPS, 2021.

It contains the implementations for SAR, FiGAR-C and base policy gradient algorithms (PPO, TRPO and A2C).

The code is based on Stable Baselines3 (SB3) for PPO and A2C, and Stable Baselines (SB) for TRPO.

Requirements

Python 3.7.8
MuJoCo 1.5

Run examples

PPO and A2C (based on Stable Baselines3)

To install requirements:

cd sb3
pip install -r requirements.txt
pip install -e .

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train FiGAR-C-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05

Train PPO on InvertedPendulum-v2 with the original δ:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg

Train SAR-A2C on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=a2c_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Action Noise" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=action --anoise_prob=0.05 --anoise_std=3

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "External Force" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_f --anoise_prob=0.05 --anoise_std=300

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Strong External Force (Perceptible)" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_fpc --anoise_prob=0.05 --anoise_std=1000

TRPO (based on Stable Baselines)

To install requirements:

cd sb
pip install -r requirements.txt
pip install -e .

Train SAR-TRPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

License

This codebase is licensed under the MIT License. See also sb3/LICENSE_SB3 and sb/LICENSE_SB.

Find existing email addresses by nickname using API/SMTP checking methods without user notification. Please, don't hesitate to improve cat's job! 🐱🔎 📬

mailcat The only cat who can find existing email addresses by nickname. Usage First install requirements: pip3 install -r requirements.txt Then just

282 Dec 30, 2022

Denial Attacks by Various Methods

Denial Service Attack Denial Attacks by Various Methods IIIIIIIIIIIIIIIIIIII PPPPPPPPPPPPPPPPP VVVVVVVV VVVVVVVV I::

9 Nov 26, 2022

OpenTOTP is yet another time-based, one-time passwords (OTPs) generator/verifier inspired by RFC 6238.

OpenTOTP is yet another time-based, one-time passwords (OTPs) generator/verifier inspired by RFC 6238. It generates and validates OTPs based

1 Nov 15, 2021

Hubble is a modular, open-source security compliance framework. The project provides on-demand profile-based auditing, real-time security event notifications, alerting, and reporting. HubbleStack is a free and open source project made possible by Adobe. https://github.com/adobe

Welcome to HubbleStack!! You can find the docs here You can file an issue here Follow us on Twitter! Development Below are sample instructions to setu

368 Jan 4, 2023

Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

The Recon-ng Framework Recon-ng content now available on Pluralsight! Recon-ng is a full-featured reconnaissance framework designed with the goal of p

2.4k Jan 7, 2023

LittleBrother is a simple parental control application monitoring specific processes on Linux hosts to monitor and limit the play time of children.

Parental Control Application LittleBrother Overview LittleBrother is a simple parental control application monitoring specific processes (read "games"

40 Dec 21, 2022

D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation time by modifying IDA Pro microcode.

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Related tags

Overview

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Requirements

Run examples

PPO and A2C (based on Stable Baselines3)

TRPO (based on Stable Baselines)

License

You might also like...

Find existing email addresses by nickname using API/SMTP checking methods without user notification. Please, don't hesitate to improve cat's job! 🐱🔎 📬

Denial Attacks by Various Methods

OpenTOTP is yet another time-based, one-time passwords (OTPs) generator/verifier inspired by RFC 6238.

Hubble is a modular, open-source security compliance framework. The project provides on-demand profile-based auditing, real-time security event notifications, alerting, and reporting. HubbleStack is a free and open source project made possible by Adobe. https://github.com/adobe

Open Source Intelligence gathering tool aimed at reducing the time spent harvesting information from open sources.

LittleBrother is a simple parental control application monitoring specific processes on Linux hosts to monitor and limit the play time of children.

D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation time by modifying IDA Pro microcode.

NS-Defacer: a auto html injecter, In other words It's a auto defacer to deface a lot of websites in less time

Sentinel-1 SAR time series analysis for OSINT use

Owner

Seohong Park

Set the draft security HTTP header Permissions-Policy (previously Feature-Policy) on your Django app.

Wonk is a tool for combining a set of AWS policy files into smaller compiled policy sets.

Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method

Mad Spammer is a python webhook spammer which is very easy and safe to use.

Dark-Fb No Login 100% safe

This tool allows to automatically test for Content Security Policy bypass payloads.

GitHub Advance Security Compliance Action

A GitHub action for organizations that enables advanced security code scanning on all new repos

Automatically fetch, measure, and merge subscription links on the network, use Github Action

Small Python library that adds password hashing methods to ORM objects