An implementation of the proximal policy optimization algorithm

Martin Huber

Last update: Dec 9, 2022

Related tags

Deep Learning ppo_libtorch

Overview

PPO Pytorch C++

This is an implementation of the proximal policy optimization (PPO) algorithm for the C++ API of PyTorch. It uses a simple TestEnvironment to test the algorithm. Below is a small visualization of the environment in which the algorithm is tested.

Fig. 1: The agent in testing mode.
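
For orientation, the core of PPO is the clipped surrogate objective. The block below is a minimal sketch of that loss in plain C++ without libtorch. It is not taken from this repository: the function name `ppo_clip_loss`, its parameters, and the default `clip_eps = 0.2` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical helper, not this repository's code).
// Computes the negated, batch-averaged clipped surrogate objective:
//   L = -mean_i( min( r_i * A_i, clip(r_i, 1 - eps, 1 + eps) * A_i ) )
// where r_i = exp(new_log_prob_i - old_log_prob_i) is the probability ratio
// between the current policy and the policy that collected the data.
double ppo_clip_loss(const std::vector<double>& new_log_prob,
                     const std::vector<double>& old_log_prob,
                     const std::vector<double>& advantage,
                     double clip_eps = 0.2) {  // assumed default
    assert(new_log_prob.size() == old_log_prob.size());
    assert(new_log_prob.size() == advantage.size());
    assert(!advantage.empty());

    double sum = 0.0;
    for (std::size_t i = 0; i < advantage.size(); ++i) {
        // Probability ratio, computed from log-probabilities for stability.
        const double ratio = std::exp(new_log_prob[i] - old_log_prob[i]);
        // Restrict the ratio to the interval [1 - eps, 1 + eps].
        const double clipped =
            std::max(1.0 - clip_eps, std::min(ratio, 1.0 + clip_eps));
        // Take the more pessimistic of the unclipped and clipped terms.
        sum += std::min(ratio * advantage[i], clipped * advantage[i]);
    }
    // Negate so that minimizing the loss maximizes the objective.
    return -sum / static_cast<double>(advantage.size());
}
```

With identical old and new log-probabilities the ratio is 1 and the loss is simply the negated mean advantage. Once the ratio leaves the clipping interval in the direction favored by the advantage, the clipped term takes over and stops rewarding further movement.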

Build

You first need to install PyTorch. For a clean installation from Anaconda, check out this short tutorial; to install only the binaries, check out this tutorial.

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ..
make

Run

Run the executable with

cd build
./train_ppo

It should produce output similar to what is shown below.

Fig. 2: From left to right, the agent for successive epochs in training mode as it takes actions in the environment to reach the goal.

Once trained, the algorithm can also be used in test mode. To do so, run

cd build
./test_ppo

Visualization

The results are saved to data/data.csv and can be visualized by running python plot.py.

Comments

this->is_training() not working

Hi, when I test `if (this->is_training()) {}` in models.h, the condition seems to always be true, yet the "else" branch is executed. I was thinking it may be due to the libtorch version; I am using 1.2.0. I would appreciate it if you could tell me which version you are using, and whether you have any idea why is_training() does not change. Thanks.

opened by N1ckfm 1

where is the formula in c++ file

https://github.com/Mikoto10032/DeepLearning/blob/master/books/%5B%E6%B7%B1%E5%BA%A6%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0%5D%5BHung-yi%20Lee%5D/PPO%20(v3).pdf

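On page 9 of this PDF, the formula is given as

$$p_\theta(\tau) = p(s_1)\, p_\theta(a_1 \mid s_1)\, p(s_2 \mid s_1, a_1)\, p_\theta(a_2 \mid s_2)\, p(s_3 \mid s_2, a_2) \cdots$$

Where is this formula in the C++ files? Which function implements it, or where is it defined? Please help me find it.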

opened by fatalfeel 3
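
As a general note on the formula quoted above, the trajectory probability is a product of an initial-state term, policy terms, and environment-transition terms. Only the policy terms depend on θ, which is why policy-gradient code typically works with log p_θ(a_t | s_t) and never forms the full product. Whether and where this repository evaluates these terms is not stated here. The following is a minimal sketch in plain C++, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the factors of one sampled trajectory tau.
struct TrajectoryFactors {
    double initial_state_prob;             // p(s_1)
    std::vector<double> policy_probs;      // p_theta(a_t | s_t) for each step t
    std::vector<double> transition_probs;  // p(s_{t+1} | s_t, a_t) for each step t
};

// Sum of log p_theta(a_t | s_t): the only part that depends on theta.
double policy_log_prob(const TrajectoryFactors& f) {
    double lp = 0.0;
    for (double p : f.policy_probs) lp += std::log(p);
    return lp;
}

// log p_theta(tau) = log p(s_1) + sum_t log p_theta(a_t | s_t)
//                               + sum_t log p(s_{t+1} | s_t, a_t)
double trajectory_log_prob(const TrajectoryFactors& f) {
    double lp = std::log(f.initial_state_prob) + policy_log_prob(f);
    for (double p : f.transition_probs) lp += std::log(p);
    return lp;
}
```

Working in log space turns the product into a sum, which avoids numerical underflow for long trajectories. The transition terms drop out when differentiating with respect to θ, so a model-free method does not need to know them.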

Owner

Martin Huber

Hi :), I'm Martin.

GitHub
