PPO-Continuous-Pytorch
I found the existing implementations of PPO on continuous action spaces to be either somewhat complicated or unstable.
This is a clean and robust PyTorch implementation of PPO for continuous action spaces. Here are the results;
all experiments were trained with the same hyperparameters.
Dependencies
gym==0.18.3
box2d==2.3.10
numpy==1.21.2
pytorch==1.8.1
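If you manage packages with pip, the dependencies above can be installed roughly as follows (note that PyTorch is published on PyPI as 'torch', and the Box2D binding is typically named 'box2d-py'; exact package names may vary by platform):
run 'pip install gym==0.18.3 box2d-py numpy==1.21.2 torch==1.8.1'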
How to use my code
Play with trained model
run 'python main.py --write False --render True --Loadmodel True --ModelIdex 400'
Train from scratch
run 'python main.py', where the default environment is Pendulum-v0.
Change Environment
If you want to train on a different environment, just run 'python main.py --EnvIdex 0', for example.
The --EnvIdex flag can be set from 0 to 5 (see the sketch after this list), where
'--EnvIdex 0' for 'BipedalWalker-v3'
'--EnvIdex 1' for 'BipedalWalkerHardcore-v3'
'--EnvIdex 2' for 'LunarLanderContinuous-v2'
'--EnvIdex 3' for 'Pendulum-v0'
'--EnvIdex 4' for 'Humanoid-v2'
'--EnvIdex 5' for 'HalfCheetah-v2'
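For orientation, here is a minimal sketch of how such an index can map to a gym environment. The list and variable names are hypothetical; the actual mapping lives in 'main.py' (and the last two environments additionally require a MuJoCo installation):

```python
import gym

# Hypothetical sketch: an integer index selects one of the six
# environments listed above, mirroring '--EnvIdex'.
ENV_NAMES = ['BipedalWalker-v3', 'BipedalWalkerHardcore-v3',
             'LunarLanderContinuous-v2', 'Pendulum-v0',
             'Humanoid-v2', 'HalfCheetah-v2']

env_index = 3                         # e.g. '--EnvIdex 3'
env = gym.make(ENV_NAMES[env_index])  # creates Pendulum-v0
print(env.action_space)               # a continuous Box action space
```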
Visualize the training curve
You can use TensorBoard to visualize the training curves. Historical training curves are saved in '\runs'.
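To launch it, run 'tensorboard --logdir runs' and open the printed local URL in your browser.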
Hyperparameter Setting
For more details of the hyperparameter settings, please check 'main.py'.
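For orientation, below is a minimal sketch of how the command-line flags shown in this README might be parsed. Only flags that appear above are included; the defaults, types, and help strings are assumptions, and 'main.py' remains the authoritative reference:

```python
import argparse

def str2bool(v):
    # Parse 'True'/'False'-style strings into real booleans, since
    # argparse's type=bool treats any non-empty string as True.
    return str(v).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser()
# Flags mirror those shown in this README; defaults are assumptions.
parser.add_argument('--EnvIdex', type=int, default=3, help='environment index, 0~5')
parser.add_argument('--write', type=str2bool, default=True, help='record training curves to tensorboard')
parser.add_argument('--render', type=str2bool, default=False, help='render the environment')
parser.add_argument('--Loadmodel', type=str2bool, default=False, help='load a saved checkpoint')
parser.add_argument('--ModelIdex', type=int, default=400, help='which checkpoint to load')
opt = parser.parse_args()
print(opt)
```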