Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Nugroho Dewantoro

Last update: Jun 6, 2022

Related tags

Overview

V-MPO

Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pytorch

Getting Started

This project is using Pytorch for Deep Learning Framework, Gym for Reinforcement Learning Environment. Although it's not required, but i recommend run this project on a PC with GPU and 8 GB Ram

Prerequisites

Make sure you have installed Pytorch and Gym.

Click here to install gym
Click here to install pytorch

Installing

Just clone this project into your work folder

git clone https://github.com/wisnunugroho21/reinforcement_learning_v_mpo.git

Running the project

After you clone the project, run following script in cmd/terminal :

Discrete

python discrete.py

Continous

python continous.py

On-Policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Some of the most successful applications of deep reinforcement learning to chal- lenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algo- rithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state- value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.

You can read full detail of V-MPO in here

Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Related tags

Overview

V-MPO

Getting Started

Prerequisites

Installing

Running the project

Discrete

Continous

On-Policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

You might also like...

Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

FedMM: Saddle Point Optimization for Federated Adversarial Domain Adaptation

RoMA: Robust Model Adaptation for Offline Model-based Optimization

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

PyTorch implementation of Trust Region Policy Optimization

Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PyTorch implementation of Constrained Policy Optimization

Owner

Nugroho Dewantoro

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

DRLib：A concise deep reinforcement learning library, integrating HER and PER for almost off policy RL algos.

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)