Implement A3C for Mujoco gym envs

Andrew

Last update: Dec 12, 2022

Related tags

Deep Learning reinforcement-learning pytorch a3c continuous-control actor-critic mujoco

Overview

pytorch-a3c-mujoco

Disclaimer: my implementation is currently unstable (see the learning curves below), and I'm not sure whether the problem is in my code. All comments are welcome, and feel free to contact me!

This code aims to solve control problems, especially in Mujoco, and is largely based on pytorch-a3c. The differences between this repo and pytorch-a3c are:

compatible with Mujoco environments
the policy network outputs mu and sigma
construct a Gaussian distribution from mu and sigma
sample actions from the Gaussian distribution
modify the entropy term
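A minimal sketch of the sampling steps listed above, in plain Python (no PyTorch) so it runs on its own. It assumes the policy head has already produced a per-dimension mean `mu` and variance `sigma_sq` and that action dimensions are independent (diagonal covariance); the function names and the choice of variance rather than standard deviation are illustrative assumptions, not taken from this repo.

```python
import math
import random


def gaussian_log_prob(x, mu, sigma_sq):
    """Log-density of N(mu, sigma_sq) at x, for one action dimension."""
    return -0.5 * math.log(2 * math.pi * sigma_sq) - (x - mu) ** 2 / (2 * sigma_sq)


def act(mu, sigma_sq, rng=random):
    """Sample one action per dimension and return (action, log_prob).

    mu and sigma_sq are lists with one entry per action dimension. Because the
    dimensions are treated as independent, the joint log-probability is the sum
    of the per-dimension log-probabilities; this is the term that gets weighted
    by the advantage in the policy-gradient loss.
    """
    action = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, sigma_sq)]
    log_prob = sum(gaussian_log_prob(a, m, v) for a, m, v in zip(action, mu, sigma_sq))
    return action, log_prob
```

For example, `act([0.0, 0.5], [1.0, 0.25], random.Random(0))` returns a two-dimensional action and its joint log-probability. In a PyTorch version the same quantities would come from tensors so that gradients flow back through `mu` and `sigma_sq`.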

Note that this repo is only compatible with Mujoco in OpenAI gym. If you want to train an agent in the Atari domain, please refer to pytorch-a3c.

Usage

There are three tasks/modes: train, eval, and develop.

train:

python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task train

eval:

python main.py --env-name InvertedPendulum-v1 --task eval --display True --load_ckpt ckpt/a3c/InvertedPendulum-v1.a3c.100

You can choose whether to display the environment with the --display flag.

develop:

python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task develop

If you want to check that your code runs as intended, you might resort to pdb. For this, I provide a develop mode, which runs in only one thread (easy to debug).
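A hypothetical sketch of how the three modes and the flags used in the commands above could be wired together with `argparse` and `multiprocessing`. This is not the repo's `main.py`: `build_parser`, `str2bool`, `worker`, and `run` are illustrative names, and `worker` is a placeholder for the real A3C rollout/update loop. The point is the develop branch, which runs a single worker in the main process so pdb breakpoints behave normally.

```python
import argparse
import multiprocessing as mp


def str2bool(value):
    """Parse 'True'/'False' strings; argparse's type=bool would turn 'False' into True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="A3C on Mujoco (illustrative sketch)")
    parser.add_argument("--env-name", default="InvertedPendulum-v1")
    parser.add_argument("--num-processes", type=int, default=16)
    parser.add_argument("--task", choices=["train", "eval", "develop"], default="train")
    parser.add_argument("--display", type=str2bool, default=False)
    parser.add_argument("--load_ckpt", default=None)
    return parser


def worker(rank, env_name):
    """Hypothetical placeholder for one A3C worker's rollout/update loop."""
    return f"worker {rank} on {env_name}"


def run(args):
    if args.task == "develop":
        # A single worker in the main process: no subprocesses, so pdb just works.
        # --num-processes is ignored in this mode.
        return [worker(0, args.env_name)]
    if args.task == "eval":
        # Evaluation needs a trained checkpoint to load.
        if args.load_ckpt is None:
            raise SystemExit("--load_ckpt is required for --task eval")
        return [f"eval {args.load_ckpt} display={args.display}"]
    # train: one OS process per worker (in real A3C each would update a shared model).
    procs = [mp.Process(target=worker, args=(rank, args.env_name))
             for rank in range(args.num_processes)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [f"{len(procs)} workers finished"]
```

As a script, call `run(build_parser().parse_args())` under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that spawn rather than fork.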

Experiment results

learning curve

The plot of total reward/episode length in 1000 steps:

InvertedPendulum-v1

In InvertedPendulum-v1, the total reward is exactly equal to the episode length.

InvertedDoublePendulum-v1

Note that the x-axis denotes time in minutes.

The curve above is plotted with python plot.py --log_path ./logs/a3c/InvertedPendulum-v1.a3c.log
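A minimal sketch of how a log could be turned into a reward/length-versus-minutes plot. It is not this repo's `plot.py`: the log-line format matched by `LINE_RE` (an elapsed time like `00h 05m 30s` followed by an episode reward and episode length) is a hypothetical assumption, so adjust the pattern to whatever your log actually contains. Parsing is kept separate from plotting so matplotlib stays optional.

```python
import re

# Hypothetical log-line format; adjust to match your actual log.
LINE_RE = re.compile(
    r"Time (\d+)h (\d+)m (\d+)s.*?episode reward ([-\d.]+), episode length (\d+)"
)


def parse_log(lines):
    """Return (minutes, rewards, lengths) from the lines that match LINE_RE."""
    minutes, rewards, lengths = [], [], []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that are not evaluation results
        h, mi, s = (int(g) for g in m.group(1, 2, 3))
        minutes.append(h * 60 + mi + s / 60)  # x-axis is elapsed time in minutes
        rewards.append(float(m.group(4)))
        lengths.append(int(m.group(5)))
    return minutes, rewards, lengths


def plot_log(path):
    import matplotlib.pyplot as plt  # optional dependency, imported only when plotting

    with open(path) as f:
        minutes, rewards, lengths = parse_log(f)
    fig, ax = plt.subplots()
    ax.plot(minutes, rewards, label="total reward")
    ax.plot(minutes, lengths, label="episode length")
    ax.set_xlabel("time (minutes)")
    ax.legend()
    plt.show()
```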

video

InvertedPendulum-v1

InvertedDoublePendulum-v1

Requirements

gym
mujoco-py
pytorch
matplotlib (optional)
seaborn (optional)

TODO

I implemented ShareRMSProp in my_optim.py, but I haven't tried it yet.

Reference

pytorch-a3c

Comments

Action in test and train

In both the test and train procedures, actions are sampled from the Gaussian distribution. Should the test thread just take mu directly?

opened by onlytailei 5
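The two options raised in this issue differ in a single branch. A minimal sketch, assuming a per-dimension mean `mu` and variance `sigma_sq` (hypothetical names, diagonal covariance assumed): `deterministic=True` returns the mean, which is the mode of the Gaussian, and `deterministic=False` draws a sample, which is what the issue says both threads currently do. Sampling during training for exploration and using the mean for evaluation is a common convention, but which one this repo should use is the open question here.

```python
import math
import random


def select_action(mu, sigma_sq, deterministic, rng=random):
    """Pick an action from a diagonal Gaussian policy.

    deterministic=True: return the mean of each dimension (no exploration noise).
    deterministic=False: draw one sample per dimension from N(mu_i, sigma_sq_i).
    """
    if deterministic:
        return list(mu)
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, sigma_sq)]
```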
Problem with the entropy

The entropy of a Gaussian distribution is k/2 * log(2 * pi * e) + 1/2 * log(|Sigma|) according to Wikipedia, where k is the dimension of the distribution.

However, in the code, the entropy is calculated by -1/2 * (log(2*pi + |sigma|) + 1).

Why?

opened by xuehy 3
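To put numbers on the two expressions in this issue, the sketch below evaluates both for a diagonal Gaussian. It assumes the `|sigma|` in the quoted expression is the per-dimension variance sigma_i^2 and that per-dimension terms are summed; the function names are illustrative. `gaussian_entropy` and `closed_form` are the same formula written per dimension and in the k/2 * log(2 * pi * e) + 1/2 * log(|Sigma|) form.

```python
import math


def gaussian_entropy(sigma_sq):
    """Entropy of a diagonal Gaussian: sum over dims of 0.5 * log(2*pi*e*sigma_i^2)."""
    return sum(0.5 * math.log(2 * math.pi * math.e * v) for v in sigma_sq)


def closed_form(sigma_sq):
    """k/2 * log(2*pi*e) + 1/2 * log|Sigma|, with Sigma = diag(sigma_sq)."""
    k = len(sigma_sq)
    log_det = sum(math.log(v) for v in sigma_sq)
    return 0.5 * k * math.log(2 * math.pi * math.e) + 0.5 * log_det


def quoted_expression(sigma_sq):
    """-1/2 * (log(2*pi + sigma_i^2) + 1), summed over dims (as quoted in the issue)."""
    return sum(-0.5 * (math.log(2 * math.pi + v) + 1) for v in sigma_sq)
```

For unit variance, `gaussian_entropy([1.0])` is about 1.419 while `quoted_expression([1.0])` is about -1.493. The two also move in opposite directions: the true entropy grows with the variance, whereas the quoted expression shrinks as the variance grows.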


You might also like...

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Customizable RecSys Simulator for OpenAI Gym

Reinforcement Learning with Q-Learning Algorithm on gym's frozen lake environment implemented in python

Robot Servers and Server Manager software for robo-gym

Deep Q Learning with OpenAI Gym and Pokemon Showdown

Manipulation OpenAI Gym environments to simulate robots at the STARS lab

An OpenAI Gym environment for Super Mario Bros

Uses Open AI Gym environment to create autonomous cryptocurrency bot to trade cryptocurrencies.

AI virtual gym is an AI program which can be used to exercise and can be used to see if we are doing the exercises


DM-ACME compatible implementation of the Arm26 environment from Mujoco

Simple renderer for use with MuJoCo (>=2.1.2) Python Bindings.

A3C LSTM Atari with Pytorch plus A3G design

PyTorch implementation of Advantage async actor-critic Algorithms (A3C) in PyTorch

Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Trading Gym is an open source project for the development of reinforcement learning algorithms in the context of trading.

Plug-n-Play Reinforcement Learning in Python with OpenAI Gym and JAX

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

CL-Gym: Full-Featured PyTorch Library for Continual Learning