Implementation of the Deep Deterministic Policy Gradient Algorithm in TensorFlow

Overview

ddpg-aigym

Deep Deterministic Policy Gradient

Implementation of the Deep Deterministic Policy Gradient algorithm (Lillicrap et al., arXiv:1509.02971) in TensorFlow.

How to use

git clone https://github.com/stevenpjg/ddpg-aigym.git
cd ddpg-aigym
python main.py

During training

Once trained

Learning Curve

The learning curve for the InvertedPendulum-v1 environment.

Dependencies

TensorFlow and OpenAI Gym (MuJoCo is required for MuJoCo-based environments such as InvertedPendulum-v1).

Features

  • Batch normalization (improves learning speed)
  • Grad-inverter (described in arXiv:1511.04143; a sketch of the idea follows below)
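
The gradient inverter (Hausknecht and Stone, arXiv:1511.04143) keeps the actor's outputs inside the action bounds by scaling the policy gradient according to how much headroom each action component has left. A minimal NumPy sketch of the idea (illustrative, not the repository's exact code):

import numpy as np

def invert_gradients(grads, actions, action_low, action_high):
    # Scale each gradient component by the remaining headroom toward the bound
    # it would push the action against, so updated actions stay within [low, high].
    width = action_high - action_low
    push_up = grads >= 0  # gradient ascent would increase this action component
    scale_up = (action_high - actions) / width    # room left before the upper bound
    scale_down = (actions - action_low) / width   # room left before the lower bound
    return np.where(push_up, grads * scale_up, grads * scale_down)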

Note

To use a different environment

experiment = 'InvertedPendulum-v1'  # specify the environment here
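
This string is the Gym environment id. For reference, a minimal sketch of how it would typically be consumed (illustrative, not necessarily the repository's exact code):

import gym

env = gym.make(experiment)                    # create the chosen environment
num_states = env.observation_space.shape[0]   # observation dimension
num_actions = env.action_space.shape[0]       # action dimension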

To use batch normalization

is_batch_norm = True  # batch normalization switch
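
Setting this flag presumably selects the network definitions built with batch normalization (for example critic_net_bn.py) instead of the plain ones. As a rough illustration of the general batch-norm pattern in TensorFlow 1.x-style code, not the repository's actual layers:

import tensorflow as tf

def dense_bn_relu(x, units, is_training):
    # Fully connected layer, then batch normalization, then ReLU.
    # Batch statistics are used while training; moving averages at evaluation.
    h = tf.layers.dense(x, units, use_bias=False)
    h = tf.layers.batch_normalization(h, training=is_training)
    return tf.nn.relu(h)

# The moving averages are refreshed by the ops collected in UPDATE_OPS, e.g.:
# with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
#     train_op = optimizer.minimize(loss)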

Let me know if there are any issues, or if you need clarification regarding hyperparameter tuning.

Comments
  • A question about the running speed of your code

    Hello! I have run your code and there is a problem with it. The update step where tf.assign is used seems to become slower as the code keeps running, and it becomes the bottleneck of the running speed. I am wondering if you have come across the same problem? If so, I am looking forward to the solution. Thanks a lot!

    opened by zhuyifengzju 7
  • Error with GLEW initialization

    This is the output that I got: "Creating window glfw ERROR: GLEW initalization error: Missing GL version". My setup: Python 3.5, Ubuntu 16.04, gym from the official OpenAI GitHub.

    opened by williamissirius 2
  • A question on action_gradients in critic_net_bn.py

    Hi,

    I just read through your DDPG implementation, and it looks awesome. Thanks for sharing!

    Currently, I am confused by this line in critic_net_bn.py: self.action_gradients = [self.act_grad_v[0]/tf.to_float(tf.shape(self.act_grad_v[0])[0])]

    Why do we add [0] after self.act_grad_v, given that we use a batch of actions to compute the gradients? What is "[0]" used for?

    Thank you so much!

    opened by pxlong 1
  • Running the code on the "Reacher" task

    Hi, Steven! Recently, I downloaded your code and tested it on the "Reacher" task. However, I found that with GPU-based TensorFlow it could only run about 200 episodes per day, which seems a bit slow. Is there anything I need to adjust to speed up the process? (I found that GPU usage is low, around 3%~10%, so maybe the GPU is not being used sufficiently.) Also, you said that we could use one more wrapper to scale the reward; can you explain that more specifically? Thanks a lot!

    opened by cardwing 1
  • How to visualize the result with "episode_reward"

    Hi Steven, my system does not have MuJoCo, so I combined your code with nrod80's code (https://github.com/nrod80/ddpg-for-openai) to build a new version. But I could not visualize the result. Could you tell me where the visualization API is?

    opened by 937552416 1
  • Question on Loss function of Critic Network training

    Hello,

    I just read through your code on the DDPG implementation, and it looks awesome :) I have a question for you: what does the curve of the Q loss function look like over training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that Inverted Pendulum did learn something, but the Q loss diverged, and I wonder if you have the same issue with your implementation.

    Thank you so much!

    opened by RuofanKong 1
  • It is very slow for Pendulum-v0 of the classic control environments

    I ran this code on the Pendulum-v0 environment and it is very slow there, but considerably faster on InvertedPendulum-v1. Do you have any idea why that is?

    opened by sarvghotra 1
  • Error when run for other environments like Reacher-v1

    I tried to run this code for Reacher-v1 and Swimmer-v1, but it threw an error due to this line: ValueError: total size of new array must be unchanged

    Could you please also explain why this step is even needed for InvertedPendulum?

    opened by sarvghotra 1
  • Need help understanding how grad-inv accelerates the learning process

    I hope I am not troubling you too much by asking questions.

    Could you please help me understand the idea behind the recent changes made to accelerate learning? By the way, is it converging on Reacher-v1? Could you please also mention the time taken to learn and your system configuration? Also, take a look at this paper on reward scaling; it could be a reason for divergence, in case it is not converging.

    opened by sarvghotra 1
  • Error spotted

    This line of code looks wrong (https://github.com/stevenpjg/ddpg-aigym/blob/master/critic_net.py#L84). It should be the critic model predicting, not the actor model.

    opened by sarvghotra 1
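
A note on the running-speed question in the first comment above: in graph-mode TensorFlow, creating new tf.assign ops inside the training loop makes the graph grow every iteration, which progressively slows execution. A minimal sketch of the usual remedy, building the target-network update ops once and reusing them (illustrative, not the repository's code; variable names are made up):

import tensorflow as tf

def make_soft_update_ops(net_vars, target_vars, tau=0.001):
    # Build the soft-update ops once, outside the training loop, so the graph
    # stops growing; sess.run(update_ops) then reuses the same ops every step.
    return [t.assign(tau * v + (1.0 - tau) * t)
            for v, t in zip(net_vars, target_vars)]

# update_ops = make_soft_update_ops(actor_vars, target_actor_vars)
# ... inside the training loop:
# sess.run(update_ops)   # no new ops are created here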
Owner: Steven Spielberg P