TensorFlow implementation of Deep Reinforcement Learning papers

Overview

Deep Reinforcement Learning in TensorFlow

TensorFlow implementation of Deep Reinforcement Learning papers. This implementation contains:

[1] Playing Atari with Deep Reinforcement Learning
[2] Human-Level Control through Deep Reinforcement Learning
[3] Deep Reinforcement Learning with Double Q-learning
[4] Dueling Network Architectures for Deep Reinforcement Learning
[5] Prioritized Experience Replay (in progress)
[6] Deep Exploration via Bootstrapped DQN (in progress)
[7] Asynchronous Methods for Deep Reinforcement Learning (in progress)
[8] Continuous Deep q-Learning with Model-based Acceleration (in progress)

Requirements

Usage

First, install prerequisites with:

$ pip install -U 'gym[all]' tqdm scipy

Don't forget to also install the latest TensorFlow. Note that you also need to install the dependencies of doom-py, which is required by gym[all].

Train the DQN model described in [1] without a GPU:

$ python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False

Train the DQN model described in [2]:

$ python main.py --network_header_type=nature --env_name=Breakout-v0

Train the Double DQN model described in [3]:

$ python main.py --double_q=True --env_name=Breakout-v0

Train the Dueling network with Double Q-learning described in [4]:

$ python main.py --double_q=True --network_output_type=dueling --env_name=Breakout-v0

Train the MLP model described in [4] on the corridor environment (useful for debugging):

$ python main.py --network_header_type=mlp --network_output_type=normal --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=normal --double_q=True --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=dueling --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=dueling --double_q=True --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025

Results

Results on Corridor-v5 from [4] for DQN (purple), DDQN (red), Dueling DQN (green), and Dueling DDQN (blue).

[Training-curve plot: Corridor-v5]

Results on `Breakout-v0` for DQN without frame-skip (white-blue), DQN with frame-skip (light purple), and Dueling DDQN (dark blue).

[Training-curve plot: Breakout-v0]

The hyperparameters and gradient clipping are not implemented exactly as in [4].

References

Author

Taehoon Kim / @carpedm20

Comments
  • About tqdm and its constantly decreasing iteration speed

    About tqdm and its constantly decreasing iteration speed

    Hello, I'm confused about the iteration speed. For the first few minutes after the program starts, the it/s value is quite high and training runs very fast, which is what I'd love to see. But as the run continues, the it/s value keeps decreasing; after about 30 minutes it drops from roughly 10000 to 900, and it keeps going down.

    Is this a problem with the GPU setup or with tqdm?

    The graphics cards I used are two Nvidia K40s.

    [Screenshots: tqdm output shortly after starting, and 30 minutes later]

    [Screenshot: nvidia-smi output]

    opened by fredchenjialin 10
  • Can't run on my PC

    Can't run on my PC

    Traceback (most recent call last):
      File "main.py", line 168, in 
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "main.py", line 163, in main
        agent.train(conf.t_train_max)
      File "/home/ddy/workspace/deep-rl-tensorflow/agents/agent.py", line 67, in train
        observation, reward, terminal = self.new_game()
      File "/home/ddy/workspace/deep-rl-tensorflow/environments/environment.py", line 87, in new_random_game
        self.lives = self.env.ale.lives()
    AttributeError: 'TimeLimit' object has no attribute 'ale'
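    A likely cause (my assumption, not something confirmed by this repo): newer gym versions wrap every env returned by gym.make() in a TimeLimit wrapper, so the ALE handle has to be reached through the unwrapped env. A minimal sketch of the change in environments/environment.py:

    # hypothetical patch: go through the wrapper to reach the ALE interface
    self.lives = self.env.unwrapped.ale.lives()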
    
    opened by dave-yxw 2
  • Small installation instructions enhancement

    Small installation instructions enhancement

    Plus minor nitpicks and likely bugs.

    Also, I am working on an implementation of Async Actor-Critic, which I read somewhere you have in progress. If you're also working on it now, please tell me so we can coordinate and avoid duplicate work.

    opened by rhaps0dy 2
  • Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

    Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

    Hi,

    I get the above error whenever I use the GPU for training. The command I used: python3 main.py --network_header_type=nature --env_name=Breakout-v0 --is_train=True --display=True --t_train_max=50. The crash comes whenever training reaches a certain point (about 10%). Full error report:

    10%|██▍ | 49973/500000 [02:52<25:54, 289.55it/s]
    2018-05-28 23:08:09.915921: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2018-05-28 23:08:09.915959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
    2018-05-28 23:08:09.915981: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
    Aborted (core dumped)

    Is there any way to fix this? Thanks!
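    A workaround often reported for this cuDNN error (an assumption on my part, not something this repo ships) is to let TensorFlow allocate GPU memory on demand instead of reserving it all at start-up:

    import tensorflow as tf

    # Create the session with on-demand GPU memory growth (TF 1.x API).
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)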

    opened by hai-h-nguyen 1
  • MemoryError when running the examples

    MemoryError when running the examples

    I'm getting a MemoryError. Ubuntu, 2 GB RAM + 4 GB swap:

    Traceback (most recent call last):
      File "main.py", line 168, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "main.py", line 160, in main
        agent = TrainAgent(sess, pred_network, env, stat, conf, target_network=target_network)
      File "/home/strky/deep-rl-tensorflow/agents/deep_q.py", line 13, in __init__
        super(DeepQ, self).__init__(sess, pred_network, env, stat, conf, target_network=target_network)
      File "/home/strky/deep-rl-tensorflow/agents/agent.py", line 53, in __init__
        conf.batch_size, conf.history_length, conf.memory_size, conf.observation_dims)
      File "/home/strky/deep-rl-tensorflow/agents/experience.py", line 15, in __init__
        self.observations = np.empty([self.memory_size] + observation_dims, dtype=np.uint8)
    MemoryError
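    For scale, the replay memory pre-allocates memory_size observations as uint8 arrays, so on a 2 GB machine the default buffer simply doesn't fit. A rough back-of-the-envelope check (assuming the common DQN defaults of a 1,000,000-frame buffer and 84x84 grayscale frames; the actual values live in the repo's config):

    memory_size = 10 ** 6                 # assumed default replay size
    obs_bytes = 84 * 84                   # one uint8 grayscale frame
    gib = memory_size * obs_bytes / 1024.0 ** 3
    print(gib)                            # ~6.6 GiB for the observations alone

    Reducing the memory_size setting is the obvious knob if RAM is limited.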

    opened by ethene 1
  • 'TimeLimit' object has no attribute 'ale' Error

    'TimeLimit' object has no attribute 'ale' Error

    Hi, this error occurs when I try to run the demo of Breakout-v0.

    I have installed gym[atari] and can make the 'Breakout-v0' environment; however, this env has no 'ale' attribute for getting the current number of lives.

    opened by NoobFang 1
  • fixed descriptions of learning_rate_minimum and learning_rate_decay

    fixed descriptions of learning_rate_minimum and learning_rate_decay

    Hi, I noticed a minor issue in the flag descriptions (probably a copy-paste) and suggested a change. Great work here; I am just starting to delve deeper into your code (hopefully to use it).

    opened by borgr 1
  • questions about the Dueling logic in network.py

    questions about the Dueling logic in network.py

    Hi, this is really good code for learning reinforcement learning. I have two questions about network.py.

    1. I think you want to assert len(value_hidden_sizes) != 0 and len(advantage_hidden_sizes) != 0.
    2. About the dueling part: in the code, the layer first passes through the value_hidden_sizes linear layers and is then fed into the advantage logic. But if I read the related paper correctly, the state-value and advantage streams are computed from the same source observation, then added together, minus the mean advantage value (see the sketch below).

    Looking forward to your further response.
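    For reference, a minimal sketch of the aggregation described in the dueling paper (hypothetical names; `features` and `num_actions` are stand-ins for the shared hidden layer and action count, not this repo's actual variables):

    import tensorflow as tf

    num_actions = 4
    features = tf.placeholder(tf.float32, [None, 512])        # shared representation
    value = tf.layers.dense(features, 1)                      # V(s)
    advantage = tf.layers.dense(features, num_actions)        # A(s, a)
    q = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))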

    opened by liurida 1
  • clipping the delta zeros gradients

    clipping the delta zeros gradients

    It seems to me that in all of the agents you are clipping the delta itself, which means the gradients are zero for large errors. This may be because the paper "Human-Level Control through Deep Reinforcement Learning" makes a mistake when describing the loss clipping. What their implementation actually does is: abs(delta) for abs(delta) > 1 and delta^2 for abs(delta) < 1. This can be implemented like this:

    # Huber-style loss: quadratic for |delta| <= 1, linear beyond, so large
    # errors still produce a non-zero gradient instead of being clipped away.
    delta_grad_clip = 1
    batch_delta = Y - DQN_acted
    batch_delta_abs = tf.abs(batch_delta)
    batch_delta_quadratic = tf.minimum(batch_delta_abs, delta_grad_clip)
    batch_delta_linear = batch_delta_abs - batch_delta_quadratic
    batch_loss = batch_delta_linear + batch_delta_quadratic**2
    loss = tf.reduce_mean(batch_loss)
    
    opened by cgel 1
  • SystemError: new style getargs format but argument is not a tuple

    SystemError: new style getargs format but argument is not a tuple

    @carpedm20 On my Mac, when I ran the example, this error occurred:

    SystemError: new style getargs format but argument is not a tuple
    

    I finally found a solution to the problem; see http://stackoverflow.com/questions/26964379/systemerror-new-style-getargs-format-but-argument-is-not-a-tuple-in-ros-camerac for a more detailed explanation.

    In environments/environment.py, line 111, the last argument of imresize should be cast to a tuple, like this: y_screen = imresize(y, tuple(self.observation_dims)).

    opened by tigerneil 1
  • How did you plot results?

    How did you plot results?

    Hi,

    It seems that you didn't include the plotting code, and I don't know how you plotted your training results. Do you have any suggestions?

    opened by SunCherry 0
  • TypeError: __init__() got an unexpected keyword argument 'timestep_limit'

    TypeError: __init__() got an unexpected keyword argument 'timestep_limit'

    I can't train the DQN! I have installed gym[all] and TensorFlow 1.9.0 with Python 3.6.8. Any idea?

    $ python main.py --network_header_type=nature --env_name=Breakout-v0
    Traceback (most recent call last):
      File "main.py", line 10, in <module>
        from environments.environment import ToyEnvironment, AtariEnvironment
      File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/environment.py", line 6, in <module>
        from .corridor import CorridorEnv
      File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/corridor.py", line 131, in <module>
        timestep_limit=100,
      File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in register
        return registry.register(id, **kwargs)
      File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 147, in register
        self.env_specs[id] = EnvSpec(id, **kwargs)
    TypeError: __init__() got an unexpected keyword argument 'timestep_limit'
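    A likely cause (my assumption, not verified against this repo): newer gym releases dropped the timestep_limit keyword of register() in favor of max_episode_steps, so the registration in environments/corridor.py would need something like:

    from gym.envs.registration import register

    # Sketch only; keep the id/entry_point/kwargs already used in corridor.py.
    register(
        id='CorridorSmall-v5',
        entry_point='environments.corridor:CorridorEnv',
        max_episode_steps=100,   # replaces the removed timestep_limit=100
    )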

    opened by qiyang77 2
  • Log directory name is too long and can't be created

    Log directory name is too long and can't be created

    Looking at agents/statistic.py, line 20:

    self.writer = tf.summary.FileWriter('./logs/%s' % self.model_dir, self.sess.graph)

    It uses self.model_dir to create the log directory, but this raises the error "tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a directory:". How can I fix it? My Python version is 3.6 and my TensorFlow version is 1.12.

    opened by past-is-past 2
  • Where is render() referenced in the environment?

    Where is render() referenced in the environment?

    Thanks for sharing this repo! I want to try changing how the display is rendered, and I noticed the render() function is not explicitly implemented in either AtariEnvironment or ToyEnvironment. Could you let me know where to find the rendering options? (e.g. where does your code reference gym's atari_env.py, or does it not?)

    Thanks in advance!

    opened by doerlbh 0
  • Unknown command line flag 'data_format'

    Unknown command line flag 'data_format'

    I got the following bug. Could you please help? Thank you!

    $ python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
    Traceback (most recent call last):
      File "main.py", line 173, in <module>
        tf.app.run()
      File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
        _sys.exit(main(argv))
      File "main.py", line 107, in main
        conf.data_format = 'NHWC'
      File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/flags.py", line 88, in __setattr__
        return self.__dict__['__wrapped'].__setattr__(name, value)
      File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 496, in __setattr__
        return self._set_unknown_flag(name, value)
      File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 374, in _set_unknown_flag
        raise _exceptions.UnrecognizedFlagError(name, value)
    absl.flags._exceptions.UnrecognizedFlagError: Unknown command line flag 'data_format'
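    One possible workaround (an assumption, not a confirmed fix): with the newer absl-backed tf.app.flags, assigning to an undeclared flag raises UnrecognizedFlagError, so declaring data_format alongside the other DEFINE_* calls in main.py should let the assignment in main() succeed:

    import tensorflow as tf

    # hypothetical addition near the existing flag definitions in main.py
    flags = tf.app.flags
    flags.DEFINE_string('data_format', 'NHWC', 'data format for conv layers: NHWC or NCHW')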

    opened by sonata2016 1