TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning

Overview

Human-Level Control through Deep Reinforcement Learning

TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning.

[Image: model]

This implementation contains:

  1. Deep Q-network and Q-learning
  2. Experience replay memory
    • to reduce the correlations between consecutive updates
  3. A target network whose Q-learning targets are held fixed between updates
    • to reduce the correlations between target and predicted Q-values (see the loss sketch below)
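
To make these pieces concrete, here is a minimal sketch of the loss they implement. The variable names are hypothetical and a TF 1.x graph-style API is assumed; this is an illustration, not the repository's exact code.

import tensorflow as tf

num_actions = 4   # e.g. Breakout
discount = 0.99

# Outputs of the online network Q(s_t, .) and of the target network Q_target(s_{t+1}, .);
# the target network's weights are copied from the online network only every N steps.
q = tf.placeholder(tf.float32, [None, num_actions])
target_q = tf.placeholder(tf.float32, [None, num_actions])

action = tf.placeholder(tf.int64, [None])
reward = tf.placeholder(tf.float32, [None])
terminal = tf.placeholder(tf.float32, [None])   # 1.0 if the episode ended at step t

# Q-learning target: y_t = r_t + (1 - terminal_t) * gamma * max_a' Q_target(s_{t+1}, a')
target_q_t = reward + (1.0 - terminal) * discount * tf.reduce_max(target_q, axis=1)

# Q(s_t, a_t) for the actions actually taken in transitions sampled from the replay memory
q_acted = tf.reduce_sum(q * tf.one_hot(action, num_actions), axis=1)

# Mean squared TD error, minimized with RMSProp in the paper
loss = tf.reduce_mean(tf.square(target_q_t - q_acted))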

Requirements

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True

Results

Result of training for 24 hours on a GTX 980 Ti.

[Image: best]

Simple Results

Details of Breakout with model m2 (red), trained for 30 hours on a GTX 980 Ti.

[Image: tensorboard]

Details of Breakout with model m3 (red), trained for 30 hours on a GTX 980 Ti.

[Image: tensorboard]

Detailed Results

[1] Action-repeat (frame-skip) of 1, 2, and 4 without learning rate decay

[Image: A1_A2_A4_0.00025lr]

[2] Action-repeat (frame-skip) of 1, 2, and 4 with learning rate decay

[Image: A1_A2_A4_0.0025lr]

[1] & [2]

[Image: A1_A2_A4_0.00025lr_0.0025lr]

[3] Action-repeat of 4 for DQN (dark blue), Dueling DQN (dark green), DDQN (brown), and Dueling DDQN (turquoise)

The current hyperparameters and gradient clipping are not implemented exactly as in the paper.

[Image: A4_duel_double]

[4] Distributed action-repeat (frame-skip) of 1 without learning rate decay

[Image: A1_0.00025lr_distributed]

[5] Distributed action-repeat (frame-skip) of 4 without learning rate decay

[Image: A4_0.00025lr_distributed]

References

License

MIT License.

Comments
  • Segmentation fault (core dumped) | MemoryError

    "Segmentation fault (core dumped)" while trying to run it.

    I have no GPU configured with tensorflow. I suspect thats the reason. Is there any way to make it work just with the CPU?

    Tried a couple of flags, but they didn't work. python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True

    bug 
    opened by LecJackS 13
  • A bug in the implementation

    Hello, I spotted what I believe might be a bug in the DQN implementation on line 291 here:

    https://github.com/devsisters/DQN-tensorflow/blob/master/dqn/agent.py#L291

    The code tries to clip the self.delta with tf.clip_by_value, I assume with the intention of being robust when the discrepancy in Q is above a threshold:

    self.delta = self.target_q_t - q_acted
    self.clipped_delta = tf.clip_by_value(self.delta, self.min_delta, self.max_delta, name='clipped_delta')
    self.global_step = tf.Variable(0, trainable=False)
    self.loss = tf.reduce_mean(tf.square(self.clipped_delta), name='loss')
    

    However, the clip_by_value function's local gradient outside of the min_delta, max_delta range is zero. Therefore, with the current code whenever the discrepancy is above min/max delta, the gradient becomes exactly zero in backprop. This might not be what you intend, and is certainly not standard, I believe.

    I think you probably want to clip the gradient here, not the raw Q. In that case you would have to use the Huber loss:

    def clipped_error(x): 
        return tf.select(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5) # condition, true, false
    

    and use this on self.delta instead of tf.square. This would have the desired effect of increased robustness to outliers.
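
    For reference, here is a minimal sketch of how that change might look, reusing the variable names from the snippets above (note that tf.select from older TensorFlow was renamed tf.where in 1.x; this is an illustration, not the repository's code):

    def clipped_error(x):
        # Huber loss: quadratic for |x| < 1, linear otherwise,
        # so the gradient stays bounded instead of being zeroed out
        return tf.where(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)

    self.delta = self.target_q_t - q_acted
    self.loss = tf.reduce_mean(clipped_error(self.delta), name='loss')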

    opened by karpathy 6
  • Error after 50,000 iterations with Alien-v0 environment running on my MacOS Sierra

    This is the log of the events. What should I do? Thanks!

    iMac:DQN-tensorflow shyamalsuhanachandra$ python main.py --env_name=Alien-v0 --is_train=True --display=True
     [*] GPU : 1.0000
    [2016-09-27 17:28:27,334] Making new env: Alien-v0
    {'_save_step': 500000,
     '_test_step': 50000,
     'action_repeat': 4,
     'backend': 'tf',
     'batch_size': 32,
     'cnn_format': 'NCHW',
     'discount': 0.99,
     'display': True,
     'double_q': False,
     'dueling': False,
     'env_name': 'Alien-v0',
     'env_type': 'detail',
     'ep_end': 0.1,
     'ep_end_t': 1000000,
     'ep_start': 1.0,
     'history_length': 4,
     'learn_start': 50000.0,
     'learning_rate': 0.00025,
     'learning_rate_decay': 0.96,
     'learning_rate_decay_step': 50000,
     'learning_rate_minimum': 0.00025,
     'max_delta': 1,
     'max_reward': 1.0,
     'max_step': 50000000,
     'memory_size': 1000000,
     'min_delta': -1,
     'min_reward': -1.0,
     'model': 'm1',
     'random_start': 30,
     'scale': 10000,
     'screen_height': 84,
     'screen_width': 84,
     'target_q_update_step': 10000,
     'train_frequency': 4}
     [*] Loading checkpoints...
     [!] Load FAILED: checkpoints/Alien-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-1000000/action_repeat-4/ep_end_t-1000000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Alien-v0/ep_end-0.1/model-m1/screen_height-84/
    2016-09-27 17:28:28.996 Python[26135:3913383] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/m1/b_t9_2151y30ryvtr_2gznch0000gp/T/org.python.python.savedState
      0%|                    | 49999/50000000 [14:17<235:26:28, 58.93it/s]E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
         [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]
    
    Traceback (most recent call last):
      File "main.py", line 66, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "main.py", line 61, in main
        agent.train()
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 56, in train
        self.observe(screen, reward, action, terminal)
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 135, in observe
        self.q_learning_mini_batch()
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 157, in q_learning_mini_batch
        q_t_plus_1 = self.target_q.eval({self.target_s_t: s_t_plus_1})
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 559, in eval
        return _eval_using_default_session(self, feed_dict, self.graph, session)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3656, in _eval_using_default_session
        return session.run(tensors, feed_dict)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 710, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 908, in _run
        feed_dict_string, options, run_metadata)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
        target_list, options, run_metadata)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors.InvalidArgumentError: CPU BiasOp only supports NHWC.
         [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]
    Caused by op u'target/target_l1/BiasAdd', defined at:
      File "main.py", line 66, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "main.py", line 58, in main
        agent = Agent(config, env, sess)
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 29, in __init__
        self.build_dqn()
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 240, in build_dqn
        32, [8, 8], [4, 4], initializer, activation_fn, self.cnn_format, name='target_l1')
      File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/ops.py", line 25, in conv2d
        out = tf.nn.bias_add(conv, b, data_format)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 391, in bias_add
        return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 279, in _bias_add
        data_format=data_format, name=name)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
        self._traceback = _extract_stack()
    
    opened by shyamalschandra 6
  • AttributeError: 'TimeLimit' object has no attribute 'ale'

    Hi, I downloaded the code and tested it as described here. However, I got the error below. I think all requirements are installed except opencv2, and the OpenAI gym installation was tested. I would appreciate it if someone could find the cause and a solution.

    Traceback (most recent call last):
      File "/DQN-tensorflow-master/main.py", line 69, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
        sys.exit(main(sys.argv[:1] + flags_passthrough))
      File "/DQN-tensorflow-master/main.py", line 64, in main
        agent.train()
      File "/DQN-tensorflow-master/dqn/agent.py", line 40, in train
        screen, reward, action, terminal = self.env.new_random_game()
      File "/DQN-tensorflow-master/dqn/environment.py", line 28, in new_random_game
        self.new_game(True)
      File "/DQN-tensorflow-master/dqn/environment.py", line 21, in new_game
        if self.lives == 0:
      File "/DQN-tensorflow-master/dqn/environment.py", line 52, in lives
        return self.env.ale.lives()
    AttributeError: 'TimeLimit' object has no attribute 'ale'

    opened by minimok7 4
  • unsupported operand type(s) for +: 'dict_values' and 'list'

    When I run the program with Python 3.6, this error shows up:

    Traceback (most recent call last):
      File "main.py", line 70, in <module>
        tf.app.run()
      File "/home/tanggy/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
        _sys.exit(main(argv))
      File "main.py", line 62, in main
        agent = Agent(config, env, sess)
      File "/home/tanggy/Downloads/DQN-tensorflow-master/dqn/agent.py", line 30, in __init__
        self.build_dqn()
      File "/home/tanggy/Downloads/DQN-tensorflow-master/dqn/agent.py", line 328, in build_dqn
        self._saver = tf.train.Saver(self.w.values() + [self.step_op], max_to_keep=30)
    TypeError: unsupported operand type(s) for +: 'dict_values' and 'list'
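
    In Python 3, dict.values() returns a view object rather than a list, which is why the concatenation fails. A likely fix (my assumption, not an official patch) is to convert it explicitly:

    # dqn/agent.py, build_dqn(): wrap dict_values in list() before concatenating
    self._saver = tf.train.Saver(list(self.w.values()) + [self.step_op], max_to_keep=30)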

    opened by tygrer 3
  • Agent.learning_rate is never set?

    I was reading through the code and couldn't figure out where Agent.learning_rate is being set. It's used here:

    https://github.com/devsisters/DQN-tensorflow/blob/master/dqn/agent.py#L299

    But it's only set on the Config object, not the Agent.
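
    For reference, a common way to wire this up (a sketch using the config names shown in the logs above; not necessarily how the repository actually resolves it) is to build the learning-rate tensor from the config and feed the current step:

    # decayed learning rate built from config values, clipped at a minimum
    self.learning_rate_step = tf.placeholder(tf.int64, None, name='learning_rate_step')
    self.learning_rate_op = tf.maximum(
        config.learning_rate_minimum,
        tf.train.exponential_decay(
            config.learning_rate,
            self.learning_rate_step,
            config.learning_rate_decay_step,
            config.learning_rate_decay,
            staircase=True))
    self.optim = tf.train.RMSPropOptimizer(self.learning_rate_op, momentum=0.95, epsilon=0.01)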

    opened by halflings 3
  • UnboundLocalError: local variable 'avg_ep_reward' referenced before assignment

    When I run training with python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True, I get this output after a couple of training episodes:

    python main.py --env_name=Breakout-v0 --is_train=True --display=True --cpu=True
     [*] GPU : 0.5000
    [2016-05-20 17:00:38,585] Making new env: Breakout-v0
    {'_save_step': 50000,
     '_test_step': 10000,
     'action_repeat': 4,
     'backend': 'tf',
     'batch_size': 32,
     'cnn_format': 'NHWC',
     'discount': 0.99,
     'display': True,
     'env_name': 'Breakout-v0',
     'env_type': 'simple',
     'ep_end': 0.1,
     'ep_end_t': 1000000,
     'ep_start': 1.0,
     'history_length': 4,
     'learn_start': 50000.0,
     'learning_rate': 0.00025,
     'max_delta': 1,
     'max_reward': 1.0,
     'max_step': 50000000,
     'memory_size': 1000000,
     'min_delta': -1,
     'min_reward': -1.0,
     'model': 'm2',
     'random_start': 30,
     'scale': 10000,
     'screen_height': 84,
     'screen_width': 84,
     'target_q_update_step': 10000,
     'train_frequency': 4}
     [*] Loading checkpoints...
     [!] Load FAILED: checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/memory_size-1000000/action_repeat-4/ep_end_t-1000000/backend-tf/random_start-30/scale-10000/env_type-simple/min_reward--1.0/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NHWC/learning_rate-0.00025/batch_size-32/discount-0.99/max_reward-1.0/max_step-50000000/env_name-Breakout-v0/ep_end-0.1/model-m2/screen_height-84/
    2016-05-20 17:00:40.195 Python[25567:405995] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/t0/tw1pt8nn5xv2ykn_4tmnxg5m0000gn/T/org.python.python.savedState
      0%| | 49978/50000000 [02:47<39:09:30, 354.33it/s]
    Traceback (most recent call last):
      File "main.py", line 63, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
        sys.exit(main(sys.argv))
      File "main.py", line 58, in main
        agent.train()
      File "/Users/x0r/Documents/codes/DQN-tensorflow/dqn/agent.py", line 110, in train
        if max_avg_ep_reward >= avg_ep_reward * 0.9:
    UnboundLocalError: local variable 'avg_ep_reward' referenced before assignment

    opened by YigitDemirag 3
  • History is not updated with new game screen created after a terminal state is reached

    Hi. I am trying to understand the code, and I came across what I think is a bug in https://github.com/devsisters/DQN-tensorflow/blob/c7b1f1051dfa152530322445fc8febb9a2ea078b/dqn/agent.py#L32 It is related to the way the agent interacts with the environment: at the beginning of training the environment is reset via self.env.new_random_game(), and afterwards the history is filled with the new random state via self.history.add(screen). This is needed because the agent always chooses its actions taking that history as input, via action = self.predict(self.history.get()).

    When a terminal state is reached, a new random game is created, but this time the new random state is not added to the history. As a result, the agent uses the terminal state of the last episode to decide which action to take in the first state of the new episode, which I think is wrong.

    A way to fix it would be to add

    # re-fill the frame history with the first screen of the new episode
    for _ in range(self.history_length):
        self.history.add(screen)
    

    after this line.

    I don't know if fixing this would have any positive impact on performance, since it only affects the first self.history_length steps of each episode, but I wanted to share it anyway.

    Thanks in advance.

    opened by hipoglucido 2
  • Rendering error

    When running DQN with the --display option, I get the following error:

    Traceback (most recent call last):
      File "main.py", line 66, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
        sys.exit(main(sys.argv[:1] + flags_passthrough))
      File "main.py", line 61, in main
        agent.train()
      File "/home/savvai/Documents/DQN-tensorflow/dqn/agent.py", line 40, in train
        screen, reward, action, terminal = self.env.new_random_game()
      File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 28, in new_random_game
        self.new_game(True)
      File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 24, in new_game
        self.render()
      File "/home/savvai/Documents/DQN-tensorflow/dqn/environment.py", line 60, in render
        self.env.render()
      File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 174, in render
        return self._render(mode=mode, close=close)
      File "/usr/local/lib/python2.7/dist-packages/gym/envs/atari/atari_env.py", line 119, in _render
        from gym.envs.classic_control import rendering
      File "/usr/local/lib/python2.7/dist-packages/gym/envs/classic_control/rendering.py", line 23, in <module>
        from pyglet.gl import *
      File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 236, in <module>
        import pyglet.window
      File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 1817, in <module>
        gl._create_shadow_window()
      File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 205, in _create_shadow_window
        _shadow_window = Window(width=1, height=1, visible=False)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 163, in __init__
        super(XlibWindow, self).__init__(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 505, in __init__
        config = screen.get_best_config(template_config)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/base.py", line 161, in get_best_config
        configs = self.get_matching_configs(template)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/xlib.py", line 179, in get_matching_configs
        configs = template.match(canvas)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/xlib.py", line 29, in match
        have_13 = info.have_version(1, 3)
      File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/glx_info.py", line 89, in have_version
        client = [int(i) for i in client_version.split('.')]
    ValueError: invalid literal for int() with base 10: 'None'

    opened by SavvaI 2
  • How to generalize to other Atari Games?

    Hi, I want to run your code to train an agent on other games. Is there any code or hyperparameter that needs to be changed in order to train a good agent?

    opened by chiahsuan156 2
  • Fixed bug in import utils, added Python3 compatibility

    Changed import utils to import .utils in all files, as utils.py is inside the dqn folder. Also changed the pickle import and added print compatibility for Python 3.

    opened by rsnk96 1
  • Why is Breakout-v0's action_size 4, but 6 in the checkpoint?

    When I run it, I get this error. Is it because of the gym version?

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

    Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
         [[node save/Assign_9 (defined at \Users\tang\Desktop\DQN-tensorflow-master\dqn\agent.py:328) ]]

    opened by JUZI1 1
  • AssertionError: Cannot call env.step() before calling reset()

    I don't know why this error shows up. Please help me out here.

     [*] GPU : 1.0000
    2019-05-23 23:34:52.437286: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    {'_save_step': 500000,
     '_test_step': 50000,
     'action_repeat': 4,
     'backend': 'tf',
     'batch_size': 32,
     'cnn_format': 'NHWC',
     'discount': 0.99,
     'display': True,
     'double_q': False,
     'dueling': False,
     'env_name': 'Breakout-v0',
     'env_type': 'detail',
     'ep_end': 0.1,
     'ep_end_t': 100000,
     'ep_start': 1.0,
     'history_length': 4,
     'learn_start': 50000.0,
     'learning_rate': 0.00025,
     'learning_rate_decay': 0.96,
     'learning_rate_decay_step': 50000,
     'learning_rate_minimum': 0.00025,
     'max_delta': 1,
     'max_reward': 1.0,
     'max_step': 50000000,
     'memory_size': 100000,
     'min_delta': -1,
     'min_reward': -1.0,
     'model': 'm1',
     'random_start': 30,
     'scale': 10000,
     'screen_height': 84,
     'screen_width': 84,
     'target_q_update_step': 10000,
     'train_frequency': 4}
    WARNING:tensorflow:From /home/tejask98/Desktop/DQN-tensorflow/dqn/agent.py:225: calling argmax (from tensorflow.python.ops.math_ops) with dimension is deprecated and will be removed in a future version.
    Instructions for updating:
    Use the axis argument instead
    WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py:189: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
    Instructions for updating:
    Use tf.global_variables_initializer instead.
     [*] Loading checkpoints...
     [!] Load FAILED: checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-100000/action_repeat-4/ep_end_t-100000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NHWC/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Breakout-v0/ep_end-0.1/model-m1/screen_height-84/
    Traceback (most recent call last):
      File "main.py", line 70, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "main.py", line 65, in main
        agent.train()
      File "/home/tejask98/Desktop/DQN-tensorflow/dqn/agent.py", line 43, in train
        screen, reward, action, terminal = self.env.new_random_game()
      File "/home/tejask98/Desktop/DQN-tensorflow/dqn/environment.py", line 28, in new_random_game
        self.new_game(True)
      File "/home/tejask98/Desktop/DQN-tensorflow/dqn/environment.py", line 23, in new_game
        self._step(0)
      File "/home/tejask98/Desktop/DQN-tensorflow/dqn/environment.py", line 35, in _step
        self._screen, self.reward, self.terminal, _ = self.env.step(action)
      File "/usr/local/lib/python2.7/dist-packages/gym/wrappers/time_limit.py", line 30, in step
        assert self._episode_started_at is not None, "Cannot call env.step() before calling reset()"
    AssertionError: Cannot call env.step() before calling reset()

    opened by frenzytejask98 1
  • Anyone who can share model details?

    class M1(DQNConfig):
        backend = 'tf'
        env_type = 'detail'
        action_repeat = 1

    class M2(DQNConfig):
        backend = 'tf'
        env_type = 'detail'
        action_repeat = 4

    I use python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m2 and python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m1

    The "avg_ep_r" in both models reaches 2.1 - 2.3 at around 5 million iterations. But when it comes to even 15 million iterations, the "avg_ep_r" still fluctuates between 2.1 and 2.3.

    Just like the result they have shown (I guess that is the result of action-repeat (frame-skip) of 1 without learning rate decay). I didn't change any parameters.

    [Image]

    The strange thing is that even when I use model m2 (action-repeat/frame-skip of 4), my result is similar to model m1. The "avg_ep_r" fluctuates between 2.1 and 2.3 from around 5 million to 15 million iterations, and the max_ep_r fluctuates between 10 and 18 over the same range.

    class M2(DQNConfig):
        backend = 'tf'
        env_type = 'detail'
        action_repeat = 4

    Do I need to change some parameters to reach the best result they have shown?

    Thank you very much.

    opened by Richardxxxxxxx 3
  • terminal in agent.py does not seem to be handled properly

    for self.step in tqdm(range(start_step, self.max_step), ncols=70, initial=start_step):
        if self.step == self.learn_start:
            num_game, self.update_count, ep_reward = 0, 0, 0.
            total_reward, self.total_loss, self.total_q = 0., 0., 0.
            ep_rewards, actions = [], []

        # 1. predict
        action = self.predict(self.history.get())
        # 2. act
        screen, reward, terminal = self.env.act(action, is_training=True)
        # 3. observe
        self.observe(screen, reward, action, terminal)

        if terminal:
            screen, reward, action, terminal = self.env.new_random_game()
            num_game += 1
            ep_rewards.append(ep_reward)
            ep_reward = 0.

    The train function in agent.py may not handle termination properly. When the game terminates, the new screen is not added to the history and memory, so self.history is not updated. In the next iteration, action = self.predict(self.history.get()) will therefore still be computed from the terminated episode's frames.
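
    A sketch of the kind of change being suggested (hypothetical; it mirrors the fix proposed in the history issue above):

    if terminal:
        screen, reward, action, terminal = self.env.new_random_game()
        # refresh the frame history so the next predict() sees the new episode's screen
        for _ in range(self.history_length):
            self.history.add(screen)
        num_game += 1
        ep_rewards.append(ep_reward)
        ep_reward = 0.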

    opened by martin6336 1