TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Overview

TF-Agents makes implementing, deploying, and testing new Bandits and RL algorithms easier. It provides well tested and modular components that can be modified and extended. It enables fast code iteration, with good test integration and benchmarking.

To get started, we recommend checking out one of our Colab tutorials. If you need an intro to RL (or a quick recap), start here. Otherwise, check out our DQN tutorial to get an agent up and running in the Cartpole environment. API documentation for the current stable release is on tensorflow.org.

TF-Agents is under active development and interfaces may change at any time. Feedback and comments are welcome.

Table of contents

Agents
Tutorials
Multi-Armed Bandits
Examples
Installation
Contributing
Releases
Principles
Citation
Disclaimer

Agents

In TF-Agents, the core elements of RL algorithms are implemented as Agents. An agent has two main responsibilities: defining a Policy for interacting with the Environment, and learning/training that Policy from collected experience.
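
A minimal sketch of what an Agent looks like in practice, following the DQN tutorial (the CartPole environment and the hyperparameters here are illustrative assumptions, not the only option):

import tensorflow as tf

from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))
q_net = q_network.QNetwork(env.observation_spec(), env.action_spec())

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # illustrative learning rate
    td_errors_loss_fn=common.element_wise_squared_loss)
agent.initialize()

# agent.policy is used for evaluation/deployment, agent.collect_policy for
# gathering experience, and agent.train(experience) updates the Q-network.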

A range of Bandits and RL algorithms is available under TF-Agents; see the subdirectories of tf_agents/agents/ for the complete, up-to-date list.

Tutorials

See docs/tutorials/ for tutorials on the major components provided.

Multi-Armed Bandits

The TF-Agents library contains a comprehensive Multi-Armed Bandits suite, including Bandits environments and agents. RL agents can also be used on Bandit environments. There is a tutorial in bandits_tutorial.ipynb and ready-to-run examples in tf_agents/bandits/agents/examples/v2.

Examples

End-to-end examples of training agents can be found under each agent's directory, e.g. the train_eval.py scripts under tf_agents/agents/sac/examples/v2/.
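
For example, the SAC script can be launched as follows (the root_dir value is just an illustrative output path):

$ python tf_agents/agents/sac/examples/v2/train_eval.py \
    --root_dir=$HOME/tmp/sac/gym/HalfCheetah-v2/ \
    --alsologtostderr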

Installation

TF-Agents publishes nightly and stable builds. For a list of releases read the Releases section. The commands below cover installing TF-Agents stable and nightly from pypi.org as well as from a GitHub clone.

Stable

Run the commands below to install the most recent stable release. API documentation for the release is on tensorflow.org.

$ pip install --user tf-agents[reverb]

# Use this tag to get the matching examples and colabs.
$ git clone https://github.com/tensorflow/agents.git
$ cd agents
$ git checkout v0.6.0
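
# Optional sanity check (a hedged suggestion): confirm the package imports and report its version.
$ python -c "import tf_agents"
$ pip show tf-agents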

If you want to install TF-Agents with versions of TensorFlow or Reverb that are flagged as incompatible by the pip dependency check, use the pattern below at your own risk.

$ pip install --user tensorflow
$ pip install --user dm-reverb
$ pip install --user tf-agents

If you want to use TF-Agents with TensorFlow 1.15 or 2.0, install version 0.3.0:

# Newer versions of tensorflow-probability require newer versions of TensorFlow.
$ pip install tensorflow-probability==0.8.0
$ pip install tf-agents==0.3.0

Nightly

Nightly builds include newer features, but may be less stable than the versioned releases. The nightly build is pushed as tf-agents-nightly. We suggest installing nightly versions of TensorFlow (tf-nightly) and TensorFlow Probability (tfp-nightly), as those are the versions TF-Agents nightly is tested against.

To install the nightly build version, run the following:

# `--force-reinstall` helps guarantee the right versions.
$ pip install --user --force-reinstall tf-nightly
$ pip install --user --force-reinstall tfp-nightly
$ pip install --user --force-reinstall dm-reverb-nightly

# Installing with the `--upgrade` flag ensures you'll get the latest version.
$ pip install --user --upgrade tf-agents-nightly

From GitHub

After cloning the repository, the dependencies can be installed by running pip install -e .[tests]. TensorFlow needs to be installed independently: pip install --user tf-nightly.

Contributing

We're eager to collaborate with you! See CONTRIBUTING.md for a guide on how to contribute. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

Releases

TF-Agents has stable and nightly releases. The nightly releases are often fine but can have issues due to upstream libraries being in flux. The table below lists the version(s) of TensorFlow tested with each TF-Agents release, to help users who may be locked into a specific version of TensorFlow. 0.3.0 was the last release compatible with Python 2.

Release   Branch / Tag   TensorFlow Version
Nightly   master         tf-nightly
0.7.1     v0.7.1         2.4.0
0.6.0     v0.6.0         2.3.0
0.5.0     v0.5.0         2.2.0
0.4.0     v0.4.0         2.1.0
0.3.0     v0.3.0         1.15.0 and 2.0.0

Principles

This project adheres to Google's AI principles. By participating in, using, or contributing to this project, you are expected to adhere to these principles.

Citation

If you use this code, please cite it as:

@misc{TFAgents,
  title = {{TF-Agents}: A library for Reinforcement Learning in TensorFlow},
  author = {Sergio Guadarrama and Anoop Korattikara and Oscar Ramirez and
     Pablo Castro and Ethan Holly and Sam Fishman and Ke Wang and
     Ekaterina Gonina and Neal Wu and Efi Kokiopoulou and Luciano Sbaiz and
     Jamie Smith and Gábor Bartók and Jesse Berent and Chris Harris and
     Vincent Vanhoucke and Eugene Brevdo},
  howpublished = {\url{https://github.com/tensorflow/agents}},
  url = "https://github.com/tensorflow/agents",
  year = 2018,
  note = "[Online; accessed 25-June-2019]"
}

Disclaimer

This is not an official Google product.

Comments
  • Error loading DqnAgent saved model.

    I am creating a tf-agent DqnAgent in the following code:

        tf_agent = dqn_agent.DqnAgent(
            train_env.time_step_spec(),
            train_env.action_spec(),
            q_network=q_net,
            optimizer=optimizer,
            td_errors_loss_fn=dqn_agent.element_wise_squared_loss,
            train_step_counter=train_step_counter
    )
    

    During the training loop I am saving this model with

        tf.saved_model.save(tf_agent, saved_models_path)
    

    Once trained, I want to load saved model with

        if tf.saved_model.contains_saved_model(saved_models_path):
            tf_agent = tf.saved_model.load(saved_models_path)
    

    This code will load the saved model only if the folder in saved_models_path contains one. The function contains_saved_model(saved_models_path) returns True, so the model is loaded, but there is an exception and the program crashes:

        Traceback (most recent call last):
            File "/home/claudino/Projetos/dino-tf-agents/dino_ia/model/agent.py", line 50, in <module>
                tf_agent = tf.saved_model.load(saved_models_path)
            File "/home/claudino/Projetos/dino-tf-agents/venv/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 408, in load
                return load_internal(export_dir, tags)
            File "/home/claudino/Projetos/dino-tf-agents/venv/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 432, in load_internal
                export_dir)
            File "/home/claudino/Projetos/dino-tf-agents/venv/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 58, in __init__
                self._load_all()
            File "/home/claudino/Projetos/dino-tf-agents/venv/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 168, in _load_all
                slot_variable = optimizer_object.add_slot(
            AttributeError: '_UserObject' object has no attribute 'add_slot'
    
            Process finished with exit code 1
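
    Not part of the original report: a commonly suggested alternative is to export only the policy with PolicySaver and load it back as a SavedModel. A minimal sketch, assuming the same tf_agent and saved_models_path as above:

        from tf_agents.policies import policy_saver

        # Export just the trained policy rather than the whole agent object.
        saver = policy_saver.PolicySaver(tf_agent.policy)
        saver.save(saved_models_path)

        # The loaded object exposes action(); pass it a batched TimeStep, e.g.:
        loaded_policy = tf.saved_model.load(saved_models_path)
        # action_step = loaded_policy.action(time_step)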
    
    opened by andreclaudino 23
  • TRAIN TF-AGENTS WITH MULTIPLE GPUs

    Hi, I finally got my VM up and running using: 2x Tesla P100, NVIDIA driver 440.33.01, CUDA 10.2, tensorflow==2.1.0, tf_agents==0.3.0.

    I started training a custom model/env based on the SAC agent v2 train loop, but only one GPU is used. My question: at the moment, is tf-agents able to manage distributed training on multiple GPUs, or should I use only one?
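
    A sketch of one option, not a confirmed answer for the 0.3.0 release used here: newer TF-Agents versions ship tf_agents.train.utils.strategy_utils, which returns a MirroredStrategy spanning all visible GPUs; networks and the agent must be built inside the strategy scope so their variables are mirrored across devices.

        from tf_agents.train.utils import strategy_utils

        # MirroredStrategy over all visible GPUs.
        strategy = strategy_utils.get_strategy(tpu=False, use_gpu=True)
        with strategy.scope():
            ...  # build the actor/critic networks and the SAC agent here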

    type:support level:p1 
    opened by JCMiles 22
  • network.create_variables() clogs all GPU memory

    On calling network.create_variables() for my agent (using a DDPG agent), my GPU memory gets used 100% instantly and never clears up. I can control it by using a virtual memory cap, but I need memory for other computation downstream (CNN etc.) and the memory cap ensures there is no memory left for anything else.

    Why might this be happening and how do I get around this?
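
    A sketch of a common mitigation (an assumption, not a confirmed fix for this report): by default TensorFlow reserves nearly all GPU memory at start-up, and enabling memory growth makes it allocate only what it actually needs, leaving room for downstream computation.

        import tensorflow as tf

        # Must run before any GPU op is executed.
        for gpu in tf.config.list_physical_devices('GPU'):
            tf.config.experimental.set_memory_growth(gpu, True)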

    opened by PrieureDeSion 20
  • tf-agents SAC 10x slower than stable-baselines on same hardware

    I am running a simple test of SAC using the LunarLanderContinuous-v2 environment. Training is for 500,000 steps with a replay buffer of size 50,000 (see code below). tf-agents takes over 10 hours to complete training, whereas the stable-baselines implementation of SAC with the same hyperparameters only takes 39 minutes. I've checked and double-checked my versions of CUDA, tensorflow-gpu, tf-agents, etc. and cannot speed things up.

    Here are the details to reproduce:

    Ubuntu 16.04, tf-agents==0.3.0, tensorflow-gpu==1.15.0, gym==0.15.4, CUDA==10.0, cudnn==7.6.5, stable-baselines==2.9.0a0, GPU==Quadro M4000 8Gb, CPU==i7 64 Gb

    My tf-agents test script is simply the v2 train_eval.py script from the sac/examples after substituting the LunarLanderContinuous-v2 environment for Half Cheetah and changing the hyperparameters as you can see below:

    # coding=utf-8
    # Copyright 2018 The TF-Agents Authors.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    r"""Train and Eval SAC.
    
    To run:
    
    #bash
    #tensorboard --logdir $HOME/tmp/sac/gym/HalfCheetah-v2/ --port 2223 &
    #
    #python tf_agents/agents/sac/examples/v2/train_eval.py \
    #  --root_dir=$HOME/tmp/sac/gym/HalfCheetah-v2/ \
    #  --alsologtostderr
    #```
    #"""
    
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    
    import os
    import time
    
    from absl import app
    from absl import flags
    from absl import logging
    
    import gin
    import tensorflow as tf
    
    from tf_agents.agents.ddpg import critic_network
    from tf_agents.agents.sac import sac_agent
    from tf_agents.drivers import dynamic_step_driver
    from tf_agents.environments import parallel_py_environment
    from tf_agents.environments import suite_mujoco
    from tf_agents.environments import tf_py_environment
    from tf_agents.eval import metric_utils
    from tf_agents.metrics import tf_metrics
    from tf_agents.networks import actor_distribution_network
    from tf_agents.networks import normal_projection_network
    from tf_agents.policies import greedy_policy
    from tf_agents.policies import random_tf_policy
    from tf_agents.replay_buffers import tf_uniform_replay_buffer
    from tf_agents.utils import common
    
    flags.DEFINE_string('root_dir', os.getenv('TEST_UNDECLARED_OUTPUTS_DIR'),
                        'Root directory for writing logs/summaries/checkpoints.')
    flags.DEFINE_multi_string('gin_file', None, 'Path to the trainer config files.')
    flags.DEFINE_multi_string('gin_param', None, 'Gin binding to pass through.')
    
    FLAGS = flags.FLAGS
    
    
    @gin.configurable
    def normal_projection_net(action_spec,
                              init_action_stddev=0.35,
                              init_means_output_factor=0.1):
      del init_action_stddev
      return normal_projection_network.NormalProjectionNetwork(
          action_spec,
          mean_transform=None,
          state_dependent_std=True,
          init_means_output_factor=init_means_output_factor,
          std_transform=sac_agent.std_clip_transform,
          scale_distribution=True)
    
    
    _DEFAULT_REWARD_SCALE = 0
    
    
    @gin.configurable
    def train_eval(
        root_dir,
        env_name='LunarLanderContinuous-v2',
        eval_env_name=None,
        env_load_fn=suite_mujoco.load,
        num_iterations=500000,
        actor_fc_layers=(64, 64),
        critic_obs_fc_layers=None,
        critic_action_fc_layers=None,
        critic_joint_fc_layers=(64, 64),
        num_parallel_environments=1,
        # Params for collect
        initial_collect_steps=100,
        collect_steps_per_iteration=1,
        replay_buffer_capacity=50000,
        # Params for target update
        target_update_tau=0.005,
        target_update_period=1,
        # Params for train
        train_steps_per_iteration=1,
        batch_size=64,
        actor_learning_rate=3e-4,
        critic_learning_rate=3e-4,
        alpha_learning_rate=3e-4,
        td_errors_loss_fn=tf.compat.v1.losses.mean_squared_error,
        gamma=0.99,
        reward_scale_factor=_DEFAULT_REWARD_SCALE,
        gradient_clipping=None,
        use_tf_functions=True,
        # Params for eval
        num_eval_episodes=100,
        eval_interval=1000,
        # Params for summaries and logging
        train_checkpoint_interval=10000,
        policy_checkpoint_interval=5000,
        rb_checkpoint_interval=50000,
        log_interval=1000,
        summary_interval=1000,
        summaries_flush_secs=10,
        debug_summaries=False,
        summarize_grads_and_vars=False,
        eval_metrics_callback=None):
      """A simple train and eval for SAC on Mujoco.
    
      All hyperparameters come from the original SAC paper
      (https://arxiv.org/pdf/1801.01290.pdf).
      """
    
      if reward_scale_factor == _DEFAULT_REWARD_SCALE:
        # Use value recommended by https://arxiv.org/abs/1801.01290
        if env_name.startswith('Humanoid'):
          reward_scale_factor = 20.0
        else:
          reward_scale_factor = 5.0
    
      root_dir = os.path.expanduser(root_dir)
    
      summary_writer = tf.compat.v2.summary.create_file_writer(
          root_dir, flush_millis=summaries_flush_secs * 1000)
      summary_writer.set_as_default()
    
      eval_metrics = [
          tf_metrics.AverageReturnMetric(buffer_size=num_eval_episodes),
          tf_metrics.AverageEpisodeLengthMetric(buffer_size=num_eval_episodes)
      ]
    
      global_step = tf.compat.v1.train.get_or_create_global_step()
      with tf.compat.v2.summary.record_if(
          lambda: tf.math.equal(global_step % summary_interval, 0)):
        # create training environment
        if num_parallel_environments == 1:
          py_env = env_load_fn(env_name)
        else:
          py_env = parallel_py_environment.ParallelPyEnvironment(
              [lambda: env_load_fn(env_name)] * num_parallel_environments)
        tf_env = tf_py_environment.TFPyEnvironment(py_env)
        # create evaluation environment
        eval_env_name = eval_env_name or env_name
        eval_py_env = env_load_fn(eval_env_name)
        eval_tf_env = tf_py_environment.TFPyEnvironment(eval_py_env)
    
        time_step_spec = tf_env.time_step_spec()
        observation_spec = time_step_spec.observation
        action_spec = tf_env.action_spec()
    
        actor_net = actor_distribution_network.ActorDistributionNetwork(
            observation_spec,
            action_spec,
            fc_layer_params=actor_fc_layers,
            continuous_projection_net=normal_projection_net)
        critic_net = critic_network.CriticNetwork(
            (observation_spec, action_spec),
            observation_fc_layer_params=critic_obs_fc_layers,
            action_fc_layer_params=critic_action_fc_layers,
            joint_fc_layer_params=critic_joint_fc_layers)
    
        tf_agent = sac_agent.SacAgent(
            time_step_spec,
            action_spec,
            actor_network=actor_net,
            critic_network=critic_net,
            actor_optimizer=tf.compat.v1.train.AdamOptimizer(
                learning_rate=actor_learning_rate),
            critic_optimizer=tf.compat.v1.train.AdamOptimizer(
                learning_rate=critic_learning_rate),
            alpha_optimizer=tf.compat.v1.train.AdamOptimizer(
                learning_rate=alpha_learning_rate),
            target_update_tau=target_update_tau,
            target_update_period=target_update_period,
            td_errors_loss_fn=td_errors_loss_fn,
            gamma=gamma,
            reward_scale_factor=reward_scale_factor,
            gradient_clipping=gradient_clipping,
            debug_summaries=debug_summaries,
            summarize_grads_and_vars=summarize_grads_and_vars,
            train_step_counter=global_step)
        tf_agent.initialize()
    
        # Make the replay buffer.
        replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
            data_spec=tf_agent.collect_data_spec,
            batch_size=num_parallel_environments,
            max_length=replay_buffer_capacity)
        replay_observer = [replay_buffer.add_batch]
    
        env_steps = tf_metrics.EnvironmentSteps(prefix='Train')
        average_return = tf_metrics.AverageReturnMetric(
            prefix='Train',
            buffer_size=num_eval_episodes,
            batch_size=tf_env.batch_size)
        train_metrics = [
            tf_metrics.NumberOfEpisodes(prefix='Train'),
            env_steps,
            average_return,
            tf_metrics.AverageEpisodeLengthMetric(
                prefix='Train',
                buffer_size=num_eval_episodes,
                batch_size=tf_env.batch_size),
        ]
    
        eval_policy = greedy_policy.GreedyPolicy(tf_agent.policy)
        initial_collect_policy = random_tf_policy.RandomTFPolicy(
            tf_env.time_step_spec(), tf_env.action_spec())
        collect_policy = tf_agent.collect_policy
    
        train_checkpointer = common.Checkpointer(
            ckpt_dir=os.path.join(root_dir, 'train'),
            agent=tf_agent,
            global_step=global_step,
            metrics=metric_utils.MetricsGroup(train_metrics, 'train_metrics'))
        policy_checkpointer = common.Checkpointer(
            ckpt_dir=os.path.join(root_dir, 'policy'),
            policy=eval_policy,
            global_step=global_step)
        rb_checkpointer = common.Checkpointer(
            ckpt_dir=os.path.join(root_dir, 'replay_buffer'),
            max_to_keep=1,
            replay_buffer=replay_buffer)
    
        train_checkpointer.initialize_or_restore()
        rb_checkpointer.initialize_or_restore()
    
        initial_collect_driver = dynamic_step_driver.DynamicStepDriver(
            tf_env,
            initial_collect_policy,
            observers=replay_observer + train_metrics,
            num_steps=initial_collect_steps)
    
        collect_driver = dynamic_step_driver.DynamicStepDriver(
            tf_env,
            collect_policy,
            observers=replay_observer + train_metrics,
            num_steps=collect_steps_per_iteration)
    
        if use_tf_functions:
          initial_collect_driver.run = common.function(initial_collect_driver.run)
          collect_driver.run = common.function(collect_driver.run)
          tf_agent.train = common.function(tf_agent.train)
    
        # Collect initial replay data.
        if env_steps.result() == 0 or replay_buffer.num_frames() == 0:
          logging.info(
              'Initializing replay buffer by collecting experience for %d steps'
              'with a random policy.', initial_collect_steps)
          initial_collect_driver.run()
    
        results = metric_utils.eager_compute(
            eval_metrics,
            eval_tf_env,
            eval_policy,
            num_episodes=num_eval_episodes,
            train_step=env_steps.result(),
            summary_writer=summary_writer,
            summary_prefix='Eval',
        )
        if eval_metrics_callback is not None:
          eval_metrics_callback(results, env_steps.result())
        metric_utils.log_metrics(eval_metrics)
    
        time_step = None
        policy_state = collect_policy.get_initial_state(tf_env.batch_size)
    
        time_acc = 0
        env_steps_before = env_steps.result().numpy()
    
        # Dataset generates trajectories with shape [Bx2x...]
        dataset = replay_buffer.as_dataset(
            num_parallel_calls=3, sample_batch_size=batch_size,
            num_steps=2).prefetch(3)
        iterator = iter(dataset)
    
        def train_step():
          experience, _ = next(iterator)
          return tf_agent.train(experience)
    
        if use_tf_functions:
          train_step = common.function(train_step)
    
        for _ in range(num_iterations):
          start_time = time.time()
          time_step, policy_state = collect_driver.run(
              time_step=time_step,
              policy_state=policy_state,
          )
          for _ in range(train_steps_per_iteration):
            train_step()
          time_acc += time.time() - start_time
    
          if global_step.numpy() % log_interval == 0:
            logging.info('env steps = %d, average return = %f', env_steps.result(),
                         average_return.result())
            env_steps_per_sec = (env_steps.result().numpy() -
                                 env_steps_before) / time_acc
            logging.info('%.3f env steps/sec', env_steps_per_sec)
            tf.compat.v2.summary.scalar(
                name='env_steps_per_sec',
                data=env_steps_per_sec,
                step=env_steps.result())
            time_acc = 0
            env_steps_before = env_steps.result().numpy()
    
          for train_metric in train_metrics:
            train_metric.tf_summaries(train_step=env_steps.result())
    
          if global_step.numpy() % eval_interval == 0:
            results = metric_utils.eager_compute(
                eval_metrics,
                eval_tf_env,
                eval_policy,
                num_episodes=num_eval_episodes,
                train_step=env_steps.result(),
                summary_writer=summary_writer,
                summary_prefix='Eval',
            )
            if eval_metrics_callback is not None:
              eval_metrics_callback(results, env_steps.result())
            metric_utils.log_metrics(eval_metrics)
    
          global_step_val = global_step.numpy()
          if global_step_val % train_checkpoint_interval == 0:
            train_checkpointer.save(global_step=global_step_val)
    
          if global_step_val % policy_checkpoint_interval == 0:
            policy_checkpointer.save(global_step=global_step_val)
    
          if global_step_val % rb_checkpoint_interval == 0:
            rb_checkpointer.save(global_step=global_step_val)
    
    
    def main(_):
      tf.compat.v1.enable_v2_behavior()
      logging.set_verbosity(logging.INFO)
      gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_param)
      train_eval(FLAGS.root_dir)
    
    
    if __name__ == '__main__':
      flags.mark_flag_as_required('root_dir')
      app.run(main)
    

    My stable-baselines script looks like this:

    import gym
    import numpy as np
    
    from stable_baselines.common.vec_env import DummyVecEnv
    from stable_baselines.common import make_vec_env
    from stable_baselines.sac.policies import MlpPolicy
    from stable_baselines import SAC
    
    env = make_vec_env('LunarLanderContinuous-v2', n_envs=1)
    
    model_name = "sac_lunar_lander"
    
    model = SAC(MlpPolicy, env, verbose=1, tensorboard_log="./tensorboard_logs/stable_baselines_test")
    
    model.learn(total_timesteps=500000, log_interval=10)
    model.save(model_name)
    
    

    Finally, here is the output when I run the tf-agents script to show that the GPU is being detected and used:

    2019-12-22 11:26:35.054589: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2019-12-22 11:26:35.068596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
    name: Quadro M4000 major: 5 minor: 2 memoryClockRate(GHz): 0.7725
    pciBusID: 0000:01:00.0
    2019-12-22 11:26:35.068767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2019-12-22 11:26:35.069770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2019-12-22 11:26:35.070479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2019-12-22 11:26:35.070640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2019-12-22 11:26:35.071572: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2019-12-22 11:26:35.072306: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2019-12-22 11:26:35.074604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2019-12-22 11:26:35.075808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
    2019-12-22 11:26:35.076022: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-12-22 11:26:35.080915: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407920000 Hz
    2019-12-22 11:26:35.081214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555945a77880 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2019-12-22 11:26:35.081228: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2019-12-22 11:26:35.144953: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555945a9b180 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2019-12-22 11:26:35.144974: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro M4000, Compute Capability 5.2
    2019-12-22 11:26:35.145550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
    name: Quadro M4000 major: 5 minor: 2 memoryClockRate(GHz): 0.7725
    pciBusID: 0000:01:00.0
    2019-12-22 11:26:35.145578: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2019-12-22 11:26:35.145588: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2019-12-22 11:26:35.145597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2019-12-22 11:26:35.145605: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2019-12-22 11:26:35.145629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2019-12-22 11:26:35.145650: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2019-12-22 11:26:35.145674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2019-12-22 11:26:35.146551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
    2019-12-22 11:26:35.146575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2019-12-22 11:26:35.147375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-12-22 11:26:35.147384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
    2019-12-22 11:26:35.147388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
    2019-12-22 11:26:35.148348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6876 MB memory) -> physical GPU (device: 0, name: Quadro M4000, pci bus id: 0000:01:00.0, compute capability: 5.2)
    /home/patrick/src/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
      warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
    WARNING:tensorflow:From /home/patrick/src/tf_agents/tf_agents/agents/ddpg/critic_network.py:141: The name tf.keras.initializers.RandomUniform is deprecated. Please use tf.compat.v1.keras.initializers.RandomUniform instead.
    
    W1222 11:26:35.589284 140187933329152 module_wrapper.py:139] From /home/patrick/src/tf_agents/tf_agents/agents/ddpg/critic_network.py:141: The name tf.keras.initializers.RandomUniform is deprecated. Please use tf.compat.v1.keras.initializers.RandomUniform instead.
    
    2019-12-22 11:26:35.600509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    WARNING:tensorflow:From /home/patrick/src/tf_agents/tf_agents/distributions/utils.py:92: AffineScalar.__init__ (from tensorflow_probability.python.bijectors.affine_scalar) is deprecated and will be removed after 2020-01-01.
    Instructions for updating:
    `AffineScalar` bijector is deprecated; please use `tfb.Shift(loc)(tfb.Scale(...))` instead.
    W1222 11:26:35.787435 140187933329152 deprecation.py:323] From /home/patrick/src/tf_agents/tf_agents/distributions/utils.py:92: AffineScalar.__init__ (from tensorflow_probability.python.bijectors.affine_scalar) is deprecated and will be removed after 2020-01-01.
    Instructions for updating:
    `AffineScalar` bijector is deprecated; please use `tfb.Shift(loc)(tfb.Scale(...))` instead.
    I1222 11:26:35.814536 140187933329152 common.py:920] Checkpoint available: tensorboard_logs/tf_agents_v2/train/ckpt-30000
    I1222 11:26:35.902629 140187933329152 common.py:920] Checkpoint available: tensorboard_logs/tf_agents_v2/policy/ckpt-35000
    I1222 11:26:35.908307 140187933329152 common.py:923] No checkpoint available at tensorboard_logs/tf_agents_v2/replay_buffer
    I1222 11:26:35.910735 140187933329152 tf_agents_v2_lunar_lander.py:267] Initializing replay buffer by collecting experience for 100 stepswith a random policy.
    WARNING:tensorflow:From /home/patrick/src/tf_agents/tf_agents/metrics/tf_metrics.py:161: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    W1222 11:26:36.424730 140187933329152 deprecation.py:323] From /home/patrick/src/tf_agents/tf_agents/metrics/tf_metrics.py:161: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    I1222 11:28:23.095548 140187933329152 metric_utils.py:47]  
    		 AverageReturn = 1.452040195465088
    		 AverageEpisodeLength = 501.0
    I1222 11:28:34.015443 140187933329152 tf_agents_v2_lunar_lander.py:314] env steps = 31200, average return = -80.228371
    I1222 11:28:34.015817 140187933329152 tf_agents_v2_lunar_lander.py:317] 131.060 env steps/sec
    etc.
    

    And the output from nvidia-smi while running the script:

    Sun Dec 22 11:29:16 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro M4000        Off  | 00000000:01:00.0  On |                  N/A |
    | 51%   56C    P0    43W / 120W |   7865MiB /  8104MiB |     10%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1370      G   /usr/lib/xorg/Xorg                           435MiB |
    |    0      2062      G   compiz                                       146MiB |
    |    0      3479      G   ...uest-channel-token=17571043003057555071   211MiB |
    |    0     17466      C   python                                      7057MiB |
    +-----------------------------------------------------------------------------+
    
    type:performance level:p1 
    opened by pirobot 18
  • tf-agents-nightly installed on colab seems very different from the master branch

    tf-agents-nightly installed on Colab seems very different from the master branch. The experimental examples folder is missing. Not 100% sure if this is a Colab issue or a tf-agents issue.

    opened by chokosabe 17
  • Problem with importing the "reverb" package with Tutorial: SAC minitaur with the Actor-Learner API

    Hi,

    I am getting an ImportError when trying to import the "reverb" package as done in the tutorial.

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-38745e83da94> in <module>
          4 import matplotlib.pyplot as plt
          5 import os
    ----> 6 import reverb
          7 import tempfile
          8 import PIL.Image
    
    ~/Desktop/AI/ai_venv/lib/python3.7/site-packages/reverb/__init__.py in <module>
         25 # pylint: enable=g-bad-import-order
         26 
    ---> 27 from reverb import item_selectors as selectors
         28 from reverb import rate_limiters
         29 
    
    ~/Desktop/AI/ai_venv/lib/python3.7/site-packages/reverb/item_selectors.py in <module>
         17 import functools
         18 
    ---> 19 from reverb import pybind
         20 
         21 Fifo = pybind.FifoSelector
    
    ~/Desktop/AI/ai_venv/lib/python3.7/site-packages/reverb/pybind.py in <module>
    ----> 1 import tensorflow as _tf; from .libpybind import *; del _tf
    
    ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
    

    I have tried to export this variable: export LD_LIBRARY_PATH=/home/orie/Desktop/AI/ai_venv/lib/

    I have also tried including this environment variable in my python notebook:

    import os
    os.environ['LD_LIBRARY_PATH'] = '/home/orie/Desktop/AI/ai_venv/lib/'
    

    I also tried: sudo ldconfig /home/orie/Desktop/AI/ai_venv/lib. I'm using Ubuntu and a virtual environment.

    Thanks to anyone who helps!

    opened by orshemtov 16
  • DQN Agent Issue With Custom Environment

    So I've been following the DQN agent example/tutorial and I set it up like in the example; the only difference is that I built my own custom Python environment, which I then wrapped in TensorFlow. However, no matter how I shape my observation and action specs, I can't seem to get it to work whenever I give it an observation and request an action. Here's the error that I get:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix. Instead it has shape [10] [Op:MatMul]

    Here's how I'm setting up my agent:

        layer_parameters = (10,) #10 layers deep, shape is unspecified
        
        #placeholders 
        learning_rate = 1e-3  # @param {type:"number"}
        train_step_counter = tf.Variable(0)
    
        #instantiate agent
    
        optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
        
        env = SumoEnvironment(self._num_actions,self._num_states)
        env2 = tf_py_environment.TFPyEnvironment(env)
        q_net= q_network.QNetwork(env2.observation_spec(),env2.action_spec(),fc_layer_params = layer_parameters)
        
        print("Time step spec")
        print(env2.time_step_spec())
    
        agent = dqn_agent.DqnAgent(env2.time_step_spec(),
        env2.action_spec(),
        q_network=q_net,
        optimizer = optimizer,
        td_errors_loss_fn=common.element_wise_squared_loss,
        train_step_counter=train_step_counter)
    

    And here's how I'm setting up my environment:

    class SumoEnvironment(py_environment.PyEnvironment):

    def __init__(self, no_of_Actions, no_of_Observations):
    
        #this means that the observation consists of a number of arrays equal to self._num_states, with datatype float32
        self._observation_spec = specs.TensorSpec(shape=(16,),dtype=np.float32,name='observation')
        #action spec, shape unknown, min is 0, max is the number of actions
        self._action_spec = specs.BoundedArraySpec(shape=(1,),dtype=np.int32,minimum=0,maximum=no_of_Actions-1,name='action')
        
       
        self._state = 0
        self._episode_ended = False
    

    And here is what my input / observations look like:

    tf.Tensor([ 0. 0. 0. 0. 0. 0. 0. 0. -1. -1. -1. -1. 0. 0. 0. -1.], shape=(16,), dtype=float32)

    I've tried experimenting with the shape and depth of my Q_Net and it seems to me that the [10] in the error is related to the shape of my q network. Setting its layer parameters to (4,) yields an error of:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix. Instead it has shape [4] [Op:MatMul]
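
    A hedged sketch of spec definitions in the style of the custom-environment tutorial (not a diagnosis of this particular report): DqnAgent expects a scalar action spec, and py_environment subclasses typically use ArraySpecs rather than TensorSpecs.

        import numpy as np
        from tf_agents.specs import array_spec

        observation_spec = array_spec.ArraySpec(
            shape=(16,), dtype=np.float32, name='observation')
        action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0,
            maximum=9,  # hypothetical: 10 discrete actions
            name='action')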

    opened by IbraheemNofal 16
  • Feature request make it easier to supply custom model

    I tried assigning my own layers to the post_processing variable within my categorical QNetwork, but I get a message that weights are shared when I then try to create my categorical DQN agent. It would be nice if the main categorical Q-network constructor allowed a parameter for providing a set of Keras layers, where the q_layer is just appended to the end like it is in the encoding network scheme, and the weights would be copied for you.

    opened by ben-arnao 15
  • AttributeError: 'tuple' object has no attribute 'rank'

    Trying out the most basic example on

    • Windows 10
    • Python 3.7
    • tensorflow 2.1.0
    • tf-agents 0.4.0

    Error i get

    Traceback (most recent call last):
      File "src\agent.py", line 58, in <module>
        action, _states = agent.policy.action(obs)
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\policies\tf_policy.py", line 279, in action
        step = action_fn(time_step=time_step, policy_state=policy_state, seed=seed)
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\utils\common.py", line 154, in with_check_resource_vars
        return fn(*fn_args, **fn_kwargs)
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\policies\random_tf_policy.py", line 89, in _action
        outer_dims = nest_utils.get_outer_shape(time_step, self._time_step_spec)
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\utils\nest_utils.py", line 394, in get_outer_shape
        nested_tensor, spec, num_outer_dims=num_outer_dims):
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\utils\nest_utils.py", line 97, in is_batched_nested_tensors
        if any(spec_shape.rank is None for spec_shape in spec_shapes):
      File "C:\Users\andre\.virtualenvs\ZeusTrader\lib\site-packages\tf_agents\utils\nest_utils.py", line 97, in <genexpr>
        if any(spec_shape.rank is None for spec_shape in spec_shapes):
    AttributeError: 'tuple' object has no attribute 'rank'
    
    

    Code i run

    import tensorflow as tf
    from collections import Counter, defaultdict
    from tf_agents.networks import q_network
    from tf_agents.utils import common
    from tf_agents.agents.dqn import dqn_agent
    from tf_agents.agents.random.random_agent import RandomAgent
    from tf_agents.environments import suite_gym
    from environment import StockExchangeEnv01
    
    # tried with and without..error persists
    # tf.compat.v1.enable_v2_behavior()
    
    learning_rate = 0.0001
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
    
    # tried both my own Environment and the basic "cartpole-v0"
    train_env = StockExchangeEnv01()
    env_name = 'CartPole-v0'
    #train_env = suite_gym.load(env_name)
    
    train_env.reset()
    print(train_env.action_spec())
    """
    # Neural Net of the Agent. This NN will get x (env) and spit out y (action).
    q_net = q_network.QNetwork(
      train_env.observation_spec(),
      train_env.action_spec(),
      fc_layer_params=(100,))
    print(train_env.action_spec())
    
    #
    agent = dqn_agent.DqnAgent(
      train_env.time_step_spec(),
      train_env.action_spec(),
      q_network=q_net,
      optimizer=optimizer)
    """
    
    # tried both..dqn agent and random agent
    
    agent = RandomAgent(
        train_env.time_step_spec(),
        train_env.action_spec()
    )
    agent.initialize()
    
    obs = train_env.reset()
    actions = Counter()
    pnl = defaultdict(float)
    total_rewards = 0.0
    
    for i in range(300):
        #action, _states = model.predict(obs)
        action, _states = agent.policy.action(obs)
        obs, rewards, dones, info = train_env.step(action)
        actions[action[0].item()] += 1
        pnl[action[0].item()] += rewards
        total_rewards += rewards
        if dones:
            break
    
    print('actions : {}'.format(actions))
    print('rewards : {}'.format(total_rewards))
    
    

    The code in tf-agents gets the 'shape' from the action_spec, which is a tuple in my case. Then it tries to access the 'rank' attribute on that tuple.

    What am I missing?
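
    A sketch of the tutorial-style interaction loop, not a diagnosis of this specific report (CartPole and the agent built above are assumptions): the policy expects the batched TimeStep produced by a TFPyEnvironment, rather than the raw output of a py environment's reset()/step().

    from tf_agents.environments import suite_gym, tf_py_environment

    # Wrap the py environment so observations/actions are batched tensors.
    tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

    time_step = tf_env.reset()
    for _ in range(300):
        action_step = agent.policy.action(time_step)
        time_step = tf_env.step(action_step.action)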

    opened by AndreyBulezyuk 15
  • Memory leak with DqnAgent

    I have built a basic DQN agent to play in the CartPole environment by following the DQN tutorial: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial However, after a couple of hours of training I noticed that the process was increasing its memory consumption substantially. I was able to simplify the training script to narrow down the problem and figured out that memory leaks whenever the driver uses agent.policy or agent.collect_policy (replacing it with RandomTFPolicy eliminates the issue):

    import tensorflow as tf
    import gc
    
    from tf_agents.environments import suite_gym, tf_py_environment
    from tf_agents.networks import q_network
    from tf_agents.agents.dqn import dqn_agent
    from tf_agents.drivers import dynamic_step_driver
    from tf_agents.utils import common
    
    tf.compat.v1.enable_v2_behavior()
    
    # Create CartPole as TFPyEnvironment
    env = suite_gym.load('CartPole-v0')
    tf_env = tf_py_environment.TFPyEnvironment(env)
    
    # Create DQN Agent
    q_net = q_network.QNetwork(
            tf_env.observation_spec(),
            tf_env.action_spec(),
            fc_layer_params=(100,))
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
    train_step_counter = tf.Variable(0)
    
    agent = dqn_agent.DqnAgent(
        tf_env.time_step_spec(),
        tf_env.action_spec(),
        q_network=q_net,
        optimizer=optimizer,
        td_errors_loss_fn=common.element_wise_squared_loss,
        train_step_counter=train_step_counter)
    
    agent.initialize()
    
    # Replacing agent.collect_policy with tf_policy eliminates issue a of memory leak
    # tf_policy = random_tf_policy.RandomTFPolicy(action_spec=train_env.action_spec(),
    #                                            time_step_spec=train_env.time_step_spec())
    
    # Create dynamic step driver with no observers
    driver = dynamic_step_driver.DynamicStepDriver(
        env = tf_env,
        policy = agent.collect_policy,
        observers = [],
        num_steps = 1)
    
    # Calls to driver end up continuously increasing memory consumption 
    while True:
        driver.run()
        # One of the possible solutions is to call gc.collect() but it significantly slows down training
    

    The other hotfix, as mentioned in the code above, is to call gc.collect() after each driver.run(), but that has a huge impact on performance.

    This memory leak prevents a long-running training process, which might be a bit of a bummer for more complex environments based on DQN.

    Running setup:

    • Ubuntu 20.10 / 64-bit
    • Python 3.8.6 + tensorflow==2.4.1 + tf-agents==0.7.1
    • Running on the CPU: AMD Ryzen Threadripper 3960x
    • RAM: 128GB

    Same script has been also run within Docker container and confirmed memory leak.

    What could be the cause of this problem, and how can it be fixed properly?
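
    A sketch of a mitigation worth trying (an assumption, not a confirmed fix from this thread): wrapping driver.run in common.function traces the collect step once instead of rebuilding ops on every eager call, which is the pattern used in the tutorials.

    # Assumes the driver and the `common` import from the script above.
    driver.run = common.function(driver.run)

    while True:
        driver.run()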

    opened by romandunets 14
  • OOM after a couple of iterations

    I am running DQN on an Atari game (BeamRider-v0). I just take the input image, flatten it, and connect it to a fully connected layer with 32 neurons. It runs for 14000 iterations on a Tesla V100 GPU. After 14000 iterations, I get OOM. Is there a memory leak? I am using tf-nightly-gpu-2.0-preview. I have also tried tf-nightly-gpu and the same problem exists. My question is: why don't I get the error in the very first iterations? What causes memory usage to grow for 14000 iterations?

    opened by siavash-khodadadeh 14
  • AttributeError: module 'tree' has no attribute 'assert_same_structure'

    When I import tf_agents, there is no error. However, when I run "from tf_agents.agents.dqn import dqn_agent", it gives me: AttributeError: module 'tree' has no attribute 'assert_same_structure'.

    opened by abbiesgame 0
  • collect_step slow speed

    Hi, I'm referencing the official TensorFlow website example, which shows the collect_step function and its usage as follows.

    def collect_step(environment, policy):
      time_step = environment.current_time_step()
      action_step = policy.action(time_step)
      next_time_step = environment.step(action_step.action)
      traj = trajectory.from_transition(time_step, action_step, next_time_step)
    
      # Add trajectory to the replay buffer
      replay_buffer.add_batch(traj)
    
    for _ in range(initial_collect_steps):
      collect_step(train_env, random_policy)
    
    for _ in range(num_iterations):
    
      # Collect a few steps using collect_policy and save to the replay buffer.
      for _ in range(collect_steps_per_iteration):
        collect_step(train_env, agent.collect_policy)
    

    However, when collecting many steps, the above code is quite slow. To my understanding, the reason it is slow is the communication between GPU and CPU for each action. If I am wrong, please let me know.

    I wonder if there is any way to speed this up with the TensorFlow library functions, so that the collect_step iteration can run inside the GPU for faster training.
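
    A sketch of one way to do this, assuming the same train_env, agent, replay_buffer, collect_steps_per_iteration and num_iterations as in the tutorial snippet above: a DynamicStepDriver wrapped in common.function keeps the whole collect loop inside a single graph call instead of paying a Python/device round-trip per action.

    from tf_agents.drivers import dynamic_step_driver
    from tf_agents.utils import common

    collect_driver = dynamic_step_driver.DynamicStepDriver(
        train_env,
        agent.collect_policy,
        observers=[replay_buffer.add_batch],
        num_steps=collect_steps_per_iteration)
    # Trace the collection graph once instead of rebuilding it every call.
    collect_driver.run = common.function(collect_driver.run)

    time_step = None
    policy_state = agent.collect_policy.get_initial_state(train_env.batch_size)
    for _ in range(num_iterations):
        time_step, policy_state = collect_driver.run(
            time_step=time_step, policy_state=policy_state)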

    Thanks in advance.

    Best Regards, Jack Lu

    opened by jacklu333333 0
  • SAC minitaur with the Actor-Learner API demonstrator fails

    I ran the SAC minitaur with the Actor-Learner API code from the tutorial.

    At first I got an error saying I needed to upgrade TensorFlow to version 2.11.0, because of an incompatibility with tensorflow-probability:

    • tensorflow 2.11.0
    • tensorflow-estimator 2.11.0
    • tensorflow-intel 2.11.0
    • tensorflow-io-gcs-filesystem 0.27.0
    • tensorflow-probability 0.19.0
    • termcolor 2.0.1
    • terminado 0.17.0
    • tf-agents 0.15.0

    After the upgrade I get the following error when importing any tf_agents module:

        File "C:\tools\lib\site-packages\tf_agents\__init__.py", line 55, in _ensure_tf_install
          tf_version = tf.version.VERSION
        AttributeError: module 'tensorflow' has no attribute 'version'

    opened by ThorAvaTahr 0
  • Errors with numpy 1.24.0

    I tried to use the latest version of tf-agents. However, if I run a simple class which only extends PyEnvironment and nothing else, I receive an error with a message like

    module 'numpy' does not contain attribute named 'bool'. Did you mean 'bool_'

    There are several similar issues with numpy; sometimes re-installing numpy helps. In my case it didn't: I tried the common workflow of uninstalling setuptools and numpy.

    I'm using:

    • Python 3.10 (Python 3.11 doesn't work also...)
    • Numpy 1.24.0

    Is there anything I've missed?
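
    A common workaround (an assumption, not from this thread): np.bool was removed in NumPy 1.24, so pinning NumPy below 1.24 until the library drops those usages typically avoids this error.

    $ pip install "numpy<1.24"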

    opened by sebastianknopf 3
  • PPO with Mini-Batches Tutorial

    The documentation of PPO describes the training process of PPO as the following:

    # Build PPO agent
    ppo_agent = PPOClipAgent(num_epochs=40, ...)
    
    # Build Replay Buffer
    replay_buffer = TFUniformReplayBuffer(data_spec=ppo_agent.collect_data_spec,batch_size=env.batch_size, max_length=1000)
    
    # Train agent
    experiences, _ = replay_buffer.gather_all()
    loss = ppo_agent.train(experiences).loss
    replay_buffer.clear()
    

    However, that way ppo_agent is trained with one large batch of experiences for 40 epochs. If the number of experiences is high (e.g. 1024 experiences), you might want to train PPO on mini-batches (e.g. 4 mini-batches of 256 experiences, 40 epochs per mini-batch).

    The only way to do that is to build a dataset from replay_buffer and fetch experiences by iterating the dataset. However, this produces random batches, instead of equally selected mini-batches:

    # Use 1 epoch per batch
    ppo_agent = PPOClipAgent(num_epochs=1, ...)
    
    # Build dataset iter
    dataset = replay_buffer.as_dataset(sample_batch_size=200, num_steps=2, num_parallel_calls=2).prefetch(2)
    dataset_iter = iter(dataset)
    
    # Training part
    loss = 0
    for _ in range(40):
        for _ in range(4):
            mini_batch_experiences, _ = next(dataset_iter)
            loss += ppo_agent.train(mini_batch_experiences)
    replay_buffer.clear()
    loss /= (40*4)
    

    However, this approach has the following issue: it randomly selects 256 experiences from the memory in a uniform way, but that doesn't ensure that each experience will be selected equally often. Is there a better method to train PPO? Also, for some reason, this takes much more time to train than using a single batch as in the first approach, and gets worse training results, so am I missing something else here?

    opened by kochlisGit 0