OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Overview

Status: Maintenance (expect bug fixes and minor updates)

Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Prerequisites

Baselines requires python3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI and zlib. These can be installed as follows:

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

To keep packages from different projects from interfering with each other, it is a good idea to use virtual environments (virtualenvs). You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that contain copies of the python executable and all python packages. To create a virtualenv called venv with python3, run

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate

A more thorough tutorial on virtualenvs and their options can be found here

Tensorflow versions

The master branch supports TensorFlow from version 1.4 to 1.14. For TensorFlow 2.0 support, please use the tf2 branch.

Installation

  • Clone the repo and cd into it:

    git clone https://github.com/openai/baselines.git
    cd baselines
  • If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use

    pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers

    or

    pip install tensorflow==1.14

    to install TensorFlow 1.14, which is the latest version of TensorFlow supported by the master branch. Refer to the TensorFlow installation guide for more details.

  • Install baselines package

    pip install -e .

MuJoCo

Some of the baselines examples use the MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here

Testing the installation

All unit tests in baselines can be run using the pytest runner:

pip install pytest
pytest

Training models

Most of the algorithms in the baselines repo are used as follows:

python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]

Example 1. PPO with MuJoCo Humanoid

For instance, to train a fully-connected network controlling the MuJoCo humanoid using PPO2 for 20M timesteps:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7

Note that for MuJoCo environments the fully-connected network is the default, so we can omit --network=mlp. The hyperparameters for both the network and the learning algorithm can be controlled via the command line, for instance:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy

will set the entropy coefficient to 0.1, construct a fully-connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same).

See the docstrings in common/models.py for a description of the network parameters for each type of model, and the docstring of learn() in baselines/ppo2/ppo2.py for a description of the ppo2 hyperparameters.
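
The same training run can also be launched from Python. The sketch below is a minimal, hedged example of the programmatic path, assuming the master-branch helpers make_vec_env and ppo2.learn() (keyword names may differ between versions); it mirrors the command-line example above.

# Minimal sketch of calling ppo2 from Python instead of via baselines.run.
# Assumes the master-branch API (make_vec_env, ppo2.learn); argument names
# may differ between versions.
from baselines.common.cmd_util import make_vec_env
from baselines.ppo2 import ppo2

def main():
    # Vectorized MuJoCo environment; note that baselines.run additionally
    # wraps MuJoCo envs in VecNormalize (see the note in the saving section).
    env = make_vec_env('Humanoid-v2', 'mujoco', num_env=1, seed=0)
    model = ppo2.learn(
        network='mlp',          # same as --network=mlp
        env=env,
        total_timesteps=int(2e7),
        ent_coef=0.1,           # same as --ent_coef=0.1
        num_hidden=32,          # forwarded to the mlp network builder
        num_layers=3,
        value_network='copy',   # separate value network with the same structure
    )
    return model

if __name__ == '__main__':
    main()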

Example 2. DQN on Atari

DQN with Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:

python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
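
deepq can also be called directly from Python for simpler environments. A minimal sketch, modelled on the train_cartpole example shipped with the repo (keyword argument names may differ between versions):

# Minimal sketch of using deepq from Python, based on the repo's
# train_cartpole example; keyword names may differ between versions.
import gym
from baselines import deepq

def main():
    env = gym.make('CartPole-v0')
    act = deepq.learn(
        env,
        network='mlp',            # fully-connected Q-network
        lr=1e-3,
        total_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print('Saving model to cartpole_model.pkl')
    act.save('cartpole_model.pkl')

if __name__ == '__main__':
    main()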

Saving, loading and visualizing models

Saving and loading the model

The algorithms' serialization API is not properly unified yet; however, there is a simple method to save and restore trained models. The --load_path and --save_path command-line options load the TensorFlow state from a given path before training and save it after training, respectively. Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learned.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2

This should get the mean reward per episode to about 20. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize it:

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play

NOTE: MuJoCo environments require normalization to work properly, so we wrap them with a VecNormalize wrapper. Currently, to ensure that models are saved with normalization (so that trained models can be restored and run without further training), the normalization coefficients are saved as TensorFlow variables. This can decrease performance somewhat, so if you need high-throughput steps with MuJoCo and do not need to save and restore models, it may make sense to use numpy normalization instead. To do that, set use_tf=False in baselines/run.py.
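
For reference, a minimal sketch of wrapping a MuJoCo environment with normalization yourself, assuming the VecNormalize wrapper exported from baselines.common.vec_env and its use_tf flag (both as on the master branch; names may differ between versions):

# Minimal sketch: wrap a vectorized MuJoCo env with observation/return
# normalization. use_tf=False keeps the running statistics in numpy,
# which is faster but means they are not stored in the TF checkpoint.
from baselines.common.cmd_util import make_vec_env
from baselines.common.vec_env import VecNormalize

venv = make_vec_env('Humanoid-v2', 'mujoco', num_env=1, seed=0)
env = VecNormalize(venv, use_tf=False)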

Logging and visualizing learning curves and other training metrics

By default, all summary data, including progress and standard output, is saved to a unique directory in a temp folder, as returned by a call to Python's tempfile.gettempdir(). The directory can be changed with the --log_path command-line option.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2 --log_path=~/logs/Pong/

NOTE: Please be aware that the logger will overwrite files of the same name in an existing directory, so it's recommended that folder names be given a unique timestamp to prevent logs from being overwritten.

The log directory can also be changed through the $OPENAI_LOGDIR environment variable.

For examples on how to load and display the training data, see here.
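
For a quick look at a learning curve without the full plotting utilities, something like the sketch below works, assuming the logger wrote a progress.csv file into the log directory (it does with the default output formats; column names vary by algorithm):

# Minimal sketch of inspecting the metrics written by the baselines logger;
# assumes --log_path (or $OPENAI_LOGDIR) pointed at ~/logs/Pong/ and that a
# progress.csv was produced. Column names vary by algorithm, so inspect them first.
import pandas as pd
import matplotlib.pyplot as plt

progress = pd.read_csv('~/logs/Pong/progress.csv')
print(progress.columns)   # see which metrics are available

# 'misc/total_timesteps' and 'eprewmean' are the ppo2 names on recent master;
# adjust to whatever the printed columns show.
progress.plot(x='misc/total_timesteps', y='eprewmean')
plt.show()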

Subpackages

Benchmarks

Results of benchmarks on MuJoCo (1M timesteps) and Atari (10M timesteps) are available here for MuJoCo and here for Atari, respectively. Note that these results may not be from the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page.

To cite this repository in publications:

@misc{baselines,
  author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter},
  title = {OpenAI Baselines},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/openai/baselines}},
}
Comments
  • Error when restoring model to run enjoy.py

    Error when restoring model to run enjoy.py

    Hi,

    I was running these two commands:

    python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-duel-breakout-1 --model-dir /tmp/models
    python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling
    

    at the bottom of the README.

    However, I got the following error:

    InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
             [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3/_1)]]
    
    opened by yenchenlin 24
  • Unable to reproduce HER results from Plappert et al., 2018

    Unable to reproduce HER results from Plappert et al., 2018

    I ran the code baselines.her.experiment.train for five different seeds and generated plots with some minor changes to the plotting code (see issue #311) for HandManipulateBlockRotateXYZ-v0 across 200 epochs. I verified from the .json file that the 'scope' was 'DDPG' and that the rewards were sparse.

    Unfortunately, the performance of the algorithm was substantially worse than reported in the paper (above are my results and below are the results from Plappert et al., 2018, https://arxiv.org/abs/1802.09464):

    [plots: my HandManipulateBlockRotateXYZ-v0 results (above) vs. Plappert et al., 2018 (below)]

    Any idea why the results are so much worse? Further, this issue seems to apply to multiple other cases including 'FetchSlide' and 'HandManipulateBlockRotateZ' although I have only run one seed for these.

    In case it's useful, I also copy the details of the params.json file:

    {"pi_lr": 0.001, "network_class": "baselines.her.actor_critic:ActorCritic", "norm_clip": 5, "polyak": 0.95, "scope": "ddpg", "n_cycles": 50, "random_eps": 0.3, "env_name": "HandManipulateBlockRotateXYZ-v0", "rollout_batch_size": 2, "n_batches": 40, "layers": 3, "buffer_size": 1000000, "action_l2": 1.0, "Q_lr": 0.001, "clip_obs": 200.0, "hidden": 256, "test_with_polyak": false, "batch_size": 256, "noise_eps": 0.2, "n_test_rollouts": 10, "relative_goals": false, "norm_eps": 0.01, "max_u": 1.0, "replay_strategy": "future", "replay_k": 4}

    And of course, I'd like to thank OpenAI for making their code and environments available; it's really helpful to independent researchers!

    opened by sanjeevanahilan 18
  • Could you explain how to execute PPO and TRPO?

    Could you explain how to execute PPO and TRPO?

    There is a readme explaining the whole process for executing the deepq algorithm.

    However, there is no such thing for PPO and TRPO....

    Could you please explain how to execute PPO and TRPO?
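
    For reference, with the unified baselines.run entry point described in the README above, PPO and TRPO are launched the same way as the other algorithms (trpo_mpi additionally requires an MPI installation), for example:

    python -m baselines.run --alg=ppo2 --env=Hopper-v2 --num_timesteps=1e6
    python -m baselines.run --alg=trpo_mpi --env=Hopper-v2 --num_timesteps=1e6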

    opened by wonchul-kim 18
  • Running into issues on example execution

    Running into issues on example execution

    Get this error when I run the first example python3 -m baselines.deepq.experiments.train_cartpole:

    /usr/bin/python3: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')
    

    I have both Python 2 and 3 installed. Thus I installed baselines with pip3. Any suggestions?

    opened by ampnirvana 14
  • Deobfuscation of the code base + pep8 and fixes

    Deobfuscation of the code base + pep8 and fixes

    • Fixed tf.session().__enter__() being used, rather than sess = tf.session() and passing the session to the objects
    • Fixed uneven scoping of TensorFlow Sessions throughout the code
    • Fixed rolling vecwrapper to handle observations that are not only grayscale images
    • Fixed deepq saving the environment when trying to save itself
    • Fixed ValueError: Cannot take the length of Shape with unknown rank. in acktr, when running run_atari.py script.
    • Fixed graph conflicts when calling baselines sequentially
    • Fixed mean on empty array warning with deepq
    • Fixed kfac eigen decomposition not cast to float64, when the parameter use_float64 is set to True
    • Fixed Dataset data loader, not correctly resetting id position if shuffling is disabled
    • Fixed EOFError when reading from connection in the worker in subproc_vec_env.py
    • Fixed behavior_clone weight loading and saving for GAIL
    • Avoid taking the square root of a negative number in trpo_mpi.py
    • Removed some duplicated code (a2cpolicy, trpo_mpi)
    • Removed unused, undocumented and crashing function reset_task in subproc_vec_env.py
    • Reformatted code to PEP8 style
    • Documented all the codebase
    • Added atari tests
    • Added logger tests

    Missing: tests for acktr continuous (+ HER, gail but they rely on mujoco...)

    opened by hill-a 12
  • Support for Fetch environments?

    Support for Fetch environments?

    It seems like baselines is not directly implemented to deal with Box() type action spaces. This same exact code works for the CartPole environment. It fails on FetchReach-v1. Here is the code:

    import gym
    from baselines import deepq
    
    
    def callback(lcl, _glb):
        # stop training if reward exceeds 199
        is_solved = lcl['t'] > 100 and sum(lcl['episode_rewards'][-101:-1]) / 100 >= 199
        return is_solved
    
    
    def main():
        env = gym.make("FetchReach-v1")
        model = deepq.models.mlp([64])
        act = deepq.learn(
            env,
            q_func=model,
            lr=1e-3,
            max_timesteps=100000,
            buffer_size=50000,
            exploration_fraction=0.1,
            exploration_final_eps=0.02,
            print_freq=10,
            callback=callback
        )
        print("Saving model to Fetch_model.pkl")
        act.save("Fetch_model.pkl")
    
    
    if __name__ == '__main__':
        main()
    

    When I try to use the same deepq algorithm trained on CartPole, with a discrete action space, on FetchReach-v1, I get the following:

    File "train_FetchReach.py", line 31, in <module>
        main()
      File "train_FetchReach.py", line 24, in main
        callback=callback
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 180, in learn
        num_actions=env.action_space.n,
    AttributeError: 'Box' object has no attribute 'n'
    

    I tried adding

    env.action_space.n = len(env.action_space.sample())

    but that just led to more errors:

    Traceback (most recent call last):
      File "train_FetchReach.py", line 32, in <module>
        main()
      File "train_FetchReach.py", line 25, in main
        callback=callback
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 184, in learn
        param_noise=param_noise
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/build_graph.py", line 376, in build_train
        act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/build_graph.py", line 177, in build_act
        observations_ph = make_obs_ph("observation")
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 175, in make_obs_ph
        return BatchInput(observation_space_shape, name=name)
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/utils.py", line 66, in __init__
        super().__init__(tf.placeholder(dtype, [None] + list(shape), name=name))
    TypeError: 'NoneType' object is not iterable
    
    opened by jeremyf21 12
  • NotImplementedError when executing Pong example

    NotImplementedError when executing Pong example

    Hey guys, first of all thanks a lot for this project. It might become handy during my studies :)

    I ran into an error while executing the example. I downloaded the pretrained model, but python3 -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling raised the following error:

    [2018-01-31 12:24:50,221] Making new env: PongNoFrameskip-v4
    Traceback (most recent call last):
      File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/username/keras-tf-p3/baselines/baselines/deepq/experiments/atari/enjoy.py", line 70, in <module>
        play(env, act, args.stochastic, args.video)
      File "/home/username/keras-tf-p3/baselines/baselines/deepq/experiments/atari/enjoy.py", line 43, in play
        obs = env.reset()
      File "/home/username/keras-tf-p3/baselines/baselines/common/atari_wrappers.py", line 167, in reset
        ob = self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 283, in _reset
        return self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 310, in _reset
        observation = self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 311, in _reset
        return self._observation(observation)
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 321, in _observation
        raise NotImplementedError
    NotImplementedError
    

    The cartpole example however works fine (python -m baselines.deepq.experiments.train_cartpole)

    I found this issue in gym (https://github.com/openai/gym/issues/775) with the same error message, but mine seems to occur in reset() and not in render(). Downgrading to pyglet version 1.2.4 as suggested did not change anything.

    Does someone know a solution to this?

    opened by shakenes 12
  • NaN values in acktr

    NaN values in acktr

    Hi everyone, I'm trying to use continuous acktr to learn to reach a target with a mujoco simulation of the jaco arm. I use exactly the same hyperparameters as for the reacher env and acktr definitely learns something meaningful, the reward goes up and I can also see it when I render the frames.

    The problem is that after some 2000-3000 iterations, the algorithm starts to produce NaN values.

    The log at the time when it starts to happen looks as follows:

    
    Iteration 3025
    kl just right!
    
    | EVAfter   | 0.984      |
    | EVBefore  | 0.976      |
    | EpLenMean | 200        |
    | EpRewMean | -8.5       |
    | EpRewSEM  | 0.82       |
    | KL        | 0.00148061 |
    
    Iteration 3026 
    kl too low
    
    | EVAfter   | 0.984       |
    | EVBefore  | 0.98        |
    | EpLenMean | 200         |
    | EpRewMean | -7.31       |
    | EpRewSEM  | 0.613       |
    | KL        | 0.000913428 |
    
    Iteration 3027
    kl just right!
    
    | EVAfter   | 0.98     |
    | EVBefore  | 0.976    |
    | EpLenMean | 200      |
    | EpRewMean | -8.92    |
    | EpRewSEM  | 0.937    |
    | KL        | nan      |
    

    Then of course the NaNs start to spread and everything becomes NaN. Does anyone have an idea what could cause such behaviour and what to do about it?

    opened by lukashermann 12
  • RNN support for PPO2

    RNN support for PPO2

    This is RNN support for PPO2 related to #294 #340 #525 issues.

    • Improve the replay buffer management.
    • RNN support for value/policy functions (i.e, copy options)
    • Improve visualizations in the Tensorboard graph.

    I'm trying to make minimal code changes. This, however, involves quite a few changes originating from both the replay buffer and the RNN implementation.

    Because the original replay buffers are created individually for each column (e.g. actions, observations, states, dones, ...), it is not easy to append a new column. The replay buffer has been refactored to use dictionaries, so additional experiments can be implemented easily. In the case of this PR, I had to refactor the replay buffer to save the memory state of the RNN value/policy networks.

    The original RNN code is confusing to me. When sampling an RNN memory state, the number of stored memory states is not equal to the number of steps, i.e. it was [1 x num_lstm]. I think the original RNNs in baselines.common.models are not designed for step-by-step sampling, but for stacking RNNs as is usual in NLP or image processing. I have copied and modified the RNN code accordingly.

    When using an RNN model, it is quite difficult to manage the RNN memory. It has been moved inside the model, which reduces the effort of managing it: for example, actions, _, _, _ = model.step(obs, done) instead of actions, _, state, _ = model.step(obs, S=state, M=done).

    Lastly, some tf.variable_scopes and tf.name_scopes were added for Tensorboard graph visualization. They might be helpful for studying and debugging the code.

    Thanks!

    Example

    • python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --network=ppo_cnn_lstm runs Atari Pong with the ppo_cnn_lstm network.
    • python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6 --network=ppo_lstm --value_network=copy runs the MuJoCo Ant environment with the ppo_lstm network, whose value and policy networks are separate but have the same structure.

    Simple Benchmark on HalfCheetah-v2

    PR vs Original

    The original MLP and the PR-version MLP/LSTM perform almost the same. The original LSTM performs poorly. [benchmark plot]

    Networks of PR

    The PR-version RNNs also perform well when used together with other FC layers. [benchmark plot]

    All Tested

    [benchmark plot]

    Params

    value_network: 'copy' (except the original lstm, which only supports shared)
    nsteps: 128
    num_env: 16
    nminibatches: 16
    noptepochs: 10
    lr: 0.0003
    num_timesteps: 2000000

    Other parameters are set to PPO2 default.

    opened by gyunt 11
  • What is the version of mujoco and gym that is required to run a baseline code?

    What is the version of mujoco and gym that is required to run a baseline code?

    I currently have gym==0.9.3 and mujoco-py==0.5.7. I get an error when I run the PPO code. Below is the error I get:

    from gym.wrappers import FlattenDictWrapper
    ImportError: cannot import name 'FlattenDictWrapper'

    opened by NishanthVAnand 11
  • Wrote some comments to explain the A2C and PPO2 implementation

    Wrote some comments to explain the A2C and PPO2 implementation

    Hello,

    1. I've added some comments to the A2C implementation to help users better understand it. I've tried to be succinct. I think this can help readers who want to modify the implementations.

    2. Also, I've modified the readme file to briefly explain what each file does.

    Have a great day,

    opened by simoninithomas 11
  • Each mpi worker holds an individual replay buffer in HER?

    Each mpi worker holds an individual replay buffer in HER?

    For example, if I want to run the script python baselines/her/experiment/train.py --num_cpu 19 --env_name HandManipulateBlock-v0 --n_epochs 200 --replay_strategy future

    Will each mpi worker hold an individual replay buffer?

    Within each mpi worker, I calculated the sum of the actions in the replay buffer and found them to be different.

    opened by NoListen 0
  • The use of close() in shmem_vec_env

    The use of close() in shmem_vec_env

    Hi! I'm wondering about the close() in shmem_vec_env.py

    When initializing the Process, the code uses close() for parent_pipe

    https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/vec_env/shmem_vec_env.py#L120
    

    After initializing the Process, the code uses close() for child_pipe

    https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/vec_env/shmem_vec_env.py#L57
    

    So both child_pipe and parent_pipe call close()? How can we use the pipe to communicate later?

    Could you tell me why close() is used here?
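
    For context, this mirrors the standard multiprocessing.Pipe pattern: after the child process starts, each process holds its own copy of both connection objects, so each side closes the end it does not use, and its own end keeps working. A minimal standalone sketch of that pattern, independent of baselines:

    # Each process closes the end it does not use; close() only affects that
    # process's handle, so the other process's copy keeps working.
    from multiprocessing import Pipe, Process

    def worker(parent_pipe, child_pipe):
        parent_pipe.close()          # child does not use the parent end
        msg = child_pipe.recv()      # child's own end still works
        child_pipe.send(msg.upper())
        child_pipe.close()

    if __name__ == '__main__':
        parent_pipe, child_pipe = Pipe()
        proc = Process(target=worker, args=(parent_pipe, child_pipe))
        proc.start()
        child_pipe.close()           # parent does not use the child end
        parent_pipe.send('hello')
        print(parent_pipe.recv())    # -> 'HELLO'
        proc.join()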

    opened by dwyzzy 0
  • Possible bug in gradient clipping of deepq_learner (tf2 branch)

    Possible bug in gradient clipping of deepq_learner (tf2 branch)

    https://github.com/openai/baselines/blob/b99a73afe37206775ac8b884d32a36e213a3fac2/baselines/deepq/deepq_learner.py#L174-L181

    In line 179, shouldn't it be: grads = clipped_grads instead of clipped_grads = grads ?
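
    For reference, the usual TF2 pattern is to keep the clipped list and pass that to apply_gradients, roughly as in this generic sketch (an illustration of the pattern, not the baselines code itself):

    # Generic TF2 gradient-clipping sketch: the clipped gradients are what
    # must be passed to apply_gradients.
    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(1e-4)
    grad_clip = 10.0

    def train_step(model, loss_fn, inputs, targets):
        with tf.GradientTape() as tape:
            loss = loss_fn(targets, model(inputs))
        grads = tape.gradient(loss, model.trainable_variables)
        if grad_clip is not None:
            clipped_grads, _ = tf.clip_by_global_norm(grads, grad_clip)
            grads = clipped_grads   # keep the clipped list, as the issue suggests
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss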

    opened by Giullar 0
  • 'OPENAI_LOGDIR' is not recognized as an internal or external command, operable program or batch file.

    'OPENAI_LOGDIR' is not recognized as an internal or external command, operable program or batch file.

    When I try to assign a path to OPENAI_LOGDIR, this error shows up, and only on my PC; on AWS it works well. I have installed baselines and reinstalled Anaconda, and I have also assigned a value to OPENAI_LOGDIR in my environment variables, but that did not work. Can someone help me with this, please?
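
    One workaround that avoids shell-specific environment-variable syntax is to set the log directory from Python before training starts. A minimal sketch, assuming the baselines logger's configure(dir=...) helper as on the master branch:

    # Set the log directory without touching shell environment variables.
    import os
    from baselines import logger

    # Either export the variable from Python before baselines reads it...
    os.environ['OPENAI_LOGDIR'] = os.path.expanduser('~/logs/Pong')
    # ...or configure the logger directly with an explicit directory.
    logger.configure(dir=os.path.expanduser('~/logs/Pong'))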

    opened by JamesL404 0
  • ValueError: too many values to unpack (expected 2)

    ValueError: too many values to unpack (expected 2)

    When I run the following code to test the atari_wrappers module:

    env=atari_wrappers.wrap_deepmind(
        atari_wrappers.make_atari(env_id='PongNoFrameskip-v4'),  # PongNoFrameskip-v4
        clip_rewards=False,
        frame_stack=True,
        scale=False,
    )
    env.reset()
    

    I got a ValueError:

    A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
    [Powered by Stella]
    Traceback (most recent call last):
      File "C:/Users/HNXCD/Desktop/adaptive-transformers-in-rl-master/Model/test.py", line 18, in <module>
        env.reset()
      File "C:\Users\HNXCD\Desktop\adaptive-transformers-in-rl-master\Model\atari_wrapper2.py", line 205, in reset
        ob = self.env.reset()
      File "D:\app\Anaconda\envs\gym_env\lib\site-packages\gym\core.py", line 379, in reset
        obs, info = self.env.reset(**kwargs)
    ValueError: too many values to unpack (expected 2)
    

    How can I fix this?
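
    This looks like the newer Gym API (gym >= 0.26), where reset() returns an (obs, info) tuple instead of just the observation, while these old-style wrappers expect only the observation. A minimal, hedged sketch of a compatibility shim (step() changed similarly and may need the same treatment):

    # Compatibility shim: make reset() return only the observation, as the
    # old-style wrappers expect. Apply it to the raw env before the wrappers.
    import gym

    class OldAPIReset(gym.Wrapper):
        def reset(self, **kwargs):
            result = self.env.reset(**kwargs)
            if isinstance(result, tuple) and len(result) == 2:
                obs, _info = result   # newer gym returns (obs, info)
                return obs
            return result             # older gym returns obs only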

    opened by hydro-man 0