OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Overview

Status: Maintenance (expect bug fixes and minor updates)

Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Prerequisites

Baselines requires python3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI and zlib. These can be installed as follows:

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

To keep packages from different projects from interfering with each other, it is a good idea to use virtual environments (virtualenvs). You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that contain copies of the python executable and all python packages. To create a virtualenv called venv with python3, run

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate

A more thorough tutorial on virtualenvs and their options can be found here

Tensorflow versions

The master branch supports TensorFlow from version 1.4 to 1.14. For TensorFlow 2.0 support, please use the tf2 branch.

Installation

  • Clone the repo and cd into it:

    git clone https://github.com/openai/baselines.git
    cd baselines
  • If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use

    pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers

    or

    pip install tensorflow==1.14

    to install TensorFlow 1.14, which is the latest version of TensorFlow supported by the master branch. Refer to the TensorFlow installation guide for more details.

  • Install baselines package

    pip install -e .

MuJoCo

Some of the baselines examples use the MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here

Testing the installation

All unit tests in baselines can be run using the pytest runner:

pip install pytest
pytest

Training models

Most of the algorithms in the baselines repo are used as follows:

python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]

Example 1. PPO with MuJoCo Humanoid

For instance, to train a fully-connected network controlling the MuJoCo humanoid using PPO2 for 20M timesteps:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7

Note that for MuJoCo environments the fully-connected network is the default, so we can omit --network=mlp. The hyperparameters for both the network and the learning algorithm can be controlled via the command line, for instance:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy

will set the entropy coefficient to 0.1, construct a fully-connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same).

See the docstrings in common/models.py for a description of the network parameters for each type of model, and the docstring of learn() in baselines/ppo2/ppo2.py for a description of the ppo2 hyperparameters.
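
The same training run can also be launched from Python. The sketch below is a minimal, hedged example of the programmatic path, assuming the master-branch helpers make_vec_env and ppo2.learn() (keyword names may differ between versions); it mirrors the command-line example above.

# Minimal sketch of calling ppo2 from Python instead of via baselines.run.
# Assumes the master-branch API (make_vec_env, ppo2.learn); argument names
# may differ between versions.
from baselines.common.cmd_util import make_vec_env
from baselines.ppo2 import ppo2

def main():
    # Vectorized MuJoCo environment; note that baselines.run additionally
    # wraps MuJoCo envs in VecNormalize (see the note in the saving section).
    env = make_vec_env('Humanoid-v2', 'mujoco', num_env=1, seed=0)
    model = ppo2.learn(
        network='mlp',          # same as --network=mlp
        env=env,
        total_timesteps=int(2e7),
        ent_coef=0.1,           # same as --ent_coef=0.1
        num_hidden=32,          # forwarded to the mlp network builder
        num_layers=3,
        value_network='copy',   # separate value network with the same structure
    )
    return model

if __name__ == '__main__':
    main()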

Example 2. DQN on Atari

DQN with Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:

python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
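
deepq can also be called directly from Python for simpler environments. A minimal sketch, modelled on the train_cartpole example shipped with the repo (keyword argument names may differ between versions):

# Minimal sketch of using deepq from Python, based on the repo's
# train_cartpole example; keyword names may differ between versions.
import gym
from baselines import deepq

def main():
    env = gym.make('CartPole-v0')
    act = deepq.learn(
        env,
        network='mlp',            # fully-connected Q-network
        lr=1e-3,
        total_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
    )
    print('Saving model to cartpole_model.pkl')
    act.save('cartpole_model.pkl')

if __name__ == '__main__':
    main()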

Saving, loading and visualizing models

Saving and loading the model

The algorithms' serialization API is not properly unified yet; however, there is a simple method to save and restore trained models. The --load_path and --save_path command-line options load the TensorFlow state from a given path before training and save it after training, respectively. Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learned.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2

This should get the mean reward per episode to about 20. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize it:

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play

NOTE: MuJoCo environments require normalization to work properly, so we wrap them with a VecNormalize wrapper. Currently, to ensure that models are saved with normalization (so that trained models can be restored and run without further training), the normalization coefficients are saved as TensorFlow variables. This can decrease performance somewhat, so if you need high-throughput steps with MuJoCo and do not need to save and restore models, it may make sense to use numpy normalization instead. To do that, set use_tf=False in baselines/run.py.
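
For reference, a minimal sketch of wrapping a MuJoCo environment with normalization yourself, assuming the VecNormalize wrapper exported from baselines.common.vec_env and its use_tf flag (both as on the master branch; names may differ between versions):

# Minimal sketch: wrap a vectorized MuJoCo env with observation/return
# normalization. use_tf=False keeps the running statistics in numpy,
# which is faster but means they are not stored in the TF checkpoint.
from baselines.common.cmd_util import make_vec_env
from baselines.common.vec_env import VecNormalize

venv = make_vec_env('Humanoid-v2', 'mujoco', num_env=1, seed=0)
env = VecNormalize(venv, use_tf=False)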

Logging and visualizing learning curves and other training metrics

By default, all summary data, including progress and standard output, is saved to a unique directory in a temp folder, as returned by a call to Python's tempfile.gettempdir(). The directory can be changed with the --log_path command-line option.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2 --log_path=~/logs/Pong/

NOTE: Please be aware that the logger will overwrite files of the same name in an existing directory, so it's recommended that folder names be given a unique timestamp to prevent logs from being overwritten.

The log directory can also be changed through the $OPENAI_LOGDIR environment variable.

For examples on how to load and display the training data, see here.
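
For a quick look at a learning curve without the full plotting utilities, something like the sketch below works, assuming the logger wrote a progress.csv file into the log directory (it does with the default output formats; column names vary by algorithm):

# Minimal sketch of inspecting the metrics written by the baselines logger;
# assumes --log_path (or $OPENAI_LOGDIR) pointed at ~/logs/Pong/ and that a
# progress.csv was produced. Column names vary by algorithm, so inspect them first.
import pandas as pd
import matplotlib.pyplot as plt

progress = pd.read_csv('~/logs/Pong/progress.csv')
print(progress.columns)   # see which metrics are available

# 'misc/total_timesteps' and 'eprewmean' are the ppo2 names on recent master;
# adjust to whatever the printed columns show.
progress.plot(x='misc/total_timesteps', y='eprewmean')
plt.show()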

Subpackages

Benchmarks

Results of benchmarks on MuJoCo (1M timesteps) and Atari (10M timesteps) are available here for MuJoCo and here for Atari, respectively. Note that these results may not be from the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page.

To cite this repository in publications:

@misc{baselines,
  author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter},
  title = {OpenAI Baselines},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/openai/baselines}},
}
Comments
  • Error when restoring model to run enjoy.py

    Error when restoring model to run enjoy.py

    Hi,

    I was running these two commands:

    python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-duel-breakout-1 --model-dir /tmp/models
    python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling
    

    at the bottom of the README.

    However, I got the following error:

    InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
             [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3/_1)]]
    
    opened by yenchenlin 24
  • Unable to reproduce HER results from Plappert et al., 2018

    Unable to reproduce HER results from Plappert et al., 2018

    I ran the code baselines.her.experiment.train for five different seeds and generated plots with some minor changes to the plotting code (see issue #311) for HandManipulateBlockRotateXYZ-v0 across 200 epochs. I verified from the .json file that the 'scope' was 'DDPG' and that the rewards were sparse.

    Unfortunately, the performance of the algorithm was substantially worse than reported in the paper (above are my results and below are the results from Plappert et al., 2018, https://arxiv.org/abs/1802.09464):

    [plots: my HandManipulateBlockRotateXYZ-v0 results (above) vs. Plappert et al., 2018 (below)]

    Any idea why the results are so much worse? Further, this issue seems to apply to multiple other cases including 'FetchSlide' and 'HandManipulateBlockRotateZ' although I have only run one seed for these.

    In case it's useful, I also copy the details of the params.json file:

    {"pi_lr": 0.001, "network_class": "baselines.her.actor_critic:ActorCritic", "norm_clip": 5, "polyak": 0.95, "scope": "ddpg", "n_cycles": 50, "random_eps": 0.3, "env_name": "HandManipulateBlockRotateXYZ-v0", "rollout_batch_size": 2, "n_batches": 40, "layers": 3, "buffer_size": 1000000, "action_l2": 1.0, "Q_lr": 0.001, "clip_obs": 200.0, "hidden": 256, "test_with_polyak": false, "batch_size": 256, "noise_eps": 0.2, "n_test_rollouts": 10, "relative_goals": false, "norm_eps": 0.01, "max_u": 1.0, "replay_strategy": "future", "replay_k": 4}

    And of course, I'd like to thank OpenAI for making their code and environments available; it's really helpful to independent researchers!

    opened by sanjeevanahilan 18
  • Could you explain how to execute PPO and TRPO?

    Could you explain how to execute PPO and TRPO?

    There is a readme explaining the whole process for executing the deepq algorithm.

    However, there is no such thing for PPO and TRPO....

    Could you please explain how to execute PPO and TRPO?
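
    For reference, with the unified baselines.run entry point described in the README above, PPO and TRPO are launched the same way as the other algorithms (trpo_mpi additionally requires an MPI installation), for example:

    python -m baselines.run --alg=ppo2 --env=Hopper-v2 --num_timesteps=1e6
    python -m baselines.run --alg=trpo_mpi --env=Hopper-v2 --num_timesteps=1e6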

    opened by wonchul-kim 18
  • Running into issues on example execution

    Running into issues on example execution

    Get this error when I run the first example python3 -m baselines.deepq.experiments.train_cartpole:

    /usr/bin/python3: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')
    

    I have both Python 2 and 3 installed. Thus I installed baselines with pip3. Any suggestions?

    opened by ampnirvana 14
  • Deobfuscation of the code base + pep8 and fixes

    Deobfuscation of the code base + pep8 and fixes

    • Fixed tf.session().__enter__() being used, rather than sess = tf.session() and passing the session to the objects
    • Fixed uneven scoping of TensorFlow Sessions throughout the code
    • Fixed rolling vecwrapper to handle observations that are not only grayscale images
    • Fixed deepq saving the environment when trying to save itself
    • Fixed ValueError: Cannot take the length of Shape with unknown rank. in acktr, when running run_atari.py script.
    • Fixed graph conflicts when calling baselines sequentially
    • Fixed mean on empty array warning with deepq
    • Fixed kfac eigen decomposition not cast to float64, when the parameter use_float64 is set to True
    • Fixed Dataset data loader, not correctly resetting id position if shuffling is disabled
    • Fixed EOFError when reading from connection in the worker in subproc_vec_env.py
    • Fixed behavior_clone weight loading and saving for GAIL
    • Avoid taking the square root of a negative number in trpo_mpi.py
    • Removed some duplicated code (a2cpolicy, trpo_mpi)
    • Removed unused, undocumented and crashing function reset_task in subproc_vec_env.py
    • Reformatted code to PEP8 style
    • Documented all the codebase
    • Added atari tests
    • Added logger tests

    Missing: tests for acktr continuous (+ HER, gail but they rely on mujoco...)

    opened by hill-a 12
  • Support for Fetch environments?

    Support for Fetch environments?

    It seems like baselines is not directly implemented to deal with Box() type action spaces. This same exact code works for the CartPole environment. It fails on FetchReach-v1. Here is the code:

    import gym
    from baselines import deepq
    
    
    def callback(lcl, _glb):
        # stop training if reward exceeds 199
        is_solved = lcl['t'] > 100 and sum(lcl['episode_rewards'][-101:-1]) / 100 >= 199
        return is_solved
    
    
    def main():
        env = gym.make("FetchReach-v1")
        model = deepq.models.mlp([64])
        act = deepq.learn(
            env,
            q_func=model,
            lr=1e-3,
            max_timesteps=100000,
            buffer_size=50000,
            exploration_fraction=0.1,
            exploration_final_eps=0.02,
            print_freq=10,
            callback=callback
        )
        print("Saving model to Fetch_model.pkl")
        act.save("Fetch_model.pkl")
    
    
    if __name__ == '__main__':
        main()
    

    When I try to use the same deepq algorithm trained on CartPole, with a discrete action space, on FetchReach-v1, I get the following:

    File "train_FetchReach.py", line 31, in <module>
        main()
      File "train_FetchReach.py", line 24, in main
        callback=callback
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 180, in learn
        num_actions=env.action_space.n,
    AttributeError: 'Box' object has no attribute 'n'
    

    I tried adding

    env.action_space.n = len(env.action_space.sample())

    but that just led to more errors:

    Traceback (most recent call last):
      File "train_FetchReach.py", line 32, in <module>
        main()
      File "train_FetchReach.py", line 25, in main
        callback=callback
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 184, in learn
        param_noise=param_noise
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/build_graph.py", line 376, in build_train
        act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/build_graph.py", line 177, in build_act
        observations_ph = make_obs_ph("observation")
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/simple.py", line 175, in make_obs_ph
        return BatchInput(observation_space_shape, name=name)
      File "/home/jeremy/.local/share/virtualenvs/cgw-i4TbRcn4/lib/python3.6/site-packages/baselines/deepq/utils.py", line 66, in __init__
        super().__init__(tf.placeholder(dtype, [None] + list(shape), name=name))
    TypeError: 'NoneType' object is not iterable
    
    opened by jeremyf21 12
  • NotImplementedError when executing Pong example

    NotImplementedError when executing Pong example

    Hey guys, first of all thanks a lot for this project. It might become handy during my studies :)

    I ran into an error while executing the example. I downloaded the pretrained model, but python3 -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling raised the following error:

    [2018-01-31 12:24:50,221] Making new env: PongNoFrameskip-v4
    Traceback (most recent call last):
      File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/username/keras-tf-p3/baselines/baselines/deepq/experiments/atari/enjoy.py", line 70, in <module>
        play(env, act, args.stochastic, args.video)
      File "/home/username/keras-tf-p3/baselines/baselines/deepq/experiments/atari/enjoy.py", line 43, in play
        obs = env.reset()
      File "/home/username/keras-tf-p3/baselines/baselines/common/atari_wrappers.py", line 167, in reset
        ob = self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 283, in _reset
        return self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 310, in _reset
        observation = self.env.reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 104, in reset
        return self._reset()
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 311, in _reset
        return self._observation(observation)
      File "/home/username/keras-tf-p3/lib/python3.4/site-packages/gym/core.py", line 321, in _observation
        raise NotImplementedError
    NotImplementedError
    

    The cartpole example however works fine (python -m baselines.deepq.experiments.train_cartpole)

    I found this issue in gym (https://github.com/openai/gym/issues/775) with the same error message, but mine seems to occur in reset() and not in render(). Downgrading to pyglet version 1.2.4 as suggested did not change anything.

    Does someone know a solution to this?

    opened by shakenes 12
  • NaN values in acktr

    NaN values in acktr

    Hi everyone, I'm trying to use continuous acktr to learn to reach a target with a mujoco simulation of the jaco arm. I use exactly the same hyperparameters as for the reacher env and acktr definitely learns something meaningful, the reward goes up and I can also see it when I render the frames.

    The problem is that after some 2000-3000 iterations, the algorithm starts to produce NaN values.

    The log at the time when it starts to happen looks as follows:

    
    Iteration 3025
    kl just right!
    
    | EVAfter   | 0.984      |
    | EVBefore  | 0.976      |
    | EpLenMean | 200        |
    | EpRewMean | -8.5       |
    | EpRewSEM  | 0.82       |
    | KL        | 0.00148061 |
    
    Iteration 3026 
    kl too low
    
    | EVAfter   | 0.984       |
    | EVBefore  | 0.98        |
    | EpLenMean | 200         |
    | EpRewMean | -7.31       |
    | EpRewSEM  | 0.613       |
    | KL        | 0.000913428 |
    
    Iteration 3027
    kl just right!
    
    | EVAfter   | 0.98     |
    | EVBefore  | 0.976    |
    | EpLenMean | 200      |
    | EpRewMean | -8.92    |
    | EpRewSEM  | 0.937    |
    | KL        | nan      |
    

    Then of course the NaNs start to spread and everything becomes NaN. Does anyone have an idea what could cause such behaviour and what to do about it?

    opened by lukashermann 12
  • RNN support for PPO2

    RNN support for PPO2

    This is RNN support for PPO2 related to #294 #340 #525 issues.

    • Improve the replay buffer management.
    • RNN support for value/policy functions (i.e, copy options)
    • Improve visualizations in the Tensorboard graph.

    I'm trying to make minimal code changes. This, however, involves quite a few changes originating from both the replay buffer and the RNN implementation.

    Because the original replay buffers are created individually for each column (e.g. actions, observations, states, dones, ...), it is not easy to append a new column. The replay buffer has been refactored to use dictionaries, so additional experiments can be implemented easily. In the case of this PR, I had to refactor the replay buffer to save the memory state of the RNN value/policy networks.

    The original RNN code is confusing to me. When sampling an RNN memory state, the number of stored memory states is not equal to the number of steps, i.e. it was [1 x num_lstm]. I think the original RNNs in baselines.common.models are not designed for step-by-step sampling, but for stacking RNNs as is usual in NLP or image processing. I have copied and modified the RNN code accordingly.

    When using an RNN model, it is quite difficult to manage the RNN memory. It has been moved inside the model, which reduces the effort of managing it: for example, actions, _, _, _ = model.step(obs, done) instead of actions, _, state, _ = model.step(obs, S=state, M=done).

    Lastly, some tf.variable_scopes and tf.name_scopes were added for Tensorboard graph visualization. They might be helpful for studying and debugging the code.

    Thanks!

    Example

    • python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --network=ppo_cnn_lstm runs Atari Pong with the ppo_cnn_lstm network.
    • python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6 --network=ppo_lstm --value_network=copy runs the MuJoCo Ant environment with the ppo_lstm network, whose value and policy networks are separate but have the same structure.

    Simple Benchmark on HalfCheetah-v2

    PR vs Original

    The original MLP and the PR-version MLP/LSTM perform almost the same. The original LSTM performs poorly. [benchmark plot]

    Networks of PR

    The PR-version RNNs also perform well when used together with other FC layers. [benchmark plot]

    All Tested

    [benchmark plot]

    Params

    value_network: 'copy' (except the original lstm, which only supports shared)
    nsteps: 128
    num_env: 16
    nminibatches: 16
    noptepochs: 10
    lr: 0.0003
    num_timesteps: 2000000

    Other parameters are set to PPO2 default.

    opened by gyunt 11
  • What is the version of mujoco and gym that is required to run a baseline code?

    What is the version of mujoco and gym that is required to run a baseline code?

    I currently have gym==0.9.3 and mujoco-py==0.5.7. I get an error when I run the PPO code. Below is the error I get:

    from gym.wrappers import FlattenDictWrapper
    ImportError: cannot import name 'FlattenDictWrapper'

    opened by NishanthVAnand 11
  • Wrote some comments to explain the A2C and PPO2 implementation

    Wrote some comments to explain the A2C and PPO2 implementation

    Hello,

    1. I've added some comments to the A2C implementation to help users better understand it. I've tried to be succinct. I think this can help readers who want to modify the implementations.

    2. Also, I've modified the readme file to briefly explain what each file does.

    Have a great day,

    opened by simoninithomas 11
  • Each mpi worker holds an individual replay buffer in HER?

    Each mpi worker holds an individual replay buffer in HER?

    For example, if I want to run the script python baselines/her/experiment/train.py --num_cpu 19 --env_name HandManipulateBlock-v0 --n_epochs 200 --replay_strategy future

    Will each mpi worker hold an individual replay buffer?

    Within each mpi worker, I calculated the sum of the actions in the replay buffer and found them to be different.

    opened by NoListen 0
  • The use of close() in shmem_vec_env

    The use of close() in shmem_vec_env

    Hi! I'm wondering about the close() in shmem_vec_env.py

    When initializing the Process, the code uses close() for parent_pipe

    https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/vec_env/shmem_vec_env.py#L120
    

    After initializing the Process, the code uses close() for child_pipe

    https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/vec_env/shmem_vec_env.py#L57
    

    So both child_pipe and parent_pipe call close()? How can we use the pipe to communicate later?

    Could you tell me why close() is used here?
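
    For context, this mirrors the standard multiprocessing.Pipe pattern: after the child process starts, each process holds its own copy of both connection objects, so each side closes the end it does not use, and its own end keeps working. A minimal standalone sketch of that pattern, independent of baselines:

    # Each process closes the end it does not use; close() only affects that
    # process's handle, so the other process's copy keeps working.
    from multiprocessing import Pipe, Process

    def worker(parent_pipe, child_pipe):
        parent_pipe.close()          # child does not use the parent end
        msg = child_pipe.recv()      # child's own end still works
        child_pipe.send(msg.upper())
        child_pipe.close()

    if __name__ == '__main__':
        parent_pipe, child_pipe = Pipe()
        proc = Process(target=worker, args=(parent_pipe, child_pipe))
        proc.start()
        child_pipe.close()           # parent does not use the child end
        parent_pipe.send('hello')
        print(parent_pipe.recv())    # -> 'HELLO'
        proc.join()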

    opened by dwyzzy 0
  • Possible bug in gradient clipping of deepq_learner (tf2 branch)

    Possible bug in gradient clipping of deepq_learner (tf2 branch)

    https://github.com/openai/baselines/blob/b99a73afe37206775ac8b884d32a36e213a3fac2/baselines/deepq/deepq_learner.py#L174-L181

    In line 179, shouldn't it be: grads = clipped_grads instead of clipped_grads = grads ?
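
    For reference, the usual TF2 pattern is to keep the clipped list and pass that to apply_gradients, roughly as in this generic sketch (an illustration of the pattern, not the baselines code itself):

    # Generic TF2 gradient-clipping sketch: the clipped gradients are what
    # must be passed to apply_gradients.
    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(1e-4)
    grad_clip = 10.0

    def train_step(model, loss_fn, inputs, targets):
        with tf.GradientTape() as tape:
            loss = loss_fn(targets, model(inputs))
        grads = tape.gradient(loss, model.trainable_variables)
        if grad_clip is not None:
            clipped_grads, _ = tf.clip_by_global_norm(grads, grad_clip)
            grads = clipped_grads   # keep the clipped list, as the issue suggests
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss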

    opened by Giullar 0
  • 'OPENAI_LOGDIR' is not recognized as an internal or external command, operable program or batch file.

    'OPENAI_LOGDIR' is not recognized as an internal or external command, operable program or batch file.

    When I try to assign a path to OPENAI_LOGDIR, this error shows up, and only on my PC; on AWS it works well. I have installed baselines and reinstalled Anaconda, and I have also assigned a value to OPENAI_LOGDIR in my environment variables, but that did not work. Can someone help me with this, please?
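
    One workaround that avoids shell-specific environment-variable syntax is to set the log directory from Python before training starts. A minimal sketch, assuming the baselines logger's configure(dir=...) helper as on the master branch:

    # Set the log directory without touching shell environment variables.
    import os
    from baselines import logger

    # Either export the variable from Python before baselines reads it...
    os.environ['OPENAI_LOGDIR'] = os.path.expanduser('~/logs/Pong')
    # ...or configure the logger directly with an explicit directory.
    logger.configure(dir=os.path.expanduser('~/logs/Pong'))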

    opened by JamesL404 0
  • ValueError: too many values to unpack (expected 2)

    ValueError: too many values to unpack (expected 2)

    When I run the following code to test the atari_wrappers module:

    env=atari_wrappers.wrap_deepmind(
        atari_wrappers.make_atari(env_id='PongNoFrameskip-v4'),  # PongNoFrameskip-v4
        clip_rewards=False,
        frame_stack=True,
        scale=False,
    )
    env.reset()
    

    I got a ValueError:

    A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
    [Powered by Stella]
    Traceback (most recent call last):
      File "C:/Users/HNXCD/Desktop/adaptive-transformers-in-rl-master/Model/test.py", line 18, in <module>
        env.reset()
      File "C:\Users\HNXCD\Desktop\adaptive-transformers-in-rl-master\Model\atari_wrapper2.py", line 205, in reset
        ob = self.env.reset()
      File "D:\app\Anaconda\envs\gym_env\lib\site-packages\gym\core.py", line 379, in reset
        obs, info = self.env.reset(**kwargs)
    ValueError: too many values to unpack (expected 2)
    

    How can I fix this?
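
    This looks like the newer Gym API (gym >= 0.26), where reset() returns an (obs, info) tuple instead of just the observation, while these old-style wrappers expect only the observation. A minimal, hedged sketch of a compatibility shim (step() changed similarly and may need the same treatment):

    # Compatibility shim: make reset() return only the observation, as the
    # old-style wrappers expect. Apply it to the raw env before the wrappers.
    import gym

    class OldAPIReset(gym.Wrapper):
        def reset(self, **kwargs):
            result = self.env.reset(**kwargs)
            if isinstance(result, tuple) and len(result) == 2:
                obs, _info = result   # newer gym returns (obs, info)
                return obs
            return result             # older gym returns obs only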

    opened by hydro-man 0