Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

Intel Labs

Last update: Jan 5, 2023

Related tags

Reinforcement Learning reinforcement-learning deep-learning mxnet tensorflow openai-gym rl starcraft imitation-learning hierarchical-reinforcement-learning coach mujoco starcraft2 onnx roboschool carla starcraft2-ai distributed-reinforcement-learning

Overview

Coach

Coach is a Python reinforcement learning framework containing implementations of many state-of-the-art algorithms.

It exposes a set of easy-to-use APIs for experimenting with new RL algorithms, and allows simple integration of new environments to solve. Basic RL components (algorithms, environments, neural network architectures, exploration policies, ...) are well decoupled, so that extending and reusing existing components is fairly painless.

Training an agent to solve an environment is as easy as running:

coach -p CartPole_DQN -r

Benchmarks
Installation
Getting Started
Supported Environments
Supported Algorithms
Citation
Contact
Disclaimer

Benchmarks

One of the main challenges when building a research project, or a solution based on a published algorithm, is getting a concrete and reliable baseline that reproduces the algorithm's results, as reported by its authors. To address this problem, we are releasing a set of benchmarks that shows Coach reliably reproduces many state of the art algorithm results.

Installation

Note: Coach has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.

For some information on installing on Ubuntu 17.10 with Python 3.6.3, please refer to the following issue: https://github.com/IntelLabs/coach/issues/54

Installing Coach requires a few prerequisites. The following commands set up the basics needed to run Coach on top of OpenAI Gym environments:

# General
sudo -E apt-get install python3-pip cmake zlib1g-dev python3-tk python-opencv -y

# Boost libraries
sudo -E apt-get install libboost-all-dev -y

# Scipy requirements
sudo -E apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran -y

# PyGame
sudo -E apt-get install libsdl-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
libsmpeg-dev libportmidi-dev libavformat-dev libswscale-dev -y

# Dashboard
sudo -E apt-get install dpkg-dev build-essential python3.5-dev libjpeg-dev libtiff-dev libsdl1.2-dev libnotify-dev \
freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libwebkitgtk-3.0-dev \
libgstreamer-plugins-base1.0-dev -y

# Gym
sudo -E apt-get install libav-tools libsdl2-dev swig cmake -y

We recommend installing coach in a virtualenv:

sudo -E pip3 install virtualenv
virtualenv -p python3 coach_env
. coach_env/bin/activate

Finally, install coach using pip:

pip3 install rl_coach

Or alternatively, for a development environment, install coach from the cloned repository:

cd coach
pip3 install -e .

If a GPU is present, Coach's pip package will install tensorflow-gpu by default. If a GPU is not present, an Intel-optimized TensorFlow will be installed.

In addition to OpenAI Gym, several other environments were tested and are supported. Please follow the instructions in the Supported Environments section below in order to install more environments.

Getting Started

Tutorials and Documentation

Jupyter notebooks demonstrating how to run Coach from command line or as a library, implement an algorithm, or integrate an environment.

Framework documentation, algorithm description and instructions on how to contribute a new agent/environment.

Basic Usage

Running Coach

To allow reproducing results in Coach, we defined a mechanism called preset. There are several available presets under the presets directory. To list all the available presets use the -l flag.

To run a preset, use:

coach -r -p <preset_name>

For example:

CartPole environment using Policy Gradients (PG):
```
coach -r -p CartPole_PG
```
Basic level of Doom using Dueling network and Double DQN (DDQN) algorithm:
```
coach -r -p Doom_Basic_Dueling_DDQN
```

Some presets apply to a group of environment levels, such as the entire Atari or Mujoco suites. To use these presets, the requested level should be defined using the -lvl flag.

For example:

Pong using the Neural Episodic Control (NEC) algorithm:
```
coach -r -p Atari_NEC -lvl pong
```

There are several types of agents that can benefit from running them in a distributed fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment but updates a shared network, which improves the data collection speed and the stability of the learning process. To specify the number of workers to run, use the -n flag.

For example:

Breakout using Asynchronous Advantage Actor-Critic (A3C) with 8 workers:
```
coach -r -p Atari_A3C -lvl breakout -n 8
```
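The idea of several workers updating one shared set of parameters can be illustrated with a toy sketch. This is not Coach's implementation: `SharedParameters`, `worker`, and `run_workers` are hypothetical names, the "network" is a single weight, and each worker's "environment" is just a noisy target drawn from its own random generator.

```python
import random
import threading


class SharedParameters:
    """Toy stand-in for a shared network: one weight guarded by a lock."""

    def __init__(self):
        self.weight = 0.0
        self.updates = 0
        self._lock = threading.Lock()

    def update_towards(self, target, learning_rate=0.1):
        # Gradient of 0.5 * (weight - target) ** 2 with respect to weight.
        with self._lock:
            gradient = self.weight - target
            self.weight -= learning_rate * gradient
            self.updates += 1


def worker(shared, seed, steps, target=1.0):
    # Each worker owns its own random generator, standing in for
    # its own copy of the environment.
    rng = random.Random(seed)
    for _ in range(steps):
        noisy_target = target + rng.uniform(-0.1, 0.1)
        shared.update_towards(noisy_target)


def run_workers(num_workers=8, steps=200):
    shared = SharedParameters()
    threads = [
        threading.Thread(target=worker, args=(shared, seed, steps))
        for seed in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared


if __name__ == "__main__":
    result = run_workers()
    print(result.updates, round(result.weight, 3))
```

All workers contribute updates to the same weight, so the shared parameters see experience from every worker's environment copy.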

It is easy to create new presets for different levels or environments by following the same pattern as in presets.py.

More usage examples can be found here.

Running Coach Dashboard (Visualization)

Training an agent to solve an environment can be tricky, at times.

To help debug the training process, Coach outputs several signals per trained algorithm, which can be used to track algorithmic performance.

While Coach trains an agent, a csv file containing the relevant training signals will be saved to the 'experiments' directory. Coach's dashboard can then be used to dynamically visualize the training signals, and track algorithmic behavior.

To use it, run:

dashboard
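The CSV files can also be inspected without the dashboard. The following is a minimal sketch, assuming only that the file has a header row; the file path and the column name `Training Reward` are hypothetical and should be replaced with whatever appears in your own experiment's CSV.

```python
import csv


def load_signal(csv_path, column):
    """Read one numeric column from a CSV file, skipping empty cells."""
    values = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            cell = (row.get(column) or "").strip()
            if cell:
                values.append(float(cell))
    return values


def moving_average(values, window):
    """Smooth a signal with a trailing moving average."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    running_sum = 0.0
    for index, value in enumerate(values):
        running_sum += value
        if index >= window:
            running_sum -= values[index - window]
        averaged.append(running_sum / min(index + 1, window))
    return averaged


if __name__ == "__main__":
    # Hypothetical path and column name, for illustration only.
    rewards = load_signal("experiments/test/worker_0.csv", "Training Reward")
    print(moving_average(rewards, window=10)[-5:])
```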

Distributed Multi-Node Coach

As of release 0.11.0, Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11.0 this was tested on the ClippedPPO and DQN agents. For usage instructions please refer to the documentation here.

Batch Reinforcement Learning

Training and evaluating an agent from a dataset of experience, where no simulator is available, is supported in Coach. There are example presets and a tutorial.

Supported Environments

OpenAI Gym:

Installed by default by Coach's installer
ViZDoom:

Follow the instructions described in the ViZDoom repository -

https://github.com/mwydmuch/ViZDoom

Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory.
Roboschool:

Follow the instructions described in the roboschool repository -

https://github.com/openai/roboschool
GymExtensions:

Follow the instructions described in the GymExtensions repository -

https://github.com/Breakend/gym-extensions

Additionally, add the installation directory to the PYTHONPATH environment variable.
PyBullet:

Follow the instructions described in the Quick Start Guide (basically just - 'pip install pybullet')
CARLA:

Download release 0.8.4 from the CARLA repository -

https://github.com/carla-simulator/carla/releases

Install the python client and dependencies from the release tarball:
```
pip3 install -r PythonClient/requirements.txt
pip3 install PythonClient
```
Create a new CARLA_ROOT environment variable pointing to CARLA's installation directory.

A simple CARLA settings file (CarlaSettings.ini) is supplied with Coach, and is located in the environments directory.
Starcraft:

Follow the instructions described in the PySC2 repository -

https://github.com/deepmind/pysc2
DeepMind Control Suite:

Follow the instructions described in the DeepMind Control Suite repository -

https://github.com/deepmind/dm_control
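
Several of the environments above rely on environment variables (VIZDOOM_ROOT, CARLA_ROOT). A small sketch for checking them before launching a preset is shown below; `check_env_dirs` is a hypothetical helper, not part of Coach.

```python
import os


def check_env_dirs(names, environ=None):
    """Report whether each variable is set and points to a directory."""
    environ = os.environ if environ is None else environ
    status = {}
    for name in names:
        value = environ.get(name)
        if not value:
            status[name] = "not set"
        elif not os.path.isdir(value):
            status[name] = "set, but not a directory"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_env_dirs(["VIZDOOM_ROOT", "CARLA_ROOT"]).items():
        print(f"{name}: {state}")
```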

Supported Algorithms

Memory Types

Exploration Techniques

E-Greedy (code)
Boltzmann (code)
Ornstein–Uhlenbeck process (code)
Normal Noise (code)
Truncated Normal Noise (code)
Bootstrapped Deep Q Network (code)
UCB Exploration via Q-Ensembles (UCB) (code)
Noisy Networks for Exploration (code)
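
As a rough illustration of the first two techniques in the list, the sketch below shows epsilon-greedy and Boltzmann action selection over a list of Q-values. It is a generic textbook version with hypothetical function names, not the code Coach uses.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])


def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q-values; higher temperature means more exploration."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    highest = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [weight / total for weight in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action from the Boltzmann distribution."""
    probabilities = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    print(epsilon_greedy(q, epsilon=0.1), boltzmann(q, temperature=0.5))
```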

Citation

If you used Coach for your work, please use the following citation:

@misc{caspi_itai_2017_1134899,
  author       = {Caspi, Itai and
                  Leibovich, Gal and
                  Novik, Gal and
                  Endrawis, Shadi},
  title        = {Reinforcement Learning Coach},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1134899},
  url          = {https://doi.org/10.5281/zenodo.1134899}
}

Contact

We'd be happy to get any questions or contributions through GitHub issues and PRs.

Please make sure to take a look here before filing an issue or proposing a PR.

The Coach development team can also be contacted over email.

Disclaimer

Coach is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and environments are planned to be added to the framework. Feedback and contributions from the open source and RL research communities are more than welcome.

Comments

invalid object?

I just tested running coach from source: 4fe9cba44508f258fc73286d6cbf0af4b1fdfa50

on both Ubuntu 14.04 and macOS High Sierra, and I get the exact same error:

python coach.py -p CartPole_DQN -r

/home/jtoy/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Warning: failed to import the following packages - RoboSchool, GymExtensions, ViZDoom, CARLA, Neon
Please enter an experiment name: test
Using tensorflow framework
Traceback (most recent call last):
  File "coach.py", line 275, in <module>
    env_instance = create_environment(tuning_parameters)
  File "/home/jtoy/sandbox/touchnet/related_projects/coach/environments/__init__.py", line 32, in create_environment
    env = eval(env_type)(tuning_parameters)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 267, in eval
    ret = eng_inst.evaluate()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 75, in evaluate
    res = self._evaluate()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
    return ne.evaluate(s, local_dict=scope, truediv=truediv)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 807, in evaluate
    zip(names, arguments)]
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 806, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 704, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

I installed all dependencies with pip install -r requirements_coach.txt

Is there an issue in master or am I missing something basic?

opened by jtoy 11

The network's configuration of CARLA_DDPG

Hi, I am trying to run the CARLA_DDPG preset in Coach. With the help of the documentation, I ran it successfully. However, I want to dig into the implementation of DDPG in Coach. I have reviewed the code of CARLA_DDPG.py and worked out the networks of both the actor and the critic, as in the picture below. Could someone help me check my understanding and give me some additional advice?

opened by fangchuan 10
Problems with PPO/ClippedPPO
Hey Guys,

I'm having trouble with the likelihood ratio and NaN/inf values. I'm using entropy regularization for exploration, and the entropy is getting quite low, so I think that at some point the distributions can't be compared anymore. My model learns until a certain point and then the NaNs happen. Adding a small epsilon in the ratio avoids the NaNs, but then the reward curve just drops at some point and the model stops learning. (The KL divergence also diverges.) I'm using my own environment and a feed-forward architecture, and the problem is continuous.

I've already tried many things:

Optimizers: Adam(with different epsilons), RMSProp

Reducing the LR (that just postpones the crash)

Reducing the clipping(0.1) and the epochs, clip the gradients

Changing the coefficients for the value loss, policy loss and entropy

Changed the weight initializers and the network sizes (~a bigger network postpones the problem)

Changed the activation function (relu, lrelu, selu, tanh)

If I change the beta coefficient for the entropy, I get either an ever-increasing entropy or one that falls until the crash happens. The agent learns pretty well until that point, so I suppose I haven't made an error in my implementation. I may have made an error in how much I changed the parameters.

Any tips or ideas on that?
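
For reference, one commonly used way to keep the likelihood ratio finite is to form it in log space and clamp the log-ratio before exponentiating. The sketch below shows this for a single sample of the clipped surrogate objective. It is a generic illustration, not Coach's ClippedPPO code, and the `max_log_ratio` bound is an assumed value; it is not claimed to resolve the behaviour described above.

```python
import math


def clipped_surrogate(log_prob_new, log_prob_old, advantage,
                      clip_range=0.2, max_log_ratio=20.0):
    """Per-sample PPO clipped objective with the ratio built in log space."""
    # Clamp the log-ratio so exp() cannot overflow (assumed bound).
    log_ratio = log_prob_new - log_prob_old
    log_ratio = max(-max_log_ratio, min(max_log_ratio, log_ratio))
    ratio = math.exp(log_ratio)
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    # PPO takes the pessimistic (smaller) of the two surrogate terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(clipped_surrogate(-1.0, -1.2, advantage=0.5))
```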
opened by KuenstlicheIntelligenz 10
Further improvement of using trained agents in production
Hey all,

After using rl_coach for some days now, I have trained some models that seem promising. Related directly to issue #71 I have tried to do this with tensorflow. As suggested in the referenced issue, TF Serving can be used to accomplish it. However, on my side I don't need to go online (and I think that many users won't need it also) so something like:

Loading graph > Loading weights/parameters > Performing the operation (i.e. some kind of .act(), or just running the op with .run() in a tf.Session) would be sufficient.

I have tried this path using the different checkpoints saved. Let me share some code:

import tensorflow as tf

### Create the session in which we will run.
tensorflowSess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

############# LOADING #############
### First let's load meta graph:
### NOTE: Restarting training from saved meta_graph only works if the device assignments have not changed > allow_soft_placement=True
restorerObject = tf.train.import_meta_graph(metaGraphPath)

### Then, restore weights (parameters) of that graph:
restorerObject.restore(tensorflowSess, ckptFilePath)

### Finally, get the operations we want to run and create the feed_dict:
restoredGraph = tf.get_default_graph()

'''If we want to return a value, we need to get the tensor of that operation (whatever:0/1...) because the tensor is the thing that holds the returned value, not directly the operation we get with get_operation_by_name. Furthermore, it seems that taking the last operation of the graph, populates the graph up to the beginning'''
feedingXObservation = restoredGraph.get_tensor_by_name('main_level/agent/main/online/Placeholder:0')

The problem here is that we need to know 1) the name of the tensor that feeds the data at the beginning of the NN architecture used in each agent and 2) the last operation that outputs the action values (or their probabilities, in the case of the Rainbow algorithm for example), so that we can feed in a new observation to perform inference.

The print_networks_summary=True in the VisualizationParameters gives some hint about what to look for. However, there is no clarity on how to go about this. For example, let's say that as most of us we want to get the first placeholder to feed the observation and for the last operation to get the tensor (in the example of a Rainbow agent, being one of the most complex, we have the following architecture:)

Network: main, Copies: 2 (online network | target network)
----------------------------------------------------------
Input Embedder: observation
    Input size = [163]
    Noisy Dense (num outputs = 256)
    Activation (type = <function relu at 0x7f90a65608c8>)
Middleware:
    No layers
Output Head: rainbow_q_values_head
    State Value Stream - V
        Dense (num outputs = 512)
        Dense (num outputs = 51)
    Action Advantage Stream - A
        Dense (num outputs = 512)
        Dense (num outputs = 153)
        Reshape (new size = 3 x 51)
        Subtract(A, Mean(A))
    Add (V, A)
    Softmax

Let's look for the first placeholder and for the last softmax layer.

print([n.name for n in restoredGraph.as_graph_def().node if 'Softmax' in n.op]) gives:

['main_level/agent/main/online/network_0/rainbow_q_values_head_0/Softmax', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_1', 'main_level/agent/main/online/gradients/main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/Softmax', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_1', 'main_level/agent/main/target/gradients/main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax']

and looking for the first placeholder like:

print([n.name for n in restoredGraph.as_graph_def().node if 'Placeholder' in n.op]) gives:

['main_level/agent/main/online/Placeholder', 'main_level/agent/main/online/network_0/observation/observation', 'main_level/agent/main/online/network_0/gradients_from_head_0-0_rescalers_1', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/distributions', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/rainbow_q_values_head_0_importance_weight', 'main_level/agent/main/online/0_holder', 'main_level/agent/main/online/1_holder', 'main_level/agent/main/online/2_holder', 'main_level/agent/main/online/3_holder', 'main_level/agent/main/online/4_holder', 'main_level/agent/main/online/5_holder', 'main_level/agent/main/online/6_holder', 'main_level/agent/main/online/7_holder', 'main_level/agent/main/online/8_holder', 'main_level/agent/main/online/9_holder', 'main_level/agent/main/online/10_holder', 'main_level/agent/main/online/11_holder', 'main_level/agent/main/online/12_holder', 'main_level/agent/main/online/13_holder', 'main_level/agent/main/online/14_holder', 'main_level/agent/main/online/15_holder', 'main_level/agent/main/online/16_holder', 'main_level/agent/main/online/17_holder', 'main_level/agent/main/online/18_holder', 'main_level/agent/main/online/19_holder', 'main_level/agent/main/online/20_holder', 'main_level/agent/main/online/output_gradient_weights', 'main_level/agent/main/target/Placeholder', 'main_level/agent/main/target/network_0/observation/observation', 'main_level/agent/main/target/network_0/gradients_from_head_0-0_rescalers_1', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/distributions', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/rainbow_q_values_head_0_importance_weight', 'main_level/agent/main/target/0_holder', 'main_level/agent/main/target/1_holder', 'main_level/agent/main/target/2_holder', 'main_level/agent/main/target/3_holder', 'main_level/agent/main/target/4_holder', 'main_level/agent/main/target/5_holder', 'main_level/agent/main/target/6_holder', 
'main_level/agent/main/target/7_holder', 'main_level/agent/main/target/8_holder', 'main_level/agent/main/target/9_holder', 'main_level/agent/main/target/10_holder', 'main_level/agent/main/target/11_holder', 'main_level/agent/main/target/12_holder', 'main_level/agent/main/target/13_holder', 'main_level/agent/main/target/14_holder', 'main_level/agent/main/target/15_holder', 'main_level/agent/main/target/16_holder', 'main_level/agent/main/target/17_holder', 'main_level/agent/main/target/18_holder', 'main_level/agent/main/target/19_holder', 'main_level/agent/main/target/20_holder', 'main_level/agent/main/target/output_gradient_weights', 'Placeholder', 'Placeholder_1', 'Placeholder_2', 'Placeholder_3', 'Placeholder_4', 'Placeholder_5', 'Placeholder_6', 'Placeholder_7', 'Placeholder_8', 'Placeholder_9', 'Placeholder_10', 'Placeholder_11', 'Placeholder_12', 'Placeholder_13', 'Placeholder_14', 'Placeholder_15', 'Placeholder_16', 'Placeholder_17', 'Placeholder_18', 'Placeholder_19', 'Placeholder_20', 'Placeholder_21', 'Placeholder_22', 'Placeholder_23', 'Placeholder_24', 'Placeholder_25', 'Placeholder_26', 'Placeholder_27', 'Placeholder_28', 'Placeholder_29', 'Placeholder_30', 'Placeholder_31', 'Placeholder_32', 'Placeholder_33', 'Placeholder_34', 'Placeholder_35', 'Placeholder_36', 'Placeholder_37', 'Placeholder_38', 'Placeholder_39', 'Placeholder_40', 'Placeholder_41', 'Placeholder_42', 'Placeholder_43', 'Placeholder_44', 'Placeholder_45', 'Placeholder_46', 'Placeholder_47', 'Placeholder_48', 'Placeholder_49', 'Placeholder_50', 'Placeholder_51', 'Placeholder_52', 'Placeholder_53', 'Placeholder_54', 'Placeholder_55', 'Placeholder_56', 'Placeholder_57', 'Placeholder_58', 'Placeholder_59', 'Placeholder_60', 'Placeholder_61', 'Placeholder_62', 'Placeholder_63', 'Placeholder_64', 'Placeholder_65', 'save/filename', 'save/Const']

So, my intuition is towards picking the Placeholder:0 and the 'main_level/agent/main/target/gradients/main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax:0' tensor by using get_tensor_by_name, but I'm not sure on how to interpret all that information and how to be certain.
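
One way to narrow the lists down is to filter the node names: keep only one network copy, drop anything under a gradients scope (those ops belong to backpropagation, not to the forward pass), and match the final path component exactly. The sketch below does this on names copied from the output above; `forward_candidates` is a hypothetical helper, and whether the resulting node is the right tensor to fetch for inference still needs to be verified against the agent's code.

```python
def forward_candidates(node_names, network="online", op_suffix="Softmax"):
    """Keep forward-pass nodes of one network copy whose last component matches."""
    prefix = f"main_level/agent/main/{network}/"
    return [
        name for name in node_names
        if name.startswith(prefix)
        and "/gradients/" not in name
        and name.rsplit("/", 1)[-1] == op_suffix
    ]


# Node names copied from the 'Softmax' listing above.
softmax_nodes = [
    "main_level/agent/main/online/network_0/rainbow_q_values_head_0/Softmax",
    "main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg",
    "main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_1",
    "main_level/agent/main/online/gradients/main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax",
    "main_level/agent/main/target/network_0/rainbow_q_values_head_0/Softmax",
    "main_level/agent/main/target/gradients/main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax",
]

print(forward_candidates(softmax_nodes))
```

The same filter with `op_suffix="observation"` applied to the placeholder listing picks out `main_level/agent/main/online/network_0/observation/observation`, which is a plausible candidate for the input placeholder.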

I think that this feature is crucial so that the framework can complete the creation and development cycle and be further developed in a PR or at least upgraded not directly in rl_coach but with TF (my idea would be to just give explicit names to the tensors that are needed to make this happen > i.e. the first one and the final one).

Any thoughts on this? @gal-leibovich @galnov and others: I can try to help make it happen on my side, but I don't know your ideas regarding this important core part of Coach.

If there is another way to do it (I'm aware that it could be done loading all the coach framework, something like:)

### Create all the graph and then restore_checkpoint().
### Get the observation...
action_info = coach.graph_manager.get_agent().choose_action(observation)
print("State:{}, Action:{}".format(observation,action_info.action))

If that is possible and "the way" to go, it would be awesome to create a mini-tutorial on how to load a pretrained model after training has exited.
opened by Eriz11 9

Direct Future Prediction

I get a strange error when I try to run DFP on a custom environment (with a Discrete action space).

AttributeError                            Traceback (most recent call last)
<ipython-input-35-2055451d4b4a> in <module>()
     22     agent_params=agent_params,
     23     env_params=env_params,
---> 24     schedule_params=SimpleSchedule()
     25 )

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/rl_coach/graph_managers/basic_rl_graph_manager.py in __init__(self, agent_params, env_params, schedule_params, vis_params, preset_validation_params)
     39 
     40         self.agent_params.visualization = vis_params
---> 41         if self.agent_params.input_filter is None:
     42             self.agent_params.input_filter = env_params.default_input_filter()
     43         if self.agent_params.output_filter is None:

AttributeError: 'DFPAlgorithmParameters' object has no attribute 'input_filter'

The invocation is as follows:

# define the environment parameters
bit_length = 10
env_params = GymVectorEnvironment(level='./custom.py')
env_params.additional_simulator_parameters = { 'num_states': 100}

agent_params = DFPAlgorithmParameters()

graph_manager = BasicRLGraphManager(
    agent_params=agent_params,
    env_params=env_params,
    schedule_params=SimpleSchedule()
)

opened by dmadeka 8

Now able to use and create custom tensorflow heads, embedders, and middleware.
Ref #134

I modified the following classes:

HeadParameters

MiddlewareParameters

InputEmbedderParameters

Adding a path property (or function as I mention in challenges).

Then I modified:

GeneralTensorFlowNetwork.get_input_embedder

GeneralTensorFlowNetwork.get_middleware

GeneralTensorFlowNetwork.get_output_head

to use these paths instead of their own local ones.

I moved a local dictionary inside GeneralTensorFlowNetwork.get_input_embedder called mod_names to embedder_parameters.MOD_NAMES so that it's more accessible.

Challenges

InputEmbedderParameters.path cannot be a property like the rest. You can call it with emb_type and the path will be created, but that is different from how most paths are made.

Pytest

I ran pytest locally and do not see any dramatic changes in the number of passing tests.
opened by ryanpeach 8

Installation using pip failed

After running command pip3 install rl_coach, I got the following error message:

Collecting rl_coach
  Downloading https://files.pythonhosted.org/packages/95/c9/3e92accfc8f967cda8fd37632ec7ec0a4b5ba71e5a8a4a6df2390adba625/rl-coach-0.10.0.4.tar.gz (223kB)
    100% |████████████████████████████████| 225kB 258kB/s
    Complete output from command python setup.py egg_info:
    /bin/sh: 1: pip: not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-kocjns51/rl-coach/setup.py", line 63, in <module>
        shell=True)
      File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl']' returned non-zero exit status 127

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-kocjns51/rl-coach/

Does anybody encounter the same problem?

opened by ybj14 8

some wonderful algorithm

https://github.com/pathak22/noreward-rl https://pathak22.github.io/noreward-rl/ realAI for Deep Reinforcement Learning ICM algorithm ? （Curiosity-driven Exploration for Deep Reinforcement Learning - realAI
enhancement

opened by zdx3578 8

Cannot import minio.error ResponseError

Hi experts, I was following the tutorials and hit this error when running one of them. Do I need to have minio working to use Coach? How can I solve this? Is minio only used for visualization? Which lines could I remove to make it work?

Environment:

Ubuntu 18.04
minio==7.0.2
rl-coach==1.0.1

Traceback (most recent call last):
  File "batch_rl.py", line 13, in <module>
    from rl_coach.agents.ddqn_bcq_agent import DDQNBCQAgentParameters, KNNParameters
  File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/agents/ddqn_bcq_agent.py", line 25, in <module>
    from rl_coach.graph_managers.batch_rl_graph_manager import BatchRLGraphManager
  File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/graph_managers/batch_rl_graph_manager.py", line 26, in <module>
    from rl_coach.graph_managers.graph_manager import ScheduleParameters
  File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 35, in <module>
    from rl_coach.data_stores.data_store_impl import get_data_store as data_store_creator
  File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/data_stores/data_store_impl.py", line 19, in <module>
    from rl_coach.data_stores.s3_data_store import S3DataStore, S3DataStoreParameters
  File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/data_stores/s3_data_store.py", line 21, in <module>
    from minio.error import ResponseError
ImportError: cannot import name 'ResponseError'


opened by HYDesmondLiu 7

The reward function in carla_environment.py

Hi, I am currently working on my graduation project in CARLA, and I noticed that the reward function for CARLA in Coach is quite different from the formula introduced in "CARLA: An Open Urban Driving Simulator". In the implementation in carla_environment.py, the reward is calculated this way:

self.reward = speed_reward - (measurements.player_measurements.intersection_otherlane * 5) - (measurements.player_measurements.intersection_offroad * 5) - is_collision * 100 - np.abs(self.control.steer) * 10
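
Written out as a standalone function for readability, the same expression looks as follows. This is only a restatement of the line above: `carla_reward` is a hypothetical name, `speed_reward` is taken as an input because its computation is not shown here, and `abs` replaces `np.abs` for a scalar steering value.

```python
def carla_reward(speed_reward, intersection_otherlane, intersection_offroad,
                 is_collision, steer):
    """Speed reward minus penalties for lane/off-road overlap, collisions, steering."""
    return (speed_reward
            - intersection_otherlane * 5   # overlap with the other lane
            - intersection_offroad * 5     # overlap with the off-road area
            - is_collision * 100           # collision flag (0 or 1)
            - abs(steer) * 10)             # magnitude of the steering command


if __name__ == "__main__":
    print(carla_reward(10.0, 0.2, 0.0, 0, -0.1))
```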

Honestly, I have trained my agent based on the reward formula from the CARLA paper. It seems to need many episodes before it performs well, and sometimes it does not converge at all, although I used a similar network with the DDPG algorithm. Could you explain why you chose this reward formula? I would really appreciate it. @galnov @galleibo-intel @shadiendrawis @itaicaspi

opened by fangchuan 7
Changes to avoid memory leak in Rollout worker

Currently, in the rollout worker, we call restore_checkpoint repeatedly to load the latest model into memory. The restore_checkpoint function calls checkpoint_saver. The checkpoint saver uses GlobalVariablesSaver, which does not release the references to the previous model's variables. As a result, memory keeps growing until the rollout worker crashes.

This change avoids using the checkpoint saver in the rollout worker, as I believe it is not needed in this code path.

I also added a test to easily reproduce the issue using the CartPole example. We were also seeing this issue with the AWS DeepRacer implementation, and this change avoids the memory leak there as well.

opened by x77a1 7
ImportError: cannot import name 'ResponseError' from 'minio.error' when rl-coach installed with pip

I saw this issue was closed earlier, but I am still getting it with the version that comes from pip.

ImportError: cannot import name 'ResponseError' from 'minio.error'

It can be solved manually after installation by replacing every "ResponseError" with "InvalidResponseError" in /rl_coach/data_stores/s3_data_store.py
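
The manual replacement can be scripted. The sketch below is one way to do it, assuming you pass the path to your installed s3_data_store.py; `patch_response_error` is a hypothetical helper. The word-boundary pattern keeps the patch safe to run twice, since it does not match inside "InvalidResponseError". Whether the renamed exception behaves identically in every code path is not verified here.

```python
import re
from pathlib import Path

# Match the bare name only, not the tail of "InvalidResponseError".
_PATTERN = re.compile(r"\bResponseError\b")


def patch_response_error(source_path):
    """Rename ResponseError to InvalidResponseError in one file.

    Returns the number of replacements made.
    """
    path = Path(source_path)
    original = path.read_text()
    patched, count = _PATTERN.subn("InvalidResponseError", original)
    if count:
        path.write_text(patched)
    return count
```

Call it with the full path of the installed file, for example the one shown in the traceback of the import error.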

opened by AmetistDrake 0

ERROR: No matching distribution found for tensorflow-gpu==1.9.0

Collecting joblib>=0.17.0
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu<=1.14.0,>=1.9.0 (from rl-coach) (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.8.0rc0, 2.8.0rc1, 2.8.0)
ERROR: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.9.0

It seems that the joblib library is deprecated and that rl-coach depends on an old version of TensorFlow.

opened by AmetistDrake 0

Categorical DQN - dimension error

Hi,

I don't post issues very often, so I hope my problem is clear enough the way I present it below. When trying to train a Categorical DQN (for Batch RL, no interaction with environment), I run into the following error:

Traceback (most recent call last):

File "", line 129, in graph_manager.improve()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve self.train()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train [manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in [manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train [agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in [agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train total_loss, losses, unclipped_grads = self.learn_from_batch(batch)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 113, in learn_from_batch self.q_values.add_sample(self.distribution_prediction_to_q_values(TD_targets))

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 82, in distribution_prediction_to_q_values return np.dot(prediction, self.z_values)

File "<array_function internals>", line 6, in dot

ValueError: shapes (128,2) and (51,) not aligned: 2 (dim 1) != 51 (dim 0)

The 2 (dim 1) is the number of actions in my ActionSpace, and the 51 (dim 0) corresponds to the number of atoms set in the agent's parameters. So the error suggests that these should be of equal length, which seems strange to me. Is this indeed true? Should these be of the same length? When setting the numbers of atoms to 2 (to get rid of this error) I got the following error:

Traceback (most recent call last):

File "", line 129, in graph_manager.improve()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve self.train()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train [manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in [manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train [agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in [agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train total_loss, losses, unclipped_grads = self.learn_from_batch(batch)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 116, in learn_from_batch target_actions = np.argmax(self.distribution_prediction_to_q_values(distributional_q_st_plus_1), axis=1)

File "<array_function internals>", line 6, in argmax

File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 1188, in argmax return _wrapfunc(a, 'argmax', axis=axis, out=out)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc return bound(*args, **kwds)

AxisError: axis 1 is out of bounds for array of dimension 1

I tried setting the axis to zero, but this results in more complex errors, so I assumed this is not the way to go. Does anyone have a clue how I can fix this error? Any suggestions would be of great help, thanks in advance!
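
Both errors above can be reproduced with plain NumPy. The sketch below assumes (the traceback does not confirm this) that distribution_prediction_to_q_values expects a prediction of shape (batch, actions, atoms), so that the dot product with the atom support z_values leaves one Q-value per action. The variable names and the support range of -10 to 10 are made up for illustration.

```python
import numpy as np

batch_size, num_actions, num_atoms = 128, 2, 51

# Hypothetical atom support; the real range comes from the agent's parameters.
z_values = np.linspace(-10.0, 10.0, num_atoms)

# Assumed expected case: one distribution over atoms per action.
prediction_3d = np.full((batch_size, num_actions, num_atoms), 1.0 / num_atoms)
q_values = np.dot(prediction_3d, z_values)      # contracts the atoms axis -> (128, 2)
target_actions = np.argmax(q_values, axis=1)    # one action index per sample -> (128,)

# First error: a 2-D prediction (batch, actions) cannot be contracted with 51 atoms.
prediction_2d = np.full((batch_size, num_actions), 0.5)
first_error = None
try:
    np.dot(prediction_2d, z_values)
except ValueError as err:
    first_error = str(err)

# Second error: with 2 atoms the dot product runs, but it contracts the
# actions axis instead, leaving a 1-D array that has no axis 1.
z_two_atoms = np.linspace(-10.0, 10.0, 2)
collapsed = np.dot(prediction_2d, z_two_atoms)  # shape (128,)
second_error = None
try:
    np.argmax(collapsed, axis=1)
except (ValueError, IndexError) as err:         # AxisError derives from both
    second_error = type(err).__name__

print(q_values.shape, target_actions.shape, collapsed.shape)
print(first_error)
print(second_error)
```

If that assumption holds, the number of atoms does not need to equal the number of actions. A (128, 2) array reaching np.dot would instead suggest that the network output is missing its atoms dimension, which may be worth checking in the network head configuration.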

opened by Colin1998 0
How to load a pretrained model (e.g. SAC) pb file into Coach and continue training?

I have a mode.pb file and know its network architecture (it was trained with Coach earlier), but I have no access to the original code. I want to load it in Coach and write some code to continue training this model. How can that be accomplished?

opened by Currycurrycurry 1

Releases(v1.0.0)

v1.0.0(Jul 24, 2019)

TD3
New APIs for Coach usage as a library
Updated Getting Started tutorial
Batch RL tutorial
v0.12.1(May 30, 2019)

Fixes for breaking API changes (OpenAI Gym, Scipy)
OPE: Weighted Importance Sampling
Creating a dataset using an agent
Printing input size as part of network summary
v0.12.0(May 1, 2019)

ACER
Soft Actor-Critic
BCQ
Batch RL
Off-policy evaluation (estimators: DM, DR, Sequential DR, IPS)
v0.11.2(May 1, 2019)

Intel Tensorflow fix.
v0.11.1(Jan 24, 2019)

Roll out worker memory leak fix
wxPython dependency removal
v0.11.0(Nov 27, 2018)

Horizontal scaling
MxNet support
ONNX export
New documentation
v0.10.0(Aug 26, 2018)
A complete redesign - non-backward compatible. Enabling multi-agent support.

New features -

PIP package

Benchmarks

Hierarchical Reinforcement Learning (demonstrated by Hierarchical Actor-Critic)

Tutorials

Shared memory (e.g. Replay Buffer) between workers

Tests (unit-tests, reward-based tests, trace-based tests)

Using Coach as a library

New Environments -

Toy Environments (Exploration Chain, BitFlip)

DeepMind PySC2 support (Starcraft 2)

DeepMind Control Suite

New Algorithms -

Hindsight Experience Replay

Prioritized Experience Replay

Hierarchical Actor-Critic

UCB with Q-Ensembles

v0.9.0(Dec 19, 2017)
New features -

CARLA 0.7 simulator integration

Human control of the game play

Recording of human game play and storing / loading the replay buffer

Behavioral cloning agent and presets

Golden tests for several presets

Selecting between deep / shallow image embedders

Rendering through pygame (with some boost in performance)

API changes -

Improved environment wrapper API

Added an evaluate flag to allow convenient evaluation of existing checkpoints

Improved frameskip definition in Gym

Bug fixes -

Fixed loading of checkpoints for agents with more than one network

Fixed Python 3 compatibility of the N-Step Q-learning agent

v0.8.0(Oct 19, 2017)

Initial public release

Owner

Intel Labs

GitHub https://intellabs.github.io/coach/

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Status: Maintenance (expect bug fixes and minor updates) Baselines OpenAI Baselines is a set of high-quality implementations of reinforcement learning

13.5k Jan 7, 2023

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Stable Baselines Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. You can read a

3.7k Jan 1, 2023

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grok

10k Jan 7, 2023

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

SLM Lab Modular Deep Reinforcement Learning framework in PyTorch. Documentation: https://slm-lab.gitbook.io/slm-lab/ BeamRider Breakout KungFuMaster M

1.1k Dec 24, 2022

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning. TF-Agents makes implementing, de

2.4k Dec 29, 2022

A collection of various RL algorithms like policy gradients, DQN and PPO. The goal of this repo will be to make it a go-to resource for learning about RL. How to visualize, debug and solve RL problems. I've additionally included playground.py for learning more about OpenAI gym, etc.

Reinforcement Learning (PyTorch) This repo will contain PyTorch implementation of various fundamental RL algorithms. It's aimed at making

123 Dec 23, 2022


Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information.

A toolkit for reproducible reinforcement learning research.

An open source robotics benchmark for meta- and multi-task reinforcement learning

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Tensorforce: a TensorFlow library for applied reinforcement learning

TensorFlow Reinforcement Learning

Deep Reinforcement Learning for Keras.

ChainerRL is a deep reinforcement learning library built on top of Chainer.

Open world survival environment for reinforcement learning

Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.

Fully Automated YouTube Channel ▶️with Added Extra Features.

Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

piSTAR Lab is a modular platform built to make AI experimentation accessible and fun. (pistar.ai)