The Reinforcement Learning Working Group is proud to announce the 2019.10 release of garage.
As always, we are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.
Summary
Please see the CHANGELOG for detailed information on the changes in this release.
This release contains an immense number of improvements and new features for garage.
It includes:
- PyTorch support, including DDPG and VPG (94% test coverage)
- Flexible new TensorFlow Model API and complete re-write of the TensorFlow neural network library using it (93% test coverage)
- Better APIs for defining, running, and resuming experiments
- New logging API with dowel, which allows a single log() call to stream logs of virtually any object to the screen, disk, CSV files, TensorBoard, and more.
- New algorithms including (D)DQN and TD3 in TensorFlow, and DDPG and VPG in PyTorch
- Distribution via PyPI -- you can now pip install garage!
Read below for more information on what's new in this release. See Looking forward for more information on what to expect in the next release.
Why we skipped 2019.06
After 2019.02 we made some large, fundamental changes in garage APIs. Around June these APIs were defined, but the library was in limbo, with some components using new APIs and others using old APIs. Rather than release a half-baked version, we decided our time was better spent getting the toolkit in shape for the next release.
We intend to return to our regularly-scheduled release cadence for 2020.02.
PyTorch Support
We added the garage.torch tree and primitives which allow you to define and train on-policy and off-policy algorithms in PyTorch.
Though the tree is small, the algorithms in this tree achieve state-of-the-art performance, have 94% test coverage, and use idiomatic PyTorch constructs with garage APIs. Expect to see many more algorithms and primitives in PyTorch in future releases.
garage.tf.Model API and TensorFlow primitives re-write
The garage.tf.layers library quickly became a maintenance burden, and was hindering progress in TensorFlow.
To escape from under this unmaintainable custom library, we embarked on a complete re-write of the TensorFlow primitives around a new API called garage.tf.Model. This new API allows you to use idiomatic TensorFlow APIs to define reusable components for RL algorithms such as Policies and Q-functions.
Defining a new primitive in garage is easier than ever, and most components you want (e.g. MLPs, CNNs, RNNs) already exist as re-usable and composable Model classes.
Runner API and improvements to experiment snapshotting and resuming
We defined a new Runner API, which unifies how all algorithms, samplers, and environments interact to create an experiment. Using LocalRunner handles many of the important minutiae of running a successful experiment, including logging, snapshotting, and consistent definitions of batch size and other hyperparameters.
LocalRunner also makes it very easy to resume an experiment from an arbitrary iteration on disk, either using the Python API or from the command line using the garage command (e.g. garage resume path/to/experiment).
See the examples for how to run an algorithm using LocalRunner.
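The idea of resuming from an arbitrary iteration can be illustrated without garage at all. The following is a minimal sketch using only the Python standard library. The names save_snapshot, load_snapshot, and train, and the itr_N.pkl file layout, are assumptions made for illustration; this is not garage's snapshot format or the LocalRunner API.

```python
import pickle
import tempfile
from pathlib import Path


def save_snapshot(snapshot_dir, itr, state):
    """Write the training state for one iteration to its own file."""
    path = Path(snapshot_dir) / f'itr_{itr}.pkl'
    with open(path, 'wb') as f:
        pickle.dump(state, f)


def load_snapshot(snapshot_dir, itr):
    """Read back the training state saved at iteration `itr`."""
    path = Path(snapshot_dir) / f'itr_{itr}.pkl'
    with open(path, 'rb') as f:
        return pickle.load(f)


def train(snapshot_dir, n_epochs, state=None):
    """Toy training loop: each epoch adds 1 to a running total."""
    state = state or {'epoch': 0, 'total': 0}
    while state['epoch'] < n_epochs:
        state['epoch'] += 1
        state['total'] += 1
        # Snapshot after every epoch so any of them can be resumed later.
        save_snapshot(snapshot_dir, state['epoch'], state)
    return state


with tempfile.TemporaryDirectory() as snapshot_dir:
    # Run an experiment for 5 epochs, saving a snapshot after each one.
    full_run = train(snapshot_dir, n_epochs=5)
    # Resume from the snapshot taken at epoch 3 and continue to epoch 5.
    resumed = train(snapshot_dir, n_epochs=5,
                    state=load_snapshot(snapshot_dir, 3))
```

Because each snapshot holds everything the loop needs, the resumed run ends in the same state as the uninterrupted one. A real experiment would also need to snapshot the algorithm, environment, and random seeds.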
Log anything to anywhere with dowel
We replaced the garage.misc.logger package with a new flexible logger, which is implemented in a new package called dowel.
dowel has all of the features of the old logger, but with a simpler, well-defined API, and it supports logging any object to any number of outputs, as long as a handler exists for that object and output. For instance, this allows us to log the TensorFlow graph to TensorBoard using a line like logger.log(tf.get_default_graph()), and a few lines below to log a message to the console like logger.log('Starting training...').
dowel knows how to log key-value pairs, TensorFlow graphs, strings, and even histograms. Defining new logger outputs and input handlers is easy. Currently dowel supports output to the console, text files, CSVs, and TensorBoard. Add your own today!
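The "any object to any number of outputs" design can be pictured as type-based dispatch: each output declares which types it accepts, and a single log() call fans the object out to every output that can handle it. The sketch below is a toy in plain Python, not dowel's implementation. The classes ToyLogger, TextOutput, and TableOutput are hypothetical, and raising an error for unhandled types is an assumption of this sketch.

```python
class ToyOutput:
    """Base class for an output; subclasses declare which types they accept."""

    types_accepted = ()

    def __init__(self):
        self.records = []

    def record(self, data):
        raise NotImplementedError


class TextOutput(ToyOutput):
    """Accepts strings, as a console or text-file output might."""

    types_accepted = (str,)

    def record(self, data):
        self.records.append(data)


class TableOutput(ToyOutput):
    """Accepts dicts of key-value pairs, as a CSV output might."""

    types_accepted = (dict,)

    def record(self, data):
        self.records.append(sorted(data.items()))


class ToyLogger:
    """Sends each logged object to every output that accepts its type."""

    def __init__(self):
        self._outputs = []

    def add_output(self, output):
        self._outputs.append(output)

    def log(self, data):
        handled = False
        for output in self._outputs:
            if isinstance(data, output.types_accepted):
                output.record(data)
                handled = True
        if not handled:
            # Assumption of this sketch: unhandled types are an error.
            raise TypeError(f'No output accepts {type(data).__name__}')


logger = ToyLogger()
text_out, table_out = TextOutput(), TableOutput()
logger.add_output(text_out)
logger.add_output(table_out)

logger.log('Starting training...')     # reaches the text output only
logger.log({'epoch': 1, 'loss': 0.5})  # reaches the table output only
```

Supporting a new destination in this scheme means writing one more output class. The call sites that invoke log() do not change.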
pip install garage
We delivered many improvements to make garage installable using only pip. You no longer need to run a setup script to install system dependencies, unless you'd like support for MuJoCo. We now automatically release new versions to pip.
This also means using garage with the environment manager of your choice is easy. We test virtualenv, pipenv, and conda in our CI pipeline so garage can always successfully install in your environment.
Extensive maintainability and documentation improvements
This release includes extensive maintainability and documentation improvements. Most of these are behind-the-scenes, but make an immense difference in the reliability and usability of the toolkit.
Highlights:
- Unit test coverage increased from ~30% to ~80%
- Overall test coverage increased from ~50% to ~85%
- Overall coverage for garage.tf and garage.torch (which is where algorithm-performance critical code lives) is ~94%
- TensorFlow and PyTorch algorithms are benchmarked before every commit to master
- Every primitive is pickleable/snapshottable and this is tested in the CI
- Docstrings added to all major APIs, including type information
- API documentation is automatically generated and posted to https://garage.readthedocs.io
- Large amounts of old and/or unused code deleted, especially from garage.misc
Who should use this release, and how
Users who want to base a project on a semi-stable version of this software and are not interested in bleeding-edge features should use the release branch and tags.
Platform support
This release has been tested extensively on Ubuntu 16.04 and 18.04. We have also used it successfully on macOS 10.13, 10.14, and 10.15.
Maintenance Plan
We plan on supporting this branch until at least June 2020. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.
We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2019.10) from contributors wishing to see these enhancements in this version of the software.
Hotfixes
We will post backwards-compatible hotfixes for this release to the branch release-2019.10. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2019.10.1, the second would be tagged v2019.10.2, etc.
We will not add new features to, nor remove existing features from, the branch release-2019.10 unless it is absolutely necessary for the integrity of the software.
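As an illustration of the tag scheme described above, here is a small sketch that parses hotfix tags and computes the next one. The helper names parse_hotfix_tag and next_hotfix_tag are hypothetical and are not part of garage or its release tooling. The sketch compares the hotfix number numerically, so that v2019.10.10 comes after v2019.10.9.

```python
import re

# Matches tags of the form vYYYY.MM.N, e.g. v2019.10.1
TAG_RE = re.compile(r'^v(\d{4})\.(\d{2})\.(\d+)$')


def parse_hotfix_tag(tag):
    """Split a tag such as 'v2019.10.2' into (year, month, hotfix) integers."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f'not a hotfix tag: {tag!r}')
    return tuple(int(part) for part in match.groups())


def next_hotfix_tag(existing_tags, release='2019.10'):
    """Return the tag the next hotfix of `release` would receive."""
    year, month = (int(part) for part in release.split('.'))
    hotfixes = [
        n for (y, m, n) in map(parse_hotfix_tag, existing_tags)
        if (y, m) == (year, month)
    ]
    # With no hotfixes yet, the first one is numbered 1.
    return f'v{release}.{max(hotfixes, default=0) + 1}'


first = next_hotfix_tag([])
second = next_hotfix_tag(['v2019.10.1'])
```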
Next release
We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next around early February 2020, e.g. v2020.02.
Looking forward
The next release of garage will focus primarily on two goals: meta- and multi-task RL algorithms (and associated toolkit support) and stable, well-defined component APIs for fundamental RL abstractions such as Policy, QFunction, ValueFunction, Sampler, ReplayBuffer, Optimizer, etc.
Meta- and Multi-Task RL
We are adding a full suite of meta-RL and multi-task RL algorithms to the toolkit, and associated toolkit support where necessary. We would like garage to be the gold standard library for meta- and multi-task RL implementations.
As always, all new meta- and multi-task RL algorithms will be thoroughly tested and verified to meet or exceed the best state-of-the-art implementation we can find.
Stable and well-defined component APIs
The toolkit has become mature enough that most components have a fully-described formal API or an informal API which all components of that type implement, and large enough that we have faith that our existing components cover most current RL use cases.
Now we will turn to formalizing the major component APIs and ensuring that the components in garage all conform to these APIs. This will allow us to simplify lots of logic throughout the toolkit, and will make it easier to mix components defined outside garage with those defined inside garage.
Idiomatic TensorFlow model and tensorflow_probability
While the implementation of the primitives using garage.tf.Model is complete, their external API still uses the old style from rllab which defines a new feedforward graph for every call to a symbolic API. For instance, a call to GaussianMLPPolicy.log_likelihood_sym() will create a copy of the GaussianMLPPolicy graph which implements GaussianMLPPolicy.get_action() (the two graphs share parameters so optimization results are unaffected). This is not idiomatic TensorFlow, and can be a source of confusion for algorithm implementers.
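The pattern described above, in which several feedforward graphs are backed by one set of parameters, can be pictured with a toy in plain Python, where closures stand in for TensorFlow graphs. ToyGaussianPolicy and its methods are hypothetical and only mimic the shape of the behavior; this is not garage or TensorFlow code.

```python
class ToyGaussianPolicy:
    """Hypothetical stand-in for a policy primitive (not garage code)."""

    def __init__(self, weight):
        # One set of parameters, shared by every graph built below.
        self.params = {'weight': weight}
        self.graphs_built = 0

    def _build_graph(self):
        """Build a new feedforward 'graph' that reads the shared parameters."""
        self.graphs_built += 1
        params = self.params
        return lambda obs: params['weight'] * obs

    def get_action_graph(self):
        return self._build_graph()

    def log_likelihood_sym(self):
        # Old style: every symbolic call builds another copy of the graph.
        mean_graph = self._build_graph()
        # Unnormalized log-likelihood of a unit-variance Gaussian.
        return lambda obs, action: -0.5 * (action - mean_graph(obs)) ** 2


policy = ToyGaussianPolicy(weight=2.0)
act = policy.get_action_graph()
loglik = policy.log_likelihood_sym()

# Two graphs now exist, but an update to the parameters reaches both.
policy.params['weight'] = 3.0
```

Results stay consistent because both graphs read the same parameters. The cost is that the number of graphs grows with every symbolic call, which is the behavior a single feedforward path avoids.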
Now that we have a stable and well-tested back-end for the primitives, we will embark on simplifying their APIs to only have a single feedforward path. We will also transition to using tensorflow_probability for modeling stochastic primitives.
Now that TensorFlow has started to define first-party APIs for composable models (specifically tf.Module and tf.keras.Model), we will look into integrating these with garage.tf.Model.
What about TensorFlow 2.0 support?
We intend to support TensorFlow 2.x and eager execution in the near future, but it may take a release or two to get there. We believe that the garage.tf.Model API already makes writing neural network code for RL nearly as painless as TensorFlow 2.0, so most users won't notice much of a difference.
We suggest that users who really need eager execution APIs today should instead focus on garage.torch.
For the coming release, we will focus on moving all of our algorithms and primitives to using idiomatic TensorFlow and TensorFlow Probability. Our in-progress transition to garage.tf.Model and idiomatic usage of TensorFlow will drastically reduce the amount of code which changes between TensorFlow 2.x and 1.x, so we will focus on that before embarking on TF2 support. This will also give TensorFlow 2.x APIs time to stabilize, and time for its performance to catch up to TensorFlow 1.x (there is currently a 10-20% performance hit for using eager execution).
If all goes well, we may be able to begin TF2 support around the 2020.06 release. If you are interested in seeing this happen faster, please contact us on the issue tracker and we will get you started helping with the port!
Contributors to this release
- Ryan Julian (@ryanjulian)
- Anson Wong (@ahtsan)
- Nisanth Hegde (@nish21)
- Keren Zhu (@naeioi)
- Zequn Yu (@zequnyu)
- Gitanshu Sardana (@gitanshu)
- Utkarsh Patel (@utkarshjp7)
- Avnish Narayan (@avnishn)
- Linda Wong (@lywong92)
- Yong Cho (@yonghyuc)
- K.R. Zentner (@krzentner)
- Peter Lillian (@pelillian)
- Angel Ivan Gonzalez (@gonzaiva)
- Kevin Cheng (@cheng-kevin)
- Chang Su (@CatherineSue)
- Jonathon Shen (@jonashen)
- Zhanpeng He (@zhanpenghe)
- Shadi Akiki (@shadiakiki1986)
- Nate Pham (@nhanph)
- Dhiaeddine Gharsallah (@dgharsallah)
- @wyjw