mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

Overview

License: MIT | Python 3.7+ | Code style: black

MBRL-Lib

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms. It provides easily interchangeable modeling and planning components, and a set of utility functions that allow writing model-based RL algorithms with only a few lines of code.

See also our companion paper.

Getting Started

Installation

mbrl-lib is a Python 3.7+ library. To install it, clone the repository,

git clone https://github.com/facebookresearch/mbrl-lib.git

then run

cd mbrl-lib
pip install -e .

If you are interested in contributing, please also install the developer tools:

pip install -e ".[dev]"

Finally, make sure your Python environment has PyTorch (>= 1.7) installed with the appropriate CUDA configuration for your system.
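
To quickly verify the PyTorch and CUDA setup (this is plain PyTorch, nothing specific to mbrl-lib), you can run

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"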

For testing your installation, run

python -m pytest tests/core
python -m pytest tests/algorithms

Mujoco

Mujoco is a popular library for testing RL methods. Installing Mujoco is not required to use most of the components and utilities in MBRL-Lib, but if you have a working Mujoco installation (and license) and want to test MBRL-Lib on it, please run

pip install -r requirements/mujoco.txt

and to test our mujoco-related utilities, run

python -m pytest tests/mujoco

Basic example

As a starting point, check out our tutorial notebook on how to write the PETS algorithm (Chua et al., NeurIPS 2018) using our toolbox, and running it on a continuous version of the cartpole environment.
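
As a rough sketch of the kind of components the notebook assembles (class names follow the snippets used elsewhere on this page, but constructor arguments and the random-generator argument of ModelEnv have varied between releases, so treat this as an outline rather than the exact tutorial code):

import gym
import torch

import mbrl.env.termination_fns as termination_fns
import mbrl.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
env = gym.make("Pendulum-v1")  # any continuous-control gym environment ("Pendulum-v0" on older gym)
obs_size = env.observation_space.shape[0]
act_size = env.action_space.shape[0]

# Probabilistic dynamics model; with learned_rewards=True the wrapper appends the reward
# to the prediction target, hence the +1 in out_size.
net = models.GaussianMLP(in_size=obs_size + act_size, out_size=obs_size + 1, device=device)
dynamics_model = models.OneDTransitionRewardModel(net, target_is_delta=True, learned_rewards=True)

# Gym-like wrapper that rolls out the learned model instead of the real dynamics;
# the termination function decides when a model-predicted episode ends.
model_env = models.ModelEnv(env, dynamics_model, termination_fns.no_termination)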

Provided algorithm implementations

MBRL-Lib provides implementations of popular MBRL algorithms as examples of how to use this library. You can find them in the mbrl/algorithms folder. Currently, we have implemented PETS and MBPO, and we plan to keep increasing this list in the near future.

The implementations rely on Hydra to handle configuration. You can see the configuration files in this folder. The overrides subfolder contains environment-specific configurations that replace the defaults with the best hyperparameter values we have found so far for each combination of algorithm and environment. You can run training by passing the desired override option via the command line. For example, to run MBPO on the gym version of HalfCheetah, you should call

python main.py algorithm=mbpo overrides=mbpo_halfcheetah 

By default, all algorithms will save results in a csv file called results.csv, inside a folder whose path looks like ./exp/mbpo/default/gym___HalfCheetah-v2/yyyy.mm.dd/hhmmss; you can change the root directory (./exp) by passing root_dir=path-to-your-dir, and the experiment sub-folder (default) by passing experiment=your-name. The logger will also save a file called model_train.csv with training information for the dynamics model.
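
For a quick look at a finished run, the CSV can be loaded with standard tools; the env_step and episode_reward columns used below are illustrative of what the provided algorithms log, but the exact columns may vary with the algorithm and version:

import pandas as pd

# Replace with the directory created for your run (see the path pattern above).
results = pd.read_csv("exp/mbpo/default/gym___HalfCheetah-v2/yyyy.mm.dd/hhmmss/results.csv")
print(results[["env_step", "episode_reward"]].tail())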

Beyond the override defaults, you can also change other configuration options, such as the type of dynamics model (e.g., dynamics_model=basic_ensemble) or the number of models in the ensemble (e.g., dynamics_model.model.ensemble_size=some-number). To learn more about all the available options, take a look at the provided configuration files.
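
For instance, several of these options can be combined in a single call (the ensemble size below is just an illustrative value):

python main.py algorithm=mbpo overrides=mbpo_halfcheetah dynamics_model=basic_ensemble dynamics_model.model.ensemble_size=7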

Note that running the provided examples and main.py requires Mujoco, but you can try out the library components (and algorithms) on other environments by creating your own entry script and Hydra configuration.
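
A minimal custom entry point might look like the sketch below; the config path/name and the call into the algorithm are placeholders that you would adapt to your own setup:

# my_main.py -- a hypothetical minimal Hydra entry point (config path/name are placeholders).
import hydra
import omegaconf

@hydra.main(config_path="conf", config_name="main")
def run(cfg: omegaconf.DictConfig):
    # Inspect the composed configuration here, then hand it to the algorithm of your
    # choice (for the provided algorithms, something along the lines of
    # mbrl.algorithms.pets.train(env, term_fn, reward_fn, cfg)).
    print(omegaconf.OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    run()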

Visualization tools

Our library also contains a set of visualization tools, meant to facilitate diagnostics and development of models and controllers. These currently require Mujoco installation, but we are planning to add more support and extensions in the future. Currently, the following tools are provided:

  • Visualizer: Creates a video to qualitatively assess model predictions over a rolling horizon. Specifically, it runs a user-specified policy in a given environment and, at each time step, computes the model's predicted observations/rewards over a lookahead horizon for the same policy. The predictions are plotted as line plots, one for each observation dimension (blue lines) and the reward (red line), along with the result of applying the same policy to the real environment (black lines). The model's uncertainty is visualized by plotting lines for the maximum and minimum predictions at each time step. The model and policy are specified by passing directories containing configuration files for each; they can be trained independently. The following gif shows an example of a pre-trained MBPO policy run for 200 steps on the Inverted Pendulum environment.

    Example of Visualizer

  • DatasetEvaluator: Loads a pre-trained model and a dataset (can be loaded from separate directories), and computes predictions of the model for each output dimension. The evaluator then creates a scatter plot for each dimension comparing the ground truth output vs. the model's prediction. If the model is an ensemble, the plot shows the mean prediction as well as the individual predictions of each ensemble member.

    Example of DatasetEvaluator

  • FineTuner: Can be used to train a model on a dataset produced by a given agent/controller. The model and agent can be loaded from separate directories, and the fine tuner will roll the environment for some number of steps using actions obtained from the controller. The final model and dataset will then be saved under directory "model_dir/diagnostics/subdir", where subdir is provided by the user.

  • True Dynamics Multi-CPU Controller: This script can run a trajectory optimizer agent on the true environment using Python's multiprocessing. Each environment runs on its own CPU, which can significantly speed up costly sampling algorithms such as CEM. The controller will also save a video if the render argument is passed. Below is an example on HalfCheetah-v2 using CEM for trajectory optimization.

    Control Half-Cheetah True Dynamics

Note that the tools above require Mujoco installation, and are specific to models of type OneDimTransitionRewardModel. We are planning to extend this in the future; if you have useful suggestions don't hesitate to raise an issue or submit a pull request!

Documentation

Please check out our documentation and don't hesitate to raise issues or contribute if anything is unclear!

License

mbrl-lib is released under the MIT license. See LICENSE for additional details about it. See also our Terms of Use and Privacy Policy.

Citing

If you use this project in your research, please cite:

@Article{Pineda2021MBRL,
  author  = {Luis Pineda and Brandon Amos and Amy Zhang and Nathan O. Lambert and Roberto Calandra},
  journal = {Arxiv},
  title   = {MBRL-Lib: A Modular Library for Model-based Reinforcement Learning},
  year    = {2021},
  url     = {https://arxiv.org/abs/2104.10159},
}
Comments
  • Feature pybullet


    Continuation of the incomplete PR from https://github.com/facebookresearch/mbrl-lib/pull/87. This is my first time contributing to an open-source project, so any advice is welcome, technical or otherwise.

    Types of changes

    • [x] Docs change / refactoring / dependency upgrade
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)

    Motivation and Context / Related issue

    This adds support for PyBullet, an open-source alternative to MuJoCo. MuJoCo-compatible and RobotSchool environments are supported via pybullet-gym.

    How Has This Been Tested (if it applies)

    python -m pytest tests/pybullet

    Checklist

    • [x] The documentation is up-to-date with the changes I made.
    • [x] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
    • [ ] All tests passed, and additional code has been covered with new tests.
    CLA Signed 
    opened by dtch1997 44
  • Add trajectory-based dynamics model


    TODO for this WIP PR:

    • [x] New PID based / linear feedback agent(s)
    • [ ] Make PID accept vector inputs
    • [x] Training example
    • [ ] Migrate example to colab
    • [ ] Add tests

    Types of changes

    • [ ] Docs change / refactoring / dependency upgrade
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Motivation and Context / Related issue

    I'm collaborating with some folks at Berkeley looking to apply the trajectory-based model to real-world robotics, so I wanted to integrate it into this library to give it more longevity.

    The paper is here. Its core contribution is a dynamics model focused on long-term prediction. The parametrization is:

    $$ s_{t+1} = f_\theta(s_0, t, \phi),$$

    where $\phi$ are closed form control parameters (e.g. PID)
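
    For intuition only, a toy torch module with this input/output signature might look like the following; this is not the code in this PR, and all names are made up:

    import torch
    import torch.nn as nn

    class TrajectoryModel(nn.Module):
        """Toy sketch: predict s_{t+1} from (s_0, t, phi) instead of (s_t, a_t)."""

        def __init__(self, state_size: int, control_param_size: int, hid_size: int = 128):
            super().__init__()
            in_size = state_size + 1 + control_param_size  # s_0, time index t, phi
            self.net = nn.Sequential(
                nn.Linear(in_size, hid_size), nn.SiLU(),
                nn.Linear(hid_size, hid_size), nn.SiLU(),
                nn.Linear(hid_size, state_size),
            )

        def forward(self, s0: torch.Tensor, t: torch.Tensor, phi: torch.Tensor) -> torch.Tensor:
            # s0: (batch, state_size), t: (batch,), phi: (batch, control_param_size)
            return self.net(torch.cat([s0, t.unsqueeze(-1), phi], dim=-1))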

    This potentially relates to #66; I think we will need to modify the replay buffer to

    • store control parameter vector
    • store time indices (which may be close with the trajectory formulation)

    How Has This Been Tested (if it applies)

    I am going to build a notebook to validate and demonstrate it; currently it is a fork of the PETS example, and I will iterate on it.

    Checklist

    • [ ] The documentation is up-to-date with the changes I made.
    • [x] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
    • [ ] All tests passed, and additional code has been covered with new tests.
    CLA Signed 
    opened by natolambert 19
  • MBPO cannot work on HumanoidTruncatedObsEnv and original Humanoid Env[Bug]


    Steps to reproduce

    1. I tried to run MBPO on HumanoidTruncatedObsEnv with the default parameters in this repo, but the final reward is around 180 (it seems like a random policy and does not work)
    2. I tried to run MBPO on the original Humanoid env (without truncated obs) and it still does not work

    I have tried different seeds, and none of them work.

    Observed Results

    • The results of episode reward: [screenshot]

    Expected Results

    • The expected results (episode reward) should be around 6k
    bug 
    opened by jity16 18
  • [Bug] PETS not working


    Steps to reproduce

    1. install mbrl with python3.8 & mujoco_py 2.0.2.0
    2. python -m mbrl.examples.main algorithm=pets overrides=pets_halfcheetah

    Observed Results

    env_step,episode_reward,step
    1000.0,-224.74164192363065,1
    2000.0,-216.55716608141833,2
    3000.0,-23.61229154142554,3
    4000.0,-226.04264782442579,4
    5000.0,299.97272326884257,5
    6000.0,-424.2352836475372,6
    7000.0,-605.4988140825888,7
    8000.0,-276.8960448750668,8
    9000.0,-570.0111469500497,9
    10000.0,-510.15227529837796,10
    11000.0,-521.2191905188236,11
    12000.0,-380.6738015630948,12
    13000.0,-401.0656166902861,13
    14000.0,-342.89326195274214,14
    15000.0,-387.0973047072805,15
    16000.0,271.654545187927,16
    17000.0,-357.9662191309233,17
    18000.0,-144.4911364581224,18
    19000.0,-227.65608581868534,19
    20000.0,-270.1466421280269,20
    21000.0,-218.2495164661332,21
    22000.0,-291.59770272027646,22
    23000.0,5.605493817390425,23
    24000.0,-260.5804876267262,24
    25000.0,-311.1006996761441,25
    26000.0,-87.68273024315891,26
    27000.0,-224.6058292677028,27
    28000.0,-243.66672977662145,28
    29000.0,-417.3611859069211,29
    30000.0,-205.45597669987774,30
    31000.0,-220.6631462332176,31
    32000.0,-306.92107250798256,32
    33000.0,-321.6192194136308,33
    34000.0,156.56899647240394,34
    35000.0,-373.6946869809165,35
    36000.0,-297.54081355112413,36
    37000.0,-403.86887923659464,37
    38000.0,-394.61809157238,38
    39000.0,-397.597218596027,39
    40000.0,-270.5546716816992,40
    41000.0,-275.0500238719418,41
    42000.0,-339.1503604637613,42
    43000.0,-394.371951392158,43
    44000.0,-284.8456374765922,44
    45000.0,-230.30455468451476,45
    46000.0,-452.69669066476587,46
    47000.0,-369.8052064885858,47
    48000.0,-277.8216601977107,48
    49000.0,83.44271984210994,49
    50000.0,-165.98679718221237,50
    51000.0,-286.4235189537889,51
    52000.0,-420.1238034618763,52
    53000.0,-348.4956325925755,53
    54000.0,-262.9499726805828,54
    55000.0,-82.70856034802993,55
    56000.0,-283.44756999937294,56
    57000.0,-296.14589401299133,57
    58000.0,-310.71395667647914,58
    59000.0,-92.32547170477757,59
    60000.0,-343.62926472041903,60
    61000.0,194.0718436837866,61
    62000.0,-449.34500076620725,62
    63000.0,-317.03787784175205,63
    64000.0,-203.2571831873085,64
    65000.0,-90.52911874178189,65
    66000.0,-188.53310534801767,66
    67000.0,-131.71672373665217,67
    68000.0,-241.95741966590174,68
    69000.0,-329.25808904770525,69
    70000.0,-146.0802349071957,70
    71000.0,-474.47665284478336,71
    72000.0,-191.43021635327702,72

    Expected Results

    like results in #97

    bug 
    opened by sofan110 18
  • Pddm


    (WIP) PDDM implementation

    • [x] Docs change / refactoring / dependency upgrade
    • [x] New feature (non-breaking change which adds functionality)

    Motivation and Context / Related issue

    PR for PDDM's MPPI planner, support for sequenced batches, and in the near future proper settings and benchmarks for MuJoCo environments.

    Checklist

    • [x] The documentation is up-to-date with the changes I made.
    • [x] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
    • [x] MPPI planner
    • [x] MPPI refinement iterations
    • [x] PDDM
    • [x] Support for sequenced batches
    • [x] Multistage Gaussian MLP loss
    • [x] Testing for MPPI planner and PDDM
    • [ ] Benchmarks/Tuning and comparisons with the original implementation
    CLA Signed 
    opened by freiberg-roman 13
  • Training browser


    Types of changes

    Adds a simple browser to chart training results from multiple runs

    • [ ] Docs change / refactoring / dependency upgrade
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [X] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Motivation and Context / Related issue

    Adds a quick and easy way to browse/compare results

    How Has This Been Tested (if it applies)

    I ran a few different training runs with different algorithms and used this to compare them.

    Checklist

    • [ ] The documentation is up-to-date with the changes I made.
    • [X] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
    • [ ] All tests passed, and additional code has been covered with new tests.
    CLA Signed 
    opened by a3ahmad 12
  • Support pybullet-based Gym Environments


    Don't accept this yet -- this is still a work-in-progress. Remaining work:

    General-purpose environment loader:

    • [ ] Agree on interface
    • [ ] Refactor mujoco.py

    Add support for freezing environments:

    • [X] Locomotors
    • [ ] Manipulators
    • [ ] Pendula

    Add documentation for:

    • [X] Installing/using PyBullet
    • [ ] Various functions in mujoco.py
    • [ ] Comparing RobotSchool and MuJoCo-compatible PyBullet environments.

    Tests:

    • [X] Freezing environments.
    • [ ] Comparison between MuJoCo-compatible PyBullet and actual MuJoCo environments.

    Other:

    • [ ] Gracefully handle case that PyBullet is not installed.
    • [ ] Properly package pybullet-gym
      • [ ] setup.py needs to copy 3d assets as well.
      • [ ] (Optional) Put it on Pip

    Types of changes

    • [X] Docs change / refactoring / dependency upgrade
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [X] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Motivation and Context / Related issue

    This adds support for PyBullet, an open-source alternative to MuJoCo. MuJoCo-compatible and RobotSchool environments are supported via pybullet-gym.

    How Has This Been Tested (if it applies)

    Using this for research.

    Checklist

    • [ ] The documentation is up-to-date with the changes I made.
    • [ ] I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
    • [ ] All tests passed, and additional code has been covered with new tests.
    CLA Signed 
    opened by gauravmm 9
  • Difference in PETS implementation from the original TF version.


    This follows from the conversation in #98. I have noticed some discrepancy between the TF and mbrl-lib implementation of PETS.

    Difference in normalization.

    https://github.com/kchua/handful-of-trials/blob/master/dmbrl/modeling/utils/TensorStandardScaler.py#L45

    In the original version, the normalization is guarded against observation dimensions with small stddev by setting the stddev of those dimensions to 1. This prevents the normalized inputs from exploding when the stddev is small. This happens in environments such as Reacher or Pusher, where some observation dimensions consist of goals. In that situation, it seems that the goal never changes during an episode and the stddev will be 0, so setting the small stddev to 1.0 would be helpful in that case.

    Another very subtle thing happening in the above code is that the normalization is performed with NumPy instead of in TF, and I think the inputs here are in float64. In that case, the stddev computation is more accurate than those in float32, so the threshold 1e-12 is sensible. Using PyTorch to perform normalization, for example, would require changes to the threshold. I think some values like 1e-5 would be more appropriate in that case (not backed up by any numerical analysis).
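
    A minimal sketch of the guarded normalization described above (illustrative only, not the library's actual normalizer; the 1e-5 threshold is the float32 suggestion from the previous paragraph):

    import torch

    def fit_scaler(data: torch.Tensor, eps: float = 1e-5):
        # Per-dimension statistics; dimensions that barely vary (e.g. fixed goals)
        # get std = 1 so the normalized inputs do not blow up.
        mu = data.mean(dim=0, keepdim=True)
        std = data.std(dim=0, keepdim=True)
        std[std < eps] = 1.0
        return mu, std

    def normalize(x: torch.Tensor, mu: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
        return (x - mu) / std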

    Difference in activation function

    The original implementation uses the swish activation function whereas in mbrl-lib we use silu. I am confused about the choice of silu in mbrl-lib and would love to know more about the difference in empirical performance.

    Difference in CEM stopping criteria

    In the TF implementation, the CEM optimizer uses an additional termination criterion on the variance: https://github.com/kchua/handful-of-trials/blob/77fd8802cc30b7683f0227c90527b5414c0df34c/dmbrl/misc/optimizers/cem.py#L71 I doubt that criterion is ever satisfied during training but I am mentioning this here for completeness.

    Difference in optimizer weight decay

    The original TF implementation uses a carefully selected set of weight decays for different layers of the dynamics model whereas the decay in mbrl-lib is the same for all layers. However, the original implementation does not add weight decays on the biases. See

    https://github.com/kchua/handful-of-trials/blob/master/dmbrl/modeling/layers/FC.py#L219

    In PyTorch, the default Adam will add weight decay on all parameters. That also means that they are added to the max_logvar and min_logvar whereas in the TF version the only regularization on the max/min-logvars is through the var_loss.
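
    In PyTorch this kind of selective decay can be expressed with parameter groups; the sketch below only illustrates the idea (the 5e-5 value and the name matching are made up), it is not the library's optimizer setup:

    import torch

    def make_optimizer(model: torch.nn.Module, lr: float = 1e-3) -> torch.optim.Adam:
        decay, no_decay = [], []
        for name, param in model.named_parameters():
            # Mimic the TF version: no weight decay on biases or on the min/max log-variance bounds.
            if name.endswith("bias") or "logvar" in name:
                no_decay.append(param)
            else:
                decay.append(param)
        return torch.optim.Adam(
            [{"params": decay, "weight_decay": 5e-5},
             {"params": no_decay, "weight_decay": 0.0}],
            lr=lr,
        )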

    Maybe a side note, have the authors tried using AdamW instead of Adam for the weight decays? I recently learned that naive weight decay in Adam does not behave as you may expect. See https://arxiv.org/abs/1711.05101

    Difference in optimizer parameters

    The default epsilon in TensorFlow's Adam is 1e-7, https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam Scratch this, they are 1e-8 in TF 1 https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/optimizers/Adam.

    Anyway, I am mentioning these here after a thorough look at both mbrl-lib and TF PETS to debug my own JAX implementation. It turns out my mistake was in the MPC code. I hope these notes are useful, since the author mentions that the current implementation does not get good performance on Half-Cheetah. Maybe it's because of one of these details; if not, fingers crossed the difference can be spotted by someone else :D

    opened by ethanluoyc 8
  • Using Wrapper Class for Custom GYM Env


    I have a custom OpenAI Gym env and I am trying to use the mbrl wrapper, but I am getting the error name 'model_env_args' is not defined. I am trying to follow the example here: https://arxiv.org/pdf/2104.10159.pdf. Here's my code.

    import gym
    import mbrl.models as models
    import numpy as np

    net = models.GaussianMLP(in_size=14, out_size=12, device="cpu")
    wrapper = models.OneDTransitionRewardModel(net, target_is_delta=True, learned_rewards=True)
    model_env = models.ModelEnv(wrapper, *model_env_args, term_fn=hopper)

    opened by MishraIN 7
  • [Feature Request] Logging of custom training metrics


    🚀 Feature Request

    When training a model with ModelTrainer, it would be nice to be able to log some custom metrics (ideally in tensorboard), defined by the model (e.g., the values of the individual loss terms if the loss of the model is a sum of multiple terms). Right now one can only access the overall loss of the model.

    Motivation

    Is your feature request related to a problem? Please describe.

    At the moment I am working on a model that optimizes a sum of a reconstruction loss, a reward prediction loss, and a KL divergence term. For debugging purposes it would be nice to monitor how the individual losses evolve over time. This logging cannot be done by the model class on its own, since it needs some information from the RL algorithm (e.g., the current iteration of the algorithm / the number of samples drawn from the environment) for the logged values to be meaningful.

    Pitch

    Describe the solution you'd like

    The simplest solution certainly is to just allow passing kwargs to ModelTrainer.train(), which are passed through to Model.update(). This would allow passing a custom logging function/object that then logs values passed by the model implementation. This is of course not the most elegant solution, but the kwargs could also be used for other purposes (e.g., passing some additional information to Model.update() if a model implementation requires it).

    Describe alternatives you've considered

    An alternative to this would be to let Model.update() return a dictionary of metrics in addition to the loss. This dictionary could then be returned by ModelTrainer.train() or it could be processed by the callback passed to the function. This would of course cause breaking changes since the method signature of Model would need to be changed.
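
    A toy, self-contained illustration of this second alternative (all names here are hypothetical, not the actual Model interface):

    from typing import Dict, Tuple

    import torch

    class ToyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Linear(4, 4)

        def update(self, batch: torch.Tensor, optimizer: torch.optim.Optimizer) -> Tuple[float, Dict[str, float]]:
            optimizer.zero_grad()
            recon_loss = torch.nn.functional.mse_loss(self.net(batch), batch)
            reg_loss = 1e-3 * sum(p.pow(2).sum() for p in self.parameters())
            loss = recon_loss + reg_loss
            loss.backward()
            optimizer.step()
            # The trainer (or a callback) could then log the individual terms.
            return loss.item(), {"recon_loss": recon_loss.item(), "reg_loss": reg_loss.item()}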

    Are you willing to open a pull request? (See CONTRIBUTING) Yes

    enhancement 
    opened by jan1854 7
  • pets_example.ipynb problem


    I ran pets_example.ipynb and got the following error:

    I am not sure if it's a package-compatibility problem, so I am not sure whether the following error is a bug or not. Python: 3.7.10, numpy: 1.20.1, matplotlib: 3.4.2, torch: 1.7.1 (py3.7_cuda10.1.243_cudnn7.6.3_0)

    TypeError: normal() received an invalid combination of arguments when running the main loop. I found that the model_env arg 'rng' is np.random.default_rng(seed=0), which is not what torch.normal expects.

    # Create a gym-like environment to encapsulate the model
    #model_env = models.ModelEnv(env, dynamics_model, term_fn, reward_fn, rng)
    

    TypeError: can't convert cuda:0 device type tensor to numpy when running the plot part. When the GPU is on, the val_score tensor is (0.0023, device='cuda:0'), which causes the error in the plotting code.

    def train_callback(_model, _total_calls, _epoch, tr_loss, val_score, _best_val):
       train_losses.append(tr_loss)
       #val_scores.append(val_score.mean())   # this returns val score per ensemble model
    
    opened by app1ep1e 7
  • [Bug] Centering, scaling and clamping the population in iCEM


    Steps to reproduce

    1. Run any example configuration using iCEM as action optimizer, e.g. python -m mbrl.examples.main algorithm=mbpo overrides=pets_icem_cartpole

    Observed Results

    After sampling according to a powerlaw PSD in iCEM, the population is centered on the mean, scaled to the variance and clamped to be within the action space. This process uses the dummy variable population2. However, it appears that the result is not assigned back to the population variable, and it is hence ignored during the rest of the optimization procedure. As a result, I believe that the population is not correctly sampled, and the objective function can be evaluated on actions that potentially do not belong to the action space.

    Expected Results

    Centering, scaling and clamping should be applied directly to population instead of population2.

    Relevant Code

    The relevant lines are L438-L441 in mbrl/planning/trajectory_opt.py

    https://github.com/facebookresearch/mbrl-lib/blob/f90a29743894fd6db05e73445af0ed83baa845bc/mbrl/planning/trajectory_opt.py#L438-L441

    which I believe could be changed to

              population = torch.minimum(
                  population * torch.sqrt(var) + mu, self.upper_bound
              )
              population = torch.maximum(population, self.lower_bound)
    
    bug 
    opened by marbaga 0
  • [WIP] HF Hub Integration


    Working towards closing #169

    Things to do (roughly):

    • Verify base functionality,
    • Colab example for loading / saving / visualizing models,
    • Upload pretrained models to hub from @luisenp.
    CLA Signed 
    opened by natolambert 1
  • [Feature Request] Upload Dynamics Models to the HuggingFace Hub


    🚀 Feature Request

    Add functionality to upload dynamics models/policies to the HF Hub at the end of training, or during training, for sharing / fine-tuning.

    This would look like

    model.from_pretrained("mbrl/cheetah.bin")
    model.save_pretrained("mbrl/hopper.bin")
    

    Motivation

    We want to be able to re-use computation and make easier demos showcasing this library.

    Happy to help with this.

    Additional context

    Add any other context or screenshots about the feature request here.

    enhancement 
    opened by natolambert 6
  • hyperparameters optimization


    🚀 Feature Request

    I would like to optimize the hyperparameters on a custom environment for PE-TS and other algorithms.

    Motivation

    How did you find the optimal hyperparameters for the algorithms? For example, for PE-TS on cartpole.

    Pitch

    For the PE-TS example, I did a grid search over 4 parameters: horizon_size, alpha, number of hidden layers, and hidden layer dimension.

    Problem: which parameters are more crucial to optimize?

    Do you have a Bayesian optimization script for hyperparameters?

    Describe alternatives you've considered: I can make a pull request for the PE-TS grid search and/or Bayesian optimization with the Optuna library.
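
    A bare-bones Optuna skeleton for this kind of search might look like the sketch below; the parameter names and the train_and_evaluate helper are placeholders, not part of mbrl-lib:

    import optuna

    def train_and_evaluate(params: dict) -> float:
        # Placeholder: run PE-TS with these hyperparameters and return the mean episode reward.
        return 0.0  # replace with the real evaluation

    def objective(trial: optuna.Trial) -> float:
        params = {
            "horizon_size": trial.suggest_int("horizon_size", 10, 40),
            "alpha": trial.suggest_float("alpha", 0.05, 0.3),
            "num_layers": trial.suggest_int("num_layers", 2, 5),
            "hid_size": trial.suggest_categorical("hid_size", [128, 200, 256]),
        }
        return train_and_evaluate(params)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)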

    enhancement 
    opened by ss555 1
  • [Feature Request] Output Normalization / Scaling


    🚀 Feature Request

    When training non-delta-state models, the outputs of dynamics models can take large values (way outside a unit Gaussian). In the past I have tried using output scalers to let the model learn outputs close to a unit Gaussian rather than variables with diverse scales.

    Motivation

    Is your feature request related to a problem? Please describe. I think it would help the PR for the trajectory-based model, #158 .

    Pitch

    Describe the solution you'd like: I think there could be an optional output scaler that acts analogously to the input one.
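
    A rough sketch of what such an output (target) scaler could look like, mirroring the mean/std interface of an input normalizer (illustrative only):

    import torch

    class TargetScaler:
        """Standardize training targets and un-scale model predictions."""

        def fit(self, targets: torch.Tensor, eps: float = 1e-5) -> None:
            self.mu = targets.mean(dim=0, keepdim=True)
            self.std = targets.std(dim=0, keepdim=True).clamp_min(eps)

        def transform(self, targets: torch.Tensor) -> torch.Tensor:
            return (targets - self.mu) / self.std

        def inverse_transform(self, predictions: torch.Tensor) -> torch.Tensor:
            return predictions * self.std + self.mu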

    Are you willing to open a pull request? (See CONTRIBUTING) Sure.

    Additional context

    Add any other context or screenshots about the feature request here.

    enhancement 
    opened by natolambert 4
  • [Feature Request] Add option to use `functorch` for `BasicEnsemble`


    🚀 Feature Request

    Change BasicEnsemble to optionally use functorch.vmap.

    Motivation and Pitch

    Is your feature request related to a problem? Please describe.

    BasicEnsemble lets the user provide arbitrary models, which are stacked together using a very naive loop-based implementation. We should be able to do this more efficiently now using functorch.
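
    A sketch of how functorch can vectorize an ensemble of identical models, based on the standard functorch ensembling pattern (BasicEnsemble's actual integration would differ):

    import torch
    from functorch import combine_state_for_ensemble, vmap

    # Five identical MLPs standing in for ensemble members.
    members = [torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.SiLU(), torch.nn.Linear(64, 4))
               for _ in range(5)]
    fmodel, params, buffers = combine_state_for_ensemble(members)

    x = torch.randn(32, 8)  # the same batch is fed to every member
    # vmap over the stacked parameters; in_dims=None broadcasts the shared input batch.
    predictions = vmap(fmodel, in_dims=(0, 0, None))(params, buffers, x)
    print(predictions.shape)  # torch.Size([5, 32, 4])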

    enhancement good first issue 
    opened by luisenp 2
Releases(v0.1.5)
  • v0.1.5(Jan 14, 2022)

    • Fixes important bug in v0.1.4 that was causing PETS to break.
    • Model.reset() and Model.sample() signatures have changed. They no longer receive TransitionBatch objects; instead, they both return a dictionary mapping strings to tensors that represents a model state and should be passed to sample() to simulate transitions. This dictionary can contain things like previous actions, predicted observations, latent states, beliefs, and any other quantity that the model needs to maintain to simulate trajectories when using ModelEnv.
    • Ensemble class and sub-classes are assumed to operate on 1-D models.
  • v0.1.4(Sep 27, 2021)

    This version adds two new optimizers for CEM:

    • Improved CEM as described here.
    • MPPI as used in PDDM.
    • Changed config structure so that action optimizer is passed as another config file.
    • Added a new iterator for sequences that returns a fixed number of random batches in every loop.
  • v0.1.3(Jul 24, 2021)

    This version changes the Model API so that loss, eval_score and update methods return a metadata dictionary that can be used for logging. It also adds the option to use double precision for normalization.

  • v0.1.2(Jul 19, 2021)
