On the model-based stochastic value gradient for continuous reinforcement learning

Overview

This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, and Andrew Gordon Wilson and contains the PyTorch source code to reproduce the experiments in our L4DC 2021 paper On the model-based stochastic value gradient for continuous reinforcement learning. Videos of our agents are available here.

Setup and dependencies

After cloning this repository and installing PyTorch on your system, you can set up the code with:

python3 setup.py develop
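
If you prefer pip's editable-install workflow, the following should be equivalent for a standard setuptools project like this one:

pip install -e .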

A basic run and analysis

You can start a single local run on the humanoid with:

./train.py env=mbpo_humanoid

This will create an experiment directory in exp/local/<date>/ containing models and logging info. Once the first model has been saved, you can render a video of the agent with some diagnostic information using:

./eval-vis-model.py exp/local/2021.05.07
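
The <date> placeholder follows the Y.M.D format shown above, so for a run started today something like this should point at the newest local experiment directory (assuming a POSIX shell):

./eval-vis-model.py exp/local/$(date +%Y.%m.%d)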

Reproducing our main experimental results

The default hyper-parameters in this repo are set to the best ones we found in a hyper-parameter search. The following command reproduces our final results over 10 seeds with these hyper-parameters:

./train.py -m experiment=mbpo_final env=mbpo_cheetah,mbpo_hopper,mbpo_walker2d,mbpo_humanoid,mbpo_ant seed=$(seq -s, 10)
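
Here, seq -s, 10 expands to the comma-separated list 1,2,3,4,5,6,7,8,9,10, and Hydra's multirun mode launches one run for every (environment, seed) combination.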

The results from this experiment can be plotted with our notebook nbs/mbpo.ipynb, which can also serve as a starting point for analyzing and developing further methods.
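
If you want a quick look at the logged returns outside of the notebook, a minimal sketch along the following lines works; note that the log filename (eval.csv), its columns (step, episode_reward), and the directory layout are assumptions here and may differ from what train.py actually writes:

import glob

import matplotlib.pyplot as plt
import pandas as pd

# Collect the evaluation logs from every run in the sweep.
frames = []
for path in glob.glob('exp/**/eval.csv', recursive=True):
    df = pd.read_csv(path)
    df['run'] = path
    frames.append(df)
logs = pd.concat(frames)

# Average the episode reward over runs at each environment step.
mean_reward = logs.groupby('step')['episode_reward'].mean()
mean_reward.plot()
plt.xlabel('environment steps')
plt.ylabel('mean episode reward')
plt.show()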

Reproducing our sweeps and ablations

Our main hyper-parameter sweeps are run with Hydra's multirun mode and can be launched with the following command after uncommenting the hydra/sweeper line in config/train.yaml:

./train.py -m experiment=full_poplin_sweep

The results from this experiment can be plotted with our notebook nbs/poplin.ipynb.

Citations

If you find this repository helpful for your publications, please consider citing our paper:

@inproceedings{amos2021svg,
  title={On the model-based stochastic value gradient for continuous reinforcement learning},
  author={Amos, Brandon and Stanton, Samuel and Yarats, Denis and Wilson, Andrew Gordon},
  booktitle={L4DC},
  year={2021}
}

Licensing

This repository is licensed under the CC BY-NC 4.0 License.

Comments
  • Error running the code

    Hi,

    I'm trying to run the basic example in your code (on a local machine without slurm), and I'm encountering the following problems:

    When I run ./train.py --run env=mbpo_cheetah, the first thing that happens is that temp_cfg, dx_cfg, actor_cfg, and critic_cfg are already initialized by Hydra before being passed to SACSVGAgent.__init__(...). Attempts to instantiate them with hydra.utils.instantiate cause an error. To get around this, I check the type and assign the already-initialized objects (using copy.deepcopy where necessary) to attributes of SACSVGAgent.
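
    Concretely, my workaround looks roughly like this (a minimal sketch; whether each config arrives as a DictConfig or as an already-built object is exactly the version-dependent behavior I'm unsure about):

    import copy

    import hydra.utils
    from omegaconf import DictConfig

    def config_or_instance(cfg):
        # If Hydra hands us a raw config, instantiate it ourselves; if it
        # already built the object, deep-copy it so the agent owns its own
        # instance.
        if isinstance(cfg, DictConfig):
            return hydra.utils.instantiate(cfg)
        return copy.deepcopy(cfg)

    # Inside SACSVGAgent.__init__(...):
    #     self.dx = config_or_instance(dx_cfg)
    #     self.actor = config_or_instance(actor_cfg)
    #     self.critic = config_or_instance(critic_cfg)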

    The code keeps running once those changes are made, but it eventually throws this error when it reaches actor_loss.backward():

    RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm(...)`
    

    I'm guessing the cause is a version mismatch with Hydra, and my jury-rigged initialization is causing later problems. Could you please tell me what version of Hydra this code is compatible with, and what additional Hydra plugins I need to install?

    Thanks!

    opened by gauravmm 1
  • Adding Code of Conduct file

    This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Adding Contributing file

    This pull request was created automatically because we noticed your project was missing a Contributing file.

    CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0