Implementation of H-UCRL Algorithm

This repository is an implementation of the H-UCRL algorithm introduced in Curi, S., Berkenkamp, F., & Krause, A. (2020). Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning.

To install, create a conda environment:

$ conda create -n hucrl python=3.7
$ conda activate hucrl
$ pip install -e .[test,logging,experiments]

For MuJoCo (license required), run:

$ pip install -e .[mujoco]
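
After the installation steps above, a quick sanity check can confirm that the packages are visible to the interpreter. The sketch below is only illustrative: the top-level package names `hucrl` and `rllib` are assumptions inferred from the repository paths, and `missing_packages` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed top-level package names; adjust if the installed names differ.
    missing = missing_packages(["hucrl", "rllib"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("hucrl and rllib are importable.")
```

Running this inside the activated `hucrl` conda environment checks that environment rather than the system interpreter.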

Running an experiment.

For the inverted pendulum experiment, run:

$ python exps/inverted_pendulum/run.py

For the MuJoCo (license required) experiment, run:

$ python exps/mujoco/run.py --environment ENV_NAME --agent AGENT_NAME --action

We support MBHalfCheetah-v0, MBPusher-v0, MBReacher-v0, MBAnt-v0, MBCartPole-v0, MBHopper-v0, MBInvertedDoublePendulum-v0, MBInvertedPendulum-v0, MBReacher3D-v0, MBSwimmer-v0, and MBWalker2d-v0.
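
To avoid typos in environment names when launching several runs, the command can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_mujoco_command` and `SUPPORTED_ENVIRONMENTS` are hypothetical names introduced here, the `--environment` and `--agent` flags are the ones shown above, and `--config-file` is taken from a user report rather than from this README. The truncated `--action` flag is deliberately left out because its full form is not shown.

```python
import shlex

# Environment names copied from the list above.
SUPPORTED_ENVIRONMENTS = (
    "MBHalfCheetah-v0", "MBPusher-v0", "MBReacher-v0", "MBAnt-v0",
    "MBCartPole-v0", "MBHopper-v0", "MBInvertedDoublePendulum-v0",
    "MBInvertedPendulum-v0", "MBReacher3D-v0", "MBSwimmer-v0",
    "MBWalker2d-v0",
)


def build_mujoco_command(environment, agent, config_file=None, python="python"):
    """Return the argument list for exps/mujoco/run.py (hypothetical helper)."""
    if environment not in SUPPORTED_ENVIRONMENTS:
        raise ValueError(f"Unsupported environment: {environment!r}")
    cmd = [python, "exps/mujoco/run.py",
           "--environment", environment, "--agent", agent]
    if config_file is not None:
        # Flag name assumed from a user-reported command, not from the README.
        cmd += ["--config-file", config_file]
    return cmd


if __name__ == "__main__":
    cmd = build_mujoco_command("MBPusher-v0", "BPTT")
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run` from the repository root if the runs should be launched from Python.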

Citing H-UCRL

If you use this repo for your research, please use the following BibTeX entry:

@article{curi2020efficient,
  title={Efficient model-based reinforcement learning through optimistic policy search and planning},
  author={Curi, Sebastian and Berkenkamp, Felix and Krause, Andreas},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
Comments
  • experiments on mujoco Pusher

    Hi Sebastian, first, thanks for your excellent code and paper! However, the BPTT and Data_Augmentation agents fail to accomplish the Pusher task in simulation and output a very low return, e.g., -416.11. I have only tried these two agents in the Pusher environment, so I am not sure whether I am running them correctly. For example, for the BPTT agent, I run python exps/mujoco/run.py --environment MBPusher-v0 --agent BPTT --config-file exps/mujoco/config/bptt.yaml

    opened by yesiam-png 18
  • inverted_pendulum/run.py fails to reach the performance as in the paper

    I am interested in your work on the H-UCRL algorithm introduced in Curi, S., Berkenkamp, F., & Krause, A. (2020). Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning. I believe the work is very important.

    Currently I have one question and one bug report.

    My question is how to run the H-UCRL algorithm to reproduce the experiments described in the paper, e.g., in Appendix B.2. I understand that running "% python exps/inverted_pendulum/run.py" with the default agent_name="mbmpo" corresponds to the experiment of App. B.2.1, though some parameter configurations may differ. But what about the other experiments? The task is fixed to InvertedPendulum in the function at https://github.com/sebascuri/hucrl/blob/ec3a1967ca75991b198c41c4f2b8d1678e77307a/exps/inverted_pendulum/util.py#L428 Do we need to implement the other tasks ourselves, or is it possible to use exps/mujoco/run.py with some modification? I ask because the MBMPO algorithm described in the paper's Appendix B.1 does not seem to be supported in https://github.com/sebascuri/hucrl/blob/ec3a1967ca75991b198c41c4f2b8d1678e77307a/exps/mujoco/run.py#L30-L38 where an Agent class is called from the rllib module. I hope to run H-UCRL with MBMPO using exps/mujoco/run.py, because this would make it easy to compare with the other configurations.

    The bug I found concerns exps/inverted_pendulum/run.py. Simply running "% python exps/inverted_pendulum/run.py" fails to swing up the pendulum even after 20 episodes. With two modifications it succeeded.

    1. Set PLAN_HORIZON to a non-zero value, e.g., 50, in https://github.com/sebascuri/hucrl/blob/ec3a1967ca75991b198c41c4f2b8d1678e77307a/exps/inverted_pendulum/run.py#L17
    2. Change the indexing of the variable returns from [..., -1, :] to [..., -1] in the rllib source code https://github.com/sebascuri/rllib/blob/8abca6110fdbc9adeaaff1f92a08e7d97f7fe408/rllib/util/value_estimation.py#L206 This is based on a shape mismatch observed when calculating td_error in https://github.com/sebascuri/hucrl/blob/ec3a1967ca75991b198c41c4f2b8d1678e77307a/hucrl/algorithms/mbmpo.py#L104 The second bug is critical, but this workaround is unacceptable.

    A minor question: even with the modifications above and no action penalty, the Train Return does not reach 300 as reported in the paper (e.g., Fig. 1), but only about 220 at best. Is this enough?

    opened by 4kubo 5
  • Unable to Reproduce Results

    Hi Sebastian, first of all, great paper and thanks for sharing your code! I encountered some issues with simple import errors (different folder names) and/or non-existent attributes in some classes (one example is "forward_transformations" @166 of hucrl/agent/model_based/model_based_agent.py). I am using your configuration file for the virtual environment and Python 3.7. May I ask you to check whether the up-to-date versions of hucrl and rllib are currently working for you?

    opened by RiccarDigno 0
  • Unable to reproduce results in multiple environments

    Hi Sebastian, thanks for the great paper. I am trying to reproduce the results in the paper in order to check some research directions, but so far without success. I am using your configuration file for the virtual environment and Python 3.7.
    I am attaching a document with the results for the CartPole environment, tested with different exploration strategies and agents. Can you figure out what is going wrong? Thanks.

    results of hucrl simulations.docx

    opened by tzahishimkin 2
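
The second comment above reports a shape mismatch that depends on whether a tensor is indexed with [..., -1, :] or [..., -1]. The sketch below shows, in plain Python, how the two indexings change a shape and how broadcasting then either raises an error or silently produces a larger result. The shapes used are purely illustrative assumptions and are not the actual tensors in rllib or hucrl; all function names are hypothetical.

```python
from itertools import zip_longest


def shape_after_last_keep_trailing(shape):
    """Shape of x[..., -1, :] for an array with at least two dimensions."""
    return shape[:-2] + shape[-1:]


def shape_after_last(shape):
    """Shape of x[..., -1] for an array with at least one dimension."""
    return shape[:-1]


def broadcast_shape(a, b):
    """Result shape under NumPy/PyTorch-style broadcasting, or ValueError."""
    out = []
    # Compare dimensions from the trailing end; missing ones count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(out))


if __name__ == "__main__":
    returns = (4, 10)  # illustrative (batch, horizon) shape, an assumption
    value = (4,)       # illustrative per-sample value estimate, an assumption
    print(shape_after_last_keep_trailing(returns))  # (10,)
    print(shape_after_last(returns))                # (4,)
    # A trailing singleton dimension does not raise; it broadcasts silently.
    print(broadcast_shape(value, (4, 1)))           # (4, 4)
```

With these illustrative shapes, only the [..., -1] result lines up element-wise with the value estimate, while a stray trailing dimension of size 1 would broadcast without any error, which is why such bugs can go unnoticed.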
Owner
Sebastian Curi
Even though I am a mechanical engineer finishing my master's in robotics, systems and control, I love to program.