ProMP: Proximal Meta-Policy Search

Overview

Implementations corresponding to ProMP (Rothfuss et al., 2018). Overall, this repository consists of two branches:

  1. master: lightweight branch that provides the necessary code to run Meta-RL algorithms such as ProMP, E-MAML, MAML. This branch is meant to provide an easy start with Meta-RL and can be integrated into other projects and setups.
  2. full-code: branch that provides the comprehensive code that was used to produce the experimental results in Rothfuss et al. (2018). This includes experiment scripts and plotting scripts that can be used to reproduce the experimental results in the paper.

The code is written in Python 3 and builds on TensorFlow. Many of the provided reinforcement learning environments require the Mujoco physics engine. Overall, the code was developed with modularity and computational efficiency in mind. Many components of the Meta-RL algorithm are parallelized, using either MPI or TensorFlow, in order to ensure efficient use of all CPU cores.

Documentation

An API specification and explanation of the code components can be found here. The documentation can also be built locally by running the following commands:

# ensure that you are in the root folder of the project
cd docs
# install the sphinx documentation tool dependencies
pip install -r requirements.txt
# build the documentation
make clean && make html
# now the html documentation can be found under docs/build/html/index.html

Installation / Dependencies

The provided code can be run either A) in a docker container provided by us or B) using Python on your local machine. The latter requires multiple installation steps in order to set up the dependencies.

A. Docker

If not installed yet, set up docker on your machine. Then pull our docker container jonasrothfuss/promp from Docker Hub:

docker pull jonasrothfuss/promp

All the necessary dependencies are already installed inside the docker container.
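
If you want to work with a local copy of the repository inside the container, you can start an interactive shell with the repository mounted. The command below is a minimal sketch; the /promp mount point is an arbitrary choice for illustration:

# run from the root folder of the project (mount point /promp is illustrative)
docker run -it --rm -v $(pwd):/promp jonasrothfuss/promp /bin/bash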

B. Anaconda or Virtualenv

B.1. Installing MPI

Ensure that you have a working MPI implementation (see here for more instructions).

For Ubuntu you can install MPI through the package manager:

sudo apt-get install libopenmpi-dev
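To check that the installation succeeded, you can print the MPI version (this assumes OpenMPI's mpirun is on your PATH):

mpirun --version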

B.2. Create either a venv or a conda environment and activate it

Virtualenv

pip install --upgrade virtualenv
virtualenv <venv-name>
source <venv-name>/bin/activate

Anaconda

If not done yet, install anaconda by following the instructions here. Then create an anaconda environment, activate it and install the requirements in requirements.txt.

conda create -n <env-name> python=3.6
source activate <env-name>

B.3. Install the required python dependencies
pip install -r requirements.txt
B.4. Set up the Mujoco physics engine and mujoco-py

For running the majority of the provided Meta-RL environments, the Mujoco physics engine as well as a corresponding python wrapper are required. For setting up Mujoco and mujoco-py, please follow the instructions here.
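
Once set up, a quick sanity check is to import the wrapper from the command line (a minimal check; it assumes your Mujoco binaries and license key are already in place):

python -c "import mujoco_py"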

Running ProMP

In order to run the ProMP algorithm in the point environment (no Mujoco needed) with default configurations, execute:

python run_scripts/pro-mp_run_point_mass.py 

To run the ProMP algorithm in a Mujoco environment with default configurations:

python run_scripts/pro-mp_run_mujoco.py 

The run configuration can be changed either in the run script directly or by providing a JSON configuration file with all the necessary hyperparameters. A JSON configuration file can be provided through the --config_file flag. Additionally, the dump path can be specified through the --dump_path flag:

python run_scripts/pro-mp_run.py --config_file <config_file_path> --dump_path <dump_path>
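
For example, a concrete invocation could look as follows; the configuration file and dump directory below are hypothetical paths for illustration, not files shipped with the repository:

python run_scripts/pro-mp_run.py --config_file configs/pro-mp_config.json --dump_path data/pro-mp-run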

Additionally, in order to run the gradient-based meta-learning methods MAML and E-MAML (Finn et al., 2017 and Stadie et al., 2018) in a Mujoco environment with the default configuration, execute, respectively:

python run_scripts/maml_run_mujoco.py 
python run_scripts/e-maml_run_mujoco.py 

Cite

To cite ProMP please use

@article{rothfuss2018promp,
  title={ProMP: Proximal Meta-Policy Search},
  author={Rothfuss, Jonas and Lee, Dennis and Clavera, Ignasi and Asfour, Tamim and Abbeel, Pieter},
  journal={arXiv preprint arXiv:1810.06784},
  year={2018}
}

Acknowledgements

This repository includes environments introduced in (Duan et al., 2016; Finn et al., 2017).

Comments
  • is there a script to run experiment with dice for comparison

    Hi, is there a script to run experiments with dice for comparing with other algorithms? I didn't see a script to do this under the run_scripts directory. Thank you

    opened by ghost 6
  • Which name correspond to ProMP in the full_code branch?

    Hi. I am trying to reproduce the results of the gradient variance comparison. I know from previous issues that the code is in https://github.com/jonasrothfuss/ProMP/tree/full_code/experiments/gradient_variance. There seem to be three algorithms tested in run_sweep.py, with the names VPG, VPG_DICE and DICE, respectively.

    But in your paper, you show two curves, LVC and DICE, and I am confused about that. Does the DICE curve correspond to the name 'DICE' in the script? Then which name does the LVC curve correspond to? Also, what does the third name correspond to?

    opened by NagisaZj 3
  • confusion about experiments in full_code branch

    Hey, could you please check the experiment code in the full_code branch? E.g., in the variance comparison experiments, I can't see where your proposed algorithm is, apart from dice_maml, vpg_maml and vpg_dice_maml, which are confusing since they don't seem to appear in the paper and the names are kind of confusing. Besides, are you assuming that each dimension is independent when computing the variance of the gradient?

    opened by ghost 3
  • How to use your model?

    Thanks for your code. After running the programs in the run_scripts folder, we get "Training finished". Could you tell me how to use the model created?

    opened by ghost 2
  • Difference between maml and e-maml

    Hi, thanks for open-sourcing the code!

    I am wondering what the difference is between the e-maml and the maml code. A line-by-line comparison seems to suggest that they are identical.

    Would appreciate your insights. Thanks!

    opened by hzyjerry 2
  • Question about Figures in the paper and log files

    Hi,

    Thanks for sharing this repo.

    I was wondering which column(s) of progress.csv were used to plot Figures 2, 3, and 7? Based on my understanding of the paper, I figure it is "Step_1-AverageReturn", is that right?

    In addition, were the log data for Figures 2, 3 and 7 collected by running the following? python run_scripts/maml_run_mujoco.py and python run_scripts/pro-mp_run_mujoco.py

    And finally, where were the 3 random seeds?

    It would be great if you could let me know about these.

    Thanks.

    opened by rasoolfa 2
  • It seems that MPI is not really needed cause it is used only in logger

    Hey, you say "Many components of the Meta-RL algorithm are parallelized using either MPI or TensorFlow in order to ensure efficient use of all CPU cores." in the documentation, but MPI is only found in the logger, which means it is not necessarily used, since the code won't be executed with MPI. Am I missing something?

    opened by lhao499 2
  • KeyError: 'default'

    python experiments/all_envs_eval/trpo_run_all.py

    WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

    • https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
    • https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue.

    Traceback (most recent call last):
      File "experiments/all_envs_eval/trpo_run_all.py", line 138, in <module>
        run_sweep(run_experiment, sweep_params, EXP_NAME, INSTANCE_TYPE)
      File "/home/ljy/文档/ProMP-full_code/experiment_utils/run_sweep.py", line 27, in run_sweep
        local_output_dir=os.path.join(config.DATA_DIR, 'local', exp_name))
      File "/home/ljy/文档/ProMP-full_code/doodad/doodad/easy_sweep/launcher.py", line 38, in __init__
        self.mount_out_s3 = mount.MountS3(s3_path='exp_logs', mount_point=docker_output_dir, output=True)
      File "/home/ljy/文档/ProMP-full_code/doodad/doodad/mount.py", line 99, in __init__
        s3_bucket = AUTOCONFIG.s3_bucket()
      File "/home/ljy/文档/ProMP-full_code/doodad/doodad/ec2/autoconfig.py", line 16, in s3_bucket
        return self.config['default']['s3_bucket_name']
      File "/home/ljy/anaconda3/envs/mujoco-py/lib/python3.5/configparser.py", line 956, in __getitem__
        raise KeyError(key)
    KeyError: 'default'

    What may be the cause?

    opened by lujiayou123 1
  • Error of "Expired activation key"

    Hi, I have set up ProMP on my server with docker. But when I tried to run the pro-mp_run_point_mass.py script, it returned an "Expired activation key" error. Do you know why this happens? I'm also not sure where to put my mujoco license. But this problem seems not to be caused by mujoco, as the point mass environment does not require it. Thanks.

    opened by MasterXiong 0
  • Bump joblib from 0.12.2 to 1.2.0 in /docs

    Bumps joblib from 0.12.2 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliably spawning the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Bump joblib from 0.12.2 to 1.2.0

    Bumps joblib from 0.12.2 to 1.2.0 (same changelog and Dependabot details as the /docs PR above).

    dependencies 
    opened by dependabot[bot] 0
  • Regarding the Walker2d-Randparams Environment

    Thank you for your inspiring work and for open-sourcing the code. I have a question regarding the walker random environment. The walker cannot walk when the reward is 800; this is the case for all the algorithms that use this environment. When doing a comparative analysis, are we supposed to compare only the reward, or is the reward scaled for this environment?

    opened by jonathanbrady88 0
  • Why Pickle Environment

    Hi, Jonas. Thanks for sharing this great project. I am a little confused about why we pickle the environment when creating the process in the MetaParallelEnvExecutor:

            self.ps = [
                 Process(target=worker, args=(work_remote, remote, pickle.dumps(env), envs_per_task, max_path_length, seed))
                 for (work_remote, remote, seed) in zip(self.work_remotes, self.remotes, seeds)]  # Why pass work remotes?
    

    and then unpickle it in the process:

    envs = [pickle.loads(env_pickle) for _ in range(n_envs)]
    

    Why do we not just initialize the environment in the worker process, which would eliminate the code for pickling and unpickling?

    opened by 0xJchen 0
  • Run meta_test.py with ProMP-trained policy and get very bad result.

    I ran ppo_run.py and got a .pkl file for HopperRandParamsEnv, for which the average reward was about 200. But when I ran meta_test.py with the ProMP-trained policy, the average reward dropped to around 10... I don't understand where the problem is. Can somebody help me?

    opened by han-x 0
Owner
Jonas Rothfuss
Doctoral researcher, Institute of Machine Learning (ETH Zurich). Research emphasis on meta-learning and reinforcement learning.
Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https

Alexis David Jacq 163 Dec 26, 2022
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 9, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Ilya Kostrikov 3k Dec 31, 2022
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 6, 2022
Implementation of "Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner"

Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner This repository is the official implementation of Meta-rPPG: Remote Heart Ra

Eugene Lee 137 Dec 13, 2022
Proximal Backpropagation - a neural network training algorithm that takes implicit instead of explicit gradient steps

Proximal Backpropagation Proximal Backpropagation (ProxProp) is a neural network training algorithm that takes implicit instead of explicit gradient s

Thomas Frerix 40 Dec 17, 2022
Deep Image Search is an AI-based image search engine that includes deep transfer learning feature extraction and tree-based vectorized search.

Deep Image Search - AI-Based Image Search Engine Deep Image Search is an AI-based image search engine that includes deep transfer learning features Ex

null 139 Jan 1, 2023
This is an official PyTorch implementation of Task-Adaptive Neural Network Search with Meta-Contrastive Learning (NeurIPS 2021, Spotlight).

NeurIPS 2021 (Spotlight): Task-Adaptive Neural Network Search with Meta-Contrastive Learning This is an official PyTorch implementation of Task-Adapti

Wonyong Jeong 15 Nov 21, 2022
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model architecture for their classification problems (i.e., DNNs with different types of layers).

Google 3.2k Dec 31, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

null 19 Oct 27, 2022
Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

null 19 Sep 29, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS The code of the CVPR2020 paper Densely Connected Search Space for More Flexible Neural Architecture Search. Neural architecture search (NAS)

Jamin Fong 291 Nov 18, 2022
DRLib: A concise deep reinforcement learning library, integrating HER and PER for almost all off-policy RL algos.

DRLib:A concise deep reinforcement learning library, integrating HER and PER for almost off policy RL algos A concise deep reinforcement learning libr

null 329 Jan 3, 2023
A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

PGPElib A mini library for Policy Gradients with Parameter-based Exploration [1] and friends. This library serves as a clean re-implementation of the

NNAISENSE 56 Jan 1, 2023
This project provides a stock market environment using OpenGym with Deep Q-learning and Policy Gradient.

Stock Trading Market OpenAI Gym Environment with Deep Reinforcement Learning using Keras Overview This project provides a general environment for stoc

Kim, Ki Hyun 769 Dec 25, 2022
PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

This is the original implementation of our paper, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem (arXiv:1706.1

Zhengyao Jiang 1.5k Dec 29, 2022
A Pytorch implementation of the multi agent deep deterministic policy gradients (MADDPG) algorithm

Multi-Agent-Deep-Deterministic-Policy-Gradients A Pytorch implementation of the multi agent deep deterministic policy gradients(MADDPG) algorithm This

Phil Tabor 159 Dec 28, 2022