[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

MIT Graphics Group

Last update: Jan 7, 2023

Related tags

Overview

PG-MORL

This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control (ICML 2020).

In this paper, we propose an evolutionary learning algorithm to compute a high-quality and dense Pareto solutions for multi-objective continuous robot control problems. We also design seven multi-objective continuous control benchmark problems based on Mujoco, which are also included in this repository. This repository also contains the code for the baseline algorithms in the paper.

Installation

Prerequisites

Operating System: tested on Ubuntu 16.04 and Ubuntu 18.04.
Python Version: >= 3.7.4.
PyTorch Version: >= 1.3.0.
MuJoCo : install mujoco and mujoco-py of version 2.0 by following the instructions in mujoco-py.

Install Dependencies

You can either install the dependencies in a conda virtual env (recomended) or manually.

For conda virtual env installation, simply create a virtual env named pgmorl by:

conda env create -f environment.yml

If you prefer to install all the dependencies by yourself, you could open environment.yml in editor to see which packages need to be installed by pip.

Run the Code

The training related code are in the folder morl. We provide the scripts in scrips folder to run our algorithm/baseline algorithms on each problem described in the paper, and also provide several visualization scripts in scripts/plot folder for you to visualize the computed Pareto policies and the training process.

Precomputed Pareto Results

While you can run the training code the compute the Pareto policies from scratch by following the training steps below, we also provide the precomputed Pareto results for each problem. You can download them for each problem separately in this google drive link and directly visualize them with the visualization instructions to play with the results. After downloading the precomputed results, you can unzip it, create a results folder under the project root directory, and put the downloaded file inside.

Benchmark Problems

We design seven multi-objective continuous control benchmark problems based on Mujoco simulation, including Walker2d-v2, HalfCheetah-v2, Hopper-v2, Ant-v2, Swimmer-v2, Humanoid-v2, and Hopper-v3. A suffix of -v3 indicates a three-objective problem. The reward (i.e. objective) functions in each problem are designed to have similar scales. All environments code can be found in environments/mujoco folder. To avoid conflicting to the original mujoco environment names, we add a MO- prefix to the name of each environment. For example, the environment name for Walker2d-v2 is MO-Walker2d-v2.

Train

The main entrance of the training code is at morl/run.py. We provide a training script in scripts folder for each problem for you to easily start with. You can just follow the following steps to see how to run the training for each problem by each algorithm (our algorithm and baseline algorithms).

Enter the project folder
```
cd PGMORL
```
Activate the conda env:
```
conda activate pgmorl
```
To run our algorithm on Walker2d-v2 for a single run:
```
python scripts/walker2d-v2.py --pgmorl --num-seeds 1 --num-processes 1
```
You can also set other flags as arguments to run the baseline algorithms (e.g. --ra, --moead, --pfa, --random). Please refer to the python scripts for more details about the arguments.
By default, the results are stored in results/[problem name]/[algorithm name]/[seed idx].

Visualization

We provide a script to visualize the computed/downloaded Pareto results.
```
python scripts/plot/ep_obj_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
```
You can replace MO-Walker2d-v2 to your problem name, and replace the ./results/Walker2d-v2/pgmorl/0 by the path to your stored results.

It will show a plot of the computed Pareto policies in the performance space. By double-click the point in the plot, it will automatically open a new window and render the simulation for the selected policy.
We also provide a script to help you visualize the evolution process of the policy population.
```
python scripts/plot/training_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
```
It will plot the policy population (gray points) in each generation with some other useful information. The black points are the policies on the Pareto front, the green circles are the selected policies to be optimized in next generation, the red points are the predicted offsprings and the green points are the real offsprings. You can interact with the plot with the keyboard. For example, be pressing left/right, you can evolve the policy population by generation. You can refer to the plot scripts for the full description of the allowable operations.

Reproducibility

We run all our experiments on VM instances with 96 Intel Skylake vCPUs and 86.4G memory on Google Cloud Platform without GPU.

Acknowledgement

We use the implementation of pytorch-a2c-ppo-acktr-gail as the underlying PPO implementation and modify it into our Multi-Objective Policy Gradient algorithm.

Citation

If you find our paper or code is useful, please consider citing:

@inproceedings{xu2020prediction,
  title={Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control},
  author={Xu, Jie and Tian, Yunsheng and Ma, Pingchuan and Rus, Daniela and Sueda, Shinjiro and Matusik, Wojciech},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}

Comments

stable-baselines in pytorch-a2c-ppo-acktr-gail

I found that pytorch-a2c-ppo-acktr-gail uses stable-baselines. But the code here is replaced by the baselines of Open AI.

What is the advantage of replacing stable-baselines with baselines?

opened by HONG-ZI 0

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning DouZero is a reinforcement learning framework for DouDizhu (斗地主), t

3.1k Jan 4, 2023

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

SurRoL IROS 2021 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning Features dVRK compati

55 Jan 3, 2023

Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

SCL Introduction Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)' We evaluated our approach using two baseline

34 Oct 8, 2022

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN) This is the implementation of the paper Multi-Age

83 Jan 6, 2023

Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

57 Dec 15, 2022

On the model-based stochastic value gradient for continuous reinforcement learning

On the model-based stochastic value gradient for continuous reinforcement learning This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, a

46 Dec 15, 2022

Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Preference-Planning-Deep-IRL Introduction Check my portfolio post Dependencies Gym stable-baselines3 PyTorch Usage Take Demonstration python3 record.

9 Oct 26, 2022

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

997 Dec 30, 2022

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

10 Oct 10, 2022

[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

Related tags

Overview

PG-MORL

Installation

Prerequisites

Install Dependencies

Run the Code

Precomputed Pareto Results

Benchmark Problems

Train

Visualization

Reproducibility

Acknowledgement

Citation

You might also like...

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)

Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

On the model-based stochastic value gradient for continuous reinforcement learning

Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Comments

stable-baselines in pytorch-a2c-ppo-acktr-gail

Owner

MIT Graphics Group

Control-Raspberry-Pi-Robot-using-Hand-Gestures - A 4WD Robot car based on Raspberry Pi that controlled by hand gestures(using openCV and mediapipe)

Multi-objective gym environments for reinforcement learning.

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Code for the paper Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations (AKBC 2021).

MLOps will help you to understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project.

Guiding evolutionary strategies by (inaccurate) differentiable robot simulators @ NeurIPS, 4th Robot Learning Workshop

Space robot - (Course Project) Using the space robot to capture the target satellite that is disabled and spinning, then stabilize and fix it up

The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

Decentralized Reinforcment Learning: Global Decision-Making via Local Economic Transactions (ICML 2020)

PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).