Simple (but Strong) Baselines for POMDPs

Overview

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

Welcome to the POMDP world! This repo provides simple baselines for POMDPs, specifically recurrent model-free RL, for the following paper:

Paper: arXiv | Numeric results: Google Drive

by Tianwei Ni, Benjamin Eysenbach and Ruslan Salakhutdinov.

Installation

First, download this repo to a local directory (preferably on a cluster or a server), referred to as <local_path>. We recommend using a virtual environment to install all the dependencies. For example, with miniconda:

conda env create -f install.yml
conda activate pomdp

The YAML file includes all the dependencies (e.g. PyTorch, PyBullet) used in our experiments (including the compared methods), with two exceptions:

  • To run Cheetah-Vel in meta RL, you have to install MuJoCo with a license.
  • To run the robust RL and generalization in RL experiments, you have to install roboschool.
    • We found it hard to install roboschool from scratch, so we provide a Singularity container image roboschool.sif on Google Drive that contains roboschool and the other necessary libraries, adapted from the SunBlaze repo.
    • To download and activate the container image with Singularity on a cluster (a single server should be similar):
    # download roboschool.sif from the google drive to envs/rl-generalization/roboschool.sif
    # then run singularity shell
    singularity shell --nv -H <local_path>:/home envs/rl-generalization/roboschool.sif
    • Then you can test it by running import roboschool in a python3 shell.

General Form to Run Our Implementation of Recurrent Model-Free RL and Compared Methods

We use a .yml file in the configs/ folder for each subarea of POMDPs. To run our implementation, from <local_path> simply run

export PYTHONPATH=${PWD}:$PYTHONPATH
python3 policies/main.py configs/<subarea>/<env_name>/<algo_name>.yml

where <algo_name> specifies the algorithm:

  • sac_rnn and td3_rnn correspond to our implementation of recurrent model-free RL
  • ppo_rnn and a2c_rnn correspond to the (Kostrikov, 2018) implementation of recurrent model-free RL
  • vrm corresponds to VRM, compared in "standard" POMDPs
  • varibad corresponds to the off-policy version of the original VariBAD, compared in meta RL
  • MRPO corresponds to MRPO, compared in robust RL

We have merged the prior methods above into our repository (there is no need to install other repositories), so that future work can use this single repository to run a number of baselines besides ours: A2C-GRU, PPO-GRU, VRM, VariBAD, and MRPO. Since our code draws heavily on those prior works, we encourage authors of future work to cite those prior papers or implementations. For the compared methods, we use their open-sourced implementations with their default hyperparameters.

Specific Running Commands for Each Subarea

Please see run_commands.md for details on running our implementation of recurrent model-free RL and also all the compared methods.

A Minimal Example to Run Our Implementation

Here we provide a stand-alone minimal example, with the fewest dependencies, to run our implementation of recurrent model-free RL!

It only requires PyTorch and PyBullet; there is no need to install MuJoCo or roboschool, and no external configuration file is needed.

Simply open the Jupyter Notebook example.ipynb, which contains the training and evaluation procedure on a toy POMDP environment (Pendulum-V). The whole process takes less than 20 minutes to run.
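For intuition about what "Pendulum-V" means: it is a partially observable variant of Pendulum in which the agent observes only the angular velocity, not the angle. Below is a minimal sketch of such a velocity-only wrapper (illustrative only; the class name and environment id are placeholders, not the repo's actual code):

import gym
import numpy as np

class PendulumV(gym.ObservationWrapper):
    """Hide the angle of Pendulum and expose only the angular velocity (a POMDP)."""

    def __init__(self, env):
        super().__init__(env)
        # Full observation is [cos(theta), sin(theta), theta_dot]; keep theta_dot only.
        self.observation_space = gym.spaces.Box(low=-8.0, high=8.0, shape=(1,), dtype=np.float32)

    def observation(self, obs):
        return obs[2:3].astype(np.float32)

env = PendulumV(gym.make("Pendulum-v1"))  # environment id depends on your gym version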

Details of Our Implementation of Recurrent Model-Free RL: Decision Factors, Best Variants, Code Features

Please see our_details.md for more information on:

  • How to tune the decision factors discussed in the paper in the configuration files
  • How to tune the other hyperparameters that are also important to training
  • Where the core class of our recurrent model-free RL and the RAM-efficient replay buffer are located
  • Our best variants in each subarea and the numeric results for all the bar charts and learning curves

Acknowledgement

Please see acknowledge.md for details.

Citation

If you find our code useful to your work, please consider citing our paper:

@article{ni2021recurrentrl,
  title={Recurrent Model-Free RL is a Strong Baseline for Many POMDPs},
  author={Ni, Tianwei and Eysenbach, Benjamin and Salakhutdinov, Ruslan},
  year={2021}
}

Contact

If you have any questions, please create an issue in this repo or contact Tianwei Ni ([email protected]).

Comments
  • Introduce recurrent SAC-discrete

    This PR introduces a recurrent SAC-discrete algorithm for POMDPs with discrete action spaces. The code is heavily based on the open-sourced SAC-discrete implementation https://github.com/ku2482/sac-discrete.pytorch/blob/master/sacd/agent/sacd.py and the SAC-discrete paper https://arxiv.org/abs/1910.07207.

    We provide two sanity checks on classic gym discrete control environments: CartPole-v0 and LunarLander-v2. The commands for running Markovian and recurrent SAC-discrete algorithms are:

    # CartPole
    python3 policies/main.py --cfg configs/pomdp/cartpole/f/mlp.yml --target_entropy 0.7 --cuda -1
    # CartPole-V
    python3 policies/main.py --cfg configs/pomdp/cartpole/v/rnn.yml --target_entropy 0.7 --cuda 0
    # LunarLander
    python3 policies/main.py --cfg configs/pomdp/lunalander/f/mlp.yml --target_entropy 0.7 --cuda -1
    # LunarLander-V
    python3 policies/main.py --cfg configs/pomdp/lunalander/v/rnn.yml --target_entropy 0.5 --cuda 0
    

    where target_entropy sets the ratio used for the target entropy: ratio * log(|A|).
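    As a quick numeric illustration of that formula (a sketch, not the repo's exact code; the helper name below is made up), with |A| discrete actions the target entropy is ratio * log(|A|), e.g. about 0.49 nats for CartPole (|A| = 2) with --target_entropy 0.7:

    import numpy as np

    def compute_target_entropy(ratio, num_actions):
        # Target entropy as a fraction of log|A|, the entropy of a uniform policy
        # over the discrete action space.
        return ratio * np.log(num_actions)

    print(compute_target_entropy(0.7, 2))  # CartPole has 2 actions -> ~0.485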

    • CartPole: Markovian SAC-discrete is quite sensitive to target_entropy but can solve the task with max return 200.
    • CartPole-V: recurrent SAC-discrete is robust to target_entropy and can solve the task with max return 200 within 10 episodes.
    • LunarLander: Markovian SAC-discrete is sensitive to target_entropy but can solve the task with return over 200.
    • LunarLander-V: recurrent SAC-discrete is very sensitive to target_entropy but can nearly solve the task with one target_entropy value.

    (Learning-curve screenshots omitted.)
    opened by twni2016
  • Handle cases when the episode length < 2

    Hello,

    I wonder whether you have any thoughts on how to handle that case? Currently, you assume it never happens, and the code raises an error in such cases. However, in some domains, like MiniGrid Lava Crossing, that might happen quite often. Thanks!

    opened by hai-h-nguyen
  • Potential bug?

    For SACD, can you explain why you do this https://github.com/twni2016/pomdp-baselines/blob/main/policies/models/policy_rnn.py#L235 instead of this (which I think is correct)? https://github.com/twni2016/pomdp-baselines/blob/main/policies/models/policy_rnn.py#L230

    opened by hai-h-nguyen
  • Pixel observation with recurrent SAC-Discrete

    This PR is not intended to be merged, but serves as a showcase of supporting pixel observations with discrete action spaces, e.g. Atari games.

    We take the delayed-catch environment, introduced by IMPALA+SR (https://arxiv.org/abs/2102.12425), as a sanity check. The environment has only a terminal reward and requires long-term memory. It has an image size of 1x7x7, 3 discrete actions, and a horizon of ~runs*7. We use a simple image encoder for the image observations in place of the MLP encoder used for vector observations.
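    To make the encoder swap concrete, here is a minimal PyTorch sketch of a tiny CNN encoder for 1x7x7 observations (an illustration only; the layer sizes, names, and embedding dimension are assumptions, not the PR's actual architecture):

    import torch
    import torch.nn as nn

    class TinyImageEncoder(nn.Module):
        """Encode a 1x7x7 image observation into a flat embedding fed to the RNN."""

        def __init__(self, embed_dim=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x7x7 -> 16x7x7
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16x7x7 -> 32x7x7
                nn.ReLU(),
                nn.Flatten(),
            )
            self.fc = nn.Linear(32 * 7 * 7, embed_dim)

        def forward(self, obs):  # obs: (batch, 1, 7, 7)
            return self.fc(self.conv(obs))

    print(TinyImageEncoder()(torch.zeros(8, 1, 7, 7)).shape)  # torch.Size([8, 64])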

    We try delayed-catch with 5, 10, 20, and 40 runs; the more runs, the harder the problem. The IMPALA+SR paper reports learning curves for 10, 20, and 40 runs for IMPALA and IMPALA+SR (their Fig. 7b); the screenshot of those curves is omitted here.

    Our running command:

    # We sweep over the following range
    python3 policies/main.py --cfg configs/pomdp/catch/rnn.yml --noautomatic_entropy_tuning --entropy_alpha [0.1,0.01,0.001]
    

    where we found that a fixed temperature works much better than auto-tuning it with a target entropy in this task. (It is still a bit strange why the former works but the latter does not; auto-tuning eventually leads to zero actor gradient.)
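    For context on the fixed-vs-auto temperature observation above, below is a generic sketch of SAC-style temperature auto-tuning (the standard formulation, not necessarily the repo's exact code); with --noautomatic_entropy_tuning the temperature simply stays fixed at --entropy_alpha:

    import torch

    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)
    target_entropy = 0.7 * torch.log(torch.tensor(3.0))  # e.g. ratio * log|A| with 3 actions

    def update_alpha(policy_entropy):
        # Standard SAC temperature loss: gradient descent raises alpha when the
        # policy entropy falls below the target, and lowers it otherwise.
        alpha_loss = (log_alpha * (policy_entropy - target_entropy).detach()).mean()
        alpha_optim.zero_grad()
        alpha_loss.backward()
        alpha_optim.step()
        return log_alpha.exp().item()

    print(update_alpha(policy_entropy=torch.tensor([0.2])))  # entropy below target -> alpha grows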

    • Delayed-catch with 5 runs: solved with 100k samples.
    • Delayed-catch with 10 runs: solved with 400k samples (vs 50M for IMPALA+SR).
    • Delayed-catch with 20 runs: solved with 700k samples (vs 100M for IMPALA+SR).
    • Delayed-catch with 40 runs: after hyperparameter tuning, solved with 2M samples (vs 200M for IMPALA+SR).

    (Learning-curve screenshots for the different fixed alpha values and for alpha=0.1 are omitted.)

    opened by twni2016
  • Refactor the code on RL

    This PR is a major refactor of the policy and RL code. The idea is to remove RL-algorithm-specific details (e.g. TD3, SAC, SAC-Discrete) from policy*.py; these details are moved to the rl directory.

    With this decoupling, one can create a new RL algorithm by inheriting from the RLAlgorithmBase class, without changing the policy* classes.
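    A purely hypothetical sketch of what that decoupling looks like (the class and method names below are illustrative, not the actual RLAlgorithmBase interface in the rl directory):

    from abc import ABC, abstractmethod

    # Illustrative stand-in for the base class; the real RLAlgorithmBase may
    # expose different method names and signatures.
    class RLAlgorithmBaseSketch(ABC):
        @abstractmethod
        def actor_loss(self, batch): ...

        @abstractmethod
        def critic_loss(self, batch): ...

    class MyNewRL(RLAlgorithmBaseSketch):
        def actor_loss(self, batch):
            return 0.0  # algorithm-specific policy loss goes here

        def critic_loss(self, batch):
            return 0.0  # algorithm-specific Q loss goes here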

    opened by twni2016
  • Introduce RAM-efficient seq buffer

    This PR introduces a RAM-efficient sequence buffer in buffers/seq_replay_buffer_efficient.py, which stores each observation only once, unlike the vanilla sequence buffer in buffers/seq_replay_buffer_vanilla.py, which stores it twice. This reduces RAM usage roughly by 2x, which is especially useful for large observation spaces such as pixel inputs.
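    The idea can be shown with a tiny conceptual sketch (an illustration of storing observations once, not the actual buffer in buffers/seq_replay_buffer_efficient.py; it also ignores episode boundaries):

    import numpy as np

    class SingleStorageBufferSketch:
        """Store each observation once; next_obs is recovered by shifting the index."""

        def __init__(self, capacity, obs_dim):
            self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
            self.size = 0

        def add(self, ob):
            self.obs[self.size] = ob
            self.size += 1

        def transition(self, t):
            # (obs_t, next_obs_t) without storing the observation stream twice.
            return self.obs[t], self.obs[t + 1]

    buf = SingleStorageBufferSketch(capacity=10, obs_dim=3)
    for step in range(5):
        buf.add(np.full(3, step, dtype=np.float32))
    print(buf.transition(2))  # (array of 2.0s, array of 3.0s)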

    The correctness of this buffer is tested in __main__; if you are interested, please run

    python buffers/seq_replay_buffer_efficient.py
    

    to double check.

    To use it, set train: buffer_type: "seq_efficient" in the config file. The default buffer type is still the vanilla one, to stay consistent with the paper's results. However, we recommend the efficient one, especially for Atari games.

    opened by twni2016