🌈 Rainbow
An implementation of Rainbow DQN which outperforms the paper's (Hessel et al., 2017) results on 40% of tested games while using 20x less data. This was developed as part of an undergraduate university course on scientific research and writing. The results are also available as a spreadsheet here. A selection of videos is available here.
Key Changes and Results
- We implemented the large IMPALA CNN with 2x channels from Espeholt et al. (2018); a rough sketch of this architecture follows this list.
- The implementation uses large, vectorized environments, asynchronous environment interaction, mixed-precision training, and larger batch sizes to reduce training time.
- Integrations and recommended preprocessing for >1000 environments from gym, gym-retro and procgen are provided.
- Due to compute and time constraints, we only trained for 10M frames (compared to 200M in the paper).
- We implemented all components apart from distributional RL (we saw mixed results with C51 and QR-DQN).
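For reference, here is a minimal PyTorch sketch of the large IMPALA CNN structure, assuming 84x84 stacked-frame inputs and the paper's (16, 32, 32) channel widths doubled to (32, 64, 64). The class names, layer sizes, and defaults are illustrative and may differ from the actual model in this repository.

```python
# Illustrative sketch of the large IMPALA CNN (Espeholt et al., 2018) with
# doubled channel widths; names and sizes are assumptions, not this repo's code.
import torch
from torch import nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convs with pre-activation ReLUs and a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv1(F.relu(x))
        out = self.conv2(F.relu(out))
        return x + out  # skip connection

class ImpalaBlock(nn.Module):
    """Conv -> 3x3 max-pool (stride 2) -> two residual blocks."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.res1 = ResidualBlock(out_channels)
        self.res2 = ResidualBlock(out_channels)

    def forward(self, x):
        x = self.conv(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        return self.res2(self.res1(x))

class ImpalaCNNLarge(nn.Module):
    """IMPALA CNN trunk with 2x channels (paper uses widths 16, 32, 32)."""
    def __init__(self, in_channels=4, widths=(32, 64, 64)):
        super().__init__()
        blocks = []
        for w in widths:
            blocks.append(ImpalaBlock(in_channels, w))
            in_channels = w
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):
        # x: (batch, in_channels, 84, 84) stacked grayscale frames
        x = self.blocks(x)
        return F.relu(torch.flatten(x, start_dim=1))
```

In the Rainbow agent, the dueling value and advantage heads (built from noisy linear layers) would then be attached to this flattened feature vector.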
When trained for only 10M frames, this implementation outperforms:
| Baseline | Trained for | Outperformed on |
| --- | --- | --- |
| google/dopamine | 10M frames | 96% of games |
| google/dopamine | 200M frames | 64% of games |
| Hessel et al. (2017) | 200M frames | 40% of games |
| Human results | | 72% of games |
Most of the observed performance improvements compared to the paper come from switching to the IMPALA CNN as well as some hyperparameter changes (e.g. the 4x larger learning rate).
Setup
Install necessary prerequisites with
sudo apt install zlib1g-dev cmake unrar
pip install wandb gym[atari]==0.18.0 imageio moviepy torchsummary tqdm rich procgen gym-retro torch stable_baselines3 atari_py==0.2.9
If you intend to use gym Atari games, you will need to install the ROMs separately, e.g., by running:
wget http://www.atarimania.com/roms/Roms.rar
unrar x Roms.rar
python -m atari_py.import_roms .
To set up gym-retro games, you should follow the instructions here.
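This typically involves importing your ROM files with gym-retro's import script, for example (the path below is a placeholder):
python -m retro.import /path/to/your/ROMs/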
How to use
To get started right away, run
python train_rainbow.py --env_name gym:Qbert
This will train Rainbow on Atari Qbert and log all results to "Weights and Biases" and the checkpoints directory.
Please take a look at common/argp.py or run python train_rainbow.py --help for more configuration options.
Some Notes
- With a single RTX 2080 and 12 CPU cores, training for 10M frames takes around 8-12 hours, depending on the settings used
- About 15GB of RAM is required. When using a larger replay buffer or subprocess envs, memory use may be much higher
- Hyperparameters can be configured through command line arguments; defaults can be found in common/argp.py
- For fastest training throughput use batch_size=512, parallel_envs=64, train_count=1, subproc_vecenv=True (see the example command below)
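For example, assuming the command-line flags mirror the hyperparameter names above (check common/argp.py for the exact argument names and boolean syntax), such a run might be launched with:
python train_rainbow.py --env_name gym:Qbert --batch_size 512 --parallel_envs 64 --train_count 1 --subproc_vecenv True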
Acknowledgements
We are very grateful to the TU Wien DataLab for providing the majority of the compute resources that were necessary to perform the experiments.
Here are some other implementations and resources that were helpful in the completion of this project:
- OpenAI Baselines (especially for preprocessing and Atari wrappers)
- https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py
- https://github.com/Kaixhin/Rainbow/
- https://github.com/Kaixhin/Rainbow/wiki/Matteo's-Notes