A working implementation of the Categorical DQN (Distributional RL).

Florin Gogianu

Last update: Sep 20, 2022

Overview

Categorical DQN.

Implementation of the Categorical DQN as described in A Distributional Perspective on Reinforcement Learning.

Thanks to @tudor-berariu for optimisation and training tricks and for catching two nasty bugs.

Dependencies

You can take a look in the env export file for the full list of dependencies.

Install the game of Catch:

git clone https://github.com/floringogianu/gym_fast_envs
cd gym_fast_envs

pip install -r requirements.txt
pip install -e .

Install visdom for reporting: pip install visdom.

Training

First start the visdom server: python -m visdom.server. If you don't want to install or use visdom, make sure you deactivate the display_plots option in the configs.

Train the Categorical DQN with python main.py -cf configs/catch_categorical.yaml.

Train a DQN baseline with python main.py -cf configs/catch_dqn.yaml.

To Do

Migrate to PyTorch 0.2.0. Breaks compatibility with 0.1.12.
Add some training curves.
Run on Atari.
Add proper evaluation.

Results

The first row uses a batch size of 64, the second a batch size of 32. I will run more seeds and average them for a better comparison. Atari results are in progress.


Comments

Excessive clamping

It seems like the clamping of the network's output in the update is a bit excessive? Given values of x in [0, 1] (valid probabilities), as x -> 0, log(x) -> -infinity, so clamping the minimum value makes sense, but log(1) = 0, so there is no issue with the max value. Pinging @tudor-berariu as well.
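The argument above can be checked numerically. The following is a minimal sketch, assuming the update uses a cross-entropy between a target distribution and the network's clamped output; the `cross_entropy` helper and its `min_p` / `max_p` parameters are hypothetical names for illustration and are not taken from the repo.

```python
import math


def cross_entropy(target, predicted, min_p, max_p=1.0):
    """Cross-entropy with predicted probabilities clamped to [min_p, max_p].

    Hypothetical helper: the repo's actual loss code is not shown here.
    """
    total = 0.0
    for t, p in zip(target, predicted):
        clamped = min(max(p, min_p), max_p)  # same idea as a tensor clamp
        total -= t * math.log(clamped)
    return total


# A target with all mass on one atom, and a network that predicts it exactly.
target = [0.0, 1.0, 0.0]
predicted = [0.0, 1.0, 0.0]

# Lower clamp only: log(0) is avoided and a perfect prediction costs nothing.
loss_min_only = cross_entropy(target, predicted, min_p=0.001)

# Lower and upper clamp: the perfect prediction is still penalised,
# because log(0.99) is not zero.
loss_both = cross_entropy(target, predicted, min_p=0.01, max_p=0.99)

print(loss_min_only, loss_both)
```

With only a lower clamp the loss can reach zero for a perfect prediction, whereas the upper clamp leaves a small loss of -log(0.99) that cannot be removed.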

Empirically, this might be an issue. I'm running my Rainbow agent with a minimum clamp of 0.001 (arbitrarily chosen), and get the following rewards and Q-values on Space Invaders (the Q-values are in line with what is reported in the Double DQN paper; unfortunately I do not have reported Q-values for Rainbow):

Whereas when I use a minimum clamp of 0.01 and maximum clamp of 0.99 as in this repo, I get the following, which indicates that this prevents the network from accurately estimating Q (note that this is the first time I've ever seen Q-values so far from what I got above, so the issue clearly lies with the clamping):

opened by Kaixhin 4
categorical update problem when l=u

In def _get_categorical(self, next_states, rewards, mask), when "b" happens to be an integer (e.g., bellman_op clamped to be self.v_max, so b=51), the floor and ceil index values, "l" and "u", will be equal. This seems to break the distribution projection, as the probability mass at "b" is projected nowhere.

opened by haiyanyin 2
Fix disappearing probability mass

Closes #1. Not sure if this is the best way to deal with the problem, but it seems to cover a few cases - let me know if you have a better solution.

To give an example of the issue: the update relies on qa_probs * (u.float() - b) and qa_probs * (b - l.float()), but if b happens to contain any integer values (e.g. in a terminal state where all probability mass is concentrated in one location), then l equals u for those entries, both parts of the update turn into qa_probs * 0, and the network tries to match a vector of 0s with its softmax output, which will obviously cause problems. Due to the nature of this edge case I believe it has a worse effect on environments with more terminal transitions.
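The edge case can be reproduced outside the network. Below is a minimal per-sample sketch in plain Python rather than batched tensors; the `project` function and its `fix_integer_b` flag are hypothetical names, and the branch that assigns all mass to a single atom is one possible fix, not necessarily the one used in this pull request.

```python
import math


def project(probs, b_values, n_atoms, fix_integer_b=True):
    """Spread each probability over the two atoms neighbouring its index b.

    probs[i] is the mass of source atom i and b_values[i] is its fractional
    index on the target support. Hypothetical helper for illustration only.
    """
    m = [0.0] * n_atoms
    for p, b in zip(probs, b_values):
        l, u = math.floor(b), math.ceil(b)
        if fix_integer_b and l == u:
            # b sits exactly on an atom: give that atom all of the mass.
            m[l] += p
        else:
            # Usual linear interpolation; both weights are 0 when l == u.
            m[l] += p * (u - b)
            m[u] += p * (b - l)
    return m


# Terminal-like transition: every source atom maps to the same integer index.
probs = [0.2] * 5
b_values = [4.0] * 5

naive = project(probs, b_values, n_atoms=5, fix_integer_b=False)
fixed = project(probs, b_values, n_atoms=5, fix_integer_b=True)

# A fractional index is split between its neighbours in both variants.
mixed = project([1.0], [2.5], n_atoms=5)

print(sum(naive), sum(fixed), mixed)
```

The naive variant returns a target that sums to zero, which is the vector of 0s described above, while the guarded variant keeps a valid distribution that sums to one.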

opened by Kaixhin 1
Reduce clamping

Closes #3. As discussed previously, a max clamp shouldn't be needed. I haven't decoupled the effects of a min clamp of 0.01 versus 0.001, but the latter seems to work fine for my experiments.

opened by Kaixhin 0

Owner

Florin Gogianu

Research engineer at Bitdefender, mostly working on reinforcement learning algorithms.
