Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Karush Suri

Last update: Nov 7, 2022

Related tags

Deep Learning cql-jax

Overview

CQL-JAX

This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on top of the SAC base of JAX-RL.

Usage

Install Dependencies-

pip install -r requirements.txt
pip install "jax[cuda111]<=0.21.1" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Run CQL-

python train_offline.py --env_name=hopper-expert-v0 --min_q_weight=5

Please use the following values of min_q_weight on MuJoCo tasks to reproduce CQL results from IQL paper-

Domain	medium	medium-replay	medium-expert
walker	10	1	10
hopper	5	5	1
cheetah	90	80	100

For antmaze tasks min_q_weight=10 is found to work best.

In case of Out-Of Memory errors in JAX, try running with the following env variables-

XLA_PYTHON_CLIENT_MEM_FRACTION=0.80 python ...
XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python ...

Performance & Runtime

Returns are more or less same as the torch implementation and comparable to IQL-

Task	CQL(PyTorch)	CQL(JAX)	IQL
hopper-medium-v2	58.5	74.6	66.3
hopper-medium-replay-v2	95.0	92.1	94.7
hopper-medium-expert-v2	105.4	83.2	91.5
antmaze-umaze-v0	74.0	69.5	87.5
antmaze-umaze-diverse-v0	84.0	78.7	62.2
antmaze-medium-play-v0	61.2	14.2	71.2
antmaze-medium-diverse-v0	53.7	10.7	70.2
antmaze-large-play-v0	15.8	0.0	39.6
antmaze-large-diverse-v0	14.9	0.0	47.5

Wall-clock time averages to ~50 mins, improving over IQL paper's 80 min CQL and closing the gap with IQL's 20 min.

Task	CQL(JAX)	IQL
hopper-medium-v2	52	27
hopper-medium-replay-v2	54	30
hopper-medium-expert-v2	57	29

Time efficiency over the original torch implementation is more than 4 times.

For more offline RL algorithm implementations, check out the JAX-RL, IQL and rlkit repositories.

Citation

In case you use CQL-JAX for your research, please cite the following-

@misc{cqljax,
  author = {Suri, Karush},
  title = {{Conservative Q Learning in JAX.}},
  url = {https://github.com/karush17/cql-jax},
  year = {2021}
}

References

You might also like...

PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching This is the official PyTorch implementation of SMODICE: Versatile Offline I

14 Aug 30, 2022

Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

A Minimalist Approach to Offline Reinforcement Learning TD3+BC is a simple approach to offline RL where only two changes are made to TD3: (1) a weight

193 Dec 23, 2022

MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

offline-MBPO This repository contains the code of a version of model-based RL algorithm MBPO, which is modified to perform in offline RL settings Pape

1 Oct 24, 2021

RoMA: Robust Model Adaptation for Offline Model-based Optimization

RoMA: Robust Model Adaptation for Offline Model-based Optimization Implementation of RoMA: Robust Model Adaptation for Offline Model-based Optimizatio

9 Oct 31, 2022

Generalized Decision Transformer for Offline Hindsight Information Matching

Generalized Decision Transformer for Offline Hindsight Information Matching [arxiv] If you use this codebase for your research, please cite the paper:

35 Dec 12, 2022

Active Offline Policy Selection With Python

Active Offline Policy Selection This is supporting example code for NeurIPS 2021 paper Active Offline Policy Selection by Ksenia Konyushkova*, Yutian

27 Oct 15, 2022

An open source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+. Including offline map and navigation.

Pi Zero Bikecomputer An open-source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+ https://github.com/hishizuka/pizero_bikecompute

264 Jan 2, 2023

Ppq - A powerful offline neural network quantization tool with custimized IR

PPL Quantization Tool(PPL 量化工具) PPL Quantization Tool (PPQ) is a powerful offlin

605 Jan 3, 2023

Decision Transformer: A brand new Offline RL Pattern

DecisionTransformer_StepbyStep Intro Decision Transformer: A brand new Offline RL Pattern. 这是关于NeurIPS 2021 热门论文Decision Transformer的复现。 👍 原文地址: Deci

14 Nov 22, 2022

Comments

About the performance gap in `Antmaze` env.

Hi Karush,

I also tried to apply CQL on the antmaze environment. Do you have any idea why there is a performance gap between the pytorch version and the Jax version?

opened by fuyw 0

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Related tags

Overview

CQL-JAX

Usage

Performance & Runtime

Citation

References

You might also like...

PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

RoMA: Robust Model Adaptation for Offline Model-based Optimization

Generalized Decision Transformer for Offline Hindsight Information Matching

Active Offline Policy Selection With Python

An open source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+. Including offline map and navigation.

Ppq - A powerful offline neural network quantization tool with custimized IR

Decision Transformer: A brand new Offline RL Pattern

Comments

About the performance gap in `Antmaze` env.

Owner

Karush Suri

Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

CLOOB training (JAX) and inference (JAX and PyTorch)

Offline Reinforcement Learning with Implicit Q-Learning

MINERVA: An out-of-the-box GUI tool for offline deep reinforcement learning

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"

Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

PyTorch implementation of the ExORL: Exploratory Data for Offline Reinforcement Learning

Plug-n-Play Reinforcement Learning in Python with OpenAI Gym and JAX

JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation"

A TensorFlow implementation of SOFA, the Simulator for OFfline LeArning and evaluation.