Offline Reinforcement Learning with Implicit Q-Learning

Overview

This repository contains the official implementation of Offline Reinforcement Learning with Implicit Q-Learning by Ilya Kostrikov, Ashvin Nair, and Sergey Levine.
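
At its core, IQL avoids querying the Q-function on out-of-distribution actions: a state-value function is fit to an upper expectile of the Q-values of dataset actions, that value function provides the TD targets for the Q-function, and the policy is extracted with advantage-weighted regression. Below is a minimal sketch of the asymmetric (expectile) loss at the heart of the method; the function name and the expectile of 0.7 are illustrative, and the per-domain values are given in the paper and the config files.

import jax.numpy as jnp

def expectile_loss(diff, expectile=0.7):
    """Asymmetric L2 loss used to fit V(s) towards an expectile of Q(s, a).

    diff = q(s, a) - v(s). With expectile > 0.5, positive differences are
    weighted more heavily, pushing V(s) towards an upper expectile of the
    Q-values over actions that appear in the dataset.
    """
    weight = jnp.where(diff > 0, expectile, 1 - expectile)
    return weight * (diff ** 2)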

If you use this code for your research, please consider citing the paper:

@article{kostrikov2021iql,
    title={Offline Reinforcement Learning with Implicit Q-Learning},
    author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
    year={2021},
    eprint={2110.06169},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

How to run the code

Install dependencies

pip install -r requirements.txt

For GPU support, see the JAX installation instructions (https://github.com/google/jax#installation) for a jaxlib build that matches your CUDA version.
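
If you install a CUDA build of jaxlib, a quick sanity check that JAX can see the GPU is:

python -c "import jax; print(jax.devices())"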

Run training

Locomotion

python train_offline.py --env_name=halfcheetah-medium-expert-v2 --config=configs/mujoco_config.py

AntMaze

python train_offline.py --env_name=antmaze-large-play-v0 --config=configs/antmaze_config.py --eval_episodes=100 --eval_interval=100000

Kitchen and Adroit

python train_offline.py --env_name=pen-human-v0 --config=configs/kitchen_config.py
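
The remaining D4RL kitchen and Adroit datasets follow the same pattern; for example (assuming the corresponding dataset is available through D4RL):

python train_offline.py --env_name=kitchen-complete-v0 --config=configs/kitchen_config.py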

Misc

The implementation is based on JAXRL.

Comments
  • conflicting dependencies between optax and jaxlib

    Hi, I got the following error when running pip install -r ./requirements.txt

    The conflict is caused by:
        optax 0.0.9 depends on jaxlib>=0.1.37
        optax 0.0.8 depends on jaxlib>=0.1.37
        optax 0.0.6 depends on jaxlib>=0.1.37

    Could you please take a look? Thank you.

    opened by enosair 6
  • A question about the `sample_actions()`

    https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L66

    Hi Ilya,

    Many thanks for the nice work. I have a question about the sample_actions() function: why do we need the separate _sample_actions()? Isn't it redundant?

    Maybe we can simply:

    import functools
    import jax

    @functools.partial(jax.jit, static_argnames=('actor_def',))
    def sample_actions(rng, actor_def, actor_params, observations, temperature):
        # Build the action distribution from the actor network and draw a sample.
        dist = actor_def.apply({'params': actor_params}, observations, temperature)
        rng, key = jax.random.split(rng)
        return rng, dist.sample(seed=key)
    

    Further, I tried to reimplement IQL with TrainState and found that using TrainState is slower than this implementation (by ~100-200 FPS).

    opened by fuyw 3
  • A question about the toy umaze environment in Figure 2?

    Hi Ilya,

    May I ask about the toy umaze environment in Figure 2?

    Is this a self-defined environment, or is it the antmaze-umaze environment in D4RL?

    And how do we generate the offline dataset?

    Many thanks.

    opened by fuyw 2
  • Potential issue in scaling rewards in train_finetune.py.

    Hi @ikostrikov,

    thanks again for sharing this.

    I have some questions about a potential issue in train_finetune.py when working with MuJoCo environments.

    I noticed that rewards are not scaled for hopper, halfcheetah or walker2d during online fine-tuning. However, for these tasks, you normalized the rewards in the offline datasets. See e.g., https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/train_finetune.py#L82

    Is this intended, or have I misunderstood something? Many thanks! (A sketch of this kind of per-trajectory reward normalization appears after these comments.)

    opened by ethanluoyc 2
  • A small problem

    Hi Ilya,

    I have a small question about the orthogonal initialization of the policy function.

    In PyTorch's documentation, a default gain of 5/3 is used for the tanh activation function.

    If we set tanh_squash_distribution = False, do we then need to set the gain to 5/3 for the output layer of the policy network?

    means = nn.Dense(self.action_dim, kernel_init=default_init())(outputs).

    https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L54

    Anyway, this does not matter in practice.

    opened by fuyw 0
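
As a reference for the reward-scaling discussion above: for the D4RL locomotion datasets, offline rewards are typically rescaled by the range of trajectory returns in the dataset. Below is a minimal sketch of that kind of normalization, assuming flat arrays of rewards and episode-termination flags; the helper name is hypothetical and the exact scaling used in this repository may differ.

import numpy as np

def normalize_locomotion_rewards(rewards, dones, scale=1000.0):
    """Rescale rewards by the return range between the best and worst trajectories."""
    returns, ret, steps = [], 0.0, 0
    for r, done in zip(rewards, dones):
        ret += float(r)
        steps += 1
        if done:
            returns.append(ret)
            ret, steps = 0.0, 0
    if steps > 0:  # include a final, unterminated trajectory
        returns.append(ret)
    return np.asarray(rewards) * scale / (max(returns) - min(returns))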