Offline Reinforcement Learning with Implicit Q-Learning

Overview

This repository contains the official implementation of Offline Reinforcement Learning with Implicit Q-Learning by Ilya Kostrikov, Ashvin Nair, and Sergey Levine.
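
At its core, IQL avoids querying the Q-function on out-of-distribution actions: a state-value function is fit to an upper expectile of the Q-values of dataset actions, that value function provides the TD targets for the Q-function, and the policy is extracted with advantage-weighted regression. Below is a minimal sketch of the asymmetric (expectile) loss at the heart of the method; the function name and the expectile of 0.7 are illustrative, and the per-domain values are given in the paper and the config files.

import jax.numpy as jnp

def expectile_loss(diff, expectile=0.7):
    """Asymmetric L2 loss used to fit V(s) towards an expectile of Q(s, a).

    diff = q(s, a) - v(s). With expectile > 0.5, positive differences are
    weighted more heavily, pushing V(s) towards an upper expectile of the
    Q-values over actions that appear in the dataset.
    """
    weight = jnp.where(diff > 0, expectile, 1 - expectile)
    return weight * (diff ** 2)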

If you use this code for your research, please consider citing the paper:

@article{kostrikov2021iql,
    title={Offline Reinforcement Learning with Implicit Q-Learning},
    author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
    year={2021},
    eprint={2110.06169},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

How to run the code

Install dependencies

pip install -r requirements.txt

For GPU support, see the JAX installation instructions (https://github.com/google/jax#installation) for a jaxlib build that matches your CUDA version.
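
If you install a CUDA build of jaxlib, a quick sanity check that JAX can see the GPU is:

python -c "import jax; print(jax.devices())"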

Run training

Locomotion

python train_offline.py --env_name=halfcheetah-medium-expert-v2 --config=configs/mujoco_config.py

AntMaze

python train_offline.py --env_name=antmaze-large-play-v0 --config=configs/antmaze_config.py --eval_episodes=100 --eval_interval=100000

Kitchen and Adroit

python train_offline.py --env_name=pen-human-v0 --config=configs/kitchen_config.py
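
The remaining D4RL kitchen and Adroit datasets follow the same pattern; for example (assuming the corresponding dataset is available through D4RL):

python train_offline.py --env_name=kitchen-complete-v0 --config=configs/kitchen_config.py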

Misc

The implementation is based on JAXRL.

Comments
  • conflicting dependencies between optax and jaxlib

    Hi, I got the following error when running pip install -r ./requirements.txt

    The conflict is caused by:
        optax 0.0.9 depends on jaxlib>=0.1.37
        optax 0.0.8 depends on jaxlib>=0.1.37
        optax 0.0.6 depends on jaxlib>=0.1.37

    Could you please take a look? Thank you.

    opened by enosair 6
  • A question about the `sample_actions()`

    https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L66

    Hi Ilya,

    Many thanks for the nice work. I have a question about the sample_actions() function: why do we need the separate _sample_actions()? Isn't it redundant?

    Maybe we can simply:

    import functools
    import jax

    @functools.partial(jax.jit, static_argnames=('actor_def',))
    def sample_actions(rng, actor_def, actor_params, observations, temperature):
        # Build the action distribution from the actor network and draw a sample.
        dist = actor_def.apply({'params': actor_params}, observations, temperature)
        rng, key = jax.random.split(rng)
        return rng, dist.sample(seed=key)
    

    Further, I tried to reimplement IQL with TrainState and found that using TrainState is slower than this implementation (by ~100-200 FPS).

    opened by fuyw 3
  • A question about the toy umaze environment in Figure 2?

    Hi Ilya,

    May I ask about the toy umaze environment in Figure 2?

    Is this a self-defined environment, or is it the antmaze-umaze environment in D4RL?

    And how do we generate the offline dataset?

    Many thanks.

    opened by fuyw 2
  • Potential issue in scaling rewards in train_finetune.py.

    Hi @ikostrikov,

    thanks again for sharing this.

    I have some questions about a potential issue in train_finetune.py when working with MuJoCo environments.

    I noticed that rewards are not scaled for hopper, halfcheetah or walker2d during online fine-tuning. However, for these tasks, you normalized the rewards in the offline datasets. See e.g., https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/train_finetune.py#L82

    Is this intended, or have I misunderstood something? Many thanks! (A sketch of this kind of per-trajectory reward normalization appears after these comments.)

    opened by ethanluoyc 2
  • A small problem

    Hi Ilya,

    I have a small question about the orthogonal initialization of the policy function.

    In PyTorch's documentation, a default gain of 5/3 is used for the tanh activation function.

    If we set tanh_squash_distribution = False, do we then need to set the gain to 5/3 for the output layer of the policy network?

    means = nn.Dense(self.action_dim, kernel_init=default_init())(outputs).

    https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/policy.py#L54

    Anyway, this does not matter in practice.

    opened by fuyw 0
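
As a reference for the reward-scaling discussion above: for the D4RL locomotion datasets, offline rewards are typically rescaled by the range of trajectory returns in the dataset. Below is a minimal sketch of that kind of normalization, assuming flat arrays of rewards and episode-termination flags; the helper name is hypothetical and the exact scaling used in this repository may differ.

import numpy as np

def normalize_locomotion_rewards(rewards, dones, scale=1000.0):
    """Rescale rewards by the return range between the best and worst trajectories."""
    returns, ret, steps = [], 0.0, 0
    for r, done in zip(rewards, dones):
        ret += float(r)
        steps += 1
        if done:
            returns.append(ret)
            ret, steps = 0.0, 0
    if steps > 0:  # include a final, unterminated trajectory
        returns.append(ret)
    return np.asarray(rewards) * scale / (max(returns) - min(returns))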