Generalized Decision Transformer for Offline Hindsight Information Matching

Overview

This is the codebase for the paper Generalized Decision Transformer for Offline Hindsight Information Matching [arXiv].

If you use this codebase for your research, please cite the paper:

@article{furuta2021generalized,
  title={Generalized Decision Transformer for Offline Hindsight Information Matching},
  author={Hiroki Furuta and Yutaka Matsuo and Shixiang Shane Gu},
  journal={arXiv preprint arXiv:2111.10364},
  year={2021}
}

Installation

Experiments require MuJoCo. Follow the instructions in the mujoco-py repo to install it. Then, the remaining dependencies can be installed with the following command:

conda env create -f conda_env.yml

Downloading datasets

Datasets are stored in the data directory. Install the D4RL repo, following the instructions there. Then, run the following script to download the datasets and save them in our format:

python download_d4rl_datasets.py
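
For reference, this is roughly the conversion the script performs, modeled on the original Decision Transformer data script: the flat D4RL arrays are split into per-episode dictionaries and pickled. The exact environment list and output paths in download_d4rl_datasets.py may differ.

import collections
import pickle

import gym
import numpy as np
import d4rl  # noqa: F401  (registers the offline environments with gym)

name = 'halfcheetah-medium-expert-v2'  # one assumed dataset id
dataset = gym.make(name).get_dataset()

paths, episode = [], collections.defaultdict(list)
use_timeouts = 'timeouts' in dataset
for i in range(dataset['rewards'].shape[0]):
    for key in ['observations', 'actions', 'rewards', 'terminals']:
        episode[key].append(dataset[key][i])
    done = bool(dataset['terminals'][i]) or (use_timeouts and bool(dataset['timeouts'][i]))
    if done:
        paths.append({k: np.array(v) for k, v in episode.items()})
        episode = collections.defaultdict(list)

with open(f'{name}.pkl', 'wb') as f:
    pickle.dump(paths, f)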

Run experiments

Run train_cdt.py to train Categorical DT (CDT):

python train_cdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --condition 'reward' --save_model True

python train_cdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --condition 'xvel' --save_model True

Run eval_cdt.py to evaluate CDT using the saved weights:

python eval_cdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --condition 'reward' --save_rollout True
python eval_cdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --condition 'xvel' --save_rollout True
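
The --n_bins and --dist_dim flags appear to control the categorical representation of the hindsight information CDT is conditioned on, and --condition selects the per-step signal (reward or x-velocity). As a rough sketch of that kind of target, an assumption about the construction rather than the repository's exact code, a trajectory's signal can be discretized into n_bins bins and summarized as a normalized histogram:

import numpy as np

def categorical_target(signal, n_bins=31, lo=None, hi=None):
    # Normalized histogram of a per-step scalar signal (e.g. reward or x-velocity).
    lo = signal.min() if lo is None else lo
    hi = signal.max() if hi is None else hi
    hist, _ = np.histogram(signal, bins=n_bins, range=(lo, hi))
    return hist / max(hist.sum(), 1)

xvel = np.random.uniform(-1.0, 3.0, size=1000)  # stand-in for logged x-velocities
target = categorical_target(xvel)               # shape (31,), sums to 1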

For Bi-directional DT (BDT), run train_bdt.py and eval_bdt.py:

python train_bdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --z_dim 16 --save_model True
python eval_bdt.py --env halfcheetah --dataset medium-expert --gpu 0 --seed 0 --dist_dim 30 --n_bins 31 --z_dim 16 --save_rollout True
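
Per the paper, BDT pairs the causal decoder with a second, anti-causal transformer that summarizes the future trajectory into a z_dim-dimensional latent. The sketch below is a generic illustration of that summarization step; the module layout, token dimension, and pooling are all assumptions, not the repository's code:

import torch
import torch.nn as nn

seq_len, token_dim, z_dim = 20, 16, 16  # token_dim is an assumed embedding size

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=token_dim, nhead=2, batch_first=True),
    num_layers=2,
)
to_z = nn.Linear(token_dim, z_dim)

future = torch.randn(1, seq_len, token_dim)  # embedded tokens from t..T (the future)
z = to_z(encoder(future).mean(dim=1))        # (1, z_dim) hindsight summary of the future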

Reference

This repository is developed on top of the original Decision Transformer codebase.


Comments
  • Question on inputs to get action_preds

    Hi,

    Many thanks for the implementation!

    My understanding of DT is that the DT policy chooses an action given the previous K states and returns-to-go. However, as far as I can see, in your code (and also in the original DT code), the action_preds calculated in the forward() method of DT only uses x[:, 1] as input, which corresponds to the states. I also noticed that you use x[:, 0] to get the return predictions rather than x[:, 2] as used in the original DT code.

    1. Should the input not be x[:,0] (returns R_1, R_2, ...) and x[:, 1] (states S_1, S_2, ...)?

    2. Why do you change x[:, 2] to x[:, 0] to get the return predictions?

    Also, am I correct that the way you would select a single discrete action 'choice' at evaluation time from action_preds is by indexing action_preds[0, -1] to get the action predictions for the current timestep, and then calling torch.argmax(action_preds[0, -1])? I.e., greedy selection over the action predictions, as in standard RL.

    opened by cwfparsonson 1
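
    A minimal sketch of the token layout behind the question above, following the reshape in the original Decision Transformer forward pass (the shapes and module names here are illustrative):

    import torch
    import torch.nn as nn

    batch_size, seq_length, hidden_size, act_dim = 2, 20, 128, 6

    # Stand-in for the transformer output over interleaved tokens
    # R_1, s_1, a_1, R_2, s_2, a_2, ...  (3 * seq_length tokens in total).
    transformer_out = torch.randn(batch_size, 3 * seq_length, hidden_size)

    # After this reshape, x[:, 0] holds the hiddens at return positions,
    # x[:, 1] the state positions, and x[:, 2] the action positions.
    x = transformer_out.reshape(batch_size, seq_length, 3, hidden_size).permute(0, 2, 1, 3)

    # With causal attention, the hidden at s_t has already attended to
    # R_1..R_t, s_1..s_t, and a_1..a_{t-1}, so the state positions are the
    # natural conditioning point for predicting a_t:
    predict_action = nn.Linear(hidden_size, act_dim)
    action_preds = predict_action(x[:, 1])  # (batch, seq_length, act_dim)

    # Greedy discrete choice at evaluation time (if the outputs were logits):
    choice = torch.argmax(action_preds[0, -1])

    So x[:, 1] does see the previous returns-to-go through attention; and for a greedy discrete choice, torch.argmax is the call that returns an index (torch.max returns the value).
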
  • Padding tokens represented differently in different parts of the code

    Hi, Thank you so much for the super interesting work and for releasing your code.

    A question regarding padding tokens: they seem to be handled slightly differently in different parts of the code. When loading the data for the experiments, the padding values appear to depend on the token type (e.g. -10 for actions in MuJoCo, 2 for dones, and 0 for other token types) in both BDT and CDT, whereas on the model side for action prediction all padding tokens are zeros. We were unsure about the reason for this difference, but we inferred that since the attention mask marks the positions of the padding tokens, it should ultimately make these value differences irrelevant.

    Could you please let us know more about your implementation of padding tokens and why they are represented differently? And do their actual values matter when their position is reflected in the attention mask?

    opened by asmadotgh 0
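
    The inference in the question can be checked directly: as long as the attention mask zeroes out the padded key positions, the fill value of the padding tokens (0, -10, or anything else) cannot change the attention outputs at real positions. A generic, self-contained demonstration, not the repository's exact code:

    import torch
    import torch.nn.functional as F

    T, H = 4, 8
    q, k, v = (torch.randn(1, T, H) for _ in range(3))

    # The first two timesteps are left-padding; positions 2 and 3 are real.
    attention_mask = torch.tensor([[0, 0, 1, 1]], dtype=torch.bool)

    def attend(q, k, v, mask):
        scores = q @ k.transpose(-1, -2) / H ** 0.5          # (1, T, T)
        scores = scores.masked_fill(~mask[:, None, :], float('-inf'))
        return F.softmax(scores, dim=-1) @ v

    out_zero = attend(q, k, v, attention_mask)

    # Refill the padded tokens with a different value (-10 instead of 0):
    k2, v2 = k.clone(), v.clone()
    k2[:, :2], v2[:, :2] = -10.0, -10.0
    out_refill = attend(q, k2, v2, attention_mask)

    print(torch.allclose(out_zero[:, 2:], out_refill[:, 2:]))  # True
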
  • backflipping expert hyperparameters

    Hi, I am interested in training a model on backflipping HalfCheetah for my work. I have tried training PPO; however, I have not been able to reproduce the results shown in the GIFs: usually the agent just jumps backwards and never flips.

    The paper says that SAC was trained. Do you still have the hyperparameters of the expert used to collect the trajectories? I would be happy if they were made public (or even the dataset itself).

    Thanks!

    opened by Howuhh 2