Decision Transformer

Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas†, and Igor Mordatch†

*equal contribution, †equal advising

A link to our paper can be found on arXiv.

Overview

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling. Contains scripts to reproduce experiments.

Instructions

We provide code in two sub-directories: atari containing code for Atari experiments and gym containing code for OpenAI Gym experiments. See corresponding READMEs in each folder for instructions; scripts should be run from the respective directories. It may be necessary to add the respective directories to your PYTHONPATH.

Citation

Please cite our paper as:

@article{chen2021decisiontransformer,
  title={Decision Transformer: Reinforcement Learning via Sequence Modeling},
  author={Lili Chen and Kevin Lu and Aravind Rajeswaran and Kimin Lee and Aditya Grover and Michael Laskin and Pieter Abbeel and Aravind Srinivas and Igor Mordatch},
  journal={arXiv preprint arXiv:2106.01345},
  year={2021}
}

Note: this is not an official Google or Facebook product.

Comments
  • After loading 50 trajectories, the terminal shows `killed`

    Hello, thanks for your code. After reading readme-atari.md, I set up the environment and downloaded the dataset, then tried to run the following (the original command included --block_size 90, but there is no block_size argument, so I removed it):

    python -m atari.run_dt_atari.py --seed 123 --epochs 5 --model_type 'reward_conditioned' --num_steps 500000 --num_buffers 50 --game 'Breakout' --batch_size 64 --data_dir_prefix /home/shy/decision-transformer/atari/dqn_replay/
    

    Then it shows:

    loading from buffer 45 which has 0 already loaded
    this buffer has 2196 loaded transitions and there are now 2196 transitions total divided into 1 trajectories
    loading from buffer 2 which has 0 already loaded
    this buffer has 1234 loaded transitions and there are now 3430 transitions total divided into 2 trajectories
    loading from buffer 28 which has 0 already loaded
    this buffer has 2413 loaded transitions and there are now 5843 transitions total divided into 3 trajectories
    loading from buffer 34 which has 0 already loaded
    this buffer has 2718 loaded transitions and there are now 8561 transitions total divided into 4 trajectories
    loading from buffer 38 which has 0 already loaded
    this buffer has 2326 loaded transitions and there are now 10887 transitions total divided into 5 trajectories
    loading from buffer 17 which has 0 already loaded
    this buffer has 2425 loaded transitions and there are now 13312 transitions total divided into 6 trajectories
    loading from buffer 19 which has 0 already loaded
    this buffer has 3063 loaded transitions and there are now 16375 transitions total divided into 7 trajectories
    loading from buffer 42 which has 0 already loaded
    this buffer has 1190 loaded transitions and there are now 17565 transitions total divided into 8 trajectories
    loading from buffer 22 which has 0 already loaded
    this buffer has 1002 loaded transitions and there are now 18567 transitions total divided into 9 trajectories
    loading from buffer 33 which has 0 already loaded
    this buffer has 1473 loaded transitions and there are now 20040 transitions total divided into 10 trajectories
    loading from buffer 32 which has 0 already loaded
    this buffer has 4009 loaded transitions and there are now 24049 transitions total divided into 11 trajectories
    loading from buffer 49 which has 0 already loaded
    this buffer has 2006 loaded transitions and there are now 26055 transitions total divided into 12 trajectories
    loading from buffer 47 which has 0 already loaded
    this buffer has 1935 loaded transitions and there are now 27990 transitions total divided into 13 trajectories
    loading from buffer 9 which has 0 already loaded
    this buffer has 1750 loaded transitions and there are now 29740 transitions total divided into 14 trajectories
    loading from buffer 32 which has 4009 already loaded
    this buffer has 7451 loaded transitions and there are now 33182 transitions total divided into 15 trajectories
    loading from buffer 46 which has 0 already loaded
    this buffer has 2137 loaded transitions and there are now 35319 transitions total divided into 16 trajectories
    loading from buffer 32 which has 7451 already loaded
    this buffer has 10311 loaded transitions and there are now 38179 transitions total divided into 17 trajectories
    loading from buffer 47 which has 1935 already loaded
    this buffer has 5165 loaded transitions and there are now 41409 transitions total divided into 18 trajectories
    loading from buffer 25 which has 0 already loaded
    this buffer has 2124 loaded transitions and there are now 43533 transitions total divided into 19 trajectories
    loading from buffer 19 which has 3063 already loaded
    this buffer has 5660 loaded transitions and there are now 46130 transitions total divided into 20 trajectories
    loading from buffer 14 which has 0 already loaded
    this buffer has 1462 loaded transitions and there are now 47592 transitions total divided into 21 trajectories
    loading from buffer 36 which has 0 already loaded
    this buffer has 1173 loaded transitions and there are now 48765 transitions total divided into 22 trajectories
    loading from buffer 32 which has 10311 already loaded
    this buffer has 13460 loaded transitions and there are now 51914 transitions total divided into 23 trajectories
    loading from buffer 16 which has 0 already loaded
    this buffer has 2148 loaded transitions and there are now 54062 transitions total divided into 24 trajectories
    loading from buffer 4 which has 0 already loaded
    this buffer has 1754 loaded transitions and there are now 55816 transitions total divided into 25 trajectories
    loading from buffer 49 which has 2006 already loaded
    this buffer has 4612 loaded transitions and there are now 58422 transitions total divided into 26 trajectories
    loading from buffer 3 which has 0 already loaded
    this buffer has 1200 loaded transitions and there are now 59622 transitions total divided into 27 trajectories
    loading from buffer 2 which has 1234 already loaded
    this buffer has 2192 loaded transitions and there are now 60580 transitions total divided into 28 trajectories
    loading from buffer 20 which has 0 already loaded
    this buffer has 1644 loaded transitions and there are now 62224 transitions total divided into 29 trajectories
    loading from buffer 39 which has 0 already loaded
    this buffer has 1473 loaded transitions and there are now 63697 transitions total divided into 30 trajectories
    loading from buffer 2 which has 2192 already loaded
    this buffer has 3244 loaded transitions and there are now 64749 transitions total divided into 31 trajectories
    loading from buffer 20 which has 1644 already loaded
    this buffer has 4785 loaded transitions and there are now 67890 transitions total divided into 32 trajectories
    loading from buffer 47 which has 5165 already loaded
    this buffer has 7681 loaded transitions and there are now 70406 transitions total divided into 33 trajectories
    loading from buffer 48 which has 0 already loaded
    this buffer has 2836 loaded transitions and there are now 73242 transitions total divided into 34 trajectories
    loading from buffer 7 which has 0 already loaded
    this buffer has 2135 loaded transitions and there are now 75377 transitions total divided into 35 trajectories
    loading from buffer 41 which has 0 already loaded
    this buffer has 933 loaded transitions and there are now 76310 transitions total divided into 36 trajectories
    loading from buffer 35 which has 0 already loaded
    this buffer has 1973 loaded transitions and there are now 78283 transitions total divided into 37 trajectories
    loading from buffer 28 which has 2413 already loaded
    this buffer has 4864 loaded transitions and there are now 80734 transitions total divided into 38 trajectories
    loading from buffer 38 which has 2326 already loaded
    this buffer has 5358 loaded transitions and there are now 83766 transitions total divided into 39 trajectories
    loading from buffer 33 which has 1473 already loaded
    this buffer has 3457 loaded transitions and there are now 85750 transitions total divided into 40 trajectories
    loading from buffer 21 which has 0 already loaded
    this buffer has 2198 loaded transitions and there are now 87948 transitions total divided into 41 trajectories
    loading from buffer 30 which has 0 already loaded
    this buffer has 2916 loaded transitions and there are now 90864 transitions total divided into 42 trajectories
    loading from buffer 27 which has 0 already loaded
    this buffer has 2128 loaded transitions and there are now 92992 transitions total divided into 43 trajectories
    loading from buffer 34 which has 2718 already loaded
    this buffer has 4650 loaded transitions and there are now 94924 transitions total divided into 44 trajectories
    loading from buffer 33 which has 3457 already loaded
    this buffer has 6102 loaded transitions and there are now 97569 transitions total divided into 45 trajectories
    loading from buffer 12 which has 0 already loaded
    this buffer has 3207 loaded transitions and there are now 100776 transitions total divided into 46 trajectories
    loading from buffer 40 which has 0 already loaded
    this buffer has 1369 loaded transitions and there are now 102145 transitions total divided into 47 trajectories
    loading from buffer 3 which has 1200 already loaded
    this buffer has 3316 loaded transitions and there are now 104261 transitions total divided into 48 trajectories
    loading from buffer 42 which has 1190 already loaded
    this buffer has 2969 loaded transitions and there are now 106040 transitions total divided into 49 trajectories
    loading from buffer 5 which has 0 already loaded
    this buffer has 2499 loaded transitions and there are now 108539 transitions total divided into 50 trajectories
    loading from buffer 0 which has 0 already loaded
    killed
    (decision-transformer-atari) shy@user:~/decision-transformer$ 
    

    Then it shows killed. I guess this is caused by excessive memory usage while loading the dataset. How can I fix it? Thanks a lot.
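
    A back-of-envelope estimate suggests why the loader can exhaust RAM; the sketch below assumes observations are stored as 4 stacked 84x84 uint8 frames (as in the DQN replay dataset), so the numbers are illustrative rather than measured. Reducing --num_steps (or --num_buffers), or running on a machine with more memory, is a possible workaround.

    # Rough memory estimate for the Atari dataset loader (illustrative;
    # assumes 4 stacked 84x84 uint8 frames per loaded transition).
    num_steps = 500_000                      # --num_steps from the command above
    frame_stack, height, width = 4, 84, 84   # standard Atari preprocessing
    bytes_per_obs = frame_stack * height * width          # uint8 -> 1 byte each
    total_gib = num_steps * bytes_per_obs / 1024 ** 3
    print(f"~{total_gib:.1f} GiB just for observations")   # ~13.1 GiB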

    opened by SunHaoOne 8
  • Atari results

    Hi,

    Thanks for your wonderful work. I cannot reproduce the performance reported in the paper for Atari. For example, compared to Table 1, my normalized score for Breakout is 147.738 and for Seaquest is 1.875 (averaged over 3 seeds; I use the same seeds as this script: https://github.com/kzl/decision-transformer/blob/master/atari/run.sh ). Did you use the same seeds (123, 231, 312) as that script, or did I miss something?

    opened by TongZhangTHU 4
  • state and action prediction

    Hi, interesting paper and thanks for sharing the code. Quick question: here https://github.com/kzl/decision-transformer/blob/d28039e97a30edaa6839333a8e12661a89ce0861/gym/decision_transformer/models/decision_transformer.py#L96 and here https://github.com/kzl/decision-transformer/blob/d28039e97a30edaa6839333a8e12661a89ce0861/gym/decision_transformer/models/decision_transformer.py#L97, shouldn't they be state_preds = self.predict_state(x[:,1]) and action_preds = self.predict_action(x[:,2]) instead?
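
    For reference, here is a minimal sketch of the (return-to-go, state, action) token interleaving used in the gym model, simplified from the repository code with illustrative shapes; it shows why the action head reads the state-token positions and the state head reads the action-token positions.

    import torch

    batch, seq_len, hidden = 2, 3, 8
    returns_emb = torch.randn(batch, seq_len, hidden)
    state_emb   = torch.randn(batch, seq_len, hidden)
    action_emb  = torch.randn(batch, seq_len, hidden)

    # Stack to (batch, 3, seq_len, hidden), then interleave along time to get
    # (R_1, s_1, a_1, R_2, s_2, a_2, ...).
    stacked = torch.stack((returns_emb, state_emb, action_emb), dim=1)
    tokens = stacked.permute(0, 2, 1, 3).reshape(batch, 3 * seq_len, hidden)

    # After the transformer, undo the interleaving:
    x = tokens.reshape(batch, seq_len, 3, hidden).permute(0, 2, 1, 3)
    # x[:, 0] -> outputs at return tokens, x[:, 1] -> outputs at state tokens,
    # x[:, 2] -> outputs at action tokens.
    # Predicting a_t uses everything up to and including s_t, so the action head
    # reads x[:, 1]; predicting s_{t+1} uses a_t, so the state head reads x[:, 2].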

    opened by mehdimashayekhi 4
  • aligning action embeddings to other embeddings at line 237

    at https://github.com/kzl/decision-transformer/blob/master/atari/mingpt/model_atari.py, line 237

    token_embeddings[:,2::3,:] = action_embeddings[:,-states.shape[1] + int(targets is None):,:]

    I am not quite sure about the purpose of checking targets is None. It seems to me it handles two cases of input:

    1. (r_0, s_0, a_0, r_1, s_1, a_1, ..., r_k, s_k, a_k), where we have an action for every state over k timesteps; in this case targets = actions.

    2. (r_0, s_0, a_0, r_1, s_1, a_1, ..., r_k, s_k), where the last action a_k is to be predicted from s_k; targets is None in this case (or could we still have targets (a_0, a_1, ..., a_(k-1))?).

    However, it looks to me like the quoted line of code would arrange the token embeddings in the following way when targets is absent:

    (r_0, s_0, a_1, r_1, s_1, a_2, ..., r_k, s_k), i.e. the states and actions are misaligned, since the action slice starts one position too far to the right. To me it should be written as

    token_embeddings[:,2::3,:] = action_embeddings[:,-states.shape[1] : None if targets is not None else -1,:]

    Please see if I have misunderstood the code.
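
    A toy illustration of what each branch of that slice selects, with hypothetical shapes, purely to make the two cases above concrete:

    import torch

    context = 3                                   # states.shape[1]
    action_embeddings = torch.arange(context).view(1, context, 1).float()  # a_0, a_1, a_2

    # Case 1: targets is not None -> all `context` action embeddings are placed.
    targets = torch.zeros(1)                      # stand-in for "targets is not None"
    start = -context + int(targets is None)       # -3
    print(action_embeddings[:, start:, :].flatten())   # tensor([0., 1., 2.]) -> a_0, a_1, a_2

    # Case 2: targets is None -> only context - 1 action slots exist in the
    # token sequence, and the slice picks the last context - 1 embeddings.
    targets = None
    start = -context + int(targets is None)       # -2
    print(action_embeddings[:, start:, :].flatten())   # tensor([1., 2.]) -> a_1, a_2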

    opened by loct824 3
  • Global position embedding and timesteps look wrong in atari

    I'm not familiar with position encodings, but if my understanding is correct, global_pos_emb is used for only a single timestep per sample batch in the Atari code. Is that the intended behavior?

    Essentially, the current code computes global position encoding in the following way:

    import torch
    
    max_time_step = 10
    emb_dim = 7
    batch_size = 2
    timesteps = torch.tensor([1, 3]).view(batch_size, 1, 1)  # the dataset implementation returns the relative index (within an episode) of the first state
    
    global_pos_emb = torch.rand(1, max_time_step, emb_dim)
    all_global_pos_emb = torch.repeat_interleave(
        global_pos_emb, batch_size, dim=0
    )  # batch_size, traj_length, n_embd 
    
    
    torch.gather(
        all_global_pos_emb,
        1,
        torch.repeat_interleave(timesteps, emb_dim, dim=-1),
    )
    # shape is (batch_size, 1, emb_dim), 
    
    opened by nzw0301 2
  • Padding / attention_mask questions

    Hi, thanks for making your code available. I'm trying to wrap my head around the padding in your gym implementation.

    In this code: https://github.com/kzl/decision-transformer/blob/c9e6ac0b75895cef3e7c06cd309fd398ec9ceef5/gym/experiment.py#L154 you are padding your inputs on the left, and creating an attention_mask so that the model will ignore the padding.

    According to this (possibly out-of-date?) comment on the Hugging Face repo, GPT should ideally be padded on the right, and then the causal masking will take care of making sure nothing is conditioned on the padding values, making the attention_mask unnecessary:

    GPT-2 is a model with absolute position embeddings (like Bert) so you should always pad on the right to get best performances for this model (will add this information to the doc_string).

    As it's a causal model (only attend to the left context), also means that the model will not attend to the padding tokens (which are on the right) for any real token anyway.

    So in conclusion, no need to take special care of avoiding attention on padding.

    Just don't use the output of the padded tokens for anything as they don't contain any reliable information (which is obvious I hope).

    (see https://github.com/huggingface/transformers/issues/808#issuecomment-522932583 )

    Can you explain the rationale behind the padding scheme? Or am I just getting the wrong end of the stick? Cheers!
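
    For context, a minimal sketch of the left-padding scheme being asked about, with illustrative shapes rather than the repository's exact code: the padding goes on the left so the most recent token always sits at the end of the sequence (the position whose output is read for action selection), and the attention_mask marks which positions are real.

    import numpy as np

    max_len, state_dim = 5, 3
    states = np.random.randn(2, state_dim)        # only 2 real timesteps so far

    pad = max_len - states.shape[0]
    padded_states = np.concatenate(
        [np.zeros((pad, state_dim)), states], axis=0)      # zeros on the LEFT
    attention_mask = np.concatenate(
        [np.zeros(pad), np.ones(states.shape[0])])         # 0 = padding, 1 = real token

    print(padded_states.shape)    # (5, 3)
    print(attention_mask)         # [0. 0. 0. 1. 1.]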

    opened by DaveyBiggers 2
  • Potential bug: Attention mask allows access to future tokens?

    Thanks for sharing your code, great work. I do not get one detail here, maybe you can help:

    During batch data generation, you fill the masks with ones wherever valid trajectory history is available: https://github.com/kzl/decision-transformer/blob/f04280e3668a992c41b38bdfb6b6181d61b4dc52/gym/experiment.py#L154

    Isn't it common practice with autoregression to use a triangular matrix? In your training code, you consider all actions where the mask is > 0. Doesn't this allow actions early in the sequence to access subsequent tokens? https://github.com/kzl/decision-transformer/blob/5605e40ce763bb27ff0cf5beb06210d08771e047/gym/decision_transformer/training/seq_trainer.py#L18

    Thanks!
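
    For context, a small sketch of how a padding mask and a causal (triangular) mask differ in what they block; GPT-style decoders apply the causal mask internally, so the padding mask from the batch loader only has to hide padded slots. Illustrative only, not the repository's code.

    import torch

    seq_len = 4
    padding_mask = torch.tensor([0, 1, 1, 1])                 # 0 = padded slot, 1 = real token
    causal_mask = torch.tril(torch.ones(seq_len, seq_len))    # lower-triangular

    # The causal mask prevents position t from attending to positions > t;
    # the padding mask only removes the padded slots.
    combined = causal_mask * padding_mask.view(1, seq_len)
    print(combined)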

    opened by donthomasitos 2
  • Question: is it possible to use the same Decision Transformer for new training trajectories generation?

    Maybe I'm missing something, but why do we stop training after going over the initial trajectory dataset? Could the same model be run again to generate new (better) trajectories and then be trained on them in an iterative manner? Thanks for your time!

    opened by danielgafni 2
  • understanding use of rewards

    As I am reading through the code, I do not understand how you use rewards to learn an optimal policy. Can you point out where rewards are used in the Decision Transformer during training?
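
    For context, rewards typically enter training in this setting by being summed into returns-to-go, which become the conditioning tokens fed alongside states and actions. The helper below is a hypothetical sketch of that computation, not the repository's exact function.

    import numpy as np

    def returns_to_go(rewards):
        """R_t = sum of rewards from step t to the end of the trajectory."""
        rtg = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running += rewards[t]
            rtg[t] = running
        return rtg

    print(returns_to_go(np.array([1.0, 0.0, 2.0])))   # [3. 2. 2.]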

    opened by jeweinb 2
  • Return-to-go conditioning on Atari

    Dear authors, great work on DT! I found that the return-to-go conditioning hyperparameters in Table 8 differ from https://github.com/kzl/decision-transformer/blob/f04280e3668a992c41b38bdfb6b6181d61b4dc52/atari/mingpt/trainer_atari.py#L164 in the code. Which is correct?

    Thanks

    opened by geekyutao 2
  • Citation request

    Hi,

    I sent an email and received no response, so I am trying the issues section as a way to contact the authors.

    I would like to request a citation from the "Decision Transformers" paper. I believe our work is very relevant: the novelty presented in the "Decision Transformers" paper is identical to an idea we introduced nearly two years ago.

    It's a blog post and not a paper, but I don't think that matters. The source code has also been public for a long time. Here is the blog post in question: https://ogma.ai/2019/08/acting-without-rewards/

    The idea of "RL as a sequence prediction/generation problem" is identical to ours. The use of the Transformer is not, but that is not the novelty being presented so I don't think it matters.

    We used slightly different language, as we do not use Transformers but rather a bio-inspired system (that avoids backpropagation). Still, it does the whole process of predicting a sequence and performing "goal relabeling". We took it a step further and did so hierarchically as well. As in decision transformers, we do not use any classic RL algorithm (no dynamic programming), but rather we learn to predict the sequences in such a way that they can be "prompted" and generate desired trajectories. We invented it specifically as a way to avoid rewards, but rewards can be used as well. Decision Transformers also do not require rewards necessarily, as shown in one of the experiments.

    The ideas in "Upside-Down Reinforcement Learning" by Juergen Schmidhuber are also similar. However, our work pre-dates that as well; we cannot contact Juergen Schmidhuber for a citation, so it would be kind if we could at least get one from you.

    Thanks

    opened by 222464 2
  • Why the padding is different for state, action, reward?

    https://github.com/kzl/decision-transformer/blob/c9e6ac0b75895cef3e7c06cd309fd398ec9ceef5/gym/experiment.py#L147-L154

    It's easy to understand padding the state with np.zeros(...), but why use np.ones(...) * -10 to pad the action and np.ones(...) * 2 to pad the done flag?

    opened by CeyaoZhang 0
  • The setting of final token

    Great work, and thanks for open-sourcing the code. In the Atari experiments, is there any reason for setting the final token count to "2 * dataset_length * block_size" in the code? In the Appendix, this hyperparameter is set to 2 * 50000 * K. I don't see the point of the factor of 2; I would think the final token count is "dataset_length * block_size". Please correct me if I have missed something. Thanks.

    opened by IpadLi 0
  • MemoryError: Unable to allocate 6.57 GiB for an array with shape (7056000000,) and data type uint8

    I wonder how to solve this memory error. When you ran your experiments, did you load all the data into memory, given that each unzipped observation file is more than 2 GB?

    opened by leeruibin 1
  • No registered env with id: halfcheetah-medium-v2

    Running python download_d4rl_datasets.py gives:

    Traceback (most recent call last):
      File "download_d4rl_datasets.py", line 12, in <module>
        env = gym.make(name)
      File "/public/home/chenxn1/anaconda3/envs/decision-transformer-gym/lib/python3.8/site-packages/gym/envs/registration.py", line 145, in make
        return registry.make(id, **kwargs)
      File "/public/home/chenxn1/anaconda3/envs/decision-transformer-gym/lib/python3.8/site-packages/gym/envs/registration.py", line 89, in make
        spec = self.spec(path)
      File "/public/home/chenxn1/anaconda3/envs/decision-transformer-gym/lib/python3.8/site-packages/gym/envs/registration.py", line 131, in spec
        raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
    gym.error.UnregisteredEnv: No registered env with id: halfcheetah-medium-v2

    opened by boykac 1
  • The results on Mujoco reported in paper might be heavily influenced by env version

    Hello there,

    Recently, we reproduced some experiments in offline reinforcement learning and found that the Decision Transformer paper cites the CQL results from the original CQL paper. However, the problem is that DT uses MuJoCo version 2 environments (hopper-v2, walker2d-v2), while the original CQL uses version 0 (hopper-v0, walker2d-v0), and the reward scale differs between these versions. So we ran DT and CQL in the same environments (hopper-v2, walker2d-v2), and CQL was better than DT on almost all tasks (except hopper-replay). So I wonder:

    1. Have you taken the environment version into consideration in the results?
    2. Referring to https://github.com/kzl/decision-transformer/issues/16 : the score is normalized by an expert policy from https://github.com/rail-berkeley/d4rl/blob/master/d4rl/infos.py (see the sketch below). However, the results based on the official code are far from the results reported in the paper. Did I miss some key components in the DT code?
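
    For reference, a sketch of the D4RL-style normalized score mentioned in point 2: the random and expert reference returns come from d4rl/infos.py, and the function below is an assumption of the standard formula rather than code from this repository.

    def normalized_score(returns, random_score, expert_score):
        # 100 means expert-level performance, 0 means random-policy performance.
        return 100.0 * (returns - random_score) / (expert_score - random_score)

    print(normalized_score(3000.0, random_score=0.0, expert_score=4000.0))   # 75.0 (placeholder values)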

    Looking forward to your reply!

    Best Wishes

    opened by linprophet 0