# Decision Transformer

Clean and readable code for Decision Transformer: Reinforcement Learning via Sequence Modeling.
Notable differences from the official implementation are:
- Simple GPT implementation (causal transformer)
- Uses PyTorch's `Dataset` and `DataLoader` classes, and removes redundant computation of returns-to-go and state normalization for efficient training
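The "causal" part of the GPT above means each token may attend only to itself and earlier tokens. A minimal single-head sketch of that masking step in numpy (names are illustrative, not taken from this repo):

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head attention where position t can only attend to positions <= t."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Upper-triangular mask (above the diagonal) blocks attention to future positions
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax; exp(-inf) = 0, so masked positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = causal_attention(x, x, x)
print(np.triu(w, k=1))  # all zeros: no attention to future tokens
```

In the full model this mask is applied identically in every attention head and layer.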
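The efficiency point above comes from computing returns-to-go and state statistics once per dataset instead of on every sampled batch. A sketch of both precomputations (function names are illustrative, not from this repo):

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """R_t = sum of rewards from step t to the end of the trajectory."""
    # Reverse cumulative sum: rtg[t] = rewards[t] + rtg[t + 1]
    return np.cumsum(rewards[::-1])[::-1]

def normalize_states(states: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standardize states with mean/std computed once over the whole dataset."""
    mean = states.mean(axis=0)
    std = states.std(axis=0) + eps
    return (states - mean) / std

rewards = np.array([1.0, 2.0, 3.0])
print(returns_to_go(rewards))  # [6. 5. 3.]
```

A `Dataset.__getitem__` can then just slice these precomputed arrays, keeping the training loop free of per-batch recomputation.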
## Instructions
## Results
Dataset | Environment | DT (this repo) | DT (official) |
---|---|---|---|
Medium | HalfCheetah | 42.18 ± 0.77 | 42.6 ± 0.1 |
Note that these results are the mean and variance over 3 random seeds, obtained after only 20k updates, while the official models are trained to saturation for 100k updates.
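The table reports D4RL-style normalized scores, which rescale raw episode returns so that a random policy scores 0 and an expert policy scores 100. A sketch of that normalization (the reference returns below are placeholders for illustration, not the actual D4RL constants):

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Rescale a raw return so random policy -> 0 and expert policy -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Hypothetical reference returns, for illustration only
print(d4rl_normalized_score(raw_return=50.0,
                            random_return=0.0,
                            expert_return=100.0))  # 50.0
```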
## References
- official code and paper
- minimal GPT (causal transformer) tweet and colab notebook