Decision Transformer: A brand new Offline RL Pattern

Irving

Last update: Nov 22, 2022

Overview

DecisionTransformer_StepbyStep

Intro

Decision Transformer: A brand new Offline RL Pattern.

This is a reproduction of Decision Transformer, a popular NeurIPS 2021 paper.

👍 Paper: Decision Transformer: Reinforcement Learning via Sequence Modeling

👍 Official Git repository: decision-transformer (official)

Decision Transformer

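Decision Transformer (DT) is an Offline RL method. Offline RL learns a policy for controlling the agent from suboptimal data; that is, it tries to produce maximally effective behavior from a fixed, limited set of experience.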

👀️ Motivation

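DT treats RL as a sequence modeling problem: instead of using traditional RL methods, it uses a network that directly outputs actions to make decisions. Traditional RL methods have several problems. For example, the bootstrapping used when estimating future return can lead to overestimation, and they rely on the Markov assumption.

DT draws on the Transformer's strong representation and temporal modeling capabilities.

Decision Transformer matches or exceeds the performance of the best current mainstream methods based on dynamic programming;
On tasks that require long-term credit assignment (for example, sparse or delayed rewards), Decision Transformer far outperforms the best mainstream methods.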

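🚀️ The core idea of DT

The core idea of Decision Transformer: states, actions, and returns are fed into modality-specific linear embeddings, and a positional episodic timestep encoding carrying the timestep information is added. These tokens are fed into a GPT architecture, which uses a causal self-attention mask to predict actions.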
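
The token layout described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the helper names (`returns_to_go`, `interleave`, `causal_mask`) are hypothetical, the return-to-go is assumed to be undiscounted, and the embeddings themselves are omitted so that only the ordering and the mask are shown.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(num_steps: int) -> List[Tuple[str, int]]:
    """Order the tokens as (R_0, s_0, a_0, R_1, s_1, a_1, ...)."""
    tokens = []
    for t in range(num_steps):
        tokens += [("R", t), ("s", t), ("a", t)]
    return tokens


def causal_mask(n: int) -> List[List[int]]:
    """mask[i][j] == 1 when token i may attend to token j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


rewards = [1.0, 0.0, 2.0]              # hypothetical 3-step episode
rtg = returns_to_go(rewards)           # [3.0, 2.0, 2.0]
tokens = interleave(len(rewards))      # 9 tokens, 3 per timestep
mask = causal_mask(len(tokens))

# The action a_t is predicted from the position of s_t (index 3*t + 1),
# which sees R_0..R_t and s_0..s_t but not a_t itself.
state_pos = 3 * 1 + 1
visible = [tokens[j] for j in range(len(tokens)) if mask[state_pos][j]]
print(rtg)
print(visible)
```

Because each timestep contributes three tokens, a context of K timesteps corresponds to a sequence of 3K tokens.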

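🎉️ Advantages of DT

No Markov assumption is required;
No learnable value function is used as the training target;
It exploits the properties of the Transformer to bypass the need for bootstrapping for long-term credit assignment, avoiding the "short-sighted" behavior of temporal-difference learning;
Credit assignment can be performed directly through self-attention. In contrast to Bellman backups, which propagate rewards slowly and are prone to distractor signals, this lets the Transformer keep working effectively when rewards are sparse or distracting.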
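
A toy illustration of the "slow reward propagation" point, under strong simplifying assumptions: a hypothetical 5-state chain in which only the last transition pays a reward, tabular values, and synchronous one-step Bellman backups. It does not reproduce any experiment from the paper. It only shows that each backup sweep moves the reward signal one state further back, whereas a return-to-go label is available for every timestep at once.

```python
def bellman_sweeps(num_states: int, sweeps: int, gamma: float = 1.0):
    """Synchronous one-step backups on a chain s_0 -> s_1 -> ... -> terminal.

    Only the final transition pays reward 1; the terminal value is fixed at 0.
    """
    rewards = [0.0] * (num_states - 1) + [1.0]
    values = [0.0] * (num_states + 1)  # last slot is the terminal state
    for _ in range(sweeps):
        values = [rewards[s] + gamma * values[s + 1]
                  for s in range(num_states)] + [0.0]
    return values[:num_states]


num_states = 5
after_one = bellman_sweeps(num_states, sweeps=1)   # only the last state knows
after_two = bellman_sweeps(num_states, sweeps=2)   # signal moved back one state
after_all = bellman_sweeps(num_states, sweeps=num_states)

# Return-to-go labels computed directly from the logged rewards.
rewards = [0.0] * (num_states - 1) + [1.0]
rtg = [sum(rewards[t:]) for t in range(num_states)]

print(after_one)
print(after_two)
print(after_all)
print(rtg)
```

In this toy setting the backed-up values match the return-to-go labels only after as many sweeps as there are states.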

Dependencies

1. D4RL ( Dataset for Deep Data-Driven Reinforcement Learning )

2. MUJOCO 210

# Install absl-py and matplotlib before installing D4RL
pip install absl-py
pip install matplotlib

# The following approach does not work!!
# git clone https://github.com/rail-berkeley/d4rl.git
# cd d4rl
# pip install -e .

# First, git clone https://github.com/deepmind/dm_control
# cd into the cloned directory
pip install -r requirements.txt
# then
pip install matplotlib
# then install d3rlpy (https://github.com/takuseno/d3rlpy)
pip install d3rlpy
# then install MuJoCo 210:
# install it directly, then add the environment variables
# once that is done, go into the d4rl folder
python setup.py install
# d4rl 1.1 is now installed successfully

3. GPT-2

pip install transformers
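
After the steps above, a quick check of which Python packages are importable can save a failed training run. The sketch below uses only the standard library. The module names in `REQUIRED_MODULES` are assumptions, since a package's import name can differ from its pip name (for example, `absl-py` is imported as `absl`), and the MuJoCo binaries themselves are not checked.

```python
import importlib.util

# Import names are assumptions; adjust them to match your environment.
REQUIRED_MODULES = ["absl", "matplotlib", "d3rlpy", "d4rl", "transformers"]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even for heavy packages.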

Experiments

Group1: Decision Transformer — Hopper-v3-Medium-Dataset

Config parameters

import datetime


class Config:
    env = "hopper"
    dataset = "medium"
    mode = "normal" # "delayed" : all rewards moved to end of trajectory
    device = 'cuda'
    log_dir = 'TB_log/'
    record_algo = 'DT_Hopper_v1'
    test_cycles = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')

    # Model
    model_type = "DT"
    activation_function = 'relu'

    # Scalar
    max_length = 20 # max_len # K
    pct_traj = 1.
    batch_size = 64
    embed_dim = 128
    n_layer = 3
    n_head = 1
    dropout = 0.1
    lr = 1e-4
    wd = 1e-4
    warmup_steps = 1000
    num_eval_episodes = 100
    max_iters = 50
    num_steps_per_iter = 1000

    # Bool
    log_to_tb = True
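
Two of the settings above can be made concrete with a short sketch. The source only lists the values, so both helpers are assumptions about how such settings are commonly used: `warmup_lr` assumes a linear warmup to `lr` over `warmup_steps` followed by a constant rate, and `delay_rewards` follows the comment on `mode` ("all rewards moved to end of trajectory"). Both function names are hypothetical.

```python
def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Linear warmup to base_lr over warmup_steps, then constant (assumed schedule)."""
    return base_lr * min((step + 1) / warmup_steps, 1.0)


def delay_rewards(rewards):
    """'delayed' mode: zero every reward and pay the episode return at the last step."""
    if not rewards:
        return []
    return [0.0] * (len(rewards) - 1) + [float(sum(rewards))]


# Total gradient steps implied by the config: max_iters * num_steps_per_iter.
total_steps = 50 * 1000

print(total_steps)
print(warmup_lr(0), warmup_lr(499), warmup_lr(5000))
print(delay_rewards([1.0, 0.0, 2.0]))
```

With these values the warmup covers the first 1,000 of 50,000 gradient steps, that is, the whole first iteration.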

Results
