Feedback Transformer - Pytorch
Simple implementation of the Feedback Transformer in Pytorch. It improves on Transformer-XL by giving each token access to the representations of all layers from previous time steps. This is achieved by aggregating the outputs of all layers into a single shared memory, which each token, at every layer, attends to at each time step.
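Conceptually, the shared memory update at each time step is just a learned weighted sum of the per-layer outputs, appended to a growing buffer that later time steps attend over. Below is a minimal sketch of that idea in Pytorch; the names (update_memory, layer_weights) and shapes are illustrative assumptions, not the library's internal API.

import torch
import torch.nn.functional as F
from torch import nn

depth, dim, batch = 6, 512, 2
layer_weights = nn.Parameter(torch.zeros(depth + 1))  # one learnable scalar per layer (plus the embedding layer)

def update_memory(layer_outputs, memory):
    # layer_outputs: list of (batch, dim) tensors, the per-layer states at the current time step
    stacked = torch.stack(layer_outputs)                       # (depth + 1, batch, dim)
    weights = F.softmax(layer_weights, dim = 0)                # normalize the learned layer weights
    memory_vector = (weights[:, None, None] * stacked).sum(0)  # (batch, dim) shared representation
    return memory + [memory_vector]                            # tokens at later time steps attend over this buffer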
The main drawback is a longer training time, due to its non-parallel nature. But I thought I would build it anyway, to facilitate further exploration and research into this line of work.
I also took the liberty of adding a few enhancements, including pre-normalization, GLU-gated feedforwards, and simplified T5 relative positional embeddings.
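As an illustration of one of these enhancements, a GLU-gated feedforward block can be written roughly as follows. This is a minimal sketch of the general GEGLU pattern, assuming a GELU gate and a 4x hidden multiplier; the class name and exact variant are assumptions, not the library's internal code.

import torch
from torch import nn
import torch.nn.functional as F

class GEGLU(nn.Module):
    def __init__(self, dim, mult = 4, dropout = 0.1):
        super().__init__()
        self.proj_in = nn.Linear(dim, dim * mult * 2)   # projects to values and gates in one go
        self.proj_out = nn.Linear(dim * mult, dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x, gate = self.proj_in(x).chunk(2, dim = -1)    # split into value and gate halves
        return self.proj_out(self.dropout(x * F.gelu(gate)))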
Install
$ pip install feedback-transformer-pytorch
Usage
import torch
from feedback_transformer_pytorch import FeedbackTransformer
model = FeedbackTransformer(
    num_tokens = 20000,    # number of tokens
    dim = 512,             # dimension
    depth = 6,             # depth
    seq_len = 2,           # the sequence length of each segment or window
    mem_len = 256,         # length of the memory buffer
    dim_head = 64,         # dimension of each head
    heads = 8,             # number of heads
    attn_dropout = 0.1,    # attention dropout
    ff_dropout = 0.1       # feedforward dropout
).cuda()
x = torch.randint(0, 20000, (2, 64)).cuda()
model(x) # (2, 64, 20000)
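The output is a tensor of logits over the vocabulary, so a standard language modeling loss can be applied directly. A hedged sketch, reusing the shapes above with placeholder targets:

import torch.nn.functional as F

logits = model(x)                                         # (2, 64, 20000) logits over the vocabulary
targets = torch.randint(0, 20000, (2, 64)).cuda()         # placeholder next-token targets
loss = F.cross_entropy(logits.transpose(1, 2), targets)   # cross entropy expects (batch, classes, seq)
loss.backward()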
If you would like fine control over the memory (when to detach, etc.), you can do so with some extra keyword arguments on .forward
import torch
from feedback_transformer_pytorch import FeedbackTransformer
model = FeedbackTransformer(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    seq_len = 32,
    mem_len = 256
).cuda()
x1 = torch.randint(0, 20000, (2, 32)).cuda()
x2 = torch.randint(0, 20000, (2, 32)).cuda()
x3 = torch.randint(0, 20000, (2, 32)).cuda()
out1, mem1 = model(x1, return_memory = True)
out2, mem2 = model(x2, memory = mem1, return_memory = True)
out3, mem3 = model(x3, memory = mem2, return_memory = True) # (2, 32, 20000)
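For example, to truncate backpropagation through time you could detach the returned memory before feeding it back in. This is a sketch that assumes the returned memory is a tuple of tensors; depending on the library's actual return type, you may need to re-wrap it accordingly.

detached = tuple(t.detach() for t in mem1)                        # cut the graph between segments
out2, mem2 = model(x2, memory = detached, return_memory = True)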
Citations
@misc{fan2021addressing,
    title         = {Addressing Some Limitations of Transformers with Feedback Memory},
    author        = {Angela Fan and Thibaut Lavril and Edouard Grave and Armand Joulin and Sainbayar Sukhbaatar},
    year          = {2021},
    eprint        = {2002.09402},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}