Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

Rishabh Anand

Last update: Dec 12, 2022

Related tags

Deep Learning aft-pytorch

Overview

aft-pytorch

Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc.

Installation

You can install aft-pytorch via pip:

pip install aft-pytorch

Usage

You can import the AFT-Full or AFT-Simple layer (as described in the paper) from the package like so:

`AFTFull`

from aft_pytorch import AFTFull

layer = AFTFull(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of sequences with 10 timesteps of length 512 each
x = torch.rand(32, 10, 512)
y = layer(x) # [32, 10, 512]

`AFTSimple`

from aft_pytorch import AFTSimple

layer = AFTSimple(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of sequences with 10 timesteps of length 512 each
x = torch.rand(32, 10, 512)
y = layer(x) # [32, 10, 512]

This layer wrapper is a 'plug-and-play' with your existing networks / Transformers. You can swap out the Self-Attention layer with the available layers in this package with minimal changes.

TODO

Add full AFT architecture
Add variants like, AFTConv, AFTLocal

Contributing

If you like this repo, please leave a star! If there are any amends or suggestions, feel free to raise a PR/issue.

Credits

@misc{attention-free-transformer,
title = {An Attention Free Transformer},
author = {Shuangfei Zhai and Walter Talbott and Nitish Srivastava and Chen Huang and Hanlin Goh and Ruixiang Zhang and Josh Susskind},
year = {2021},
URL = {https://arxiv.org/pdf/2105.14103.pdf}
}

License

MIT

You might also like...

Unofficial implementation of One-Shot Free-View Neural Talking Head Synthesis

face-vid2vid Usage Dataset Preparation cd datasets wget https://yt-dl.org/downloads/latest/youtube-dl -O youtube-dl chmod a+rx youtube-dl python load_

68 Dec 30, 2022

Unofficial Tensorflow-Keras implementation of Fastformer based on paper [Fastformer: Additive Attention Can Be All You Need](https://arxiv.org/abs/2108.09084).

Fastformer-Keras Unofficial Tensorflow-Keras implementation of Fastformer based on paper Fastformer: Additive Attention Can Be All You Need. Tensorflo

10 Jan 30, 2022

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Episodic Transformers (E.T.) Episodic Transformer for Vision-and-Language Navigation Alexander Pashevich, Cordelia Schmid, Chen Sun Episodic Transform

62 Dec 24, 2022

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

43 Dec 7, 2022

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].

122 Dec 12, 2022

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks. Bayesian-Torch is designed to be flexible and seamless in extending a deterministic deep neural network architecture to corresponding Bayesian form by simply replacing the deterministic layers with Bayesian layers.

210 Jan 4, 2023

Comments

why sum(0)?

Thank you for this code! I got a problem with this at line 33 and line 36. Why sum(0) to add all batch dimension? I noticed that it add the timestep dimension from t=1 to T, so maybe it should be sum(2) here.

opened by zzwang058 2
can run on cpu but failed in gpu,why?

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for baddbmm)

i set .cuda() but it can't work, please help!

opened by TangDL 1
I test the model in an NLP task.
I use aft_full model，6 layers. and I use it in init with this code:

self.encoder_transformer = nn.ModuleList() for _ in range(6): self.encoder_transformer.append(AFTFull(max_seqlen=500, dim=512,hidden_dim=256))

and in forward function, I use this code:

for _, layer in enumerate(self.encoder_transformer):` x = layer(x) + x

Originally I used the traditional transformer, now I replaced it with this, the training loss appeared Nan，Is something wrong? and how U use the model for many layers，please help me, Thank U.
opened by zshy1205 1

Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

Related tags

Overview

aft-pytorch

Installation

Usage

`AFTFull`

`AFTSimple`

TODO

Contributing

Credits

License

You might also like...

Unofficial implementation of One-Shot Free-View Neural Talking Head Synthesis

Unofficial Tensorflow-Keras implementation of Fastformer based on paper [Fastformer: Additive Attention Can Be All You Need](https://arxiv.org/abs/2108.09084).

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks

TensorFlow, PyTorch and Numpy layers for generating Orthogonal Polynomials

A library to inspect itermediate layers of PyTorch models.

Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Comments

why sum(0)?

can run on cpu but failed in gpu,why?

I test the model in an NLP task.

Owner

Rishabh Anand

Compare outputs between layers written in Tensorflow and layers written in Pytorch

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

Unofficial Alias-Free GAN implementation. Based on rosinality's version with expanded training and inference options.