An implementation of Fastformer: Additive Attention Can Be All You Need in TensorFlow

Rishit Dagli

Last update: Dec 28, 2022

Overview

Fast Transformer

This repo implements Fastformer: Additive Attention Can Be All You Need by Wu et al. in TensorFlow. Fast Transformer is a Transformer variant based on additive attention that handles long sequences efficiently, with complexity linear in the sequence length. According to the paper, Fastformer is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.

Installation

Run the following to install:

pip install fast-transformer

Developing fast-transformer

To install fast-transformer in editable mode, along with the tools you need to develop and test it, run the following in your virtualenv:

git clone https://github.com/Rishit-dagli/Fast-Transformer.git
# or clone your own fork

cd Fast-Transformer
pip install -e ".[dev]"

Usage

import tensorflow as tf
from fast_transformer import FastTransformer

# Boolean mask of shape (batch, seq_len)
mask = tf.ones([1, 4096], dtype=tf.bool)
model = FastTransformer(
    num_tokens=20000,
    dim=512,
    depth=2,
    max_seq_len=4096,
    absolute_pos_emb=True,  # Absolute positional embeddings
    mask=mask,
)
# Random token IDs in [0, 20000), shape (batch, seq_len)
x = tf.experimental.numpy.random.randint(0, 20000, (1, 4096))

logits = model(x)  # (1, 4096, 20000)
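
The usage snippet passes an all-ones mask, which covers a batch with no padding. For padded batches, a mask can be derived from the true sequence lengths. The sketch below uses NumPy so it runs without TensorFlow; `padding_mask` is a hypothetical helper, not part of fast-transformer, and it assumes that `True` marks real tokens (consistent with the all-ones mask above, but not confirmed here). In TensorFlow, `tf.sequence_mask(lengths, maxlen=max_seq_len)` should produce the same boolean mask.

```python
import numpy as np


def padding_mask(lengths, max_seq_len):
    """Return a (batch, max_seq_len) boolean mask, True at real-token positions."""
    lengths = np.asarray(lengths)
    positions = np.arange(max_seq_len)
    # Broadcasting compares every position index with each sequence's length.
    return positions[None, :] < lengths[:, None]


# Two sequences of lengths 3 and 5, padded to length 5.
mask = padding_mask([3, 5], max_seq_len=5)
print(mask)
```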

Want to contribute? 🙋‍♂️

Awesome! Contributions to this project are always welcome. See the Contributing Guidelines. You can also look at the open issues to learn about current or upcoming tasks.

Want to discuss? 💬

Have questions or doubts, or want to share your opinions? You're always welcome to start a discussion.

Citation

@misc{wu2021fastformer,
    title   = {Fastformer: Additive Attention Can Be All You Need},
    author  = {Chuhan Wu and Fangzhao Wu and Tao Qi and Yongfeng Huang},
    year    = {2021},
    eprint  = {2108.09084},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Yannic Kilcher's video was super helpful while building this.

License

Copyright 2020 Rishit Dagli

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


Comments

Implement Additive Attention

Implement Additive Attention as a TensorFlow layer:

[x] Figure out using rotary embeddings
[x] Add masking functionality
[x] Relative Position embeddings
[x] Calculate query attention logits
[x] Calculate Global Query tokens
[x] Calculate key attention logits
[x] Calculate Global Key tokens
[x] Add queries as residuals
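
The checklist above follows the additive attention steps described in the Fastformer paper. As a rough illustration, here is a single-head NumPy sketch of those steps. It is not this repo's TensorFlow layer: it leaves out rotary and relative position embeddings and multi-head splitting, and the names `additive_attention`, `w_q`, `w_k`, and `w_out` are made up for this example. The mask is assumed to be `True` at real tokens, and at least one position must be unmasked.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def additive_attention(q, k, v, w_q, w_k, w_out, mask=None):
    """q, k, v: (seq_len, dim); w_q, w_k: (dim,); w_out: (dim, dim); mask: (seq_len,) bool."""
    scale = q.shape[-1] ** -0.5

    # Query attention logits: one scalar per position.
    q_logits = (q @ w_q) * scale
    if mask is not None:
        q_logits = np.where(mask, q_logits, -np.inf)
    # Global query: attention-weighted sum of all query vectors.
    global_q = softmax(q_logits, axis=0) @ q            # (dim,)

    # Mix the global query into every key element-wise, then pool the same way.
    p = k * global_q                                    # (seq_len, dim)
    k_logits = (p @ w_k) * scale
    if mask is not None:
        k_logits = np.where(mask, k_logits, -np.inf)
    global_k = softmax(k_logits, axis=0) @ p            # (dim,)

    # Modulate the values with the global key, project, and add queries as residuals.
    u = v * global_k
    return u @ w_out + q


rng = np.random.default_rng(0)
seq_len, dim = 6, 4
q, k, v = (rng.normal(size=(seq_len, dim)) for _ in range(3))
w_q, w_k = rng.normal(size=dim), rng.normal(size=dim)
w_out = rng.normal(size=(dim, dim))
mask = np.array([True, True, True, True, False, False])

out = additive_attention(q, k, v, w_q, w_k, w_out, mask)
print(out.shape)  # (6, 4)
```

Every step is either a per-position product or a weighted sum over positions, so the cost grows linearly with `seq_len`; no `seq_len x seq_len` attention matrix is ever formed.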

Releases (v0.2.0)

v0.2.0 (Jan 16, 2022)

✅ Bug Fixes / Improvements

Unit tests for output rank and shape

Looser dependency requirements (now supports all TensorFlow versions >= 2.5.0)

v0.1.0 (Sep 3, 2021)

This is the initial release of Fast Transformer. It implements Fast Transformer as a subclassed TensorFlow model.

Classes

FastAttention: Implements additive attention as a TensorFlow Keras layer and supports relative positional encodings.

PreNorm: Normalizes the activations of the previous layer independently for each example in a batch, then applies a given function to the result. Implemented as a TensorFlow Keras layer.

FeedForward: Creates a feed-forward network with two Dense layers and GELU activation. Implemented as a TensorFlow Keras layer.

FastTransformer: Implements the FastTransformer model using all the other classes. It supports rotary embeddings and weight-tied projections, and converts the output to logits. Implemented as a TensorFlow Keras model.
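
To make the PreNorm and FeedForward descriptions concrete, here is a small NumPy sketch of how such a pair is commonly combined. It is an illustration under assumptions, not the repo's Keras code: the function names are made up, the normalization has no learnable scale or offset, GELU uses the tanh approximation, and the hidden width of 4x `dim` and the residual connection around the block are assumptions rather than details taken from this project.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-5):
    # Normalize each example over its feature axis, independently of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def feed_forward(x, w1, b1, w2, b2):
    # Two dense layers with a GELU in between.
    return gelu(x @ w1 + b1) @ w2 + b2


def pre_norm(fn, x):
    # Normalize first, then apply the wrapped function.
    return fn(layer_norm(x))


rng = np.random.default_rng(0)
dim, hidden = 8, 32  # hidden = 4 * dim is an assumption
x = rng.normal(size=(5, dim))
w1, b1 = rng.normal(size=(dim, hidden)) * 0.1, np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, dim)) * 0.1, np.zeros(dim)

# Residual connection around the pre-normalized feed-forward block (assumed).
y = x + pre_norm(lambda h: feed_forward(h, w1, b1, w2, b2), x)
print(y.shape)  # (5, 8)
```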


Owner

Rishit Dagli

High School, TEDx, 2x TED-Ed speaker | International Speaker | Microsoft Student Ambassador | Mentor, @TFUGMumbai | Organize @KotlinMumbai

