PyTorch implementation of Pay Attention to MLPs

Jake Tae

Last update: Dec 13, 2022

gMLP

PyTorch implementation of Pay Attention to MLPs.

Quickstart

Clone this repository.

git clone https://github.com/jaketae/g-mlp.git

Navigate to the cloned directory. You can then use the bare-bones gMLP model as follows:

>>> from g_mlp import gMLP
>>> model = gMLP()

By default, the model comes with the following parameters:

gMLP(
    d_model=256,
    d_ffn=512,
    seq_len=256,
    num_layers=6,
)

Usage

The repository also contains gMLP models specifically for language modeling and image classification.

NLP

gMLPForLanguageModeling shares the same default parameters as gMLP, with num_tokens=10000 as an added parameter that represents the size of the token embedding table.

>>> import torch
>>> from g_mlp import gMLPForLanguageModeling
>>> model = gMLPForLanguageModeling()
>>> tokens = torch.randint(0, 10000, (8, 256))
>>> model(tokens).shape
torch.Size([8, 256, 256])
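The output shape above, (batch, seq_len, d_model), suggests the model returns per-token hidden states rather than vocabulary logits. The following is a minimal sketch of one training step under that assumption. The `backbone` here is only a stand-in with the same input and output shapes so the snippet runs on its own, and `lm_head`, the random `targets`, the optimizer, and the learning rate are all hypothetical and not part of the repository.

```python
import torch
from torch import nn
import torch.nn.functional as F

num_tokens, d_model, seq_len, batch = 10000, 256, 256, 2

# Stand-in for gMLPForLanguageModeling: any module that maps
# (batch, seq_len) token ids to (batch, seq_len, d_model) hidden states.
backbone = nn.Embedding(num_tokens, d_model)

# Hypothetical head (an assumption, not from the repository):
# projects hidden states to vocabulary logits.
lm_head = nn.Linear(d_model, num_tokens)

params = list(backbone.parameters()) + list(lm_head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3)

tokens = torch.randint(0, num_tokens, (batch, seq_len))
# Placeholder targets; in practice these come from your dataset,
# for example the original ids at masked positions.
targets = torch.randint(0, num_tokens, (batch, seq_len))

hidden = backbone(tokens)   # (batch, seq_len, d_model)
logits = lm_head(hidden)    # (batch, seq_len, num_tokens)

# Flatten the batch and sequence axes for token-level cross-entropy.
loss = F.cross_entropy(logits.reshape(-1, num_tokens), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

To try this with the actual model, replace `backbone` with `gMLPForLanguageModeling()` and check that its output shape matches before attaching a head.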

Computer Vision

gMLPForImageClassification is a ViT-esque version of gMLP that adds a patch-creating layer and a final classification head.

>>> import torch
>>> from g_mlp import gMLPForImageClassification
>>> model = gMLPForImageClassification()
>>> images = torch.randn(8, 3, 256, 256)
>>> model(images).shape
torch.Size([8, 1000])
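To illustrate what a patch-creating layer and a classification head typically do in ViT-style models, here is a rough sketch. It is not taken from the repository: the `patch_size` of 16, the strided-convolution patcher, and the mean-pooled linear classifier are all assumptions chosen so that a 256x256 image yields 256 patches.

```python
import torch
from torch import nn

# patch_size is an assumed value; the repository's default may differ.
image_size, patch_size, in_channels, d_model = 256, 16, 3, 256
num_classes = 1000

# Number of non-overlapping patches, i.e. the sequence length seen by the MLP blocks.
num_patches = (image_size // patch_size) ** 2

# A convolution with kernel_size == stride cuts the image into
# non-overlapping patches and linearly projects each one to d_model.
patcher = nn.Conv2d(in_channels, d_model, kernel_size=patch_size, stride=patch_size)

images = torch.randn(8, in_channels, image_size, image_size)
patches = patcher(images)                     # (8, d_model, 16, 16)
tokens = patches.flatten(2).transpose(1, 2)   # (8, num_patches, d_model)

# Hypothetical head: average over patches, then map to class logits.
classifier = nn.Linear(d_model, num_classes)
logits = classifier(tokens.mean(dim=1))       # (8, num_classes)
```

In a full model, the gMLP blocks would sit between the patcher and the classifier and operate on `tokens`.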

Summary

The authors of the paper present gMLP, an attention-free, all-MLP architecture based on spatial gating units. gMLP achieves parity with Transformer models such as ViT and BERT on downstream language and vision tasks. The authors also show that gMLP scales with increased data and parameter count, suggesting that self-attention is not a necessary component for designing performant models.
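For readers unfamiliar with spatial gating units, the sketch below shows one way to write a single gMLP block in PyTorch, following the paper's description. The class names and the exact channel sizes (projecting to `d_ffn`, then splitting into two halves) are assumptions made for illustration and may not match this repository's implementation.

```python
import torch
from torch import nn


class SpatialGatingUnit(nn.Module):
    """Split channels in half and gate one half with a
    projection of the other half along the sequence axis."""

    def __init__(self, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear map over tokens (the "spatial" axis), not over channels.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Zero weights and unit bias make the gate start as the identity;
        # the paper describes near-zero weights and biases of one.
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):
        u, v = x.chunk(2, dim=-1)  # each: (batch, seq_len, d_ffn // 2)
        v = self.norm(v)
        # Move the sequence axis last so the linear layer mixes tokens.
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v


class GMLPBlock(nn.Module):
    def __init__(self, d_model, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.act = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x):
        residual = x
        x = self.act(self.proj_in(self.norm(x)))
        x = self.sgu(x)
        return self.proj_out(x) + residual


block = GMLPBlock(d_model=256, d_ffn=512, seq_len=256)
x = torch.randn(8, 256, 256)  # (batch, seq_len, d_model)
out = block(x)                # same shape as the input
```

The only place where tokens interact is `spatial_proj`, which is a fixed-size matrix over sequence positions. This is why the model needs `seq_len` at construction time, unlike self-attention.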

Comments

Cuda out of memory

Hi Jaketa,

Thanks for making this code. I ran it on Colab and got an error when passing a torch tensor to the model.

Can you help me with this error? Thank you very much.

Regards, Winsap

opened by winsapdev 2

NLP Usage

Hi, can you add more documentation for the Usage? I'm a little bit confused about how to use the model for training until getting the final result. Thank you

opened by muhammadfhadli1453 2
