Sinkformers: Transformers with Doubly Stochastic Attention

Overview

Code for the paper "Sinkformers: Transformers with Doubly Stochastic Attention".

Paper

You will find our paper here: https://arxiv.org/abs/2110.11773

Compat

This package has been developed and tested with Python 3.8. It is therefore not guaranteed to work with earlier versions of Python.

Install the repository on your machine

This package can easily be installed using pip, with the following commands:

pip install numpy
pip install -e .

This will install the package and all its dependencies, listed in requirements.txt.

Each command has to be executed from the root folder sinkformers. Our code is organized into several sub-repositories; in each, we modify the proposed architectures by replacing the SoftMax attention with a Sinkhorn attention.
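For intuition, here is a minimal sketch in plain PyTorch (an illustration of the principle only, not the exact implementation used in the sub-repositories) of how iterating Sinkhorn's row and column normalizations turns attention scores into a nearly doubly stochastic matrix; a single iteration reduces to the usual SoftMax:

import torch

def sinkhorn_attention(scores, n_it):
    # n_it = 1 is the usual SoftMax (rows sum to 1), as in the original Transformer.
    attn = torch.softmax(scores, dim=-1)
    # Further iterations alternate column and row normalizations (Sinkhorn's algorithm),
    # driving the matrix towards double stochasticity.
    for _ in range(n_it - 1):
        attn = attn / attn.sum(dim=-2, keepdim=True)  # normalize columns
        attn = attn / attn.sum(dim=-1, keepdim=True)  # normalize rows
    return attn

scores = torch.randn(6, 6)
attn = sinkhorn_attention(scores, n_it=20)
print(attn.sum(dim=-1))  # rows sum to 1
print(attn.sum(dim=-2))  # columns sum to approximately 1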

Defining a toy Sinkformer for which attention matrices are doubly stochastic

For this example, we use a Transformer from the nlp-tutorial library and define its Sinkformer counterpart through the argument "n_it", the number of iterations in Sinkhorn's algorithm. First, move to the corresponding sub-repository:

cd nlp-tutorial/text-classification-transformer

Then, in Python:

import torch
from model import TransformerEncoder

n_it = 1  # one Sinkhorn iteration: plain SoftMax attention
print('1 iteration in Sinkhorn corresponds to the original Transformer: ')
transformer = TransformerEncoder(vocab_size=1000, seq_len=512, n_layers=1, n_heads=1, n_it=n_it, print_attention=True, pad_id=-1)
inp = torch.arange(512).repeat(5, 1)
out = transformer(inp)

n_it = 5  # several Sinkhorn iterations: doubly stochastic attention
print('5 iterations in Sinkhorn give a Sinkformer with perfectly doubly stochastic attention matrices: ')
sinkformer = TransformerEncoder(vocab_size=1000, seq_len=512, n_layers=1, n_heads=1, n_it=n_it, print_attention=True, pad_id=-1)
inp = torch.arange(512).repeat(5, 1)
out = sinkformer(inp)

Then go back to the root:

cd ..
cd ..

Reproducing the experiments of the paper

Comparison of the different normalizations.

python plot_normalizations.py

ModelNet 40 classification. Code adapted from this repository. First, you need to preprocess the ModelNet40 dataset, available here. Unzip it and save it under model_net_40/data. Then, preferably on multiple CPUs, run

cd model_net_40
python to_h5.py
python formatting.py
cd ..
mv model_net_40/data/ModelNet40_cloud.h5 set_transformer/ModelNet40_cloud.h5
cd set_transformer
mkdir ../dataset
mv ModelNet40_cloud.h5 ../dataset/ModelNet40_cloud.h5
cd ..

Then you can train a Set Sinkformer (or Set Transformer) on ModelNet 40 with

cd set_transformer
python one_expe.py
cd ..

Arguments for one_expe.py can be accessed through

cd set_transformer
python one_expe.py --help
cd ..

Results are saved in the folder set_transformer/results. You can plot the learning curves using the script set_transformer/plot_results.py. The array iterations in the script must contain the different values for "n_it" used when training.
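For instance, at the top of set_transformer/plot_results.py one would set (hypothetical values; use the ones from your own runs):

iterations = [1, 3, 5]  # hypothetical: the values of n_it you actually trained with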

Sentiment Analysis. Code adapted from this repository. You can also train a Sinkformer for sentiment analysis on the IMDb dataset with the following commands (the IMDb dataset is downloaded automatically).

cd nlp-tutorial/text-classification-transformer
python one_expe.py
cd ..
cd ..

Arguments for one_expe.py can be accessed through

cd nlp-tutorial/text-classification-transformer
python one_expe.py --help
cd ..

Results are saved in the folder nlp-tutorial/text-classification-transformer/results. You can plot the learning curves using the script nlp-tutorial/text-classification-transformer/plot_results.py. The array iterations in the script must contain the different values for "n_it" used when training.

ViT Cats and Dogs classification. Code adapted from this repository. First, download the dataset here, unzip it, and save the train and test folders under sinkformers/vit-pytorch/examples/data. Then you can run

cd vit-pytorch
python one_expe.py
cd ..

Arguments for one_expe.py can be accessed through

cd vit-pytorch
python one_expe.py --help
cd ..

Results are saved in the folder vit-pytorch/results. You can plot the learning curves using the script vit-pytorch/plot_results.py. The array iterations in the script must contain the different values for "n_it" used when training.

ViT MNIST. The MNIST dataset will be downloaded automatically.

cd vit-pytorch
python one_expe_mnist.py
cd ..

Arguments for one_expe_mnist.py can be accessed through

cd vit-pytorch
python one_expe_mnist.py --help
cd ..

In particular, the argument "ps" is the patch size. Results are saved in the folder vit-pytorch/results_mnist. You can plot the learning curves using the script vit-pytorch/plot_results_mnist.py. The array iterations in the script must contain the different values for "n_it" used when training, and the array patches_size must contain the different values for "ps" used when training.
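For instance, in vit-pytorch/plot_results_mnist.py (hypothetical values; use the ones from your own runs):

iterations = [1, 3]    # hypothetical: the values of n_it used when training
patches_size = [4, 7]  # hypothetical: the values of ps used when training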

Cite

If you use this code in your project, please cite:

Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
Sinkformers: Transformers with Doubly Stochastic Attention
arXiv preprint arXiv:2110.11773, 2021
https://arxiv.org/abs/2110.11773
