Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)

Overview


This repository is the official implementation of Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021).

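Skyformer replaces the softmax in self-attention with a Gaussian kernel and approximates the resulting kernel matrix with the Nyström method, avoiding the quadratic cost in sequence length. The snippet below is only a minimal, illustrative sketch of that general idea (uniformly sampled landmarks, an ad-hoc bandwidth, no normalization); it is not the paper's exact construction, for which see the paper and the code in this repository.

import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # Pairwise Gaussian (RBF) similarities: exp(-||x_i - y_j||^2 / (2 * bandwidth^2))
    sq_dist = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dist / (2 * bandwidth ** 2))

def nystrom_gaussian_attention(q, k, v, num_landmarks=64, bandwidth=1.0):
    # q, k, v: (batch, seq_len, head_dim)
    n = q.shape[1]
    idx = torch.randperm(n)[:num_landmarks]            # uniform landmark sampling (illustrative only)
    k_land = k[:, idx, :]
    left = gaussian_kernel(q, k_land, bandwidth)       # (batch, n, m)
    mid = gaussian_kernel(k_land, k_land, bandwidth)   # (batch, m, m)
    right = gaussian_kernel(k_land, k, bandwidth)      # (batch, m, n)
    # Nystrom approximation of the full n x n kernel: left @ pinv(mid) @ right,
    # applied to v without ever materializing the n x n matrix.
    return left @ (torch.linalg.pinv(mid) @ (right @ v))

# Example: 2 sequences of length 1024 with head dimension 64
q = k = v = torch.randn(2, 1024, 64)
out = nystrom_gaussian_attention(q, k, v)              # (2, 1024, 64)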

Requirements

To install requirements in a conda environment:

conda create -n skyformer python=3.6
conda activate skyformer
pip install -r requirements.txt

Note: the additional requirements needed for data preprocessing are not included here; they are listed in the Data Preparation section below.
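A quick way to confirm the environment is usable before moving on (assuming PyTorch is among the pinned requirements, since the training code is PyTorch-based):

import torch
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())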

Data Preparation

The processed files can be downloaded here, or generated with the following steps:

  1. Install the additional preprocessing requirements:
tensorboard>=2.3.0
tensorflow>=2.3.1
tensorflow-datasets>=4.0.1
  2. Download the TFDS files for pathfinder and then set _PATHFINER_TFDS_PATH to the unzipped directory (following https://github.com/google-research/long-range-arena/issues/11).
  3. Download lra_release.gz (7.7 GB).
  4. Unzip lra_release.gz and put the contents under ./data/:
cd data
wget https://storage.googleapis.com/long-range-arena/lra_release.gz
tar zxvf lra_release.gz
  5. Create a directory lra_processed under ./data/:
mkdir lra_processed
cd ..
  6. The directory structure should now be (relative to the repository root):
./data/lra_processed
./data/long-range-arena-main
./data/lra_release
  7. Create the train, dev, and test dataset pickle files for each task:
cd preprocess
python create_pathfinder.py
python create_listops.py
python create_retrieval.py
python create_text.py
python create_cifar10.py

Note: most of the preprocessing source code comes from the LRA repo.
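A simple sanity check after preprocessing is to list what the scripts wrote under ./data/lra_processed (run from the repository root); each task should have its train, dev, and test pickle files:

import os
# List the processed dataset files produced by the preprocessing scripts
for name in sorted(os.listdir("./data/lra_processed")):
    print(name)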

Run

Modify the configuration in config.py and run

python main.py --mode train --attn skyformer --task lra-text
  • mode: train, eval
  • attn: softmax, nystrom, linformer, reformer, performer, informer, bigbird, kernelized, skyformer
  • task: lra-listops, lra-pathfinder, lra-retrieval, lra-text, lra-image
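For example, to evaluate a Skyformer model on the byte-level text classification task (assuming a checkpoint from a previous training run is available):

python main.py --mode eval --attn skyformer --task lra-text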

Reference

@inproceedings{Skyformer,
    title={Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method}, 
    author={Yifan Chen and Qi Zeng and Heng Ji and Yun Yang},
    booktitle={NeurIPS},
    year={2021}
}