Hi,
I tried to apply this method to the linear attention of http://proceedings.mlr.press/v119/katharopoulos20a/katharopoulos20a.pdf with the following code:
# PermuteFormer - P: apply the position-dependent feature permutation to q and k
q = q.gather(-1, self.permutation[:, :, :q.shape[2]].expand_as(q))
k = k.gather(-1, self.permutation[:, :, :k.shape[2]].expand_as(k))
# Apply the feature map to the queries and keys
Q = torch.nn.functional.elu(q) + 1
K = torch.nn.functional.elu(k) + 1
# PermuteFormer - r: scale Q by ratio**i and K by (1/ratio)**i, so Q_i . K_j picks up a relative factor ratio**(i-j)
Q *= (self.ratio.unsqueeze(-1) ** torch.arange(Q.shape[2], device=Q.device).unsqueeze(0)).unsqueeze(-1)
K *= ((1 / self.ratio).unsqueeze(-1) ** torch.arange(K.shape[2], device=K.device).unsqueeze(0)).unsqueeze(-1)
if mask is not None:
    K.masked_fill_(mask.unsqueeze(1).unsqueeze(-1), 0.0)
# Compute the KV matrix
KV = torch.einsum("nhsd,nhsm->nhmd", K, v)
# Compute the normalizer
Z = 1/(torch.einsum("nhld,nhd->nlh", Q, K.sum(dim=2))+self.eps)
# Finally compute and return the new values
V = torch.einsum("nhld,nhmd,nlh->nlhm", Q, KV, Z)
But I always get NaNs after 1 to 5 training steps.
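For reference, here is roughly how I set up the inputs when testing the snippet above. The shapes and placeholder values are only my assumptions (read off the einsum strings), not from any reference implementation:
import torch

# batch n, heads h, sequence length s, feature dim d, value dim m (assumed)
n, h, s, d, m = 2, 4, 1024, 32, 32
q = torch.randn(n, h, s, d)
k = torch.randn(n, h, s, d)
v = torch.randn(n, h, s, m)

# placeholder stand-ins for the module attributes; the real values come from my model
permutation = torch.argsort(torch.rand(1, h, s, d), dim=-1)  # one feature permutation per position
ratio = torch.full((h,), 0.99)                               # per-head decay ratio in (0, 1)
mask = torch.zeros(n, s, dtype=torch.bool)                   # True marks padded key positions
eps = 1e-6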
From my perspective, this may be caused by this step:
Q *= (self.ratio.unsqueeze(-1) ** torch.arange(Q.shape[2], device=Q.device).unsqueeze(0)).unsqueeze(-1)
K *= ((1 / self.ratio).unsqueeze(-1) ** torch.arange(K.shape[2], device=K.device).unsqueeze(0)).unsqueeze(-1)
which multiplies Q by self.ratio ** i (a very small number) and K by (1 / self.ratio) ** i (a very big number) when the position index i is large, so the entries of K grow exponentially with sequence length.
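To illustrate, a quick check of those scale factors (assuming self.ratio is around 0.99; the exact value in my model may differ):
import torch

ratio = torch.tensor(0.99)            # assumed value, for illustration only
pos = torch.arange(0, 4096, 512, dtype=torch.float32)

q_factor = ratio ** pos               # what Q gets multiplied by at each position
k_factor = (1.0 / ratio) ** pos       # what K gets multiplied by at each position

for p, qf, kf in zip(pos.tolist(), q_factor.tolist(), k_factor.tolist()):
    print(f"pos {int(p):5d}  Q factor {qf:.3e}  K factor {kf:.3e}")
# By position 3584 the K factor is already above 1e15; it would overflow to inf in
# float16 shortly after position ~1100, while the Q factor shrinks toward zero.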
Am I integrating it the correct way? Or do you have any suggestions for this?
Thanks.