Code for the paper PermuteFormer

Overview

This repository contains the code for the EMNLP 2021 paper "PermuteFormer: Efficient Relative Position Encoding for Long Sequences" by Peng Chen. PermuteFormer adds relative position encoding to Performer-style linear attention by applying position-dependent permutations to the query and key features.
Comments
  • Transformers and torch version, as there is an error

    Hello, thanks for this awesome code. I am trying to run the listops script as follows: `python listops.py --model-config configs/bert_prenorm_listops.json`, and I am getting the following error: `TypeError: add_code_sample_docstrings() got an unexpected keyword argument 'tokenizer_class'` (raised at `config_class=_CONFIG_FOR_DOC`). Do you know what might have caused this error? From reading online, it looks like the code expects an older version of transformers. Could you please provide the transformers and torch versions you used? Thank you.
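
    For reference, I believe recent transformers releases dropped the `tokenizer_class` keyword from `add_code_sample_docstrings` (an assumption based on the traceback, not a confirmed diagnosis), so the error suggests my installed release is newer than the one the repo targets. A minimal snippet I can run to report my installed versions, since the repository does not pin its dependencies:

        import torch
        import transformers

        # Report installed versions; the traceback suggests my transformers
        # release is newer than the one this repo was written against, since
        # recent releases no longer accept the 'tokenizer_class' keyword.
        print("torch:", torch.__version__)
        print("transformers:", transformers.__version__)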

    opened by BalloutAI 0
  • Apply to linear attention and got "NaN" issue

    Hi,

    I tried applying this method to the linear attention of http://proceedings.mlr.press/v119/katharopoulos20a/katharopoulos20a.pdf with the following code:

        # PermuteFormer (P): permute the feature dimension of q and k with the
        # position-dependent permutation
        q = q.gather(-1, self.permutation[:, :, :q.shape[2]].expand_as(q))
        k = k.gather(-1, self.permutation[:, :, :k.shape[2]].expand_as(k))

        # Apply the feature map to the queries and keys
        Q = torch.nn.functional.elu(q) + 1
        K = torch.nn.functional.elu(k) + 1

        # PermuteFormer (r): scale position i by ratio ** i on the query side
        # and by ratio ** -i on the key side
        Q *= (self.ratio.unsqueeze(-1) ** torch.arange(Q.shape[2], device=Q.device).unsqueeze(0)).unsqueeze(-1)
        K *= ((1 / self.ratio).unsqueeze(-1) ** torch.arange(K.shape[2], device=K.device).unsqueeze(0)).unsqueeze(-1)

        if mask is not None:
            K.masked_fill_(mask.unsqueeze(1).unsqueeze(-1), 0.0)

        # Compute the KV matrix
        KV = torch.einsum("nhsd,nhsm->nhmd", K, v)

        # Compute the normalizer
        Z = 1 / (torch.einsum("nhld,nhd->nlh", Q, K.sum(dim=2)) + self.eps)

        # Finally compute and return the new values
        V = torch.einsum("nhld,nhmd,nlh->nlhm", Q, KV, Z)


    But I always get NaNs after 1-5 steps. From my perspective, this may be caused by this step:

        Q *= (self.ratio.unsqueeze(-1) ** torch.arange(Q.shape[2], device=Q.device).unsqueeze(0)).unsqueeze(-1)
        K *= ((1 / self.ratio).unsqueeze(-1) ** torch.arange(K.shape[2], device=K.device).unsqueeze(0)).unsqueeze(-1)
    

    which multiplies Q by a very small number and K by a very large number when the position index is large.
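
    For example, with a hypothetical ratio of 0.9 (the actual values of self.ratio will differ), the factors already leave float32 range around position 1000:

        import torch

        # Illustration only: position-dependent factors for a hypothetical
        # ratio of 0.9 in float32.
        ratio = torch.tensor(0.9)
        positions = torch.arange(0, 1001, 250, dtype=torch.float32)

        print(ratio ** positions)        # Q side: 1.0, 3.6e-12, ..., underflows to 0
        print((1 / ratio) ** positions)  # K side: 1.0, 2.8e11, ..., overflows to inf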

    Am I integrating this the right way? Or do you have any suggestions?
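
    One workaround I am considering (my own sketch, not code from this repository): every attention weight only depends on the product of a Q-side and a K-side factor, i.e. ratio ** (i - j), so the exponents can be centered at the middle of the sequence without changing the result, which halves the largest exponent magnitude:

        # Sketch: center the position exponents at c = L / 2 instead of 0.
        # The pairwise product ratio ** (i - j) is unchanged for any c.
        L = Q.shape[2]
        c = L / 2
        q_exp = torch.arange(L, device=Q.device) - c           # in [-c, c) instead of [0, L)
        k_exp = torch.arange(K.shape[2], device=K.device) - c
        Q *= (self.ratio.unsqueeze(-1) ** q_exp.unsqueeze(0)).unsqueeze(-1)
        K *= ((1 / self.ratio).unsqueeze(-1) ** k_exp.unsqueeze(0)).unsqueeze(-1)

    This only shrinks the intermediate values, though: for non-causal attention the weight ratio ** (i - j) itself still explodes when j is much larger than i, so a ratio much closer to 1 (or a causal, decay-only formulation) may be needed as well.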

    Thanks.

    opened by Yogurt928 2
  • Roformer

    Hi Peng, and thank you for this interesting paper. I was wondering if you checked in your code for the rotary transformer experiments? To be clear, did you test linear transformers with rotary embeddings on LRA? Also, I was wondering whether you tested your permutation scheme with Choromanski's FAVOR+.

    opened by lucidrains 1
Owner
Peng Chen
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 1, 2023
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 3.2k Feb 17, 2021
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

null 44 Dec 31, 2022
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

Amazon 69 Jan 3, 2023
This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

null 1.1k Dec 27, 2022
Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances This repository contains the code and pre-trained mode

ICTNLP 90 Dec 27, 2022
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Francis R. Willett 305 Dec 22, 2022
source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

null 49 Dec 17, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

null 44 Jan 6, 2023
Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021

Mask-Align: Self-Supervised Neural Word Alignment This is the implementation of our work Mask-Align: Self-Supervised Neural Word Alignment. @inproceed

THUNLP-MT 46 Dec 15, 2022