Norm-based Analysis of Transformer

Overview

Implementations for two papers that introduce norm-based analysis of Transformers using vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposed to analyze attention, a core component of the Transformer, using vector norms rather than attention weights.
Transformer analyses have focused on the mixing performed by attention and have typically inspected attention weights.
However, attention weights are not the only factor that determines attention's outputs: the input vectors themselves and the vector transformations applied to them also matter.
This paper therefore proposed to analyze attention with vector norms that take all of these factors into account.
→ Check this paper's code: Code for emnlp2020.
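
For a single head, this norm-based view measures the contribution of source token j to token i as the norm ||α_ij f(x_j)|| of the weighted, transformed value vector, instead of the weight α_ij alone. The PyTorch sketch below illustrates the idea; the tensor names and shapes are illustrative assumptions rather than the repository's actual API, so please refer to the emnlp2020 code for the faithful implementation.

    import torch

    def norm_based_contributions(x, W_v, b_v, W_o, attn_weights):
        """Norm-based contribution map for a single attention head (illustrative sketch).

        x:            (seq_len, d_model)  input vectors to the attention layer
        W_v, b_v:     value projection of this head, (d_model, d_head) and (d_head,)
        W_o:          this head's slice of the output projection, (d_head, d_model)
        attn_weights: (seq_len, seq_len)  softmax attention weights alpha[i, j]
        Returns:      (seq_len, seq_len)  ||alpha[i, j] * f(x_j)||
        """
        fx = (x @ W_v + b_v) @ W_o                               # f(x_j): transformed value vectors
        weighted = attn_weights.unsqueeze(-1) * fx.unsqueeze(0)  # alpha[i, j] * f(x_j)
        return weighted.norm(dim=-1)                             # contribution of token j to token i

    # Toy usage with random tensors:
    seq_len, d_model, d_head = 5, 16, 4
    x = torch.randn(seq_len, d_model)
    W_v, b_v, W_o = torch.randn(d_model, d_head), torch.randn(d_head), torch.randn(d_head, d_model)
    alpha = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)
    print(norm_based_contributions(x, W_v, b_v, W_o, alpha))  # rows = attending tokens, columns = attended-to tokens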

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposed to analyze the whole attention block (i.e., attention, the residual connection, and layer normalization) using vector norms.
Transformer analyses have focused on the mixing performed by attention.
However, the Transformer contains components other than attention, and these can play roles other than mixing.
This paper therefore proposed to expand the scope of Transformer analysis from attention alone to the entire attention block.
→ Check this paper's code: Code for emnlp2021.
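
Concretely, the block output for token i can be written as LN(Σ_j α_ij f(x_j) + x_i) (single head shown, biases omitted). Because the mean subtraction and the element-wise gain inside layer normalization are linear, this sum can be split into per-source-token terms once the standard deviation of the full pre-normalization vector is treated as a fixed per-token scale; the norm of each term then measures that token's contribution. The sketch below is a minimal illustration under these simplifying assumptions, with hypothetical names and shapes, so please refer to the emnlp2021 code for the faithful implementation.

    import torch

    def attention_block_contributions(x, fx, attn_weights, gamma, eps=1e-12):
        """Norm-based contributions through attention + residual + post-layer-norm (illustrative sketch).

        x:            (seq_len, d_model)  block inputs (also the residual stream)
        fx:           (seq_len, d_model)  f(x_j), value- and output-transformed vectors (single head)
        attn_weights: (seq_len, seq_len)  softmax attention weights alpha[i, j]
        gamma:        (d_model,)          layer-norm gain
        Returns:      (seq_len, seq_len)  contribution norm of source token j to the block output of token i
        """
        seq_len, _ = x.shape
        # Per-source-token terms before layer norm: alpha[i, j] * f(x_j), plus the residual x_i when j == i.
        terms = attn_weights.unsqueeze(-1) * fx.unsqueeze(0)
        terms = terms + torch.eye(seq_len).unsqueeze(-1) * x.unsqueeze(1)
        # Layer-norm statistics come from the full pre-normalization vector y_i = sum_j terms[i, j].
        y = terms.sum(dim=1)
        std = torch.sqrt(y.var(dim=-1, unbiased=False, keepdim=True) + eps)
        # Distribute the mean subtraction and the gain over each term;
        # the layer-norm bias and the attention output bias are not attributed to any source token.
        decomposed = gamma * (terms - terms.mean(dim=-1, keepdim=True)) / std.unsqueeze(1)
        return decomposed.norm(dim=-1)

Summing the decomposed terms over j recovers the layer-normalized output up to the omitted bias terms, which makes a convenient sanity check against a direct forward pass.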

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,  
   title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},  
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},  
   booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},  
   year = "2020",  
   url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",  
   pages = "7057--7075",  
}
@inproceedings{kobayashi-etal-2021-incorporating,
   title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
   booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
   year = "2021",
   url = "https://arxiv.org/abs/2109.07152",
   pages = "to appear",
}
Comments
  • Typo in train.sh

    Hi, amazing work! And I really appreciate you sharing the code.

    I was checking it and there is a typo in norm-analysis-of-transformer/exp2_nmt/train.sh: in --trainpref $DATS_DIR/train.bpe, it should be $DATA_DIR.

    #!/bin/sh
    
    export LANG=en_US.UTF-8
    export LC_ALL=en_US.UTF-8
    
    DATA_DIR=./work/processed_data
    BPE_DIR=./work/bpe_model_and_vocab
    
    # convert BPE vocab to use for fairseq
    cut -f1 $BPE_DIR/de.vocab | tail -n +4 | sed "s/$/ 100/g" > $DATA_DIR/de.vocab
    cut -f1 $BPE_DIR/en.vocab | tail -n +4 | sed "s/$/ 100/g" > $DATA_DIR/en.vocab
    
    # fairseq preprocess
    fairseq-preprocess --source-lang de --target-lang en \
        --trainpref $DATS_DIR/train.bpe \
        --validpref $DATA_DIR/valid.bpe \
        --testpref $DATA_DIR/valid.bpe \
        --srcdict $DATA_DIR/de.vocab \
        --tgtdict $DATA_DIR/en.vocab \
        --destdir $DATA_DIR/fairseq_preprocessed_data \
        --workers 16 # adjust here to suit your environment
    
    opened by javiferran 1
  • cannot load pre-trained model in fairseq reproduction

    Hi,

    I'm trying to reproduce the results in the demo notebook emnlp2020/changed_fairseq_usage.ipynb with the hope of extending them to analyse a BART model. However, I'm currently running into issues when trying to load the model.

    Here's the traceback:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-2-0ba6c5e0035c> in <module>
          6     model_name_or_path=path,
          7     tokenizer='moses',
    ----> 8     bpe='fastbpe',
          9 ).to(device)
         10 
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/models/fairseq_model.py in from_pretrained(cls, model_name_or_path, checkpoint_file, data_name_or_path, **kwargs)
        275             data_name_or_path,
        276             archive_map=cls.hub_models(),
    --> 277             **kwargs,
        278         )
        279         logger.info(x["args"])
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/hub_utils.py in from_pretrained(model_name_or_path, checkpoint_file, data_name_or_path, archive_map, **kwargs)
         71     models, args, task = checkpoint_utils.load_model_ensemble_and_task(
         72         [os.path.join(model_path, cpt) for cpt in checkpoint_file.split(os.pathsep)],
    ---> 73         arg_overrides=kwargs,
         74     )
         75 
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/checkpoint_utils.py in load_model_ensemble_and_task(filenames, arg_overrides, task, strict, suffix, num_shards)
        285             if not PathManager.exists(filename):
        286                 raise IOError("Model file not found: {}".format(filename))
    --> 287             state = load_checkpoint_to_cpu(filename, arg_overrides)
        288             if "args" in state and state["args"] is not None:
        289                 cfg = convert_namespace_to_omegaconf(state["args"])
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/checkpoint_utils.py in load_checkpoint_to_cpu(path, arg_overrides)
        237         overwrite_args_by_name(state["cfg"], arg_overrides)
        238 
    --> 239     state = _upgrade_state_dict(state)
        240     return state
        241 
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/checkpoint_utils.py in _upgrade_state_dict(state)
        458             state["args"].post_process = state["args"].remove_bpe
        459 
    --> 460         state["cfg"] = convert_namespace_to_omegaconf(state["args"])
        461 
        462     if "cfg" in state and state["cfg"] is not None:
    
    ~/INSTALLS/norm-analysis-of-transformer/emnlp2020/fairseq/fairseq/dataclass/utils.py in convert_namespace_to_omegaconf(args)
        296 
        297 
    --> 298     with initialize(config_path=config_path, strict=True): # BUG: TypeError: __init__() got an unexpected keyword argument 'strict'
        299     # with initialize(config_path=config_path):
        300         # import pdb;pdb.set_trace()
    
    TypeError: __init__() got an unexpected keyword argument 'strict'
    

    This seems to me to be an issue with the dependencies, so I've tried a few different versions of Hydra (e.g. 1.0.0, 1.1.0, 1.2.0) and dropping back to pytorch=1.7.1, but still no luck.

    I've attached the list of installed dependencies after following the setup instructions in the README. Can you provide some details on the versions used to run this notebook?

    Thanks in advance! Tannon

    default_versions.txt

    opened by tannonk 1
Owner
Goro Kobayashi