Norm-based Analysis of Transformers
Implementations for two papers that propose analyzing Transformers using vector norms:
- Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
- Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)
Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
This paper proposes analyzing attention, a core component of the Transformer, using vector norms rather than attention weights.
Transformer analyses have typically focused on the mixing performed by attention and have observed attention weights alone.
However, attention weights are not the only factor determining attention's outputs: the input vectors themselves and the transformations applied to them also matter.
This paper therefore proposes analyzing attention with the norms of the weighted, transformed vectors, which take all of these factors into account.
→ Check this paper's code: Code for emnlp2020.
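The central quantity in the 2020 paper is the norm ||α_{i,j} f(x_j)|| of each weighted, transformed vector, rather than the attention weight α_{i,j} alone. Below is a minimal, hypothetical PyTorch sketch of that computation for a single head; the function and argument names and shapes are assumptions for illustration, not the repository's actual API.

```python
import torch

def norm_based_attention(x, W_V, b_V, W_O, attn_weights):
    """Norm-based scores ||alpha[i, j] * f(x_j)|| for a single attention head.

    x:            (seq_len, d_model) input vectors
    W_V:          (d_model, d_head)  value projection weight
    b_V:          (d_head,)          value projection bias
    W_O:          (d_head, d_model)  this head's slice of the output projection
    attn_weights: (seq_len, seq_len) softmax attention weights alpha[i, j]
    """
    f_x = (x @ W_V + b_V) @ W_O  # f(x_j): value-then-output transform, (seq_len, d_model)
    weighted = attn_weights.unsqueeze(-1) * f_x.unsqueeze(0)  # alpha[i, j] * f(x_j)
    return weighted.norm(dim=-1)  # (seq_len, seq_len): contribution of token j to output i

# Usage with random tensors (shapes only for illustration):
L, d_model, d_head = 5, 16, 4
scores = norm_based_attention(
    torch.randn(L, d_model), torch.randn(d_model, d_head), torch.randn(d_head),
    torch.randn(d_head, d_model), torch.softmax(torch.randn(L, L), dim=-1))
```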
Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)
This paper proposes analyzing the whole attention block (i.e., attention, the residual connection, and layer normalization) using vector norms.
Transformer analyses have typically focused on the mixing performed by attention.
However, the Transformer contains components other than attention, and these can play roles beyond mixing.
This paper therefore expands the scope of Transformer analysis from attention alone to the entire attention block.
→ Check this paper's code: Code for emnlp2021.
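Concretely, the residual connection adds x_i to position i's own term, and because layer normalization's mean-subtraction is linear, it can be distributed over the per-token terms, with each position's standard deviation acting as a shared scalar. Below is a minimal sketch under those assumptions, reusing the weighted vectors from the previous sketch; the names are illustrative, not the repository's API, and the layer-norm bias β is left out since it is not attributable to any single input token.

```python
import torch

def norm_based_attention_block(x, attn_contrib, gamma, eps=1e-12):
    """Per-token contribution norms through attention + residual + layer norm.

    x:            (L, d)    inputs to the attention block
    attn_contrib: (L, L, d) attention's per-source terms alpha[i, j] * f(x_j)
                            (e.g., the weighted vectors from the sketch above)
    gamma:        (d,)      layer-norm gain (bias beta omitted: it cannot be
                            attributed to any single input token)
    """
    L, _ = x.shape
    contrib = attn_contrib.clone()
    idx = torch.arange(L)
    contrib[idx, idx] += x                  # residual: x_i flows straight to position i
    total = contrib.sum(dim=1)              # pre-layer-norm output, (L, d)
    std = total.std(dim=-1, unbiased=False, keepdim=True) + eps  # per-position scale
    centered = contrib - contrib.mean(dim=-1, keepdim=True)  # mean-subtraction is
                                                             # linear: distribute per term
    scaled = centered / std.unsqueeze(1) * gamma  # shared scale, element-wise gain
    return scaled.norm(dim=-1)              # (L, L): contribution of token j to output i
```

Summing the scaled terms over j recovers the block's actual layer-normalized output (up to the omitted β), which is what makes this a decomposition rather than an approximation.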
Citation
If you use our code for academic work, please cite:
@inproceedings{kobayashi-etal-2020-attention,
    title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},
    author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2020},
    url = {https://www.aclweb.org/anthology/2020.emnlp-main.574},
    pages = {7057--7075},
}
@inproceedings{kobayashi-etal-2021-incorporating,
    title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
    author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2021},
    url = {https://arxiv.org/abs/2109.07152},
    pages = {to appear},
}