MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

MSG-Transformer

Official implementation of the paper MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
by Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian.

We propose a novel Transformer architecture, named MSG-Transformer, which enables efficient and flexible information exchange by introducing MSG tokens that serve as information hubs.

Transformers have offered a new methodology for designing neural networks for visual recognition. Compared to convolutional networks, Transformers can refer to global features at each stage, yet the attention module brings a higher computational overhead that hinders applying Transformers to high-resolution visual data. This paper aims to alleviate the conflict between efficiency and flexibility: we propose a specialized token for each region that serves as a messenger (MSG). By manipulating these MSG tokens, one can flexibly exchange visual information across regions while reducing the computational complexity. We then integrate the MSG token into a multi-scale architecture named MSG-Transformer. In standard image classification and object detection, MSG-Transformer achieves competitive performance, and inference is accelerated on both GPU and CPU.
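The exchange step can be pictured with a small sketch. The one below is a pure-Python illustration, not the released implementation: it assumes each window owns one MSG token, splits that token's channels into groups, and exchanges the groups across windows with a ShuffleNet-style channel shuffle (view the tokens as a windows × groups grid, transpose it, regroup). The function names `split_channels` and `shuffle_msg_tokens` are hypothetical.

```python
from typing import List, Sequence


def split_channels(token: Sequence[float], groups: int) -> List[List[float]]:
    """Split one MSG token's channel vector into `groups` equal chunks."""
    if len(token) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = len(token) // groups
    return [list(token[i * size:(i + 1) * size]) for i in range(groups)]


def shuffle_msg_tokens(msg_tokens: List[List[float]], groups: int) -> List[List[float]]:
    """Channel-shuffle style exchange of MSG tokens across windows (assumed scheme).

    The tokens form a (num_windows, groups) grid of channel chunks. Reading the
    grid group-first (a transpose), flattening, and regrouping into num_windows
    tokens moves chunks between windows. With num_windows == groups, window i
    ends up holding chunk i of every window's MSG token.
    """
    num_windows = len(msg_tokens)
    grid = [split_channels(tok, groups) for tok in msg_tokens]
    # Transpose and flatten: iterate over groups first, then windows.
    flat = [grid[w][g] for g in range(groups) for w in range(num_windows)]
    # Regroup the flattened chunks into num_windows tokens of `groups` chunks each.
    shuffled = []
    for w in range(num_windows):
        chunks = flat[w * groups:(w + 1) * groups]
        shuffled.append([x for chunk in chunks for x in chunk])
    return shuffled


# Two windows, each with a 4-channel MSG token split into 2 groups.
tokens = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(shuffle_msg_tokens(tokens, groups=2))
```

Here window 0 collects the first chunk of both MSG tokens (`[1, 2, 5, 6]`) and window 1 the second chunk (`[3, 4, 7, 8]`). The exchange itself is only a permutation of a handful of tokens, which is cheap compared with attention over all patch tokens.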

Updates

2021.6.2 Code for ImageNet classification is released. Pre-trained models will be available soon.

Requirements

PyTorch==1.7
timm==0.3.2
Apex
opencv-python>=3.4.1.15
yacs==0.1.8
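The pins above can be checked against an environment with a short script. This is a sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names are hypothetical, Apex is skipped because it has no pin, and `==1.7` is read loosely as "any 1.7.x release", which is an assumption about the intended meaning.

```python
import re
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Pins copied from the Requirements list; "torch" is PyTorch's distribution name.
# Apex is left out because no version is given for it.
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "torch": ("==", "1.7"),
    "timm": ("==", "0.3.2"),
    "opencv-python": (">=", "3.4.1.15"),
    "yacs": ("==", "0.1.8"),
}


def _release(version: str) -> Tuple[int, ...]:
    """Numeric release components, ignoring local tags such as '+cu110'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed: str, op: str, wanted: str) -> bool:
    """Compare an installed version string against one pin."""
    have, want = _release(installed), _release(wanted)
    if op == "==":
        # Loose reading (assumption): "1.7" accepts 1.7.0, 1.7.1, ...
        return have[:len(want)] == want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator: {op}")


def check(installed: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per missing or mismatched package."""
    problems = []
    for name, (op, wanted) in REQUIREMENTS.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not installed (need {op}{wanted})")
        elif not satisfies(version, op, wanted):
            problems.append(f"{name}: found {version}, need {op}{wanted}")
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Look up the installed version of each pinned distribution."""
    found: Dict[str, Optional[str]] = {}
    for name in REQUIREMENTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for line in check(installed_versions()) or ["all pinned requirements satisfied"]:
        print(line)
```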

Data Preparation

Please organize your ImageNet dataset as follows.

path/to/ImageNet
|-train
| |-cls1
| | |-img1
| | |-...
| |-cls2
| | |-img2
| | |-...
| |-...
|-val
  |-cls1
  | |-img1
  | |-...
  |-cls2
  | |-img2
  | |-...
  |-...
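This is the class-per-subfolder layout that torchvision's `ImageFolder` expects. A quick, hypothetical sanity check before launching training could count the images per class in each split and confirm that `train` and `val` contain the same classes; the accepted file extensions are an assumption.

```python
from pathlib import Path
from typing import Dict

# Assumption: typical ImageNet image extensions (compared case-insensitively).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir: Path) -> Dict[str, int]:
    """Map each class folder in a split to the number of image files it holds."""
    counts: Dict[str, int] = {}
    for cls_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[cls_dir.name] = sum(
            1 for f in cls_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_imagenet_layout(root: str) -> Dict[str, Dict[str, int]]:
    """Verify root/train and root/val exist and share the same class folders."""
    root_path = Path(root)
    summary: Dict[str, Dict[str, int]] = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        summary[split] = summarize_split(split_dir)
    if set(summary["train"]) != set(summary["val"]):
        raise ValueError("train and val must contain the same class folders")
    return summary
```

For example, `check_imagenet_layout("path/to/ImageNet")` returns per-class image counts for both splits, or raises if a split folder is missing or the class lists differ.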

Training

Train MSG-Transformers on ImageNet-1k with the following script.
For MSG-Transformer-T, run

python -m torch.distributed.launch --nproc_per_node 8 main.py \
    --cfg configs/msg_tiny_p4_win7_224.yaml --data-path <dataset-path> --batch-size 128

For MSG-Transformer-S, run

python -m torch.distributed.launch --nproc_per_node 8 main.py \
    --cfg configs/msg_small_p4_win7_224.yaml --data-path <dataset-path> --batch-size 128

For MSG-Transformer-B, we recommend running the following script on two nodes, each with 8 GPUs.

python -m torch.distributed.launch --nproc_per_node 8 \
    --nnodes=2 --node_rank=<node-rank> --master_addr=<ip-address> --master_port=<port> \
    main.py --cfg configs/msg_base_p4_win7_224.yaml --data-path <dataset-path> --batch-size 64
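Assuming `--batch-size` is the per-GPU batch size (it is in the Swin-Transformer codebase this repository is based on, but that is not confirmed here), the three recipes share the same global batch size. The helper `global_batch_size` is hypothetical:

```python
def global_batch_size(per_gpu: int, gpus_per_node: int, nodes: int = 1) -> int:
    """Total images per optimization step across all processes."""
    return per_gpu * gpus_per_node * nodes


# Recipes from the commands above (per-GPU --batch-size assumed).
recipes = {
    "MSG-Transformer-T": global_batch_size(128, 8),
    "MSG-Transformer-S": global_batch_size(128, 8),
    "MSG-Transformer-B": global_batch_size(64, 8, nodes=2),
}
for name, total in recipes.items():
    print(f"{name}: {total} images per step")
```

If you change the number of GPUs or nodes, the global batch size changes with it unless `--batch-size` is adjusted, which may in turn call for retuning the learning rate.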

Evaluation

Run the following script to evaluate the pre-trained model.

python -m torch.distributed.launch --nproc_per_node <GPU-number> main.py \
    --cfg <model-config> --data-path <dataset-path> --batch-size <batch-size> \
    --resume <checkpoint> --eval
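If you script evaluation, the same launch can be assembled in Python. The sketch below only rebuilds the command above with the flags it shows; the config path is one of the configs from the Training section, while the dataset path, checkpoint file name, and default batch size are hypothetical placeholders.

```python
import shlex
import sys
from typing import List


def eval_command(cfg: str, data_path: str, checkpoint: str,
                 gpus: int = 1, batch_size: int = 128) -> List[str]:
    """argv for the evaluation launch shown above (placeholder values)."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        "--nproc_per_node", str(gpus),
        "main.py",
        "--cfg", cfg,
        "--data-path", data_path,
        "--batch-size", str(batch_size),
        "--resume", checkpoint,
        "--eval",
    ]


cmd = eval_command("configs/msg_tiny_p4_win7_224.yaml", "/data/ImageNet", "msg_tiny.pth")
print(shlex.join(cmd))
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually launch
```

Building an argument list instead of a shell string avoids quoting problems when paths contain spaces; `shlex.join` (Python 3.8+) is used only for printing.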

Main Results

ImageNet-1K

Model	Input size	Params	FLOPs	GPU throughput (images/s)	CPU Latency	Top-1 ACC (%)
MSG-Trans-T	224	28M	4.6G	696.7	150ms	80.9
MSG-Trans-S	224	50M	8.9G	401.0	262ms	83.0
MSG-Trans-B	224	88M	15.8G	262.6	437ms	83.5

MS-COCO

Method	box mAP	mask mAP	Params	FLOPs	FPS
MSG-Trans-T	50.3	43.6	86M	748G	9.4
MSG-Trans-S	51.8	44.8	107M	842G	7.5
MSG-Trans-B	51.9	45.0	145M	990G	6.2

Acknowledgements

This repository is based on Swin-Transformer and timm. Thanks for their contributions to the community.

Citation

If you find this repository or work helpful in your research, please consider citing the paper.

@article{fang2021msgtransformer,
  title={MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens},
  author={Jiemin Fang and Lingxi Xie and Xinggang Wang and Xiaopeng Zhang and Wenyu Liu and Qi Tian},
  journal={arXiv:2105.15168},
  year={2021}
}

Owner

Hust Visual Learning Team
