Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Overview

Expediting Vision Transformers via Token Reorganizations

This repository contains PyTorch evaluation code, training code, and pretrained EViT models for the ICLR 2022 Spotlight paper:

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

The proposed EViT models obtain competitive trade-offs between speed and accuracy:

(Figure: speed vs. accuracy trade-off of EViT models.)

If you use this code for a paper, please cite:

@inproceedings{liang2022evit,
title={Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations},
author={Youwei Liang and Chongjian Ge and Zhan Tong and Yibing Song and Jue Wang and Pengtao Xie},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=BjyvwnXXVn_}
}

Model Zoo

We provide EViT-DeiT-S models pretrained on ImageNet 2012.

| Token fusion | Keep rate | Acc@1 | Acc@5 | #Params | URL |
|:---:|:---:|:---:|:---:|:---:|:---:|
| ✗ | 0.9 | 79.8 | 95.0 | 22.1M | model |
| ✗ | 0.8 | 79.8 | 94.9 | 22.1M | model |
| ✗ | 0.7 | 79.5 | 94.8 | 22.1M | model |
| ✗ | 0.6 | 78.9 | 94.5 | 22.1M | model |
| ✗ | 0.5 | 78.5 | 94.2 | 22.1M | model |
| ✓ | 0.9 | 79.9 | 94.9 | 22.1M | model |
| ✓ | 0.8 | 79.7 | 94.8 | 22.1M | model |
| ✓ | 0.7 | 79.4 | 94.7 | 22.1M | model |
| ✓ | 0.6 | 79.1 | 94.5 | 22.1M | model |
| ✓ | 0.5 | 78.4 | 94.1 | 22.1M | model |

Preparation

The results reported in the paper were obtained with models trained on 16 NVIDIA A100 GPUs, using Python 3.6 and the following packages:

torch==1.9.0
torchvision==0.10.0
timm==0.4.12
tensorboardX==2.4
torchprofile==0.0.4
lmdb==1.2.1
pyarrow==5.0.0

These packages can be installed by running pip install -r requirements.txt.

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation data in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg

We use the same datasets as in DeiT. You can optionally use an LMDB dataset for ImageNet by building it with folder2lmdb.py and passing --use-lmdb to main.py, which may speed up data loading.
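A hypothetical build sketch follows. The folder2lmdb.py arguments below are assumptions, not the script's documented interface (check its argument parser for the real flags); only --use-lmdb is named in this README:

# Hypothetical flags for illustration; folder2lmdb.py defines its own arguments.
python folder2lmdb.py /path/to/imagenet --split train
python folder2lmdb.py /path/to/imagenet --split val

# Then point main.py at the same data root and enable LMDB loading:
python main.py --use-lmdb --data-path /path/to/imagenet ...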

Usage

First, clone the repository locally:

git clone https://github.com/youweiliang/evit.git

Change directory to the cloned repository by running cd evit, install necessary packages, and prepare the datasets.
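Putting the setup steps together:

git clone https://github.com/youweiliang/evit.git
cd evit
pip install -r requirements.txt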

Training

To train EViT/0.7-DeiT-S on ImageNet, set datapath (the path to the dataset) and logdir (the logging directory) in run_code.sh, then run bash ./run_code.sh (modify --nproc_per_node if necessary). Note that the batch size used in the paper is 16 x 128 = 2048.

Set --base_keep_rate in run_code.sh to use a different keep rate, and set --fuse_token to choose whether to use inattentive token fusion; a sketch of a full launch command follows.
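This is an illustration only: run_code.sh contains the actual command, the launcher mirrors DeiT's torch.distributed.launch convention, and --output_dir is an assumed flag name borrowed from DeiT:

# Sketch of a 16-GPU launch (batch size 16 x 128 = 2048, as in the paper).
python -m torch.distributed.launch --nproc_per_node=16 --use_env main.py \
    --model deit_small_patch16_shrink_base \
    --base_keep_rate 0.7 --fuse_token \
    --batch-size 128 \
    --data-path /path/to/imagenet \
    --output_dir /path/to/logdir  # assumed DeiT-style flag; see run_code.sh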

Training/Finetuning on higher resolution images

To train on images with a (higher) resolution h, set --input-size h in run_code.sh (for example, --input-size 384).

Multinode training

Please refer to DeiT for multinode training.

Finetuning

First set the datapath, logdir, and ckpt (the model checkpoint to finetune from) in run_code.sh, and then run bash ./finetune.sh; a sketch of those edits follows.
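The variable names below come from this README; the values are placeholders, and the exact syntax depends on the script itself:

# Placeholders only; point these at your own paths.
datapath=/path/to/imagenet           # ImageNet root
logdir=/path/to/logs                 # logging directory
ckpt=/path/to/evit_checkpoint.pth    # checkpoint to finetune from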

Evaluation

To evaluate a pre-trained EViT/0.7-DeiT-S model on ImageNet val with a single GPU, run the following (replacing checkpoint with the actual checkpoint file):

python3 main.py --model deit_small_patch16_shrink_base --fuse_token --base_keep_rate 0.7 --eval --resume checkpoint --data-path /path/to/imagenet

You can also pass --dist-eval to use multiple GPUs for evaluation, as in the sketch below.
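For example, a multi-GPU evaluation could be launched as follows (a sketch assuming the same torch.distributed.launch entry point as training; replace checkpoint as above):

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model deit_small_patch16_shrink_base --fuse_token --base_keep_rate 0.7 \
    --eval --dist-eval --resume checkpoint --data-path /path/to/imagenet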

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

We would like to thank the authors of DeiT, on which this project is built.

Comments
• EViT-DeiT-S Reproduce

    Many thanks to the authors for their open source work.

I would like to ask about the specific configuration of EViT-DeiT-S for the fine-tuning experiments, including --nproc_per_node, --batch-size, --warmup-epochs, --shrink_epochs, --shrink_start_epoch, etc.

    Directly following the finetune.sh configuration, I can't seem to reproduce the 78.5% accuracy.

    opened by Cydia2018 4
• Where should I read the EViT token filtering and fusion algorithm code?

I'm sorry, I'm using the timm library for the first time and I can't find the source code for the EViT part. Please help me find it; I want to read it carefully!

    opened by kritohyh 1
• Comparison with Evo-ViT

Thanks for your brilliant work on accelerating ViTs!

Months ago I read the paper Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer, and I have now found that you have similar model structures: preserving informative tokens while aggregating uninformative ones. Have you ever compared with this work? Figure 4 compares with only a few other works on faster inference; referring to recent surveys of ViTs, there are many more fast ViTs you could compare with.

Also, as far as I know, there are plenty of works in the field of NLP, based on BERT, on prototyping and aggregating tokens. Those could also be good comparisons.

Thanks for your consideration; looking forward to your reply.

    opened by Longday0923 1
• forward_features(self, x, keep_rate=None, tokens=None, get_idx=False)

with torch.cuda.amp.autocast():
    outputs = model(samples, keep_rate)
    loss = criterion(samples, outputs, targets)

This code is called in the train_one_epoch function, but the model call does not pass the tokens argument; as written, it should raise an error. Where is the tokens argument actually passed?

    opened by 60wanjinbing 1
• How to handle "compute MAC error"?

    Thanks for your code.

I tested my own model with the code you provided in speed_test.py, but I ran into this problem.

    https://github.com/youweiliang/evit/blob/97e58f610c51d4b74a070341739e41647dced32c/speed_test.py#L106

What's the reason for it, and how can I deal with it?

    opened by pzSuen 1
• Warmup strategy

Hello, I would like to ask: if the warmup strategy is not used and the keep rate is instead set directly to the target value, will the experimental results differ greatly?

    opened by 1171000410 3
• UnboundLocalError: local variable 'idxs' referenced before assignment

Hello, when I run bash ./run_code.sh, the following error occurs. How can I solve this problem?

Traceback (most recent call last):
  File "main.py", line 497, in <module>
    main(args)
  File "main.py", line 448, in main
    args=args
  File "/content/evit/engine.py", line 56, in train_one_epoch
    outputs = model(samples, keep_rate)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/evit/evit.py", line 442, in forward
    x, _, idxs = self.forward_features(x, keep_rate, tokens, get_idx)
  File "/content/evit/evit.py", line 437, in forward_features
    return self.pre_logits(x[:, 0]), left_tokens, idxs
UnboundLocalError: local variable 'idxs' referenced before assignment

    opened by DDxk369 1
Owner
Youwei Liang
Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers

beyond masking Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers The code is coming Figure 1: Pipeline of token-based pre-

Yunjie Tian 23 Sep 27, 2022
Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks arXiv link: upcoming To be published in Findings of NA

Allen 16 Nov 12, 2022
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
Global Tracking Transformers, CVPR 2022

Global Tracking Transformers Global Tracking Transformers, Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl, CVPR 2022 (arXiv 2203.13250)

Xingyi Zhou 304 Dec 16, 2022
Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Negative Sampling for NER Unlabeled entity problem is prevalent in many NER scenarios (e.g., weakly supervised NER). Our paper in ICLR-2021 proposes u

Yangming Li 128 Dec 29, 2022
Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework for Question Answering & Neural search that enables you to ... ... ask questions in natural language and find gran

deepset 6.4k Jan 9, 2023
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 829 Jan 7, 2023
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

Amazon 69 Jan 3, 2023
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Towards Abstractive Grounded Summarization of Podcast Transcripts We provide the source code for the paper "Towards Abstractive Grounded Summarization

null 10 Jul 1, 2022
Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

SWRM Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors" Clone Clone th

null 14 Jan 3, 2023
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
Code of paper: A Recurrent Vision-and-Language BERT for Navigation

Recurrent VLN-BERT Code of the Recurrent-VLN-BERT paper: A Recurrent Vision-and-Language BERT for Navigation Yicong Hong, Qi Wu, Yuankai Qi, Cristian

YicongHong 109 Dec 21, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

null 44 Jan 6, 2023
Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Zhenhailong Wang 2 Jul 15, 2022
[ICLR'19] Trellis Networks for Sequence Modeling

TrellisNet for Sequence Modeling This repository contains the experiments done in paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico

CMU Locus Lab 460 Oct 13, 2022
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod

Harald Scheidl 736 Jan 3, 2023
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Francis R. Willett 305 Dec 22, 2022
Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Chenhe Dong 28 Nov 10, 2022