Dynamic-Vision-Transformer (Pytorch)

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Update on 2021/06/01: released the pre-trained models and the inference code on ImageNet.

Introduction

We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically.

Results

  • Top-1 accuracy on ImageNet vs. GFLOPs

  • Top-1 accuracy on CIFAR vs. GFLOPs

  • Top-1 accuracy on ImageNet vs. Throughput

  • Visualization

Pre-trained Models

Backbone     # of Exits   # of Tokens        Links
T2T-ViT-12   3            7x7-10x10-14x14    Tsinghua Cloud / Google Drive
  • What the checkpoints contain:
**.pth.tar
├── model_state_dict: state dictionaries of the model
├── flops: a list containing the GFLOPs corresponding to exiting at each exit
├── anytime_classification: Top-1 accuracy of each exit
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
├── budgeted_batch_classification: results of budgeted batch classification (a two-item list; [0] and [1] hold the two coordinates of the accuracy-vs-computation curve)
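
For reference, these saved results can be read directly from a checkpoint (a minimal sketch; the key names follow the tree above, and PATH_TO_CHECKPOINTS is a placeholder for the downloaded file):

    import torch

    # Load the downloaded **.pth.tar checkpoint on CPU.
    checkpoint = torch.load('PATH_TO_CHECKPOINTS', map_location='cpu')

    print(checkpoint['flops'])                   # GFLOPs of exiting at each exit
    print(checkpoint['anytime_classification'])  # Top-1 accuracy of each exit
    print(checkpoint['dynamic_threshold'])       # confidence thresholds
    x, y = checkpoint['budgeted_batch_classification']  # the two curve coordinates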

Requirements

  • python 3.7.7
  • pytorch 1.3.1
  • torchvision 0.4.2

Evaluate Pre-trained Models

Read the evaluation results saved in the pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in the pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1

Determine the confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
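
A directory in this layout can be loaded with torchvision's ImageFolder, e.g. for the validation split (a minimal sketch; the exact preprocessing used by inference.py may differ):

    import torchvision.datasets as datasets
    import torchvision.transforms as transforms

    # Standard ImageNet evaluation preprocessing; the input resolutions used
    # by the different exits of DVT may differ from this.
    val_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # PATH_TO_DATASET is the ImageNet root shown above.
    val_set = datasets.ImageFolder('PATH_TO_DATASET/val', transform=val_transform)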

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our code is based on the official implementation of T2T-ViT.

To Do

  • Update the code for training.
Comments
  • About the implementation of upsampling in relation_reuse

    My main concern is why it is necessary to split relation_temp as follows:

      split_index = int(relation_temp.size(0) / 2)
      relation_temp = torch.cat(
          (
              self.relation_reuse_upsample(relation_temp[:split_index * 1]),
              self.relation_reuse_upsample(relation_temp[split_index * 1:]),
          ), 0
      )
    

    It would be more straightforward to implement the upsampling like this:

      relation_temp = self.relation_reuse_upsample(relation_temp)
    

    Could you please explain the difference between the above two implementations?
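
    As a quick numerical check, the two variants can be compared directly (a sketch with a dummy tensor and a stand-in nn.Upsample; the repo's actual module may be configured differently). Since upsampling acts on each batch element independently, both give identical outputs, which suggests the split only trades a single large call for two smaller ones, presumably to lower peak memory:

      import torch
      import torch.nn as nn

      # Stand-in for self.relation_reuse_upsample; the module in the repo may
      # be configured differently.
      upsample = nn.Upsample(scale_factor=2, mode='nearest')

      relation_temp = torch.randn(8, 1, 14, 14)  # dummy attention-logit tensor
      split_index = relation_temp.size(0) // 2

      split_version = torch.cat(
          (upsample(relation_temp[:split_index]),
           upsample(relation_temp[split_index:])), 0)
      whole_version = upsample(relation_temp)

      # Identical results: the split changes peak memory, not the output.
      print(torch.equal(split_version, whole_version))  # True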

    opened by larenzhang 1
  • Error 'Unknown model (DVT_T2t_vit_12)'

    Hi!

    I tried to evaluate DVT_T2t_vit_12 by running 'python inference.py --data_url ./data/ --batch_size 64 --model DVT_T2t_vit_12 --checkpoint_path .\checkpoint\DVT_T2t_vit_12.pth.tar --eval_mode 1', and I got the following error:

      Traceback (most recent call last):
        File "inference.py", line 226, in <module>
          main()
        File "inference.py", line 57, in main
          model = create_model(
        File "A:\transformer\DViT\Dynamic-Vision-Transformer-main\Dynamic-Vision-Transformer-main\timm\models\factory.py", line 59, in create_model
          raise RuntimeError('Unknown model (%s)' % model_name)
      RuntimeError: Unknown model (DVT_T2t_vit_12)

    I also tried printing _model_entrypoints in Dynamic-Vision-Transformer-main/timm/models/registry.py to look for the model name 'DVT_T2t_vit_12', but it is not there.

    Environment: Python 3.8, PyTorch 1.8.1, torchvision 0.9.1

    opened by xiyiyia 1
  • Could you give us an example checkpoint for ImageNet 2014 or anything else?

    I tried to evaluate your model on my machine, but unfortunately I could not find a checkpoint for any dataset. I don't have enough machines to train a new model on ImageNet. Your work is excellent, and it would be a pity if I couldn't run your model. I just need one checkpoint file; any dataset is fine! Please. @blackfeather-wang @guanfuchen

    Thank you

    opened by xiyiyia 1
  • Some questions about FLOPs calculation in ViT

    Thanks for your great work. I am interested in the FLOPs reported in your paper, e.g., in Table 1 and Table 4. I am wondering if you could release the code of the FLOPs calculation for ViT. Thank you!
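
    For reference, a common back-of-the-envelope estimate for ViT FLOPs counts only the dominant terms of each encoder block (a generic approximation, not necessarily the exact counting used in the paper):

      def vit_flops(num_tokens, dim, depth, mlp_ratio=4):
          """Rough per-image FLOPs of a ViT encoder (1 multiply-add = 2 FLOPs).

          Only the dominant terms are counted: QKV/output projections, the
          attention matrix itself, and the MLP; patch embedding and the
          classifier head are omitted.
          """
          n, d = num_tokens, dim
          attn_proj = 2 * (4 * n * d * d)        # QKV + output projections
          attn_matrix = 2 * (2 * n * n * d)      # QK^T and softmax(A) @ V
          mlp = 2 * (2 * n * d * mlp_ratio * d)  # two linear layers
          return depth * (attn_proj + attn_matrix + mlp)

      # Hypothetical numbers: 14x14 tokens + class token, dim 256, depth 12.
      print(vit_flops(num_tokens=14 * 14 + 1, dim=256, depth=12) / 1e9, 'GFLOPs')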

    opened by Liuyang829 1
  • Questions about feature and relation reuse

    1. We know that a transformer consists of multiple encoder blocks. What I am curious about is whether the output of the last layer of the upstream transformer is concatenated with the MLP output of every encoder block in the downstream transformer?
    2. The paper mentions reusing the attention logits of the upstream transformer, i.e., the attention maps produced by Q and K. What I am wondering is whether the attention map of each encoder block in the upstream transformer is concatenated with the attention map of the depth-wise corresponding encoder block in the downstream transformer to achieve relation reuse?
    3. In theory, the extra computation brought by this reuse mechanism should be very large, like the dense connections in DenseNet, yet the paper states that the extra cost is small. I think the only explanation for the relatively small overhead is that the embedding dimension D obtained by the linear projection of each patch is small. Is my understanding correct?
    opened by xingshulicc 1