As-ViT: Auto-scaling Vision Transformers without Training

VITA

Last update: Sep 5, 2022

Related tags

Overview

As-ViT: Auto-scaling Vision Transformers without Training [PDF]

Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

In ICLR 2022.

Note: We implemented topology search (sec. 3.3) and scaling (sec. 3.4) in this code base in PyTorch. Our training code is based on Tensorflow and Keras on TPU, which will be released soon.

Overview

We present As-ViT, a framework that unifies the automatic architecture design and scaling for ViT (vision transformer), in a training-free strategy.

Highlights:

Trainig-free ViT Architecture Design: we design a "seed" ViT topology by leveraging a training-free search process. This extremely fast search is fulfilled by our comprehensive study of ViT's network complexity (length distorsion), yielding a strong Kendall-tau correlation with ground-truth accuracies.
Trainig-free ViT Architecture Scaling: starting from the "seed" topology, we automate the scaling rule for ViTs by growing widths/depths to different ViT layers. This will generate a series of architectures with different numbers of parameters in a single run.
Efficient ViT Training via Progressive Tokenization: we observe that ViTs can tolerate coarse tokenization in early training stages, and further propose to train ViTs faster and cheaper with a progressive tokenization strategy.

Left: Length Distortion shows a strong correlation with ViT's accuracy. Middle: Auto scaling rule of As-ViT. Right: Progressive re-tokenization for efficient ViT training.

Prerequisites

Ubuntu 18.04
Python 3.6.9
CUDA 11.0 (lower versions may work but were not tested)
NVIDIA GPU + CuDNN v7.6

This repository has been tested on V100 GPU. Configurations may need to be changed on different platforms.

Installation

Clone this repo:

git clone https://github.com/VITA-Grou/AsViT.git
cd AsViT

Install dependencies:

pip install -r requirements.txt

1. Seed As-ViT Topology Search

CUDA_VISIBLE_DEVICES=0 python ./search/reinforce.py --save_dir ./output/REINFORCE-imagenet --data_path /path/to/imagenet

This job will return you a seed topology. For example, our search seed topology is 8,2,3|4,1,2|4,1,4|4,1,6|32, which can be explained as below:

Stage1			Stage2			Stage3			Stage4			Head
Kernel K1	Split S1	Expansion E1	Kernel K2	Split S2	Expansion E2	Kernel K3	Split S3	Expansion E3	Kernel K4	Split S4	Expansion E4	Head
8	2	3	4	1	2	4	1	4	4	1	6	32

2. Scaling

CUDA_VISIBLE_DEVICES=0 python ./search/grow.py --save_dir ./output/GROW-imagenet \
--arch "[arch]" --data_path /path/to/imagenet

Here [arch] is the seed topology (output from step 1 above). This job will return you a series of topologies. For example, our largest topology (As-ViT Large) is 8,2,3,5|4,1,2,2|4,1,4,5|4,1,6,2|32,180, which can be explained as below:

Stage1				Stage2				Stage3				Stage4				Head	Initial Hidden Size
Kernel K1	Split S1	Expansion E1	Layers L1	Kernel K2	Split S2	Expansion E2	Layers L2	Kernel K3	Split S3	Expansion E3	Layers L3	Kernel K4	Split S4	Expansion E4	Layers L4	Head	Initial Hidden Size
8	2	3	5	4	1	2	2	4	1	4	5	4	1	6	2	32	180

3. Evaluation

Tensorflow and Keras code for training on TPU. To be released soon.

Citation

@inproceedings{chen2021asvit,
  title={Auto-scaling Vision Transformers without Training},
  author={Chen, Wuyang and Huang, Wei and Du, Xianzhi and Song, Xiaodan and Wang, Zhangyang and Zhou, Denny},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

You might also like...

[CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Codes for this paper: [CVPR 2022] The Pr

16 Nov 26, 2022

Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

InfoPro-Pytorch The Information Propagation algorithm for training deep networks with local supervision. (ICLR 2021) Revisiting Locally Supervised Lea

78 Dec 27, 2022

PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

887 Jan 8, 2023

A simple approach to emable dense segmentation with ViT.

Vision Transformer Segmentation Network This implementation of ViT in pytorch uses a super simple and straight-forward way of generating an output of

5 Jan 3, 2023

This project uses ViT to perform image classification tasks on DATA set CIFAR10.

Vision-Transformer-Multiprocess-DistributedDataParallel-Apex Introduction This project uses ViT to perform image classification tasks on DATA set CIFA

3 Jun 3, 2022

vit for few-shot classification

Few-Shot ViT Requirements PyTorch (= 1.9) TorchVision timm (latest) einops tqdm numpy scikit-learn scipy argparse tensorboardx Pretrained Checkpoints

26 Nov 30, 2022

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

45 Dec 8, 2022

Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones

HaloNet - Pytorch Implementation of the Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones. This re

189 Nov 22, 2022

Image-Scaling Attacks and Defenses

Image-Scaling Attacks & Defenses This repository belongs to our publication: Erwin Quiring, David Klein, Daniel Arp, Martin Johns and Konrad Rieck. Ad

163 Nov 21, 2022

As-ViT: Auto-scaling Vision Transformers without Training

Related tags

Overview

As-ViT: Auto-scaling Vision Transformers without Training [PDF]

Overview

Prerequisites

Installation

1. Seed As-ViT Topology Search

2. Scaling

3. Evaluation

Citation

You might also like...

[CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

A simple approach to emable dense segmentation with ViT.

This project uses ViT to perform image classification tasks on DATA set CIFAR10.

vit for few-shot classification

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Image-Scaling Attacks and Defenses

Owner

VITA

For auto aligning, cropping, and scaling HR and LR images for training image based neural networks

So-ViT: Mind Visual Tokens for Vision Transformer

A PyTorch Implementation of ViT (Vision Transformer)

Official implement of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

Implementing Vision Transformer (ViT) in PyTorch

A simple program for training and testing vit

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.