ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Introduction
This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE stacks several reduction cells and normal cells to introduce scale invariance and locality into vision transformers.
Updates
07/12/2021 The code is released!
19/10/2021 The paper is accepted by NeurIPS 2021! The code will be released soon!
06/08/2021 The paper is posted on arXiv! The code will be made publicly available once cleaned up.
Usage
Install
- Clone this repo:
git clone https://github.com/Annbless/ViTAE.git
cd ViTAE
- Create a conda virtual environment and activate it:
conda create -n vitae python=3.7 -y
conda activate vitae
conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch -c conda-forge
- Install timm==0.3.4:
pip install timm==0.3.4
- Install Apex:
git clone https://github.com/NVIDIA/apex
cd apex
git reset --hard a651e2c24ecf97cbf367fd3f330df36760e1c597
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
- Install other requirements:
pip install pyyaml ipdb
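After installation, it can help to confirm that the environment matches the versions pinned above. The following is a minimal sketch, not part of this repository: `check_versions` and `EXPECTED` are hypothetical names, and the check assumes each package exposes a `__version__` attribute.

```python
# Sanity-check sketch: compare installed package versions with the pins above.
# check_versions and EXPECTED are illustrative names, not part of this repo.
import importlib

# Versions copied from the install commands in this section.
EXPECTED = {
    "torch": "1.8.1",
    "torchvision": "0.9.1",
    "timm": "0.3.4",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
            installed = getattr(module, "__version__", None)
        except ImportError:
            installed = None
        # Builds may append a local suffix such as "+cu102"; ignore it.
        base = installed.split("+")[0] if installed else None
        report[name] = (wanted, installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {wanted}, found {installed} [{status}]")
```

Run it inside the activated `vitae` environment; a package that is not installed is reported with `found None`.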
Data Preparation
We use the standard ImageNet dataset, which you can download from http://image-net.org/. The file structure should look like:
$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
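The layout above can be verified before launching a long run. This is a minimal sketch using only the standard library; `check_imagenet_layout` is a hypothetical helper, not a function provided by this repository, and it only checks folder structure, not image contents.

```python
# Layout-check sketch for the train/val class-folder structure shown above.
# check_imagenet_layout is an illustrative helper, not part of this repo.
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems under root; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each class is a sub-directory holding that class's images.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for class_dir in class_dirs:
            if not any(f.is_file() for f in class_dir.iterdir()):
                problems.append(f"empty class folder: {class_dir}")
        classes[split] = {p.name for p in class_dirs}
    # Both splits should expose the same set of class names.
    if len(classes) == 2 and classes["train"] != classes["val"]:
        diff = sorted(classes["train"] ^ classes["val"])
        problems.append(f"class folders differ between train and val: {diff}")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_imagenet_layout(sys.argv[1]):
        print(problem)
```

Pass the same directory you would use as `[ImageNetPath]`; no output means no structural problem was detected.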
Evaluation
Taking ViTAE_basic_7 as an example, to evaluate the pretrained ViTAE model on the ImageNet validation set, run:
python validate.py [ImageNetPath] --model ViTAE_basic_7 --eval_checkpoint [Checkpoint Path]
Training
Taking ViTAE_basic_7 as an example, to train the ViTAE model on ImageNet with 4 GPUs and a total batch size of 512 (128 per GPU), run:
python -m torch.distributed.launch --nproc_per_node=4 main.py [ImageNetPath] --model ViTAE_basic_7 -b 128 --lr 1e-3 --weight-decay .03 --img-size 224 --amp
The trained model file will be saved under the output folder.
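The training command uses `-b 128` with 4 processes to reach a total batch size of 512, which suggests that `-b` is the per-GPU batch size. Under that assumption, the sketch below shows how the value could be adjusted for a different GPU count while keeping the total at 512. `per_gpu_batch_size` and `build_launch_command` are hypothetical helpers, not part of this repository; the flags are copied from the command above.

```python
# Batch-size arithmetic sketch. Assumes -b is the per-GPU batch size, as implied
# by the example above (4 GPUs x 128 = 512). Helper names are illustrative only.

def total_batch_size(gpus, per_gpu_batch):
    """Total batch size across all processes."""
    return gpus * per_gpu_batch


def per_gpu_batch_size(total, gpus):
    """Per-GPU batch size that yields the requested total."""
    if gpus <= 0 or total % gpus != 0:
        raise ValueError(f"total batch {total} is not divisible by {gpus} GPUs")
    return total // gpus


def build_launch_command(data_path, gpus, total=512, model="ViTAE_basic_7"):
    """Assemble the training command from this section for a given GPU count."""
    batch = per_gpu_batch_size(total, gpus)
    return (
        f"python -m torch.distributed.launch --nproc_per_node={gpus} main.py "
        f"{data_path} --model {model} -b {batch} --lr 1e-3 --weight-decay .03 "
        f"--img-size 224 --amp"
    )


if __name__ == "__main__":
    print(total_batch_size(4, 128))  # 512, matching the example above
    print(build_launch_command("[ImageNetPath]", gpus=8))
```

Keeping the total batch size fixed means the learning rate in the command can stay unchanged; whether other settings transfer to a different GPU count has not been verified here.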
Results
Main Results on ImageNet-1K with pretrained models
name | resolution | acc@1 (%) | acc@5 (%) | acc@RealTop-1 (%) | Pretrained |
---|---|---|---|---|---|
ViTAE-T | 224x224 | 75.3 | 92.7 | 82.9 | Coming Soon |
ViTAE-6M | 224x224 | 77.9 | 94.1 | 84.9 | Coming Soon |
ViTAE-13M | 224x224 | 81.0 | 95.4 | 86.9 | Coming Soon |
ViTAE-S | 224x224 | 82.0 | 95.9 | 87.0 | Coming Soon |
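For readers unfamiliar with the acc@1 and acc@5 columns, they follow the usual top-k convention: a prediction counts as correct when the true label is among the k highest-scoring classes. The sketch below illustrates that definition in plain Python on made-up scores; `topk_accuracy` is a hypothetical helper and is not the evaluation code used by `validate.py`.

```python
# Top-k accuracy sketch on toy data. Illustrates the standard definition only;
# this is not the repository's evaluation code, and the scores are made up.

def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    scores = [
        [0.1, 0.7, 0.2],  # predicted class 1
        [0.5, 0.3, 0.2],  # predicted class 0
        [0.2, 0.3, 0.5],  # predicted class 2
        [0.6, 0.3, 0.1],  # predicted class 0
    ]
    labels = [1, 1, 0, 0]
    print(topk_accuracy(scores, labels, k=1))  # 50.0
    print(topk_accuracy(scores, labels, k=2))  # 75.0
```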
Statement
This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.