The official PyTorch implementation of ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Last update: Nov 27, 2022

Overview

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE combines several reduction cells and normal cells to introduce two intrinsic inductive biases, scale invariance and locality, into vision transformers.

Updates

07/12/2021 The code is released!

19/10/2021 The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021 The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Usage

Install

Clone this repo:

git clone https://github.com/Annbless/ViTAE.git
cd ViTAE

Create a conda virtual environment and activate it:

conda create -n vitae python=3.7 -y
conda activate vitae

Install PyTorch 1.8.1 and torchvision 0.9.1 with CUDA 10.2:

conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch -c conda-forge

Install timm==0.3.4:

pip install timm==0.3.4

Install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
git reset --hard a651e2c24ecf97cbf367fd3f330df36760e1c597
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Install other requirements:

pip install pyyaml ipdb
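
After installation, a quick check that the pinned versions are the ones actually importable can save debugging time later. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute. Local build suffixes such as `+cu102` are ignored in the comparison.

```python
import importlib

# Versions pinned in the install steps above.
EXPECTED = {"torch": "1.8.1", "torchvision": "0.9.1", "timm": "0.3.4"}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, found, matches)} for each pinned package.

    `found` is None when the package cannot be imported and "unknown"
    when it has no __version__ attribute.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (wanted, None, False)
            continue
        found = getattr(module, "__version__", "unknown")
        # Drop a local build suffix such as "+cu102" before comparing.
        report[name] = (wanted, found, found.split("+")[0] == wanted)
    return report
```

Calling `check_versions()` inside the activated `vitae` environment returns one entry per package; any entry whose last field is `False` points at a missing or mismatched install.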

Data Preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/. The file structure should look like:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
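
The layout above can be checked before launching a long job. The following is a minimal sketch using only the standard library, not code from this repository: the function names and the accepted image suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image file suffixes; adjust to match your copy of the data.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Return {class_name: image_count} for one split directory."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_imagenet_layout(root):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    problems = []
    summaries = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        summaries[split] = summarize_split(split_dir)
        empty = [name for name, n in summaries[split].items() if n == 0]
        if empty:
            problems.append(f"{split}: {len(empty)} class folder(s) without images")
    # Both splits should expose the same set of class folders.
    if len(summaries) == 2 and set(summaries["train"]) != set(summaries["val"]):
        problems.append("train and val contain different class folders")
    return problems
```

For example, `check_imagenet_layout("data/imagenet")` returns an empty list when both `train` and `val` exist, every class folder holds at least one image, and the two splits share the same class names.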

Evaluation

Taking ViTAE_basic_7 as an example, evaluate the pretrained ViTAE model on the ImageNet validation set by running:

python validate.py [ImageNetPath] --model ViTAE_basic_7 --eval_checkpoint [Checkpoint Path]

Training

Taking ViTAE_basic_7 as an example, train the ViTAE model on ImageNet with 4 GPUs and a total batch size of 512 (128 per GPU) by running:

python -m torch.distributed.launch --nproc_per_node=4 main.py [ImageNetPath] --model ViTAE_basic_7 -b 128 --lr 1e-3 --weight-decay .03 --img-size 224 --amp

The trained model file will be saved under the output folder.
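
The training command passes `-b 128` with `--nproc_per_node=4`, which matches the stated total batch size of 512 if `-b` is read as the per-GPU batch size. The sketch below spells out that arithmetic for other GPU counts. The helper names are hypothetical, and whether the learning rate should change along with the GPU count is not covered by this README.

```python
def global_batch_size(per_gpu_batch, num_gpus):
    """Total number of samples processed per optimization step."""
    return per_gpu_batch * num_gpus


def per_gpu_batch_size(global_batch, num_gpus):
    """Per-GPU batch size needed to reach a given total batch size."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")
    return global_batch // num_gpus
```

With these helpers, `global_batch_size(128, 4)` gives 512, and `per_gpu_batch_size(512, 8)` gives 64, the value to pass to `-b` when keeping the same total on 8 GPUs.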

Results

Main Results on ImageNet-1K with pretrained models

name	resolution	acc@1	acc@5	acc@RealTop-1	Pretrained
ViTAE-T	224x224	75.3	92.7	82.9	Coming Soon
ViTAE-6M	224x224	77.9	94.1	84.9	Coming Soon
ViTAE-13M	224x224	81.0	95.4	86.9	Coming Soon
ViTAE-S	224x224	82.0	95.9	87.0	Coming Soon

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.
