This repository contains PyTorch code for Robust Vision Transformers.

Last update: Dec 7, 2022

Overview

RVT: Robust Vision Transformers

For details see Rethinking the Design Principles of Robust Vision Transformer by Xiaofeng Mao, Gege Qi, Yuefeng Chen, Yuan He and Hui Xue.

Usage

First, clone the repository locally:

git clone https://github.com/vtddggg/Robust-Vision-Transformer.git

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
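After installing, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision` and `timm`.

```python
import re
from importlib import metadata

# Version bounds taken from the install instructions above.
MINIMUM = {"torch": (1, 7, 0), "torchvision": (0, 8, 1)}
PINNED = {"timm": (0, 3, 2)}


def parse_version(text):
    """Turn a version string such as '1.7.0+cu110' into the tuple (1, 7, 0)."""
    core = text.split("+")[0]  # drop local build tags like '+cu110'
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad or trim to exactly (major, minor, patch) so tuples compare cleanly.
    return tuple((numbers + [0, 0, 0])[:3])


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, wanted in {**MINIMUM, **PINNED}.items():
        try:
            found = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if name in PINNED and found != wanted:
            problems.append(f"{name} {found} found, expected exactly {wanted}")
        elif name in MINIMUM and found < wanted:
            problems.append(f"{name} {found} found, expected at least {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks consistent"]:
        print(line)
```

The check reads package metadata only, so it does not import PyTorch or touch the GPU.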

We use 4 nodes with 8 GPUs each to train RVT-Ti, RVT-S and RVT-B:

Training RVT-Ti

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_tiny --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-S

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_small --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-B

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_base --data-path /path/to/imagenet --output_dir output --batch-size 32 --dist-eval
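To see how the RVT-B settings combine, the global batch size can be worked out from the launch arguments. This is a small sketch under one assumption that the text above does not confirm: that `--batch-size` is a per-process (per-GPU) value, as is common in DeiT-style training scripts. The function name `global_batch_size` is hypothetical.

```python
def global_batch_size(nnodes, nproc_per_node, per_process_batch):
    """Total samples per optimizer step across all processes.

    Assumes --batch-size is applied per process (one process per GPU).
    """
    return nnodes * nproc_per_node * per_process_batch


# RVT-B command above: 4 nodes, 8 processes per node, --batch-size 32.
print(global_batch_size(nnodes=4, nproc_per_node=8, per_process_batch=32))  # 1024
```

If the assumption holds, running on fewer GPUs would mean raising the per-process value to keep the same global batch size.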

If you want to train RVT-Ti*, RVT-S* or RVT-B*, add --use_mask and --use_patch_aug to enable position-aware attention scaling and patch-wise augmentation.
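The training commands differ only in the model name, the optional batch size, and the two extra flags for the starred variants, so they can be generated from one helper. The sketch below uses only flags that appear in the commands above. The function `build_train_command` and its parameter names are hypothetical and are not part of the repository.

```python
import shlex


def build_train_command(model, data_path, output_dir="output", batch_size=None,
                        star_variant=False, nproc_per_node=8, nnodes=4):
    """Assemble the launch command as a list of arguments.

    star_variant=True appends --use_mask and --use_patch_aug, which the text
    above describes for RVT-Ti*, RVT-S* and RVT-B*.
    """
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        "main.py",
        "--model", model,
        "--data-path", data_path,
        "--output_dir", output_dir,
    ]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if star_variant:
        cmd += ["--use_mask", "--use_patch_aug"]
    cmd.append("--dist-eval")
    return cmd


if __name__ == "__main__":
    # Print the command for RVT-S* so it can be pasted into a shell.
    print(shlex.join(build_train_command("rvt_small", "/path/to/imagenet",
                                         star_variant=True)))
```

Returning a list rather than a string lets the same value be passed to `subprocess.run` without shell quoting issues. On a real multi-node run, `torch.distributed.launch` also expects per-node arguments such as `--node_rank` and `--master_addr`, which the commands above leave out.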

Comments

Fine-tuning problem

Hi~ @vtddggg

I'm trying to fine-tune, but if I use the setting at https://github.com/vtddggg/Robust-Vision-Transformer/blob/main/robust_models.py#L317, then at https://github.com/vtddggg/Robust-Vision-Transformer/blob/main/robust_models.py#L283 the length of self.pools is zero.

Why is a PiT model name used (weights/pit_s_809.pth)? Shouldn't it be rvt_small.pth? Is this correct? It looks like a problem.

What are the correct settings to run this exact model?

thanks

opened by peternara 3

Batch size for training RVT-B

I noticed that the training script provided for training RVT-B specifically sets the batch-size to be 32. Is there any reason why 32 is a good choice? Can I set a larger batch size, e.g., 200? Thanks!

opened by bfshi 1

Where is the code for patch augmentation?

I'm new to Vision Transformers and can't find your code for patch augmentation. Could you give me some tips? I'd also like to know how Wp learns positional relationships. Is it like B in ViT?

opened by 1104662797 1
