PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers

Rishikesh (ऋषिकेश)

Last update: Jan 3, 2023


Overview

PyTorch implementation of CvT: Introducing Convolutions to Vision Transformers.

Usage:

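import numpy as np
import torch

# Assumption: the CvT class is importable from a cvt module in this repository;
# adjust the import path to match your checkout.
from cvt import CvT

img = torch.ones([1, 3, 224, 224])  # dummy batch: one 224x224 RGB image

model = CvT(224, 3, 1000)  # presumably image size, input channels, number of classes

# Count the trainable parameters, in millions
parameters = filter(lambda p: p.requires_grad, model.parameters())
parameters = sum([np.prod(p.size()) for p in parameters]) / 1_000_000
print('Trainable Parameters: %.3fM' % parameters)

out = model(img)

print("Shape of out :", out.shape)  # [B, num_classes]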

Citation:

@misc{wu2021cvt,
      title={CvT: Introducing Convolutions to Vision Transformers}, 
      author={Haiping Wu and Bin Xiao and Noel Codella and Mengchen Liu and Xiyang Dai and Lu Yuan and Lei Zhang},
      year={2021},
      eprint={2103.15808},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement:

The base ViT code is borrowed from @lucidrains' repo: https://github.com/lucidrains/vit-pytorch


Comments

Implementation of convolutional projection

Hi @rishikksh20,

Thank you for your quick implementation!

I notice that your depth-wise separable convolution is implemented as depth-wise conv --> point-wise conv. In the paper, CvT's depth-wise separable convolution is implemented as depth-wise conv --> bn --> point-wise conv.

opened by leoxiaobin 7
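
The ordering described in this comment (depth-wise conv, then batch norm, then point-wise conv) can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's code: the class name DepthwiseSeparableConv, the channel count, and the 3x3 kernel are assumptions chosen for the example.

```python
import torch
from torch import nn


class DepthwiseSeparableConv(nn.Module):
    """Hypothetical block: depth-wise conv -> BatchNorm -> point-wise conv."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        padding = kernel_size // 2  # keeps the spatial size when stride=1
        # groups=in_ch gives one filter per input channel (depth-wise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=padding, groups=in_ch, bias=False)
        self.bn = nn.BatchNorm2d(in_ch)
        # a 1x1 convolution mixes information across channels (point-wise)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.bn(self.depthwise(x)))


x = torch.ones(1, 64, 56, 56)           # assumed feature map: B, C, H, W
block = DepthwiseSeparableConv(64, 64)  # assumed channel count
out = block(x)
print("Shape of out :", out.shape)
```

With stride 1 and padding of kernel_size // 2, the block keeps the spatial size of its input, so only the channel mixing changes.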

Implementation of attention module

Hi, nice implementation of CvT.

I have a question: in the paper, they use a squeezed convolution for computing attention, where the strides of q, k, and v are different.

But in your code, it seems that each attention module uses a stride of 1.

opened by MARD1NO 1

Convolution projection

Hi, I have a question: in the paper, the stride of the depthwise convolution in the convolutional projection is 2, but in your repo the stride is 1. Should the stride be 2 or 1? Thanks!

opened by qdd1234 0
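
The stride question raised in the last two comments can be illustrated with a small sketch. It assumes the arrangement the commenters attribute to the paper: stride 1 for the query projection and stride 2 for the key and value projections. The helper conv_proj, the dimension of 64, and the 14x14 token map are hypothetical values for the example, not taken from this repository.

```python
import torch
from torch import nn


def conv_proj(dim, stride):
    # Hypothetical helper: depth-wise 3x3 conv -> BN -> point-wise 1x1 conv
    return nn.Sequential(
        nn.Conv2d(dim, dim, 3, stride=stride, padding=1, groups=dim, bias=False),
        nn.BatchNorm2d(dim),
        nn.Conv2d(dim, dim, 1),
    )


dim = 64                          # assumed embedding dimension
to_q = conv_proj(dim, stride=1)   # queries keep the full token map
to_k = conv_proj(dim, stride=2)   # keys are spatially subsampled
to_v = conv_proj(dim, stride=2)   # values are spatially subsampled

tokens = torch.ones(1, dim, 14, 14)  # assumed token map: B, C, H, W

# Flatten each projected map into a token sequence: (B, N, C)
q = to_q(tokens).flatten(2).transpose(1, 2)
k = to_k(tokens).flatten(2).transpose(1, 2)
v = to_v(tokens).flatten(2).transpose(1, 2)

# Single-head scaled dot-product attention
attn = torch.softmax(q @ k.transpose(1, 2) / dim ** 0.5, dim=-1)
out = attn @ v

print("q:", q.shape, "k:", k.shape, "out:", out.shape)
```

With a 3x3 kernel and padding 1, stride 2 maps the 14x14 map to 7x7, so keys and values have 49 tokens while queries keep 196. The attention matrix shrinks from 196x196 to 196x49, and the output still has one token per query.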


