Vision transformers (ViTs) have found only limited practical use in processing images

Cloudwalker

Last update: Sep 10, 2022

Related tags

Deep Learning CXV

Overview

CXV

Convolutional Xformers for Vision

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and more computational resources compared to convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture, Convolutional Xformers for Vision (CXV), to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce GPU usage. An inductive prior for image data is provided by convolutional sub-layers, thereby eliminating the need for the class token and positional embeddings used by ViTs. CXV outperforms other architectures, including token mixers (e.g., ConvMixer, FNet, and MLP Mixer), transformer models (e.g., ViT, CCT, CvT, and hybrid Xformers), and ResNets, for image classification in scenarios with limited data and GPU resources.
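To make the quadratic-versus-linear distinction in the abstract concrete, here is a minimal, dependency-free sketch of non-causal linear attention in the style of the Linear Transformer, using the feature map phi(x) = elu(x) + 1. It is not taken from this repository: the function names are hypothetical, plain Python lists stand in for tensors, and Performer and Nyströmformer use different approximations that are not shown. The `quadratic_reference` function exists only to check that both orderings of the computation give the same result.

```python
import math


def elu_plus_one(x):
    """Feature map phi(x) = elu(x) + 1; strictly positive for every x."""
    return x + 1.0 if x > 0 else math.exp(x)


def _phi(rows):
    """Apply the feature map element-wise to a list of vectors."""
    return [[elu_plus_one(x) for x in row] for row in rows]


def linear_attention(Q, K, V):
    """Kernelized attention in O(N * d * d_v) time, with no N x N matrix."""
    phi_q, phi_k = _phi(Q), _phi(K)
    d, d_v = len(K[0]), len(V[0])
    # One pass over the tokens builds summaries shared by all queries:
    # S = sum_j phi(k_j) v_j^T (d x d_v) and z = sum_j phi(k_j) (length d).
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(phi_k, V):
        for a in range(d):
            z[a] += k[a]
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
    out = []
    for q in phi_q:
        denom = sum(q[a] * z[a] for a in range(d))
        out.append([sum(q[a] * S[a][b] for a in range(d)) / denom
                    for b in range(d_v)])
    return out


def quadratic_reference(Q, K, V):
    """Same attention, but via explicit N x N weights (for checking only)."""
    phi_q, phi_k = _phi(Q), _phi(K)
    out = []
    for q in phi_q:
        weights = [sum(a * b for a, b in zip(q, k)) for k in phi_k]
        total = sum(weights)
        out.append([sum(w * v[b] for w, v in zip(weights, V)) / total
                    for b in range(len(V[0]))])
    return out
```

Because the summaries `S` and `z` do not depend on the query, the cost grows linearly with the number of tokens N, whereas the reference version computes N weights for each of the N queries.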

Models:

CNV - Convolutional Nyströmformer for Vision
CPV - Convolutional Performer for Vision
CLTV - Convolutional Linear Transformer for Vision


Comments

Syntax and logical errors

First of all, thanks for sharing your great work. However, I found some syntax and logical errors in all three transformers. One of them is in CLTV:

    for cnn, transformer in self.layers:
        x = cnn(x)
        x = norm(x)
        x = transformer(x)

### It should be:

    for cnn, norm, transformer in self.layers:
        x = cnn(x)
        x = norm(x)
        x = transformer(x)

opened by sazani 0
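The unpacking problem reported above can be reproduced without the model itself. The sketch below assumes, as the corrected loop implies, that each entry of `self.layers` holds three sub-modules. The lambdas are hypothetical stand-ins for the real convolution, normalization, and transformer layers, and `run_layers` and `buggy_unpack` are illustrative names, not functions from the repository.

```python
def run_layers(layers, x):
    """Apply each (cnn, norm, transformer) triple in order."""
    for cnn, norm, transformer in layers:
        x = cnn(x)
        x = norm(x)
        x = transformer(x)
    return x


def buggy_unpack(layers):
    """Unpack each entry into two names, as in the reported loop header."""
    try:
        for cnn, transformer in layers:
            pass
    except ValueError as err:
        return str(err)
    return None


# Hypothetical stand-ins: simple arithmetic instead of real sub-layers.
layers = [(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)] * 2

print(run_layers(layers, 0))
print(buggy_unpack(layers))
```

With three-element entries, the two-name loop header fails with a `ValueError` before the body runs. If the entries held only two elements instead, the body would fail on the undefined name `norm`, so the loop header and the layer list have to agree either way.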
