Transformer-in-Vision

A list of recent Transformer-based computer vision papers. If you notice a missing paper, please open an issue or a pull request.

**Last updated: 2022/01/20**

Update log

2021/April - updated with all recent Transformer-in-Vision papers.
2021/May - updated with all recent Transformer-in-Vision papers.
2021/June - updated with all recent Transformer-in-Vision papers.
2021/July - updated with all recent Transformer-in-Vision papers.
2021/August - updated with all recent Transformer-in-Vision papers.
2021/September - updated with all recent Transformer-in-Vision papers.
2021/October - updated with all recent Transformer-in-Vision papers.
2021/November - updated with all recent Transformer-in-Vision papers.
2021/December - updated with all recent Transformer-in-Vision papers.

Survey:

(arXiv 2022.01) Video Transformers: A Survey. [Paper]
(arXiv 2021.11) A Survey of Visual Transformers. [Paper]
(arXiv 2021.09) Survey: Transformer based Video-Language Pre-training. [Paper]
(arXiv 2021.03) Multi-modal Motion Prediction with Stacked Transformers. [Paper], [Code]
(arXiv 2021.03) Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision. [Paper]
(arXiv 2021.01) Transformers in Vision: A Survey. [Paper]
(arXiv 2020.09) Efficient Transformers: A Survey. [Paper]

Contact & Feedback

If you have any suggestions about this project, feel free to contact me.

[e-mail: yzhangcst[at]gmail.com]



Recent Papers

Action
Active Learning
Anomaly Detection
Assessment
Captioning
Classification (Backbone)
Completion
Compression
Crowd Counting
Depth
Deepfake Detection
Dehazing
Detection
Face
Few-shot Learning
Fusion
GAN
Gaze
HOI
Hyperspectral
Incremental Learning
In-painting
Instance Segmentation
Layout
Matching
Medical
Motion
Multi-task/modal
Multi-view Stereo
NAS
Navigation
OCR
Octree
Panoptic Segmentation
Point Cloud
Pose
Planning
Pruning & Quantization
Recognition
Reconstruction
Re-identification
Restoration
Retrieval
Salient Object Detection
Scene
Self-supervised Learning
Semantic Segmentation
Shape
Super-Resolution
Synthesis
Tracking
Traffic
Texture
Transfer learning
Video
Visual Grounding
Visual Reasoning
Visual Relationship Detection
Voxel
Weakly Supervised Learning
Zero-Shot Learning
Others