1526 Python Vision-transformer Libraries

A PaddlePaddle version image model zoo.

Paddle-Image-Models English | 简体中文 A PaddlePaddle version image model zoo. Install Package Install by pip： $ pip install ppim Install by wheel package

131 Dec 7, 2022

An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

784 Jan 6, 2023

Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

MLP-Mixer: An all-MLP Architecture for Vision This repo contains PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision. Usage : impo

175 Dec 23, 2022

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

2.8k Jan 8, 2023

The code for two papers: Feedback Transformer and Expire-Span.

transformer-sequential This repo contains the code for two papers: Feedback Transformer Expire-Span The training code is structured for long sequentia

125 Dec 25, 2022

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

218 Jan 5, 2023

PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

mlp-mixer-pytorch PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021) Usage import torch from mlp_mixer

27 Jul 9, 2022

A DeepStack custom model for detecting common objects in dark/night images and videos.

DeepStack_ExDark This repository provides a custom DeepStack model that has been trained and can be used for creating a new object detection API for d

98 Dec 24, 2022

Implementation of ResMLP, an all MLP solution to image classification, in Pytorch

ResMLP - Pytorch Implementation of ResMLP, an all MLP solution to image classification out of Facebook AI, in Pytorch Install $ pip install res-mlp-py

178 Dec 2, 2022

Segmenter - Transformer for Semantic Segmentation

592 Dec 27, 2022

This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe libraries.

CVZone This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe librar

648 Dec 30, 2022

Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

ViLT Code for the paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" Install pip install -r requirements.txt pip

922 Jan 1, 2023

Vision Transformer for 3D medical image registration (Pytorch).

ViT-V-Net: Vision Transformer for Volumetric Medical Image Registration keywords: vision transformer, convolutional neural networks, image registratio

192 Dec 20, 2022

LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference This repository contains PyTorch evaluation code, training code and pretrained

504 Jan 2, 2023

QueryInst: Parallelly Supervised Mask Query for Instance Segmentation

QueryInst is a simple and effective query based instance segmentation method driven by parallel supervision on dynamic mask heads, which outperforms previous arts in terms of both accuracy and speed.

386 Jan 8, 2023

Exploring whether attention is necessary for vision transformers

Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet Paper/Report TL;DR We replace the attention layer in a v

461 Jan 7, 2023

Incorporating Transformer and LSTM to Kalman Filter with EM algorithm

Deep learning based state estimation: incorporating Transformer and LSTM to Kalman Filter with EM algorithm Overview Kalman Filter requires the true p

57 Dec 27, 2022

Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

ViLT Code for the paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" Install pip install -r requirements.txt pip

922 Jan 1, 2023

📚 A collection of Jupyter notebooks for learning and experimenting with OpenVINO 👓

A collection of ready-to-run Python* notebooks for learning and experimenting with OpenVINO developer tools. The notebooks are meant to provide an introduction to OpenVINO basics and teach developers how to leverage our APIs for optimized deep learning inference in their applications.

840 Jan 3, 2023

Reformer, the efficient Transformer, in Pytorch

Reformer, the Efficient Transformer, in Pytorch This is a Pytorch implementation of Reformer https://openreview.net/pdf?id=rkgNKkHtvB It includes LSH

1.8k Dec 30, 2022

Google AI 2018 BERT pytorch implementation

BERT-pytorch Pytorch implementation of Google AI's 2018 BERT, with simple annotation BERT 2018 BERT: Pre-training of Deep Bidirectional Transformers f

5.3k Jan 7, 2023

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

5.1k Dec 26, 2022

An easier way to build neural search on the cloud

Jina is geared towards building search systems for any kind of data, including text, images, audio, video and many more. With the modular design & multi-layer abstraction, you can leverage the efficient patterns to build the system by parts, or chaining them into a Flow for an end-to-end experience.

17k Jan 1, 2023

Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

360 Jan 6, 2023

Code for "Unsupervised Layered Image Decomposition into Object Prototypes" paper

DTI-Sprites Pytorch implementation of "Unsupervised Layered Image Decomposition into Object Prototypes" paper Check out our paper and webpage for deta

40 Dec 22, 2022

Transparent Transformer Segmentation

Transparent Transformer Segmentation Introduction This repository contains the data and code for IJCAI 2021 paper Segmenting transparent object in the

140 Jan 2, 2023

An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.

DeepNER An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models. This repository contains complex Deep

9 May 30, 2022

Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

358 Dec 24, 2022

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transformer, etc. It is therefore best useful for Machine Translation, Text Generation, Dialog， Language Modelling, and other related tasks using these models.

2.5k Jan 3, 2023

Minimal implementation of Denoised Smoothing: A Provable Defense for Pretrained Classifiers in TensorFlow.

Denoised-Smoothing-TF Minimal implementation of Denoised Smoothing: A Provable Defense for Pretrained Classifiers in TensorFlow. Denoised Smoothing is

19 Dec 11, 2022

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Vision Longformer This project provides the source code for the vision longformer paper. Multi-Scale Vision Longformer: A New Vision Transformer for H

209 Dec 30, 2022

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Self-Supervised Vision Transformers with DINO PyTorch implementation and pretrained models for DINO. For details, see Emerging Properties in Self-Supe

4.2k Jan 3, 2023

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

MiVOS (CVPR 2021) - Mask Propagation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [arXiv] [Paper PDF] [Project Page] [Papers with Code] This repo impleme

106 Jan 3, 2023

[CVPR 2021] MiVOS - Scribble to Mask module

MiVOS (CVPR 2021) - Scribble To Mask Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [arXiv] [Paper PDF] [Project Page] A simplistic network that turns scri

65 Dec 22, 2022

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

107 Dec 2, 2022

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Twins: Revisiting the Design of Spatial Attention in Vision Transformers Very recently, a variety of vision transformer architectures for dense predic

482 Dec 18, 2022

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

15k Jan 2, 2023

This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision | Project Page | Paper | This repository contains a pytorch implementation of "St

87 Dec 9, 2022

Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

34 Nov 15, 2022

So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer Introduction This repository contains the source code under PyTorch framework and models trai

44 Nov 24, 2022

Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021.

arXiv Dual Contrastive Learning Adversarial Generative Networks (DCLGAN) We provide our PyTorch implementation of DCLGAN, which is a simple yet powerf

119 Dec 4, 2022

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos. By adopting a unified pipeline-based API design, PyKale enforces standardization and minimalism, via reusing existing resources, reducing repetitions and redundancy, and recycling learning models across areas.

370 Dec 27, 2022

A list of multi-task learning papers and projects.

This page contains a list of papers on multi-task learning for computer vision. Please create a pull request if you wish to add anything. If you are interested, consider reading our recent survey paper.

297 Dec 17, 2022

Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

44 Dec 14, 2022

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

106 Jan 6, 2023

Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(2021) paper

ImageNet-21K Pretraining for the Masses Paper | Pretrained models Official PyTorch Implementation Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelni

574 Jan 2, 2023

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

383 Dec 27, 2022

FID calculation with proper image resizing and quantization steps

clean-fid: Fixing Inconsistencies in FID Project | Paper The FID calculation involves many steps that can produce inconsistencies in the final metric.

606 Jan 6, 2023

SiT: Self-supervised vIsion Transformer

This repository contains the official PyTorch self-supervised pretraining, finetuning, and evaluation codes for SiT (Self-supervised image Transformer).

275 Dec 28, 2022

Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.

Skeleton Merger Skeleton Merger, an Unsupervised Aligned Keypoint Detector. The paper is available at https://arxiv.org/abs/2103.10814. A map of the r

48 Nov 14, 2022

This is a repository for a semantic segmentation inference API using the OpenVINO toolkit

BMW-IntelOpenVINO-Segmentation-Inference-API This is a repository for a semantic segmentation inference API using the OpenVINO toolkit. It's supported

34 Nov 24, 2022

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 5, 2023

CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

83 Jan 4, 2023

Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

1000 Apr 19, 2021

Implementation of Convolutional enhanced image Transformer

CeiT : Convolutional enhanced image Transformer This is an unofficial PyTorch implementation of Incorporating Convolution Designs into Visual Transfor

82 Dec 13, 2022

OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

3k Jan 7, 2023

This repository allows you to anonymize sensitive information in images/videos. The solution is fully compatible with the DL-based training/inference solutions that we already published/will publish for Object Detection and Semantic Segmentation.

BMW-Anonymization-Api Data privacy and individuals’ anonymity are and always have been a major concern for data-driven companies. Therefore, we design

148 Dec 21, 2022

This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit

BMW Semantic Segmentation GPU/CPU Inference API This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit. The train

56 Nov 24, 2022

Implementation of Perceiver, General Perception with Iterative Attention in TensorFlow

Perceiver This Python package implements Perceiver: General Perception with Iterative Attention by Andrew Jaegle in TensorFlow. This model builds on t

84 Oct 15, 2022

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

PLOP: Learning without Forgetting for Continual Semantic Segmentation This repository contains all of our code. It is a modified version of Cermelli e

116 Dec 14, 2022

Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

7.6k Jan 4, 2023

A medical imaging framework for Pytorch

Welcome to MedicalTorch MedicalTorch is an open-source framework for PyTorch, implementing an extensive set of loaders, pre-processors and datasets fo

799 Jan 3, 2023

A PyTorch-Based Framework for Deep Learning in Computer Vision

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{you2019torchcv, author = {Ansheng You and Xiangtai Li and Zhen Zhu a

2.2k Jan 9, 2023

Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki

349 Dec 8, 2022

I-BERT: Integer-only BERT Quantization

I-BERT: Integer-only BERT Quantization HuggingFace Implementation I-BERT is also available in the master branch of HuggingFace! Visit the following li

139 Dec 27, 2022

Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)

MASTER-PyTorch PyTorch reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This projec

255 Dec 29, 2022

Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Cross Transformers - Pytorch (wip) Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch Install $ pip install cross-t

40 Dec 22, 2022

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

1.4k Dec 30, 2022

A fast and easy implementation of Transformer with PyTorch.

FasySeq FasySeq is a shorthand as a Fast and easy sequential modeling toolkit. It aims to provide a seq2seq model to researchers and developers, which

7 Jul 18, 2022

PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition [CVPR 2021].

Involution: Inverting the Inherence of Convolution for Visual Recognition Unofficial PyTorch reimplementation of the paper Involution: Inverting the I

100 Dec 1, 2022

Read images to numpy arrays

mahotas-imread: Read Image Files IO with images and numpy arrays. Mahotas-imread is a simple module with a small number of functions: imread Reads an

67 Jan 7, 2023

Computer Vision in Python

Mahotas Python Computer Vision Library Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy

792 Dec 20, 2022

《LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classiﬁcation》(AAAI 2021) GitHub:

LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classiﬁcation

76 Dec 5, 2022

Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)

4.7k Jan 1, 2023

Implementation of ViViT: A Video Vision Transformer

ViViT: A Video Vision Transformer Unofficial implementation of ViViT: A Video Vision Transformer. Notes: This is in WIP. Model 2 is implemented, Model

297 Jan 6, 2023

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021) This repo presents PyTorch implementation of M

79 Dec 19, 2022

Layout Parser is a deep learning based tool for document image layout analysis tasks.

A Python Library for Document Layout Understanding

3.4k Dec 30, 2022

《DeepViT: Towards Deeper Vision Transformer》(2021)

DeepViT This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". The repo is based on the timm library (https://githu

109 Dec 2, 2022

Code for the Convolutional Vision Transformer (ConViT)

ConViT : Vision Transformers with Convolutional Inductive Biases This repository contains PyTorch code for ConViT. It builds on code from the Data-Eff

418 Jan 6, 2023

Implementation of various Vision Transformers I found interesting

78 Dec 6, 2022

Transformer related optimization, including BERT, GPT

This repository provides a script and recipe to run the highly optimized transformer-based encoder and decoder component, and it is tested and maintained by NVIDIA.

1.7k Jan 4, 2023

Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video Project Page | Paper NeuralRecon: Real-Time Coherent 3D Reconstruction from Mon

1.4k Dec 30, 2022

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

1.4k Jan 4, 2023

Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Transformer Based Multi-Source Domain Adaptation Dustin Wright and Isabelle Augenstein To appear in EMNLP 2020. Read the preprint: https://arxiv.org/a

36 Dec 5, 2022

Code for the paper "A Study of Face Obfuscation in ImageNet"

A Study of Face Obfuscation in ImageNet Code for the paper: A Study of Face Obfuscation in ImageNet Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng,

35 Oct 4, 2022

source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval This repository contains source code and pre-trained/fine-tun

65 Dec 26, 2022

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

489 Jan 7, 2023

Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

484 Dec 29, 2022

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

307 Jan 3, 2023

PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers

CvT: Introducing Convolutions to Vision Transformers Pytorch implementation of CvT: Introducing Convolutions to Vision Transformers Usage: img = torch

193 Jan 3, 2023

《Rethinking Sptil Dimensions of Vision Trnsformers》(2021)

Rethinking Spatial Dimensions of Vision Transformers Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh | Paper NAVER

224 Dec 27, 2022

Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 3, 2023

TransReID: Transformer-based Object Re-Identification

TransReID: Transformer-based Object Re-Identification [arxiv] The official repository for TransReID: Transformer-based Object Re-Identification achiev

569 Dec 30, 2022

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

307 Jan 3, 2023

Python Vision-transformer Resources

Python vision-transformer Libraries

A PaddlePaddle version image model zoo.

An All-MLP solution for Vision, from Google AI

Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

The code for two papers: Feedback Transformer and Expire-Span.

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

A DeepStack custom model for detecting common objects in dark/night images and videos.

Implementation of ResMLP, an all MLP solution to image classification, in Pytorch

Segmenter - Transformer for Semantic Segmentation

This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe libraries.

Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Vision Transformer for 3D medical image registration (Pytorch).

LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

QueryInst: Parallelly Supervised Mask Query for Instance Segmentation

Exploring whether attention is necessary for vision transformers

Incorporating Transformer and LSTM to Kalman Filter with EM algorithm

Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

📚 A collection of Jupyter notebooks for learning and experimenting with OpenVINO 👓

Reformer, the efficient Transformer, in Pytorch

Google AI 2018 BERT pytorch implementation

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

An easier way to build neural search on the cloud

Code release for "COTR: Correspondence Transformer for Matching Across Images"

Code for "Unsupervised Layered Image Decomposition into Object Prototypes" paper

Transparent Transformer Segmentation

An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.

Code release for "COTR: Correspondence Transformer for Matching Across Images"

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

Minimal implementation of Denoised Smoothing: A Provable Defense for Pretrained Classifiers in TensorFlow.

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

[CVPR 2021] MiVOS - Scribble to Mask module

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

So-ViT: Mind Visual Tokens for Vision Transformer

Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021.

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos

A list of multi-task learning papers and projects.

Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(2021) paper

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

FID calculation with proper image resizing and quantization steps

SiT: Self-supervised vIsion Transformer

Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.

This is a repository for a semantic segmentation inference API using the OpenVINO toolkit

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet

Implementation of Convolutional enhanced image Transformer

OpenMMLab Text Detection, Recognition and Understanding Toolbox

This repository allows you to anonymize sensitive information in images/videos. The solution is fully compatible with the DL-based training/inference solutions that we already published/will publish for Object Detection and Semantic Segmentation.

This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit

Implementation of Perceiver, General Perception with Iterative Attention in TensorFlow

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

Open Source Differentiable Computer Vision Library for PyTorch

A medical imaging framework for Pytorch

A PyTorch-Based Framework for Deep Learning in Computer Vision

Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

I-BERT: Integer-only BERT Quantization

Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)

Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

A fast and easy implementation of Transformer with PyTorch.

PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition [CVPR 2021].

Read images to numpy arrays

Computer Vision in Python

《LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classiﬁcation》(AAAI 2021) GitHub:

Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)

Implementation of ViViT: A Video Vision Transformer

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

Layout Parser is a deep learning based tool for document image layout analysis tasks.

《DeepViT: Towards Deeper Vision Transformer》(2021)