Transformer Reading List
We are a team from the KAUST Vision-CAIR group, focusing on multi-modal representation learning.
This repo collects recent popular Transformer papers, code, and learning resources spanning Vision Transformers, NLP, multi-modal learning, and related domains.
Recent News
Multi-modal papers from CVPR are collected here
The code of VisualGPT is open-sourced. It can be found here
The code and paper of LeViT are open-sourced. They can be found here
The paper MLP-Mixer: An all-MLP Architecture for Vision is available here
The code and paper of MDETR are open-sourced. They can be found here
The code and paper of RelTransformer are open-sourced. They can be found here
The code and paper of Twins-SVT are open-sourced. They can be found here
A Vision Transformer for deepfake detection is available. It can be found here
The code of VideoGPT is open-sourced. It can be found here
The code of CoaT is open-sourced. It can be found here
The code of Kaleido-BERT is open-sourced. It can be found here
The code of TimeSformer is open-sourced. It can be found here
The code of Swin Transformer is open-sourced. It can be found here
Topics (papers and code)
Review papers on multi-modal learning
Tutorials and workshops
- Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial
- From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial
Datasets
Blogs
Tools
- PyTorchVideo: a deep learning library for video understanding research (see the sketch below)
- horovod: a tool for multi-GPU parallel processing (see the sketch below)
- accelerate: an easy API for mixed precision and any kind of distributed computing (see the sketch below)
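As a quick taste of PyTorchVideo, the sketch below loads a pretrained video classifier through the library's torch.hub integration. This is a minimal illustration, not the library's only entry point: the model name slow_r50 comes from the PyTorchVideo model zoo, and the input clip is a random placeholder tensor.

```python
# Minimal PyTorchVideo sketch: load a pretrained model from the model zoo
# via torch.hub. "slow_r50" is one of the zoo's Kinetics-400 classifiers;
# the clip below is random data used purely for illustration.
import torch

model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.eval()

# A dummy clip shaped (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(clip)  # Kinetics-400 class scores

print(logits.shape)  # torch.Size([1, 400])
```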
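For horovod, the sketch below shows the standard data-parallel pattern with PyTorch. The tiny linear model and learning rate are placeholders; hvd.init, DistributedOptimizer, and the broadcast calls are horovod's usual entry points.

```python
# Minimal horovod sketch (data-parallel PyTorch training). The model and
# learning rate are placeholders; launch with e.g. `horovodrun -np 4 python train.py`.
import torch
import horovod.torch as hvd

hvd.init()  # one process per GPU/worker
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(10, 2).to(device)
# A common heuristic: scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via allreduce.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Broadcast initial state from rank 0 so all workers start identically.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```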
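For accelerate, a training loop needs only a few changes to become device- and precision-agnostic, as in the minimal sketch below. The model, optimizer, and data are dummies; Accelerator, prepare, and backward are the library's core API.

```python
# Minimal accelerate sketch: the same loop runs on CPU, one GPU, or many,
# with mixed precision handled by the Accelerator. Model and data are dummies.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the distributed/precision config at launch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps them as needed.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward() to support mixed precision
    optimizer.step()
```

Runs are typically configured once with `accelerate config` and started with `accelerate launch`.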