1612 Python Vision-Transformer-papers Libraries

A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.

3d-pose-baseline This is the code for the paper Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. A simple yet effective baseline for 3

1.3k Jan 3, 2023

Computer vision - fun segmentation experience using classic and deep tools :)

Computer_Vision_Segmentation_Fun Segmentation of Images and Video. Tools: pytorch Models: Classic model - GrabCut Deep model - Deeplabv3_resnet101 Flo

1 Dec 18, 2021

Implementing Vision Transformer (ViT) in PyTorch

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥 Click on Use this template to initialize new re

2 Dec 24, 2021

Automatically remove the mosaics in images and videos, or add mosaics to them.

1.4k Dec 30, 2022

PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

PaperRobot PaperRobot 是一个论文抓取工具，可以快速批量下载大量论文，方便后期进行持续的论文管理与学习。 PaperRobot通过多个接口抓取论文，目前抓取成功率维持在90%以上。通过配置Config文件，可以抓取任意计算机领域相关会议的论文。 Installation Down

47 Nov 23, 2022

Mixed Transformer UNet for Medical Image Segmentation

MT-UNet Update 2021/11/19 Thank you for your interest in our work. We have uploaded the code of our MTUNet to help peers conduct further research on i

92 Dec 25, 2022

Ensembling Off-the-shelf Models for GAN Training

Vision-aided GAN video (3m) | website | paper Can the collective knowledge from a large bank of pretrained vision models be leveraged to improve GAN t

345 Dec 28, 2022

Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources.

Illumination_Decomposition Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources. This code implements the

7 Nov 15, 2020

Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

2 Jul 15, 2022

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

[ICCV2021] TransReID: Transformer-based Object Re-Identification [pdf] The official repository for TransReID: Transformer-based Object Re-Identificati

569 Dec 30, 2022

This repository builds a basic vision transformer from scratch so that one beginner can understand the theory of vision transformer.

vision-transformer-from-scratch This repository includes several kinds of vision transformers from scratch so that one beginner can understand the the

1 Dec 24, 2021

Collection of common code that's shared among different research projects in FAIR computer vision team.

fvcore fvcore is a light-weight core library that provides the most common and essential functionality shared in various computer vision frameworks de

1.5k Jan 7, 2023

A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

University1652-Baseline [Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍] This

335 Jan 6, 2023

Official PyTorch implementation of SegFormer

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

1.4k Dec 29, 2022

Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE)

[中文|English] Rotary Transformer Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE). The RoPE is a relative

211 Dec 8, 2021

Model parallel transformers in JAX and Haiku

Table of contents Mesh Transformer JAX Updates Pretrained Models GPT-J-6B Links Acknowledgments License Model Details Zero-Shot Evaluations Architectu

4.9k Jan 4, 2023

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

217 Dec 5, 2022

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

342 Jan 5, 2023

Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

21 Dec 14, 2022

FastFormers - highly efficient transformer models for NLU

FastFormers FastFormers provides a set of recipes and methods to achieve highly efficient inference of Transformer models for Natural Language Underst

678 Jan 5, 2023

Transformer training code for sequential tasks

Sequential Transformer This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

578 Dec 13, 2022

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

77.1k Dec 31, 2022

Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

17 Oct 18, 2022

Large-scale pretraining for dialogue

A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT) This repository contains the source code and trained model for a large-

1.8k Jan 7, 2023

The code for two papers: Feedback Transformer and Expire-Span.

transformer-sequential This repo contains the code for two papers: Feedback Transformer Expire-Span The training code is structured for long sequentia

125 Dec 25, 2022

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Introduction Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduc

197 Dec 11, 2022

Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

Sinkhorn Transformer This is a reproduction of the work outlined in Sparse Sinkhorn Attention, with additional enhancements. It includes a parameteriz

217 Nov 25, 2022

An implementation of the Pay Attention when Required transformer

Pay Attention when Required (PAR) Transformer-XL An implementation of the Pay Attention when Required transformer from the paper: https://arxiv.org/pd

7 Aug 11, 2022

Fully featured implementation of Routing Transformer

Routing Transformer A fully featured implementation of Routing Transformer. The paper proposes using k-means to route similar queries / keys into the

246 Jan 2, 2023

Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

Linear Multihead Attention (Linformer) PyTorch Implementation of reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer:

58 Dec 23, 2022

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

1.2k Jan 3, 2023

Longformer: The Long-Document Transformer

Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme

1.6k Dec 29, 2022

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

464 Jan 4, 2023

This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer Models by Reordering their Sublayers.

Improving Transformer Models by Reordering their Sublayers This repository contains the code for running the character-level Sandwich Transformers fro

53 Sep 26, 2022

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

BP-Transformer This repo contains the code for our paper BP-Transformer: Modeling Long-Range Context via Binary Partition Zihao Ye, Qipeng Guo, Quan G

119 Nov 14, 2022

Conditional Transformer Language Model for Controllable Generation

CTRL - A Conditional Transformer Language Model for Controllable Generation Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong,

1.7k Dec 28, 2022

Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

711 Jan 8, 2023

Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

577 Jan 7, 2023

Pytorch implementation of the paper "Class-Balanced Loss Based on Effective Number of Samples"

Class-balanced-loss-pytorch Pytorch implementation of the paper Class-Balanced Loss Based on Effective Number of Samples presented at CVPR'19. Yin Cui

697 Dec 29, 2022

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning Tensorflow code and models for the paper: Large Scale Fine-Grained Categ

187 Oct 1, 2022

Learn the Deep Learning for Computer Vision in three steps: theory from base to SotA, code in PyTorch, and space-repetition with Anki

DeepCourse: Deep Learning for Computer Vision arthurdouillard.com/deepcourse/ This is a course I'm giving to the French engineering school EPITA each

113 Nov 29, 2022

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena.

17 Nov 7, 2022

[ACM MM 2021] Yes, "Attention is All You Need", for Exemplar based Colorization

Transformer for Image Colorization This is an implemention for Yes, "Attention Is All You Need", for Exemplar based Colorization, and the current soft

30 Dec 7, 2022

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2 Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan

1 Dec 16, 2021

Paper and Codes for “Embracing Single Stride 3D Object Detector with Sparse Transformer”

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

59 Dec 14, 2021

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 8, 2022

This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor

15 Dec 6, 2022

The code of "Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer".

Code data_preprocess.py: preprocess data for Dependent-T5. parameters.py: define parameters of Dependent-T5. train_tools.py: traning and evaluation co

1 Apr 21, 2022

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

47 Dec 26, 2022

Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)

Camera Condition Monitoring and Readjustment by means of Noise and Blur This repository contains the source code of the paper: Wischow, M., Gallego, G

7 Dec 22, 2022

Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 8, 2022

(Preprint) Official PyTorch implementation of "How Do Vision Transformers Work?"

656 Dec 30, 2022

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

1 Dec 13, 2021

SRA's seminar on Introduction to Computer Vision Fundamentals

Introduction to Computer Vision This repository includes basics to : Python Numpy: A python library Git Computer Vision. The aim of this repository is

147 Dec 4, 2022

A Vision Transformer approach that uses concatenated query and reference images to learn the relationship between query and reference images directly.

24 Dec 13, 2022

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX. The repository combines a class agnostic object localizer to first detect the objects in the image, and next a ResNet50 model trained on ImageNet is used to label each box.

24 Nov 14, 2022

Computer Vision Script to recognize first person motion, developed as final project for the course "Machine Learning and Deep Learning"

Overview of The Code BaseColab/MLDL_FPAR.pdf: it contains the full explanation of our work Base Colab: it contains the base colab used to perform all

4 Jul 16, 2022

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation [arxiv] This is the official repository for CDTrans: Cross-domain Transformer for

238 Dec 22, 2022

PyTorch implementation of a collections of scalable Video Transformer Benchmarks.

PyTorch implementation of Video Transformer Benchmarks This repository is mainly built upon Pytorch and Pytorch-Lightning. We wish to maintain a colle

156 Jan 8, 2023

Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!

26 Dec 13, 2022

Reference management solution using Python and Notion.

notion-scholar Reference management solution using Python and Notion. The main idea of this app is to allow to furnish a Notion database using a BibTe

69 Dec 21, 2022

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

47 Dec 26, 2022

Official code of IterMVS

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

30 Dec 12, 2021

[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Are Transformers More Robust Than CNNs? Pytorch implementation for NeurIPS 2021 Paper: Are Transformers More Robust Than CNNs? Our implementation is b

145 Dec 1, 2022

Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

385 Dec 28, 2022

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

mae-repo PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https:/

1 Dec 14, 2021

Official Implementation for the paper DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification Official Implementation for the pape

36 Dec 28, 2022

This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision"

RUAS This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision" A prelimin

2 May 5, 2022

Implementation of the paper Recurrent Glimpse-based Decoder for Detection with Transformer.

REGO-Deformable DETR By Zhe Chen, Jing Zhang, and Dacheng Tao. This repository is the implementation of the paper Recurrent Glimpse-based Decoder for

33 Nov 30, 2022

Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

127 Jan 4, 2023

A new video text spotting framework with Transformer

TransVTSpotter: End-to-end Video Text Spotter with Transformer Introduction A Multilingual, Open World Video Text Dataset and End-to-end Video Text Sp

67 Jan 3, 2023

Supplemental learning materials for "Fourier Feature Networks and Neural Volume Rendering"

Fourier Feature Networks and Neural Volume Rendering This repository is a companion to a lecture given at the University of Cambridge Engineering Depa

133 Dec 26, 2022

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

Introduction K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce. Installation PyTor

21 Nov 16, 2022

Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Class-Balanced Loss Based on Effective Number of Samples Tensorflow code for the paper: Class-Balanced Loss Based on Effective Number of Samples Yin C

546 Jan 8, 2023

Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

Conceptual 12M We introduce the Conceptual 12M (CC12M), a dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-train

226 Dec 7, 2022

Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

PyLabel pip install pylabel PyLabel is a Python package to help you prepare image datasets for computer vision models including PyTorch and YOLOv5. I

176 Jan 1, 2023

Neural search engine for AI papers

Papers search Neural search engine for ML papers. Demo Usage is simple: input an abstract, get the matching papers. The following demo also showcases

44 Dec 24, 2022

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

42 Sep 21, 2022

Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Scene Text Recognition Recommendations Everythin about Scene Text Recognition SOTA • Papers • Datasets • Code Contents 1. Papers 2. Datasets 2.1 Synth

Deep Learning and Vision Computing Lab, SCUT

197 Jan 5, 2023

SimMIM: A Simple Framework for Masked Image Modeling

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

181 Dec 10, 2021

A concise but complete implementation of CLIP with various experimental improvements from recent papers

x-clip (wip) A concise but complete implementation of CLIP with various experimental improvements from recent papers Install $ pip install x-clip Usag

115 Dec 9, 2021

A simple rest api that classifies pneumonia infection weather it is Normal, Pneumonia Virus or Pneumonia Bacteria from a chest-x-ray image.

This is a simple rest api that classifies pneumonia infection weather it is Normal, Pneumonia Virus or Pneumonia Bacteria from a chest-x-ray image.

3 Jan 8, 2022

An end to end ASR Transformer model training repo

END TO END ASR TRANSFORMER 本项目基于transformer 6*encoder+6*decoder的基本结构构造的端到端的语音识别系统 Model Instructions 1.数据准备: 自行下载数据，遵循文件结构如下： ├── data │ ├── train │

10 Jul 19, 2022

A concise but complete implementation of CLIP with various experimental improvements from recent papers

x-clip (wip) A concise but complete implementation of CLIP with various experimental improvements from recent papers Install $ pip install x-clip Usag

515 Dec 26, 2022

Localized representation learning from Vision and Text (LoVT)

Localized Vision-Text Pre-Training Contrastive learning has proven effective for pre- training image models on unlabeled data and achieved great resul

10 Dec 7, 2022

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

CALVIN CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks Oier Mees, Lukas Hermann, Erick Rosete,

107 Dec 26, 2022

Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal"

Patch-wise Adversarial Removal Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

4 Oct 12, 2022

RaceBERT -- A transformer based model to predict race and ethnicty from names

RaceBERT -- A transformer based model to predict race and ethnicty from names Installation pip install racebert Using a virtual environment is highly

3 Nov 2, 2022

Repository aimed at compiling code, papers, demos etc.. related to my PhD on 3D vision and machine learning for fruit detection and shape estimation at the university of Lincoln

PhD_3DPerception Repository aimed at compiling code, papers, demos etc.. related to my PhD on 3D vision and machine learning for fruit detection and s

2 Oct 6, 2022

S-attack library. Official implementation of two papers "Are socially-aware trajectory prediction models really socially-aware?" and "Vehicle trajectory prediction works, but not everywhere".

S-attack library: A library for evaluating trajectory prediction models This library contains two research projects to assess the trajectory predictio

71 Jan 4, 2023

MLPs for Vision and Langauge Modeling (Coming Soon)

MLP Architectures for Vision-and-Language Modeling: An Empirical Study MLP Architectures for Vision-and-Language Modeling: An Empirical Study (Code wi

27 May 9, 2022

Post-Training Quantization for Vision transformers.

PTQ4ViT Post-Training Quantization Framework for Vision Transformers. We use the twin uniform quantization method to reduce the quantization error on

61 Dec 28, 2022

MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

25 Dec 26, 2022

An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches

Transformer-in-Transformer An Implementation of the Transformer in Transformer paper by Han et al. for image classification, attention inside local pa

40 Jul 25, 2022

Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

198 Dec 20, 2022

Papers about explainability of GNNs

236 Jan 4, 2023

The implementation code for "DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction"

DAGAN This is the official implementation code for DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruct

159 Nov 22, 2022

Python Vision-Transformer-papers Resources

Python Vision-Transformer-papers Libraries

A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.

Computer vision - fun segmentation experience using classic and deep tools :)

Implementing Vision Transformer (ViT) in PyTorch

Automatically remove the mosaics in images and videos, or add mosaics to them.

PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

Mixed Transformer UNet for Medical Image Segmentation

Ensembling Off-the-shelf Models for GAN Training

Code for TIP 2017 paper --- Illumination Decomposition for Photograph with Multiple Light Sources.

Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

This repository builds a basic vision transformer from scratch so that one beginner can understand the theory of vision transformer.

Collection of common code that's shared among different research projects in FAIR computer vision team.

A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Official PyTorch implementation of SegFormer

Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE)

Model parallel transformers in JAX and Haiku

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Edge-Augmented Graph Transformer

FastFormers - highly efficient transformer models for NLU

Transformer training code for sequential tasks

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

Large-scale pretraining for dialogue

The code for two papers: Feedback Transformer and Expire-Span.

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

An implementation of the Pay Attention when Required transformer

Fully featured implementation of Routing Transformer

Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Longformer: The Long-Document Transformer

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer Models by Reordering their Sublayers.

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

Conditional Transformer Language Model for Controllable Generation

Multi Task Vision and Language

Awesome Treasure of Transformers Models Collection

Pytorch implementation of the paper "Class-Balanced Loss Based on Effective Number of Samples"

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018

Learn the Deep Learning for Computer Vision in three steps: theory from base to SotA, code in PyTorch, and space-repetition with Anki

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

[ACM MM 2021] Yes, "Attention is All You Need", for Exemplar based Colorization

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2

Paper and Codes for “Embracing Single Stride 3D Object Detector with Sparse Transformer”

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

The code of "Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer".

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)

Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

(Preprint) Official PyTorch implementation of "How Do Vision Transformers Work?"

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

SRA's seminar on Introduction to Computer Vision Fundamentals

A Vision Transformer approach that uses concatenated query and reference images to learn the relationship between query and reference images directly.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Computer Vision Script to recognize first person motion, developed as final project for the course "Machine Learning and Deep Learning"

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

PyTorch implementation of a collections of scalable Video Transformer Benchmarks.

Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

Reference management solution using Python and Notion.

BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

Official code of IterMVS

[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Embracing Single Stride 3D Object Detector with Sparse Transformer

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Official Implementation for the paper DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification

This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision"

Implementation of the paper Recurrent Glimpse-based Decoder for Detection with Transformer.

Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

A new video text spotting framework with Transformer

Supplemental learning materials for "Fourier Feature Networks and Neural Volume Rendering"

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

Neural search engine for AI papers

State-of-the-art NLP through transformer models in a modular design and consistent APIs.