PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformers

Facebook Research

Last update: Jan 2, 2023

Overview

Cross-Covariance Image Transformer (XCiT)

Linear complexity in time and memory

Our XCiT models have linear complexity with respect to the number of patches/tokens $N$: $\mathcal{O}(N d^2)$, where $d$ is the feature dimension.
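To make the scaling concrete, the sketch below compares the dominant cost term of standard token self-attention, which grows as O(N^2 d), with the O(N d^2) term quoted above. The function names and the width d = 384 are illustrative assumptions, not values taken from the repository.

```python
def num_tokens(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping patches (tokens) in an image."""
    return (height // patch_size) * (width // patch_size)


def token_attention_cost(n_tokens: int, dim: int) -> int:
    """Dominant term of token self-attention: an N x N map over d channels."""
    return n_tokens * n_tokens * dim


def cross_covariance_cost(n_tokens: int, dim: int) -> int:
    """Dominant term quoted for XCiT: a d x d map accumulated over N tokens."""
    return n_tokens * dim * dim


if __name__ == "__main__":
    dim = 384  # hypothetical feature dimension, chosen only for illustration
    for side in (224, 448, 896):
        n = num_tokens(side, side, patch_size=16)
        ratio = token_attention_cost(n, dim) / cross_covariance_cost(n, dim)
        print(f"{side}x{side}: N={n}, token/XCA cost ratio = {ratio:.2f}")
```

Under these assumptions the ratio equals N/d, so doubling the image side quadruples the relative cost of token self-attention while the cross-covariance term grows only in proportion to N.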


[Figures: peak memory (inference); milliseconds per image (inference)]

Scaling to high resolution inputs

XCiT can scale to high-resolution inputs thanks to both its lower compute requirements and its better adaptability to higher resolutions at test time (see Figure 3 in the paper).

Detection and Instance Segmentation for Ultra high resolution images (6000x4000)

XCiT+DINO: High Res. Self-Attention Visualization 🦖

Our XCiT models trained with DINO self-supervision can produce high-resolution attention maps.

xcit_dino.mp4

Self-Attention visualization per head

Below we show the attention maps for each of the 8 heads separately. We can observe that each head specializes in different semantic aspects of the scene, for the foreground as well as the background.

Multi_head.mp4

Getting Started

First, clone the repo

git clone https://github.com/facebookresearch/XCiT.git

Then install the required packages, including PyTorch 1.7.1, torchvision 0.8.2, and timm 0.4.8:

pip install -r requirements.txt

Download and extract the ImageNet dataset. Afterwards, set the --data-path argument to the corresponding extracted ImageNet path.
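Before launching a run, it can help to confirm that the extracted dataset is laid out the way the loader expects. The sketch below assumes the torchvision ImageFolder convention (a train folder containing one subfolder per class); this README does not spell the layout out, so treat that as an assumption. The helper names are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}  # compared case-insensitively


def count_images_per_class(split_dir):
    """Map each class subfolder of split_dir to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_split(data_path, split="train"):
    """Raise a descriptive error if the split has no class folders with images."""
    split_dir = Path(data_path) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    counts = count_images_per_class(split_dir)
    if sum(counts.values()) == 0:
        raise RuntimeError(
            f"no images found in class subfolders of {split_dir}; "
            "images placed directly in the split folder are not counted"
        )
    return counts
```

Calling check_split on the path passed to --data-path returns a per-class image count, or fails early with a message that points at the layout rather than at the training code.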

For full details about all the available arguments, you can use

python main.py --help

For detection and segmentation downstream tasks, please check:

COCO Object detection and Instance segmentation: XCiT Detection
ADE20k Semantic segmentation: XCiT Semantic Segmentation

Model Zoo

We provide pre-trained weights for XCiT models trained on ImageNet-1k.

§: distillation

Models with 16x16 patch size

Arch	params	224 top-1	224 weights	224 § top-1	224 § weights	384 § top-1	384 § weights
xcit_nano_12_p16	3M	69.9%	download	72.2%	download	75.4%	download
xcit_tiny_12_p16	7M	77.1%	download	78.6%	download	80.9%	download
xcit_tiny_24_p16	12M	79.4%	download	80.4%	download	82.6%	download
xcit_small_12_p16	26M	82.0%	download	83.3%	download	84.7%	download
xcit_small_24_p16	48M	82.6%	download	83.9%	download	85.1%	download
xcit_medium_24_p16	84M	82.7%	download	84.3%	download	85.4%	download
xcit_large_24_p16	189M	82.9%	download	84.9%	download	85.8%	download

Models with 8x8 patch size

Arch	params	224 top-1	224 weights	224 § top-1	224 § weights	384 § top-1	384 § weights
xcit_nano_12_p8	3M	73.8%	download	76.3%	download	77.8%	download
xcit_tiny_12_p8	7M	79.7%	download	81.2%	download	82.4%	download
xcit_tiny_24_p8	12M	81.9%	download	82.6%	download	83.7%	download
xcit_small_12_p8	26M	83.4%	download	84.2%	download	85.1%	download
xcit_small_24_p8	48M	83.9%	download	84.9%	download	85.6%	download
xcit_medium_24_p8	84M	83.7%	download	85.1%	download	85.8%	download
xcit_large_24_p8	189M	84.4%	download	85.4%	download	86.0%	download
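The model keys in the two tables appear to follow the pattern xcit_<size>_<depth>_p<patch size>. Reading the middle number as the layer count is an assumption based on the XCiT-S12/16 shorthand used in the Training section. The sketch below parses a key and reports how many patch tokens a square input yields, which shows why the 8x8 models process four times as many tokens as their 16x16 counterparts. The helper names are hypothetical.

```python
import re

MODEL_KEY = re.compile(r"^xcit_(?P<size>[a-z]+)_(?P<depth>\d+)_p(?P<patch>\d+)$")


def parse_model_key(key):
    """Split a key such as 'xcit_small_12_p16' into (size, depth, patch)."""
    match = MODEL_KEY.match(key)
    if match is None:
        raise ValueError(f"unrecognised model key: {key!r}")
    return match["size"], int(match["depth"]), int(match["patch"])


def tokens_for(key, input_size):
    """Number of patch tokens for a square input of the given side length."""
    _, _, patch = parse_model_key(key)
    return (input_size // patch) ** 2


if __name__ == "__main__":
    for key in ("xcit_small_12_p16", "xcit_small_12_p8"):
        print(key, parse_model_key(key), tokens_for(key, 224))
```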

XCiT + DINO Self-supervised models

Arch	params	k-nn	linear	download
xcit_small_12_p16	26M	76.0%	77.8%	backbone
xcit_small_12_p8	26M	77.1%	79.2%	backbone
xcit_medium_24_p16	84M	76.4%	78.8%	backbone
xcit_medium_24_p8	84M	77.9%	80.3%	backbone

Training

For training using a single node, use the following command

python -m torch.distributed.launch --nproc_per_node=[NUM_GPUS] --use_env main.py --model [MODEL_KEY] --batch-size [BATCH_SIZE] --drop-path [STOCHASTIC_DEPTH_RATIO] --output_dir [OUTPUT_PATH]

For example, the XCiT-S12/16 model can be trained using the following command

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model xcit_small_12_p16 --batch-size 128 --drop-path 0.05 --output_dir /experiments/xcit_small_12_p16/ --epochs [NUM_EPOCHS]

For multinode training via SLURM you can alternatively use

python run_with_submitit.py --partition [PARTITION_NAME] --nodes 2 --ngpus 8 --model xcit_small_12_p16 --batch-size 64 --drop-path 0.05 --job_dir /experiments/xcit_small_12_p16/ --epochs 400

More details for the hyper-parameters used to train the different models can be found in Table B.1 in the paper.
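When moving between the single-node and multi-node commands, it is worth checking that the total batch size stays the same. The sketch below assumes --batch-size is a per-GPU value; the function name is hypothetical.

```python
def effective_batch_size(nodes, gpus_per_node, batch_size_per_gpu):
    """Total number of samples processed per optimizer step across all GPUs."""
    return nodes * gpus_per_node * batch_size_per_gpu


if __name__ == "__main__":
    # Single-node example above: 1 node, 8 GPUs, --batch-size 128
    single_node = effective_batch_size(1, 8, 128)
    # SLURM example above: 2 nodes, 8 GPUs each, --batch-size 64
    multi_node = effective_batch_size(2, 8, 64)
    print(single_node, multi_node)
```

Under that assumption both example commands train with the same total batch size, so a run on fewer GPUs would need a larger per-GPU batch size (or adjusted hyper-parameters) to match.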

Evaluation

To evaluate an XCiT model, using either the checkpoints above or models you trained, use the following command:

python main.py --eval --model <MODEL_KEY> --input-size <IMG_SIZE> [--full_crop] --pretrained <PATH/URL>

By default we use the --full_crop flag, which evaluates the model with a crop ratio of 1.0 instead of 0.875, following CaiT.
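The effect of the crop ratio can be illustrated with a little arithmetic. The sketch below assumes the usual resize-then-center-crop evaluation pipeline, in which the image is first resized to input size divided by crop ratio; the repository may implement this differently, and the function name is hypothetical.

```python
def eval_resize_size(input_size, crop_ratio):
    """Side length to resize to before center-cropping to input_size."""
    if not 0.0 < crop_ratio <= 1.0:
        raise ValueError("crop_ratio must be in (0, 1]")
    return int(input_size / crop_ratio)


if __name__ == "__main__":
    for ratio in (0.875, 1.0):
        print(f"crop ratio {ratio}: resize to {eval_resize_size(224, ratio)}, crop to 224")
```

With a crop ratio of 0.875 a 224 input is cut from a 256-pixel resize, discarding the image border; with a ratio of 1.0 the resize and crop sizes coincide, so the model sees the full image.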

For example, the command to evaluate the XCiT-S12/16 using 224x224 images:

python main.py --eval --model xcit_small_12_p16 --input-size 224 --full_crop --pretrained https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_224.pth

Acknowledgement

This repository is built using the Timm library and the DeiT repository. The self-supervised training is based on the DINO repository.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Citation

If you find this repository useful, please consider citing our work:

@misc{elnouby2021xcit,
      title={XCiT: Cross-Covariance Image Transformers}, 
      author={Alaaeldin El-Nouby and Hugo Touvron and Mathilde Caron and Piotr Bojanowski and Matthijs Douze and Armand Joulin and Ivan Laptev and Natalia Neverova and Gabriel Synnaeve and Jakob Verbeek and Hervé Jegou},
      year={2021},
      journal={arXiv preprint arXiv:2106.09681},
}

Comments

Training on Single GPU

Thanks for the exciting work.

I am trying to fine-tune on my ImageNet-like classification dataset on 1 GPU using the following command.

python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --model xcit_nano_12_p16 --batch-size 16 --drop-path 0.05 --output_dir experiments/xcit_nano_12_p16/ --epochs 30 --pretrained /mnt/hdd1/Projects/XCiT/xcit_nano_12_p16_224.pth

But it fails with the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 1, 128]], which is output 0 of SliceBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

What could be done to resolve this? I am new to distributed training.

opened by trathpai 5

Can't read files from the datasets folder

Thanks for your work!

I ran into a problem when following the steps in README.md.

I set the data path with: python main.py --data-path 'F:\Projects\AI\ImageNet'. However, I get: RuntimeError: Found 0 files in subfolders of: F:\Projects\AI\ImageNet\train

I'm sure the path is right, and all the files in this train folder are .JPEG files, but it seems that none of them can be read.

opened by RichardYann 2

Block-diagonal XCA?

Hello, it seems that XCA doesn't use separate parameters for each head, and I can't find the implementation of the block-diagonal version. I'm curious why the implementation doesn't include it. Sorry if my understanding is incorrect.

opened by blakechi 2

Finetuning details

Hello, this is great work. I would like to take advantage of the models' lower VRAM requirement to deploy them on edge devices.

However, I would like to ask for resources on how to fine-tune the models on our own data. Does fine-tuning follow the standard approach of replacing the classification head (the final fully connected layer, maybe?) and then training with a lower learning rate (what would you advise as a general baseline)?

Thanks in advance for any pointers, again great work!

opened by asahinyener 2

ERROR in requirements.txt

Getting the following error while executing requirements.txt

ERROR: Could not find a version that satisfies the requirement timm==0.4.8 (from -r requirements.txt (line 3)) (from versions: 0.1.1, 0.1.2, 0.1.4, 0.1.6, 0.1.8, 0.1.10, 0.1.12, 0.1.14, 0.1.16, 0.1.18, 0.1.20, 0.1.22, 0.1.24, 0.1.26, 0.1.28, 0.1.30, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.4.5, 0.4.9)
ERROR: No matching distribution found for timm==0.4.8 (from -r requirements.txt (line 3))

I think we have to change the version from 0.4.8 to 0.4.9

opened by sahilkhose 2

Question about training epochs and training logs
Hello, thank you for another simple but effective work!

In your paper, the number of training epochs is given as:

We train our model for 400 epochs with the AdamW optimizer [45] using a cosine learning rate decay.

but the default in your code is 300 epochs, and it is not changed on the command line.

So I'm confused about it.

By the way, could you publish your training logs?

opened by pengzhiliang 2

Loss does not decrease

I use 4x1080ti

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model xcit_nano_12_p8 --batch-size 64 --drop-path 0.05 --output_dir ./experiments/xcit_nano_12_p8/ --epochs 100

I got this

Epoch: [1]  [  0/390]  eta: 0:27:35  lr: 0.000001  loss: 6.8866 (6.8866)  time: 4.2456  data: 2.6132  max mem: 6128
Epoch: [1]  [ 10/390]  eta: 0:07:40  lr: 0.000001  loss: 6.9260 (6.9281)  time: 1.2121  data: 0.2377  max mem: 6128
Epoch: [1]  [ 20/390]  eta: 0:06:37  lr: 0.000001  loss: 6.9244 (6.9252)  time: 0.9165  data: 0.0001  max mem: 6128
Epoch: [1]  [ 30/390]  eta: 0:06:09  lr: 0.000001  loss: 6.9226 (6.9256)  time: 0.9227  data: 0.0001  max mem: 6128
Epoch: [1]  [ 40/390]  eta: 0:05:49  lr: 0.000001  loss: 6.9321 (6.9277)  time: 0.9192  data: 0.0001  max mem: 6128
Epoch: [1]  [ 50/390]  eta: 0:05:35  lr: 0.000001  loss: 6.9378 (6.9297)  time: 0.9253  data: 0.0001  max mem: 6128
Epoch: [1]  [ 60/390]  eta: 0:05:21  lr: 0.000001  loss: 6.9406 (6.9320)  time: 0.9267  data: 0.0001  max mem: 6128
Epoch: [1]  [ 70/390]  eta: 0:05:10  lr: 0.000001  loss: 6.9406 (6.9321)  time: 0.9265  data: 0.0001  max mem: 6128
Epoch: [1]  [ 80/390]  eta: 0:04:58  lr: 0.000001  loss: 6.9337 (6.9325)  time: 0.9315  data: 0.0001  max mem: 6128
Epoch: [1]  [ 90/390]  eta: 0:04:48  lr: 0.000001  loss: 6.9340 (6.9328)  time: 0.9339  data: 0.0001  max mem: 6128
Epoch: [1]  [100/390]  eta: 0:04:38  lr: 0.000001  loss: 6.9246 (6.9323)  time: 0.9438  data: 0.0001  max mem: 6128
Epoch: [1]  [110/390]  eta: 0:04:27  lr: 0.000001  loss: 6.9255 (6.9317)  time: 0.9371  data: 0.0001  max mem: 6128
Epoch: [1]  [120/390]  eta: 0:04:17  lr: 0.000001  loss: 6.9293 (6.9317)  time: 0.9224  data: 0.0001  max mem: 6128
Epoch: [1]  [130/390]  eta: 0:04:07  lr: 0.000001  loss: 6.9322 (6.9319)  time: 0.9176  data: 0.0001  max mem: 6128
Epoch: [1]  [140/390]  eta: 0:03:57  lr: 0.000001  loss: 6.9306 (6.9320)  time: 0.9286  data: 0.0001  max mem: 6128
Epoch: [1]  [150/390]  eta: 0:03:47  lr: 0.000001  loss: 6.9294 (6.9319)  time: 0.9332  data: 0.0001  max mem: 6128
Epoch: [1]  [160/390]  eta: 0:03:38  lr: 0.000001  loss: 6.9265 (6.9313)  time: 0.9317  data: 0.0001  max mem: 6128
Epoch: [1]  [170/390]  eta: 0:03:28  lr: 0.000001  loss: 6.9265 (6.9318)  time: 0.9253  data: 0.0001  max mem: 6128
Epoch: [1]  [180/390]  eta: 0:03:18  lr: 0.000001  loss: 6.9412 (6.9319)  time: 0.9212  data: 0.0001  max mem: 6128
Epoch: [1]  [190/390]  eta: 0:03:08  lr: 0.000001  loss: 6.9273 (6.9319)  time: 0.9292  data: 0.0001  max mem: 6128
Epoch: [1]  [200/390]  eta: 0:02:59  lr: 0.000001  loss: 6.9255 (6.9318)  time: 0.9226  data: 0.0001  max mem: 6128
Epoch: [1]  [210/390]  eta: 0:02:49  lr: 0.000001  loss: 6.9305 (6.9317)  time: 0.9275  data: 0.0001  max mem: 6128
Epoch: [1]  [220/390]  eta: 0:02:40  lr: 0.000001  loss: 6.9295 (6.9314)  time: 0.9360  data: 0.0001  max mem: 6128
Epoch: [1]  [230/390]  eta: 0:02:30  lr: 0.000001  loss: 6.9290 (6.9312)  time: 0.9446  data: 0.0001  max mem: 6128
Epoch: [1]  [240/390]  eta: 0:02:21  lr: 0.000001  loss: 6.9229 (6.9305)  time: 0.9421  data: 0.0001  max mem: 6128
Epoch: [1]  [250/390]  eta: 0:02:11  lr: 0.000001  loss: 6.9263 (6.9310)  time: 0.9283  data: 0.0001  max mem: 6128
Epoch: [1]  [260/390]  eta: 0:02:02  lr: 0.000001  loss: 6.9225 (6.9305)  time: 0.9232  data: 0.0001  max mem: 6128
Epoch: [1]  [270/390]  eta: 0:01:52  lr: 0.000001  loss: 6.9225 (6.9307)  time: 0.9220  data: 0.0001  max mem: 6128
Epoch: [1]  [280/390]  eta: 0:01:43  lr: 0.000001  loss: 6.9359 (6.9309)  time: 0.9205  data: 0.0001  max mem: 6128
Epoch: [1]  [290/390]  eta: 0:01:33  lr: 0.000001  loss: 6.9323 (6.9307)  time: 0.9232  data: 0.0001  max mem: 6128
Epoch: [1]  [300/390]  eta: 0:01:24  lr: 0.000001  loss: 6.9245 (6.9304)  time: 0.9327  data: 0.0001  max mem: 6128
Epoch: [1]  [310/390]  eta: 0:01:15  lr: 0.000001  loss: 6.9237 (6.9304)  time: 0.9280  data: 0.0001  max mem: 6128
Epoch: [1]  [320/390]  eta: 0:01:05  lr: 0.000001  loss: 6.9333 (6.9307)  time: 0.9234  data: 0.0001  max mem: 6128
Epoch: [1]  [330/390]  eta: 0:00:56  lr: 0.000001  loss: 6.9372 (6.9308)  time: 0.9362  data: 0.0001  max mem: 6128
Epoch: [1]  [340/390]  eta: 0:00:46  lr: 0.000001  loss: 6.9314 (6.9306)  time: 0.9338  data: 0.0001  max mem: 6128
Epoch: [1]  [350/390]  eta: 0:00:37  lr: 0.000001  loss: 6.9309 (6.9308)  time: 0.9319  data: 0.0001  max mem: 6128
Epoch: [1]  [360/390]  eta: 0:00:28  lr: 0.000001  loss: 6.9258 (6.9307)  time: 0.9357  data: 0.0001  max mem: 6128
Epoch: [1]  [370/390]  eta: 0:00:18  lr: 0.000001  loss: 6.9239 (6.9303)  time: 0.9330  data: 0.0002  max mem: 6128
Epoch: [1]  [380/390]  eta: 0:00:09  lr: 0.000001  loss: 6.9205 (6.9301)  time: 0.9424  data: 0.0001  max mem: 6128
Epoch: [1]  [389/390]  eta: 0:00:00  lr: 0.000001  loss: 6.9206 (6.9300)  time: 0.9389  data: 0.0001  max mem: 6128

opened by GuoQuanhao 1

Fine-tuning configurations

Thank you for the great work. I'm trying to reproduce the transfer learning results, but I'm not sure about the fine-tuning configuration. I read the issue below, but is that all (just a smaller lr=5e-5 for classification)? https://github.com/facebookresearch/xcit/issues/9#issuecomment-868494866

Could you let me know whether any additional changes are needed?

opened by amoeba04 1

segmentation readme pip instruction error?
In

https://github.com/facebookresearch/xcit/blob/master/semantic_segmentation/README.md

should the line

pip install mmcv-full==1.3.0 mmseg==0.11.0

instead be

pip install mmcv-full==1.3.0 mmsegmentation==0.11.0

?

mmseg appears to be a library for segmenting Chinese characters. https://pypi.org/project/mmseg/

opened by mellorjc 1

ImageNet-22K Models?

Greetings,

I would first like to thank you for sharing the code of this amazing research.

When looking at other competitive backbone models, such as Swin Transformers, I see that their best results come from models pretrained on ImageNet-22K. Would it be possible for you to also release pretrained 22K models? It would allow the model to reach even higher results and possibly outperform the Swin-L model, which currently achieves 87.3 top-1.

opened by ErenBalatkan 1

ModuleNotFoundError: No module named 'mmcv_custom'
Thank you for publishing the great work! Please check the following error in xcit/detection/backbone/xcit.py file:

from mmcv_custom import load_checkpoint
ModuleNotFoundError: No module named 'mmcv_custom'

Probably you forgot to add this folder: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/tree/master/mmcv_custom

Thanks again!
opened by ifding 1

Log Files from Training

Thank you for your awesome code!

I am hoping you might open-source the log files you have from training. Maybe the training and validation loss as a function of epoch (and/or batch) with an estimate of the runtime?

opened by gauenk 0

Does removing CLS_token give the features before the classification layer?

If I need to extract the features before the classification layer, do I need to remove the CLS token from the cross-covariance attention block, given that it uses global aggregation with class attention that includes the CLS token? Is that right?

opened by mathshangw 1

The code is inconsistent with the pseudocode in the paper.

https://github.com/facebookresearch/xcit/blob/82f5291f412604970c39a912586e008ec009cdca/semantic_segmentation/backbone/xcit.py#L256

In paper https://arxiv.org/pdf/2106.09681.pdf:

@aelnouby @tanujdhiman

opened by lartpang 0

Warning: Grad strides do not match bucket view strides.

Hi, thanks for your wonderful work. When I use XCiT as the backbone for another task, I get the following warning: Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. It is caused by feeding non-contiguous tensors to view-style operators. I have found some of the places that trigger this warning, but several different lines of code seem to cause it and I failed to find all of them. Have you also encountered this warning, and do you have any advice on how to solve it?

opened by botaoye 0

Extremely unstable training on multiple gpus
Hi, I'm trying to reproduce the classification training results.

I tried on 2 different machines, machine A with one RTX 3090 and machine B with four A100 gpus.

The training on machine A with a single GPU is fine; see green line (with default parameters). But on machine B with 4 gpus, it's not training properly and very erratic; see gray, yellow, teal lines (with default and custom parameters). Purple line is DeiT training on the same machine B (default parameters).

All experiments done with --batch-size=128 (128 samples per gpu).

This is validation loss, other metrics tell the same story, some even worse.

Example of the commands I used:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
    --model xcit_small_12_p16 --batch-size 128 --drop-path 0.05 --epochs 400

Has anyone seen this, or does anyone know how to fix it? Many thanks.

opened by felix-do-wizardry 7
