DeiT: Data-efficient Image Transformers


This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient Image Transformers).

The pretrained models obtain competitive speed / accuracy trade-offs:

[Figure: speed / accuracy trade-off of DeiT models]

For details see Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles and Hervé Jégou.

If you use this code for a paper please cite:

@article{touvron2020deit,
  title={Training data-efficient image transformers & distillation through attention},
  author={Hugo Touvron and Matthieu Cord and Matthijs Douze and Francisco Massa and Alexandre Sablayrolles and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2012.12877},
  year={2020}
}

Model Zoo

We provide baseline DeiT models pretrained on ImageNet 2012.

| name | acc@1 | acc@5 | #params | url |
| --- | --- | --- | --- | --- |
| DeiT-tiny | 72.2 | 91.1 | 5M | model |
| DeiT-small | 79.9 | 95.0 | 22M | model |
| DeiT-base | 81.8 | 95.6 | 86M | model |
| DeiT-tiny distilled | 74.5 | 91.9 | 6M | model |
| DeiT-small distilled | 81.2 | 95.4 | 22M | model |
| DeiT-base distilled | 83.4 | 96.5 | 87M | model |
| DeiT-base 384 | 82.9 | 96.2 | 87M | model |
| DeiT-base distilled 384 (1000 epochs) | 85.2 | 97.2 | 88M | model |

The models are also available via torch hub. Before using them, make sure you have the pytorch-image-models package timm==0.3.2 by Ross Wightman installed. Note that our work relies on the augmentations proposed in this library. In particular, the RandAugment and RandErasing augmentations that we invoke are the improved versions from the timm library, which already led the timm authors to report up to 79.35% top-1 accuracy with ImageNet training for their best model, i.e., an improvement of about +1.5% compared to prior art.
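As a rough illustration, the sketch below builds a timm training transform with RandAugment and Random Erasing. The policy string and probabilities shown here are assumptions for the sake of the example, not necessarily the exact defaults used in main.py:

from timm.data import create_transform

# Sketch of a DeiT-style training transform built with timm.
# The policy string and erasing probability are illustrative assumptions.
train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment='rand-m9-mstd0.5-inc1',  # RandAugment policy (assumed)
    re_prob=0.25,                         # Random Erasing probability (assumed)
    re_mode='pixel',
    re_count=1,
    interpolation='bicubic',
)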

To load DeiT-base with pretrained weights on ImageNet simply do:

import torch
# check you have the right version of timm
import timm
assert timm.__version__ == "0.3.2"

# now load it with torchhub
model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
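As a minimal usage sketch, you can then run the model on a single image. The image path is a placeholder and the preprocessing below assumes the standard ImageNet transform; the Colab notebook mentioned next covers inference in more detail:

from PIL import Image
import torch
from torchvision import transforms

# Assumed standard ImageNet preprocessing (adjust if your pipeline differs)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('cat.jpg').convert('RGB')  # placeholder image path
model.eval()
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))  # shape [1, 1000]
print(logits.argmax(dim=1).item())  # predicted ImageNet class index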

Additionally, we provide a Colab notebook which goes over the steps needed to perform inference with DeiT.

Usage

First, clone the repository locally:

git clone https://github.com/facebookresearch/deit.git

Then, install PyTorch 1.7.0+ and torchvision 0.8.1+ and pytorch-image-models 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation data in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
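With this layout in place, the validation split can be loaded through torchvision's datasets.ImageFolder, roughly as follows. This is only a sketch; the transform is a placeholder, and main.py builds its own preprocessing pipeline:

import torch
from torchvision import datasets, transforms

# ImageFolder maps each class subdirectory to an integer label.
val_dataset = datasets.ImageFolder(
    '/path/to/imagenet/val',
    transform=transforms.Compose([transforms.Resize(256),
                                  transforms.CenterCrop(224),
                                  transforms.ToTensor()]),
)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=256,
                                         shuffle=False, num_workers=8)
print(len(val_dataset.classes))  # should be 1000 for ImageNet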

Evaluation

To evaluate a pre-trained DeiT-base on ImageNet val with a single GPU run:

python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --data-path /path/to/imagenet

This should give

* Acc@1 81.846 Acc@5 95.594 loss 0.820

For DeiT-small, run:

python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth --model deit_small_patch16_224 --data-path /path/to/imagenet

giving

* Acc@1 79.854 Acc@5 94.968 loss 0.881

Note that DeiT-small is not the same model as the small model in timm.

And for DeiT-tiny:

python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_tiny_patch16_224-a1311bcf.pth --model deit_tiny_patch16_224 --data-path /path/to/imagenet

which should give

* Acc@1 72.202 Acc@5 91.124 loss 1.219

Below are the command lines to reproduce the inference results for the distilled and fine-tuned models:

deit_base_distilled_patch16_224
python main.py --eval --model deit_base_distilled_patch16_224 --resume https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_224-df68dfff.pth

giving

* Acc@1 83.372 Acc@5 96.482 loss 0.685
deit_small_distilled_patch16_224
python main.py --eval --model deit_small_distilled_patch16_224 --resume https://dl.fbaipublicfiles.com/deit/deit_small_distilled_patch16_224-649709d9.pth

giving

* Acc@1 81.164 Acc@5 95.376 loss 0.752
deit_tiny_distilled_patch16_224
python main.py --eval --model deit_tiny_distilled_patch16_224 --resume https://dl.fbaipublicfiles.com/deit/deit_tiny_distilled_patch16_224-b40b3cf7.pth

giving

* Acc@1 74.476 Acc@5 91.920 loss 1.021
deit_base_patch16_384
python main.py --eval --model deit_base_patch16_384 --input-size 384 --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_384-8de9b5d1.pth

giving

* Acc@1 82.890 Acc@5 96.222 loss 0.764
deit_base_distilled_patch16_384
python main.py --eval --model deit_base_distilled_patch16_384 --input-size 384 --resume https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_384-d0272ac0.pth

giving

* Acc@1 85.224 Acc@5 97.186 loss 0.636

Training

To train DeiT-small and DeiT-tiny on ImageNet on a single node with 4 GPUs for 300 epochs, run:

DeiT-small

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_small_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save

DeiT-tiny

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit
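For orientation, run_with_submitit.py is built on submitit's AutoExecutor; a stripped-down sketch of that pattern is shown below. The log folder, partition name and walltime are placeholders, not the script's actual defaults:

import submitit

def train():
    # placeholder for the real training entry point (main.py's training loop)
    print("training...")

executor = submitit.AutoExecutor(folder="submitit_logs")  # Slurm stdout/stderr land here
executor.update_parameters(
    nodes=2,
    gpus_per_node=8,
    tasks_per_node=8,             # one task per GPU
    timeout_min=60 * 72,          # placeholder walltime
    slurm_partition="learnfair",  # placeholder partition name
)
job = executor.submit(train)
print(job.job_id)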

To train the DeiT-base model on ImageNet on 2 nodes with 8 GPUs each for 300 epochs:

python run_with_submitit.py --model deit_base_patch16_224 --data-path /path/to/imagenet

To train DeiT-base with hard distillation using a RegNetY-160 as the teacher, on 2 nodes with 8 32GB GPUs each for 300 epochs (make sure the teacher's model weights have been downloaded to the correct location beforehand, to avoid multiple workers writing to the same file):

python run_with_submitit.py --model deit_base_distilled_patch16_224 --distillation-type hard --teacher-model regnety_160 --teacher-path https://dl.fbaipublicfiles.com/deit/regnety_160-a5fe301d.pth --use_volta32
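For reference, hard distillation as described in the paper combines the usual cross-entropy on the class token with a cross-entropy against the teacher's hard decisions on the distillation token. A minimal sketch of that objective follows; this is not the repository's losses.py, and the tensor names are assumptions:

import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    # Cross-entropy of the student's class token against the true labels
    base_loss = F.cross_entropy(cls_logits, targets)
    # Cross-entropy of the student's distillation token against the
    # teacher's hard predictions (argmax over the teacher logits)
    teacher_labels = teacher_logits.argmax(dim=1)
    dist_loss = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * base_loss + 0.5 * dist_loss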

To fine-tune DeiT-base on 384x384 images for 30 epochs, starting from a DeiT-base trained on 224x224 images, run the following (make sure the original model's weights have been downloaded beforehand, to avoid multiple workers writing to the same file):

python run_with_submitit.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --use_volta32 --nodes 2 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6
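Fine-tuning at 384 from a 224 checkpoint works because the patch position embeddings are resized to the larger grid of patches when --finetune is used. The function below is a hypothetical sketch of that interpolation step, only to illustrate the idea; main.py performs the equivalent internally:

import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, new_grid_size, num_extra_tokens=1):
    # pos_embed: (1, num_extra_tokens + N, D); num_extra_tokens is 1 for the class
    # token, or 2 for distilled models (class + distillation tokens).
    extra = pos_embed[:, :num_extra_tokens]
    patch_pos = pos_embed[:, num_extra_tokens:]
    dim = patch_pos.shape[-1]
    old_grid = int(patch_pos.shape[1] ** 0.5)         # e.g. 14 for 224 / patch 16
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=new_grid_size, mode='bicubic',
                              align_corners=False)    # e.g. 24x24 for 384 / patch 16
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, -1, dim)
    return torch.cat([extra, patch_pos], dim=1)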

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Comments
  • Loss NaN for DeiT-base

    I have reproduced the small and tiny models, but I ran into problems reproducing the base model at 224 and 384 image sizes. With high probability, the loss becomes NaN after a few epochs of training. My setup is 16 GPUs with a batch size of 64 per GPU, and I do not change any hyper-parameters in run_with_submitit.py. Do you have any idea how to solve this problem? Thanks for your help.

    awaiting response 
    opened by ChengyueGongR 24
  • I need some help to reproduce DeiT-III finetuning result

    Hi

    Thank you for sharing the finetune code & training logs. For IN-1k pretraining, I got results similar to your log: ViT-S 81.43 and ViT-B 82.88. But I failed to reproduce the finetune performance, even with your official finetuning setting, so I would like to ask for advice or help.

    Here is my fine-tune result with ViT-B on IN-1k: [image]

    I expected performance to increase as in your fine-tune log, but instead the finetuning degrades performance. I can't use submitit, so I used the following command on a single node with 8 A100 GPUs:

    OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${num_gpus_per_node} --nnodes=${WORLD_SIZE} --node_rank=${RANK}  --master_addr=${MASTER_ADDR}  --master_port=${MASTER_PORT} --use_env main.py \
        --model deit_base_patch16_LS \
        --data-path ${local_data_path} \
        --finetune ${SAVE_BASE_PATH}/pretraining/checkpoint-${epoch}.pth \
        --output_dir ${SAVE_BASE_PATH}/finetune4 \
        --batch-size 64 \
        --print_freq 400 \
        --epochs 20 \
        --smoothing 0.1 \
        --reprob 0.0 \
        --opt adamw \
        --lr 1e-5 \
        --weight-decay 0.1 \
        --input-size 224 \
        --drop 0.0 \
        --drop-path 0.2 \
        --mixup 0.8 \
        --cutmix 1.0 \
        --unscale-lr \
        --no-repeated-aug \
    --aa rand-m9-mstd0.5-inc1
    

    and here are the full args printed on the command line:

    Namespace(ThreeAugment=False, aa='rand-m9-mstd0.5-inc1', attn_only=False, auto_resume=True, batch_size=64, bce_loss=False, clip_grad=None, color_jitter=0.3, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/mnt/ddn/datasets/ILSVRC2015/train/Data/CLS-LOC', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=True, drop=0.0, drop_path=0.2, epochs=20, eval=False, finetune='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/pretraining/checkpoint-800.pth', gpu=0, inat_category='name', input_size=224, log_dir='nsmlv2', log_name='finetune', lr=1e-05, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_LS', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/mnt/backbone-nfs/bhheo/checkpoints/deit_codebase_deit_base_patch16_LS_800epoch_reproduce/finetune4', patience_epochs=10, pin_mem=True, print_freq=400, rank=0, recount=1, remode='pixel', repeated_aug=False, reprob=0.0, resplit=False, resume='', save_periods=['last2'], sched='cosine', seed=0, smoothing=0.1, src=False, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_interpolation='bicubic', unscale_lr=True, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.1, world_size=8)
    

    I think it is the same as your finetune setting. I double-checked my code but I still don't know why the result is totally different.

    I'm using different library versions (torch: 1.11.0a0+b6df043, torchvision: 0.11.0a0, timm: 0.5.4). This might cause some problems, but there was no problem in pretraining, and the performance difference is too severe for a simple library-version issue.

    I'm sorry to keep bothering you, but could you please let me know if there is something wrong with my setting? Or could you share the ViT-B weights pretrained on IN-1k at 192x192 resolution, before the 224x224 finetuning? With the pre-finetuning weights, I could verify my finetune code without doubting my pretraining.

    opened by bhheo 23
  • Fine-tuning details

    Hi,

    I am trying to replicate the paper's results for models fine-tuned on datasets such as CIFAR-10 and Stanford Cars. Could you give details about the hyper-parameters used (batch size, learning rate, etc.)?

    Thanks.

    question 
    opened by nakashima-kodai 14
  • No learning when transfer learning with Cait XXS24 224

    Hello,

    Thanks a lot for this great repo. I'm currently doing transfer learning with CaiT XXS24 224, but I have a problem when loading the pretrained weights: when I train CaiT on the new task, the accuracy starts at 10% (random) and won't increase. I tried training a small DeiT on this task with transfer learning, and this time it worked well (with the same training functions). Do you have any idea what the problem could be here?

    Here is the code to load the weights (it is actually the one that you provide):

    v = cait_XXS_224(pretrained=False)
    checkpoint = torch.load('logs/ImageNet/XXS24_24.pth')
    checkpoint_no_module = {}
    for k in v.state_dict().keys():
        checkpoint_no_module[k] = checkpoint["model"]['module.' + k]
    v.load_state_dict(checkpoint_no_module)

    I'm using torch 1.7.1 and timm 0.4.5.

    awaiting response 
    opened by BasileR 10
  • Resume Broken

    It keeps complaining about a missing EMA state dict: Failed to find state_dict_ema, starting from loaded model weights. But the checkpoint clearly contains a model_ema entry: dict_keys(['model', 'optimizer', 'lr_scheduler', 'epoch', 'model_ema', 'scaler', 'args'])

    and the loss goes to NaN a few hundred steps after resume...

    opened by kyleliang919 8
  • The training log of DeiT III

    Hi, I'm trying to reproduce the base model of DeiT III on ImageNet-1k with the suggested hyper-parameters. By running:

    python run_with_submitit.py --model deit_base_patch16_LS --data-path /path/to/imagenet --batch 256 --lr 3e-3 --epochs 800 --weight-decay 0.05 --sched cosine --input-size 192 --reprob 0.0 --node 1 --gpu 8 --smoothing 0.0 --warmup-epochs 5 --drop 0.0 --nb-classes 1000 --seed 0 --opt fusedlamb --warmup-lr 1e-6 --mixup .8 --drop-path 0.2 --cutmix 1.0 --unscale-lr --repeated-aug --bce-loss  --color-jitter 0.3 --ThreeAugment
    

    Will the training log be available? Or would it be OK to share the accuracy at 1/2 and 1/4 of the total schedule?

    Thanks!

    opened by tgxs002 7
  • Question about implementing finetuning on iNat-18 dataset

    Hi, I run the following command:

    python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 96 \
    --lr 7.5e-5 \
    --opt AdamW \
    --weight-decay 0.05 \
    --epochs 360 \
    --repeated-aug \
    --reprob 0.1 \
    --drop-path 0.1 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit

    Other arguments are the same as the default values in main.py.

    But I only got 71% acc within 300 epochs. Should I continue to finetune until 360 epochs?

    opened by cokezrr 7
  • Question about Repeated Augmentation

    Hi, first of all, thank you for releasing the code base. I have a small question about the sampler for Repeated Augmentation. What does this 256*256 mean?

    https://github.com/facebookresearch/deit/blob/cb29b5efd522a0ac83d64aa8b41fe27cead3a030/samplers.py#L32

    Thank you!

    question 
    opened by moskomule 7
  • Image throughput numbers

    What do the image/sec throughput numbers represent (training or inference, batch size, mixed-precision or float32, etc.)? They are lower than any inference numbers I'm familiar with for any of the listed models. They also don't seem to match expected training throughputs, and they have an odd spread from the smallest to the largest models, being quite low for the smaller models (CPU bound?).

    I don't spend much time with V100, but relative to Titan RTX and RTX 3090 I have a fairly good idea where the numbers should fall...

    Thanks

    question 
    opened by rwightman 7
  • `kxd` matrix or `1xd` vector?

    In section 3 of the paper 'Augmenting Convolutional networks with attention-based aggregation': ··· We can easily specialize the attention maps per class by replacing the CLS vector with a k × d matrix, where each of the k columns is associated with one of the classes. This specialization allows us to visualize an attention map for each class, as shown in Figure 2. ··· But I only found a 1 × d vector. Where is the k × d matrix?

    https://github.com/facebookresearch/deit/blob/40ae72b79cc5cd48dac2b02e1fceb03ee4192676/patchconvnet_models.py#L201

    opened by densechen 6
  • Question about the convergence of the DeiT-base model

    Great work! and thanks for sharing the codes.

    I am trying to re-train Deit base model but I encountered some issues. May I ask for your insights?

    I can reproduce the reported result of 81.8% with all default settings; however, the performance degrades a lot if I change two very minor hyperparameters:

    1. Change batch size to 512 (default is 1024), and learning rate is automatically scaled based on your codes.
    2. Keep batch size to 1024 but increase the warmup epochs to 10 (default is 5).

    Here is the test accuracy over epochs

    The orange line is the default setting (81.8%). The blue line is batch size 512 (78.8%). The green line uses 10 epochs of warmup (79.2%).

    [Figure: testing accuracy curve for DeiT-base]

    [Figure: zoom-in on the first 50 epochs]

    For the default setting, it seems that the model is about to diverge around the 6th epoch, but it recovers later and eventually achieves a pretty good result (81.8%). However, with the smaller batch size or the 5 additional warmup epochs, the performance degrades by ~3%.

    I wonder: do you observe the same trend? And do you have any insights into why the two small changes I made affect the results so much?

    My env: pytorch 1.7, timm 0.3.2, torchvision 0.8

    Thanks.

    question 
    opened by chunfuchen 6
  • Meaning of the model name (ResMLP)

    Hello, thanks for sharing great work!

    I have a small question about the model name: what does 'S24' mean in ResMLP-S24? I think 'S' may stand for a small-scale model and '24' may mean the model consists of 24 layers, but I cannot find any description in the paper.

    Could you tell me the meaning of names like 'S24' or 'B24'? Thanks!

    opened by YHYeooooong 0
  • Can I use timm==0.4.12 instead of timm==0.3.2 ?

    I have created a conda env and installed the following: `conda install -c pytorch pytorch torchvision` and `pip install timm==0.3.2`.

    I tried to run main.py for evaluation. It gives the following error: cannot import name 'container_abcs' from 'torch._six'. Is there a fix for this package issue?

    Alternatively, I tried to evaluate DeiT-base with timm==0.4.12 and got Acc@1 81.802 instead of 81.846. Is this slight difference caused by the difference in timm versions?

    opened by irhallac 0
  • What batch sizes other than 1024 have been tried when training a DeiT or ViT model?

    What batch sizes other than 1024 have been tried when training a DeiT or ViT model? In the DeiT paper (https://arxiv.org/abs/2012.12877), they used a batch size of 1024 and mentioned that the learning rate should be scaled according to the batch size.

    However, I was wondering whether anyone has successfully trained a DeiT model with a batch size even smaller than 512? If so, what accuracy did you achieve?

    opened by CharlesLeeeee 0
  • Multinode Slurm Training

    Hello, I'm trying to use the run_with_submitit.py file to run the model on a Slurm cluster, but I do not get any output log file to see the training progress. All I have are logs of each node initializing. [screenshot] Can you please help me with this multinode training? Best regards, Mehdi

    opened by yazdanimehdi 0
  • Is EMA used in DeiT-III?

    I'm working on reproducing the accuracy of DeiT-III, and I notice that EMA is enabled during pre-training but not used during evaluation. So is the EMA model used anywhere?

    opened by mzr1996 2
  • What's the accuracy of DeiT-S on CIFAR-10 without pre-training?

    opened by hanwenran1 0