LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Overview

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.

The models obtain competitive trade-offs between speed and accuracy.

For details see LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.

If you use this code for a paper, please cite:

@article{graham2021levit,
  title={LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference},
  author={Benjamin Graham and Alaaeldin El-Nouby and Hugo Touvron and Pierre Stock and Armand Joulin and Herv\'e J\'egou and Matthijs Douze},
  journal={arXiv preprint arXiv:2104.01136},
  year={2021}
}

Model Zoo

We provide baseline LeViT models trained with distillation on ImageNet 2012.

name        acc@1  acc@5  #FLOPs  #params  url
LeViT-128S  76.6   92.9   305M    7.8M     model
LeViT-128   78.6   94.0   406M    9.2M     model
LeViT-192   80.0   94.7   658M    11M      model
LeViT-256   81.6   95.4   1120M   19M      model
LeViT-384   82.6   96.0   2353M   39M      model
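
As a quick sanity check against the table, the parameter counts can be reproduced with the model factories defined in levit.py (a minimal sketch; LeViT_128S is assumed to follow the same factory pattern as the LeViT_256 and LeViT_384 entry points used elsewhere in this readme):

import torch
from levit import LeViT_128S

model = LeViT_128S(pretrained=False)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # the table lists 7.8M for LeViT-128S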

Usage

First, clone the repository locally:

git clone https://github.com/facebookresearch/levit.git

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm):

conda install -c pytorch pytorch torchvision
pip install timm
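
A quick check that the installed versions meet these minimums:

import torch, torchvision, timm
# the repo expects torch >= 1.7.0 and torchvision >= 0.8.1
print(torch.__version__, torchvision.__version__, timm.__version__)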

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation data in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
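
For reference, a minimal sketch of how torchvision reads this tree (the transforms here are generic placeholders, not the exact pipeline used by main.py):

import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
# each class subfolder under train/ and val/ becomes one label
train_set = torchvision.datasets.ImageFolder('/path/to/imagenet/train', transform=transform)
val_set = torchvision.datasets.ImageFolder('/path/to/imagenet/val', transform=transform)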

Evaluation

To evaluate a pre-trained LeViT-256 model on ImageNet val with a single GPU run:

python main.py --eval --model LeViT_256 --data-path /path/to/imagenet

This should give:

* Acc@1 81.636 Acc@5 95.424 loss 0.750
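
The pretrained weights can also be loaded directly from Python (a minimal sketch based on the factory-function pattern that appears in the issues below, e.g. LeViT_384(pretrained=True); the random tensor stands in for a preprocessed image):

import torch
from levit import LeViT_256

model = LeViT_256(pretrained=True).eval()
with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # stand-in for a normalized 224x224 image
    logits = model(x)
print(logits.argmax(dim=1))  # predicted ImageNet class index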

Training

To train LeViT-256 on ImageNet with hard distillation on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model LeViT_256 --data-path /path/to/imagenet --output_dir /path/to/save

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit

To train a LeViT-256 model on ImageNet on one node with 8 GPUs:

python run_with_submitit.py --model LeViT_256 --data-path /path/to/imagenet

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Comments
  • LeViT-128S without distillation 100 epoch training reproduction on 1 GPU

    Hello and thanks for the great paper and codebase! I am trying to replicate the numbers reported in Table 5 of the paper, specifically the A4 model (without distillation), which is reported to achieve 69.7% top-1 accuracy. Would you have any hints as to how to replicate these numbers with only 1 GPU? Modifying the code and using gradient accumulation techniques to replicate the 256 * 32 = 8192 batch size only reaches 63.9% top-1 accuracy. Are there any other steps / tricks that I might be missing? Thanks!
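
    For reference, a generic gradient-accumulation sketch (not the repo's training loop) that emulates an effective batch of 256 * 32 = 8192 on one GPU:

    import torch

    model = torch.nn.Linear(10, 2)  # stand-in for LeViT
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    accum_steps = 32                # 256 micro-batch * 32 steps = 8192 effective
    loader = [(torch.randn(256, 10), torch.randint(0, 2, (256,))) for _ in range(64)]
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = torch.nn.functional.cross_entropy(model(x), y) / accum_steps
        loss.backward()             # gradients accumulate across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

    Note that accumulation reproduces the gradient of a large batch but not its BatchNorm statistics, which could account for part of the gap.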

    opened by ktertikas 4
  • Why LeViT needs 1000 training epochs?

    While other ViT models are trained with only 300 epochs, LeViT needs 1000 epochs, which brings a lot of training cost. I think this is unfair for comparison. What is the accuracy of LeViT at 300 epochs?

    opened by lilujunai 4
  • Question about running the speed_test.py

    Hi~ I want to run speed_test.py, but there is an error as follows: in q, k, v = qkv.view(B, N, self.num_heads, ...), shape '[2048, 50176, 4, -1]' is invalid for input of size 308281344. When I check the code, I find that it removes the model's batchnorm, and the patch_embed of the model is also removed; therefore the transformer blocks cannot reshape the input.

    My question is: how do I fix this problem? Also, when I delete the line that removes batchnorm, the result for 'levit.LeViT_128S, 2048, 224' is 20761 images/s on an RTX 3090, which is a lot higher than what you reported (12880 images/s in Tab. 3). Is this result reasonable? I am looking forward to your reply, thanks!
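
    For comparison, a generic throughput-measurement sketch (not the repo's speed_test.py; results depend on batch size, resolution, and warm-up):

    import time
    import torch

    @torch.no_grad()
    def throughput(model, batch_size=2048, resolution=224, reps=50, device='cuda'):
        model = model.to(device).eval()
        x = torch.randn(batch_size, 3, resolution, resolution, device=device)
        for _ in range(10):  # warm-up
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(reps):
            model(x)
        torch.cuda.synchronize()
        return batch_size * reps / (time.time() - start)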

    opened by irsLu 3
  • LeViT training and bench on GTSRB dataset

    Hello

    I'm trying to use your SOTA LeViT for GTSRB but encountered some problems when testing. The accuracy after testing 12K images in GTSRB was only 13.4%, at 347 FPS on a 3080 Ti. I believe your model could break any record in my survey, and training may be the main cause. I have tried levit.py and levit_c.py to load the model with only the arg num_classes = 43 for training. I also use the same training and testing method for GhostNet 1.0 and MobileNetV3 Large. Could you please point out anything in the training code that makes my testbench not work with your model? Thank you in advance.

    import torch
    import torchvision
    import torch.optim as optim
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader
    from utils import save_plots
    from levit_c import LeViT_c_128S

    mean = (0.485, 0.456, 0.406)
    std = (0.229, 0.224, 0.225)
    transform_train = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
    trainset = torchvision.datasets.GTSRB(root='data', download=False, transform=transform_train)  # download=True if you have not downloaded it yet
    trainloader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=8)
    model = LeViT_c_128S(num_classes=43)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    # Training hyperparameters
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    # Lists to keep track of losses and accuracies.
    train_loss = []
    train_acc = []
    # Training
    epochs = 50
    model.train()
    for epoch in range(epochs):
        print("\n Epoch: %d" % (epoch + 1))
        sum_loss = 0.0
        correct = 0.0
        total = 0.0
        length = len(trainloader)
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            # forward + backward; the distillation variant returns two heads in train mode
            outputs, outputs_dist = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print loss and accuracy for each iteration
            sum_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += predicted.eq(labels.data).cpu().sum()
            print("[epoch:%d, iter:%d] Loss: %.3f | Acc: %.3f%%"
                  % (epoch + 1, (i + 1 + epoch * length), sum_loss / (i + 1), 100. * correct / total))
        scheduler.step()  # adjust the learning rate for the next epoch
        epoch_loss = sum_loss / (i + 1)
        epoch_acc = 100. * correct / total
        train_loss.append(epoch_loss)
        train_acc.append(epoch_acc)
    # Display training result
    model_name = "LeViT_128s"
    save_plots(model_name, train_acc, train_loss)
    print("Model: LeViT_128s")
    print("Training hyperparameters - Epochs: %s, Batch-size: 32, Learning-rate: 0.1, Optimizer: SGD, Momentum: 0.9" % epochs)
    print("[epoch:%d, iter:%d] Loss: %.3f | Acc: %.3f%%" % (epoch + 1, (i + 1 + epoch * length), sum_loss / (i + 1), 100. * correct / total))
    print("Model was saved as %s.pth" % model_name)
    torch.save(model.state_dict(), 'LeViT_128s.pth')
    

    [attached: training accuracy and loss plots]

    opened by thaihoangminhtam 3
  • [fix] Fix Conv2d_BN fuse bug when groups > 1

    Hi there, thanks for your great work :)

    I found a bug when fusing Conv2d_BN with groups > 1: the input channel count should be w.size(1) * self.c.groups rather than w.size(1) in the function Conv2d_BN.fuse.
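
    A sketch of the corrected method (assuming the Conv2d_BN structure in levit.py; for a grouped convolution, c.weight has shape [out, in // groups, kH, kW], so the fused layer's in_channels must be scaled back up by groups):

    import torch

    @torch.no_grad()
    def fuse(self):  # intended to replace Conv2d_BN.fuse
        c, bn = self._modules.values()
        # fold the BN scale/shift into the conv weights
        w = bn.weight / (bn.running_var + bn.eps) ** 0.5
        w = c.weight * w[:, None, None, None]
        b = bn.bias - bn.running_mean * bn.weight / (bn.running_var + bn.eps) ** 0.5
        m = torch.nn.Conv2d(
            w.size(1) * self.c.groups,  # fix: was w.size(1), wrong when groups > 1
            w.size(0), w.shape[2:],
            stride=self.c.stride, padding=self.c.padding,
            dilation=self.c.dilation, groups=self.c.groups)
        m.weight.data.copy_(w)
        m.bias.data.copy_(b)
        return m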

    Reproduce Code:

    from levit import Conv2d_BN
    from levit_c import Conv2d_BN as Conv2d_BN_c
    import torch
    import numpy as np
    from itertools import product
    import utils
    
    @torch.no_grad()
    def test():
        for layer_t, a, b, ks, groups in product(
            [Conv2d_BN, Conv2d_BN_c],
            [8, 16, 32, 64],
            [8, 16, 32, 64],
            [1, 3, 5, 7],
            [1, 2, 4],
                ):
            layer = layer_t(a, b, ks, pad=ks//2, groups=groups)
            layer.eval()
    
            x = torch.randn((1, a, 16, 16))
            y1 = layer(x)
            utils.replace_batchnorm(layer)
            y2 = layer(x)
    
            np.testing.assert_almost_equal(y1.detach().numpy(), y2.detach().numpy(), decimal=4)
    
    if __name__ == '__main__':
        test()
        print("Test Over")
    

    Error:

      File "test_conv.py", line 21, in test
        layer.fuse()
      File "/home/wkcn/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/wkcn/proj/LeViT-1/levit_c.py", line 99, in fuse
        m.weight.data.copy_(w)
    RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1
    
    CLA Signed 
    opened by wkcn 3
  • 'NoneType' object has no attribute 'log_softmax'

    I am using the standard loss function nn.CrossEntropyLoss(). It gives the following error; please let me know, can we use nn.CrossEntropyLoss()?

    Traceback (most recent call last):

      File "/raid/khawar/PycharmProjects/thesis/train.py", line 487, in <module>
        loss = LOSS(outputs, labels)
      File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1047, in forward
        return F.cross_entropy(input, target, weight=self.weight,
      File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 2693, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 1672, in log_softmax
        ret = input.log_softmax(dim)
    AttributeError: 'NoneType' object has no attribute 'log_softmax'
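
    One possible cause: in training mode the distillation variant of LeViT returns a tuple of logits rather than a single tensor, which a loop written for single-output models may mishandle. A minimal defensive sketch (model, images and labels stand in for the user's training loop):

    import torch.nn.functional as F

    def compute_loss(model, images, labels):
        outputs = model(images)
        if isinstance(outputs, tuple):
            # distillation variant: (class_logits, distill_logits)
            outputs = outputs[0]  # or average the two heads
        return F.cross_entropy(outputs, labels)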
    
    opened by khawar-islam 3
  • Exporting ONNX failed.

    I used the following code to export an ONNX model:

    import torch
    from levit import LeViT_192

    # model and inputs assumed here to make the repro self-contained
    levit_model = LeViT_192(pretrained=True).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    input_names, output_names = ["input"], ["output"]
    torch.onnx.export(levit_model, dummy_input,
                      "levit192.onnx",
                      export_params=True,
                      verbose=True,
                      input_names=input_names, output_names=output_names)
    

    but an error occurred:

    raise RuntimeError("step!=1 is currently not supported")
    RuntimeError: step!=1 is currently not supported
    

    I tried to set opset_version=11, but another error occurred:

      File "/multimedia-nfs/liwei/model_selection/model_select_env/lib/python3.6/site-packages/torch/onnx/utils.py", line 500, in _model_to_graph
        _export_onnx_opset_version)
    RuntimeError: Index is supposed to be an empty tensor or a vector
    

    I need your help. Thank you!

    opened by leviome 2
  • About the shape of attention_biases

    Thanks for your work!

    When I run the code, I meet an error: too many indices for tensor of dimension 2.

    The error is in "mypath/levit.py", at the line "self.attention_biases[:, self.attention_bias_idxs]".

    attention_biases shape: [4, 196]; attention_bias_idxs shape: [196, 196]. Is there something wrong with this code?
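
    For what it's worth, a standalone check suggests this advanced-indexing pattern is valid for 2-D tensors, so the error likely means the tensors have unexpected shapes at that point:

    import torch

    biases = torch.randn(4, 196)              # [num_heads, num_points]
    idxs = torch.randint(0, 196, (196, 196))  # [N, N] index grid
    print(biases[:, idxs].shape)              # torch.Size([4, 196, 196])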

    opened by ZhangLei999 2
  • The specific setting (e.g., batch-size) to reproduce the inference speed in Tab.3?

    Tab. 3 of the paper reports inference speeds for the LeViT models, such as 12880 img/s for LeViT-128S and 9266 img/s for LeViT-128.

    Would you please list the specific settings (e.g., the batch size and the type of GPU)? The same architecture can run at very different inference speeds under different settings.

    opened by Openning07 2
  • problem of inference precision

    Thank you very much for your open-source release. When I reproduce the inference accuracy using the officially provided model, it is inconsistent with the numbers given in the readme. What could be the reason?

    Using LeViT-256 I get: Acc@1 81.584 Acc@5 95.464 loss 0.745

    opened by aso538 1
  • Inference - different output when using different batch size

    When performing inference with a pretrained model (in eval() mode), the same image may produce different logits when the batch size is changed.

    Example code:

    import torch
    from levit import LeViT_384

    with torch.no_grad():
        x = torch.stack([
            torch.zeros((3, 224, 224)),
            torch.ones((3, 224, 224)),
            torch.ones((3, 224, 224)),
        ])
        model = LeViT_384(pretrained=True)
        model = model.eval()
        print('batch=1', model(x[:1])[0][:2].numpy())
        print('batch=3', model(x[:3])[0][:2].numpy())
    

    Output: (only for the 1st sample, limited to 2 classes)

    batch=1 [-0.3287484  -0.11664876]
    batch=3 [-0.32874817 -0.11664899]
    

    While the argmax may not be significantly affected, this inconsistency makes it difficult to perform gradient analysis.

    I'm suspecting that this is caused by some batch-normalization layers not honoring eval() mode.
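
    A quick way to test that suspicion (a standalone sketch; it only checks the standard torch.nn batch-norm modules):

    import torch

    def assert_all_eval(model):
        # verify that eval() propagated to every BatchNorm submodule
        for name, m in model.named_modules():
            if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d)):
                assert not m.training, f"{name} is still in training mode"

    Note that differences on the order of 1e-7, as above, are also consistent with non-associative floating-point reductions in batched GPU kernels, even with every layer correctly in eval() mode.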

    opened by thariq-nugrohotomo 1