The official implementation for "FQ-ViT: Fully Quantized Vision Transformer without Retraining".

Overview

FQ-ViT [arXiv]

This repo contains the official implementation of "FQ-ViT: Fully Quantized Vision Transformer without Retraining".

Table of Contents

  • Introduction
  • Getting Started
  • Results on ImageNet
  • Citation

Introduction

Transformer-based architectures have achieved competitive performance in various computer vision tasks. Compared with CNNs, however, Transformers usually have more parameters and higher computational costs, which presents a challenge when they are deployed to resource-constrained hardware devices.

Most existing quantization approaches are designed and tested on CNNs and lack proper handling of Transformer-specific modules. Previous work found significant accuracy degradation when quantizing the LayerNorm and Softmax of Transformer-based architectures, and consequently left LayerNorm and Softmax unquantized in floating point. We revisit these two modules of Vision Transformers and uncover the reasons for the degradation. In this work, we propose FQ-ViT, the first fully quantized Vision Transformer, which contains two dedicated modules: Power-of-Two Scale (PTS) and Log-Int-Softmax (LIS).

LayerNorm quantized with Power-of-Two Scale (PTS)

The two figures below show that inter-channel variation is far more serious in Vision Transformers than in CNNs, which leads to unacceptable quantization errors under layer-wise quantization.

Taking advantage of both layer-wise and channel-wise quantization, we propose PTS for LayerNorm quantization. The core idea of PTS is to equip different channels with different power-of-two scale factors, rather than different quantization scales.
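
A minimal sketch of the PTS idea is shown below, assuming symmetric quantization for brevity; the helper name pts_quantize is ours, and this is only an illustration, not the repo's implementation.

import torch

def pts_quantize(x, n_bits=8):
    # Illustrative sketch: one layer-wise base scale shared by all channels,
    # plus a per-channel power-of-two factor 2**alpha_c.
    qmax = 2 ** (n_bits - 1) - 1
    ch_max = x.detach().abs().reshape(-1, x.shape[-1]).max(dim=0).values   # per-channel range, shape (C,)
    base_scale = ch_max.min() / qmax                                       # layer-wise quantization scale
    alpha = torch.round(torch.log2(ch_max / qmax / base_scale)).clamp(min=0)
    scale = base_scale * 2.0 ** alpha                                      # power-of-two multiple per channel
    x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return x_int, base_scale, alpha

# Activations with strong inter-channel variation, as observed after LayerNorm in ViTs.
x = torch.randn(1, 197, 768) * torch.logspace(-2, 1, 768)
x_int, base_scale, alpha = pts_quantize(x)
x_hat = x_int * base_scale * 2.0 ** alpha   # on integer hardware, multiplying by 2**alpha is a bit-shift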

Softmax quantized with Log-Int-Softmax (LIS)

The storage and computation of the attention map are a known bottleneck of Transformer architectures, so we want to quantize it to an extremely low bit-width (e.g., 4-bit). However, directly applying 4-bit uniform quantization causes severe accuracy degradation. We observe that the output of Softmax is concentrated around fairly small values, while only a few outliers approach 1. As the visualization below shows, Log2 quantization assigns more quantization bins than uniform quantization to the small-value interval where the distribution is dense.
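
As a small numerical illustration (ours, not part of the repo), the snippet below compares how many distinct 4-bit bins a typical attention row occupies under uniform versus Log2 quantization.

import torch

attn = torch.softmax(torch.randn(197), dim=-1)        # most values are tiny, only a few are larger

# Uniform 4-bit: 16 evenly spaced bins over [0, 1]; almost all values collapse into bin 0.
uniform_q = torch.clamp(torch.round(attn * 15), 0, 15)

# Log2 4-bit: quantize the (negative) exponent instead, so small values keep distinct bins.
log2_q = torch.clamp(torch.round(-torch.log2(attn)), 0, 15)

print(uniform_q.unique().numel(), log2_q.unique().numel())   # Log2 typically occupies far more bins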

Combining Log2 quantization with i-exp, a polynomial approximation of the exponential function introduced in I-BERT, we propose LIS: an integer-only, faster, lower-memory Softmax.
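
The sketch below illustrates the idea in floating point; the function names are ours, the polynomial coefficients are the ones reported by I-BERT, and the actual kernel in this repo operates purely on integers and scaling factors.

import torch

def i_exp(x):
    # i-exp decomposition from I-BERT: write x = (-ln2) * z + p with p in (-ln2, 0],
    # approximate exp(p) with a 2nd-order polynomial, and apply 2**(-z) as a shift.
    ln2 = 0.6931471805599453
    z = torch.floor(-x / ln2)
    p = x + z * ln2
    poly = 0.3585 * (p + 1.353) ** 2 + 0.344
    return poly * 2.0 ** (-z)

def log_int_softmax(scores, n_bits=4):
    # LIS sketch: softmax computed with i-exp, then Log2-quantized to n_bits.
    x = scores - scores.amax(dim=-1, keepdim=True)     # shift so that x <= 0
    exp = i_exp(x)
    probs = exp / exp.sum(dim=-1, keepdim=True)
    q = torch.clamp(torch.round(-torch.log2(probs)), 0, 2 ** n_bits - 1)
    return q   # dequantization is 2**(-q), so the later multiplication with V reduces to bit-shifts

scores = torch.randn(1, 12, 197, 197)                  # (batch, heads, tokens, tokens)
attn_q = log_int_softmax(scores)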

The whole process is visualized as follows.

Getting Started

Install

  • Clone this repo.
git clone https://github.com/linyang-zhh/FQ-ViT.git
cd FQ-ViT
  • Create a conda virtual environment and activate it.
conda create -n fq-vit python=3.7 -y
conda activate fq-vit
  • Install PyTorch and torchvision, e.g.,
conda install pytorch=1.7.1 torchvision cudatoolkit=10.1 -c pytorch

Data preparation

You should download the standard ImageNet dataset and organize it as shown below; a minimal loading sketch follows the layout.

├── imagenet
│   ├── train
│   ├── val
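
For reference, a minimal sketch for loading the val split with this layout is shown below; the transforms are the common ImageNet evaluation pipeline and are an assumption on our part, since test_quant.py may use its own preprocessing.

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Assumed standard ImageNet evaluation preprocessing; check test_quant.py for the exact settings.
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_dataset = datasets.ImageFolder('imagenet/val', transform=val_transform)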

Run

Example: Evaluate quantized DeiT-S with MinMax quantizer and our proposed PTS and LIS

python test_quant.py deit_small <YOUR_DATA_DIR> --quant --pts --lis --quant-method minmax
  • deit_small: model architecture, which can be replaced by deit_tiny, deit_base, vit_base, vit_large, swin_tiny, swin_small and swin_base.

  • --quant: whether to quantize the model.

  • --pts: whether to use Power-of-Two Scale Integer Layernorm.

  • --lis: whether to use Log-Integer-Softmax.

  • --quant-method: quantization methods of activations, which can be chosen from minmax, ema, percentile and omse.

Results on ImageNet

This paper employs several existing post-training quantization strategies together with our methods, including MinMax, EMA, Percentile, and OMSE; the sketch after the list below illustrates how each one chooses its clipping values.

  • MinMax uses the minimum and maximum values of the total data as the clipping values;

  • EMA builds on MinMax and uses an exponential moving average to smooth the minimum and maximum values across different mini-batches;

  • Percentile assumes that the values follow a normal distribution and clips at a percentile. In this paper we use the 1e-5 percentile, because the 1e-4 percentile commonly used for CNNs performs poorly on Vision Transformers.

  • OMSE determines the clipping values by minimizing the quantization error.
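
As mentioned above, here is a rough, self-contained sketch of how each strategy could pick its clipping range from a calibration tensor; the helper names are hypothetical, and the real observers live under models/ptq/observer/.

import torch

def clip_minmax(x):
    # MinMax: clip at the observed minimum and maximum of the calibration data.
    return x.min(), x.max()

def clip_ema(batches, momentum=0.9):
    # EMA: exponential moving average of the per-mini-batch min/max.
    lo = hi = None
    for b in batches:
        b_lo, b_hi = b.min(), b.max()
        lo = b_lo if lo is None else momentum * lo + (1 - momentum) * b_lo
        hi = b_hi if hi is None else momentum * hi + (1 - momentum) * b_hi
    return lo, hi

def clip_percentile(x, sigma=1e-5):
    # Percentile: drop a tiny fraction of outliers on each tail.
    q = torch.tensor([sigma, 1.0 - sigma])
    lo, hi = torch.quantile(x.flatten().float(), q)
    return lo, hi

def clip_omse(x, n_bits=8, n_candidates=100):
    # OMSE: search for the clipping value that minimizes the quantization mean squared error.
    qmax = 2 ** (n_bits - 1) - 1
    best_clip, best_err = None, float('inf')
    for i in range(1, n_candidates + 1):
        clip = x.abs().max() * i / n_candidates
        scale = clip / qmax
        x_hat = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        err = ((x - x_hat) ** 2).mean()
        if err < best_err:
            best_clip, best_err = clip, err
    return -best_clip, best_clip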

The following top-1 accuracies (%) are evaluated on ImageNet.

| Method | W/A/Attn Bits | ViT-B | ViT-L | DeiT-T | DeiT-S | DeiT-B | Swin-T | Swin-S | Swin-B |
|---|---|---|---|---|---|---|---|---|---|
| Full Precision | 32/32/32 | 84.53 | 85.81 | 72.21 | 79.85 | 81.85 | 81.35 | 83.20 | 83.60 |
| MinMax | 8/8/8 | 23.64 | 3.37 | 70.94 | 75.05 | 78.02 | 64.38 | 74.37 | 25.58 |
| MinMax w/ PTS | 8/8/8 | 83.31 | 85.03 | 71.61 | 79.17 | 81.20 | 80.51 | 82.71 | 82.97 |
| MinMax w/ PTS, LIS | 8/8/4 | 82.68 | 84.89 | 71.07 | 78.40 | 80.85 | 80.04 | 82.47 | 82.38 |
| EMA | 8/8/8 | 30.30 | 3.53 | 71.17 | 75.71 | 78.82 | 70.81 | 75.05 | 28.00 |
| EMA w/ PTS | 8/8/8 | 83.49 | 85.10 | 71.66 | 79.09 | 81.43 | 80.52 | 82.81 | 83.01 |
| EMA w/ PTS, LIS | 8/8/4 | 82.57 | 85.08 | 70.91 | 78.53 | 80.90 | 80.02 | 82.56 | 82.43 |
| Percentile | 8/8/8 | 46.69 | 5.85 | 71.47 | 76.57 | 78.37 | 78.78 | 78.12 | 40.93 |
| Percentile w/ PTS | 8/8/8 | 80.86 | 85.24 | 71.74 | 78.99 | 80.30 | 80.80 | 82.85 | 83.10 |
| Percentile w/ PTS, LIS | 8/8/4 | 80.22 | 85.17 | 71.23 | 78.30 | 80.02 | 80.46 | 82.67 | 82.79 |
| OMSE | 8/8/8 | 73.39 | 11.32 | 71.30 | 75.03 | 79.57 | 79.30 | 78.96 | 48.55 |
| OMSE w/ PTS | 8/8/8 | 82.73 | 85.27 | 71.64 | 78.96 | 81.25 | 80.64 | 82.87 | 83.07 |
| OMSE w/ PTS, LIS | 8/8/4 | 82.37 | 85.16 | 70.87 | 78.42 | 80.90 | 80.41 | 82.57 | 82.45 |

Citation

If you find this repo useful in your research, please consider citing the following paper:

@misc{
    lin2021fqvit,
    title={FQ-ViT: Fully Quantized Vision Transformer without Retraining}, 
    author={Yang Lin and Tianyu Zhang and Peiqin Sun and Zheng Li and Shuchang Zhou},
    year={2021},
    eprint={2111.13824},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Comments
  • log_int_softmax int64 issue

    Hi, I am porting FQ-ViT into ncnn; the WIP branch is at https://github.com/tpoisonooo/ncnn/blob/75061d9d46654a4abf52969ea6bfe53698177db9/src/layer/multiheadattention.cpp#L181 and I am still debugging...

    When computing log_int_softmax, the following line may overflow int32:

    exp_int, exp_scaling_factor = int_polynomial(r, scaling_factor)
    exp_int = torch.clamp(torch.floor(exp_int * 2**(n - q)), min=0)
    

    For example, [x, r, exp_int] = [0, 0, 60129542144].

    At that point I have to switch to int64_t (and computing the sum afterwards needs extra memory). Is there a way to keep the values within int32_t?

    opened by tpoisonooo 8
  • ViT-B add ptf reshape_tensor issue

    I am running ViT-B:

    $  python3 test_quant.py  vit_base ./quantdata/ --quant --ptf --lis --quant-method minmax
    

    The output shape of the add op is [1, 197, 768]. Under channel-wise semantics, shouldn't the min/max shape be [197]?

    Why does reshape_tensor do a transpose, so that the final shape ends up being [768]?

    Later, get_reshape_range in the quantizer deliberately uses (1, 1, -1), so this feels less like a bug and more like a subtle, intentional design choice.

        def reshape_tensor(self, v):
            if not isinstance(v, torch.Tensor):
                v = torch.tensor(v)
            v = v.detach()
            if self.module_type in ['conv_weight', 'linear_weight']:
                v = v.reshape(v.shape[0], -1)
            elif self.module_type == 'activation':
                if len(v.shape) == 4:
                    v = v.permute(0, 2, 3, 1)
                v = v.reshape(-1, v.shape[-1])
                v = v.transpose(0, 1)    # why is a transpose needed here?
            else:
                raise NotImplementedError
            return v
    
    opened by tpoisonooo 6
  • Reproducing 8/8/8 for ViT-Base

    Can you explain how to reproduce 8/8/8 for ViT-Base? I assume that the following command is for 8/8/4:

    python test_quant.py vit_base <YOUR_DATA_DIR> --quant --ptf --lis --quant-method minmax

    Additionally, does Attn Bits refer to just the softmax quantization (i.e., the BIT_TYPE_S argument in config.py)?

    Thanks so much!

    opened by nfrumkin 5
  • can not get scale

    I just want to test the FQ-ViT results on segmentation with a quantized Swin Transformer backbone, and I only changed the Mlp code as follows:

    class Mlp(nn.Module):
        """ Multilayer perceptron."""
    
        def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.,quant=True,calibrate=False):
            super().__init__()
            out_features = out_features or in_features
            hidden_features = hidden_features or in_features
    
            self.fc1 = QLinear(
                in_features,
                hidden_features,
                quant=quant,
                calibrate=calibrate,
                bit_type=BIT_TYPE_DICT["int4"],
                calibration_mode="channel_wise",
                observer_str="minmax",
                quantizer_str="uniform"
            )
    
            self.act = act_layer()
            self.qact1 = QAct(
                quant=quant,
                calibrate=calibrate,
                bit_type=BIT_TYPE_DICT["uint4"],
                calibration_mode="layer_wise",
                observer_str= "minmax",
                quantizer_str="uniform"
            )
    
            self.fc2 = QLinear(
                hidden_features,
                out_features,
                quant=quant,
                calibrate=calibrate,
                bit_type=BIT_TYPE_DICT["int4"],
                calibration_mode="channel_wise",
                observer_str="minmax",
                quantizer_str="uniform"
            )
    
            self.qact2 = QAct(
                quant=quant,
                calibrate=calibrate,
                bit_type=BIT_TYPE_DICT["uint4"],
                calibration_mode="layer_wise",
                observer_str= "minmax",
                quantizer_str="uniform"
            )
    
            self.drop = nn.Dropout(drop) 
    

    and I didn't modify the training code from github: https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation

    but got the following error:

      File "/home/code/SwinTransformer/FQVIT/ptq/quantizer/base.py", line 45, in forward
        outputs = self.quant(inputs)
      File "/home/code/SwinTransformer/FQVIT/ptq/quantizer/uniform.py", line 30, in quant
        scale = scale.reshape(range_shape)
    AttributeError: 'NoneType' object has no attribute 'reshape'
    

    Could you please help me see where I went wrong? Thanks a lot for your kindness.

    opened by youdutaidi 5
  • How to get a pretrained model to do PTQ?

    Thank you for sharing your PTQ code. I tested DeiT-Small with the min-max quantization technique, but the top-1 accuracy on the validation dataset is only 0.11, which does not match the paper's 75.05%. Is there anything wrong?

    opened by youdutaidi 5
  • The file structure of the COCO dataset

    Hi, I am wondering how to specify the data dir and the file structure of the COCO dataset for quantizing DeiT models. I tried to make it like the ImageNet dataset shown in the README, but there is an error:

    RuntimeError: Found 0 files in subfolders of: /xxx/xxx/coco/val
    Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
    

    And if I add another folder inside the original one, like ./val -> ./val/1/, there is no error, but it finishes quickly and the model isn't trained at all. It would be helpful if the authors could show the contents of the COCO dataset directory in a tree-like format. Many thanks!

    Sincerely, Yifu

    opened by yifu-ding 5
  • Dequantization

    Hi, thank you for sharing the code.

    Could you please explain why you dequantize the values after quantization?

    https://github.com/linyang-zhh/FQ-ViT/blob/16122ee7ea33e80aed3edd29cfebb3ab2ce2cb69/models/ptq/quantizer/base.py#L46

    opened by ebsrn 4
  • What is the purpose of clamping the zero point to the range of qmin and qmax?

    Hi, thanks for the wonderful work in your paper and code. I was looking into your code and couldn't understand why the zero point needs to be clamped to the range of qmin and qmax. I lack knowledge in this field and hope that you can explain it to me, please.

    https://github.com/linyang-zhh/FQ-ViT/blob/16122ee7ea33e80aed3edd29cfebb3ab2ce2cb69/models/ptq/observer/minmax.py#L49

    opened by airacid 3
  • When using this method for object detection, how do I apply the calibration step for the detector in the MMDet framework?

    Good job! I am trying to use the code for object detection, but I can't understand the step "apply the calibration step for that detector in the MMDet framework". Can you give me more detailed instructions? Thanks very much!

    opened by Wuyy-fairy 1
  • A question about Log2 quantization when activations or weights equal zero

    I appreciate the Log2 quantization method in FQ-ViT, but I have a question: when activations or weights equal zero, we cannot compute log2(0), and when they are negative we cannot apply it either. It seems the code does not handle this problem.

    opened by youdutaidi 1
  • Is the GELU operation quantized as well?

    Hi there,

    Thanks a lot for sharing the code. I have a quick question: is the GELU layer fully quantized as well? You mention that the Vision Transformer is fully quantized, but I was not able to find where GELU is quantized in the code, nor any description of it in the paper. Did I miss something? Thanks a lot for your clarification.

    opened by Kevinpsk 1
  • How to visualize Figure 3?

    Hello, thanks for this amazing work! I want to know how to obtain distributions like those in Figure 3. Could you share the visualization code with me? I am looking forward to your reply!

    opened by YoloEliwa 2
  • What could be the reasons for zero final accuracy?

    Following the paper, I randomly selected 1000 training images from ImageNet as the calibration dataset, but why does the final result show no accuracy at all?

    Test: [104/115] Time 0.142 (0.400) Loss 9.3196 (9.1861) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [105/115] Time 0.448 (0.400) Loss 9.2776 (9.1869) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [106/115] Time 0.355 (0.400) Loss 8.9878 (9.1851) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [107/115] Time 0.148 (0.397) Loss 9.2649 (9.1858) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [108/115] Time 0.449 (0.398) Loss 9.0594 (9.1846) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [109/115] Time 0.145 (0.396) Loss 8.9516 (9.1825) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [110/115] Time 0.448 (0.396) Loss 9.0349 (9.1812) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [111/115] Time 0.353 (0.396) Loss 9.1589 (9.1810) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [112/115] Time 0.352 (0.395) Loss 9.4418 (9.1833) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [113/115] Time 0.283 (0.394) Loss 9.4959 (9.1860) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)
    Test: [114/115] Time 0.322 (0.394) Loss 9.3794 (9.1872) Prec@1 0.000 (0.000) Prec@5 0.000 (0.000)

    opened by roncedupon 1