# PTQ4ViT
Post-Training Quantization Framework for Vision Transformers. We use the twin uniform quantization method to reduce the quantization error on activation values with irregular distributions (the outputs of softmax and GELU), and a Hessian-guided metric to evaluate candidate scaling factors, which improves calibration accuracy at a small cost. The quantized vision transformers (ViT, DeiT, and Swin) achieve near-lossless prediction accuracy (less than a 0.5% drop at 8-bit quantization) on the ImageNet classification task. Please read the paper for details.
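For intuition only, here is a minimal sketch of the two ideas. The function names, the elementwise range selection, and the diagonal-Hessian proxy are illustrative assumptions, not this repo's API; see the paper and the source for the actual implementation.

```python
import torch

def twin_uniform_quantize(x, s1, s2, k=8):
    """Sketch of twin uniform quantization: quantize x on two uniform
    ranges with separate scaling factors, k-1 bits each (one extra bit
    would select the range in a packed format). Illustrative only."""
    levels = 2 ** (k - 1)
    q1 = torch.clamp(torch.round(x / s1), 0, levels - 1) * s1  # fine scale for the dense small values
    q2 = torch.clamp(torch.round(x / s2), 0, levels - 1) * s2  # coarse scale reaching the large values
    # Keep whichever range reconstructs each element more accurately.
    return torch.where((q1 - x).abs() <= (q2 - x).abs(), q1, q2)

def hessian_guided_score(out_fp, out_q, grad_out):
    """Sketch of the Hessian-guided metric: score a candidate scaling
    factor by the output perturbation weighted with squared output
    gradients (a diagonal-Hessian proxy), rather than a plain MSE
    between raw and quantized outputs. Illustrative only."""
    return ((out_q - out_fp).pow(2) * grad_out.pow(2)).sum()
```

In the paper, the two scales are additionally constrained (aligned by a power of two) so the twin format stays efficient on hardware; treat the sketch above purely as intuition.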
## Install
### Requirement
- python>=3.5
- pytorch>=1.5
- matplotlib
- pandas
- timm
### Datasets
To run the example tests, put your ImageNet2012 dataset at `/datasets/imagenet`.
We use `ViTImageNetLoaderGenerator` in `utils/datasets.py` to initialize our DataLoaders. If your ImageNet dataset is stored elsewhere, pass its root as an argument when instantiating a `ViTImageNetLoaderGenerator`, as sketched below.
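A hedged usage sketch follows; the constructor's actual argument names and order are defined in `utils/datasets.py`, and the call below is an assumption for illustration.

```python
# Hypothetical usage; check utils/datasets.py for the real constructor signature.
from utils.datasets import ViTImageNetLoaderGenerator

loader_gen = ViTImageNetLoaderGenerator("/path/to/your/imagenet")  # dataset root (assumed argument)
```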
## Usage
### 1. Run example quantization
To test all models with BasePTQ/PTQ4ViT, run `python example/test_all.py`.
To run the ablation study, run `python example/test_ablation.py`.
The testing scripts support multiple GPUs: for example, `python example/test_all.py --multigpu --n_gpu 6` runs the tests on 6 GPUs.
### 2. Download quantized model checkpoints
(Coming soon)
## Results
All results below are top-1 accuracy (%) on ImageNet; "original" is the unquantized full-precision model, and WxAy denotes x-bit weights with y-bit activations.
### Results of BasePTQ
model | original | W8A8 | W6A6 |
---|---|---|---|
ViT-S/224/32 | 75.99 | 73.61 | 60.144 |
ViT-S/224 | 81.39 | 80.468 | 70.244 |
ViT-B/224 | 84.54 | 83.896 | 75.668 |
ViT-B/384 | 86.00 | 85.352 | 46.886 |
DeiT-S/224 | 79.80 | 77.654 | 72.268 |
DeiT-B/224 | 81.80 | 80.946 | 78.786 |
DeiT-B/384 | 83.11 | 82.33 | 68.442 |
Swin-T/224 | 81.39 | 80.962 | 78.456 |
Swin-S/224 | 83.23 | 82.758 | 81.742 |
Swin-B/224 | 85.27 | 84.792 | 83.354 |
Swin-B/384 | 86.44 | 86.168 | 85.226 |
### Results of PTQ4ViT
model | original | W8A8 | W6A6 |
---|---|---|---|
ViT-S/224/32 | 75.99 | 75.582 | 71.908 |
ViT-S/224 | 81.39 | 81.002 | 78.63 |
ViT-B/224 | 84.54 | 84.25 | 81.65 |
ViT-B/384 | 86.00 | 85.828 | 83.348 |
DeiT-S/224 | 79.80 | 79.474 | 76.282 |
DeiT-B/224 | 81.80 | 81.482 | 80.25 |
DeiT-B/384 | 83.11 | 82.974 | 81.55 |
Swin-T/224 | 81.39 | 81.246 | 80.47 |
Swin-S/224 | 83.23 | 83.106 | 82.38 |
Swin-B/224 | 85.27 | 85.146 | 84.012 |
Swin-B/384 | 86.44 | 86.394 | 85.388 |
### Results of Ablation
- ViT-S/224 (original top-1 accuracy 81.39%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|---|---|---|---|---|
|   |   |   | 80.47 | 70.24 |
| ✓ |   |   | 80.93 | 77.20 |
| ✓ | ✓ |   | 81.11 | 78.57 |
| ✓ |   | ✓ | 80.84 | 76.93 |
|   | ✓ | ✓ | 79.25 | 74.07 |
| ✓ | ✓ | ✓ | 81.00 | 78.63 |
- ViT-B/224 (original top-1 accuracy 84.54%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|---|---|---|---|---|
|   |   |   | 83.90 | 75.67 |
| ✓ |   |   | 83.97 | 79.90 |
| ✓ | ✓ |   | 84.07 | 80.76 |
| ✓ |   | ✓ | 84.10 | 80.82 |
|   | ✓ | ✓ | 83.40 | 78.86 |
| ✓ | ✓ | ✓ | 84.25 | 81.65 |
- ViT-B/384 (original top-1 accuracy 86.00%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8 | W6A6 |
|---|---|---|---|---|
|   |   |   | 85.35 | 46.89 |
| ✓ |   |   | 85.42 | 79.99 |
| ✓ | ✓ |   | 85.67 | 82.01 |
| ✓ |   | ✓ | 85.60 | 82.21 |
|   | ✓ | ✓ | 84.35 | 80.86 |
| ✓ | ✓ | ✓ | 85.89 | 83.19 |
## Citation

```
@article{PTQ4ViT_cvpr2022,
  title={PTQ4ViT: Post-Training Quantization Framework for Vision Transformers},
  author={Yuan, Zhihang and Xue, Chenhao and Chen, Yiqi and Wu, Qiang and Sun, Guangyu},
  journal={arXiv preprint arXiv:2111.12293},
  year={2022},
}
```