Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.

Zechun Liu

Last update: Dec 28, 2022

Related tags

Deep Learning Nonuniform-to-Uniform-Quantization

Overview

Nonuniform-to-Uniform Quantization

This repository contains the training code of N2UQ, introduced in our CVPR 2022 paper "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation".

In this study, we propose a quantization method that learns non-uniform input thresholds to retain the strong representation ability of nonuniform methods, while outputting uniformly quantized levels so that model inference is as hardware-friendly and efficient as with uniform quantization.
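As a rough illustration of the idea described above, the forward mapping can be sketched in plain Python: nonuniformly spaced thresholds on the input side, evenly spaced integer levels on the output side. This is a minimal sketch, not the repository's implementation. The function name `quantize_forward`, the threshold values, and the convention that an input exactly equal to a threshold falls into the upper level are all assumptions made for illustration.

```python
from bisect import bisect_right


def quantize_forward(x, thresholds):
    """Map a real-valued input to a uniform integer level.

    `thresholds` is an ascending list of 2**n - 1 values for n-bit
    quantization. The spacing between thresholds may be nonuniform,
    but the returned levels 0, 1, ..., 2**n - 1 are evenly spaced.
    """
    # The level is the number of thresholds that x has reached.
    # Assumption: x == threshold counts as having reached it.
    return bisect_right(thresholds, x)


# Hypothetical 2-bit example: three unevenly spaced thresholds.
thresholds = [0.2, 0.9, 1.1]
levels = [quantize_forward(x, thresholds) for x in (0.0, 0.5, 1.0, 2.0)]
print(levels)
```

In the method itself the thresholds are learned during training; here they are fixed constants so that only the input-to-level mapping is shown.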

To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) to handle the otherwise intractable backward derivative with respect to the threshold parameters.

The formula for N2UQ is simply as follows,

Forward pass:

Backward pass:

Moreover, we propose an L1-norm-based entropy-preserving weight regularization for weight quantization.

Citation

If you find our code useful for your research, please consider citing:

@inproceedings{liu2022nonuniform,
  title={Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation},
  author={Liu, Zechun and Cheng, Kwang-Ting and Huang, Dong and Xing, Eric and Shen, Zhiqiang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Run

1. Requirements:

python 3.6, pytorch 1.7.1, torchvision 0.8.2
gdown

2. Data:

Download ImageNet dataset

3. Pretrained Models:

pip install gdown # gdown will automatically download the models
If gdown doesn't work, you may need to manually download the pretrained models and put them in the corresponding ./models/ folder.

4. Steps to run:

(1) For ResNet architectures:

Change directory to ./resnet/
Run bash run.sh architecture n_bits quantize_downsampling
E.g., bash run.sh resnet18 2 0 quantizes resnet18 to 2 bits without quantizing the downsampling layers

(2) For MobileNet architectures:

Change directory to ./mobilenetv2/
Run bash run.sh

Models

1. ResNet

Network	Methods	W2/A2	W3/A3	W4/A4
ResNet-18
	PACT	64.4	68.1	69.2
	DoReFa-Net	64.7	67.5	68.1
	LSQ	67.6	70.2	71.1
	N2UQ	69.4 Model-Res18-2bit	71.9 Model-Res18-3bit	72.9 Model-Res18-4bit
	N2UQ *	69.7 Model-Res18-2bit	72.1 Model-Res18-3bit	73.1 Model-Res18-4bit
ResNet-34
	LSQ	71.6	73.4	74.1
	N2UQ	73.3 Model-Res34-2bit	75.2 Model-Res34-3bit	76.0 Model-Res34-4bit
	N2UQ *	73.4 Model-Res34-2bit	75.3 Model-Res34-3bit	76.1 Model-Res34-4bit
ResNet-50
	PACT	64.4	68.1	69.2
	LSQ	67.6	70.2	71.1
	N2UQ	75.8 Model-Res50-2bit	77.5 Model-Res50-3bit	78.0 Model-Res50-4bit
	N2UQ *	76.4 Model-Res50-2bit	77.6 Model-Res50-3bit	78.0 Model-Res50-4bit

Note that N2UQ without * denotes quantizing all the convolutional layers except the first input convolutional layer.

N2UQ with * denotes quantizing all the convolutional layers except the first input convolutional layer and three downsampling layers.

W2/A2, W3/A3, W4/A4 denote the cases where the weights and activations are both quantized to 2 bits, 3 bits, and 4 bits, respectively.

2. MobileNet

Network	Methods	W4/A4
MobileNet-V2	N2UQ	72.1 Model-MBV2-4bit

Contact

Zechun Liu, HKUST (zliubq at connect.ust.hk)

Comments

Concerns on non-linear function

Hello, thanks for your excellent paper and code! I have some concerns about the non-linear functions used in ResNet and MobileNet. Could you please provide more details? In Section 4 of the paper, the authors state that they use RPReLU [32] as the non-linear function.

However, in this code, it seems that you use PReLU for ResNet and ReLU6 for MobileNet. Could this difference seriously affect the accuracy of the quantized models?

opened by HaoKun-Li 1

Accuracy of the floating-point ResNet18 model?

Hello, thanks for your excellent work and code! One thing confuses me. In Table 1 of your paper, the Top-1 accuracy of the pre-trained FP ResNet18 model is 71.8%. But in your code, the pre-trained FP ResNet18 model comes from torchvision, and its Top-1 accuracy is 69.758%. The link to torchvision's pre-trained weights is [https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py], lines 312 to 329. Why are they so different? Did I use the right pre-trained weights (resnet18-f37072fd.pth)?

opened by dailingjun 1

Concerns about the loss function

Hello, thanks for your excellent work and code!

In the paper, the authors state that they use the same knowledge distillation scheme as LSQ to train the quantized models.

However, according to the LSQ paper, LSQ uses the distillation loss function of Hinton et al. (2015) with a temperature of 1, and gives equal weight to the standard loss during training.

When I read the code in this repository, I noticed that you define both the KD_loss and the CrossEntropyLabelSmooth loss, but use only the distillation loss to train the quantized models. Is this a mistake, or a trick to improve the accuracy?
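To make the difference in the question concrete, the two training objectives can be sketched in plain Python for a single sample: a distillation term with temperature 1, a standard cross-entropy term, and their equally weighted sum. This is only a sketch under stated assumptions. The function names and the weight parameters `kd_weight` and `ce_weight` are hypothetical, and the code does not reproduce the repository's KD_loss or CrossEntropyLabelSmooth classes (label smoothing is omitted).

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def distillation_loss(student_logits, teacher_logits):
    # Cross-entropy between teacher and student distributions,
    # both computed with temperature 1.
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return -sum(math.exp(tv) * sv for tv, sv in zip(t, s))


def cross_entropy(student_logits, label):
    # Standard loss against the ground-truth class index.
    return -log_softmax(student_logits)[label]


def combined_loss(student_logits, teacher_logits, label,
                  kd_weight=1.0, ce_weight=1.0):
    # Equal weights by default; ce_weight=0.0 gives distillation only.
    return (kd_weight * distillation_loss(student_logits, teacher_logits)
            + ce_weight * cross_entropy(student_logits, label))


student, teacher, label = [0.0, 0.0], [0.0, 0.0], 0
both_terms = combined_loss(student, teacher, label)
kd_only = combined_loss(student, teacher, label, ce_weight=0.0)
```

Setting `ce_weight=0.0` corresponds to training with the distillation term alone, which is the behaviour the question describes in the code.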

opened by HaoKun-Li 0

about result

Hello, I am very glad to see the latest and best quantization method. I have some questions from reproducing the results and hope you can answer them. I used the ResNet training script you provided to reproduce the 2-bit result. I ran it twice, and the result was 68.9, which is well below the 69.4 reported in the paper. However, I can get the paper's result by using the model you provided. What should I pay attention to when reproducing the result, and what might explain my lower accuracy? Thank you.

opened by xiaolonghao 4

Some questions about the code

Q1: What is the role of the "LearnableBias" class in the code?
Q2: How should I understand the correspondence between w^r^' in the paper and line 100 of 'resnet.py' in the code, that is, "scaling_factor = gamma * torch.mean(torch.mean(torch.mean(abs(real_weights),dim=3,keepdim=True),dim=2,keepdim=True),dim=1,keepdim=True)"? Thank you.
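Read on its own, the quoted expression averages the absolute weights over dimensions 1, 2 and 3 and multiplies by gamma, which gives one scale per output channel. The sketch below does the same reduction in plain Python on nested lists. It assumes the usual convolution weight layout [out_channels][in_channels][kernel_h][kernel_w]. The function name `per_channel_scale` and the example values, including the value of gamma, are hypothetical, and the sketch does not address how this quantity relates to the symbol in the paper.

```python
def per_channel_scale(real_weights, gamma):
    """Return gamma * mean(|w|) for each output channel.

    Because every slice has the same size, the nested means over
    dims 3, 2 and 1 equal a single mean over all of a channel's weights.
    """
    scales = []
    for out_channel in real_weights:
        flat = [abs(v) for in_ch in out_channel for row in in_ch for v in row]
        scales.append(gamma * sum(flat) / len(flat))
    return scales


# Hypothetical weights with shape [2][1][1][2].
weights = [[[[1.0, -3.0]]], [[[0.5, 0.5]]]]
scales = per_channel_scale(weights, gamma=2.0)
print(scales)
```

The original expression keeps the reduced dimensions (keepdim=True), so its result has shape [out_channels, 1, 1, 1] rather than the flat list returned here.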

opened by xiaolonghao 0

