QKeras: a quantization deep learning library for Tensorflow Keras

Google

Last update: Jan 3, 2023

Related tags

Deep Learning machine-learning fpga deep-learning tensorflow accelerator keras quantization hardware-acceleration fpga-accelerator quantized-neural-networks asic-design quantized-networks

Overview

QKeras

github.com/google/qkeras

QKeras 0.8 highlights:

Automatic quantization using QKeras;
Stochastic behavior (including stochastic rounding) is disabled during inference;
LeakyReLU for quantized_relu;
Qtools for estimating effort to perform inference;
- Qtools estimates the sizes and types of the operations needed for inference, with data sizes compatible with high-level synthesis datatypes. For example, the bits and int_bits that Qtools reports for quantized_bits and quantized_relu match ac_fixed datatypes exactly. If you rely on QKeras alone, the correct datatype is ac_fixed<bits, int_bits+is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer.
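The datatype rule above can be sketched in a few lines of Python. The helper name ac_fixed_type is hypothetical (it is not part of QKeras or Qtools), and the sketch assumes the caller has already worked out is_negative from the quantizer parameters.

```python
def ac_fixed_type(bits, int_bits, is_negative):
    """Format ac_fixed<bits, int_bits + is_negative, is_negative>.

    Hypothetical helper for illustration only; is_negative must be
    inferred by the caller from the quantizer's other parameters.
    """
    sign = 1 if is_negative else 0
    signed = "true" if is_negative else "false"
    return f"ac_fixed<{bits}, {int_bits + sign}, {signed}>"


print(ac_fixed_type(6, 2, True))   # ac_fixed<6, 3, true>
print(ac_fixed_type(3, 0, False))  # ac_fixed<3, 0, false>
```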

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacements for some of the Keras layers, especially the ones that create parameters and activation layers and perform arithmetic operations, so that we can quickly create a deep quantized version of a Keras network.

According to Tensorflow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is designed to extend the functionality of Keras following Keras' design principles, i.e. being user friendly, modular and extensible, while also being "minimally intrusive" to Keras native functionality.

In order to successfully quantize a model, users need to replace variable-creating layers (Dense, Conv2D, etc.) with their counterparts (QDense, QConv2D, etc.), and any layers that perform math operations need to be quantized afterwards.

Publications

http://arxiv.org/abs/2006.10159

Layers Implemented in QKeras

QDense
QConv1D
QConv2D
QDepthwiseConv2D
QSeparableConv1D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)
QSeparableConv2D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)
QMobileNetSeparableConv2D (extended from MobileNet SeparableConv2D implementation, quantizes the activation values after the depthwise step)
QConv2DTranspose
QActivation
QAdaptiveActivation [EXPERIMENTAL]
QAveragePooling2D (in fact, an AveragePooling2D stacked with a QActivation layer for quantization of the result)
QBatchNormalization (is still in its experimental stage, as we have not seen the need to use this yet due to the normalization and regularization effects of stochastic activation functions.)
QOctaveConv2D
QSimpleRNN, QSimpleRNNCell
QLSTM, QLSTMCell
QGRU, QGRUCell
QBidirectional

It is worth noting that not all functionality is currently safe to use with other high-level operations, such as layer wrappers. For example, Bidirectional layer wrappers are used with RNNs. If this is required, we encourage users to pass quantization functions as strings instead of the actual functions as a workaround, but we may change that implementation in the future.

A first attempt to create a safe mechanism in QKeras is the adoption of QActivation, a wrapper that encapsulates the activation functions so that we can save and restore the network architecture and duplicate it using the Keras interface, but this interface has not been fully tested yet.

Activation Layers Implemented in QKeras

smooth_sigmoid(x)
hard_sigmoid(x)
binary_sigmoid(x)
binary_tanh(x)
smooth_tanh(x)
hard_tanh(x)
quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)
bernoulli(alpha=1.0)(x)
stochastic_ternary(alpha=1.0, threshold=0.33)(x)
ternary(alpha=1.0, threshold=0.33)(x)
stochastic_binary(alpha=1.0)(x)
binary(alpha=1.0)(x)
quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)
quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)
quantized_tanh(bits=8, integer=0, symmetric=0)(x)
quantized_po2(bits=8, max_value=-1)(x)
quantized_relu_po2(bits=8, max_value=-1)(x)

The stochastic_* functions, bernoulli, as well as quantized_relu and quantized_tanh rely on stochastic versions of the activation functions. They draw a random number with uniform distribution from _hard_sigmoid of the input x, and the result is based on the expected value of the activation function. Please refer to the papers if you want to understand the underlying theory, or the documentation in qkeras/qlayers.py.
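The idea of a stochastic activation can be illustrated with a minimal pure-Python sketch. It is not the QKeras implementation: the function names are hypothetical, and the hard sigmoid is assumed here to be clip(0.5*x + 0.5, 0, 1).

```python
import random


def hard_sigmoid(x):
    """Piecewise-linear sigmoid, assumed here to be clip(0.5*x + 0.5, 0, 1)."""
    return min(max(0.5 * x + 0.5, 0.0), 1.0)


def stochastic_binary_sketch(x, rng):
    """Return +1 with probability hard_sigmoid(x), otherwise -1."""
    p = hard_sigmoid(x)
    return 1.0 if rng.random() < p else -1.0


rng = random.Random(0)
samples = [stochastic_binary_sketch(0.5, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
# For x = 0.5, p = 0.75, so the expected value is 2*p - 1 = 0.5;
# the sample mean should land close to that.
print(mean)
```

Averaged over many draws the output approaches the expected value of the activation, which is why the deterministic version can stand in for it at inference time.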

The "bits" parameter specifies the number of bits for the quantization, and "integer" specifies how many of those bits are to the left of the decimal point. Finally, in our experience training networks with QSeparableConv2D, both quantized_bits and quantized_tanh, which generate values in [-1, 1), required symmetric versions of the range in order to converge properly and eliminate the bias.

Every time we use a quantization for weights and bias that can generate numbers outside the range [-1.0, 1.0], we need to adjust the *_range to the number. For example, if we have a quantized_bits(bits=6, integer=2) in a weight of a layer, we need to set the weight range to 2**2, which is equivalent to Catapult HLS ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify a range of alpha, and for po2 type of quantizers, we need to specify the range of max_value.
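To make the roles of "bits", "integer" and the resulting range concrete, here is a simplified fixed-point quantizer in plain Python. It is a sketch under stated assumptions, not the quantized_bits implementation: the function name is hypothetical, one bit is assumed to be reserved for the sign when keep_negative is set, and features such as alpha scaling are ignored.

```python
def quantize_fixed_point(x, bits=8, integer=0, symmetric=False, keep_negative=True):
    """Round x to a fixed-point grid and clip it to the representable range.

    Simplified sketch: one bit is reserved for the sign when keep_negative
    is set, `integer` bits sit left of the point, the rest are fractional.
    """
    magnitude_bits = bits - 1 if keep_negative else bits
    limit = 2.0 ** integer                  # exclusive upper bound of the range
    step = limit / 2.0 ** magnitude_bits    # spacing between adjacent values
    max_val = limit - step
    if not keep_negative:
        min_val = 0.0
    elif symmetric:
        min_val = -max_val
    else:
        min_val = -limit
    return min(max(round(x / step) * step, min_val), max_val)


# bits=6, integer=2: values lie in [-4.0, 3.875] in steps of 0.125,
# the same range as a signed ac_fixed<6, 3, true>.
print(quantize_fixed_point(5.0, bits=6, integer=2))    # 3.875
print(quantize_fixed_point(-5.0, bits=6, integer=2))   # -4.0
print(quantize_fixed_point(0.3, bits=4, integer=0))    # 0.25
```

With integer=2 the magnitude bound is 2**2 = 4, which is the weight range mentioned above.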

Example

Suppose you have the following very simple network in Keras.

from keras.layers import *

x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

You can easily quantize this network as follows:

from keras.layers import *
from qkeras import *

x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

The last QActivation is advisable if you want to compare results later on. Please find more cases under the directory examples.

QTools

The purpose of QTools is to assist hardware implementation of the quantized model and model energy consumption estimation. QTools has two functions: data type map generation and energy consumption estimation.

Data Type Map Generation: QTools automatically generates the data type map for weights, bias, multiplier, adder, etc. of each layer. The data type map includes operation type, variable size, quantizer type and bits, etc. The inputs to QTools are:

a given quantized model;
a list of input quantizers for the model.

The output of QTools is a JSON file that lists the data type map of each layer (stored in qtools_instance._output_dict). Output methods include:

qtools_stats_to_json, which writes the data type map to a JSON file;
qtools_stats_print, which prints the data type map.

Energy Consumption Estimation: Another function of QTools is to estimate the model energy consumption in picojoules (pJ). It provides a tool for QKeras users to quickly estimate energy consumption for memory access and MAC operations in a quantized model derived from QKeras, especially when comparing the power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a measure for model tuning which needs to consider both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function which combines energy cost and accuracy.
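One simple way to combine the two terms is a weighted sum, sketched below. This is only an illustration: the function name, the normalization by a reference model's energy, and the energy_weight parameter are all assumptions, not how QTools or AutoQKeras define their objective.

```python
def total_loss(task_loss, energy_pj, reference_energy_pj, energy_weight=0.1):
    """Combine a task loss with an energy penalty relative to a reference model.

    Hypothetical formulation: the energy term is normalized by the reference
    model's energy so that it is dimensionless before being weighted.
    """
    if reference_energy_pj <= 0:
        raise ValueError("reference_energy_pj must be positive")
    return task_loss + energy_weight * (energy_pj / reference_energy_pj)


# A model using half the reference energy with a task loss of 0.5.
print(total_loss(0.5, energy_pj=50.0, reference_energy_pj=100.0, energy_weight=0.2))
```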

Energy Model: The work most often referenced in the literature on energy consumption is Horowitz, M.: “1.1 Computing’s energy problem (and what we can do about it)”; IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

In this work, the author estimated the energy consumption of accelerators, and the data points he presented for a 45 nm process have since been used whenever someone wants to compare accelerator performance. The QTools energy estimate for a 45 nm process is based on the data published in this work.
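As a rough illustration of this kind of estimate, the sketch below multiplies an operation count by per-operation energies. The values are the approximate 45 nm figures commonly quoted from Horowitz's talk and should be treated as illustrative; the function is hypothetical and, unlike QTools, ignores memory access and arbitrary bit widths.

```python
# Approximate per-operation energies in pJ for a 45 nm process, as commonly
# quoted from Horowitz (ISSCC 2014). Treat the exact values as illustrative.
ENERGY_PJ = {
    ("int", 8): {"add": 0.03, "mul": 0.2},
    ("int", 32): {"add": 0.1, "mul": 3.1},
    ("fp", 16): {"add": 0.4, "mul": 1.1},
    ("fp", 32): {"add": 0.9, "mul": 3.7},
}


def mac_energy_pj(num_macs, kind="int", bits=8):
    """Estimate the energy of num_macs multiply-accumulates (one mul + one add each)."""
    cost = ENERGY_PJ[(kind, bits)]
    return num_macs * (cost["mul"] + cost["add"])


macs = 1_000_000
print(mac_energy_pj(macs, "int", 8))   # 8-bit integer MACs
print(mac_energy_pj(macs, "fp", 32))   # 32-bit floating-point MACs
```

Even this crude model shows why lowering the bit width matters: the per-MAC cost differs by more than an order of magnitude between the two cases.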

Examples: An example of how to generate the data type map can be found in qkeras/qtools/examples/example_generate_json.py. An example of how to generate an energy consumption estimate can be found in qkeras/qtools/examples/example_get_energy.py.

AutoQKeras

AutoQKeras allows the automatic quantization and rebalancing of deep neural networks by treating quantization and rebalancing of an existing deep neural network as a hyperparameter search in Keras-Tuner using random search, hyperband or Gaussian processes.

In order to contain the explosion of hyperparameters, users can group tasks by patterns and perform distributed training using available resources.

Extensive documentation is present in notebook/AutoQKeras.ipynb.

Related Work

QKeras has been implemented based on the work of "B.Moons et al. - Minimum Energy Quantized Neural Networks", Asilomar Conference on Signals, Systems and Computers, 2017 and "Zhou, S. et al. - DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks. Finally, our main goal is ease of use, so we attempt to make QKeras layers a true drop-in replacement for Keras, so that users can easily exchange non-quantized layers for quantized ones.

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Comments

Kernel weights and activations not quantized after training

Hi there! I was interested in implementing the QKeras example for the MNIST CNN model as given in the examples section - Link. This example involves quantizing the weights and activations into INT4 (4 bits) using the quantized_bits(4,0,1) method for Conv kernels and activations. I was expecting the weights and activations to be in INT4, but they were in FP32 and there wasn't any integer left of the decimal point.

I ran some experiments with the quantized_bits() method and the results were quantized! And here are the weights and activations for the MNIST model after model_save_quantized_weights():

I would essentially want to save the quantized model with the INT8 or INT4 weights, convert it into a TRT engine, and do GPU inferencing. Any pointers?

Thanks, Yoga

opened by YogaVicky 11

Converting regular Keras weights to Qkeras

Hello,

First I wanted to say: kudos for creating this library; I'm really excited to try it out on different models!

I saw in the readme:

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks.

Is there any documentation on using those tools to convert pretrained weights (e.g. ImageNet) to the quantized versions?

Thanks!

opened by xhluca 10

Transposed convolution (deconvolution)

This PR adds QConv2DTranspose, which is useful in autoencoders. I can also add QConv1DTranspose, but the equivalent Keras layer is only available in nightly TF releases, not in the stable channel yet.
cla: yes ready to pull

opened by vloncar 9

Support JSON save/load

Currently, QKeras doesn't support saving model with model.to_json() and loading with load_from_json. This extends the QDense, QConv1D, QConv2D, QDepthwiseConv2D and QBatchNormalization as well as the quantizers and activations to support this functionality.
cla: yes ready to pull

opened by vloncar 8

add non-stochastic inference mode to quantizers

Currently, models with stochastic quantizers perform stochastic operations even in inference mode. This mechanism prevents this and uses the non-stochastic version of the quantizer if in inference mode.
cla: no

opened by jecorona97 7

Fixing Conv1D's weight info extraction

Hi,

This is a PR to fix the print_qstats error I mentioned in #13. Essentially I just replaced the kernel_h and kernel_w extraction with kernel_length for Conv1D. Someone might have copied the code from Conv2D and forgotten to change it. I tested and it worked fine.

Regards,

Duc.
cla: yes ready to pull

opened by Duchstf 7

Using qkeras layers concurrently with Tensorflow's pruning tools.

Hello, very cool project!!

I'm just wondering if it would be possible to train the model using qkeras layers with the pruning tools in Tensorflow's model optimization package. For example, can we have something like this?

tf.keras.Sequential([ sparsity.prune_low_magnitude( l.QConv2D(32, 5, padding='same', activation='relu'), input_shape=input_shape, **pruning_params)])

Thanks,

Duc.
enhancement
opened by Duchstf 7

QSeparableConv1D and 2D

This PR adds QSeparableConv1D and QSeparableConv2D. The existing QSeparableConv2D (which expands to QDepthwiseConv2D and 1x1 QConv2D) that is based on MobileNet is retained and renamed QMobileNetSeparableConv2D
cla: yes ready to pull

opened by vloncar 6

print_qstats(): operation type issue with Sequential() model

When I was applying quantization to a Keras Sequential() model, I found that there could be an issue with the operation type in the print_qstats() function.

For example, with the model in example_mnist.py but coded by the Sequential() API, I got an output as below. The operation type for the first conv2d layer is unull_4_-1, whereas it is smult_4_8 with the functional API.

Based on my experiments with some other models, this only happens to the first layer of the Sequential() model.

Also, for smult_4_8, I would like to know what the 8 stands for here.

I am on: tensorflow-gpu 2.2.0 tensorflow-model-optimization 0.4.1

Number of operations in model:
    conv2d_0_m                    : 25088 (unull_4_-1)
    conv2d_1_m                    : 663552 (smult_4_4)
    conv2d_2_m                    : 147456 (smult_4_4)
    dense                         : 5760  (smult_4_4)

Number of operation types in model:
    smult_4_4                     : 816768
    unull_4_-1                    : 25088

Weight profiling:
    conv2d_0_m_weights             : 128   (4-bit unit)
    conv2d_0_m_bias                : 32    (4-bit unit)
    conv2d_1_m_weights             : 18432 (4-bit unit)
    conv2d_1_m_bias                : 64    (4-bit unit)
    conv2d_2_m_weights             : 16384 (4-bit unit)
    conv2d_2_m_bias                : 64    (4-bit unit)
    dense_weights                  : 5760  (4-bit unit)
    dense_bias                     : 10    (4-bit unit)

Weight sparsity:
... quantizing model
    conv2d_0_m                     : 0.1812
    conv2d_1_m                     : 0.1345
    conv2d_2_m                     : 0.1156
    dense                          : 0.1393
    ----------------------------------------
    Total Sparsity                 : 0.1278
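As a quick consistency check on the listing above, the per-type totals can be reproduced from the per-layer counts. The dictionary below simply copies the numbers from the output; nothing in it calls QKeras.

```python
from collections import defaultdict

# Per-layer operation counts copied from the output above.
layer_ops = {
    "conv2d_0_m": (25088, "unull_4_-1"),
    "conv2d_1_m": (663552, "smult_4_4"),
    "conv2d_2_m": (147456, "smult_4_4"),
    "dense": (5760, "smult_4_4"),
}

# Sum the counts of all layers that share an operation type.
totals = defaultdict(int)
for count, op_type in layer_ops.values():
    totals[op_type] += count

for op_type, count in sorted(totals.items()):
    print(f"{op_type:<30}: {count}")
```

The smult_4_4 total comes out to 816768, matching the "Number of operation types in model" section, so only the first layer's type differs from the functional-API result.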

opened by HaoranREN 6

Return a function instead of calling it (in safe_eval)

If quantizer is a function (like binary_tanh or hard_sigmoid, used as activations), safe_eval would try to make an instance of it and fail. This was due to the change introduced in #12. We should check whether quantizer is a class to be instantiated before being called or a function ready to be called.
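The check described here can be sketched with the standard inspect module. The names resolve_quantizer, ToyQuantizer and toy_activation are hypothetical stand-ins; this is not the code of safe_eval itself.

```python
import inspect


def resolve_quantizer(obj, *args, **kwargs):
    """Instantiate obj if it is a class; return it unchanged if it is a function."""
    if inspect.isclass(obj):
        return obj(*args, **kwargs)
    if callable(obj):
        return obj
    raise TypeError(f"{obj!r} is neither a class nor a callable")


class ToyQuantizer:
    """Stand-in for a quantizer class that must be instantiated first."""

    def __init__(self, bits=8):
        self.bits = bits

    def __call__(self, x):
        return x


def toy_activation(x):
    """Stand-in for a plain activation function that is ready to be called."""
    return max(0.0, min(1.0, 0.5 * x + 0.5))


q = resolve_quantizer(ToyQuantizer, 4)   # class: instantiated with its arguments
f = resolve_quantizer(toy_activation)    # function: returned as-is
print(q.bits, f(0.0))                    # 4 0.5
```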
cla: yes ready to pull

opened by vloncar 6

QBatchNormalization with scale=False and model_save_quantized_weights

When model_save_quantized_weights is called on a model including a QBatchNormalization with scale=False, it seems that the wrong quantizers are used. QBatchNormalization.get_quantizers() returns a list with gamma_quantizer as its first element even when there is no gamma, resulting in a misalignment between quantizers and weights at this point: https://github.com/google/qkeras/blob/1f2134b48548a548f22ee7b75079cb9e34eaff5b/qkeras/utils.py#L159
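The misalignment can be reproduced with plain lists. The sketch below uses placeholder strings rather than real weights or quantizer objects, and the filtering at the end is only one possible remedy, assumed for illustration; it is not the upstream fix.

```python
def pair_quantizers(weights, quantizers):
    """Naively pair weights and quantizers by position."""
    return list(zip(weights, quantizers))


# With scale=False the layer has no gamma weight, but the quantizer list
# still starts with the gamma quantizer (as described in the report).
weights = ["beta", "moving_mean", "moving_variance"]
quantizers = ["gamma_quantizer", "beta_quantizer", "mean_quantizer", "variance_quantizer"]

for weight, quantizer in pair_quantizers(weights, quantizers):
    print(f"{weight:<16} <- {quantizer}")   # beta is paired with gamma_quantizer

# One possible remedy (an assumption, not the upstream fix): drop quantizers
# whose weight is absent before pairing.
present = {"gamma": False, "beta": True, "mean": True, "variance": True}
filtered = [q for q in quantizers if present[q.split("_")[0]]]
print(pair_quantizers(weights, filtered))
```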

opened by lattuada-st 5

`pyparser` vs `pyparsing`

I see you have both pyparser and pyparsing in your requirements.txt. However, only pyparser is in the setup.py as a dependency. Moreover, I only see a use of the pyparsing library in the code.

It seems to me that only pyparsing should be in the requirements.txt and in setup.py as a dependency. What do you all think?

For reference:

pyparser: Code: https://keep.imfreedom.org/grim/pyparser, PyPI: https://pypi.org/project/pyparser/

pyparsing: Code: https://github.com/pyparsing/pyparsing, PyPI: https://pypi.org/project/pyparsing/
opened by jmduarte 0

How do I save an AutoQKeras model that a different script can load?

I can't figure out how to save a model from an AutoQKeras search in one script and get it back in another script. I tried to use qmodel.save('qmodel') and qmodel = load_qmodel('qmodel'), but I get these errors.

Traceback (most recent call last):
  File "code/auto_qkeras.py", line 578, in <module>
    aqk_model = load_qmodel('qmodel')
  File "/home/berian/.local/lib/python3.8/site-packages/qkeras/utils.py", line 928, in load_qmodel
    qmodel = tf.keras.models.load_model(filepath, custom_objects=custom_objects,
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/load.py", line 1008, in revive_custom_object
    raise ValueError(
ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the `custom_objects` arg when calling `load_model()` and make sure that all layers implement `get_config` and `from_config`.

Following the AutoQKeras guide https://github.com/google/qkeras/blob/master/notebook/AutoQKeras.ipynb, there is an example for saving/loading weights into a QKeras model object. However, I won't have the model object from the AutoQKeras search in a different script, so using qmodel.load_weights("qmodel.h5") is not feasible. I have also noticed that when I make my own QKeras model object, qmodel.save(...) and qmodel = load_qmodel(...) work just fine.

Maybe there are some extra options I need to add to the load_qmodel(...) function? Or is there a better way altogether to transfer the qmodel object from one script to another?
opened by alexberian 0

Cannot convert 6.0 to EagerTensor of dtype int64

Hi all,

My setup is:

Arch Linux 5.15.78-1-lts Python 3.10.8 Tensorflow 2.11.0 Numpy 1.23.0 qkeras 0.9.0

I am running the following example code:

import tensorflow as tf
import numpy as np
from qkeras import QActivation


# build the model
l_0 = tf.keras.layers.Input(shape=2)
l_1 = QActivation("bernoulli")(l_0)
l_2 = tf.keras.layers.Dense(units=10, activation="sigmoid")(l_1)
l_3 = QActivation("bernoulli")(l_2)
out = tf.keras.layers.Dense(units=1, activation="sigmoid")(l_3)

# create the model
model = tf.keras.models.Model(inputs=l_0, outputs=out)
model.compile(loss='binary_crossentropy')

# create some data
x = np.array([[1,2],[3,4],[5,6]])
y = np.array([[0],[1],[1]])

# fit the model
model.fit(x, y)

# eval the model layers
layer_out = None
for layer in model.layers:
    if "input" in layer.name:
        layer_out = layer(x)
    if "input" not in layer.name:
        layer_out = layer(layer_out)

Everything works well up to the fitting step, but in the evaluation step of my model layers I encounter the following error:

Traceback (most recent call last):
  File "test.py", line 30, in <module>
    layer_out = layer(layer_out)
  File "keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "qkeras/qlayers.py", line 177, in call
    return self.quantizer(inputs)
  File "qkeras/quantizers.py", line 796, in __call__
    p = tf.keras.backend.sigmoid(self.temperature * x / std)
TypeError: Exception encountered when calling layer 'q_activation' (type QActivation).

Cannot convert 6.0 to EagerTensor of dtype int64

Call arguments received by layer 'q_activation' (type QActivation):
  • inputs=tf.Tensor(shape=(3, 2), dtype=int64)

I think the problem is caused by the variables std and temperature in quantizers.py not matching the data type of the input x. One way to fix it is to change the code from line 790 to:

    std = tf.constant(1.0, dtype=tf.float32)

    if self.use_real_sigmoid:
      self.temperature = tf.constant(self.temperature, dtype=std.dtype)
      x = tf.cast(x, std.dtype)
      p = tf.keras.backend.sigmoid(self.temperature * x / std)

This forces the type to be tf.float32.

Cheers, Marius

opened by makoeppel 0

Only Qconv layer's output tensors are quantized

Hello,

I am using a quantized QKeras model, where all the Conv, BatchNormalization, and Dense parameters have been quantized to 4 bits.

However, when I run the predict function on one image and then print the output tensors of the quantized layers, I can see that only the QConv layer's output tensors are expressed in 4 bits. In contrast, the output tensors of the QBatchNormalization and the QDense layers are expressed in regular floating point.

My question is: If I use a QKeras quantized model, does QKeras perform the quantization of the input tensors or output tensor of the quantized layers in the prediction function internally? Why is only the QConv layer's output expressed in 4 bits?

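# Imports assumed; the original post omitted them
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.utils import to_categorical
from qkeras import utils as qkeras_utils

## Loading model (model_dir is the path to the saved QKeras model)
model = qkeras_utils.load_qmodel(model_dir)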
model.summary()

(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Converting the pixels data to float type
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')
 
# Normalizing to [0, 1] (255 is the maximum value a pixel can have)
train_images = train_images / 255
test_images = test_images / 255 

num_classes = 10
train_labels = to_categorical(train_labels, num_classes)
test_labels = to_categorical(test_labels, num_classes)

iterations = 1
for i in range(iterations):
    print("Iteration ", i)
    image = test_images[i].reshape(-1, 32, 32, 3)
    #predictions = model.predict(image)
    get_all_layer_outputs = K.function([model.layers[0].input],
                                      [l.output for l in model.layers[0:]])

    layer_output = get_all_layer_outputs([image]) # return the same thing
    m = 0
    for j in layer_output:
        print(model.layers[m].__class__.__name__)
        print(j)
        m = m+1

And my output:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (QConv2D)             (None, 32, 32, 32)        896       
_________________________________________________________________
batch_normalization (QBatchN (None, 32, 32, 32)        128       
_________________________________________________________________
conv2d_1 (QConv2D)           (None, 32, 32, 32)        9248      
_________________________________________________________________
batch_normalization_1 (QBatc (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_2 (QConv2D)           (None, 16, 16, 64)        18496     
_________________________________________________________________
batch_normalization_2 (QBatc (None, 16, 16, 64)        256       
_________________________________________________________________
conv2d_3 (QConv2D)           (None, 16, 16, 64)        36928     
_________________________________________________________________
batch_normalization_3 (QBatc (None, 16, 16, 64)        256       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_4 (QConv2D)           (None, 8, 8, 128)         73856     
_________________________________________________________________
batch_normalization_4 (QBatc (None, 8, 8, 128)         512       
_________________________________________________________________
conv2d_5 (QConv2D)           (None, 8, 8, 128)         147584    
_________________________________________________________________
batch_normalization_5 (QBatc (None, 8, 8, 128)         512       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 128)         0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 2048)              0         
_________________________________________________________________
dense (QDense)               (None, 128)               262272    
_________________________________________________________________
batch_normalization_6 (QBatc (None, 128)               512       
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (QDense)             (None, 10)                1290      
=================================================================
...

QConv2D
[[[[0.     0.     0.25   ... 0.     0.375  0.    ]
   [0.     0.     0.     ... 0.     0.6875 0.25  ]
   [0.     0.     0.     ... 0.     0.6875 0.1875]

...

QBatchNormalization
[[[[ 0.02544868  0.16547686  1.791272   ... -0.0244638   0.58454317
    -0.66077614]
   [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
     1.0357761 ]
   [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
     0.61163807]
...

QConv2D
[[[[0.     0.9375 0.     ... 0.     0.     0.9375]
   [0.     0.     0.     ... 0.375  0.     0.    ]
   [0.     0.     0.     ... 0.0625 0.     0.    ]
   ...
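One way to tell whether a printed tensor is on a quantization grid is to test whether its values are integer multiples of the quantization step. The sketch below uses a few values copied from the output above and assumes 4 fractional bits (a step of 0.0625); the function name is hypothetical.

```python
def on_fixed_point_grid(values, fractional_bits=4, tol=1e-9):
    """Return True if every value is an integer multiple of 2**-fractional_bits."""
    step = 2.0 ** -fractional_bits
    return all(abs(v / step - round(v / step)) < tol for v in values)


# Sample values copied from the printed layer outputs.
qconv_values = [0.0, 0.25, 0.375, 0.6875, 0.1875, 0.9375, 0.0625]
qbn_values = [0.02544868, 0.16547686, 1.791272, -0.0244638]

print(on_fixed_point_grid(qconv_values))   # True: all multiples of 0.0625
print(on_fixed_point_grid(qbn_values))     # False
```

The QConv2D samples sit on the grid, while the QBatchNormalization samples do not, which matches the observation in the question.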

opened by laumecha 0

Error in energy estimation for AveragePooling2D layers

Greetings, I am trying to quantize the network for the KWS application using DS CNN. The network is described here (LINK)(lines from 85 to 141).

When running AutoQKeras, it shows an error in the energy estimation for AveragePooling2D layers:

Traceback (most recent call last):
  File "/home/auto_qk.py", line 180, in <module>
    autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
  File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 831, in __init__
    self.hypermodel = AutoQKHyperModel(
  File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 125, in __init__
    self.reference_size = self.target.get_reference(model)
  File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/forgiving_metrics/forgiving_energy.py", line 121, in get_reference
    energy_dict = q.pe(
  File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/run_qtools.py", line 85, in pe
    energy_dict = qenergy.energy_estimate(
  File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/qenergy/qenergy.py", line 302, in energy_estimate
    add_energy = OP[get_op_type(accumulator.output)]["add"](
AttributeError: 'NoneType' object has no attribute 'output'

When I remove the AveragePooling2D layer, AutoQKeras does not produce the error. I tried to set quantization parameters for AveragePooling2D, but no luck.

Code for AutoQKeras:

### AutoQkeras start

# set quantization configs 

quantization_config = {
    "kernel": {
            "binary": 1,
            "stochastic_binary": 1,
            "ternary": 2,
            "stochastic_ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
    },
    "bias": {
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
    },
    "activation": {
            "binary": 1,
            "ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
    },
    "linear": {
            "binary": 1,
            "ternary": 2,
            "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
            "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
            "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
    }
}

# define limits 
limit = {
    "Dense": [4, 4, 4],
    "Conv2D": [4, 4, 4],
    "DepthwiseConv2D": [4, 4, 4],
    "Activation": [4],
    "AveragePooling2D":  [4, 4, 4],
    "BatchNormalization": [],
    "Dense":[],
}

# define goal (delta = forgiving factor lets put at 8% like in tutorial )

goal = {
    "type": "energy",
    "params": {
        "delta_p": 8.0,
        "delta_n": 8.0,
        "rate": 2.0,
        "stress": 1.0,
        "process": "horowitz",
        "parameters_on_memory": ["sram", "sram"],
        "activations_on_memory": ["sram", "sram"],
        "rd_wr_on_io": [False, False],
        "min_sram_size": [0, 0],
        "source_quantizers": ["int8"],
        "reference_internal": "int8",
        "reference_accumulator": "int32"
        }
}

# SOME RUN CONFIGS

run_config = {
    "output_dir": Flags.bg_path + "auto_qk_dump",
    "goal": goal,
    "quantization_config": quantization_config,
    "learning_rate_optimizer": False,
    "transfer_weights": False,
    "mode": "random",
    "seed": 42,
    "limit": limit,
    "tune_filters": "layer",
    "tune_filters_exceptions": "^dense",
    # first layer is input, last two layers are softmax and flatten
    "layer_indexes": range(1, len(model.layers)-1),
    "max_trials": 20
    }


# Start autoQkeras 

model.summary()
model.compile(
    #optimizer=keras.optimizers.RMSprop(learning_rate=args.learning_rate),  # Optimizer
    optimizer=keras.optimizers.Adam(learning_rate=Flags.learning_rate),  # Optimizer
    # Loss function to minimize
    loss=keras.losses.SparseCategoricalCrossentropy(),
    # List of metrics to monitor
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)   
#model = keras.models.load_model(Flags.saved_model_path)

custom_objects = {}
autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
autoqk.fit(ds_train, validation_data=ds_val, epochs=Flags.epochs, callbacks=callbacks)

qmodel = autoqk.get_best_model()
# Save the weights of the best quantized model (not the original float model)
qmodel.save_weights(Flags.bg_path + "auto_qk_dump/qmodel.h5")
### AutoQkeras stop

opened by RatkoFri 0
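
The `goal` parameters in the script above (`delta_p`, `delta_n`, `rate`, `stress`) feed what the AutoQKeras tutorial calls the forgiving factor. The following is a minimal sketch, assuming the factor has the form 1 + delta * log_rate(stress * reference_cost / trial_cost) with the deltas given in percent. The function name and the cost values are hypothetical, and this is not the library's own code.

```python
import math


def forgiving_factor(reference_cost: float, trial_cost: float,
                     delta_p: float = 8.0, delta_n: float = 8.0,
                     rate: float = 2.0, stress: float = 1.0) -> float:
    """Illustrative forgiving factor (assumed form, not QKeras source).

    delta_p / delta_n are percentages: the accuracy change tolerated each
    time the cost shrinks / grows by a factor of `rate`.
    """
    target = stress * reference_cost
    # Use delta_p when the trial is cheaper than the target, delta_n otherwise.
    delta = (delta_p if trial_cost < target else delta_n) / 100.0
    return 1.0 + delta * math.log(target / trial_cost) / math.log(rate)


# Hypothetical costs in arbitrary energy units.
print(forgiving_factor(100.0, 50.0))   # trial at half the reference cost
print(forgiving_factor(100.0, 200.0))  # trial at twice the reference cost
```

Under this assumption, a search would scale each trial's accuracy by the factor, so a cheaper model can score higher than the reference even if its raw accuracy is slightly lower.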

Releases(v0.9.0)

v0.9.0(Feb 20, 2021)
Major Features

qtools energy support for global_average_pooling layer.

Added layers for sequence model, LSTM, RNN, GRU.

Added activation and weight compression notebook.

Added QSeparableConv2D class:

- Renamed the previous QSeparableConv2D layer to QMobileNetSeparableConv2D.
- The new class is more consistent with the Keras SeparableConv2D API.

Bugfix of QDepthwiseConv2D.

Added an experimental QAdaptiveActivation layer to learn quantizer integer bits from activation values.

Added weight sparsity calculation to model qstats.

Enabled AutoQKeras to use custom Keras Tuners.

Fixed various bugs in AutoQKeras.
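
As an illustration of the weight sparsity statistic mentioned in this release, here is a minimal sketch that reports the fraction of exactly-zero weights per layer. The function name, the input layout (flattened weights keyed by layer name) and the example values are assumptions made for this sketch; it does not reproduce the qstats implementation.

```python
from typing import Dict, Sequence


def weight_sparsity(layer_weights: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return the fraction of exactly-zero weights for each layer."""
    return {
        name: sum(1 for w in weights if w == 0.0) / len(weights)
        for name, weights in layer_weights.items()
        if len(weights) > 0  # skip layers without parameters
    }


# Hypothetical flattened weights for two layers.
example = {
    "dense_0": [0.5, 0.0, -0.25, 0.0],
    "dense_1": [1.0, -1.0],
}
print(weight_sparsity(example))
```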

Thanks to our contributors

This release contains contributions from many people at Google and CERN.

v0.8.0(Jun 19, 2020)
Major Features

Automatic quantization using QKeras;

Stochastic behavior (including stochastic rounding) is disabled during inference;

LeakyReLU for quantized_relu;

Qtools for estimating effort to perform inference;

- Qtools will estimate the sizes and types of operations to perform inference, with its data sizes compatible with high-level synthesis datatypes. For example, quantized_bits and quantized_relu bits and int_bits from Qtools will match exactly ac_fixed datatypes (if you rely on QKeras alone, the correct datatype should be ac_fixed<bits, int_bits+is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer).

Other bug fixes and enhancements.
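
The ac_fixed rule quoted in these notes, ac_fixed<bits, int_bits+is_negative, is_negative>, can be written out as a small helper. This is a sketch only: the function name is hypothetical, and `is_negative` is passed in explicitly because the notes say it has to be inferred from the quantizer's other parameters.

```python
def ac_fixed_type(bits: int, int_bits: int, is_negative: bool) -> str:
    """Format the ac_fixed type for a QKeras-only quantizer setting.

    Follows the rule from the release notes:
    ac_fixed<bits, int_bits + is_negative, is_negative>.
    """
    integer_width = int_bits + int(is_negative)  # the sign bit adds one
    signed = "true" if is_negative else "false"
    return f"ac_fixed<{bits}, {integer_width}, {signed}>"


# A signed 8-bit quantizer with 0 integer bits.
print(ac_fixed_type(8, 0, True))
# An unsigned 4-bit quantizer with 2 integer bits.
print(ac_fixed_type(4, 2, False))
```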

Thanks to our contributors

This release contains contributions from many people at Google and CERN.

v0.7.4(Apr 11, 2020)

Major Features

A patch with better weight initialization for https://github.com/google/qkeras/releases/tag/v0.7.0

v0.7.0(Mar 27, 2020)
Major Features

Enhancement of binary and ternary quantization as well as their stochastic counterparts for parameters and activation.

Add auto scaling for low-bitwidth quantization.

Add Jupyter notebook.
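
To make the difference between binary quantization and its stochastic counterpart concrete, here is a minimal sketch of BinaryConnect-style stochastic binarization: during training a value becomes +1 with probability given by a hard sigmoid and -1 otherwise, while inference uses the deterministic sign. This reflects the v0.8.0 note that stochastic behavior is disabled during inference. The function names are hypothetical, and the sketch is not the QKeras implementation.

```python
import random


def hard_sigmoid(x: float) -> float:
    """Clip (x + 1) / 2 into [0, 1]; used as the probability of +1."""
    return min(1.0, max(0.0, (x + 1.0) / 2.0))


def binarize(x: float, training: bool, rng: random.Random) -> float:
    """Quantize x to -1 or +1.

    Training: stochastic, +1 with probability hard_sigmoid(x).
    Inference: deterministic sign of x (zero maps to +1).
    """
    if training:
        return 1.0 if rng.random() < hard_sigmoid(x) else -1.0
    return 1.0 if x >= 0.0 else -1.0


rng = random.Random(0)
# Stochastic samples for x = 0.5: on average 75% of them should be +1.
samples = [binarize(0.5, training=True, rng=rng) for _ in range(1000)]
print(sum(s == 1.0 for s in samples) / len(samples))
# The deterministic inference path always returns the sign.
print(binarize(0.5, training=False, rng=rng))
```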

Thanks to our Contributors

This release contains contributions from many people at Google.

v0.6.0(Mar 11, 2020)
Major Features

Use Tensorflow 2.1+ and tf.keras.

QKeras no longer supports standalone Keras.

Use Python 3.

Support the pruning APIs and PrunableLayer from tensorflow_model_optimization for model sparsity.

Add QBatchNormalization layer.
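
The pruning support listed above targets model sparsity. As a rough illustration of the idea behind magnitude-based pruning, the sketch below zeroes the smallest-magnitude fraction of a flat weight list. It is a plain-Python stand-in with a hypothetical function name, not the tensorflow_model_optimization API, which applies pruning gradually during training.

```python
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Prune half of four hypothetical weights.
print(magnitude_prune([0.5, -0.1, 0.02, -0.9], sparsity=0.5))
```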

Thanks to our Contributors

This release contains contributions from many people at Google and CERN.

v0.5.0(Jan 12, 2020)

QKeras 0.5.0 uses Tensorflow version < 2 and standalone Keras as backend.

Major Features

This is the first release of QKeras.

Notes

In the next release, we will support TensorFlow 2+ and tf.keras.

Thanks to our Contributors

This release contains contributions from many people at Google.

Owner

Google

Google ❤️ Open Source

GitHub

Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.

HAWQ: Hessian AWare Quantization HAWQ is an advanced quantization library written for PyTorch. HAWQ enables low-precision and mixed-precision uniform

293 Dec 30, 2022

DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Differentiable Model Compression via Pseudo Quantization Noise DiffQ performs differentiable quantization using pseudo quantization noise. It can auto

145 Dec 30, 2022

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.

Nonuniform-to-Uniform Quantization This repository contains the training code of N2UQ introduced in our CVPR 2022 paper: "Nonuniform-to-Uniform Quanti

60 Dec 28, 2022

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

5.7k Jan 9, 2023

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

1.5k Jan 3, 2023

Realtime Face Anti Spoofing with Face Detector based on Deep Learning using Tensorflow/Keras and OpenCV

Realtime Face Anti-Spoofing Detection Realtime Face Anti Spoofing Detection with Face Detector to detect real and fake faces Please star this repo

86 Aug 3, 2022

Vision Deep-Learning using Tensorflow, Keras.

Welcome! I am a computer vision deep learning developer working in Korea. This is my blog, and you can see everything I've studied here. https://www.n

6 Dec 14, 2022

A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

Convolutional Neural Network (CNN). This repository contains a source code of a deep learning network built with TensorFlow and Keras to classify gend

1 Dec 18, 2021

A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

Convolutional Neural Network (CNN). This repository contains a source code of a deep learning network built with TensorFlow and Keras to classify gend

1 Dec 19, 2021

Keras udrl - Keras implementation of Upside Down Reinforcement Learning

keras_udrl Keras implementation of Upside Down Reinforcement Learning This is me

7 Jan 24, 2022

Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks

PWLQ Updates 2020/07/16 - We are working on getting permission from our institution to release our source code. We will release it once we are granted

54 Dec 15, 2022

QTool: A Low-bit Quantization Toolbox for Deep Neural Networks in Computer Vision

This project provides abundant choices of quantization strategies (such as the quantization algorithms, training schedules and empirical tricks) for quantizing the deep neural networks into low-bit counterparts.

51 Dec 10, 2022

Deep GPs built on top of TensorFlow/Keras and GPflow

GPflux Documentation | Tutorials | API reference | Slack What does GPflux do? GPflux is a toolbox dedicated to Deep Gaussian processes (DGP), the hier

107 Nov 2, 2022

This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras)

Yogi-Optimizer_Keras This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras) The NeurIPS-Paper can be found here: http://papers.nips.c

14 Sep 13, 2022

Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

8 Nov 2, 2022

Classification models 1D Zoo - Keras and TF.Keras

Classification models 1D Zoo - Keras and TF.Keras This repository contains 1D variants of popular CNN models for classification like ResNets, DenseNet

12 Jan 6, 2023

TorchPQ is a python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using Product Quantization (PQ) algorithm.

Efficient implementations of Product Quantization and its variants using Pytorch and CUDA

146 Dec 28, 2022

This source code is implemented using keras library based on "Automatic ocular artifacts removal in EEG using deep learning"

CSP_Deep_EEG This source code is implemented using keras library based on "Automatic ocular artifacts removal in EEG using deep learning" {https://www

2 Nov 8, 2022

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization paper Requirements Python >= 3.7.10 Pytorch == 1.7

1 Nov 19, 2021
