QKeras: a quantization deep learning library for TensorFlow Keras

Overview

QKeras

github.com/google/qkeras

QKeras 0.8 highlights:

  • Automatic quantization using QKeras;

  • Stochastic behavior (including stochastic rounding) is disabled during inference;

  • LeakyReLU for quantized_relu;

  • Qtools for estimating effort to perform inference;

    • Qtools will estimate the sizes and types of operations needed to perform inference, with data sizes compatible with high-level synthesis datatypes. For example, the bits and int_bits that Qtools reports for quantized_bits and quantized_relu match ac_fixed datatypes exactly (if you rely on QKeras alone, the correct datatype is ac_fixed<bits, int_bits + is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer).

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacements for some of the Keras layers, especially the ones that create parameters and activations and perform arithmetic operations, so that we can quickly create a deep quantized version of a Keras network.

According to the TensorFlow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

  • User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

  • Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

  • Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is designed to extend the functionality of Keras following Keras' own design principles, i.e. being user friendly, modular and extensible, while also being "minimally intrusive" to Keras' native functionality.

In order to successfully quantize a model, users need to replace the variable-creating layers (Dense, Conv2D, etc.) by their quantized counterparts (QDense, QConv2D, etc.), and any layers that perform math operations then need to be quantized as well.

Publications

http://arxiv.org/abs/2006.10159

Layers Implemented in QKeras

  • QDense

  • QConv1D

  • QConv2D

  • QDepthwiseConv2D

  • QSeparableConv1D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QSeparableConv2D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QMobileNetSeparableConv2D (extended from MobileNet SeparableConv2D implementation, quantizes the activation values after the depthwise step)

  • QConv2DTranspose

  • QActivation

  • QAdaptiveActivation [EXPERIMENTAL]

  • QAveragePooling2D (in fact, an AveragePooling2D stacked with a QActivation layer for quantization of the result)

  • QBatchNormalization (still in its experimental stage, as we have not yet seen the need for it due to the normalization and regularization effects of stochastic activation functions)

  • QOctaveConv2D

  • QSimpleRNN, QSimpleRNNCell

  • QLSTM, QLSTMCell

  • QGRU, QGRUCell

  • QBidirectional

It is worth noting that not all functionality is currently safe to use with other high-level operations, such as layer wrappers. For example, Bidirectional layer wrappers are used with RNNs. If this is required, we encourage users to invoke the quantization functions as strings rather than passing the actual function objects as a workaround, although we may change that implementation in the future.
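
For instance, to quantize an RNN inside a Bidirectional wrapper, the quantizers can be passed as strings so that the configuration stays serializable. A minimal sketch, assuming the quantizer argument names follow the same pattern as the other Q layers:

from qkeras import QBidirectional, QLSTM

# quantizers given as strings rather than as function objects, so the
# wrapped layer can be saved and restored
x = QBidirectional(
    QLSTM(16,
          kernel_quantizer="quantized_bits(4,0,1)",
          recurrent_quantizer="quantized_bits(4,0,1)",
          bias_quantizer="quantized_bits(4,0,1)"))(x)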

A first attempt to create a safe mechanism in QKeras is the adoption of QActivation as a wrapper that encapsulates the activation functions, so that we can save and restore the network architecture and duplicate it using the Keras interface, but this interface has not been fully tested yet.

Activation Layers Implemented in QKeras

  • smooth_sigmoid(x)

  • hard_sigmoid(x)

  • binary_sigmoid(x)

  • binary_tanh(x)

  • smooth_tanh(x)

  • hard_tanh(x)

  • quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)

  • bernoulli(alpha=1.0)(x)

  • stochastic_ternary(alpha=1.0, threshold=0.33)(x)

  • ternary(alpha=1.0, threshold=0.33)(x)

  • stochastic_binary(alpha=1.0)(x)

  • binary(alpha=1.0)(x)

  • quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)

  • quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)

  • quantized_tanh(bits=8, integer=0, symmetric=0)(x)

  • quantized_po2(bits=8, max_value=-1)(x)

  • quantized_relu_po2(bits=8, max_value=-1)(x)

The stochastic_* functions and bernoulli, as well as quantized_relu and quantized_tanh, rely on stochastic versions of the activation functions: they draw a random number with uniform distribution from the _hard_sigmoid of the input x, and the result is based on the expected value of the activation function. Please refer to the papers if you want to understand the underlying theory, or to the documentation in qkeras/qlayers.py.

The parameters "bits" specify the number of bits for the quantization, and "integer" specifies how many bits of "bits" are to the left of the decimal point. Finally, our experience in training networks with QSeparableConv2D, both quantized_bits and quantized_tanh that generates values between [-1, 1), required symmetric versions of the range in order to properly converge and eliminate the bias.

Every time we use a quantization for weights and biases that can generate numbers outside the range [-1.0, 1.0], we need to adjust the corresponding *_range parameter accordingly. For example, if we have quantized_bits(bits=6, integer=2) in a weight of a layer, we need to set the weight range to 2**2, which is equivalent to the Catapult HLS type ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify the range of alpha, and for po2-type quantizers, we need to specify the range of max_value.
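
The mapping from quantizer parameters to the matching ac_fixed datatype can be written as a small helper; this is a hypothetical illustration, not part of QKeras:

# hypothetical helper implementing the rule stated above:
# ac_fixed<bits, integer + keep_negative, keep_negative != 0>
def ac_fixed_type(bits, integer, keep_negative=1):
    signed = "true" if keep_negative else "false"
    return "ac_fixed<%d, %d, %s>" % (bits, integer + keep_negative, signed)

print(ac_fixed_type(6, 2))  # -> ac_fixed<6, 3, true>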

Example

Suppose you have the following network, a very simple example written in Keras.

from keras.layers import *

# shape and NB_CLASSES are assumed to be defined elsewhere,
# e.g. shape = (28, 28, 1) and NB_CLASSES = 10 for MNIST
x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

You can easily quantize this network as follows:

from keras.layers import *
from qkeras import *

x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

The last QActivation is advisable if you want to compare results later on. Please find more examples under the examples directory.
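
To inspect the result, the tensors from the example can be wrapped into a Model and passed to the QKeras utilities. A minimal sketch, assuming the x_in and x tensors defined above:

from keras.models import Model
from qkeras import print_qstats
from qkeras.utils import model_save_quantized_weights

model = Model(inputs=x_in, outputs=x)

# print per-layer operation counts and the weight profile
print_qstats(model)

# after training, save the weights in their quantized representation
model_save_quantized_weights(model, "quantized_weights.h5")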

QTools

The purpose of QTools is to assist with the hardware implementation of quantized models and with the estimation of model energy consumption. QTools has two functions: data type map generation and energy consumption estimation.

  • Data Type Map Generation: QTools automatically generates the data type map for the weights, biases, multipliers, adders, etc. of each layer. The data type map includes operation type, variable size, quantizer type and bits, etc. The inputs to QTools are:
  1. a given quantized model;
  2. a list of input quantizers for the model.
  The output of QTools is a json file that lists the data type map of each layer (stored in qtools_instance._output_dict). Output methods include qtools_stats_to_json, which writes the data type map to a json file, and qtools_stats_print, which prints out the data type map.
  • Energy Consumption Estimation: Another function of QTools is to estimate the model energy consumption in picojoules (pJ). It provides a tool for QKeras users to quickly estimate the energy consumption for memory access and MAC operations in a quantized model derived from QKeras, especially when comparing the power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a measure for model tuning which needs to consider both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function which combines energy cost and accuracy.

  • Energy Model: The energy model most often referenced in the literature was first presented by Horowitz, M., "1.1 Computing's energy problem (and what we can do about it)," IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

In this work, the author estimated the energy consumption of accelerators; the data points he presented for a 45 nm process have since been used whenever someone wants to compare accelerator performance. QTools' energy consumption estimates for a 45 nm process are based on the data published in this work.

  • Examples: An example of how to generate the data type map can be found in qkeras/qtools/examples/example_generate_json.py. An example of how to generate the energy consumption estimate can be found in qkeras/qtools/examples/example_get_energy.py.
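
Condensing those two examples, a minimal sketch of driving QTools directly might look as follows (the constructor and pe() arguments shown here are illustrative; the example scripts above are authoritative):

from qkeras import quantized_bits
from qkeras.qtools import run_qtools
from qkeras.qtools import settings as qtools_settings

# "model" is a quantized QKeras model built as in the example above
q = run_qtools.QTools(
    model,
    process="horowitz",                           # 45 nm energy model
    source_quantizers=[quantized_bits(8, 0, 1)],  # model input quantizer
    is_inference=False,
    weights_path=None,
    keras_quantizer="fp16",       # fallback for un-quantized Keras layers
    keras_accumulator="fp16",
    for_reference=False)

# data type map: write to a json file and/or print to stdout
q.qtools_stats_to_json("datatype_map.json")
q.qtools_stats_print()

# energy estimate in pJ for memory accesses and MAC operations
energy_dict = q.pe(
    weights_on_memory="sram",
    activations_on_memory="sram",
    min_sram_size=0,
    rd_wr_on_io=False)
total_energy = q.extract_energy_sum(
    qtools_settings.cfg.include_energy, energy_dict)
print("Total energy: %.6f uJ" % (total_energy / 1e6))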

AutoQKeras

AutoQKeras allows the automatic quantization and rebalancing of deep neural networks by treating the quantization and rebalancing of an existing deep neural network as a hyperparameter search in Keras Tuner, using random search, Hyperband or Gaussian processes.

In order to contain the explosion of hyperparameters, users can group tasks by patterns, and perform distributed training using the available resources.

Extensive documentation is present in notebook/AutoQKeras.ipynb.
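
At its core, a search run follows the pattern sketched below, where goal, quantization_config and limit are dictionaries as documented in the notebook, and model is an already-compiled Keras model (all values here are placeholders):

from tensorflow import keras
from qkeras.autoqkeras import AutoQKeras

autoqk = AutoQKeras(
    model,
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    custom_objects={},
    goal=goal,
    quantization_config=quantization_config,
    limit=limit,
    mode="random",            # or "bayesian" / "hyperband"
    output_dir="autoqk_results",
    max_trials=20)

# search over quantizers and filter counts, then retrieve the best model
autoqk.fit(x_train, y_train, validation_split=0.1, epochs=10)
qmodel = autoqk.get_best_model()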

Related Work

QKeras was implemented based on the work of B. Moons et al., "Minimum Energy Quantized Neural Networks" (Asilomar Conference on Signals, Systems and Computers, 2017), and S. Zhou et al., "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation of accumulator sizes and the conversion of non-quantized to quantized networks. Finally, our main goal is ease of use, so we attempt to make QKeras layers a true drop-in replacement for Keras, so that users can easily exchange non-quantized layers for quantized ones.

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Copyright (c) 2017, Bert Moons where it applies

Comments
  • Kernel weights and activations not quantized after training

    Hi there! I was interested in implementing the QKeras example for the MNIST CNN model as given in the examples section - Link. This example involves quantizing the weights and activations into INT4 (4 bits) using the quantized_bits(4,0,1) method for Conv kernels and activations. I was expecting the weights and activations to be in INT4, but they were in FP32, and there wasn't any integer to the left of the decimal point. I ran some experiments with the quantized_bits() method and the results were quantized! And here are the weights and activations for the MNIST model after model_save_quantized_weights(): I would essentially want to save the quantized model with the INT8 or INT4 weights and convert it into a TRT engine for GPU inferencing. Any pointers? Thanks, Yoga

    opened by YogaVicky 11
  • Converting regular Keras weights to Qkeras

    Hello,

    First I wanted to say: kudos for creating this library; I'm really excited to try it out on different models!

    I saw in the readme:

    QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks.

    Is there any documentation on using those tools to convert pretrained weights (e.g. ImageNet) to the quantized versions?

    Thanks!

    opened by xhluca 10
  • Transposed convolution (deconvolution)

    This PR adds QConv2DTranspose, which is useful in autoencoders. I can also add QConv1DTranspose, but the equivalent Keras layer is only available in nightly TF releases, not in the stable channel yet.

    cla: yes ready to pull 
    opened by vloncar 9
  • Support JSON save/load

    Currently, QKeras doesn't support saving a model with model.to_json() and loading it with load_from_json. This extends QDense, QConv1D, QConv2D, QDepthwiseConv2D and QBatchNormalization, as well as the quantizers and activations, to support this functionality.

    cla: yes ready to pull 
    opened by vloncar 8
  • add non-stochastic inference mode to quantizers

    Currently, models with stochastic quantizers perform stochastic operations even in inference mode. This mechanism prevents this and uses the non-stochastic version of the quantizer in inference mode.

    cla: no 
    opened by jecorona97 7
  • Fixing Conv1D's weight info extraction

    Hi,

    This is a PR to fix the print_qstats error I mentioned in #13. Essentially I just replaced the kernel_h and kernel_w extraction by kernel_length for Conv1D. Someone might have copied the code from Conv2D and forgotten to change it. I tested it and it worked fine.

    Regards,

    Duc.

    cla: yes ready to pull 
    opened by Duchstf 7
  • Using qkeras layers concurrently with Tensorflow's pruning tools.

    Hello, very cool project!!

    I'm just wondering if it would be possible to train the model using qkeras layers with the pruning tools in Tensorflow's model optimization package. For example, can we have something like this?

    tf.keras.Sequential([
        sparsity.prune_low_magnitude(
            l.QConv2D(32, 5, padding='same', activation='relu'),
            input_shape=input_shape,
            **pruning_params)])
    

    Thanks,

    Duc.

    enhancement 
    opened by Duchstf 7
  • QSeparableConv1D and 2D

    This PR adds QSeparableConv1D and QSeparableConv2D. The existing QSeparableConv2D (which expands to QDepthwiseConv2D and 1x1 QConv2D) that is based on MobileNet is retained and renamed QMobileNetSeparableConv2D

    cla: yes ready to pull 
    opened by vloncar 6
  • print_qstats(): operation type issue with Sequential() model

    When I was applying quantization to a Keras Sequential() model, I found that there could be an issue with the operation type in the print_qstats() function.

    For example, with the model in example_mnist.py but coded with the Sequential() API, I got the output below. The operation type for the first conv2d layer is unull_4_-1, whereas it is smult_4_8 with the functional API.

    Based on my experiments with some other models, this only happens to the first layer of the Sequential() model.

    Also, for smult_4_8, I would like to know what the 8 stands for here.

    I am on: tensorflow-gpu 2.2.0, tensorflow-model-optimization 0.4.1

    Number of operations in model:
        conv2d_0_m                    : 25088 (unull_4_-1)
        conv2d_1_m                    : 663552 (smult_4_4)
        conv2d_2_m                    : 147456 (smult_4_4)
        dense                         : 5760  (smult_4_4)
    
    Number of operation types in model:
        smult_4_4                     : 816768
        unull_4_-1                    : 25088
    
    Weight profiling:
        conv2d_0_m_weights             : 128   (4-bit unit)
        conv2d_0_m_bias                : 32    (4-bit unit)
        conv2d_1_m_weights             : 18432 (4-bit unit)
        conv2d_1_m_bias                : 64    (4-bit unit)
        conv2d_2_m_weights             : 16384 (4-bit unit)
        conv2d_2_m_bias                : 64    (4-bit unit)
        dense_weights                  : 5760  (4-bit unit)
        dense_bias                     : 10    (4-bit unit)
    
    Weight sparsity:
    ... quantizing model
        conv2d_0_m                     : 0.1812
        conv2d_1_m                     : 0.1345
        conv2d_2_m                     : 0.1156
        dense                          : 0.1393
        ----------------------------------------
        Total Sparsity                 : 0.1278
    
    opened by HaoranREN 6
  • Return a function instead of calling it (in safe_eval)

    If the quantizer is a function (like binary_tanh or hard_sigmoid, used as activations), safe_eval would try to make an instance of it and fail. This was due to the change introduced in #12. We should check whether the quantizer is a class to be instantiated before being called, or a function ready to be called.
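
    In other words, the intended check is roughly the following; a minimal illustration, not the actual QKeras code:

    import inspect

    def resolve_quantizer(quantizer):
        # classes still need to be instantiated; plain functions such as
        # binary_tanh or hard_sigmoid are already callable as-is
        if inspect.isclass(quantizer):
            return quantizer()
        return quantizer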

    cla: yes ready to pull 
    opened by vloncar 6
  • QBatchNormalization with scale=False and model_save_quantized_weights

    When model_save_quantized_weights is called on a model including a QBatchNormalization with scale=False, it seems that the wrong quantizers are used. QBatchNormalization.get_quantizers() returns a list with gamma_quantizer as the first element even when there is no gamma, resulting in a misalignment between quantizers and weights at this point: https://github.com/google/qkeras/blob/1f2134b48548a548f22ee7b75079cb9e34eaff5b/qkeras/utils.py#L159

    opened by lattuada-st 5
  • `pyparser` vs `pyparsing`

    I see you have both pyparser and pyparsing in your requirements.txt. However, only pyparser is in the setup.py as a dependency. Moreover, I only see a use of the pyparsing library in the code.

    It seems to me that only pyparsing should be in the requirements.txt and in setup.py as a dependency. What do you all think?

    For reference:

    • pyparser: Code: https://keep.imfreedom.org/grim/pyparser, PyPI: https://pypi.org/project/pyparser/
    • pyparsing: Code: https://github.com/pyparsing/pyparsing, PyPI: https://pypi.org/project/pyparsing/
    opened by jmduarte 0
  • How do I save an AutoQKeras model that a different script can load?

    I can't figure out how to get back a model from an AutoQKeras search in one script and load it in another script. I tried to use qmodel.save('qmodel') and qmodel = load_qmodel('qmodel'), but I get these errors.

    Traceback (most recent call last):
      File "code/auto_qkeras.py", line 578, in <module>
        aqk_model = load_qmodel('qmodel')
      File "/home/berian/.local/lib/python3.8/site-packages/qkeras/utils.py", line 928, in load_qmodel
        qmodel = tf.keras.models.load_model(filepath, custom_objects=custom_objects,
      File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/load.py", line 1008, in revive_custom_object
        raise ValueError(
    ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the `custom_objects` arg when calling `load_model()` and make sure that all layers implement `get_config` and `from_config`.
    

    Following the AutoQKeras guide https://github.com/google/qkeras/blob/master/notebook/AutoQKeras.ipynb, there is an example for saving/loading weights into a QKeras model object. However, I won't have the model object from the AutoQKeras search in a different script, so using qmodel.load_weights("qmodel.h5") is not feasible. I have also noticed that when I make my own QKeras model object, qmodel.save(...) and qmodel = load_qmodel(...) work just fine.

    Maybe there are some extra options I need to add to the load_qmodel(...) function? Or is there a better way altogether to transfer the qmodel object from one script to another?

    opened by alexberian 0
  • Cannot convert 6.0 to EagerTensor of dtype int64

    Hi all,

    My setup is:

    Arch Linux 5.15.78-1-lts, Python 3.10.8, Tensorflow 2.11.0, Numpy 1.23.0, qkeras 0.9.0

    I am running the following example code:

    import tensorflow as tf
    import numpy as np
    from qkeras import QActivation
    
    
    # build the model
    l_0 = tf.keras.layers.Input(shape=2)
    l_1 = QActivation("bernoulli")(l_0)
    l_2 = tf.keras.layers.Dense(units=10, activation="sigmoid")(l_1)
    l_3 = QActivation("bernoulli")(l_2)
    out = tf.keras.layers.Dense(units=1, activation="sigmoid")(l_3)
    
    # create the model
    model = tf.keras.models.Model(inputs=l_0, outputs=out)
    model.compile(loss='binary_crossentropy')
    
    # create some data
    x = np.array([[1,2],[3,4],[5,6]])
    y = np.array([[0],[1],[1]])
    
    # fit the model
    model.fit(x, y)
    
    # eval the model layers
    layer_out = None
    for layer in model.layers:
        if "input" in layer.name:
            layer_out = layer(x)
        if "input" not in layer.name:
            layer_out = layer(layer_out)
    

    Until fitting, everything works well, but in the evaluation step of my model layers I encounter the following error:

    Traceback (most recent call last):
      File "test.py", line 30, in <module>
        layer_out = layer(layer_out)
      File "keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "qkeras/qlayers.py", line 177, in call
        return self.quantizer(inputs)
      File "qkeras/quantizers.py", line 796, in __call__
        p = tf.keras.backend.sigmoid(self.temperature * x / std)
    TypeError: Exception encountered when calling layer 'q_activation' (type QActivation).
    
    Cannot convert 6.0 to EagerTensor of dtype int64
    
    Call arguments received by layer 'q_activation' (type QActivation):
      • inputs=tf.Tensor(shape=(3, 2), dtype=int64)
    

    I think the problem is caused because in quantizers.py the variables std and temperature are not matched to the input data type of x. One way to fix it is to change the code from line 790 to:

        std = tf.constant(1.0, dtype=tf.float32)
    
        if self.use_real_sigmoid:
          self.temperature = tf.constant(self.temperature, dtype=std.dtype)
          x = tf.cast(x, std.dtype)
          p = tf.keras.backend.sigmoid(self.temperature * x / std)
    

    With this, one forces the type to be tf.float32.

    Cheers, Marius

    opened by makoeppel 0
  • Only Qconv layer's output tensors are quantized

    Hello,

    I am using a quantized QKeras model, where all the Conv, BatchNormalization, and Dense parameters have been quantized to 4 bits.

    However, when I run the predict function on one image and then print the output tensors of the quantized layers, I can see that only the QConv layer's output tensors are expressed in 4 bits. In contrast, the output tensors of the QBatchNormalization and QDense layers are expressed in regular floating point.

    My question is: if I use a QKeras quantized model, does QKeras internally quantize the input or output tensors of the quantized layers in the prediction function? Why is only the QConv layer's output expressed in 4 bits?

    ## Loading model
    model = qkeras_utils.load_qmodel(model_dir)
    model.summary()
    
    (train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()
    
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
    
    # Converting the pixels data to float type
    train_images = train_images.astype('float32')
    test_images = test_images.astype('float32')
     
    # Standardizing (255 is the total number of pixels an image can have)
    train_images = train_images / 255
    test_images = test_images / 255 
    
    num_classes = 10
    train_labels = to_categorical(train_labels, num_classes)
    test_labels = to_categorical(test_labels, num_classes)
    
    iterations = 1
    for i in range(iterations):
        print("Iteration ", i)
        image = test_images[i].reshape(-1, 32, 32, 3)
        #predictions = model.predict(image)
        get_all_layer_outputs = K.function([model.layers[0].input],
                                          [l.output for l in model.layers[0:]])
    
        layer_output = get_all_layer_outputs([image]) # return the same thing
        m = 0
        for j in layer_output:
            print(model.layers[m].__class__.__name__)
            print(j)
            m = m+1
        
    

    And my output:

    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d (QConv2D)             (None, 32, 32, 32)        896       
    _________________________________________________________________
    batch_normalization (QBatchN (None, 32, 32, 32)        128       
    _________________________________________________________________
    conv2d_1 (QConv2D)           (None, 32, 32, 32)        9248      
    _________________________________________________________________
    batch_normalization_1 (QBatc (None, 32, 32, 32)        128       
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
    _________________________________________________________________
    dropout (Dropout)            (None, 16, 16, 32)        0         
    _________________________________________________________________
    conv2d_2 (QConv2D)           (None, 16, 16, 64)        18496     
    _________________________________________________________________
    batch_normalization_2 (QBatc (None, 16, 16, 64)        256       
    _________________________________________________________________
    conv2d_3 (QConv2D)           (None, 16, 16, 64)        36928     
    _________________________________________________________________
    batch_normalization_3 (QBatc (None, 16, 16, 64)        256       
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 8, 8, 64)          0         
    _________________________________________________________________
    conv2d_4 (QConv2D)           (None, 8, 8, 128)         73856     
    _________________________________________________________________
    batch_normalization_4 (QBatc (None, 8, 8, 128)         512       
    _________________________________________________________________
    conv2d_5 (QConv2D)           (None, 8, 8, 128)         147584    
    _________________________________________________________________
    batch_normalization_5 (QBatc (None, 8, 8, 128)         512       
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 4, 4, 128)         0         
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 4, 4, 128)         0         
    _________________________________________________________________
    flatten (Flatten)            (None, 2048)              0         
    _________________________________________________________________
    dense (QDense)               (None, 128)               262272    
    _________________________________________________________________
    batch_normalization_6 (QBatc (None, 128)               512       
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_1 (QDense)             (None, 10)                1290      
    =================================================================
    ...
    
    QConv2D
    [[[[0.     0.     0.25   ... 0.     0.375  0.    ]
       [0.     0.     0.     ... 0.     0.6875 0.25  ]
       [0.     0.     0.     ... 0.     0.6875 0.1875]
    
    ...
    
    QBatchNormalization
    [[[[ 0.02544868  0.16547686  1.791272   ... -0.0244638   0.58454317
        -0.66077614]
       [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
         1.0357761 ]
       [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
         0.61163807]
    ...
    
    QConv2D
    [[[[0.     0.9375 0.     ... 0.     0.     0.9375]
       [0.     0.     0.     ... 0.375  0.     0.    ]
       [0.     0.     0.     ... 0.0625 0.     0.    ]
       ...
    
    opened by laumecha 0
  • Error in energy estimation for AveragePooling2D layers

    Greetings, I am trying to quantize the network for the KWS application using DS-CNN. The network is described here (LINK) (lines 85 to 141).

    When running AutoQKeras, it shows an error on energy estimation for AveragePooling2D layers:

    Traceback (most recent call last):
      File "/home/auto_qk.py", line 180, in <module>
        autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 831, in __init__
        self.hypermodel = AutoQKHyperModel(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 125, in __init__
        self.reference_size = self.target.get_reference(model)
      File "/usr/local/lib/python3.8/dist-packages/qkeras/autoqkeras/forgiving_metrics/forgiving_energy.py", line 121, in get_reference
        energy_dict = q.pe(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/run_qtools.py", line 85, in pe
        energy_dict = qenergy.energy_estimate(
      File "/usr/local/lib/python3.8/dist-packages/qkeras/qtools/qenergy/qenergy.py", line 302, in energy_estimate
        add_energy = OP[get_op_type(accumulator.output)]["add"](
    AttributeError: 'NoneType' object has no attribute 'output'

    When I remove the AveragePooling2D layer, AutoQKeras does not produce the error. I tried to set quantization parameters for AveragePooling, but no luck.

    Code for AutoQKeras:

    ### AutoQKeras start

    # set quantization configs 
    
    quantization_config = {
        "kernel": {
                "binary": 1,
                "stochastic_binary": 1,
                "ternary": 2,
                "stochastic_ternary": 2,
                "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
        },
        "bias": {
                "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
        },
        "activation": {
                "binary": 1,
                "ternary": 2,
                "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
        },
        "linear": {
                "binary": 1,
                "ternary": 2,
                "quantized_bits(2,0,1,1,alpha=\"auto_po2\")": 2,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 3,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 4,
                "quantized_bits(3,0,1,1,alpha=\"auto_po2\")": 5,
                "quantized_bits(4,0,1,1,alpha=\"auto_po2\")": 6
        }
    }
    
    # define limits 
    limit = {
        "Dense": [4, 4, 4],
        "Conv2D": [4, 4, 4],
        "DepthwiseConv2D": [4, 4, 4],
        "Activation": [4],
        "AveragePooling2D":  [4, 4, 4],
        "BatchNormalization": [],
        "Dense":[],
    }
    
    # define goal (delta = forgiving factor lets put at 8% like in tutorial )
    
    goal = {
        "type": "energy",
        "params": {
            "delta_p": 8.0,
            "delta_n": 8.0,
            "rate": 2.0,
            "stress": 1.0,
            "process": "horowitz",
            "parameters_on_memory": ["sram", "sram"],
            "activations_on_memory": ["sram", "sram"],
            "rd_wr_on_io": [False, False],
            "min_sram_size": [0, 0],
            "source_quantizers": ["int8"],
            "reference_internal": "int8",
            "reference_accumulator": "int32"
            }
    }
    
    # SOME RUN CONFIGS
    
    run_config = {
        "output_dir": Flags.bg_path + "auto_qk_dump",
        "goal": goal,
        "quantization_config": quantization_config,
        "learning_rate_optimizer": False,
        "transfer_weights": False,
        "mode": "random",
        "seed": 42,
        "limit": limit,
        "tune_filters": "layer",
        "tune_filters_exceptions": "^dense",
        # first layer is input, last two layers are softmax and flatten
        "layer_indexes": range(1, len(model.layers)-1),
        "max_trials": 20
        }
    
    
    # Start autoQkeras 
    
    model.summary()
    model.compile(
        #optimizer=keras.optimizers.RMSprop(learning_rate=args.learning_rate),  # Optimizer
        optimizer=keras.optimizers.Adam(learning_rate=Flags.learning_rate),  # Optimizer
        # Loss function to minimize
        loss=keras.losses.SparseCategoricalCrossentropy(),
        # List of metrics to monitor
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )   
    #model = keras.models.load_model(Flags.saved_model_path)
    
    custom_objects = {}
    autoqk = AutoQKeras(model, metrics=[keras.metrics.SparseCategoricalAccuracy()], custom_objects=custom_objects, **run_config)
    autoqk.fit(ds_train, validation_data=ds_val, epochs=Flags.epochs, callbacks=callbacks)
    
    qmodel = autoqk.get_best_model()
    model.save_weights(Flags.bg_path + "auto_qk_dump/","qmodel.h5")
    ### AutoQkeras stop
    
    opened by RatkoFri 0
Releases(v0.9.0)
  • v0.9.0(Feb 20, 2021)

    Major Features

    • qtools energy support for global_average_pooling layer.

    • Added layers for sequence model, LSTM, RNN, GRU.

    • Added activation and weight compression notebook.

    • Added QSeparableConv2D class

      • Renamed previous QSeparableConv2D layer to QMobileNetSeparableConv2D
      • The new QSeparableConv2D is more consistent with the Keras SeparableConv2D API
    • Bugfix of QDepthwiseConv2D.

    • Added an experimental QAdaptiveActivation layer to learn quantizer integer bits from activation values.

    • Added weight sparsity calculation to model qstats.

    • Enabled AutoQKeras to use custom Keras Tuners.

    • Fixed various bugs in AutoQKeras.

    Thanks to our contributors

    This release contains contributions from many people at Google and CERN.

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Jun 19, 2020)

    Major Features

    • Automatic quantization using QKeras;

    • Stochastic behavior (including stochastic rounding) is disabled during inference;

    • LeakyReLU for quantized_relu;

    • Qtools for estimating effort to perform inference;

      • Qtools will estimate the sizes and types of operations needed to perform inference, with data sizes compatible with high-level synthesis datatypes. For example, the bits and int_bits that Qtools reports for quantized_bits and quantized_relu match ac_fixed datatypes exactly (if you rely on QKeras alone, the correct datatype is ac_fixed<bits, int_bits + is_negative, is_negative>, where is_negative has to be inferred from the other parameters of the quantizer).
    • Other bug fixes and enhancements.

    Thanks to our contributors

    This release contains contributions from many people at Google and CERN.

    Source code(tar.gz)
    Source code(zip)
  • v0.7.4(Apr 11, 2020)

  • v0.7.0(Mar 27, 2020)

    Major Features

    • Enhancement of binary and ternary quantization as well as their stochastic counterparts for parameters and activation.
    • Add auto scaling for low-bitwidth quantization.
    • Add jupyter notebook.

    Thanks to our Contributors

    This release contains contributions from many people at Google.

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Mar 11, 2020)

    Major Features

    • Use Tensorflow 2.1+ and tf.keras.
      • QKeras does not support the standalone Keras anymore.
      • Use Python 3.
    • Support APIs of pruning and PrunableLayer from tensorflow_model_optimization for model sparsity.
    • Add QBatchNormalization layer.

    Thanks to our Contributors

    This release contains contributions from many people at Google and CERN.

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Jan 12, 2020)

    QKeras 0.5.0 uses Tensorflow version < 2 and standalone Keras as backend.

    Major Features

    This is the first release of QKeras.

    Notes

    In the next release, we will support TensorFlow 2+ and tf.keras.

    Thanks to our Contributors

    This release contains contributions from many people at Google.

    Source code(tar.gz)
    Source code(zip)
Owner
Google
Google ❤️ Open Source