Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Overview

Introduction to Intel® Neural Compressor

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) is an open-source Python library that runs on Intel CPUs and GPUs. It provides unified interfaces across multiple deep learning frameworks for popular network compression techniques such as quantization, pruning, and knowledge distillation. The tool supports automatic, accuracy-driven tuning strategies that help users quickly find the best quantized model. It also implements several weight pruning algorithms that generate a pruned model with a predefined sparsity goal, and it supports knowledge distillation from a teacher model to a student model.

Note

GPU support is under development.

Visit the Intel® Neural Compressor online documentation website at: https://intel.github.io/neural-compressor.

Architecture

Intel® Neural Compressor features an infrastructure and workflow that help increase performance and speed up deployment across architectures.

Infrastructure

[Figure: Infrastructure]

Workflow

[Figure: Workflow]

Supported Frameworks

Supported deep learning frameworks are TensorFlow, PyTorch, MXNet, and ONNX Runtime.

Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running Neural Compressor quantization or deploying the quantized model.

Note: Starting with the official TensorFlow 2.6.0, oneDNN support has been upstreamed. Download the official TensorFlow 2.6.0 binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running the quantization process or deploying the quantized model.
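
The two notes above can also be applied from Python, as long as the variables are set before TensorFlow is imported. The following is a minimal sketch: the helper names `onednn_env_for` and `apply_env` are hypothetical, and treating the notes as applying only to exactly 2.5.0 (Intel Optimized) and 2.6.0 (official) is an assumption taken from their wording.

```python
import os


def onednn_env_for(tf_version: str, intel_optimized: bool) -> dict:
    """Return the environment variables the notes above call for.

    Hypothetical helper: only the two cases named in the notes are handled;
    every other combination returns an empty dict.
    """
    if intel_optimized and tf_version == "2.5.0":
        return {"TF_ENABLE_MKL_NATIVE_FORMAT": "0"}
    if not intel_optimized and tf_version == "2.6.0":
        return {"TF_ENABLE_ONEDNN_OPTS": "1"}
    return {}


def apply_env(settings: dict) -> None:
    # Set the variables in the current process; do this before
    # `import tensorflow` and before quantization starts.
    os.environ.update(settings)


# Example: official TensorFlow 2.6.0 on CPU.
apply_env(onednn_env_for("2.6.0", intel_optimized=False))
```

Setting the variables in the shell that launches the script has the same effect; the Python form is convenient in notebooks.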

Installation

Select the installation based on your operating system.

Linux Installation

You can install Neural Compressor using one of three options: install just the library from binary, install it from source, or get the Intel-optimized framework together with the library by installing the Intel® oneAPI AI Analytics Toolkit.

Prerequisites

The following prerequisites and requirements must be satisfied for a successful installation:

  • Python version: 3.6, 3.7, 3.8, or 3.9

  • C++ compiler: 7.2.1 or above

  • CMake: 3.12 or above
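
The prerequisites above can be checked programmatically before starting a build. This is a rough sketch using only the standard library; the function names are hypothetical, and the version strings passed to `version_at_least` would have to come from your own compiler and CMake (for example from their `--version` output).

```python
import sys

# Python minor versions listed in the prerequisites.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8), (3, 9)}


def python_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter's major.minor version is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def version_at_least(found: str, required: str) -> bool:
    """Compare dotted numeric versions, e.g. '3.16.3' against '3.12'."""
    a = [int(part) for part in found.split(".")]
    b = [int(part) for part in required.split(".")]
    width = max(len(a), len(b))
    # Pad with zeros so '3.12' and '3.12.0' compare as equal.
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


print("Python supported:", python_supported())
print("CMake 3.16.3 is new enough:", version_at_least("3.16.3", "3.12"))
```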

Common build issues

Issue 1: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Solution: Reinstall pycocotools with "pip install pycocotools --no-cache-dir".

Issue 2: ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Solution: Install OpenCV with your system package manager (apt install or yum install).

Option 1 Install from binary

# install stable version from pip
pip install neural-compressor

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor

# install stable version from conda
conda install neural-compressor -c conda-forge -c intel

Option 2 Install from source

git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
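
After installing from binary or from source, it can help to confirm that the interpreter can actually find the package. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `neural_compressor` is the package's import name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if the active environment can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("neural_compressor installed:", is_installed("neural_compressor"))
```

If this prints False inside a conda environment, check that the environment used for the install is the one that is currently active.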

Option 3 Install from AI Kit

The Intel® Neural Compressor library is released as part of the Intel® oneAPI AI Analytics Toolkit (AI Kit). The AI Kit provides a consolidated package of Intel's latest deep learning and machine learning optimizations in one place for ease of development. Along with Neural Compressor, the AI Kit includes Intel-optimized versions of deep learning frameworks (such as TensorFlow and PyTorch) and high-performing Python libraries to streamline end-to-end data science and AI workflows on Intel architectures.

The AI Kit is distributed through many common channels, including from Intel's website, YUM, APT, Anaconda, and more. Select and download the AI Kit distribution package that's best suited for you and follow the Get Started Guide for post-installation instructions.

  • Download AI Kit
  • AI Kit Get Started Guide

Windows Installation

Prerequisites

The following prerequisites and requirements must be satisfied for a successful installation:

  • Python version: 3.6, 3.7, 3.8, or 3.9

  • Download and install Anaconda.

  • Create a virtual environment named nc in Anaconda:

    # This example installs Python 3.7. You can also choose Python 3.6, 3.8, or 3.9.
    conda create -n nc python=3.7
    conda activate nc

Installation options

Option 1 Install from binary

# install stable version from pip
pip install neural-compressor

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor

# install from conda
conda install neural-compressor -c conda-forge -c intel 

Option 2 Install from source

git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install

Documentation

Get Started

  • APIs explains Intel® Neural Compressor's API.
  • GUI provides a web-based UI service to make quantization easier.
  • Transform shows how to use Neural Compressor's built-in data processing and how to develop a custom data processing method.
  • Dataset shows how to use Neural Compressor's built-in datasets and how to develop a custom dataset.
  • Metric shows how to use Neural Compressor's built-in metrics and how to develop a custom metric.
  • Tutorial provides comprehensive instructions, with examples, on how to use Neural Compressor's features.
  • Examples demonstrate the usage of Neural Compressor in different frameworks: TensorFlow, PyTorch, MXNet, and ONNX Runtime.
  • Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.
  • AI and Analytics Samples includes code samples for Intel oneAPI libraries.

Deep Dive

  • Quantization performs inference and training computations in low-precision data types, such as fixed-point integers. Neural Compressor supports Post-Training Quantization (PTQ) with different quantization capabilities and Quantization-Aware Training (QAT). Note that Dynamic Quantization currently has limited support.
  • Pruning provides a common method for introducing sparsity in weights and activations.
  • Knowledge Distillation provides a common method for distilling knowledge from a teacher model to a student model.
  • Distributed Training shows how to use Horovod for multi-node training in Intel® Neural Compressor to reduce training time.
  • Benchmarking shows how to use the benchmark interface of Neural Compressor.
  • Mixed Precision shows how to enable mixed precision, including BF16, INT8, and FP32, on Intel platforms during tuning.
  • Graph Optimization shows how to enable graph optimization for FP32 and auto-mixed precision.
  • Model Conversion shows how to convert a TensorFlow QAT model to a quantized model that runs on Intel platforms.
  • TensorBoard provides tensor histograms and execution graphs for debugging during tuning.
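
To make the quantization entry above concrete, here is a small, library-free sketch of symmetric min-max quantization to signed 8-bit integers with one scale per tensor. It illustrates the general technique only and is not Neural Compressor's implementation; the function names and sample weights are made up for the example.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one per-tensor scale (min-max, symmetric)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # illustrative values
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the source of the accuracy loss that tuning tries to keep within bounds.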

Advanced Topics

  • Execution Engine is a bare-metal solution for domain-specific NLP models, provided as a reference for customers.
  • Adaptor is the interface between components and the framework. The method for developing an adaptor extension is introduced with ONNX Runtime as an example.
  • Strategy automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, within the expected accuracy criteria. The method for developing a new strategy is also introduced.
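
The idea behind an accuracy-driven strategy can be sketched as a loop that tries candidate configurations until one meets the accuracy criterion or the trial budget runs out. This is a simplified illustration, not one of Neural Compressor's actual strategies; the `tune` function, the candidate names, and the accuracy numbers are all hypothetical.

```python
def tune(candidates, evaluate, baseline_accuracy, max_relative_loss=0.01, max_trials=100):
    """Return (config, accuracy) for the first candidate within the allowed
    relative accuracy loss, or None if the trial budget is exhausted."""
    for trial, config in enumerate(candidates, start=1):
        if trial > max_trials:
            break
        accuracy = evaluate(config)
        relative_loss = (baseline_accuracy - accuracy) / baseline_accuracy
        if relative_loss <= max_relative_loss:
            return config, accuracy
    return None


# Made-up accuracies standing in for real evaluation runs.
fake_results = {
    "all_int8": 0.740,
    "conv_int8_matmul_fp32": 0.758,
    "all_fp32": 0.765,
}
candidates = list(fake_results)

best = tune(candidates, fake_results.get, baseline_accuracy=0.765)
print(best)
```

Here the first candidate loses about 3.3% relative accuracy and is rejected, while the second loses about 0.9% and is accepted. When no candidate qualifies within the budget, the function returns None.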

Publications

For details, refer to the full publication list.

System Requirements

Intel® Neural Compressor supports systems based on Intel 64 architecture or compatible processors, specially optimized for the following CPUs:

  • Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Ice Lake)
  • Future Intel Xeon Scalable processor (code name Sapphire Rapids)

Intel® Neural Compressor requires installing the Intel-optimized framework version for the supported DL framework you use: TensorFlow, PyTorch, MXNet, or ONNX Runtime.

Note: Intel Neural Compressor supports Intel-optimized and official frameworks for some TensorFlow versions. Refer to Supported Frameworks for specifics.

Validated Hardware/Software Environment

  • Platform: Cascade Lake, Cooper Lake, Skylake, Ice Lake
  • OS: CentOS 8.3, Ubuntu 18.04
  • Python: 3.6, 3.7, 3.8, 3.9
  • TensorFlow: 2.7.0, 2.6.2, 2.5.0, 1.15.0UP3
  • PyTorch: 1.10.0+cpu, 1.9.0+cpu, 1.8.0+cpu, IPEX
  • MXNet: 1.8.0, 1.7.0, 1.6.0
  • ONNX Runtime: 1.9.0, 1.8.0, 1.7.0

Validated Models

Intel® Neural Compressor provides numerous examples that show minimal accuracy loss together with the best performance gain. A full list of quantized models on various frameworks is available in the Model List.

Validated MLPerf Models

Model Framework Support Example
ResNet50 v1.5 TensorFlow Yes Link
ResNet50 v1.5 PyTorch Yes Link
DLRM PyTorch Yes Link
BERT-large TensorFlow Yes Link
BERT-large PyTorch Yes Link
SSD-ResNet34 TensorFlow Yes Link
SSD-ResNet34 PyTorch Yes Link
RNN-T PyTorch Yes Link
3D-UNet TensorFlow WIP
3D-UNet PyTorch Yes Link

Validated Quantized Models

Columns: Framework, version, model; then Accuracy as INT8, FP32, Acc Ratio [(INT8-FP32)/FP32]; then Performance (ICX8380, 1s4c10ins1bs, throughput in samples/sec) as INT8, FP32, Performance Ratio [INT8/FP32].
tensorflow 2.6.0 resnet50v1.0 74.11% 74.27% -0.22% 1287.00 495.29 2.60x
tensorflow 2.6.0 resnet50v1.5 76.82% 76.46% 0.47% 1218.03 420.34 2.90x
tensorflow 2.6.0 resnet101 77.50% 76.45% 1.37% 849.62 345.54 2.46x
tensorflow 2.6.0 inception_v1 70.48% 69.74% 1.06% 2202.64 1058.20 2.08x
tensorflow 2.6.0 inception_v2 74.36% 73.97% 0.53% 1751.31 827.81 2.11x
tensorflow 2.6.0 inception_v3 77.28% 76.75% 0.69% 868.06 384.17 2.26x
tensorflow 2.6.0 inception_v4 80.40% 80.27% 0.16% 569.48 197.28 2.89x
tensorflow 2.6.0 inception_resnet_v2 80.44% 80.40% 0.05% 269.03 137.25 1.96x
tensorflow 2.6.0 mobilenetv1 71.79% 70.96% 1.17% 3831.42 1189.06 3.22x
tensorflow 2.6.0 mobilenetv2 71.79% 71.76% 0.04% 2570.69 1237.62 2.07x
tensorflow 2.6.0 ssd_resnet50_v1 37.86% 38.00% -0.37% 65.52 24.01 2.73x
tensorflow 2.6.0 ssd_mobilenet_v1 22.97% 23.13% -0.69% 842.46 404.04 2.08x
tensorflow 2.6.0 ssd_resnet34 21.69% 22.09% -1.81% 41.23 10.75 3.83x

Columns: Framework, version, model; then Accuracy as INT8, FP32, Acc Ratio [(INT8-FP32)/FP32]; then Performance (ICX8380, 1s4c10ins1bs, throughput in samples/sec) as INT8, FP32, Performance Ratio [INT8/FP32].
pytorch 1.9.0+cpu resnet18 69.59% 69.76% -0.24% 692.04 363.64 1.90x
pytorch 1.9.0+cpu resnet50 76.00% 76.13% -0.17% 453.10 186.67 2.43x
pytorch 1.9.0+cpu resnext101_32x8d 79.02% 79.31% -0.36% 196.27 70.08 2.80x
pytorch 1.9.0+cpu bert_base_mrpc 88.12% 88.73% -0.69% 199.32 107.34 1.86x
pytorch 1.9.0+cpu bert_base_cola 59.06% 58.84% 0.37% 198.53 105.29 1.89x
pytorch 1.9.0+cpu bert_base_sts-b 88.72% 89.27% -0.62% 203.29 107.03 1.90x
pytorch 1.9.0+cpu bert_base_sst-2 91.74% 91.86% -0.13% 197.86 105.31 1.88x
pytorch 1.9.0+cpu bert_base_rte 70.40% 69.68% 1.04% 192.90 107.25 1.80x
pytorch 1.9.0+cpu bert_large_mrpc 87.66% 88.33% -0.75% 94.08 33.84 2.78x
pytorch 1.9.0+cpu bert_large_squad 92.69 93.05 -0.38% 20.93 11.18 1.87x
pytorch 1.9.0+cpu bert_large_qnli 91.12% 91.82% -0.76% 93.75 33.73 2.78x
pytorch 1.9.0+cpu bert_large_rte 72.20% 72.56% -0.50% 52.80 33.62 1.57x
pytorch 1.9.0+cpu bert_large_cola 62.07% 62.57% -0.80% 94.97 33.77 2.81x
pytorch 1.9.0+cpu inception_v3 69.48% 69.54% -0.09% 418.59 207.77 2.01x
pytorch 1.9.0+cpu peleenet 71.61% 72.08% -0.66% 461.47 359.58 1.28x
pytorch 1.9.0+cpu yolo_v3 24.50% 24.54% -0.17% 98.11 37.50 2.62x
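
The two ratio columns follow directly from the formulas in the table headers. As a worked example, the sketch below recomputes them for the resnet50v1.0 row of the TensorFlow table; the function names are illustrative only.

```python
def accuracy_ratio(int8_acc, fp32_acc):
    """Relative accuracy change: (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput, fp32_throughput):
    """Throughput speedup: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# tensorflow 2.6.0 resnet50v1.0: accuracy 74.11% vs 74.27%,
# throughput 1287.00 vs 495.29 samples/sec.
acc = accuracy_ratio(74.11, 74.27)
perf = performance_ratio(1287.00, 495.29)
print(f"{acc:.2%}", f"{perf:.2f}x")  # -0.22% 2.60x
```

A negative accuracy ratio means the INT8 model is slightly less accurate than the FP32 baseline; a positive one means it scored slightly higher.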

Validated Pruning Models

Columns: Tasks, FWK, Model, fp32 baseline; then accuracy%, drop%, perf gain (sample/s) for gradient sensitivity with 20% sparsity; then accuracy%, drop%, perf gain (sample/s) for +onnx dynamic quantization on pruned model.
SST-2 pytorch bert-base accuracy = 92.32 accuracy = 91.97 -0.38 1.30x accuracy = 92.20 -0.13 1.86x
QQP pytorch bert-base [accuracy, f1] = [91.10, 88.05] [accuracy, f1] = [89.97, 86.54] [-1.24, -1.71] 1.32x [accuracy, f1] = [89.75, 86.60] [-1.48, -1.65] 1.81x

Columns: Tasks, FWK, Model, fp32 baseline; then accuracy%, drop% for Pattern Lock on 70% Unstructured Sparsity; then accuracy%, drop% for Pattern Lock on 50% 1:2 Structured Sparsity.
MNLI pytorch bert-base [m, mm] = [84.57, 84.79] [m, mm] = [82.45, 83.27] [-2.51, -1.80] [m, mm] = [83.20, 84.11] [-1.62, -0.80]
SST-2 pytorch bert-base accuracy = 92.32 accuracy = 91.51 -0.88 accuracy = 92.20 -0.13
QQP pytorch bert-base [accuracy, f1] = [91.10, 88.05] [accuracy, f1] = [90.48, 87.06] [-0.68, -1.12] [accuracy, f1] = [90.92, 87.78] [-0.20, -0.31]
QNLI pytorch bert-base accuracy = 91.54 accuracy = 90.39 -1.26 accuracy = 90.87 -0.73
QnA pytorch bert-base [em, f1] = [79.34, 87.10] [em, f1] = [77.27, 85.75] [-2.61, -1.54] [em, f1] = [78.03, 86.50] [-1.65, -0.69]

Columns: Framework, Model, fp32 baseline, Compression, dataset, acc(drop)%.
Pytorch resnet18 69.76 30% sparsity on magnitude ImageNet 69.47(-0.42)
Pytorch resnet18 69.76 30% sparsity on gradient sensitivity ImageNet 68.85(-1.30)
Pytorch resnet50 76.13 30% sparsity on magnitude ImageNet 76.11(-0.03)
Pytorch resnet50 76.13 30% sparsity on magnitude and post training quantization ImageNet 76.01(-0.16)
Pytorch resnet50 76.13 30% sparsity on magnitude and quantization aware training ImageNet 75.90(-0.30)
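
The "sparsity on magnitude" entries above refer to magnitude-based pruning: the weights with the smallest absolute values are set to zero until the target sparsity is reached. The sketch below shows the idea on a flat list of weights. It is a simplified illustration with made-up values, not the pruning code used to produce these results.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that roughly `sparsity`
    (a fraction between 0 and 1) of them become zero."""
    num_to_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.6, 0.1, -0.8]
pruned = magnitude_prune(weights, sparsity=0.3)
print(pruned)
```

With ten weights and 30% sparsity, the three smallest-magnitude entries (0.01, -0.05, and 0.1) are zeroed and the rest are left unchanged.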

Validated Knowledge Distillation Examples

Columns: Example Name, Dataset, Student (Accuracy), Teacher (Accuracy), Student With Distillation (Accuracy Improvement).
ResNet example ImageNet ResNet18 (0.6739) ResNet50 (0.7399) 0.6845 (0.0106)
BlendCnn example MRPC BlendCnn (0.7034) BERT-Base (0.8382) 0.7034 (0)
BiLSTM example SST-2 BiLSTM (0.7913) RoBERTa-Base (0.9404) 0.8085 (0.0172)
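
For readers unfamiliar with how a student learns from a teacher, a common formulation is the soft-target loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by the square of the temperature. The sketch below implements that general formulation in plain Python. Whether the examples above use exactly this loss or these hyperparameters is not stated here, so treat the function names, the temperature, and the logits as assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # illustrative logits
student = [2.5, 1.5, 0.5]
print(distillation_loss(student, teacher))
```

The loss is zero when the student's logits match the teacher's and grows as the two softened distributions diverge. In practice it is usually combined with the ordinary label loss.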

Validated Engine Examples

Columns: model; then Accuracy as INT8, FP32, Acc Ratio [(INT8-FP32)/FP32]; then Performance (ICX8380, 1s4c10ins1bs, seq_len128, throughput in samples/sec) as INT8, FP32, Performance Ratio [INT8/FP32]; then Performance (ICX8380, 2s4c20ins64bs, seq_len128, throughput in samples/sec) as INT8, FP32, Performance Ratio [INT8/FP32].
bert_large_squad 90.74 90.87 -0.14% 44.9 12.33 3.64x 362.21 88.38 4.10x
distilbert_base_uncased_sst2 90.14% 90.25% -0.12% 1003.01 283.69 3.54x 2104.26 606.58 3.47x
minilm_l6_h384_uncased_sst2 89.33% 90.14% -0.90% 2739.73 999 2.74x 5389.98 2333.14 2.31x
roberta_base_mrpc 89.46% 88.97% 0.55% 506.07 142.13 3.56x 1167.09 311.5 3.75x
bert_base_nli_mean_tokens_stsb 89.27% 89.55% -0.31% 503.52 140.98 3.57x 1096.46 332.54 3.30x
bert_base_sparse_mrpc 70.34% 70.59% -0.35% 506.59 142.33 3.56x 1133.04 339.96 3.33x
distilroberta_base_wnli 56.34% 56.34% 0.00% 1026.69 290.7 3.53x 2309.9 620.81 3.72x
paraphrase_xlm_r_multilingual_v1_stsb 86.72% 87.23% -0.58% 509.68 142.73 3.57x 1169.45 311.59 3.75x
distilbert_base_uncased_mrpc 84.07% 84.07% 0.00% 1002 280.27 3.58x 2107.96 606.95 3.47x
finbert_financial_phrasebank 82.74% 82.80% -0.07% 919.12 272.48 3.37x 1101.13 331.88 3.32x
distilbert_base_uncased_emotion 93.85% 94.20% -0.37% 1003.01 283.53 3.54x 2103.22 607.08 3.46x

Additional Content

Hiring

We are hiring. Please send your resume to [email protected] if you are interested in model compression techniques.

Comments
  • Issue on how to get started

    Hello, the main page gives instructions for getting started. I put them in a notebook, but they don't work. With pip install tensorflow, as you wrote, I got ValueError: Please install Intel® Optimizations for TensorFlow or MKL enabled TensorFlow from source code within version >=1.14.0 and <=2.8.0. 2022-05-20 14:25:56 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.

    The guide at https://www.intel.com/content/www/us/en/developer/articles/guide/optimization-for-tensorflow-installation-guide.html says to run pip install intel-tensorflow==2.8.0, but then another error appears: 2022-05-20 14:31:58 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit. Could you please be more precise so that the results are reproducible? See the attached default_netcompressor.zip. Thanks

    opened by danilopau 28
  • [Tensorflow QAT] AttributeError: 'NoneType' object has no attribute 'graph_def'

    Environment: Google Colab
    LPOT Version: 1.6
    Tensorflow Version: Official 2.6.0 (with environment variables set as below)
    TF_ENABLE_ONEDNN_OPTS=1
    TF_ENABLE_MKL_NATIVE_FORMAT=0

    I basically followed the QAT example provided here. I took a pretrained model, annotated it so that only Conv2D layers are quantized, trained the annotated model with model.fit() for several epochs, and saved it. After that, I used LPOT ModelConversion to convert the model, and the following error occurred:

    2021-09-10 03:07:43 [INFO] Pass Quantization elapsed time: 7581.68 ms
    2021-09-10 03:07:44 [INFO] Pass FreezeFakeQuantOpOptimizer elapsed time: 283.8 ms
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 534, in quantize
        self._fuse_requantize_with_fused_quantized_node()
      File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 698, in _fuse_requantize_with_fused_quantized_node
        self.device).do_transformation()
      File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/int8/fuse_conv_requantize.py", line 47, in __init__
        self.graph_info = self.graph_analyzer.parse_graph()
      File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/graph_util.py", line 611, in parse_graph
        each_input)].outputs.append(node_name)
    KeyError: 'model_3/quant_31/StatefulPartitionedCall/StatefulPartitionedCall/MovingAvgQuantize/FakeQuantWithMinMaxVars'
    2021-09-10 03:07:44 [ERROR] Fail to quantize graph due to 'model_3/quant_31/StatefulPartitionedCall/StatefulPartitionedCall/MovingAvgQuantize/FakeQuantWithMinMaxVars'.
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-3-515087c4513a> in <module>()
          4 conversion.destination = 'default'
          5 conversion.model = common.Model('./q_aware_model')
    ----> 6 q_model = conversion()
          7 q_model.save('quantized_model')
    
    2 frames
    /usr/local/lib/python3.7/dist-packages/lpot/experimental/model_conversion.py in __call__(self)
         94 
         95         self.adaptor = FRAMEWORKS[self.framework](framework_specific_info)
    ---> 96         q_model = self.adaptor.convert(self._model, self._source, self._destination)
         97 
         98         # when eval_func is None but metric or _eval_dataloader is set by yaml or code,
    
    /usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py in convert(self, model, source, destination)
        814                                    fake_quant=True)
        815 
    --> 816         return converter.convert()
        817 
        818     @dump_elapsed_time("Pass recover model")
    
    /usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py in convert(self)
        247         if len(self.bf16_ops) > 0:
        248             model = self.bf16_convert()
    --> 249         post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
        250         post_cse_graph_def.library.CopyFrom(self.model.graph_def.library)
        251         model.graph_def = post_cse_graph_def
    
    AttributeError: 'NoneType' object has no attribute 'graph_def'
    

    My original code (simplified):

    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    model = tf.keras.models.load_model('model')

    def apply_quantization_to_Conv2D(layer):
        # Annotate only Conv2D layers for quantization-aware training.
        if isinstance(layer, tf.keras.layers.Conv2D):
            return tfmot.quantization.keras.quantize_annotate_layer(layer)
        return layer

    annotated_model = tf.keras.models.clone_model(model, clone_function=apply_quantization_to_Conv2D)

    q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
    q_aware_model.summary()

    # X_q, X_norm_q, and y_q are the training inputs and targets
    # (omitted from this simplified code).
    q_aware_model.compile(optimizer='adam', loss='mse')
    q_aware_model.fit(x=[X_q, X_norm_q], y=y_q,
                      batch_size=64,
                      epochs=45)
    q_aware_model.save('./q_aware_model')

    # Convert the saved QAT model with LPOT ModelConversion.
    from lpot.experimental import ModelConversion, common
    conversion = ModelConversion()
    conversion.source = 'QAT'
    conversion.destination = 'default'
    conversion.model = common.Model('./q_aware_model')
    q_model = conversion()
    q_model.save('quantized_model')
    

    Please find model here. Thanks!

    opened by peiwenhuang27 25
  • Enable Transformer LT search space for Dynamic Neural Architecture Search Toolkit

    Type of Change

    Feature. Extends current Dynamic Neural Architecture Search Toolkit set of supported search spaces with Transformer LT super-network for En-De language translation.

    Description

    DyNAS-T (Dynamic Neural Architecture Search Toolkit) is a SuperNet NAS optimization package (part of Intel Neural Compressor) designed for finding the optimal Pareto front during neural architecture search while minimizing the number of search validation measurements. It supports single-, multi-, and many-objective problems for a variety of supported domains. The system currently relies heavily on the pymoo optimization library. Some of the key DyNAS-T features are:

    • Automatic handling of super-network parameters for search and predictor training
    • Genetic Algorithm (e.g., NSGA-II) multi-objective subnetwork search
    • LINAS (Lightweight Iterative Neural Architecture Search) accelerated search using approximate predictors
    • Warm-start (transfer) search
    • Search population statistical analysis

    This PR extends supported search spaces with Transformer-based Language Translation (transformer_lt_wmt_en_de) for English and German languages. Implementation of the supernet is based on Hardware Aware Transformers (HAT) by MIT HAN Lab.

    How has this PR been tested?

    To run an example, trained supernet weights and the preprocessed WMT En-De dataset are needed. Both can be downloaded from the Hardware Aware Transformers (HAT) repository.

    • Script to download preprocessed dataset: link
    • Download trained supernet weights: link

    Example code to test new functionality:

    # Import paths assumed from the Neural Compressor NAS API.
    from neural_compressor.conf.config import NASConfig
    from neural_compressor.experimental.nas import NAS

    # Search the Transformer LT supernet with NSGA-II.
    config = NASConfig(approach='dynas', search_algorithm='nsga2')
    config.dynas.supernet = 'transformer_lt_wmt_en_de'
    config.seed = 42
    config.dynas.metrics = ['acc', 'macs']

    config.dynas.population = 50
    config.dynas.num_evals = 500
    config.dynas.batch_size = 64
    config.dynas.results_csv_path = 'results.csv'
    config.dynas.dataset_path = '/datasets/hat_dataset/data/binary/wmt16_en_de'
    config.dynas.supernet_ckpt_path = '/datasets/hat_dataset/HAT_wmt14ende_super_space0.pt'
    agent = NAS(config)
    results = agent.search()
    

    Dependency Change?

    • fairseq
    • sacremoses
    • torchprofile

    Signed-off-by: Maciej Szankin [email protected]

    new feature 
    opened by macsz 21
  • Refine Keras Examples for INC New API

    Type of Change

    feature or bug fix or documentation or validation or others
    API changed or not: No

    Description

    detail description JIRA ticket: https://jira.devtools.intel.com/browse/ILITV-2451

    Expected Behavior & Potential Risk

    the expected behavior that triggered by this PR : refine examples

    How has this PR been tested?

    how to reproduce the test (including hardware information): extension

    Dependency Change?

    any library dependency introduced or removed: No

    examples extension test 
    opened by zehao-intel 17
  • [Tensorflow] ops not quantized

    Framework: Tensorflow 2.6.0 LPOT: 1.6.0

    When I printed out the tune_cfg() in strategy.py:

    ### op_cfgs ###
    ('model/dense_5/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/dense_5/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/dense_4/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/dense_4/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/dense_3/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/dense_3/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/dense_1/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/dense_1/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/dense_2/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/dense_2/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/dense/Tensordot/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
    ('model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
    ('model/dense/Tensordot/concat_1', 'concat')
    {'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
    ('model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul_1', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
    ('model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
    ('model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul_1', 'matmul')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
    ('model/52/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/51/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/42/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/41/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/32/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/31/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/2/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    ('model/1/Conv2D', 'conv2d')
    {'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
    
    
    ### dispatched_op_names ###
    ['model/dense_5/Tensordot/MatMul', 'model/dense_5/Tensordot/concat_1', 'model/dense_4/Tensordot/MatMul', 'model/dense_4/Tensordot/concat_1', 'model/dense_3/Tensordot/MatMul', 'model/dense_3/Tensordot/concat_1', 'model/dense_1/Tensordot/MatMul', 'model/dense_1/Tensordot/concat_1', 'model/dense_2/Tensordot/MatMul', 'model/dense_2/Tensordot/concat_1', 'model/dense/Tensordot/MatMul', 'model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul', 'model/dense/Tensordot/concat_1', 'model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul_1', 'model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul', 'model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul_1', 'model/52/Conv2D', 'model/51/Conv2D', 'model/42/Conv2D', 'model/41/Conv2D', 'model/32/Conv2D', 'model/31/Conv2D', 'model/2/Conv2D', 'model/1/Conv2D']
    ### invalid_op_names ###
    []
    
    
    
    2021-08-27 08:43:41 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
    2021-08-27 08:43:53 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
    2021-08-27 08:44:01.108141: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
    2021-08-27 08:44:01.108428: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
    2021-08-27 08:44:01.156499: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1137] Optimization results for grappler item: graph_to_optimize
      function_optimizer: Graph size after: 974 nodes (370), 1237 edges (530), time = 17.966ms.
      function_optimizer: function_optimizer did nothing. time = 0.805ms.
    
    2021-08-27 08:44:02.385658: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
    2021-08-27 08:44:02.385886: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
    2021-08-27 08:44:02.657347: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1137] Optimization results for grappler item: tf_graph
      constant_folding: Graph size after: 782 nodes (-96), 919 edges (-118), time = 150.555ms.
      constant_folding: Graph size after: 782 nodes (0), 919 edges (0), time = 37.266ms.
    
    2021-08-27 08:44:05 [INFO] Pass Quantization elapsed time: 2325.7 ms
    2021-08-27 08:44:38 [INFO] Pass QuantizedRNNConverter elapsed time: 57.53 ms
    2021-08-27 08:44:39 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 168.84 ms
    2021-08-27 08:44:39 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 57.83 ms
    2021-08-27 08:44:39 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 57.06 ms
    2021-08-27 08:44:39 [INFO] Pass MetaOpOptimizer elapsed time: 54.55 ms
    2021-08-27 08:44:39 [WARNING] Node name unused_control_flow_input_20 specified in yaml doesn't exist in the model.
    2021-08-27 08:44:39 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
    2021-08-27 08:44:41 [INFO] Pass PostCseOptimizer elapsed time: 1593.45 ms
    2021-08-27 08:44:41 [INFO] |********Mixed Precision Statistics*******|
    2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
    2021-08-27 08:44:41 [INFO] |    Op Type    |  Total  |  INT8 |  FP32 |
    2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
    2021-08-27 08:44:41 [INFO] |     Conv2D    |    8    |   0   |   8   |
    2021-08-27 08:44:41 [INFO] |     MatMul    |    10   |   6   |   4   |
    2021-08-27 08:44:41 [INFO] |    ConcatV2   |    6    |   0   |   6   |
    2021-08-27 08:44:41 [INFO] |   QuantizeV2  |    6    |   6   |   0   |
    2021-08-27 08:44:41 [INFO] |   Dequantize  |    1    |   1   |   0   |
    2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
    2021-08-27 08:44:41 [INFO] Pass quantize model elapsed time: 73892.89 ms
    2021-08-27 08:44:41 [INFO] Start to evaluate the TensorFlow model.
    2021-08-27 08:46:07 [INFO] Tune 1 result is: [accuracy: 0.3118, duration (seconds): 86.5451], Best tune result is: [accuracy: 0.3118, duration (seconds): 86.5451]
    

    First, Conv2D and MatMul seem to be set to quantize to int8, but in the mixed precision statistics Conv2D is still entirely in fp32 format. My main focus is to speed up Conv2D computation, but I cannot find the reason why it stays unquantized. Is this because the pattern is unmatched? Originally, my convolutional layer was paired with a leaky ReLU. I also tried using ReLU, or no activation at all, but Conv2D is still not quantized.
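    One way to see which op types stayed in FP32 is to parse the "Mixed Precision Statistics" table from the tuning log. The following is a minimal sketch and not part of Neural Compressor: the helper name `fp32_only_op_types` and the regular expression are assumptions based only on the log format shown above.

```python
import re

# Matches table rows such as "|     Conv2D    |    8    |   0   |   8   |".
# The header row does not match because its columns are not numeric.
ROW = re.compile(r"\|\s*(\w+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|")

def fp32_only_op_types(log_text):
    """Return op types that have FP32 instances but no INT8 instances."""
    result = []
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match is None:
            continue  # header, border, or unrelated log line
        op_type, _total, int8, fp32 = match.groups()
        if int(int8) == 0 and int(fp32) > 0:
            result.append(op_type)
    return result

# Rows copied from the log above.
sample_log = """\
2021-08-27 08:44:41 [INFO] |    Op Type    |  Total  |  INT8 |  FP32 |
2021-08-27 08:44:41 [INFO] |     Conv2D    |    8    |   0   |   8   |
2021-08-27 08:44:41 [INFO] |     MatMul    |    10   |   6   |   4   |
2021-08-27 08:44:41 [INFO] |    ConcatV2   |    6    |   0   |   6   |
2021-08-27 08:44:41 [INFO] |   QuantizeV2  |    6    |   6   |   0   |
"""
print(fp32_only_op_types(sample_log))
```

    For the log above, this flags Conv2D and ConcatV2 as the op types with no INT8 instances, which matches the observation in the issue.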

    Please find my model link here

    opened by peiwenhuang27 17
  • Quantization AssertionError

    Quantization AssertionError "inputs len must equal with input_tensor"

    I'm following the TensorFlow BERT MRPC example to run the neural compressor with a saved model that I exported after fine tuning BERT from the Intel Model Zoo using the IMDB movie review sentiment analysis dataset. The training task for this was "cola" instead of "mrpc", but I still used run_classifier.py to train the model.
    I used the same Dataset class definition and collate_fn from the example and my yaml has:

    model:
      name: bert
      framework: tensorflow
      inputs: input_file, batch_size
      outputs: Cast_147:0, loss/Mean:0, loss/Neg:0, loss/Cast:0
    

    My python code looks like this:

    from neural_compressor.metric import METRICS
    class Accuracy(object):
        def __init__(self):
            self.metric = METRICS('tensorflow')['Accuracy']()
              
        # it's ugly that the label is in the iterator
        def update(self, preds, label):
            logits, labels = preds
            self.metric.update(logits, labels)
    
        def reset(self):
            self.metric.reset()
    
        def result(self):
            return self.metric.result()
    
    # Using run_classifier from the Intel model zoo
    from run_classifier import file_based_input_fn_builder
    
    eval_file = os.path.join(output_dir, "eval.tf_record")
    estimator_input_fn = file_based_input_fn_builder(
              input_file=eval_file,
              seq_length=max_seq_length,
              is_training=False,
              drop_remainder=False)
    
    quantizer.model = common.Model(os.path.join(output_dir, "frozen"), input_fn=estimator_input_fn)
    quantizer.calib_dataloader = common.DataLoader(dataset, collate_fn=collate_fn)
    quantizer.eval_dataloader = common.DataLoader(dataset, collate_fn=collate_fn)
    
    quantizer.metric = common.Metric(metric_cls=Accuracy, name="bert_metric")
    q_model = quantizer()
    

    This is failing with the following error:

    2021-10-11 20:32:11 [INFO] Start to evaluate the TensorFlow model.
    2021-10-11 20:32:11 [ERROR] Unexpected exception AssertionError('inputs len must equal with input_tensor') happened during tuning.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/quantization.py", line 151, in execute
        self.strategy.traverse()
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 307, in traverse
        self.baseline = self._evaluate(self.model)
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 446, in _evaluate
        val = self.objective.evaluate(eval_func, model)
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/objective.py", line 213, in evaluate
        acc = eval_func(model)
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/create_obj_from_config.py", line 132, in eval_func
        return adaptor.evaluate(model, dataloader, postprocess,
      File "/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tensorflow.py", line 291, in evaluate
        assert len(input_tensor) == len(inputs), \
    AssertionError: inputs len must equal with input_tensor
    

    The input_tensor for my model looks like this, so I'm assuming that its length is 4. What is the other inputs object checked in the error above, and why does it have a different length than input_tensor?

    [<tf.Tensor 'input_mask:0' shape=(None, 128) dtype=int32>,
     <tf.Tensor 'input_ids:0' shape=(None, 128) dtype=int32>,
     <tf.Tensor 'label_ids:0' shape=(None,) dtype=int32>,
     <tf.Tensor 'segment_ids:0' shape=(None, 128) dtype=int32>]
    

    Any suggestions on how to resolve this issue?
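    The traceback shows that the failing check is `len(input_tensor) == len(inputs)` in the TensorFlow adaptor. The sketch below reproduces that kind of check in isolation. It assumes, without confirmation from the source, that `inputs` is whatever the dataloader yields per batch; the function `check_feed` and the sample values are hypothetical.

```python
def check_feed(input_tensor_names, batch_inputs):
    """Pair each graph input with one batch element, mirroring the assertion
    in the traceback. `batch_inputs` stands in for what a dataloader yields."""
    if not isinstance(batch_inputs, (list, tuple)):
        batch_inputs = [batch_inputs]  # a single array counts as one input
    assert len(input_tensor_names) == len(batch_inputs), \
        "inputs len must equal with input_tensor"
    return dict(zip(input_tensor_names, batch_inputs))

graph_inputs = ["input_mask:0", "input_ids:0", "label_ids:0", "segment_ids:0"]

# Four graph inputs but only three batch elements: the check fails.
try:
    check_feed(graph_inputs, (["mask"], ["ids"], ["segments"]))
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# One batch element per graph input: the check passes.
feed = check_feed(graph_inputs, (["mask"], ["ids"], ["labels"], ["segments"]))
print(mismatch_detected, sorted(feed))
```

    Under that assumption, comparing the number of input tensors in the graph with the number of elements the collate function returns per batch is a quick first diagnostic.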

    opened by dmsuehir 15
  • Refine TF NLP models with INC User NewAPI

    Type of Change

    Examples; API not changed

    Description

    JIRA ticket: https://jira.devtools.intel.com/browse/ILITV-2454

    Expected Behavior & Potential Risk

    Pre-CI Test pass; Extension Test pass

    How has this PR been tested?

    Pre-CI Test; Extension Test

    examples extension test 
    opened by ChendaLi-Intel 14
  • Update some PyTorch examples for new API

    Signed-off-by: Cheng, Penghui [email protected]

    Type of Change

    Update PyTorch examples (3d_unet, CNN-2, MobileNet_v2); no API changed

    Description

    JIRA ticket: https://jira.devtools.intel.com/secure/RapidBoard.jspa?rapidView=32982&projectKey=ILITV&view=detail&selectedIssue=ILITV-2354

    examples extension test 
    opened by PenghuiCheng 13
  • InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)

    Case 1

    Framework: Tensorflow 2.5.0, Intel-Tensorflow 2.5.0
    Environment: Google Colab

    I have a successfully quantized model that is to be run for inference without using LPOT API, so I wrote the following inference code:

    with tf.compat.v1.Session() as sess:
        tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
        output = sess.graph.get_tensor_by_name(output_tensor_name)
        predictions = sess.run(output, {input_tensor_name: x})
        mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
        print(mse.eval())
    

    When running the line predictions = sess.run(output, {input_tensor_name: x}):

    ---------------------------------------------------------------------------
    InternalError                             Traceback (most recent call last)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
       1374     try:
    -> 1375       return fn(*args)
       1376     except errors.OpError as e:
    
    7 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
       1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
    -> 1360                                       target_list, run_metadata)
       1361 
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
       1452                                             fetch_list, target_list,
    -> 1453                                             run_metadata)
       1454 
    
    InternalError: Missing 0-th output from {{node model/layer_1/Conv2D_eightbit_requantize}}
    
    During handling of the above exception, another exception occurred:
    
    InternalError                             Traceback (most recent call last)
    <ipython-input-6-2bddd853d111> in <module>()
          2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
          3     output = sess.graph.get_tensor_by_name(output_tensor_name)
    ----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
          5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
          6     print(mse.eval())
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
        966     try:
        967       result = self._run(None, fetches, feed_dict, options_ptr,
    --> 968                          run_metadata_ptr)
        969       if run_metadata:
        970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
       1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
       1190       results = self._do_run(handle, final_targets, final_fetches,
    -> 1191                              feed_dict_tensor, options, run_metadata)
       1192     else:
       1193       results = []
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
       1367     if handle is None:
       1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
    -> 1369                            run_metadata)
       1370     else:
       1371       return self._do_call(_prun_fn, handle, feeds, fetches)
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
       1392                     '\nsession_config.graph_options.rewrite_options.'
       1393                     'disable_meta_optimizer = True')
    -> 1394       raise type(e)(node_def, op, message)
       1395 
       1396   def _extend_graph(self):
    
    InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2) 
    

    This error happens with or without Intel-Tensorflow==2.5.0 installed, and it is not resolved when os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' is set explicitly.

    On the other hand, when I run the same code in VS Code with Python 3.6.8 64-bit base: Conda, it returns the same error message as in Case 2.

    Case 2

    Framework: Tensorflow 2.4.0, Intel-Tensorflow 2.4.0
    Environment: Google Colab

    This case works well and prints the MSE loss of the predictions. However, when I uninstall Intel-Tensorflow 2.4.0 and run it with official Tensorflow, the same line as in Case 1 (predictions = sess.run(output, {input_tensor_name: x})) fails with:

    ---------------------------------------------------------------------------
    InvalidArgumentError                      Traceback (most recent call last)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
       1374     try:
    -> 1375       return fn(*args)
       1376     except errors.OpError as e:
    
    7 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
       1357       # Ensure any changes to the graph are reflected in the runtime.
    -> 1358       self._extend_graph()
       1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
       1397     with self._graph._session_run_lock():  # pylint: disable=protected-access
    -> 1398       tf_session.ExtendSession(self._session)
       1399 
    
    InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
    Registered devices: [CPU]
    Registered kernels:
      <no registered kernels>
    
    	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]
    
    During handling of the above exception, another exception occurred:
    
    InvalidArgumentError                      Traceback (most recent call last)
    <ipython-input-6-2bddd853d111> in <module>()
          2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
          3     output = sess.graph.get_tensor_by_name(output_tensor_name)
    ----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
          5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
          6     print(mse.eval())
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
        966     try:
        967       result = self._run(None, fetches, feed_dict, options_ptr,
    --> 968                          run_metadata_ptr)
        969       if run_metadata:
        970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
       1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
       1190       results = self._do_run(handle, final_targets, final_fetches,
    -> 1191                              feed_dict_tensor, options, run_metadata)
       1192     else:
       1193       results = []
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
       1367     if handle is None:
       1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
    -> 1369                            run_metadata)
       1370     else:
       1371       return self._do_call(_prun_fn, handle, feeds, fetches)
    
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
       1392                     '\nsession_config.graph_options.rewrite_options.'
       1393                     'disable_meta_optimizer = True')
    -> 1394       raise type(e)(node_def, op, message)
       1395 
       1396   def _extend_graph(self):
    
    InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)  with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
    Registered devices: [CPU]
    Registered kernels:
      <no registered kernels>
    
    	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]
    

    The error persists even with os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' set explicitly.

    I believe both cases are caused by the same type of error, i.e. No OpKernel was registered to support Op ...

    I was given to understand that with official Tensorflow v2.5 installed and the environment variable TF_ENABLE_ONEDNN_OPTS=1 set (reference), the quantized model is supposed to run with oneDNN support. But that doesn't seem to be the case in either v2.4 or v2.5.
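    When the variable is set from Python rather than from the shell, the order of operations can matter. The sketch below assumes that TensorFlow reads TF_ENABLE_ONEDNN_OPTS when it is first loaded, so the variable is set before the import; the helper name `enable_onednn_opts` is hypothetical. It does not address whether a given TensorFlow build ships the quantized kernels at all.

```python
import os
import sys

def enable_onednn_opts(value="1"):
    """Set TF_ENABLE_ONEDNN_OPTS, refusing if TensorFlow is already imported."""
    if "tensorflow" in sys.modules:
        # Assumption: a value set after the import may not be picked up.
        raise RuntimeError("set TF_ENABLE_ONEDNN_OPTS before importing tensorflow")
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = value
    return os.environ["TF_ENABLE_ONEDNN_OPTS"]

if "tensorflow" not in sys.modules:
    enable_onednn_opts("1")
    # import tensorflow as tf  # import only after the variable is set
```

    In a notebook, this means the cell that sets the variable has to run before any cell that imports TensorFlow, which usually requires restarting the runtime first.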

    Not sure if this is the right place to post this issue, but I have nowhere else to report the problem as Intel-Tensorflow doesn't allow issue reporting and Tensorflow developers usually ignore issues dependent on other packages. Any hint is greatly appreciated, thank you.

    opened by peiwenhuang27 13
  • INC New API TF oob, wide_deep_large_ds, 3dunet-mlperf examples

    Signed-off-by: Lv, Liang1 [email protected]

    Type of Change

    Feature; API not changed

    Description

    detail description JIRA ticket: ILITV-2455

    Expected Behavior & Potential Risk

    INC New API TF enable oob, wide_deep_large_ds, 3dunet-mlperf examples

    How has this PR been tested?

    UT, Pre-CI and OOB extension test.

    Dependency Change?

    No.

    examples extension test 
    opened by lvliang-intel 12
  • New API ONNXRT example update

    Type of Change

    example

    Description

    update ONNXRT example for new API

    JIRA ticket: ILITV-2468

    How has this PR been tested?

    extension test on onnx models

    Dependency Change?

    no

    examples extension test 
    opened by yuwenzho 11
  • numpy feature test

    Type of Change

    feature test issue fix

    Description

    AttributeError: module 'numpy' has no attribute 'bool'

    Expected Behavior & Potential Risk

    feature test pass

    How has this PR been tested?

    feature test

    Dependency Change?

    no

    opened by ronggegu 0
  • unet fix yaml

    Type of Change

    bug fix

    Description

    Key 'version' error: (1.0) should evaluate to True. schema.SchemaError: (1.0) should evaluate to True

    Expected Behavior & Potential Risk

    model test pass

    How has this PR been tested?

    extension test

    Dependency Change?

    no

    opened by ronggegu 0
  • fix numpy bool

    Type of Change

    bug fix

    Description

    module 'numpy' has no attribute 'bool'

    Expected Behavior & Potential Risk

    model test pass

    How has this PR been tested?

    extension test

    Dependency Change?

    no

    opened by ronggegu 0
  • Enable Tensorflow image recognition saved model examples using new API

    Signed-off-by: Lv, Liang1 [email protected]

    Type of Change

    Feature; API not changed

    Description

    detail description JIRA ticket: ILITV-2598

    Expected Behavior & Potential Risk

    Tensorflow image recognition saved model examples using new API enabled successfully.

    How has this PR been tested?

    Pre-CI and examples test.

    Dependency Change?

    No.

    opened by lvliang-intel 0
  • Fix recommendation and speech_recognition README

    Signed-off-by: changwa1 [email protected]

    Type of Change

    improve readme. remove the extra code for dlrm ipex example.

    Description

    detail description JIRA ticket: xxx

    Expected Behavior & Potential Risk

    the expected behavior that triggered by this PR

    How has this PR been tested?

    how to reproduce the test (including hardware information)

    Dependency Change?

    any library dependency introduced or removed

    opened by changwangss 0
  • Doc revision for the newapi example

    Signed-off-by: XuhuiRen [email protected]

    Type of Change

    Revise the description for the examples

    Description

    https://jira.devtools.intel.com/browse/ILITV-2597

    opened by XuhuiRen 0
Releases(v2.0)
  • v2.0(Dec 30, 2022)

    • Highlights
    • Features
    • Bug Fixes
    • Examples
    • Documentation

    Highlights

    • Support the quantization for Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs & GPUs
    • Provide the new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.)
    • Support the advanced fine-grained auto mixed precisions (AMP) upon all the supported precisions (e.g., INT8, BF16, and FP32)
    • Improve the model conversion from PyTorch INT8 model to ONNX INT8 model
    • Support the zero-code quantization in Visual Studio Code and JupyterLab with Neural Coder plugins
    • Support the quantization for 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)

    Features

    • [Quantization] Experimental Keras model in, quantized Keras model out (commit 4fa753)
    • [Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit a2fcb2)
    • [Quantization] Support hardware-neutral quantized ONNX QDQ models and validate on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime
    • [Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit 1deb7d)
    • [Quantization] Distinguish frameworks, backends and output formats for OnnxRuntime backend (commit 2483a8)
    • [Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit b7a2ef)
    • [AMP] Support more TensorFlow bf16 ops (commit 98d3c8)
    • [AMP] Add torch.amp bf16 support for IPEX backend (commit 2a361b)
    • [Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit 80311f) and HAWQ (commit 83018e) to solve the accuracy problem of specific models
    • [Strategy] Refine the tuning strategy: add more data types and more op attributes such as per-tensor/per-channel and dynamic/static
    • [Pruning] Add progressive pruning and pattern lock pruning_type (commit f46bb1)
    • [Pruning] Add per_channel sparse pattern (commit f46bb1)
    • [Distillation] Support self-distillation towards efficient and compact neural networks (commit acdd4c)
    • [Distillation] Enhance API of intermediate layers knowledge distillation (commit 3183f6)
    • [Neural Coder] Detect devices and ISA to adjust the optimization (commit 691d0b)
    • [Neural Coder] Automatically quantize with ONNX Runtime backend (commit f711b4)
    • [Neural Coder] Add Neural Coder Python Launcher (commit 7bb92d)
    • [Neural Coder] Add Visual Studio Plugin (commit dd39ca)
    • [Productivity] Support Pruning in GUI (commit d24fea)
    • [Productivity] Use config-driven API to replace yaml
    • [Productivity] Export ONNX QLinear to QDQ format (commit e996a9)
    • [Productivity] Validate 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
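    The per-tensor and per-channel attributes named in the [Strategy] item can be illustrated with symmetric min-max scaling. This is a minimal sketch under stated assumptions, not Neural Compressor's implementation: the function names and the sample weights are made up for illustration.

```python
def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest magnitude to the top of the signed range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

# Two output channels with very different ranges (made-up weights).
weights = [[0.02, -0.01, 0.015], [1.5, -2.0, 0.5]]

# Per-tensor: one scale shared by every channel.
flat = [v for channel in weights for v in channel]
per_tensor_scale = symmetric_scale(flat)
per_tensor_q = [quantize(ch, per_tensor_scale) for ch in weights]

# Per-channel: each channel gets its own scale.
per_channel_scales = [symmetric_scale(ch) for ch in weights]
per_channel_q = [quantize(ch, s) for ch, s in zip(weights, per_channel_scales)]

print(per_tensor_q)
print(per_channel_q)
```

    With a shared scale, the small-range channel collapses to a handful of integer levels, while per-channel scaling lets it use the full int8 range. That trade-off is the reason granularity is a useful attribute to tune.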

    Bug Fixes

    • Fix quantization failure for ONNX models larger than 2GB (commit 8d83cc)
    • Fix bf16 disabled by default (commit 83825a)
    • Fix PyTorch DLRM quantization out of memory (commit ff1725)
    • Fix ITEX resnetv2_50 tuning accuracy (commit ae1e05)
    • Fix bf16 ops error in QAT when torch version < 1.11 (commit eda8cb)
    • Fix the key comparison in the Bayesian strategy (commit 1e9c12)
    • Fix static quantization failure for PyTorch T5 (commit ee3ef0)

    Examples

    • Add quantization examples of HuggingFace models with OnnxRuntime backend (commit f4aeb5)
    • Add large language model quantization example: GPT-J (commit 01899d)
    • Add Distributed Distillation examples: MobileNetV2 (commit d33ebe) and CNN-2 (commit ebe9e2)
    • Update examples with INC v2.0 new API
    • Add Stable Diffusion example

    Documentation

    • Update the accuracy of broad hardware (commit 71b056)
    • Refine API helper and documents

    Validated Configurations

    • CentOS 8.4 & Ubuntu 20.04
    • Python 3.7, 3.8, 3.9, 3.10
    • TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
    • PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
    • ONNX Runtime 1.11.0, 1.12.1, 1.13.1
    • MXNet 1.7.0, 1.8.0, 1.9.1
    Source code(tar.gz)
    Source code(zip)
  • v1.14.2(Nov 1, 2022)

    • Highlights
    • Features
    • Bug Fixes
    • Examples

    Highlights

    • This release adds experimental quantization support for ITEX v1.0 on Intel CPU and GPU, the first release to support quantization on Intel GPU. It also supports hardware-neutral quantized ONNX models, validated on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime.

    Features

    • Support quantization on PyTorch v1.13 (commit 97c946)
    • Add experimental quantization support for ITEX v1.0 on Intel CPU and GPU (commit a2fcb2)
    • Support GUI on native Windows (commit fe9923)
    • Support INT8 model load and save API with IPEX backend (commit 23c585)

    Bug Fixes

    • Fix GPT2 quantization failure with ONNX Runtime backend (commit aea121)

    Examples

    • Support personalized Stable Diffusion with few-shot fine-tuning (commit 4247fd)
    • Add ITEX examples: efficientnet_v2_b0, mobilenet_v1, mobilenet_v2, inception_resnet_v2, inception_v3, resnet101, resnet50, vgg16, xception, densenet121, etc. (commit 6ab557)
    • Validate quantized ONNX model on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) (commit 288340)

    Validated Configurations

    • CentOS 8.4
    • Python 3.8
    • TensorFlow 2.10, ITEX 1.0
    • PyTorch 1.12.0+cpu, 1.13.0+cpu, IPEX 1.12.0
    • ONNX Runtime 1.12
    • MXNet 1.9
    Source code(tar.gz)
    Source code(zip)
  • v1.14.1(Oct 1, 2022)

    • Bug Fixes
    • Productivity
    • Examples

    Bug Fixes

    • Fix name matching issue of scale and zero-point in PyTorch (commit fd7a53)
    • Fix incorrect output quantization mode of MatMul + Relu fusion in TensorFlow (commit 9b5293)

    Productivity

    • Support ONNX models with Python 3.10 (commit 2faf0b)
    • Use the TensorFlow create_file_writer API to support TensorBoard histograms (commit f34852)

    Examples

    • Add NAS notebooks (commit 5f0adf)
    • Add Bert mini 2:4, 1x4 and mixed examples with new Pruning API (commit a52074)
    • Add keras in, saved_model out resnet101, inception_v3, mobilenetv2, xception, resnetv2 examples (commit fdd40e)

    Validated Configurations

    • Python 3.7, 3.8, 3.9, 3.10
    • CentOS 8.3 & Ubuntu 18.04 & Win10
    • TensorFlow 2.9, 2.10
    • Intel TensorFlow 2.7, 2.8, 2.9
    • PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
    • IPEX 1.10.0, 1.11.0, 1.12.0
    • MXNet 1.7, 1.9
    • ONNX Runtime 1.10, 1.11, 1.12
    Source code(tar.gz)
    Source code(zip)
  • v1.14(Sep 20, 2022)

    • Highlights
    • New Features
    • Improvements
    • Bug Fixes
    • Productivity
    • Examples

    Highlights

    We are excited to announce the release of Intel® Neural Compressor v1.14! This release adds a new Pruning API for PyTorch that lets users select better combinations of criteria, pattern, and scheduler to achieve better pruning accuracy. It also supports Keras input for TensorFlow quantization and self-distilled quantization for better quantization accuracy.

    New Features

    • Pruning/Sparsity
      • Support new structured sparse patterns N in M and NxM (commit 6cec70)
      • Add pruning criteria snip and snip momentum (commit 6cec70)
      • Add iterative pruning and decay types (commit 6cec70)
    • Quantization
      • Support different Keras formats (h5, keras, keras saved model) as input and output of TensorFlow saved model (commit 5a6f09)
      • Enable Distillation for Quantization (commit 03f1f3 & e20c76)
    • GUI
      • Add mixed precision (commit 26e902)
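    The "N in M" structured pattern from the Pruning/Sparsity items can be sketched in a few lines. The example assumes the pattern means keeping N nonzero values in every group of M consecutive weights, which is an assumption about the definition. It uses plain magnitude as the pruning criterion instead of the snip criteria named above, and `prune_n_in_m` is a hypothetical helper, not the library's API.

```python
def prune_n_in_m(weights, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m, zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n entries with the largest absolute value.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned

row = [0.1, -0.9, 0.4, 0.05, 0.7, 0.2, -0.3, 0.6]
print(prune_n_in_m(row))
```

    Every group of four ends up with exactly two nonzero entries, so the overall sparsity is fixed at 50% while the positions of the zeros still depend on the weights.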

    Improvements

    • Enhance tuning for Quantization with IPEX 1.12 to remove additional Quant/DeQuant (commit 192100)
    • Add upstream and download API for HuggingFace model hub, which can handle configuration files, tokenizer files and int8 model weights in the format of transformers (commit 46d945)
    • Align with Intel PyTorch extension new API (commit cc368a)
    • Add load with yaml and pt to be compatible with older PyTorch model saving type (commit a28705)

    Bug Fixes

    • Quantization
      • Fix data type of ONNX Runtime quantization from fp64 to fp32 (commit cb7b48)
      • Fix MXNET config issue with default config (commit b75ff2)
    • Export
      • Fix export_to_onnx API (commit 158c7f)

    Productivity

    • Support TensorFlow 2.10.0 (commit d6b6c9 & 8130e7)
    • Support OnnxRuntime 1.12 (commit 498ac4)
    • Export PyTorch QAT to Onnx (commit 029a63)
    • Add Tensorflow and PyTorch container tpp file (commit d245b5)

    Examples

    • Add example of download from HuggingFace model hub and example of upstream models to the hub (commit 46d945)
    • Add notebooks for Neural Coder (commit 105db7)
    • Add 2 IPEX examples: bert_large (squad), distilbert_base (squad) (commit 192100)
    • Add 2 DDP examples for prune once for all: roberta-base and Bert Base (commit 26a476)

    Validated Configurations

    • Python 3.7, 3.8, 3.9, 3.10
    • CentOS 8.3 & Ubuntu 18.04 & Win10
    • TensorFlow 2.9, 2.10
    • Intel TensorFlow 2.7, 2.8, 2.9
    • PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
    • IPEX 1.10.0, 1.11.0, 1.12.0
    • MXNet 1.7, 1.9
    • ONNX Runtime 1.10, 1.11, 1.12
    Source code(tar.gz)
    Source code(zip)
  • v1.13.1(Aug 13, 2022)

    Features

    • Support experimental auto-coding quantization for PyTorch

      • Post-training static and dynamic quantization for PyTorch
      • Post-training static quantization for IPEX
      • Mixed-precision (BF16, INT8, and FP32) for PyTorch
    • Refactor quantization utilities for ONNX Runtime

    Bug fix

    • Fixed model compression orchestration issue caused by PyTorch v1.11
    • Fixed GUI issues

    Validated Configurations

    • Python 3.8
    • CentOS 8.4
    • TensorFlow 2.9
    • Intel TensorFlow 2.9
    • PyTorch 1.12.0+cpu
    • IPEX 1.12.0
    • MXNet 1.7.0
    • ONNX Runtime 1.11.0
    Source code(tar.gz)
    Source code(zip)
  • v1.13(Jul 27, 2022)

    Features

    • Quantization

      • Support new quantization APIs for Intel TensorFlow
      • Support FakeQuant (QDQ) quantization format for ITEX
      • Improve INT8 quantization recipes for ONNX Runtime
    • Mixed Precision

      • Enhance mixed precision interface to support BF16 (FP16) mixed with FP32
    • Neural Architecture Search

      • Support SuperNet-based neural architecture search (DyNAS)
    • Sparsity

      • Support training for block-wise structured sparsity
    • Strategy

      • Support operator-type based tuning strategy

    Productivity

    • Support light (default) and full binary packages (default package size 0.5MB, full package size 2MB)
    • Add experimental accuracy diagnostic feature for INT8 quantization including tensor statistics visualization and fine-grained precision setting
    • Add experimental one-click BF16/INT8 low precision enabling & inference optimization, first-ever code-free solution in industry

    Ecosystem

    • Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
    • Upstream 10 quantized Transformers-based models to HuggingFace Model Hub

    Examples

    • Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
    • Add more quantization examples from TensorFlow Model Zoo

    Validated Configurations

    • Python 3.8, 3.9, 3.10
    • CentOS 8.3 & Ubuntu 18.04 & Win10
    • TensorFlow 2.7, 2.8, 2.9
    • Intel TensorFlow 2.7, 2.8, 2.9
    • PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
    • IPEX 1.10.0, 1.11.0, 1.12.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.9.0, 1.10.0, 1.11.0
  • v1.12(May 27, 2022)

    Features

    • Quantization

      • Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
      • Improve post-training quantization (static & dynamic) on PyTorch
      • Improve post-training quantization on TensorFlow
      • Improve QLinear and QDQ quantization modes on ONNX Runtime
      • Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
    • Pruning

      • Improve pruning-once-for-all for NLP models
    • Sparsity

      • Support experimental sparse kernel for reference examples

    Productivity

    • Support model deployment by loading INT8 models directly from HuggingFace model hub
    • Improve GUI with optimized model downloading, performance profiling, etc.

    Ecosystem

    • Highlight simple quantization usage with few clicks on ONNX Model Zoo
    • Upstream INC quantized models (ResNet101, Tiny YoloV3) to ONNX Model Zoo

    Examples

    • Add Bert-mini distillation + quantization notebook example
    • Add DLRM & SSD-ResNet34 quantization examples on IPEX
    • Improve BERT structured sparsity training example

    Validated Configurations

    • Python 3.8, 3.9, 3.10
    • CentOS 8.3 & Ubuntu 18.04 & Win10
    • TensorFlow 2.6.2, 2.7, 2.8
    • Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
    • IPEX 1.8.0, 1.9.0, 1.10.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.8.0, 1.9.0, 1.10.0
  • v1.11(Apr 15, 2022)

    Features

    • Quantization
      • Supported QDQ as experimental quantization format for ONNX Runtime
      • Improved FX symbolic tracing for PyTorch
      • Supported multi-metrics for quantization tuning
    • Knowledge distillation
      • Improved distillation algorithm for intermediate layer knowledge transfer
    • Productivity
      • Improved quantization productivity for ONNX Runtime through GUI
      • Improved PyTorch INT8 model save/load methods
    • Ecosystem
      • Upstreamed INC quantized Yolov3, DenseNet, Mask-Rcnn, Yolov4 models to ONNX Model Zoo
      • Became a PyTorch ecosystem tool shortly after publishing the PyTorch INC tutorial
    • Examples
      • Added INC quantized ResNet50 v1.5 and BERT-Large model for IPEX
      • Supported dynamic quantization & weight sharing on bare metal reference engine
  • v1.10(Feb 28, 2022)

    Features

    • Quantization
      • Supported the quantization on latest deep learning frameworks
      • Supported the quantization for a new model domain (Audio)
      • Supported the compatible quantization recipes for framework upgrade
    • Pruning & Knowledge distillation
      • Supported fine-tuning and quantization using INC & Optimum for “Prune Once for All: Sparse Pre-Trained Language Models” published at ENLSP NeurIPS Workshop 2021
    • Structured sparsity
      • Proved the sparsity training recipes across multiple model domains (CV, NLP, and Recommendation System)

    Productivity

    • Improved INC GUI for easy quantization
    • Supported Windows OS conda installation

    Ecosystem

    • Upgraded INC v1.9 into HuggingFace Optimum
    • Upstreamed INC quantized mobilenet & faster-rcnn models to ONNX Model Zoo

    Examples

    • Supported quantization on 300 random models
    • Added bare-metal examples for Bert-mini and DLRM

    Validated Configurations

    • Python 3.7, 3.8, 3.9
    • CentOS 8.3 & Ubuntu 18.04 & Win10
    • TensorFlow 2.6.2, 2.7, 2.8
    • Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
    • IPEX 1.8.0, 1.9.0, 1.10.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.8.0, 1.9.0, 1.10.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.9(Jan 4, 2022)

    Features

    • Knowledge distillation

      • Supported one-shot compression pipelines (knowledge distillation during quantization-aware training) on PyTorch
      • Added more distillation examples on TensorFlow and PyTorch
    • Quantization

      • Supported multi-objective tuning for quantization
      • Supported Intel Extension for PyTorch v1.10 version
      • Improved quantization-aware training support on PyTorch v1.10
    • Pruning

      • Added more magnitude pruning examples on TensorFlow
    • Reference bare-metal examples

      • Supported BF16 optimizations on NLP models
      • Added sparse DLRM model (experimental)
    • Productivity

      • Added a Python-friendly API (alternative to the YAML configuration file)
      • Made user-facing APIs more Pythonic
    • Ecosystem

      • Integrated pruning API into HuggingFace Optimum
      • Added ssd-mobilenetv1, efficientnet, ssd, fcn_rn50, inception_v1 quantized models to ONNX Model Zoo

    Validated Configurations

    • Python 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.2 & 2.7
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.8.1(Dec 10, 2021)

    Features

    Validated Configurations

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.2 & 2.7
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.8(Nov 22, 2021)

    Features

    • Knowledge distillation
      • Implemented the algorithms from the paper “Prune Once for All: Sparse Pre-Trained Language Models”, accepted by the NeurIPS 2021 ENLSP workshop
      • Supported optimization pipelines (knowledge distillation & quantization-aware training) on PyTorch
    • Quantization
      • Added the support of ONNX RT 1.7
      • Added the support of TensorFlow 2.6.2 and 2.7
      • Added the support of PyTorch 1.10
    • Pruning
      • Supported magnitude pruning on TensorFlow
    • Acceleration library
      • Supported Hugging Face top 10 downloaded NLP models

    Productivity

    • Added performance profiling feature to INC UI service.
    • Improved ease-of-use user interface for quantization with few clicks

    Ecosystem

    • Added a notebook on using the HuggingFace optimization library (Optimum) with Transformers
    • Enabled top 20 downloaded Hugging Face NLP models with Optimum
    • Upstreamed more INC quantized models to ONNX Model Zoo

    Validated Configurations

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.2 & 2.7
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.7.1(Oct 24, 2021)

    Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) v1.7 release is featured by:

    Features

    • Acceleration library
      • Support unified buffer memory allocation policy

    Ecosystem

    • Upstreamed INC quantized models (alexnet/caffenet/googlenet/squeezenet) to ONNX Model Zoo

    Documentation

    • Performance and accuracy data update

    Validated Configurations

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.0
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact the INC maintainers if you have any questions.
  • v1.7(Oct 1, 2021)

    Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) v1.7 release is featured by:

    Features

    • Quantization
      • Improved quantization accuracy in SSD-ResNet34 and MobileNet v3 on TensorFlow
    • Pruning
      • Supported magnitude pruning on TensorFlow
    • Knowledge distillation
      • Supported knowledge distillation on PyTorch
    • Multi-node support
      • Supported multi-node pruning with distributed dataloader on PyTorch
      • Supported multi-node inference for benchmark on PyTorch
    • Acceleration library
      • Added a domain-specific acceleration library for NLP models
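
The knowledge distillation entry above refers to training a student model to match a teacher's outputs. As a rough illustration only, the sketch below computes a temperature-softened KL-divergence loss in pure Python. The function names, the default temperature of 2.0, and the T² scaling are assumptions chosen for the example; this is not the library's implementation or API.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,  # hypothetical default, for illustration
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )
    return kl * temperature ** 2


# Identical logits give zero loss; disagreeing logits give a positive loss.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
print(same, different)
```

In practice this term is usually combined with the ordinary task loss on the ground-truth labels; the weighting between the two is a tuning choice.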

    Productivity

    • Supported the configuration-free (pure Python) quantization
    • Improved ease-of-use user interface for quantization with few clicks

    Ecosystem

    • Integrated into HuggingFace optimization library (Optimum)
    • Upstreamed INC quantized models (RN50, VGG16) to ONNX Model Zoo

    Documentation

    • Add tutorial and examples for knowledge distillation
    • Add tutorial and examples for multi-node training
    • Add tutorial and examples for acceleration library

    Validated Configurations

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.0
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/neural-compressor.git | $ git clone https://github.com/intel/neural-compressor.git |
      | Binary | Pip | https://pypi.org/project/neural-compressor | $ pip install neural-compressor |
      | Binary | Conda | https://anaconda.org/intel/neural-compressor | $ conda install neural-compressor -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.6(Aug 20, 2021)

    Intel® Low Precision Optimization Tool v1.6 release is featured by:

    Pruning:

    • Support pruning and post-training quantization pipeline on PyTorch
    • Support pruning during quantization-aware training on PyTorch

    Quantization:

    • Support post-training quantization on TensorFlow 2.6.0, PyTorch 1.9.0, IPEX 1.8.0, and MXNet 1.8.0
    • Support quantization-aware training on TensorFlow 2.x (Keras API)

    User Experience:

    • Improve quantization productivity with new UI
    • Support quantized model recovery from tuning history

    New Models:

    • Support ResNet50 on ONNX model zoo

    Documentation:

    • Add pruned models
    • Add quantized MLPerf models

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • TensorFlow 2.6.0
    • Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
    • PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
    • MXNet 1.6.0, 1.7.0, 1.8.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.5.1(Jul 25, 2021)

    Intel® Low Precision Optimization Tool v1.5.1 release is featured by:

    • Gradient-sensitivity pruning for CNN model
    • Static quantization support for ONNX NLP model
    • Dynamic seq length support in NLP dataloader
    • Enrich quantization statistics

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0 and 1.15.0 UP1 & UP2 & UP3
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, 1.8.0+cpu, ipex
    • MXNet 1.6.0, 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.5(Jul 12, 2021)

    Intel® Low Precision Optimization Tool v1.5 release is featured by:

    • Add pattern-lock sparsity algorithm for NLP fine-tuning tasks
      • Up to 70% unstructured sparsity and 50% structured sparsity with <2% accuracy loss on 5 Bert finetuning tasks
    • Add NLP head pruning algorithm for HuggingFace models
      • Performance speedup up to 3.0X within 1.5% accuracy loss on HuggingFace BERT SST-2
    • Support model optimization pipeline
    • Integrate SigOPT with multi-metrics optimization
      • Complementary as basic strategy to speed up the tuning
    • Support TensorFlow 2.5, PyTorch 1.8, and ONNX Runtime 1.8

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8 & 3.9
    • CentOS 8.3 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0 and 1.15.0 UP1 & UP2 & UP3
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, 1.8.0+cpu, ipex
    • MXNet 1.6.0, 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0, 1.8.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.4.1(Jun 25, 2021)

    Intel® Low Precision Optimization Tool v1.4.1 release is featured by:

    1. Support TensorFlow 2.5.0
    2. Support PyTorch 1.8.0
    3. Support TensorFlow Object Detection YOLO-V3 model

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.4(May 30, 2021)

    Intel® Low Precision Optimization Tool v1.4 release is featured by:

    Quantization

    1. PyTorch FX-based quantization support
    2. TensorFlow & ONNX RT quantization enhancement

    Pruning

    1. Pruning/sparsity API refinement
    2. Magnitude-based pruning on PyTorch
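
Magnitude-based pruning, listed above, removes the weights with the smallest absolute values until a target sparsity is reached. The following is a minimal pure-Python sketch of that idea on a flat list of weights. The function name `magnitude_prune` and the sample weights are hypothetical, and the release's actual PyTorch implementation is not shown in these notes.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude weights so `sparsity` of them are zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Hypothetical weights, for illustration only.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

Real pruning schedules typically raise the sparsity gradually during fine-tuning rather than applying it in one step.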

    Model Zoo

    1. INT8 key models updated (BERT on TensorFlow, DLRM on PyTorch, etc.)
    2. 20+ HuggingFace model quantization

    User Experience

    1. More comprehensive logging message
    2. UI enhancement with FP32 optimization, auto-mixed precision (BF16/FP32), and graph visualization
    3. Online document: https://intel.github.io/lpot

    Extended Capabilities

    1. Model conversion from QAT to Intel Optimized TensorFlow model

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.3.1(May 11, 2021)

    Intel® Low Precision Optimization Tool v1.3 release is featured by:

    1. Improve graph optimization without explicit input/output setting

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.3(Apr 16, 2021)

    Intel® Low Precision Optimization Tool v1.3 release is featured by:

    1. FP32 optimization & auto-mixed precision (BF16/FP32) for TensorFlow
    2. Dynamic quantization support for PyTorch
    3. ONNX Runtime v1.7 support
    4. Configurable benchmarking support (multi-instances, warmup, etc.)
    5. Multiple batch size calibration & mAP metrics for object detection models
    6. Experimental user facing APIs for better usability
    7. Various HuggingFace models support

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0, 1.7.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.2.1(Apr 2, 2021)

    Intel® Low Precision Optimization Tool v1.2.1 release is featured by:

    1. User-facing API backward compatibility with v1.1 and v1.0
    2. Refined experimental user-facing APIs for a better out-of-box experience

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.2(Mar 12, 2021)

    Intel® Low Precision Optimization Tool v1.2 release is featured by:

    • Broad TensorFlow model type support
    • Operator-wise quantization scheme for ONNX RT
    • MSE driven tuning for metric-free use cases
    • UX improvement, including UI web server preview support
    • More key model supports
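
The MSE-driven tuning item above targets cases where no task metric is available, so a quantized candidate is judged by how far its output drifts from the FP32 reference. The sketch below illustrates that idea under stated assumptions: it uses symmetric per-tensor INT8 fake quantization on a plain list of floats, and the function names and sample values are hypothetical rather than taken from the library.

```python
from typing import List


def fake_quantize_int8(values: List[float]) -> List[float]:
    """Symmetric per-tensor INT8 quantize-then-dequantize."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / 127.0
    # Round to the nearest integer level, clamp to [-127, 127], map back.
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def mse(reference: List[float], candidate: List[float]) -> float:
    """Mean squared error between two equally sized, non-empty lists."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)


# Hypothetical FP32 tensor values, for illustration only.
values = [0.5, -1.27, 0.003, 1.0]
error = mse(values, fake_quantize_int8(values))
print(error)
```

A tuner built on this idea would compute the error for each candidate configuration and prefer the one with the smallest value.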

    Validated Configurations:

    • Python 3.6 & 3.7 & 3.8
    • CentOS 7 & Ubuntu 18.04
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0, 2.4.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu, 1.6.0+cpu, ipex
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.1(Dec 31, 2020)

    Intel® Low Precision Optimization Tool v1.1 release is featured by:

    • Preview support for new backends (PyTorch/IPEX, ONNX Runtime)
    • Add built-in industry dataset/metric and custom registration
    • Preliminary input/output node auto-detection on TensorFlow models
    • New INT8 quantization recipes: bias correction and label balance

    Validated Configurations:

    • Python 3.6 & 3.7
    • CentOS 7
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0 and 1.15.0 UP1 & UP2
    • PyTorch 1.5.0+cpu
    • MXNet 1.7.0
    • ONNX Runtime 1.6.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lpot.git | $ git clone https://github.com/intel/lpot.git |
      | Binary | Pip | https://pypi.org/project/lpot | $ pip install lpot |
      | Binary | Conda | https://anaconda.org/intel/lpot | $ conda install lpot -c conda-forge -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.0(Oct 30, 2020)

    Intel® Low Precision Optimization Tool v1.0 release is featured by:

    • Refined user-facing APIs for the best out-of-box (OOB) experience
    • Add TPE tuning strategies (experimental)
    • Pruning POC support on PyTorch
    • TensorBoard POC support for tuning analysis
    • Built-in INT8/Dummy dataloader support
    • Built-in benchmarking support
    • Tuning history for strategy fine-tuning
    • Support TF Keras and checkpoint model types as input

    Validated Configurations:

    • Python 3.6 & 3.7
    • CentOS 7
    • Intel TensorFlow 1.15.2, 2.1.0, 2.2.0, 2.3.0 and 1.15UP1
    • PyTorch 1.5.0+cpu
    • MXNet 1.7.0

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lp-opt-tool.git | $ git clone https://github.com/intel/lp-opt-tool.git |
      | Binary | Pip | https://pypi.org/project/ilit | $ pip install ilit |
      | Binary | Conda | https://anaconda.org/intel/ilit | $ conda install ilit -c intel |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.0b(Aug 31, 2020)

    Intel® Low Precision Optimization Tool v1.0 beta release is featured by:

    • Built-in dataloaders and evaluators
    • Add random and exhaustive tuning strategies
    • Mixed-precision tuning support on TensorFlow (INT8/BF16/FP32)
    • Quantization-aware training POC support on PyTorch
    • TensorFlow mainstream version support, including 1.15.2, 1.15UP1 and 2.1.0
    • 50+ models validated

    Supported Models:

    | TensorFlow Model | Category |
    |---|---|
    | ResNet50 V1 | Image Recognition |
    | ResNet50 V1.5 | Image Recognition |
    | ResNet101 | Image Recognition |
    | Inception V1 | Image Recognition |
    | Inception V2 | Image Recognition |
    | Inception V3 | Image Recognition |
    | Inception V4 | Image Recognition |
    | ResNetV2_50 | Image Recognition |
    | ResNetV2_101 | Image Recognition |
    | ResNetV2_152 | Image Recognition |
    | Inception ResNet V2 | Image Recognition |
    | SSD ResNet50 V1 | Object Detection |
    | Wide & Deep | Recommendation |
    | VGG16 | Image Recognition |
    | VGG19 | Image Recognition |
    | Style_transfer | Style Transfer |

    | PyTorch Model | Category |
    |---|---|
    | BERT-Large RTE | Language Translation |
    | BERT-Large QNLI | Language Translation |
    | BERT-Large CoLA | Language Translation |
    | BERT-Base SST-2 | Language Translation |
    | BERT-Base RTE | Language Translation |
    | BERT-Base STS-B | Language Translation |
    | BERT-Base CoLA | Language Translation |
    | BERT-Base MRPC | Language Translation |
    | DLRM | Recommendation |
    | BERT-Large MRPC | Language Translation |
    | ResNext101_32x8d | Image Recognition |
    | BERT-Large SQUAD | Language Translation |
    | ResNet50 V1.5 | Image Recognition |
    | ResNet18 | Image Recognition |
    | Inception V3 | Image Recognition |
    | YOLO V3 | Object Detection |
    | Peleenet | Image Recognition |
    | ResNest50 | Image Recognition |
    | SE_ResNext50_32x4d | Image Recognition |
    | ResNet50 V1.5 QAT | Image Recognition |
    | ResNet18 QAT | Image Recognition |

    | MXNet Model | Category |
    |---|---|
    | ResNet50 V1 | Image Recognition |
    | MobileNet V1 | Image Recognition |
    | MobileNet V2 | Image Recognition |
    | SSD-ResNet50 | Object Detection |
    | SqueezeNet V1 | Image Recognition |
    | ResNet18 | Image Recognition |
    | Inception V3 | Image Recognition |

    Known Issues:

    • TensorFlow ResNet50 v1.5 int8 model will crash on TensorFlow 1.15 UP1 branch

    Validated Configurations:

    • Python 3.6 & 3.7
    • CentOS 7
    • Intel TensorFlow 1.15.2, 2.1.0 and 1.15UP1
    • PyTorch 1.5
    • MXNet 1.6

    Distribution:

      | Type | Channel | Links | Install Command |
      | -- | -- | -- | -- |
      | Source | GitHub | https://github.com/intel/lp-opt-tool.git | $ git clone https://github.com/intel/lp-opt-tool.git |
      | Binary | Pip | https://pypi.org/project/ilit | $ pip install ilit |
      | Binary | Conda | https://anaconda.org/intel/ilit | $ conda config --add channels intel $ conda install ilit |

    Contact:

    Please feel free to contact [email protected] if you have any questions.
  • v1.0a(Aug 11, 2020)

    Intel® Low Precision Optimization Tool (iLiT) is an open-source Python library intended to deliver a unified low-precision inference solution across multiple Intel-optimized DL frameworks on both CPU and GPU. It supports automatic accuracy-driven tuning strategies, along with additional objectives such as performance, model size, or memory footprint. It is also easy to extend with new backends, tuning strategies, metrics, and objectives.

    Feature List:

    • Unified low-precision quantization interface across multiple Intel-optimized frameworks (TensorFlow, PyTorch, and MXNet)
    • Built-in tuning strategies, including Basic, Bayesian, and MSE
    • Built-in evaluation metrics, including TopK (image classification), F1 (NLP), and CocoMAP (object detection)
    • Built-in tuning objectives, including Performance, ModelSize, and Footprint
    • Extensible API design to add new strategy, framework backend, metric, and objective
    • KL-divergence calibration for TensorFlow and MXNet
    • Tuning process resume from certain checkpoint
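
The feature list describes accuracy-driven tuning: candidate quantization configurations are tried until one meets an accuracy goal. The sketch below shows the general shape of such a loop in plain Python. It is not the tool's API: the `tune` function, the 1% relative tolerance, the configuration names, and the accuracy numbers are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune(
    candidates: Iterable[dict],
    evaluate: Callable[[dict], float],
    baseline_accuracy: float,
    relative_tolerance: float = 0.01,  # assumed goal: at most 1% relative loss
) -> Optional[Tuple[dict, float]]:
    """Return the first candidate whose accuracy stays within the tolerance."""
    threshold = baseline_accuracy * (1.0 - relative_tolerance)
    for config in candidates:
        accuracy = evaluate(config)
        if accuracy >= threshold:
            return config, accuracy
    return None  # no candidate met the accuracy goal


# Hypothetical configurations and accuracies, for illustration only.
fake_results = {"all_int8": 0.741, "conv_int8_fc_fp32": 0.757, "all_fp32": 0.760}
candidates = [{"name": name} for name in fake_results]
best = tune(candidates, lambda cfg: fake_results[cfg["name"]], baseline_accuracy=0.760)
print(best)  # ({'name': 'conv_int8_fc_fp32'}, 0.757)
```

Ordering the candidates from most to least aggressive means the first one accepted is also the most compressed one that meets the goal; different strategies differ mainly in how they order and generate candidates.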

    Supported Models:

    | Model | Framework | Model | Framework | Model | Framework |
    | -- | -- | -- | -- | -- | -- |
    | ResNet50 V1 | MXNet | BERT-Large RTE | PyTorch | ResNet18 | PyTorch |
    | MobileNet V1 | MXNet | BERT-Large QNLI | PyTorch | ResNet50 V1 | TensorFlow |
    | MobileNet V2 | MXNet | BERT-Large CoLA | PyTorch | ResNet50 V1.5 | TensorFlow |
    | SSD-ResNet50 | MXNet | BERT-Base SST-2 | PyTorch | ResNet101 | TensorFlow |
    | SqueezeNet V1 | MXNet | BERT-Base RTE | PyTorch | Inception V1 | TensorFlow |
    | ResNet18 | MXNet | BERT-Base STS-B | PyTorch | Inception V2 | TensorFlow |
    | Inception V3 | MXNet | BERT-Base CoLA | PyTorch | Inception V3 | TensorFlow |
    | DLRM | PyTorch | BERT-Base MRPC | PyTorch | Inception V4 | TensorFlow |
    | BERT-Large MRPC | PyTorch | ResNet101 | PyTorch | Inception ResNet V2 | TensorFlow |
    | BERT-Large SQUAD | PyTorch | ResNet50 V1.5 | PyTorch | SSD ResNet50 V1 | TensorFlow |

    Known Issues:

    • Statistics collection for the KL algorithm is slow in TensorFlow due to the lack of tensor inspector APIs
    • MSE tuning strategy is not supported in PyTorch

    Validated Configurations:

    • Python 3.6 & 3.7
    • CentOS 7
    • TensorFlow 1.15, 2.0 and 2.1
    • PyTorch 1.5
    • MXNet 1.6

    Distribution:

    |   | Channel | Links | Install Command |
    | -- | -- | -- | -- |
    | Source | GitHub | https://github.com/intel/lp-opt-tool.git | $ git clone https://github.com/intel/lp-opt-tool.git |
    | Binary | Pip | https://pypi.org/project/ilit | $ pip install ilit |
    | Binary | Conda | https://anaconda.org/intel/ilit | $ conda config --add channels intel <br> $ conda install ilit |

    Contact:

    Please feel free to contact [email protected] if you have any questions.

Owner
Intel Corporation