Easy-to-use library to boost AI inference leveraging state-of-the-art optimization techniques.

Overview


How Nebullvm Works | Tutorials | Benchmarks | Installation | Get Started | Optimization Examples

Discord | Website | LinkedIn | Twitter

Nebullvm

nebullvm speeds up AI inference by 2-30x in just a few lines of code 🚀

How Nebullvm Works

This open-source library takes your AI model as input and outputs an optimized version that runs 2-30 times faster on your hardware. Nebullvm tests multiple optimization techniques (deep learning compilers, quantization, sparsity, distillation, and more) to identify the optimal way to execute your AI model on your specific hardware. The library can speed up your model 2 to 10 times without loss of performance, or up to 30 times if you specify that you are willing to trade off a self-defined amount of accuracy/precision for a super-low latency and a lighter model.

The goal of nebullvm is to let any developer benefit from the most advanced inference optimization techniques without having to spend countless hours understanding, installing, testing and debugging these powerful technologies.

The library aims to be:

☘️  Easy-to-use. It takes a few lines of code to install the library and optimize your models.

🔥  Framework agnostic. nebullvm supports the most widely used frameworks (PyTorch, TensorFlow, ONNX, Hugging Face, etc.) and provides as output an optimized version of your model with the same interface (PyTorch, TensorFlow, etc.).

💻  Deep learning model agnostic. nebullvm supports all the most popular architectures such as transformers, LSTMs, CNNs and FCNs.

🤖  Hardware agnostic. The library now works on most CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.

🔑  Secure. Everything runs locally on your machine.

✨  Leveraging the best optimization techniques. There are many inference optimization techniques such as deep learning compilers, quantization, half precision and distillation, all meant to optimize the way your AI models run on your hardware. It would take a developer countless hours to install and test them on every model deployment. The library does that for you.

Do you like the concept? Leave a ⭐ if you enjoy the project and join the Discord community where we chat about nebullvm and AI optimization. And happy acceleration 🚀 🚀

Benchmarks

We have tested nebullvm on popular AI models and hardware from leading vendors.

The table below shows the inference speedup provided by nebullvm. The speedup is calculated as the response time of the unoptimized model divided by the response time of the accelerated model, as an average over 100 experiments. As an example, if the response time of an unoptimized model was on average 600 milliseconds and after nebullvm optimization only 240 milliseconds, the resulting speedup is 2.5x, meaning 150% faster inference.

A complete overview of the experiment and findings can be found on this page.

|                    | M1 Pro | Intel Xeon | AMD EPYC | Nvidia T4 |
|--------------------|:------:|:----------:|:--------:|:---------:|
| EfficientNetB0     | 23.3x  | 3.5x       | 2.7x     | 1.3x      |
| EfficientNetB2     | 19.6x  | 2.8x       | 1.5x     | 2.7x      |
| EfficientNetB6     | 19.8x  | 2.4x       | 2.5x     | 1.7x      |
| Resnet18           | 1.2x   | 1.9x       | 1.7x     | 7.3x      |
| Resnet152          | 1.3x   | 2.1x       | 1.5x     | 2.5x      |
| SqueezeNet         | 1.9x   | 2.7x       | 2.0x     | 1.3x      |
| Convnext tiny      | 3.2x   | 1.3x       | 1.8x     | 5.0x      |
| Convnext large     | 3.2x   | 1.1x       | 1.6x     | 4.6x      |
| GPT2 - 10 tokens   | 2.8x   | 3.2x       | 2.8x     | 3.8x      |
| GPT2 - 1024 tokens | -      | 1.7x       | 1.9x     | 1.4x      |
| Bert - 8 tokens    | 6.4x   | 2.9x       | 4.8x     | 4.1x      |
| Bert - 512 tokens  | 1.8x   | 1.3x       | 1.6x     | 3.1x      |

Overall, the library provides great results, with more than 2x acceleration in most cases and around 20x in a few applications. We can also observe that acceleration varies greatly across different hardware-model couplings, so we suggest you test nebullvm on your model and hardware to assess its full potential. You can find the instructions below.

In addition, across all scenarios, nebullvm stands out for its ease of use, allowing you to take advantage of inference optimization techniques without having to spend hours studying, testing and debugging them.
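
If you want to reproduce this speedup measure on your own model, the sketch below shows one way to time it. It is only a minimal illustration: `model` is assumed to be your original PyTorch model and `optimized_model` the object returned by nebullvm, as produced in the Get Started and Optimization Examples sections below.

import time
import torch

def mean_latency(predict_fn, x, steps=100):
    # One warm-up call, then the average latency over `steps` repetitions.
    predict_fn(x)
    start = time.perf_counter()
    for _ in range(steps):
        predict_fn(x)
    return (time.perf_counter() - start) / steps

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    baseline = mean_latency(model, x)               # original model
    accelerated = mean_latency(optimized_model, x)  # nebullvm output
print(f"Speedup: {baseline / accelerated:.1f}x")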

Tutorials

We suggest testing the library on your AI models right away by following the installation instructions below. If you want to get a first feel of the library's capabilities, or take a look at how nebullvm can be readily implemented in an AI workflow, we have built three tutorials and notebooks where the library can be tested on the most popular AI frameworks: TensorFlow, PyTorch and Hugging Face.

  • Notebook: Accelerate fast.ai's Resnet34 with nebullvm
  • Notebook: Accelerate PyTorch YOLO with nebullvm
  • Notebook: Accelerate Hugging Face's GPT2 and BERT with nebullvm

Installation

Step 1: Installation of the nebullvm library

There are two ways to install nebullvm:

  1. Using PyPI. We suggest installing the library with pip to get the stable version of nebullvm.
  2. From source code, to get the latest features.

Option 1A: Installation with PyPI (recommended)

The easiest way to install nebullvm is by using pip, running

pip install nebullvm

Option 1B: Source code installation

To install from source code, clone the repository on your local machine using git.

git clone https://github.com/nebuly-ai/nebullvm.git

Then, enter the repo and install nebullvm with pip.

cd nebullvm
pip install .
Step 2: Installation of deep learning compilers

Now you need to install the compilers that the library leverages to create the optimized version of your models. We have built an auto-installer that takes care of this for you.

Option 2A: Installation at the first optimization run

The auto-installer is activated after you import nebullvm and perform your first optimization. You may run into import errors related to the deep learning compiler installation, but you can ignore these errors/warnings. It is also recommended to restart the Python kernel between the auto-installation and the first optimization; otherwise, not all compilers will be activated.

Option 2B: Installation before the first optimization run (recommended)

To avoid any problems, we strongly recommend running the auto-installation before performing the first optimization by running

python -c "import nebullvm"

At this stage, you should ignore any import warning resulting from the previous command.

Option 2C: Selective installation of deep learning compilers

The library automatically installs all the deep learning compilers it supports. If you want to bypass the automatic installation, you can export the environment variable NO_COMPILER_INSTALLATION=1 by running

export NO_COMPILER_INSTALLATION=1

from your command line or adding

import os
os.environ["NO_COMPILER_INSTALLATION"] = "1"

in your python code before importing nebullvm for the first time.

Note that the auto-installation of open-source compilers is performed outside the nebullvm wheel. Installation of Apache TVM and OpenVINO has been tested on macOS and on Linux distributions similar to Debian and CentOS.

The feature is still in an alpha version, so we expect that it may fail under untested circumstances.

Step 2-bis: Install TVM

Since the TVM compiler has to be installed from source code, its installation can take several minutes, or even hours, to complete. For this reason, we decided not to include it in the default automatic installer. However, if you want to squeeze the most performance out of your model on your machine, we highly recommend installing TVM as well. With nebullvm, installing TVM becomes very easy: just run

python -c "from nebullvm.installers.installers import install_tvm; install_tvm()"

and wait for the compiler to be installed! You can check that everything worked by running

python -c "from tvm.runtime import Module"
Possible installation issues

macOS: the installation may fail on MacBooks with the Apple Silicon chip, due to scipy compilation errors. The easy fix is to install scipy with another package manager such as conda (the Apple Silicon distribution of Miniconda) and then install nebullvm. For any additional issues, do not hesitate to open an issue or contact us directly at [email protected].

Get Started

Nebullvm reduces the computation time of deep learning model inference by 2-30 times by testing multiple optimization techniques (deep learning compilers, quantization, half precision, distillation, and more) and identifying the optimal way to execute your AI model on your specific hardware.

Nebullvm can be deployed in two ways.

Option A: 2-10x acceleration, NO performance loss

If you choose this option, nebullvm will test multiple deep learning compilers (TensorRT, OpenVINO, ONNX Runtime, etc.) and identify the optimal way to compile your model on your hardware, increasing inference speed by 2-10 times without affecting the performance of your model.

Option B: 2-30x acceleration, supervised performance loss

Nebullvm is capable of speeding up inference by much more than 10 times in case you are willing to sacrifice a fraction of your model's performance. If you specify how much performance loss you are willing to sustain, nebullvm will push your model's response time to its limits by identifying the best possible blend of state-of-the-art inference optimization techniques, such as deep learning compilers, distillation, quantization, half precision, sparsity, etc.

Performance monitoring is accomplished using the perf_loss_ths argument (performance loss threshold) and, optionally, the perf_metric argument for performance estimation.

Option B.1

When a predefined metric (e.g. “accuracy”) or a custom metric is passed as the perf_metric argument, the value of perf_loss_ths will be used as the maximum acceptable loss for the given metric evaluated on your datasets.

Options B.2 and B.3

When no perf_metric is provided as input, nebullvm calculates the performance loss using the default precision function. If a dataset is provided, the precision will be calculated on 100 sampled data points (option B.2). Otherwise, the data will be randomly generated from the metadata provided as input, i.e. input_sizes and batch_size (option B.3).
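
For a quick feel of how these options differ in practice, here is a condensed sketch based on the PyTorch examples in the Optimization Examples section below; the arguments mirror those examples, while the model and input shape are chosen arbitrarily.

import torch
import torchvision.models as models
from nebullvm import optimize_torch_model

model = models.resnet18()
# Format: Sequence[Tuple[Tuple[Tensor, ...], TensorOrNone]]
data = [((torch.randn(1, 3, 224, 224), ), 0)]

# Option B.1: a named metric plus the maximum acceptable loss on that metric
optimized_b1 = optimize_torch_model(
    model, dataloader=data, save_dir=".", perf_loss_ths=0.1, perf_metric="accuracy"
)

# Option B.2: no metric given, so perf_loss_ths bounds the default "precision" function
optimized_b2 = optimize_torch_model(
    model, dataloader=data, save_dir=".", perf_loss_ths=2
)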

Options B.2 and B.3: Impact of perf_loss_ths on precision

The table below shows the impact of perf_loss_ths on the default metric "precision".

| perf_loss_ths | Expected behavior with the default "precision" metric |
|---------------|--------------------------------------------------------|
| None or 0     | No precision-reduction technique (distillation, quantization, half precision, sparsity, etc.) will be applied, as per Option A. |
| 1             | Nebullvm will accept the outcome of precision-reduction techniques only if the relative change of the smallest output logit is smaller than 1. This is usually correlated with a marginal drop in precision. |
| 2             | Nebullvm will accept a "riskier" output from precision-reduction techniques to achieve increased inference speed. This can usually have an impact on accuracy of ~0.1%. |
| ≥3            | Aggressive precision-reduction techniques are used to produce the lightest and fastest model possible. Accuracy drops depend on both model type and task type. A simple binary classification can still show accuracy drops around ~0.1%. |
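
To build intuition for the thresholds above, the default "precision" check can be thought of as comparing the original and optimized outputs on the smallest output logit. The sketch below is only an illustrative interpretation of the table, not nebullvm's actual implementation:

import torch

def smallest_logit_relative_change(original_logits, optimized_logits):
    # Illustrative only: relative change measured on the smallest-magnitude
    # logit of the original output, as described in the table above.
    flat_orig = original_logits.flatten()
    flat_opt = optimized_logits.flatten()
    idx = flat_orig.abs().argmin()
    return ((flat_orig[idx] - flat_opt[idx]).abs() / flat_orig[idx].abs()).item()

With perf_loss_ths=1, for instance, a quantized or half-precision model would be accepted only if this quantity stays below 1.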

Optimization examples

Optimization with PyTorch

Here we present an example of optimizing a `pytorch` model with `nebullvm`:
>>> # FOR EACH OPTION
>>> import torch
>>> import torchvision.models as models
>>> from nebullvm import optimize_torch_model
>>> model = models.efficientnet_b0()
>>> save_dir = "."
>>>
>>> # ONLY FOR OPTION A 
>>> bs, input_sizes = 1, [(3, 256, 256)]
>>> optimized_model = optimize_torch_model(
... model, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir
... )
>>>
>>> # ONLY FOR OPTION B.1
>>> dl = [((torch.randn(1, 3, 256, 256), ), 0)]
>>> perf_loss_ths = 0.1  # We can accept a drop in the loss function up to 10%
>>> optimized_model = optimize_torch_model(
... model, dataloader=dl, save_dir=save_dir, perf_loss_ths=perf_loss_ths, perf_metric="accuracy", 
... )
>>>
>>> # ONLY FOR OPTION B.2
>>> dl = [((torch.randn(1, 3, 256, 256), ), 0)]
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> optimized_model = optimize_torch_model(
... model, dataloader=dl, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # ONLY FOR OPTION B.3
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> bs, input_sizes = 1, [(3, 256, 256)]
>>> optimized_model = optimize_torch_model(
... model, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # FOR EACH OPTION
>>> x = torch.randn(bs, 3, 256, 256)
>>> res = optimized_model(x)

In the example above, for options B.1 and B.2 we provided a dataset containing a single tuple (xs, y), where xs itself is a tuple containing all the inputs needed by the model. Note that for nebullvm the input dataset should be in the format Sequence[Tuple[Tuple[Tensor, ...], TensorOrNone]]. The torch API also accepts dataloaders as inputs; however, the dataloader should return each batch as a tuple (xs, y), as described above.
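
If your samples already live in a standard PyTorch Dataset, one way to adapt them to this format is sketched below; my_dataset is a hypothetical placeholder for your own map-style dataset returning (input_tensor, label) pairs.

from torch.utils.data import DataLoader

# Wrap the dataset so each element already carries the batch dimension.
loader = DataLoader(my_dataset, batch_size=1)

# Re-pack each batch as ((input_1, input_2, ...), label), the format nebullvm expects.
data = [((xb,), yb) for xb, yb in loader]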

Optimization with TensorFlow
>>> # FOR EACH OPTION
>>> import tensorflow as tf 
>>> from tensorflow.keras.applications.resnet50 import ResNet50
>>> from nebullvm import optimize_tf_model
>>> model = ResNet50()
>>> save_dir = "."
>>>
>>> # ONLY FOR OPTION A
>>> bs, input_sizes = 1, [(224, 224, 3)]
>>> optimized_model = optimize_tf_model(
... model, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir
... )
>>>
>>> # ONLY FOR OPTION B.1
>>> input_data = [((tf.random_normal_initializer()(shape=(1, 224, 224, 3)), ), 0)]
>>> perf_loss_ths = 0.1  # We can accept a drop in the loss function up to 10%
>>> optimized_model = optimize_tf_model(
... model, dataset=input_data, save_dir=save_dir, perf_loss_ths=perf_loss_ths, perf_metric="accuracy", 
... )
>>>
>>> # ONLY FOR OPTION B.2
>>> input_data = [((tf.random_normal_initializer()(shape=(1, 224, 224, 3)), ), 0)]
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> optimized_model = optimize_tf_model(
... model, dataset=input_data, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # ONLY FOR OPTION B.3
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> bs, input_sizes = 1, [(224, 224, 3)]
>>> optimized_model = optimize_tf_model(
... model, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # FOR EACH OPTION
>>> res = optimized_model(*optimized_model.get_inputs_example())
Optimization with ONNX
>>> # FOR EACH OPTION
>>> from nebullvm import optimize_onnx_model
>>> import numpy as np
>>> model_path = "path-to-onnx-model"
>>> save_dir = "."
>>>
>>> # ONLY FOR OPTION A
>>> bs, input_sizes = 1, [(3, 256, 256)]
>>> optimized_model = optimize_onnx_model(
... model_path, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir
... )
>>>
>>> # ONLY FOR OPTION B.1
>>> data = [((np.random.randn(1, 3, 256, 256).astype(np.float32), ), 0)]
>>> perf_loss_ths = 0.1  # We can accept a drop in the loss function up to 10%
>>> optimized_model = optimize_onnx_model(
... model_path, data=data, save_dir=save_dir, perf_loss_ths=perf_loss_ths, perf_metric="accuracy", 
... )
>>>
>>> # ONLY FOR OPTION B.2
>>> data = [((np.random.randn(1, 3, 256, 256).astype(np.float32), ), 0)]
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> optimized_model = optimize_onnx_model(
... model_path, data=data, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # ONLY FOR OPTION B.3
>>> perf_loss_ths = 2  # Relative error on the smallest logits accepted
>>> bs, input_sizes = 1, [(3, 256, 256)]
>>> optimized_model = optimize_onnx_model(
... model_path, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir, perf_loss_ths=perf_loss_ths, 
... )
>>>
>>> # FOR EACH OPTION
>>> x = np.random.randn(1, 3, 256, 256).astype(np.float32)
>>> res = optimized_model(x)
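
The ONNX example assumes you already have an .onnx file on disk. If you are starting from a PyTorch model, one common way to produce such a file is with torch.onnx.export; this is a sketch, not a nebullvm API, and the model and input size are arbitrary.

import torch
import torchvision.models as models

model = models.efficientnet_b0().eval()
dummy_input = torch.randn(1, 3, 256, 256)

# Export to ONNX; the resulting file can then be passed to optimize_onnx_model as model_path.
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)
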
Optimization with Hugging Face

To make nebullvm work with huggingface we have changed the API slightly so that you can use the optimize_huggingface_model function to optimize your model. Below we show an example of how to accelerate GPT2 with nebullvm without loss of accuracy by leveraging only deep learning compilers (option A).

>>> from transformers import GPT2Tokenizer, GPT2Model
>>> from nebullvm.api.frontend.huggingface import optimize_huggingface_model
>>> tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
>>> model = GPT2Model.from_pretrained('gpt2')
>>> text = "Replace me by any text you'd like."
>>> encoded_input = tokenizer(text, return_tensors='pt')
>>> optimized_model = optimize_huggingface_model(
...     model=model,
...     tokenizer=tokenizer,
...     input_texts=[text],
...     batch_size=1,
...     max_input_sizes=[
...       tuple(value.size()[1:]) 
...       for value in encoded_input.values()
...     ],
...     save_dir=".",
...     extra_input_info=[{}, {"max_value": 1, "min_value": 0}],
... )
>>> res = optimized_model(**encoded_input)

Set the number of threads per model

When running multiple replicas of the model in parallel, it can be useful for CPU-optimized algorithms to limit the number of threads used by each model. In nebullvm, you can set the maximum number of threads a single model can use with the environment variable NEBULLVM_THREADS_PER_MODEL. For instance, you can run

export NEBULLVM_THREADS_PER_MODEL=2

to use just two CPU threads per model, both at inference time and during optimization.
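
As with NO_COMPILER_INSTALLATION above, the variable can also be set from Python, as long as this happens before nebullvm starts optimizing; a minimal sketch:

import os

# Limit each model to two CPU threads, both during optimization and at inference time.
os.environ["NEBULLVM_THREADS_PER_MODEL"] = "2"

import nebullvm  # noqa: E402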

Supported frameworks

  • PyTorch
  • TensorFlow
  • ONNX
  • Hugging Face

Supported deep learning compilers

  • OpenVINO
  • TensorRT
  • TVM
  • MLIR (Coming soon 🚀 )

Integration with other open-source libraries

Deep learning libraries

Repositories of the best tools for AI

Do you want to integrate nebullvm in your open-source library? Try it out and if you need support, do not hesitate to contact us at [email protected].

The community for AI acceleration

Do you want to meet nebullvm contributors and other developers who share the vision of a superfast and sustainable artificial intelligence? Or would you like to report bugs or improvement ideas for nebullvm? Join the community for AI acceleration on Discord!

Acknowledgments

Nebullvm was built by Nebuly, with a major contribution by Diego Fiori, as well as a lot of support from the community who submitted pull requests, provided very useful feedback, and opened issues.

Nebullvm builds on the outstanding work being accomplished by the open-source community and major hardware vendors on deep learning compilers.



Comments
  • Issues while setting up Nebullvm on Ubuntu

    Hey! I was installing nebullvm in a conda environment and faced a few issues. Please let me know how I can solve them. I know we also have a Docker version, but I am keeping that in case the normal installation process doesn't work for me.

    Configuration - Conda environment, Python 3.8.

    As per the documentation, I used pip command to install nebullvm and then followed the next guide to install TVM mentioned in docs itself.

    1. Running pip install nebullvm installs Tensorflow==2.7.3 (I see in previous issues that Tensorflow 2.8 is not supported yet). Tensorflow 2.7.3 requires numpy>=1.19.0 and onnxruntime-gpu requires numpy>=1.21.0. So this means I can either use Tensorflow or onnxruntime-gpu with the current configuration (screenshot attached).

    2. While installing openvino, some errors popped up (screenshots attached).

    3. Next, I ran python -c "from nebullvm.installers.installers import install_tvm; install_tvm()" and got ImportError: numpy.core.multiarray failed to import. I guess this is because of the numpy version we are using.

    4. While tvm was being built, I received the following errors (screenshot attached).

    A few discussions/solutions:

    1. I was following the documentation to install it and didn't see any particular Python version mentioned. I think there is some Python version with which this installation is very smooth and no such errors are encountered.
    2. We can update the documentation and mention the Python version on which nebullvm and tvm install smoothly.
    3. Please correct me if I am installing nebullvm incorrectly.
    4. I will create an environment with Python 3.7 and try to set up nebullvm.
    opened by SahilChachra 10
  • [Bug] Metadata.read with FileNotFoundError

    According to #18, it will throw FileNotFoundError:

    >>> model_dir = 'g8q7g2i3pPk7QVwkr6zT1lV6x8mo3nUWOOQvT6oI'
    >>> LearnerMetadata.read(model_dir).load_model(model_dir)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/chenshiyu/workspace/git/shiyu22/nebullvm/nebullvm/inference_learners/base.py", line 314, in read
        with open(path / cls.NAME, "r") as fin:
    FileNotFoundError: [Errno 2] No such file or directory: 'g8q7g2i3pPk7QVwkr6zT1lV6x8mo3nUWOOQvT6oI/metadata.json'
    

    I think in the latest version we should run loaded_model = LearnerMetadata.read(path + '/optimized_model').load_model(path). I have submitted PR #83 with load() to fix it, so we can load the optimized model:

    from nebullvm.inference_learners.base import LearnerMetadata
    
    path = "path-to-directory-where-the-model-is-saved"
    loaded_model = LearnerMetadata.load(path).load_model(path)
    
    opened by shiyu22 8
  • Measure running time with median latency

    Hi, I notice that nebullvm sometimes runs very slowly on large models, and I found that nebullvm currently uses the mean latency to measure the running time of the optimized model in compute_optimized_running_time(optimized_model: BaseInferenceLearner, steps: int = 100). Would it be better to use the median latency? Also, could we use an adaptive algorithm to reduce the total number of steps?

    opened by reiase 5
  • Different results with same data

    Hi everyone. I'm having some trouble getting consistent results with the same input tensor. HW/SW overview:

    • Ubuntu 18.04
    • TensorRT-8.4.0.6
    • llvm 14.0
    • pip installed torch, onnx tensorflow latest stable
    • cuda 11.6
    • cudnn 8.4.0
    • nvidia 3070tx
    • Intel(R) Core(TM) i7-9700E CPU @ 2.60GHz

    The optimization I'm doing, following the samples:

    
    import torch
    import torchvision.models as models
    from nebullvm import optimize_torch_model
    import time
    import numpy as np
    
    model = models.vgg19()
    bs, input_sizes = 1, [(3, 224, 224)]
    save_dir = "."
    optimized_model = optimize_torch_model( model, batch_size=bs, input_sizes=input_sizes, save_dir=save_dir)
    
    x = torch.rand((1, *input_sizes[0]))
    
    with torch.no_grad():
        res = optimized_model(x)[0]
    

    Every iteration with the same vector gives different results. Sometimes there are NaN values, and sometimes they change based on an odd or even number of iterations. Example res[0][:10]:

    tensor([1.4013e-45, 4.5877e-41, 1.4013e-45,        nan, 1.4013e-45, 7.0065e-45, 1.4013e-45, 1.4013e-44, 2.9427e-44, 1.4013e-44])
    tensor([3.6121e-11, 4.5877e-41, 3.6121e-11, 4.5877e-41, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00])
    tensor([ 1.4013e-45,  0.0000e+00, -1.8365e+00,  0.0000e+00,  3.3631e-44, 0.0000e+00,  3.0810e-17,  4.5877e-41,  1.4013e-45,  4.5877e-41])
    tensor([ 1.4013e-45,  0.0000e+00, -1.8370e+00,  0.0000e+00,  1.4013e-45, 0.0000e+00, -1.8370e+00,  0.0000e+00,  3.3631e-44,  0.0000e+00])
    
    

    Same x vector, different results, sometimes about 40 orders of magnitude apart. Am I getting something wrong? The inference time for the vgg19 is incredible, but I'm worried that something under the hood is not working properly:

    %%timeit
    res = optimized_model(x)[0]
    145 µs ± 48 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    
    bug 
    opened by UnibsMatt 5
  • optimize_model - Information about the best performing optim method

    Hi,

    Thank you for this easy-to-use library. I'm successfully using the optimize_model() function as per your notebook. I would like to know, at the end, which optimization method led to the optimized_model. Is there any way to get this information from the returned InferenceLearner object?

    Thank you!

    opened by pyvandenbussche 4
  • numpy API version 0xe vs 0xd

    When trying to run the sample code provided in the readme I got an error

    RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
    Traceback (most recent call last):
      File "test_pytorch.py", line 3, in <module>
        from nebullvm import optimize_torch_model
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/__init__.py", line 1, in <module>
        from nebullvm.api.frontend.torch import optimize_torch_model  # noqa F401
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/api/frontend/torch.py", line 8, in <module>
        from nebullvm.converters import ONNXConverter
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/converters/__init__.py", line 1, in <module>
        from .converters import ONNXConverter  # noqa F401
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/converters/converters.py", line 5, in <module>
        import tensorflow as tf
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/tensorflow/__init__.py", line 37, in <module>
        from tensorflow.python.tools import module_util as _module_util
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 37, in <module>
        from tensorflow.python.eager import context
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 35, in <module>
        from tensorflow.python.client import pywrap_tf_session
      File "<PATH_TO_ENV>/lib/python3.8/site-packages/tensorflow/python/client/pywrap_tf_session.py", line 19, in <module>
        from tensorflow.python.client._pywrap_tf_session import *
    ImportError: SystemError: <built-in method contains of dict object at 0x7f7393496680> returned a result with an error set

    When searching about the 0xe and 0xd versions of numpy, this page said to try to upgrade, which I did. After that the error changed, the latest version of numpy was uninstalled, and it said that it could not find openvino:

    <PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/inference_learners/openvino.py:26: UserWarning: No valid OpenVino installation has been found. Trying to re-install it from source. warnings.warn( Collecting openvino-dev Using cached openvino_dev-2021.4.2-3976-py3-none-any.whl (6.2 MB) Collecting numpy<1.20,>=1.16.6 Using cached numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB) Collecting editdistance>=0.5.3 Using cached editdistance-0.6.0-cp38-cp38-manylinux2010_x86_64.whl (286 kB) Collecting pandas~=1.1.5 Using cached pandas-1.1.5-cp38-cp38-manylinux1_x86_64.whl (9.3 MB) Requirement already satisfied: pydicom>=2.1.2 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (2.2.2) Collecting rawpy>=0.16.0 Using cached rawpy-0.17.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB) Requirement already satisfied: jstyleson~=0.0.2 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (0.0.2) Collecting defusedxml>=0.7.1 Using cached defusedxml-0.7.1-py2.py3-none-any.whl (25 kB) Requirement already satisfied: requests>=2.25.1 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (2.27.1) Requirement already satisfied: PyYAML>=5.4.1 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (6.0) Requirement already satisfied: py-cpuinfo>=7.0.0 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (8.0.0) Collecting opencv-python==4.5.* Using cached opencv_python-4.5.5.62-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (60.4 MB) Requirement already satisfied: tqdm>=4.54.1 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (4.62.3) Collecting scipy~=1.5.4 Using cached scipy-1.5.4-cp38-cp38-manylinux1_x86_64.whl (25.8 MB) Collecting scikit-image>=0.17.2 Using cached scikit_image-0.19.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.0 MB) Requirement already satisfied: sentencepiece>=0.1.95 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (0.1.96) Requirement already satisfied: shapely>=1.7.1 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (1.8.1.post1) Collecting networkx~=2.5 Using cached networkx-2.6.3-py3-none-any.whl (1.9 MB) Collecting nltk>=3.5 Using cached nltk-3.7-py3-none-any.whl (1.5 MB) Requirement already satisfied: pillow>=8.1.2 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (9.0.0) Requirement already satisfied: yamlloader>=0.5 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (1.1.0) Requirement already satisfied: addict>=2.4.0 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (2.4.0) Collecting nibabel>=3.2.1 Using cached nibabel-3.2.2-py3-none-any.whl (3.3 MB) Collecting hyperopt~=0.1.2 Using cached hyperopt-0.1.2-py3-none-any.whl (115 kB) Requirement already satisfied: texttable~=1.6.3 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (1.6.4) Requirement already satisfied: progress>=1.5 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (1.6) Collecting parasail>=1.2.4 Using cached parasail-1.2.4-py2.py3-none-manylinux2010_x86_64.whl (14.1 MB) Collecting openvino==2021.4.2 Using cached openvino-2021.4.2-3976-cp38-cp38-manylinux2014_x86_64.whl (28.9 MB) Collecting scikit-learn>=0.24.1 Using cached scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB) Requirement already satisfied: fast-ctc-decode>=0.2.5 in <PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (0.3.0) Requirement already satisfied: tokenizers>=0.10.1 in 
<PATH_TO_ENV>/lib/python3.8/site-packages (from openvino-dev) (0.11.5) Requirement already satisfied: six in <PATH_TO_ENV>/lib/python3.8/site-packages (from hyperopt~=0.1.2->openvino-dev) (1.16.0) Collecting future Using cached future-0.18.2-py3-none-any.whl Requirement already satisfied: pymongo in <PATH_TO_ENV>/lib/python3.8/site-packages (from hyperopt~=0.1.2->openvino-dev) (4.0.1) Requirement already satisfied: setuptools in <PATH_TO_ENV>/lib/python3.8/site-packages (from nibabel>=3.2.1->openvino-dev) (56.0.0) Requirement already satisfied: packaging>=14.3 in <PATH_TO_ENV>/lib/python3.8/site-packages (from nibabel>=3.2.1->openvino-dev) (21.3) Requirement already satisfied: regex>=2021.8.3 in <PATH_TO_ENV>/lib/python3.8/site-packages (from nltk>=3.5->openvino-dev) (2022.1.18) Collecting click Using cached click-8.0.4-py3-none-any.whl (97 kB) Requirement already satisfied: joblib in <PATH_TO_ENV>/lib/python3.8/site-packages (from nltk>=3.5->openvino-dev) (1.1.0) Requirement already satisfied: python-dateutil>=2.7.3 in <PATH_TO_ENV>/lib/python3.8/site-packages (from pandas~=1.1.5->openvino-dev) (2.8.2) Requirement already satisfied: pytz>=2017.2 in <PATH_TO_ENV>/lib/python3.8/site-packages (from pandas~=1.1.5->openvino-dev) (2021.3) Requirement already satisfied: urllib3<1.27,>=1.21.1 in <PATH_TO_ENV>/lib/python3.8/site-packages (from requests>=2.25.1->openvino-dev) (1.26.8) Requirement already satisfied: idna<4,>=2.5 in <PATH_TO_ENV>/lib/python3.8/site-packages (from requests>=2.25.1->openvino-dev) (3.3) Requirement already satisfied: certifi>=2017.4.17 in <PATH_TO_ENV>/lib/python3.8/site-packages (from requests>=2.25.1->openvino-dev) (2021.10.8) Requirement already satisfied: charset-normalizer~=2.0.0 in <PATH_TO_ENV>/lib/python3.8/site-packages (from requests>=2.25.1->openvino-dev) (2.0.12) Collecting tifffile>=2019.7.26 Using cached tifffile-2022.2.9-py3-none-any.whl (180 kB) Collecting PyWavelets>=1.1.1 Using cached PyWavelets-1.2.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (6.3 MB) Collecting imageio>=2.4.1 Using cached imageio-2.16.0-py3-none-any.whl (3.3 MB) Requirement already satisfied: threadpoolctl>=2.0.0 in <PATH_TO_ENV>/lib/python3.8/site-packages (from scikit-learn>=0.24.1->openvino-dev) (3.1.0) Using cached imageio-2.15.0-py3-none-any.whl (3.3 MB) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in <PATH_TO_ENV>/lib/python3.8/site-packages (from packaging>=14.3->nibabel>=3.2.1->openvino-dev) (3.0.7) Installing collected packages: numpy, networkx, future, editdistance, defusedxml, click, tifffile, scipy, rawpy, PyWavelets, parasail, pandas, openvino, opencv-python, nltk, nibabel, imageio, scikit-learn, scikit-image, hyperopt, openvino-dev Attempting uninstall: numpy Found existing installation: numpy 1.22.2 Uninstalling numpy-1.22.2: Successfully uninstalled numpy-1.22.2 ERROR: Could not install packages due to an OSError: [Errno 39] Directory not empty: '<PATH_TO_ENV>/lib/python3.8/site-packages/~~mpy.libs' Traceback (most recent call last): File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/inference_learners/openvino.py", line 23, in from openvino.inference_engine import IECore ModuleNotFoundError: No module named 'openvino' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "test_pytorch.py", line 3, in from nebullvm import optimize_torch_model File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/init.py", line 1, in from nebullvm.api.frontend.torch import optimize_torch_model # noqa 
F401 File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/api/frontend/torch.py", line 10, in from nebullvm.optimizers.multi_compiler import MultiCompilerOptimizer File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/optimizers/init.py", line 5, in from nebullvm.optimizers.openvino import OpenVinoOptimizer File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/optimizers/openvino.py", line 5, in from nebullvm.inference_learners.openvino import ( File "<PATH_TO_ENV>/lib/python3.8/site-packages/nebullvm/inference_learners/openvino.py", line 33, in from openvino.inference_engine import IECore ModuleNotFoundError: No module named 'openvino'

    The error about not having a module named openvino was present even when I ran python -c "import nebullvm", but since the README.md asks to ignore import errors, I did. I tried pip install openvino, but openvino needs numpy < 1.20 while tensorflow needs numpy >= 1.20, so there were incompatibilities. I had used pip to install nebullvm.

    My specifications: python 3.8 torch 1.10.2+cu13 CPU: intel i5-9300H PC: Acer Predator Helios PH315-52 V1.12 OS: Arch Linux x86_64

    bug 
    opened by IamMarcIvanov 3
  • Support example inputs in `optimize_torch_model`

    I'm trying to run nebullvm on PyTorch's benchmarks. Unfortunately, when optimizing the model BERT_pytorch, it crashes:

      File "/Users/reiase/workspace/nebullvm/nebullvm/api/frontend/torch.py", line 296, in optimize_torch_model
        onnx_path = model_converter.convert(
      File "/Users/reiase/workspace/nebullvm/nebullvm/converters/converters.py", line 58, in convert
        convert_torch_to_onnx(
      File "/Users/reiase/workspace/nebullvm/nebullvm/converters/torch_converters.py", line 32, in convert_torch_to_onnx
        output_sizes = get_outputs_sizes_torch(torch_model, input_tensors)
      File "/Users/reiase/workspace/nebullvm/nebullvm/utils/torch.py", line 16, in get_outputs_sizes_torch
        outputs = torch_model(*input_tensors)
      File "/Users/reiase/.pyenv/versions/miniforge3/lib/python3.9/site-packages/torch/fx/graph_module.py", line 652, in call_wrapped
        return self._wrapped_call(self, *args, **kwargs)
      File "/Users/reiase/.pyenv/versions/miniforge3/lib/python3.9/site-packages/torch/fx/graph_module.py", line 277, in __call__
        raise e
      File "/Users/reiase/.pyenv/versions/miniforge3/lib/python3.9/site-packages/torch/fx/graph_module.py", line 267, in __call__
        return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
      File "/Users/reiase/.pyenv/versions/miniforge3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "<eval_with_key>.1", line 15, in forward
        embedding_1 = torch.nn.functional.embedding(segment_info, self_embedding_segment_weight, 0, None, 2.0, False, False);  segment_info = self_embedding_segment_weight = None
      File "/Users/reiase/.pyenv/versions/miniforge3/lib/python3.9/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    IndexError: index out of range in self
    

    It seems that the mocked input causes an IndexError in torch.embedding. Can we support the direct use of example inputs in the inference learners?

    opened by reiase 2
  • ModuleNotFoundError: No module named 'openvino'

    When installing via "pip install nebullvm", the install goes through. However, when calling "python -c "import nebullvm"" to install the different DL compilers, I am greeted by a ModuleNotFoundError for the module openvino. pip installing openvino only causes further errors. The README says these errors/warnings can be ignored on this initial import, yet they persist for every other import, making the package unusable.

    opened by messmor 2
  • [Bug] openVINO `Input shape "(2,)" cannot be parsed.`

    Hi, I have a speaker input which is a tensor of shape (batch_size,). It seems that OpenVINO can't parse it if it is not a 2d tensor. One workaround is to expand the inputs, but that requires changing the model itself; I think it is better to change the cmd for OpenVINO.

    Model Optimizer arguments:
    Common parameters:
    	- Path to the Input Model: 	/var/folders/5f/hz_9yjd12zsc54py3z5rdr000000gn/T/tmpaiitdypu/checkpoint_200000.pth.tar.onnx
    	- Path for generated IR: 	/var/folders/5f/hz_9yjd12zsc54py3z5rdr000000gn/T/tmpaiitdypu
    	- IR output name: 	checkpoint_200000.pth.tar
    	- Log level: 	ERROR
    	- Batch: 	Not specified, inherited from the model
    	- Input layers: 	src_seq,p_target,vad_target,speaker
    	- Output layers: 	Not specified, inherited from the model
    	- Input shapes: 	(2, 94, 70),(2, 94),(2, 94),(2,)
    	- Source layout: 	Not specified
    	- Target layout: 	Not specified
    	- Layout: 	Not specified
    	- Mean values: 	Not specified
    	- Scale values: 	Not specified
    	- Scale factor: 	Not specified
    	- Precision of IR: 	FP32
    	- Enable fusing: 	True
    	- User transformations: 	Not specified
    	- Reverse input channels: 	False
    	- Enable IR generation for fixed input shape: 	False
    	- Use the transformations config file: 	None
    Advanced parameters:
    	- Force the usage of legacy Frontend of Model Optimizer for model conversion into IR: 	False
    	- Force the usage of new Frontend of Model Optimizer for model conversion into IR: 	False
    OpenVINO runtime found in: 	/Users/sotomi/envs/nebullvm_compilation/lib/python3.9/site-packages/openvino
    OpenVINO runtime version: 	2022.1.0-7019-cdb9bec7210-releases/2022/1
    Model Optimizer version: 	2022.1.0-7019-cdb9bec7210-releases/2022/1
    [ ERROR ]  Input shape "(2, 94, 70),(2, 94),(2, 94),(2,)" cannot be parsed. 
     For more information please refer to Model Optimizer FAQ, question #57. (https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=57#question-57)
    100%|██████████| 1/1 [00:03<00:00,  3.44s/it]
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb Cell 26 in <cell line: 16>()
         [14](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=13) print([v.dtype for v in data])
         [15](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=14) print([v.shape for v in data])
    ---> [16](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=15) optimized_model = optimize_onnx_model(
         [17](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=16)   onnx_path,
         [18](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=17)   batch_size=bs,
         [19](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=18)   input_sizes=sizes,
         [20](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=19)   input_types=types,
         [21](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=20)   # data=[(data, 0)],
         [22](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=21)   save_dir=save_dir
         [23](vscode-notebook-cell:/Users/sotomi/projects/real-time-audiostreaming/FastSpeech/notebooks/convert_models_to_onnx.ipynb#ch0000025?line=22) )
    
    File ~/envs/nebullvm_compilation/lib/python3.9/site-packages/nebullvm/api/frontend/onnx.py:242, in optimize_onnx_model(model_path, save_dir, data, batch_size, input_sizes, input_types, extra_input_info, dynamic_axis, perf_loss_ths, perf_metric, ignore_compilers, custom_optimizers)
        237         model_optimized = None
        238     if model_optimized is None:
        239         raise RuntimeError(
        240             "No valid compiled model has been produced. "
        241             "Look at the logs for further information about the failure."
    --> 242         )
        243     model_optimized.save(save_dir)
        244 return model_optimized.load(save_dir)
    
    RuntimeError: No valid compiled model has been produced. Look at the logs for further information about the failure.
    
    opened by SolomidHero 1
  • ONNX optimization fails due to PyTorch ELU

    While optimizing a PyTorch model using ONNX, I receive the following error:

    RuntimeError: 0 INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":614, please report a bug to PyTorch. We don't have an op for aten::elu but it isn't a special case.  Argument types: Tensor, bool, int, int, 
    
    Candidates:
    	aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    	aten::elu.out(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1, *, Tensor(a!) out) -> (Tensor(a!))
    

    I have tried torch==1.10 and torch==1.12

    opened by MikeAleksa 1
  • Support of Detectron2 models

    For now Detectron2 models don't work out of the box because of the different input format.

    It would be nice to have an example on how to fix that.

    Thanks!

    question example 
    opened by justlike-prog 1
  • select compiler with hyperparameter

    Signed-off-by: reiase [email protected]

    Example of selecting a compiler with hyperparameter:

    from hyperparameter import param_scope
    with param_scope() as ps:
        ps.nebullvm.compilers.selector.enabled = ["onnxruntime"]
        optimized_model = optimize_model(
            model, input_data=input_data, optimization_time="constrained")
    

    nebullvm.compilers.selector.enabled is the key for configuring the compiler. We can also select compilers with a dict-style configuration:

    with param_scope(**{"nebullvm.compilers.selector.enabled": ["onnxruntime"]}) as ps:
        optimized_model = optimize_model(
            model, input_data=input_data, optimization_time="constrained")
    

    Advantages of managing configuration with hyperparameter:

    1. keeping the API simple for normal users;
    2. expose more detailed parameters to advanced users;
    opened by reiase 1
  • detailed parameters for optimizers

    Each optimizer has its own parameters, which are hard to expose in full in the high-level nebullvm APIs. Passing such parameters is difficult, since we have to pass them through the whole calling stack.

    Suppose we have a calling stack user_api -> nebullvm_func -> nebullvm_func -> tvm_api. If the user wants to change the default passes in tvm, we have to add something like tvm_params to the user_api so that the user can pass detailed parameters to tvm. Some of the nebullvm_funcs also need to be modified to pass through the tvm_params, even though they do not use them. Another solution is a config file for the optimizers, but users may not want to maintain such config files.

    HyperParameter is a config framework I developed for building DNN models. It was recently extracted into a new package to support more projects. A major feature of HyperParameter is that we can pass parameters across the calling stacks without putting them into the argument list of functions. For example:

    @auto_param()
    def dnn(input, layers=3, activation="relu"):
    	"""build a MLP
    	"""
        for i in range(layers):
            input = Linear(input)
            input = activation_fn(
                activation,
                input
            )
        return input
    	
    def some_framework_func(x):
    	...
    	return dnn(x)
    	
    def user_api(x):
    	...
    	return some_framework_func(x)
    
    # normal usage
    user_api(x) # call dnn(x, layers=3, activation="relu")
    
    # advanced usage, passing parameter using param_scope
    with param_scope(
            "dnn.layers=4", 
            "dnn.activation=sigmoid"):
        user_api(x) # call dnn(x, layers=4, activation="sigmoid")
    

    the auto_param decorator will take over the kwargs of dnn, and change the default values according to param_scope.

    For more examples, please take a look at the document: https://reiase.github.io/hyperparameter/

    opened by reiase 0
  • Hugging Face bug when using a list of strings as input

    The optimization of a Hugging Face model doesn't work when providing a list of strings as input (screenshot attached). Line 370 in huggingface.py (input_example = tokenizer(input_data)) should be changed to allow changing the output type from list to tensor by passing the parameter tokenizer_args.
    opened by valeriosofi 0
  • Does Nebullvm support detectron2?

    Hi,

    I'm working on a project that contains different models. One of those models is detectron2 (which I'm not trying to optimize). However, when I install and import the nebullvm library on Colab, the detectron2 libraries don't work anymore. Is there a way to fix that?

    opened by mst019 1
  • Different behavior in optimized padding block

    I'm trying to optimize a CNN model and getting different behavior in a padding/concatenation block during optimization, causing the optimization to fail. The block in question:

    class PadConcatBlock(nn.Module):
        def __init__(self):
            super().__init__()
    
        def forward(self, down_layer, x):
            x1_shape = down_layer.shape
            x2_shape = x.shape
    
            height_diff = (x1_shape[2] - x2_shape[2]) // 2
    
            # if down_layer is larger than x, pad down_layer
            down_layer_padded = F.pad(
                down_layer,
                (0, 0, abs(height_diff), 0, 0, 0),
                mode="replicate",
            )
    
            x = torch.concat([down_layer_padded, x], axis=1)
            return x
    

    down_layer shape is torch.Size([1, 32, 1024, 16]); x shape is torch.Size([1, 32, 1025, 16])

    height_diff should be -1 but when I log height_diff during the Running Optimization using ONNX interface step, it sometimes comes back as -1 and sometimes comes back as tensor(0), causing torch.concat() to fail.

    Why would the behavior of height_diff = (x1_shape[2] - x2_shape[2]) // 2 change?

    opened by MikeAleksa 1
Releases (v0.4.3)
  • v0.4.3(Sep 12, 2022)

    nebullvm 0.4.3 Release Notes

    Minor release that fixes some bugs introduced in v0.4.2.

    Bug fixed

    • Fix bug preventing the installation without TensorFlow.
    • Fix a bug while using the HuggingFace Interface

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
  • v0.4.2(Sep 8, 2022)

    nebullvm 0.4.2 Release Notes

    Minor release that fixes some bugs and reduces the number of strict requirements needed to run Nebullvm.

    New Features

    • Support ignore_compilers also for torchscript and tflite
    • Tensorflow is not a strict nebullvm requirement anymore.

    Bug fixed

    • Solve bug on half-precision with onnx-runtime
    • Fix a bug on tensor rt quantization: numpy arrays were passed to inference learner instead of tensors.

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
  • v0.4.1(Sep 2, 2022)

    nebullvm 0.4.1 Release Notes

    Minor release fixing some bugs and extending support for TensorRT directly with the PyTorch interface.

    New Features

    • Support for TensorRT directly with PyTorch models.

    Bug fixed

    • Bug in conversion to onnx that could lead to wrong inference results

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
  • v0.4.0(Jul 26, 2022)

    nebullvm 0.4.0 Release Notes

    "One API to rule them all". This major release of Nebullvm provides a brand new API unique to all Deep Learning frameworks.

    New Features

    • New unique API for all the Deep Learning frameworks.
    • Support for SparseML pruning.
    • Beta-feature Support for Intel-Neural-Compressor's Pruning.
    • Add support for BladeDISC compiler.
    • Modify the latency calculation for each model by using the median instead of the mean across different model runs.
    • Implement an early stop mechanism for latency computation.

    Bug fixed

    • Fix bug with HuggingFace models causing a failure during optimizations.

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
    • Reiase (@reiase)
  • v0.3.2(Jul 19, 2022)

    nebullvm 0.3.2 Release Notes

    Minor release for maintenance purposes. It fixes bugs and generally improves the code stability.

    New Features

    • In the Pytorch framework, whenever input data is provided for optimization, the model converter also uses it during the conversion of the model to onnx, instead of using the data only at the stage of applying the "precision reduction techniques."

    Bug fixed

    • Fix bug with OpenVino 2.0 not working with 1-dimensional arrays.
    • Fix bug while using the TensorRT engine, which was returning CPU tensors even when input tensors were on GPU.
    • Fix requirements conflicts on Intel CPUs due to an old numpy version required by OpenVino.

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
    • SolomidHero (@SolomidHero)
    • Emile Courthoud (@emilecourthoud)
  • v0.3.1(Jun 28, 2022)

    nebullvm 0.3.1 Release Notes

    We are pleased to announce that we have added the option to run nebullvm from a Docker container. We provide both a Docker image on Docker Hub and the Dockerfile code to produce the Docker container directly from the latest version of the source code.

    New Features

    • Add Dockerfile and upload docker images on Docker Hub.
    • Implement new backend for the Tensorflow API running on top of TensorFlow and TFLite.
    • Implement new backend for the PyTorch API running on top of TorchScript.

    Bug fixed

    • Fix bug with TensorRT in the Tensorflow API.
    • Fix bug with OpenVino 2.0 not using the quantization on intel devices.

    Contributors

    • Diego Fiori (@morgoth95)
    • Valerio Sofi (@valeriosofi)
    • Emile Courthoud (@emilecourthoud)
  • v0.3.0(May 10, 2022)

    nebullvm 0.3.0 Release Notes

    We are super excited to announce the new major release nebullvm 0.3.0, where nebullvm's AI inference accelerator becomes more powerful, stable and covers more use cases.

    nebullvm is an open-source library that generates an optimized version of your deep learning model that runs 2-10 times faster in inference without performance loss by leveraging multiple deep learning compilers (OpenVINO, TensorRT, etc.). With the new release 0.3.0, nebullvm can now accelerate inference up to 30x if you specify that you are willing to trade off a self-defined amount of accuracy/precision to get an even lower response time and a lighter model. This additional acceleration is achieved by exploiting optimization techniques that slightly modify the model graph to make it lighter, such as quantization, half precision, distillation, sparsity, etc.

    Find tutorials and examples on how to use nebullvm, as well as installation instructions in the main readme of nebullvm library. And check below if you want to learn more about

    • Overview of Nebullvm 0.3.0
    • Benchmarks
    • How the new Nebullvm 0.3.0 API Works
    • New Features & Bug Fixes

    Overview of Nebullvm

    With this new version, nebullvm continues in its mission to be:

    ☘️ Easy-to-use. It takes a few lines of code to install the library and optimize your models.

    🔥 Framework agnostic. nebullvm supports the most widely used frameworks (PyTorch, TensorFlow, 🆕ONNX🆕 and Hugging Face, etc.) and provides as output an optimized version of your model with the same interface (PyTorch, TensorFlow, etc.).

    💻 Deep learning model agnostic. nebullvm supports all the most popular deep learning architectures such as transformers, LSTM, CNN and FCN.

    🤖 Hardware agnostic. The library now works on most CPU and GPU and will soon support TPU and other deep learning-specific ASIC.

    🔑 Secure. Everything runs locally on your hardware.

    ✨ Leveraging the best optimization techniques. There are many inference techniques such as deep learning compilers, 🆕quantization or half precision🆕, and soon sparsity and distillation, which are all meant to optimize the way your AI models run on your hardware.

    Benchmarks

    We have tested nebullvm on popular AI models and hardware from leading vendors.

    The table below shows the inference speedup provided by nebullvm. The speedup is calculated as the response time of the unoptimized model divided by the response time of the accelerated model, as an average over 100 experiments. As an example, if the response time of an unoptimized model was on average 600 milliseconds and after nebullvm optimization only 240 milliseconds, the resulting speedup is 2.5x, meaning 150% faster inference.

    A complete overview of the experiment and findings can be found on this page.

    |                    | M1 Pro | Intel Xeon | AMD EPYC | Nvidia T4 |
    |--------------------|:------:|:----------:|:--------:|:---------:|
    | EfficientNetB0     | 23.3x  | 3.5x       | 2.7x     | 1.3x      |
    | EfficientNetB2     | 19.6x  | 2.8x       | 1.5x     | 2.7x      |
    | EfficientNetB6     | 19.8x  | 2.4x       | 2.5x     | 1.7x      |
    | Resnet18           | 1.2x   | 1.9x       | 1.7x     | 7.3x      |
    | Resnet152          | 1.3x   | 2.1x       | 1.5x     | 2.5x      |
    | SqueezeNet         | 1.9x   | 2.7x       | 2.0x     | 1.3x      |
    | Convnext tiny      | 3.2x   | 1.3x       | 1.8x     | 5.0x      |
    | Convnext large     | 3.2x   | 1.1x       | 1.6x     | 4.6x      |
    | GPT2 - 10 tokens   | 2.8x   | 3.2x       | 2.8x     | 3.8x      |
    | GPT2 - 1024 tokens | -      | 1.7x       | 1.9x     | 1.4x      |
    | Bert - 8 tokens    | 6.4x   | 2.9x       | 4.8x     | 4.1x      |
    | Bert - 512 tokens  | 1.8x   | 1.3x       | 1.6x     | 3.1x      |

    Overall, the library provides great results, with more than 2x acceleration in most cases and around 20x in a few applications. We can also observe that acceleration varies greatly across different hardware-model couplings, so we suggest you test nebullvm on your model and hardware to assess its full potential. You can find the instructions below.

    Moreover, across all scenarios, nebullvm stands out for its ease of use, allowing you to take advantage of inference optimization techniques without having to spend hours studying, testing and debugging these technologies.

    How the New Nebullvm API Works

    With the latest release, nebullvm has a new API and can be deployed in two ways.

    Option A: 2-10x acceleration, NO performance loss

    If you choose this option, nebullvm will test multiple deep learning compilers (TensorRT, OpenVINO, ONNX Runtime, etc.) and identify the optimal way to compile your model on your hardware, increasing inference speed by 2-10 times without affecting the performance of your model.
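
    As an illustration, here is a minimal sketch of Option A for a PyTorch model. It assumes the optimize_torch_model entry point and the batch_size/input_sizes metadata mentioned in these notes; the save_dir argument and the exact signature are assumptions, so refer to the main readme for the current API.

    ```python
    # Minimal sketch of Option A (no performance loss). Assumes a
    # PyTorch-style entry point optimize_torch_model; argument names
    # such as save_dir may differ between nebullvm versions.
    import torch
    import torchvision.models as models

    from nebullvm import optimize_torch_model

    model = models.resnet18(pretrained=True)

    # nebullvm benchmarks the available compilers (TensorRT, OpenVINO,
    # ONNX Runtime, ...) on dummy inputs built from the metadata below
    # and returns the fastest model, exposed with the same PyTorch
    # interface as the input model.
    optimized_model = optimize_torch_model(
        model,
        batch_size=1,
        input_sizes=[(3, 224, 224)],
        save_dir=".",
    )

    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        prediction = optimized_model(x)
    ```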

    Option B: 2-30x acceleration, supervised performance loss

    Nebullvm can speed up inference by much more than 10 times if you are willing to sacrifice a fraction of your model's performance. If you specify how much performance loss you are willing to sustain, nebullvm will push your model's response time to its limits by identifying the best possible blend of state-of-the-art inference optimization techniques, such as deep learning compilers, distillation, quantization, half precision, sparsity, etc.

    Performance monitoring relies on two arguments: perf_loss_ths (performance loss threshold) and perf_metric (the metric used to estimate performance).

    When a predefined metric (e.g. "accuracy") or a custom metric is passed as the perf_metric argument, the value of perf_loss_ths will be used as the maximum acceptable loss for the given metric evaluated on your datasets (Option B.1).

    When no perf_metric is provided as input, nebullvm calculates the performance loss using the default precision function. If a dataset is provided, the precision will be calculated on 100 sampled data points (Option B.2). Otherwise, the data will be randomly generated from the metadata provided as input, i.e. input_sizes and batch_size (Option B.3).
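
    The sketch below illustrates Options B.1 and B.3 under the same assumptions as the Option A example above; the exact signature, the dataloader argument name and the threshold values are illustrative assumptions rather than the definitive API.

    ```python
    # Minimal sketch of Option B (supervised performance loss). Treat the
    # signature, the dataloader argument name and the threshold values as
    # examples, not as the definitive nebullvm API.
    import torch
    import torchvision.models as models
    from torch.utils.data import DataLoader, TensorDataset

    from nebullvm import optimize_torch_model

    model = models.resnet18(pretrained=True)

    # Option B.1: a predefined metric and a dataset are provided, so
    # perf_loss_ths is read as the maximum acceptable drop in "accuracy"
    # measured on your data.
    images = torch.randn(100, 3, 224, 224)   # stand-in for real inputs
    labels = torch.randint(0, 1000, (100,))  # stand-in for real labels
    validation_loader = DataLoader(TensorDataset(images, labels), batch_size=1)

    optimized_model = optimize_torch_model(
        model,
        batch_size=1,
        input_sizes=[(3, 224, 224)],
        save_dir=".",
        dataloader=validation_loader,  # assumed argument name for your dataset
        perf_loss_ths=0.02,            # accept at most a 2% accuracy drop
        perf_metric="accuracy",
    )

    # Option B.3: no perf_metric and no dataset, so nebullvm falls back to
    # its default precision function on inputs randomly generated from
    # batch_size and input_sizes.
    optimized_model = optimize_torch_model(
        model,
        batch_size=1,
        input_sizes=[(3, 224, 224)],
        save_dir=".",
        perf_loss_ths=2,  # example threshold; its scale depends on the metric
    )
    ```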

    Check out the main GitHub readme if you want to take a look at nebullvm's performance and benchmarks, tutorials and notebooks on how to implement nebullvm with ease. And please leave a ⭐ if you enjoy the project and join the Discord community where we chat about nebullvm and AI optimization.

    New Features and Bug Fixes

    New features

    • Implemented quantization and half-precision optimization techniques
    • Added support for models in the ONNX framework
    • Improved performance of Microsoft ONNX Runtime with transformers
    • Implemented nebullvm into Jina's amazing Clip-as-a-Service library for a performance boost (coming soon)
    • Accelerated library installation
    • Refactored the code to add API support for datasets
    • Released new benchmarks, notebooks and tutorials that can be found in the GitHub readme

    Bug fixes

    • Fixed a bug related to Intel OpenVINO applied to dynamic shapes. Thanks @kartikeyporwal for the support!
    • Fixed a bug with model storage.
    • Fixed a bug causing issues with NVIDIA TensorRT output. Thanks @UnibsMatt for identifying the problem.

    Contributors

    • @morgoth95 🥳
    • @emilecourthoud 🚀
    • @kartikeyporwal 🥇
    • @aurimgg 🚗
  • v0.2.2 (Apr 30, 2022)

    nebullvm 0.2.2 Release Notes

    nebullvm 0.2.2 is a minor release fixing some bugs.

    New Features

    • Allow the user to select the maximum number of CPU threads per model to use during optimization and inference.

    Bug fixes

    • Fixed a bug in the ONNXRuntime InferenceLearner

    Contributors

    • Diego Fiori (@morgoth95)
  • v0.2.1 (Apr 26, 2022)

    nebullvm 0.2.1 Release Notes

    nebullvm 0.2.1 is a minor release fixing some bugs and adding support for optimization directly on ONNX models.

    New Features

    • ONNX interface for model optimization

    Bug fixes

    • Fixed a bug in TensorRT

    Contributors

    • Diego Fiori (@morgoth95)
  • v0.2.0 (Apr 3, 2022)

    nebullvm 0.2.0 Release Notes

    nebullvm 0.2.0 is a major release implementing important new features and fixing some bugs.

    New Features

    • Support for dynamic shapes for both the PyTorch and TensorFlow interfaces
    • Support for Transformer models built using the HuggingFace framework
    • Add ONNXRuntime to the supported backends for optimized models
    • New README, updated with benchmarks on SOTA models for both NLP and Computer Vision

    Bug fixes

    • Fixed an error in the TensorFlow API preventing the usage of the optimize_tf_model function

    Contributors

    • Diego Fiori (@morgoth95)
    • Emile Courthoud (@emilecourthoud)
  • v0.1.2 (Mar 1, 2022)

    nebullvm 0.1.2 Release Notes

    nebullvm 0.1.2 is a maintenance release fixing a few bugs and implementing new features.

    New Features

    • Support for the TorchScript API when optimizing with the Apache TVM compiler.

    Bug fixes

    • Learners optimized with OpenVINO no longer raise KeyErrors at prediction time.
    • Learners optimized with Apache TVM can now be saved and loaded multiple times. Previously, trying to save a loaded model raised an error.
    • Fixed a bug in the auto-installer feature caused by incompatibilities between TensorFlow 2.8 and OpenVINO.
    • Modified the behavior of MultiCompilerOptimizer to avoid errors due to the pickling of C-related files.

    Contributors

    • Diego Fiori (@morgoth95)
  • v0.1.1 (Feb 28, 2022)

    nebullvm 0.1.1 Release Notes

    Official alpha release of the nebullvm library, the all-in-one library for deep learning compilers.

    Main features

    The main release contains:

    • wheels for installing with pip
    • auto-installation feature for supported compilers
    • support for OpenVINO, TensorRT and Apache TVM
    • support for models built in TensorFlow and PyTorch
    • optimized model API identical to that of the input model

    Contributors

    A total of 3 people contributed to this release.

    • Diego Fiori (@morgoth95)
    • Emile Courthoud (@emilecourthoud)
    • Francesco Signorato (@FrancescoSignorato)