DeepSparse Engine

CPU inference engine that delivers unprecedented performance for sparse models

Overview

The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of the natural sparsity within neural networks to reduce the compute required and to accelerate memory-bound workloads. It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.

This repository includes package APIs along with examples to help you quickly get started learning about and running sparse models.

Related Products

  • SparseZoo: Neural network model repository for highly sparse models and optimization recipes
  • SparseML: Libraries for state-of-the-art deep neural network optimization algorithms, enabling simple integration into pipelines with a few lines of code
  • Sparsify: Easy-to-use autoML interface to optimize deep neural networks for better inference performance and a smaller footprint

Compatibility

The DeepSparse Engine ingests models in the ONNX format, allowing for compatibility with PyTorch, TensorFlow, Keras, and many other frameworks that support ONNX export. This reduces the work of preparing your trained model for inference to a single export step.
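For example, a typical PyTorch-to-ONNX export might look like the following (a minimal sketch using torch.onnx.export; the model, file name, and input shape are illustrative assumptions, and SparseML also ships its own export utilities):

import torch
import torchvision

# Illustrative example: export a torchvision MobileNetV2 to ONNX.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # assumed input shape

torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)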

Quick Tour

To expedite inference and benchmarking on real models, we include the sparsezoo package. SparseZoo hosts inference optimized models, trained on repeatable optimization recipes using state-of-the-art techniques from SparseML.

Quickstart with SparseZoo ONNX Models

MobileNetV1 Dense

Here is how to quickly perform inference with DeepSparse Engine on a pre-trained dense MobileNetV1 from SparseZoo.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

# Download model and compile as optimized executable for your machine
model = classification.mobilenet_v1()
engine = compile_model(model, batch_size=batch_size)

# Fetch sample input and predict output using engine
inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
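To turn that timing into a throughput number, you can extend the snippet above (a small sketch; timed_run is assumed to report elapsed time in seconds):

# Continues the snippet above; inference_time is assumed to be in seconds
throughput = batch_size / inference_time
print(f"batch of {batch_size} in {inference_time:.4f}s -> {throughput:.1f} items/sec")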

MobileNetV1 Optimized

When exploring available optimized models, you can use the Zoo.search_optimized_models utility to find models that share a base.

Let us try this on the dense MobileNetV1 to see what is available.

from sparsezoo import Zoo
from sparsezoo.models import classification
print(Zoo.search_optimized_models(classification.mobilenet_v1()))

Output:

[Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate)]

Great. We can see there are two pruned versions targeting FP32: conservative at 100% and moderate at >= 99% of baseline accuracy. There is also a pruned_quant variant targeting INT8.

Suppose you want the best performance on FP32 and can accept a small drop in accuracy; in that case, choose pruned-moderate over pruned-conservative.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

model = classification.mobilenet_v1(optim_name="pruned", optim_category="moderate")
engine = compile_model(model, batch_size=batch_size)

inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
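To see the speedup for yourself, you can time the dense and pruned variants side by side (a sketch built only from the calls shown above; timed_run is assumed to report seconds):

from deepsparse import compile_model
from sparsezoo.models import classification

batch_size = 64

# Download the dense baseline and the pruned-moderate variant
dense = classification.mobilenet_v1()
pruned = classification.mobilenet_v1(optim_name="pruned", optim_category="moderate")

# Same input shape for both variants, so one sample batch suffices
inputs = dense.data_inputs.sample_batch(batch_size=batch_size)

for name, model in [("dense", dense), ("pruned-moderate", pruned)]:
    engine = compile_model(model, batch_size=batch_size)
    outputs, seconds = engine.timed_run(inputs)
    print(f"{name}: {batch_size / seconds:.1f} items/sec")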

Quickstart with custom ONNX models

We accept ONNX files for custom models, too. Simply plug in your model to compare performance with other solutions.

> wget https://github.com/onnx/models/raw/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
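If you want to sanity-check the engine's outputs for a custom model, one option is to compare against ONNX Runtime on the same inputs (a sketch assuming onnxruntime is installed and the model accepts the generated batch):

import numpy
import onnxruntime
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

inputs = generate_random_inputs(onnx_filepath, batch_size)

# DeepSparse outputs
engine = compile_model(onnx_filepath, batch_size)
deepsparse_outputs = engine.run(inputs)

# ONNX Runtime outputs on the same inputs
session = onnxruntime.InferenceSession(onnx_filepath)
feed = {inp.name: arr for inp, arr in zip(session.get_inputs(), inputs)}
onnxruntime_outputs = session.run(None, feed)

for ds_out, ort_out in zip(deepsparse_outputs, onnxruntime_outputs):
    print("max abs difference:", numpy.abs(ds_out - ort_out).max())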

For a more in-depth read on available APIs and workflows, check out the examples and DeepSparse Engine documentation.

Hardware Support

The DeepSparse Engine is validated to work on x86 Intel and AMD CPUs running Linux operating systems.

Running on a CPU with AVX-512 instructions is highly recommended, as this enables the most optimized algorithms.
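You can check what the engine detects on your machine with the cpu_architecture() helper used elsewhere in this repository (a small sketch; the 'isa' and 'vnni' keys are assumed from sample outputs of this helper):

import deepsparse.cpu

arch = deepsparse.cpu.cpu_architecture()
print(arch["isa"])   # e.g. "avx2" or "avx512"
print(arch["vnni"])  # True when AVX-512 VNNI (DL Boost) is available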

Here is a table detailing specific support for some algorithms over different microarchitectures:

| x86 Extension | Microarchitectures | Activation Sparsity | Kernel Sparsity | Sparse Quantization |
| --- | --- | --- | --- | --- |
| AMD AVX2 | Zen 2, Zen 3 | not supported | optimized | not supported |
| Intel AVX2 | Haswell, Broadwell, and newer | not supported | optimized | not supported |
| Intel AVX-512 | Skylake, Cannon Lake, and newer | optimized | optimized | emulated |
| Intel AVX-512 VNNI (DL Boost) | Cascade Lake, Ice Lake, Cooper Lake, Tiger Lake | optimized | optimized | optimized |

Installation

This repository is tested on Python 3.6+ and ONNX 1.5.0+. It is recommended to install in a virtual environment to keep your system in order.

Install with pip using:

pip install deepsparse

Then, if you want to explore the examples, clone the repository and install any additional dependencies found in the example folders.

Notebooks

For some step-by-step examples, we have Jupyter notebooks showing how to compile models with the DeepSparse Engine, check the predictions for accuracy, and benchmark them on your hardware.

Available Models and Recipes

A number of pre-trained baseline and recalibrated models in the SparseZoo can be used with the engine for higher performance. The types available for each model architecture are noted in its SparseZoo model repository listing.

Resources and Learning More

Contributing

We appreciate contributions to the code, examples, and documentation as well as bug reports and feature requests! Learn how here.

Join the Community

For user help or questions about the DeepSparse Engine, use our GitHub Discussions. Everyone is welcome!

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please email us at [email protected] or fill out this form.

License

The project's binary containing the DeepSparse Engine is licensed under the Neural Magic Engine License.

Example files and scripts included in this repository are licensed under the Apache License Version 2.0 as noted.

Release History

Official builds are hosted on PyPI.

Track this project via GitHub Releases.

Citation

Find this project useful in your research or other communications? Please consider citing Neural Magic's paper:

@inproceedings{pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research},
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf}, 
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}, 
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.} 
}
Comments
  • YOLOv5 pruned_quant-aggressive_94 exception

    Describe the bug I was trying to run demo code with YOLOv5 pruned_quant-aggressive_94 model on g4dn.x2large and encountered this exception.

    Stack trace

      | 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
      | 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
      | 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
      | 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
      | 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
      | 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
      | 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
      | 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
      | 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
      | 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
      | 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
      | 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
      | 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
      | 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
      | 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
      | 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
      | 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
      | 2021-12-16T15:36:13.559+01:00 | Backtrace:
      | 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
      | 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
      | 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: [email protected]
    

    Environment

    1. Ubuntu 18.04
    2. Python 3.8
    3. ML framework version(s)
    torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
    torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl
    
    4. Other Python package versions
    sparseml==0.9.0
    sparsezoo==0.9.0
    numpy==1.21.4
    onnx==1.9.0
    onnxruntime==1.7.0
    

    Is there any chance you could help me debug this issue?

    bug 
    opened by SkalskiP 20
  • ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

    Describe the bug Trying to run the server-client example.

    Environment Include all relevant environment information: Ubuntu 18.04

    1. Python version : 3.8
    2. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:

    deepsparse 0.1.1

    3. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

    {'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 8, 'available_cores_per_socket': 8, 'threads_per_core': 1, 'available_threads_per_core': 1, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 12582912}

    To Reproduce from deepsparse.utils import arrays_to_bytes, bytes_to_arrays

    Errors Traceback (most recent call last): File "server.py", line 62, in from deepsparse.utils import arrays_to_bytes, bytes_to_arrays ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

    bug 
    opened by adrianosantospb 13
  • Zero shot text classification pipeline

    Based off of https://discuss.huggingface.co/t/new-pipeline-for-zero-shot-text-classification/681

    Implements zero shot text classification pipeline. Batch size is equal to the number of sequences and currently only supports dynamic label passing, with static label support to come in a future PR. This implementation allows for future implementation of zero shot text classification based on models trained on classes other than mnli.

    example dynamic labels:

    zero_shot_text_classifier = Pipeline.create(
        task="zero_shot_text_classification",
        model_scheme="mnli",
        model_config={"hypothesis_template": "This text is related to {}"},     
        model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
                   "huggingface/mnli/pruned80_quant-none-vnni")
    
    sequence_to_classify = "Who are you voting for in 2020?"
    candidate_labels = ["Europe", "public health", "politics"]
    zero_shot_text_classifier(sequences=sequence_to_classify, labels=candidate_labels)
    >>> ZeroShotTextClassificationOutput(
        sequences='Who are you voting for in 2020?',
        labels=['politics', 'public health', 'Europe'],
        scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])
    

    example static labels:

    zero_shot_text_classifier = Pipeline.create(
        task="zero_shot_text_classification",
        batch_size=3,
        model_scheme="mnli",
        model_config={"hypothesis_template": "This text is related to {}"},
        model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
                   "huggingface/mnli/pruned80_quant-none-vnni",
        labels=["politics", "Europe", "public health"]
    )
    
    sequence_to_classify = "Who are you voting for in 2020?"
    zero_shot_text_classifier(sequences=sequence_to_classify)
    >>> ZeroShotTextClassificationOutput(
        sequences='Who are you voting for in 2020?',
        labels=['politics', 'public health', 'Europe'],
        scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])
    

    Evaluation results: dataset sst2; config: multi_class = True, hypothesis_template = "The sentiment of this text is {}"

    deepsparse.transformers.eval_downstream model_path -d sst2 --zero-shot True
    

    | model | accuracy (nm pipeline) | expected accuracy (hf pipeline) |
    | ------- | --------- | --------- |
    | bert base | 0.823 | 0.823 |
    | 90% pruned bert | 0.779 | 0.779 |

    opened by mgoin 11
  • Converting onnx model to deepsparse

    Hi, I'm trying to convert an onnx model to a deepsparse model, here is the code:

    from deepsparse import compile_model
    from deepsparse.utils import generate_random_inputs
    onnx_filepath = "fom.onnx"
    batch_size = 1
    
    # Generate random sample input
    inputs = generate_random_inputs(onnx_filepath, batch_size)
    
    # Compile and run
    engine = compile_model(onnx_filepath, batch_size)
    outputs = engine.run(inputs)
    
    **Environment**
    Include all relevant environment information:
    1. OS: Ubuntu 18.04
    2. Python version: 3.7.9
    3. DeepSparse version: 0.8.0
    4. torch: 1.9.0+cu102
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
    6. CPU: {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': True, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 18, 'available_cores_per_socket': 18, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256}

    **Errors**
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #0 of shape = [1, 3, 256, 256]
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #1 of shape = [1, 10, 2]
    [     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #2 of shape = [1, 10, 2]
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized) (system=avx512, binary=avx512)
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized)
    Date: 12-05-2021 @ 12:58:29 EST
    OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
    Arch: x86_64
    CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
    Vendor: GenuineIntel
    Cores/sockets/threads: [36, 2, 72]
    Available cores/sockets/threads: [36, 2, 72]
    L1 cache size data/instruction: 32k/32k
    L2 cache size: 1Mb
    L3 cache size: 24.75Mb
    Total memory: 507.367G
    Free memory: 22.4387G
    
    Assertion at ./src/include/wand/engine/compute/planner.hpp:131
    
    Backtrace:
    0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    1# 0x00007F36EB17C234 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    2# 0x00007F36EB185889 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    3# 0x00007F36EB185982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    4# 0x00007F36EB18AA8A in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    5# 0x00007F36EB18AB00 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    6# 0x00007F36EA7E985D in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    7# 0x00007F36EA7EE443 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    8# 0x00007F36EA76BD6B in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    9# 0x00007F36EA75AB3F in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    10# 0x00007F36EA75C1C1 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    11# 0x00007F36EADA9668 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    12# 0x00007F36EADAC0A2 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    13# 0x00007F36EADAF3B9 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    14# 0x00007F36EA73B76C in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    15# 0x00007F36EA7414C3 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    16# 0x00007F36EA6FB982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    17# 0x00007F36EA6FBC05 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
    18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, wand::safe_type<wand::parallel::use_current_affinity_tag, bool>, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data/lib/python3.7/site-packages/deepsparse/avx512/libdeepsparse.so
    19# 0x00007F3771031D1B in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    20# 0x00007F3771031F39 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    21# 0x00007F377105D5C5 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    22# 0x00007F377104B250 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
    23# _PyMethodDef_RawFastCallDict in python
    
    Please email a copy of this stack trace and any additional information to: [email protected]
    Aborted
    

    Do you have any ideas why the code is failing?

    bug 
    opened by joaanna 11
  • [BugFix] Server Computer Vision `from_file` fixes

    This PR is now a parent PR for 3 different bugs across deepsparse.server. The from_file bugfix was tested locally for all 3 CV pipelines.

    There is a bug in the current implementation of deepsparse.server: when files are sent by the client (in CV tasks), the server wasn't actually reading the files sent over the network, but was looking for local files (on the server machine) with the same name. If such a file existed, it was used for inference and the result returned, giving the illusion that everything worked as intended when it actually didn't. If, on the other hand, a local file with the same name wasn't found, a FileNotFoundError was thrown on the server side (which should not happen, because the file is not meant to be on the server machine), as follows:

    FileNotFoundError: [Errno 2] No such file or directory: 'buddy.jpeg'
    

    This PR fixes the issue. The changes are as follows (a rough sketch of the updated factory method follows the list):

    1. Changes were made on the server side to rely on actual file pointers rather than filenames.
    2. The from_files factory method for all CV schemas was updated to accept an Iterable of file pointers rather than a List[str]. The List --> Iterable change makes the function depend on behavior rather than a concrete type; the str --> TextIO change accepts file pointers (TextIO is the generic typing annotation for file objects).
    3. We now rely on the PIL.Image.open(...) function to read the contents from the file pointer; this library is included with torchvision (a required dependency for all CV tasks), so it needs no additional installation steps or auto-installation code.
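    A minimal sketch of the updated from_files factory described above (names and types are paraphrased from this description, not the exact diff):

    from typing import Iterable, TextIO

    from PIL import Image

    def from_files(files: Iterable[TextIO]):
        # Read image content from the open file pointers sent by the client,
        # instead of resolving filenames on the server machine.
        return [Image.open(file) for file in files]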

    This bug was found while fixing another bug in the documentation reported by @dbarbuzzi: the docs did not pass in an input with the correct batch size. That bug was also fixed as part of this pull request.

    Testing:

    Note: Please test with an image that is not in the same directory the server is run from, to actually check that the bug is fixed.

    Step 1: Check out this branch: git checkout server-file-fixes

    Step 2: Start the server:

    deepsparse.server \
        --task image_classification \
        --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
        --port 5543
    

    Step 3: Use the following code to make requests; the returned status must be [200] (HTTP status code 200 means success). Change the image path accordingly:

    import requests
    
    url = 'http://0.0.0.0:5543/predict/from_files'
    image_path = "/home/dummy-data/buddy.jpeg"
    path = [
        image_path,
    ]
    files = [('request', open(img, 'rb')) for img in path]
    resp = requests.post(url=url, files=files)
    print(resp)
    

    Also fixes the following issue thanks to @dbogunowicz

    bug 
    opened by rahul-tuli 9
  • Failed building wheel for deepsparse and onnx

    Describe the bug Failed building wheel for deepsparse

    Environment Include all relevant environment information:

    1. OS [e.g. Ubuntu 18.04]: Ubuntu 18.04 via WSL on Windows 10
    2. Python version [e.g. 3.7]: Python 3.6.9 (default, Jun 29 2022, 11:45:57)

    To Reproduce pip install --upgrade deepsparse

    Errors

    Collecting deepsparse
      Using cached https://files.pythonhosted.org/packages/c9/38/442bcc9403aaf0dd082e23397a8a5e5ca43b5856058cffa0f5449c8f8a5c/deepsparse-1.1.0.tar.gz
    Requirement already up-to-date: click~=8.0.0 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Requirement already up-to-date: numpy>=1.16.3 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Collecting onnx<=1.12.0,>=1.5.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/2c/6a/39b0580858589a67c3322aabc2634f158391ffbf98fa410127533e7f1495/onnx-1.12.0.tar.gz
    Requirement already up-to-date: protobuf<4,>=3.12.2 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
    Collecting pydantic>=1.8.2 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/fe/27/0de772dcd0517770b265dbc3998ed3ee3aa2ba25ba67e3685116cbbbccc6/pydantic-1.9.2-py3-none-any.whl
    Collecting requests>=2.0.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl
    Collecting sparsezoo~=1.1.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/10/aa/147378c7d961986cafdcd2c6ea5461bfc2078b2337584d1100083a3aaa6c/sparsezoo-1.1.0-py3-none-any.whl
    Collecting tqdm>=4.0.0 (from deepsparse)
      Using cached https://files.pythonhosted.org/packages/47/bb/849011636c4da2e44f1253cd927cfb20ada4374d8b3a4e425416e84900cc/tqdm-4.64.1-py2.py3-none-any.whl
    Requirement already up-to-date: importlib-metadata; python_version < "3.8" in ./deepsparse/lib/python3.6/site-packages (from click~=8.0.0->deepsparse)
    Requirement already up-to-date: typing-extensions>=3.6.2.1 in ./deepsparse/lib/python3.6/site-packages (from onnx<=1.12.0,>=1.5.0->deepsparse)
    Collecting dataclasses>=0.6; python_version < "3.7" (from pydantic>=1.8.2->deepsparse)
      Using cached https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl
    Collecting urllib3<1.27,>=1.21.1 (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl
    Collecting charset-normalizer~=2.0.0; python_version >= "3" (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl
    Collecting certifi>=2017.4.17 (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl
    Collecting idna<4,>=2.5; python_version >= "3" (from requests>=2.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl
    Collecting pyyaml>=5.1.0 (from sparsezoo~=1.1.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/b3/85/79b9e5b4e8d3c0ac657f4e8617713cca8408f6cdc65d2ee6554217cedff1/PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
    Collecting importlib-resources; python_version < "3.7" (from tqdm>=4.0.0->deepsparse)
      Using cached https://files.pythonhosted.org/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl
    Requirement already up-to-date: zipp>=0.5 in ./deepsparse/lib/python3.6/site-packages (from importlib-metadata; python_version < "3.8"->click~=8.0.0->deepsparse)
    Building wheels for collected packages: deepsparse, onnx
      Running setup.py bdist_wheel for deepsparse ... error
      Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/deepsparse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpq080et7upip-wheel- --python-tag cp36:
      Loaded version 1.1.0 from /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/generated_version.py
      Checking to see if /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/arch.bin exists.. True
      /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
        warnings.warn(msg)
      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
         or: -c --help [cmd1 cmd2 ...]
         or: -c --help-commands
         or: -c cmd --help
    
      error: invalid command 'bdist_wheel'
    
      ----------------------------------------
      Failed building wheel for deepsparse
      Running setup.py clean for deepsparse
      Running setup.py bdist_wheel for onnx ... error
      Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/onnx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpcl3nl8sspip-wheel- --python-tag cp36:
      fatal: not a git repository (or any of the parent directories): .git
      /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
        warnings.warn(msg)
      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
         or: -c --help [cmd1 cmd2 ...]
         or: -c --help-commands
         or: -c cmd --help
    
      error: invalid command 'bdist_wheel'
    
      ----------------------------------------
      Failed building wheel for onnx
      Running setup.py clean for onnx
    Failed to build deepsparse onnx
    Installing collected packages: onnx, dataclasses, pydantic, urllib3, charset-normalizer, certifi, idna, requests, importlib-resources, tqdm, pyyaml, sparsezoo, deepsparse
      Running setup.py install for onnx ... -^canceled
    ^COperation cancelled by user
    

    @mgoin @KSGulin @shubhra @Willtor @dbarbuzzi @tlrmchlsmth

    bug 
    opened by ErfolgreichCharismatisch 8
  • Huggingface base Wav2Vec2 model crashing

    Describe the bug Hello,

    I am trying to compile the ONNX-converted model of a sparse Huggingface base Wav2Vec2 model (where sparsity was obtained via unstructured magnitude pruning) through compile_model:

    dse_network = compile_model(onnx_filepath, batch_size=batch_size, num_cores=1, num_streams=1)

    My kernel crashed and I received the following message:

    Backtrace:
    0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    1# 0x00007FFB125A27C4 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    2# 0x00007FFB125A8906 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    3# 0x00007FFB125A89F2 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    4# 0x00007FFB125B12FA in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    5# 0x00007FFB125B1370 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    6# 0x00007FFB11B1F76D in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    7# 0x00007FFB11B25BCF in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    8# 0x00007FFB11A92015 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    9# 0x00007FFB11A81939 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    10# 0x00007FFB11A82AF1 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    11# 0x00007FFB1213F938 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    12# 0x00007FFB121423B3 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    13# 0x00007FFB121456B9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    14# 0x00007FFB11A6312B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    15# 0x00007FFB11A6B3CE in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    16# 0x00007FFB11A11C1A in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    17# 0x00007FFB11A11ED5 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
    18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
    19# 0x00007FFBE3641649 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    20# 0x00007FFBE364184B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    21# 0x00007FFBE36788B6 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    22# 0x00007FFBE364B0F9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
    23# 0x0000561F0FD79B66 in /opt/conda/bin/python

    Please email a copy of this stack trace and any additional information to: [email protected]

    Environment Include all relevant environment information:

    1. OS : Ubuntu 18.04.5 LTS
    2. Python version [e.g. 3.7]: Python 3.9.4
    3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 1.0.2
    4. ML framework version(s) [e.g. torch 1.7.1]: torch 1.11.0
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnxruntime 1.12.0, onnx 1.12.0,
    6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:
    >>> import deepsparse.cpu
    >>> print(deepsparse.cpu.cpu_architecture())
    

    {'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 31719424, 'architecture': 'x86_64', 'available_cores_per_socket': 19, 'available_num_cores': 38, 'available_num_hw_threads': 76, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 19, 'isa': 'avx512', 'num_cores': 38, 'num_hw_threads': 76, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz', 'vnni': False}

    Do you have any solution? Thank you.

    bug 
    opened by Tim-blo 8
  • Cannot import deepsparse from WSL: cannot get cpu topology

    Describe the bug

    For testing purposes, I want to try if my code works on Windows Subsystem for Linux (WSL2). I'm using Ubuntu 18.04LTS.

    Once on Ubuntu on WSL, I create a new python virtual env, then pip install deepsparse.

    After that, while trying to import deepsparse I get:

    >>> import deepsparse
    arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
    Traceback (most recent call last):
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
        info_str = subprocess.check_output(file_path).decode("utf-8")
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
     
    During handling of the above exception, another exception occurred:
     
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
        from .engine import *
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
        from deepsparse.lib import init_deepsparse_lib
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
        CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
        arch = cpu_architecture()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
        arch = _parse_arch_bin()
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
        self.memo[args] = self.f(*args)
      File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
        raise OSError(
    OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
    

    Expected behavior

    Maybe it should work on WSL :)

    Environment Include all relevant environment information:

    1. OS [e.g. Ubuntu 18.04]: Ubuntu 18.04LTS (On Windows 10, WSL2)
    2. Python version [e.g. 3.7]: 3.9
    3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.8.0
    4. ML framework version(s) [e.g. torch 1.7.1]: 1.10.0
    5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
    6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

    This is basically what's not working

    To Reproduce Exact steps to reproduce the behavior:

    1. On windows 10, activate WSL
    2. Install Ubuntu 18.04 from microsoft store
    3. On Ubuntu, create a virtual env (I personally use mamba or conda)
    4. pip install deepsparse
    5. import deepsparse

    Errors If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

    (Same traceback as shown above.)

    Additional context Add any other context about the problem here. Also include any relevant files.

    bug 
    opened by clementpoiret 8
  • Does a C API exist for deepsparse or is this python only and are all benchmarks via python?

    Just a quick question: is it possible to use deepsparse for inference directly from other languages, e.g. C++, C#, or similar? Or is all code written in Python?

    enhancement 
    opened by nietras 8
  • getting low fps & inference issue

    1. I used this repo https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo and this command:

       !python annotate.py \
           zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 \
           --source "/content/loc1min.mp4" \
           --quantized-inputs \
           --image-shape 416 416 \
           --save-dir '/content/ops/' \
           --model-config '/content/coco128.yaml' \
           --device 'cpu'

       and I'm getting low FPS on CPU (yolov5s model). Is this normal, or should we get 50-60 FPS? You mentioned the model would be 10x faster, but it is much slower. (screenshot)

    2. I trained a model using the SparseML repo on COCO128 data for 40 epochs, converted the .pth model into ONNX, and tried the same inference script:

       !python annotate.py \
           /content/sparseml/integrations/ultralytics-yolov5/yolov5/runs/train/exp2/weights/best.onnx \
           --source "/content/loc1min.mp4" \
           --quantized-inputs \
           --image-shape 416 416 \
           --save-dir '/content/ops/' \
           --model-config '/content/coco128.yaml' \
           --device 'cpu'

       and I'm getting this issue. (screenshot)

    What's wrong here? My goal is to train a model on custom data with SparseML and run inference using DeepSparse.

    bug 
    opened by akashAD98 7
  • no better speed on yolo quant

    Hi! How is it going?

    First of all, thanks for your good repo and for helping to make models better and faster. I used your YOLO example to get better speed, and I compared the base, pruned, and quant models as you said, but all results were approximately the same. There is no VNNI warning, and my server is Ubuntu 18. My code is:

    import os

    # models:
    yolov5s_base = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none"
    yolov5s_pruned = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96"
    yolov5s_pruned_quant = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

    source_img = "img.bmp"

    print("\n base inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_base} --source {source_img} --image-shape 640 640 "
    os.system(bash_cmd)

    print("\n pruned inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_pruned} --source {source_img} --image-shape 640 640 "
    os.system(bash_cmd)

    print("\n pruned_quant inference:\n")
    bash_cmd = f"python annotate.py {yolov5s_pruned_quant} --source {source_img} --quantized-inputs --image-shape 640 640 "
    os.system(bash_cmd)

    When I run this script, I get these results:

    base inference:

    2022-03-08 20:28:15 main INFO Results will be saved to annotation_results/deepsparse-annotations-8
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none downloaded to /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
    2022-03-08 20:28:17 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:18 main INFO Inference 0 processed in 128.20696830749512 ms
    2022-03-08 20:28:18 main INFO Results saved to annotation_results/deepsparse-annotations-8

    pruned inference:

    2022-03-08 20:28:19 main INFO Results will be saved to annotation_results/deepsparse-annotations-9
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
    2022-03-08 20:28:21 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:23 main INFO Inference 0 processed in 124.91464614868164 ms
    2022-03-08 20:28:23 main INFO Results saved to annotation_results/deepsparse-annotations-9

    pruned_quant inference:

    2022-03-08 20:28:24 main INFO Results will be saved to annotation_results/deepsparse-annotations-10
    model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
    2022-03-08 20:28:26 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
    DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
    2022-03-08 20:28:28 main INFO Inference 0 processed in 114.76516723632812 ms
    2022-03-08 20:28:28 main INFO Results saved to annotation_results/deepsparse-annotations-10

    As you can see, pruned_quant is no faster! Please guide me on how to get faster results. Thanks!

    opened by RasoulZamani 7
  • How does the quantized op infered?

    Hello, just out of curiosity: how is the quantized int conv op executed? I don't think it is supported in onnxruntime? It's not even a standard ONNX op.

    How is it executed? (screenshot)

    documentation 
    opened by jinfagang 2
  • [Fix] Update the code for handling ragged numpy arrays in numpy >= 1.24.0

    Response to: https://github.com/neuralmagic/deepsparse/issues/825

    Since NumPy version 1.19.0, one must specify dtype=object when creating an array from "ragged" sequences; otherwise, one receives a warning:

    VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    

    Starting with NumPy version 1.24.0, this warning turns into an explicit error:

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.
    

    This PR adds the code change necessary to remove the error. AFAIK, only the transformers QA code requires the update.

    This fix should also be backward compatible at least back to NumPy 1.19 (in our requirements we honor numpy>=1.16.3; maybe it is time to bump this version up?).
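    A minimal sketch of the behavior change (illustrative values, not the actual diff):

    import numpy

    ragged = [[1, 2, 3], [4, 5]]  # nested sequences of different lengths

    # numpy >= 1.24.0 raises ValueError here; 1.19.x through 1.23.x only warn:
    # numpy.array(ragged)

    # Passing dtype=object explicitly keeps the object-array behavior on all versions:
    arr = numpy.array(ragged, dtype=object)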

    opened by dbogunowicz 1
  • Broken Transformers QA Inference Pipeline

    Describe the bug

    Transformers QA pipeline fails on a simple inference task.

    Expected behavior The inference pipeline for Question Answering should work without raising any errors.

    Environment: Python version 3.8; DeepSparse version: current main

    To Reproduce

    from deepsparse import Pipeline
    
    task = "question-answering"
    dense_qa_pipeline = Pipeline.create(
            task=task,
            model_path="zoo:nlp/question_answering/distilbert-none/pytorch/huggingface/squad/base-none",
            # or model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none",
            # was checking whether the problem is not model-dependent
        )
    
    question = "DeepSparse is sparsity-aware inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application"
    q_context = "What is DeepSparse?"
    
    dense_qa_pipeline(question=question, context=q_context)
    

    Errors

    None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
    DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.3.0.20221217 COMMUNITY | (d5bf112b) (release) (optimized) (system=avx2, binary=avx2)
    Traceback (most recent call last):
      File "/usr/lib/python3.8/code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
      File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/ubuntu/damian/deepsparse_copy/hehe.py", line 14, in <module>
        dense_output = dense_qa_pipeline(question=question, context=q_context)
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/pipeline.py", line 217, in __call__
        engine_inputs: List[numpy.ndarray] = self.process_inputs(pipeline_inputs)
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 261, in process_inputs
        {
      File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 262, in <dictcomp>
        key: numpy.array(tokenized_example[key][span])
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.
    

    Additional context The error occurs in https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/transformers/pipelines/question_answering.py#L260.

    When attempting to perform dictionary comprehension:

    {
                        key: numpy.array(tokenized_example[key][span])
                        for key in tokenized_example.keys()
                        if key not in self.onnx_input_names
                    }
    

    Here: self.onnx_input_names = ['input_ids', 'attention_mask', 'token_type_ids'] tokenized_example.keys() = ['input_ids', 'token_type_ids', 'attention_mask', 'special_tokens_mask', 'offset_mapping', 'overflow_to_sample_mapping', 'example_id']

    As a result, we end up iterating over the list difference. One element of this resulting list, offset_mapping, is the culprit:

    [tokenized_example[key][0] for key in ['offset_mapping']]
    

    results in a ragged nested sequence (screenshot omitted).

    Calling numpy.array(...) on this data structure invokes the error in question.
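    A minimal reproduction of just the numpy failure (hypothetical offset_mapping-like values; special tokens map to None while regular tokens map to (start, end) pairs):

    import numpy

    offset_mapping = [None, (0, 4), (5, 11), None]  # ragged: None vs. 2-tuples

    # On numpy >= 1.24.0 this raises:
    # ValueError: setting an array element with a sequence. ...
    numpy.array(offset_mapping)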

    Interestingly, when @mwitiderrick attempted to reproduce the error inside a Colab notebook (not using main, but the last release), the problem disappears: https://colab.research.google.com/drive/1aIrITYxgcR-5VmL4vm8P-6H4rvCBAeaX?usp=sharing However, it reappears (on the last release) when he attempts to run the transformers QA pipeline in HF/Gradio: https://huggingface.co/spaces/neuralmagic/question-answering/blob/main/app.py

    bug 
    opened by dbogunowicz 0
  • Quantization and pruning for yolov7

    I would like to perform quantization-aware training for a custom object detector using the yolov7 architecture. Could you please let me know if the functionality developed in DeepSparse for yolov5 can be used straight away, or what modifications I need to make to use it for yolov7? Any leads would be appreciated. Thanks.

    enhancement 
    opened by Sri20021 0