CPU inference engine that delivers unprecedented performance for sparse models

Neural Magic

Last update: Jan 9, 2023

DeepSparse Engine

Overview

The DeepSparse Engine is a CPU runtime that takes advantage of the natural sparsity within neural networks to reduce the compute required and to accelerate memory-bound workloads. It is focused on model deployment and on scaling machine learning pipelines, and it fits into your existing deployments as an inference backend.

This repository includes package APIs along with examples to quickly get started learning about and actually running sparse models.

Compatibility

The DeepSparse Engine ingests models in the ONNX format, allowing for compatibility with PyTorch, TensorFlow, Keras, and many other frameworks that support it. This reduces the extra work of preparing your trained model for inference to just one step of exporting.

Quick Tour

To expedite inference and benchmarking on real models, we include the sparsezoo package. SparseZoo hosts inference-optimized models, trained with repeatable optimization recipes that use state-of-the-art techniques from SparseML.

Quickstart with SparseZoo ONNX Models

MobileNetV1 Dense

Here is how to quickly perform inference with DeepSparse Engine on a pre-trained dense MobileNetV1 from SparseZoo.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

# Download model and compile as optimized executable for your machine
model = classification.mobilenet_v1()
engine = compile_model(model, batch_size=batch_size)

# Fetch sample input and predict output using engine
inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
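As a rough sketch of what to do with the timing result, the snippet below converts one batch time into throughput and per-item latency. It assumes that inference_time is the elapsed wall-clock time in seconds for the whole batch, which the text above does not state. throughput_stats is a hypothetical helper for illustration, not part of the deepsparse API.

```python
def throughput_stats(batch_size, elapsed_seconds):
    """Return (items per second, milliseconds per item) for one timed batch.

    Assumption: elapsed_seconds is the wall-clock time for the whole batch.
    """
    if batch_size <= 0 or elapsed_seconds <= 0:
        raise ValueError("batch_size and elapsed_seconds must be positive")
    items_per_second = batch_size / elapsed_seconds
    ms_per_item = 1000.0 * elapsed_seconds / batch_size
    return items_per_second, ms_per_item


# Made-up numbers for illustration: a batch of 64 that took 0.5 s
items_per_second, ms_per_item = throughput_stats(64, 0.5)
print(f"{items_per_second:.1f} items/s, {ms_per_item:.2f} ms/item")
```

With the quickstart above, the call would be throughput_stats(batch_size, inference_time), again assuming the time is reported in seconds.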

MobileNetV1 Optimized

When exploring available optimized models, you can use the Zoo.search_optimized_models utility to find models that share a base.

Let us try this on the dense MobileNetV1 to see what is available.

from sparsezoo import Zoo
from sparsezoo.models import classification
print(Zoo.search_optimized_models(classification.mobilenet_v1()))

Output:

[Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate),
 Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate)]

Great. We can see there are two pruned versions targeting FP32: conservative, at 100% of baseline accuracy, and moderate, at >= 99%. There is also a pruned_quant variant targeting INT8.
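The stubs in the output above appear to follow a fixed seven-segment layout, with the last segment combining the optimization name and category. The sketch below splits a stub into named parts so the variants can be compared programmatically. The field names and the parse_stub helper are assumptions made for illustration, inferred from the printed stubs and from the optim_name and optim_category arguments; they are not part of the sparsezoo API.

```python
from typing import NamedTuple, Optional


class StubParts(NamedTuple):
    # Field names are illustrative guesses based on the printed stubs.
    domain: str
    sub_domain: str
    architecture: str
    framework: str
    repo: str
    dataset: str
    optim_name: str
    optim_category: Optional[str]


def parse_stub(stub: str) -> StubParts:
    """Split a stub such as 'cv/classification/.../pruned-moderate' into parts."""
    segments = stub.split("/")
    if len(segments) != 7:
        raise ValueError(f"expected 7 '/'-separated segments, got {len(segments)}")
    *head, optim = segments
    # The last segment looks like '<optim_name>-<optim_category>'.
    optim_name, _, optim_category = optim.partition("-")
    return StubParts(*head, optim_name, optim_category or None)


stubs = [
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate",
    "cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate",
]
for stub in stubs:
    parts = parse_stub(stub)
    print(parts.optim_name, parts.optim_category)
```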

Suppose you want the best FP32 performance and can accept a small drop in accuracy. In that case, choose pruned-moderate over pruned-conservative.

from deepsparse import compile_model
from sparsezoo.models import classification
batch_size = 64

model = classification.mobilenet_v1(optim_name="pruned", optim_category="moderate")
engine = compile_model(model, batch_size=batch_size)

inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
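A single timed run can be noisy, so a comparison between the dense and pruned engines is usually more trustworthy when it is repeated. The following is a minimal, engine-agnostic sketch using only the standard library: benchmark and speedup are hypothetical helpers, and the workload passed in is a stand-in for a call such as engine.run(inputs).

```python
import statistics
import time


def benchmark(fn, warmup=2, iterations=10):
    """Call fn() repeatedly and return the median wall-clock seconds per call."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Stand-in workload; with real engines this would be
# benchmark(lambda: engine.run(inputs)) for each compiled model.
median_seconds = benchmark(lambda: sum(range(10_000)))
print(f"median: {median_seconds * 1000:.3f} ms")
```

The median is used rather than the mean so that an occasional slow iteration does not dominate the result.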

Quickstart with custom ONNX models

We accept ONNX files for custom models, too. Simply plug in your model to compare performance with other solutions.

> wget https://github.com/onnx/models/raw/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)

For a more in-depth read on available APIs and workflows, check out the examples and DeepSparse Engine documentation.

Hardware Support

The DeepSparse Engine is validated to work on x86 Intel and AMD CPUs running Linux operating systems.

For the best-performing algorithms to be enabled, running on a CPU with AVX-512 instructions is highly recommended.

Here is a table detailing specific support for some algorithms over different microarchitectures:

x86 Extension	Microarchitectures	Activation Sparsity	Kernel Sparsity	Sparse Quantization
AMD AVX2	Zen 2, Zen 3	not supported	optimized	not supported
Intel AVX2	Haswell, Broadwell, and newer	not supported	optimized	not supported
Intel AVX-512	Skylake, Cannon Lake, and newer	optimized	optimized	emulated
Intel AVX-512 VNNI (DL Boost)	Cascade Lake, Ice Lake, Cooper Lake, Tiger Lake	optimized	optimized	optimized
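To see which row of the table applies to a given machine, one option on Linux is to inspect the CPU flags in /proc/cpuinfo. The sketch below is an illustration only: it encodes the table above by hand, looks only at instruction-set flags (avx2, avx512f, avx512_vnni as named by the Linux kernel) and ignores the vendor and microarchitecture columns. It is not how the engine itself detects hardware.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or the file is unreadable
    return set()


def sparsity_support(cpu_flags):
    """Map CPU flags to the matching row of the support table, or None."""
    flags = set(cpu_flags)
    if "avx512_vnni" in flags:
        levels = ("optimized", "optimized", "optimized")
    elif "avx512f" in flags:
        levels = ("optimized", "optimized", "emulated")
    elif "avx2" in flags:
        levels = ("not supported", "optimized", "not supported")
    else:
        return None
    keys = ("activation_sparsity", "kernel_sparsity", "sparse_quantization")
    return dict(zip(keys, levels))


print(sparsity_support(read_cpu_flags()))
```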

Installation

This repository is tested on Python 3.6+ and ONNX 1.5.0+. Installing in a virtual environment is recommended to keep your system in order.

Install with pip using:

pip install deepsparse

Then, if you want to explore the examples, clone the repository and install any additional dependencies found in the example folders.

Notebooks

For some step-by-step examples, we have Jupyter notebooks showing how to compile models with the DeepSparse Engine, check the predictions for accuracy, and benchmark them on your hardware.

Available Models and Recipes

A number of pre-trained baseline and recalibrated models in the SparseZoo can be used with the engine for higher performance. The types available for each model architecture are noted in its SparseZoo model repository listing.

Resources and Learning More

DeepSparse Engine Documentation, Notebooks, Examples
DeepSparse API
Debugging and Optimizing Performance
SparseML Documentation
Sparsify Documentation
SparseZoo Documentation
Neural Magic Blog, Resources, Website

Contributing

We appreciate contributions to the code, examples, and documentation as well as bug reports and feature requests! Learn how here.

Join the Community

For user help or questions about the DeepSparse Engine, use our GitHub Discussions. Everyone is welcome!

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please email us at [email protected] or fill out this form.

License

The project's binary containing the DeepSparse Engine is licensed under the Neural Magic Engine License.

Example files and scripts included in this repository are licensed under the Apache License Version 2.0 as noted.

Release History

Official builds are hosted on PyPI:

stable: deepsparse
nightly (dev): deepsparse-nightly

Track this project via GitHub Releases.

Citation

Find this project useful in your research or other communications? Please consider citing Neural Magic's paper:

@inproceedings{pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research},
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}, 
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}

Comments

YOLOv5 pruned_quant-aggressive_94 exception

Describe the bug

I was trying to run the demo code with the YOLOv5 pruned_quant-aggressive_94 model on a g4dn.2xlarge instance and encountered this exception.

Stack trace

  | 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
  | 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
  | 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
  | 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
  | 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
  | 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
  | 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
  | 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
  | 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
  | 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
  | 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
  | 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
  | 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
  | 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
  | 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
  | 2021-12-16T15:36:13.559+01:00 | Backtrace:
  | 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
  | 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: [email protected]

Environment

Ubuntu 18.04
Python 3.8
ML framework version(s)

torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl

Other Python package versions

sparseml==0.9.0
sparsezoo==0.9.0
numpy==1.21.4
onnx==1.9.0
onnxruntime==1.7.0

Is there any chance you could help me out to debug that issue?

bug

opened by SkalskiP 20

ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'
Describe the bug

Trying to run the server-client example.

Environment

Include all relevant environment information:

OS: Ubuntu 18.04

Python version: 3.8

Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:

deepsparse 0.1.1

CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

{'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 8, 'available_cores_per_socket': 8, 'threads_per_core': 1, 'available_threads_per_core': 1, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 12582912}

To Reproduce

from deepsparse.utils import arrays_to_bytes, bytes_to_arrays

Errors

Traceback (most recent call last):
  File "server.py", line 62, in <module>
    from deepsparse.utils import arrays_to_bytes, bytes_to_arrays
ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'
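One quick way to narrow down an ImportError like this is to check which of the expected names the installed module actually exposes. A plausible explanation, though not confirmed in this report, is that the example was written against a newer release than the installed deepsparse 0.1.1. The helper below is a generic, hypothetical diagnostic; it is demonstrated on the standard-library json module so that it runs without deepsparse installed.

```python
import importlib


def missing_names(module_name, names):
    """Return the names from `names` that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstration on a standard-library module; 'arrays_to_bytes' is absent there.
print(missing_names("json", ["dumps", "arrays_to_bytes"]))
```

With deepsparse installed, the equivalent check would be missing_names("deepsparse.utils", ["arrays_to_bytes", "bytes_to_arrays"]).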
bug
opened by adrianosantospb 13

Zero shot text classification pipeline

Based off of https://discuss.huggingface.co/t/new-pipeline-for-zero-shot-text-classification/681

Implements zero shot text classification pipeline. Batch size is equal to the number of sequences and currently only supports dynamic label passing, with static label support to come in a future PR. This implementation allows for future implementation of zero shot text classification based on models trained on classes other than mnli.

example dynamic labels:

zero_shot_text_classifier = Pipeline.create(
    task="zero_shot_text_classification",
    model_scheme="mnli",
    model_config={"hypothesis_template": "This text is related to {}"},     
    model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
               "huggingface/mnli/pruned80_quant-none-vnni")

sequence_to_classify = "Who are you voting for in 2020?"
candidate_labels = ["Europe", "public health", "politics"]
zero_shot_text_classifier(sequences=sequence_to_classify, labels=candidate_labels)
>>> ZeroShotTextClassificationOutput(
    sequences='Who are you voting for in 2020?',
    labels=['politics', 'public health', 'Europe'],
    scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])

example static labels:

zero_shot_text_classifier = Pipeline.create(
    task="zero_shot_text_classification",
    batch_size=3,
    model_scheme="mnli",
    model_config={"hypothesis_template": "This text is related to {}"},
    model_path="zoo:nlp/text_classification/distilbert-none/pytorch/"
               "huggingface/mnli/pruned80_quant-none-vnni",
    labels=["politics", "Europe", "public health"]
)

sequence_to_classify = "Who are you voting for in 2020?"
zero_shot_text_classifier(sequences=sequence_to_classify)
>>> ZeroShotTextClassificationOutput(
    sequences='Who are you voting for in 2020?',
    labels=['politics', 'public health', 'Europe'],
    scores=[0.9073666334152222, 0.046810582280159, 0.04582275450229645])
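For readers unfamiliar with how an MNLI model can classify against arbitrary labels, the sketch below illustrates the general NLI-based approach described in the linked Hugging Face discussion: each candidate label is turned into a hypothesis with the template, the model scores how strongly the sequence entails each hypothesis, and the entailment scores are normalized across labels. It is an illustration under that assumption, not the pipeline's actual implementation, and the logits are made-up placeholders for model output.

```python
import math


def build_hypotheses(labels, template="This text is related to {}"):
    """Turn each candidate label into an NLI hypothesis string."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(value - peak) for value in values]
    total = sum(exps)
    return [value / total for value in exps]


def rank_labels(labels, entailment_logits):
    """Normalize entailment logits across labels and sort best-first."""
    scores = softmax(entailment_logits)
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked], [score for _, score in ranked]


labels = ["Europe", "public health", "politics"]
print(build_hypotheses(labels))

# Placeholder entailment logits, one per (sequence, hypothesis) pair.
fake_logits = [-1.0, -0.9, 2.0]
ranked_labels, ranked_scores = rank_labels(labels, fake_logits)
print(ranked_labels, ranked_scores)
```

This corresponds to the single-label case, where scores sum to one across labels. A multi-label variant would instead score each label independently.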

Evaluation results:

Dataset: sst2
Config: multi_class = True, hypothesis_template = "The sentiment of this text is {}"

deepsparse.transformers.eval_downstream model_path -d sst2 --zero-shot True

| model | accuracy (nm pipeline) | expected accuracy (hf pipeline) |
| ------- | --------- | --------- |
| bert base | 0.823 | 0.823 |
| 90% pruned bert | 0.779 | 0.779 |

opened by mgoin 11

Converting onnx model to deepsparse

Hi, I'm trying to convert an onnx model to a deepsparse model, here is the code:

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "fom.onnx"
batch_size = 1

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)

**Environment**
Include all relevant environment information:
1. Ubuntu 18.04:
2. Python version 3.7.9. :
3. DeepSparse version 0.8.0 :
4. torch 1.9.0+cu102:
5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
6. CPU {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': True, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 18, 'available_cores_per_socket': 18, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256}

**Errors**
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #0 of shape = [1, 3, 256, 256]
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #1 of shape = [1, 10, 2]
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #2 of shape = [1, 10, 2]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized)
Date: 12-05-2021 @ 12:58:29 EST
OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Vendor: GenuineIntel
OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [36, 2, 72]
Available cores/sockets/threads: [36, 2, 72]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 24.75Mb
Total memory: 507.367G
Free memory: 22.4387G

Assertion at ./src/include/wand/engine/compute/planner.hpp:131

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
1# 0x00007F36EB17C234 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
2# 0x00007F36EB185889 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
3# 0x00007F36EB185982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
4# 0x00007F36EB18AA8A in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
5# 0x00007F36EB18AB00 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
6# 0x00007F36EA7E985D in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
7# 0x00007F36EA7EE443 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
8# 0x00007F36EA76BD6B in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
9# 0x00007F36EA75AB3F in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
10# 0x00007F36EA75C1C1 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
11# 0x00007F36EADA9668 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
12# 0x00007F36EADAC0A2 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
13# 0x00007F36EADAF3B9 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
14# 0x00007F36EA73B76C in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
15# 0x00007F36EA7414C3 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
16# 0x00007F36EA6FB982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
17# 0x00007F36EA6FBC05 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, wand::safe_type<wand::parallel::use_current_affinity_tag, bool>, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data/lib/python3.7/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007F3771031D1B in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007F3771031F39 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007F377105D5C5 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007F377104B250 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
23# _PyMethodDef_RawFastCallDict in python

Please email a copy of this stack trace and any additional information to: [email protected]
Aborted

Do you have any ideas why the code is failing?

bug

opened by joaanna 11

[BugFix] Server Computer Vision `from_file` fixes
This PR is now a parent PR for 3 different bugs across the deepsparse server. The from_file bugfix was tested locally for all 3 CV pipelines.

There is a bug in the current implementation of deepsparse.server. When a client sends files (in CV tasks), the server does not read the files sent over the network; instead, it looks for local files on the server machine with the same names. If such a file exists, the server runs inference on it and returns the result, giving the illusion that everything worked as intended. If no local file with that name is found, the server raises a FileNotFoundError, which should never happen because the file is not expected to exist on the server machine:

FileNotFoundError: [Errno 2] No such file or directory: 'buddy.jpeg'

This PR fixes the problem. The changes are twofold:

Changes were made on the server side to rely on actual file pointers rather than filenames.

The from_files factory method for all CV schemas was updated to accept an Iterable of file pointers rather than a List[str]. The List --> Iterable change makes the function depend on behavior rather than on the concrete type. The str --> TextIO change lets it accept file pointers; TextIO is the generic file-pointer type from the typing module.

We now rely on the PIL.Image.open(...) function to read the contents from the file pointer. This library is included with torchvision (a requirement for all CV tasks), so it needs no additional installation steps or auto-installation code.
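The core of the fix can be illustrated without a server: read the uploaded content from the file object itself instead of reopening a path derived from its name. The sketch below uses only the standard library, with io.BytesIO standing in for an uploaded file. read_uploads is a hypothetical helper, not the schema's actual from_files method, and the byte string is fake data.

```python
import io
from typing import BinaryIO, Iterable, List


def read_uploads(files: Iterable[BinaryIO]) -> List[bytes]:
    """Read each upload from its file object; never reopen it by name."""
    return [file_obj.read() for file_obj in files]


# Simulated upload: the name does not need to exist on the server's disk.
upload = io.BytesIO(b"\xff\xd8fake-jpeg-bytes")
upload.name = "buddy.jpeg"

contents = read_uploads([upload])
print(len(contents[0]), "bytes read from", upload.name)
```

In the PR itself, the file pointer is handed to PIL.Image.open, which accepts a file object as well as a path.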

This bug was found while fixing another bug in the documentation, reported by @dbarbuzzi: the docs did not pass in an input with the correct batch size. That bug is also fixed as part of this pull request.

Testing:

Note: Please test with an image that is not in the directory the server is run from, to check that the bug is actually fixed.

Step 1: Check out this branch

git checkout server-file-fixes

Step 2: Start the server

deepsparse.server \
    --task image_classification \
    --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
    --port 5543

Step 3: Use the following code to make a request, changing the image path as needed. The returned status must be [200]; an HTTP status code of 200 means success.

import requests

url = 'http://0.0.0.0:5543/predict/from_files'
image_path = "/home/dummy-data/buddy.jpeg"
path = [
    image_path,
]
files = [('request', open(img, 'rb')) for img in path]
resp = requests.post(url=url, files=files)
print(resp)

Also fixes the following issue thanks to @dbogunowicz
bug
opened by rahul-tuli 9

Failed building wheel for deepsparse and onnx

Describe the bug

Failed building wheel for deepsparse

Environment

Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Ubuntu 18.04 via WSL on Windows 10
Python version [e.g. 3.7]: Python 3.6.9 (default, Jun 29 2022, 11:45:57)

To Reproduce

pip install --upgrade deepsparse

Errors

Collecting deepsparse
  Using cached https://files.pythonhosted.org/packages/c9/38/442bcc9403aaf0dd082e23397a8a5e5ca43b5856058cffa0f5449c8f8a5c/deepsparse-1.1.0.tar.gz
Requirement already up-to-date: click~=8.0.0 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
Requirement already up-to-date: numpy>=1.16.3 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
Collecting onnx<=1.12.0,>=1.5.0 (from deepsparse)
  Using cached https://files.pythonhosted.org/packages/2c/6a/39b0580858589a67c3322aabc2634f158391ffbf98fa410127533e7f1495/onnx-1.12.0.tar.gz
Requirement already up-to-date: protobuf<4,>=3.12.2 in ./deepsparse/lib/python3.6/site-packages (from deepsparse)
Collecting pydantic>=1.8.2 (from deepsparse)
  Using cached https://files.pythonhosted.org/packages/fe/27/0de772dcd0517770b265dbc3998ed3ee3aa2ba25ba67e3685116cbbbccc6/pydantic-1.9.2-py3-none-any.whl
Collecting requests>=2.0.0 (from deepsparse)
  Using cached https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl
Collecting sparsezoo~=1.1.0 (from deepsparse)
  Using cached https://files.pythonhosted.org/packages/10/aa/147378c7d961986cafdcd2c6ea5461bfc2078b2337584d1100083a3aaa6c/sparsezoo-1.1.0-py3-none-any.whl
Collecting tqdm>=4.0.0 (from deepsparse)
  Using cached https://files.pythonhosted.org/packages/47/bb/849011636c4da2e44f1253cd927cfb20ada4374d8b3a4e425416e84900cc/tqdm-4.64.1-py2.py3-none-any.whl
Requirement already up-to-date: importlib-metadata; python_version < "3.8" in ./deepsparse/lib/python3.6/site-packages (from click~=8.0.0->deepsparse)
Requirement already up-to-date: typing-extensions>=3.6.2.1 in ./deepsparse/lib/python3.6/site-packages (from onnx<=1.12.0,>=1.5.0->deepsparse)
Collecting dataclasses>=0.6; python_version < "3.7" (from pydantic>=1.8.2->deepsparse)
  Using cached https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl
Collecting urllib3<1.27,>=1.21.1 (from requests>=2.0.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl
Collecting charset-normalizer~=2.0.0; python_version >= "3" (from requests>=2.0.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests>=2.0.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl
Collecting idna<4,>=2.5; python_version >= "3" (from requests>=2.0.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl
Collecting pyyaml>=5.1.0 (from sparsezoo~=1.1.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/b3/85/79b9e5b4e8d3c0ac657f4e8617713cca8408f6cdc65d2ee6554217cedff1/PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Collecting importlib-resources; python_version < "3.7" (from tqdm>=4.0.0->deepsparse)
  Using cached https://files.pythonhosted.org/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl
Requirement already up-to-date: zipp>=0.5 in ./deepsparse/lib/python3.6/site-packages (from importlib-metadata; python_version < "3.8"->click~=8.0.0->deepsparse)
Building wheels for collected packages: deepsparse, onnx
  Running setup.py bdist_wheel for deepsparse ... error
  Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/deepsparse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpq080et7upip-wheel- --python-tag cp36:
  Loaded version 1.1.0 from /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/generated_version.py
  Checking to see if /tmp/pip-build-9usltruh/deepsparse/src/deepsparse/arch.bin exists.. True
  /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help

  error: invalid command 'bdist_wheel'

  ----------------------------------------
  Failed building wheel for deepsparse
  Running setup.py clean for deepsparse
  Running setup.py bdist_wheel for onnx ... error
  Complete output from command /root/deepsparse/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9usltruh/onnx/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpcl3nl8sspip-wheel- --python-tag cp36:
  fatal: not a git repository (or any of the parent directories): .git
  /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help

  error: invalid command 'bdist_wheel'

  ----------------------------------------
  Failed building wheel for onnx
  Running setup.py clean for onnx
Failed to build deepsparse onnx
Installing collected packages: onnx, dataclasses, pydantic, urllib3, charset-normalizer, certifi, idna, requests, importlib-resources, tqdm, pyyaml, sparsezoo, deepsparse
  Running setup.py install for onnx ... -^canceled
^COperation cancelled by user

@mgoin @KSGulin @shubhra @Willtor @dbarbuzzi @tlrmchlsmth

bug

opened by ErfolgreichCharismatisch 8

Huggingface base Wav2Vec2 model crashing
Describe the bug Hello,

I am trying to compile the ONNX-converted model of a sparse Hugging Face base Wav2Vec2 model (where sparsity was obtained via unstructured magnitude pruning) through compile_model:

dse_network = compile_model(onnx_filepath, batch_size=batch_size, num_cores=1, num_streams=1)

My kernel crashed and I received the following message:

Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 1# 0x00007FFB125A27C4 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 2# 0x00007FFB125A8906 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 3# 0x00007FFB125A89F2 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 4# 0x00007FFB125B12FA in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 5# 0x00007FFB125B1370 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 6# 0x00007FFB11B1F76D in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 7# 0x00007FFB11B25BCF in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 8# 0x00007FFB11A92015 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
 9# 0x00007FFB11A81939 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
10# 0x00007FFB11A82AF1 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
11# 0x00007FFB1213F938 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
12# 0x00007FFB121423B3 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
13# 0x00007FFB121456B9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
14# 0x00007FFB11A6312B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
15# 0x00007FFB11A6B3CE in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
16# 0x00007FFB11A11C1A in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
17# 0x00007FFB11A11ED5 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007FFBE3641649 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007FFBE364184B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007FFBE36788B6 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007FFBE364B0F9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
23# 0x0000561F0FD79B66 in /opt/conda/bin/python

Please email a copy of this stack trace and any additional information to: [email protected]

Environment Include all relevant environment information:

OS : Ubuntu 18.04.5 LTS

Python version [e.g. 3.7]: Python 3.9.4

DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 1.0.2

ML framework version(s) [e.g. torch 1.7.1]: torch 1.11.0

Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnxruntime 1.12.0, onnx 1.12.0,

CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 31719424, 'architecture': 'x86_64', 'available_cores_per_socket': 19, 'available_num_cores': 38, 'available_num_hw_threads': 76, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 19, 'isa': 'avx512', 'num_cores': 38, 'num_hw_threads': 76, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz', 'vnni': False}

Would you have any solution, please? Thank you.
bug
opened by Tim-blo 8

Cannot import deepsparse from WSL: cannot get cpu topology

Describe the bug

For testing purposes, I want to check whether my code works on Windows Subsystem for Linux (WSL2). I'm using Ubuntu 18.04 LTS.

Once on Ubuntu on WSL, I create a new python virtual env, then pip install deepsparse.

After that, while trying to import deepsparse I get:

>>> import deepsparse
arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
Traceback (most recent call last):
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
    info_str = subprocess.check_output(file_path).decode("utf-8")
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
    from .engine import *
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
    from deepsparse.lib import init_deepsparse_lib
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
    CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
    arch = cpu_architecture()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
    arch = _parse_arch_bin()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
    self.memo[args] = self.f(*args)
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
    raise OSError(
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.

Expected behavior

Maybe it should work on WSL :)

Environment Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Ubuntu 18.04LTS (On Windows 10, WSL2)
Python version [e.g. 3.7]: 3.9
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.8.0
ML framework version(s) [e.g. torch 1.7.1]: 1.10.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

This is basically what's not working

To Reproduce Exact steps to reproduce the behavior:

On Windows 10, activate WSL
Install Ubuntu 18.04 from the Microsoft Store
On Ubuntu, create a virtual env (I personally use mamba or conda)
pip install deepsparse
import deepsparse

Errors If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

>>> import deepsparse
arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
Traceback (most recent call last):
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
    info_str = subprocess.check_output(file_path).decode("utf-8")
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
    from .engine import *
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
    from deepsparse.lib import init_deepsparse_lib
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
    CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
    arch = cpu_architecture()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
    arch = _parse_arch_bin()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
    self.memo[args] = self.f(*args)
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
    raise OSError(
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.

Additional context Add any other context about the problem here. Also include any relevant files.

bug

opened by clementpoiret 8

Does a C API exist for deepsparse or is this python only and are all benchmarks via python?

Just a quick question. Is it possible to use deepsparse for inference directly in other languages e.g. C++, C# or similar? Or is all code written in python?
enhancement

opened by nietras 8
getting low fps & inference issue
1. I used the example at https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo with this command:

!python annotate.py \
    zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 \
    --source "/content/loc1min.mp4" \
    --quantized-inputs \
    --image-shape 416 416 \
    --save-dir '/content/ops/' \
    --model-config '/content/coco128.yaml' \
    --device 'cpu'

I'm getting low FPS on CPU with the YOLOv5s model. Is this FPS normal, or should we get 50-60 FPS? You mentioned that the model would be 10x faster, but the speed I see is much lower.

2. I trained a model using the SparseML repo on the coco128 data for 40 epochs, converted the .pth model to ONNX, and tried the same inference script:

!python annotate.py \
    /content/sparseml/integrations/ultralytics-yolov5/yolov5/runs/train/exp2/weights/best.onnx \
    --source "/content/loc1min.mp4" \
    --quantized-inputs \
    --image-shape 416 416 \
    --save-dir '/content/ops/' \
    --model-config '/content/coco128.yaml' \
    --device 'cpu'

With this model I am getting this issue.

What's wrong here? My goal is to train a model on custom data with SparseML and run inference using DeepSparse.
bug
opened by akashAD98 7
no better speed on yolo quant

Hi! How is it going?

First, thanks for your good repo and for helping to make models better and faster. I used your YOLO example to get better speed, and I compared the base, pruned, and quantized models as you described, but all results were approximately the same. There is no VNNI warning, and my server runs Ubuntu 18. My code is:

import os

# models:
yolov5s_base = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none"
yolov5s_pruned = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96"
yolov5s_pruned_quant = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

source_img = "img.bmp"

print("\n base inference:\n")
bash_cmd = f"python annotate.py {yolov5s_base} --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)

print("\n pruned inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned} --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)

print("\n pruned_quant inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned_quant} --source {source_img} --quantized-inputs --image-shape 640 640 "
os.system(bash_cmd)

When I run this code, I get these results:

base inference:

2022-03-08 20:28:15 main INFO Results will be saved to annotation_results/deepsparse-annotations-8
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none downloaded to /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
2022-03-08 20:28:17 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:18 main INFO Inference 0 processed in 128.20696830749512 ms
2022-03-08 20:28:18 main INFO Results saved to annotation_results/deepsparse-annotations-8

pruned inference:

2022-03-08 20:28:19 main INFO Results will be saved to annotation_results/deepsparse-annotations-9
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
2022-03-08 20:28:21 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:23 main INFO Inference 0 processed in 124.91464614868164 ms
2022-03-08 20:28:23 main INFO Results saved to annotation_results/deepsparse-annotations-9

pruned_quant inference:

2022-03-08 20:28:24 main INFO Results will be saved to annotation_results/deepsparse-annotations-10
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
2022-03-08 20:28:26 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:28 main INFO Inference 0 processed in 114.76516723632812 ms
2022-03-08 20:28:28 main INFO Results saved to annotation_results/deepsparse-annotations-10

As you can see, the pruned-quantized model is not noticeably faster. Please guide me on how to get faster results. Thanks!
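To put numbers on the comparison, the three single-image latencies from the logs above can be turned into speedup factors. This is only a sketch built on one measurement per model; the helper name `speedup` is illustrative, and a single first inference may include warm-up cost, so repeated runs would be needed before drawing conclusions.

```python
# Latencies (ms) copied from the "Inference 0 processed in ..." log lines above.
latencies_ms = {
    "base": 128.20696830749512,
    "pruned": 124.91464614868164,
    "pruned_quant": 114.76516723632812,
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Speedup of every model relative to the dense base model.
speedups = {
    name: speedup(latencies_ms["base"], ms) for name, ms in latencies_ms.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x vs. base")
```

On these numbers the pruned model comes out at roughly 1.03x and the pruned-quantized model at roughly 1.12x relative to base.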

opened by RasoulZamani 7
How is the quantized op inferred?

Hello, just out of curiosity: how is the quantized int conv op inferred? I don't think it is supported in onnxruntime, and it is not even a standard ONNX op.

How is it inferred?
documentation

opened by jinfagang 2
[Fix] Update the code for handling ragged numpy arrays in numpy >= 1.24.0
Response to: https://github.com/neuralmagic/deepsparse/issues/825

Since NumPy version 1.19.0, one must specify dtype=object when creating an array from "ragged" sequences; otherwise, one receives a warning:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

Starting with NumPy version 1.24.0, this warning turns into an explicit error:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.

This PR adds the code change necessary to remove the error. As far as I know, only the transformers QA code requires the update.

This fix should also be backward compatible at least dating back to 1.19 (in our requirements we honor numpy>=1.16.3, maybe it is time to bump this version up?)
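A minimal sketch of the behavior described above, assuming NumPy is installed. The sample data is made up for illustration and is not taken from the pipeline; the point is only that passing dtype=object explicitly yields a 1-D object array instead of the warning or error.

```python
import numpy as np

# Illustrative "ragged" input: rows of different lengths.
ragged = [[1, 2, 3], [4, 5]]

# With an explicit dtype=object, NumPy builds a 1-D array whose elements
# are the original Python lists, rather than trying to form a 2-D array.
arr = np.array(ragged, dtype=object)

print(arr.shape)
print(arr.dtype)
print(arr[1])
```

Each element of `arr` is still a plain Python list, so downstream code that expected a rectangular numeric array would need to handle that.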
opened by dbogunowicz 1

Broken Transformers QA Inference Pipeline

Describe the bug

Transformers QA pipeline fails on a simple inference task.

Expected behavior The inference pipeline for Question Answering should work without raising any errors.

Environment

Python version: 3.8
DeepSparse version: current main

To Reproduce

from deepsparse import Pipeline

task = "question-answering"
dense_qa_pipeline = Pipeline.create(
        task=task,
        model_path="zoo:nlp/question_answering/distilbert-none/pytorch/huggingface/squad/base-none",
        # or model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none",
        # was checking whether the problem is not model-dependent
    )

question = "DeepSparse is sparsity-aware inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application"
q_context = "What is DeepSparse?"

dense_qa_pipeline(question=question, context=q_context)

Errors

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.3.0.20221217 COMMUNITY | (d5bf112b) (release) (optimized) (system=avx2, binary=avx2)
Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/ubuntu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/ubuntu/damian/deepsparse_copy/hehe.py", line 14, in <module>
    dense_output = dense_qa_pipeline(question=question, context=q_context)
  File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/pipeline.py", line 217, in __call__
    engine_inputs: List[numpy.ndarray] = self.process_inputs(pipeline_inputs)
  File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 261, in process_inputs
    {
  File "/home/ubuntu/damian/deepsparse_copy/src/deepsparse/transformers/pipelines/question_answering.py", line 262, in <dictcomp>
    key: numpy.array(tokenized_example[key][span])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (128,) + inhomogeneous part.

Additional context The error occurs in https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/transformers/pipelines/question_answering.py#L260.

When attempting to perform dictionary comprehension:

{
                    key: numpy.array(tokenized_example[key][span])
                    for key in tokenized_example.keys()
                    if key not in self.onnx_input_names
                }

Here:

self.onnx_input_names = ['input_ids', 'attention_mask', 'token_type_ids']
tokenized_example.keys() = ['input_ids', 'token_type_ids', 'attention_mask', 'special_tokens_mask', 'offset_mapping', 'overflow_to_sample_mapping', 'example_id']

As a result, we end up iterating over the list difference. One element of this resulting list, offset_mapping, is the culprit:

[tokenized_example[key][0] for key in ['offset_mapping']]

results in :

Calling numpy.array(...) on this data structure triggers the error in question.
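The list difference mentioned above can be checked with plain Python. This sketch copies the two key lists from the issue text and applies the same `not in` filter as the dictionary comprehension; the variable names `tokenized_keys` and `remaining` are illustrative only.

```python
# Values copied from the issue text above.
onnx_input_names = ["input_ids", "attention_mask", "token_type_ids"]
tokenized_keys = [
    "input_ids",
    "token_type_ids",
    "attention_mask",
    "special_tokens_mask",
    "offset_mapping",
    "overflow_to_sample_mapping",
    "example_id",
]

# Same filter as the dictionary comprehension: keep keys that are NOT engine inputs.
remaining = [key for key in tokenized_keys if key not in onnx_input_names]
print(remaining)
```

The result contains offset_mapping, which is the key the issue identifies as the source of the inhomogeneous shape.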

Interestingly, when @mwitiderrick attempted to reproduce the error inside a Colab notebook (using the last release rather than main), the problem disappeared: https://colab.research.google.com/drive/1aIrITYxgcR-5VmL4vm8P-6H4rvCBAeaX?usp=sharing However, it reappeared (on the last release) when he attempted to run the transformers QA pipeline in HF/Gradio: https://huggingface.co/spaces/neuralmagic/question-answering/blob/main/app.py

bug

opened by dbogunowicz 0

Quantization and pruning for yolov7

I would like to perform quantization-aware training for a custom object detector using the YOLOv7 architecture. Could you please let me know whether the functionality developed for YOLOv5 can be used straightaway, or what modifications I would need to make to use it for YOLOv7? Any leads would be appreciated. Thanks.
enhancement

opened by Sri20021 0

Releases(v1.3.0)

v1.3.0 (Dec 21, 2022)
New Features:

Bfloat16 is now supported on CPUs with the AVX512_BF16 extension. Users can expect up to a 30% performance improvement for sparse FP32 networks and up to a 75% performance improvement for dense FP32 networks. This feature is opt-in and is specified with the default_precision parameter in the configuration file.

Several options can now be specified using a configuration file.

Max and min operators are now supported for performance.

SQuAD 2.0 support provided.

NLP multi-label and eval support added.

Fraction of supported operations property added to engine class.

New ML Ops logging capabilities implemented, including metrics logging, custom functions, and Prometheus support.

Changes:

Minimum Python version set to 3.7.

The default logging level has been changed to warn.

Timing functions and a default no-op deallocator have been added to improve usability of the C++ API.

DeepSparse now supports the axes parameter to be specified either as an input or an attribute in several ONNX operators.

Model compilation times have been improved on machines with many cores.

YOLOv5 pipelines upgraded to latest state from Ultralytics.

Transformers pipelines upgraded to latest state from Hugging Face.

Resolved Issues:

DeepSparse no longer crashes with an assertion failure for softmax operators on dimensions with a single element.

DeepSparse no longer crashes with an assertion failure on some unstructured sparse quantized BERT models.

Image classification evaluation script no longer crashes for larger batch sizes.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.10 MB)
deepsparse-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.0.tar.gz(37.77 MB)
deepsparse-ent-1.3.0.tar.gz(39.34 MB)
deepsparse-ent_api_demo.tar.gz(70.79 MB)
deepsparse_api_demo.tar.gz(69.21 MB)
deepsparse_ent-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
deepsparse_ent-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.68 MB)
deepsparse_ent-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
deepsparse_ent-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
v1.3.1 (Jan 4, 2023)
This is a patch release for 1.3.0 that contains the following changes:

Performance on some unstructured sparse quantized YOLOv5 models has been improved. This fixes a performance regression compared to DeepSparse 1.1.

DeepSparse no longer throws an exception when it cannot determine L3 cache information and instead logs a warning message.

An assertion failure on some compound sparse quantized transformer models has been fixed.

Models with ONNX opset 13 Squeeze operators no longer exhibit poor performance, and DeepSparse now sees speedup from sparsity when running them.

NumPy version pinned to <=1.21.6 to avoid deprecation warning/index errors in pipelines.

Source code(tar.gz)
Source code(zip)
deepsparse-1.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.10 MB)
deepsparse-1.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.09 MB)
deepsparse-1.3.1.tar.gz(37.77 MB)
deepsparse-ent-1.3.1.tar.gz(39.34 MB)
deepsparse-ent_api_demo.tar.gz(70.80 MB)
deepsparse_api_demo.tar.gz(69.21 MB)
deepsparse_ent-1.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
deepsparse_ent-1.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.68 MB)
deepsparse_ent-1.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
deepsparse_ent-1.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.67 MB)
v1.2.0 (Oct 28, 2022)
New Features:

DeepSparse Engine Trial and Enterprise Editions now available, including license key activations.

DeepSparse Pipelines document classification use case in NLP supported.

Changes:

Mock engine tests added to enable faster and more precise unit tests in pipelines and Python code.

DeepSparse Engine benchmarking updated to use time.perf_counter for more accurate benchmarks.
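As an illustration of why time.perf_counter suits benchmarking, here is a minimal timing sketch. The helper `timed_call` is hypothetical and is not DeepSparse's benchmarking code; it only shows the standard-library pattern of reading a monotonic, high-resolution clock before and after a call.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds).

    time.perf_counter is monotonic and uses the highest-resolution clock
    available, so it is not affected by system clock adjustments.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example: time a simple built-in call.
result, elapsed = timed_call(sum, range(1_000_000))
print(f"sum={result}, elapsed={elapsed * 1e3:.3f} ms")
```

For real benchmarks, a single call is usually too noisy; repeating the call and summarizing many measurements gives more stable numbers.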

Dynamic batch implemented to be more generic so it can support any pipeline.

Minimum Python version changed to 3.7 as 3.6 reached EOL.

Performance:

Performance improvements for unstructured sparse quantized convolutional neural networks implemented for throughput use cases.

Resolved Issues:

In the C++ interface, the engine no longer crashes with a segmentation fault when the num_streams provided to the engine_context_t is greater than the number of physical CPU cores.

The engine no longer crashes with assertion failures when running YOLOv4.

YOLACT pipelines fixed where dynamic batch was not working and exported images had color channels improperly swapped.

DeepSparse Server no longer crashes for hyphenated task names such as "question-answering."

Computer vision pipelines now additionally accept single NumPy array inputs.

Protobuf version for ONNX 1.12 compatibility pinned to prevent installation failures on some systems.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-1.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.18 MB)
deepsparse-1.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.19 MB)
deepsparse-1.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.18 MB)
deepsparse-1.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.18 MB)
deepsparse-1.2.0.tar.gz(36.88 MB)
deepsparse-ent-1.2.0.tar.gz(38.46 MB)
deepsparse-ent_api_demo.tar.gz(68.80 MB)
deepsparse_api_demo.tar.gz(67.22 MB)
deepsparse_ent-1.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.76 MB)
deepsparse_ent-1.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.77 MB)
deepsparse_ent-1.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.76 MB)
deepsparse_ent-1.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.76 MB)
v1.1.0 (Aug 25, 2022)
New Features:

Python 3.10 support added.

Zero-shot text classification pipeline implemented.

Haystack Information Retrieval pipeline implemented.

YOLACT pipeline native integration for deployments is available.

DeepSparse pipelines now support dynamic batch, dynamic shape through bucketing, and asynchronous execution.

CustomTaskPipeline added to enable easier custom pipeline creation.

Changes:

The behavior of the Multi-stream scheduler is now identical to the Elastic scheduler, and the old Multi-stream scheduler has been removed.

NLP pipelines for question answering, text classification, and token classification upgraded to improve accuracy and better match the SparseML training pathways.

Updates made across the repository for new SparseZoo Python APIs.

Max torchvision version increased to 0.12.0 for computer vision deployment pathways.

Performance:

Inference performance improvements for

unstructured sparse quantized Transformer models.

slow activation functions (such as Gelu or Swish) when they follow a QuantizeLinear operator.

some sparse 1D convolutions. Speedups of up to 3x are observed.

Squeeze, when operating on a single axis.

Resolved Issues:

Assertion errors no longer occur when one node has multiple inputs that both come from the same node.

An assertion error no longer occurs when a MatMul operator follows a Transpose or Reshape operator.

Pipelines now support hyphenated versions of standard task names such as question-answering.

Known Issues:

In the C++ interface, the engine will crash with a segmentation fault when the num_streams provided to the engine_context_t is greater than the number of physical CPU cores.

Source code(tar.gz)
Source code(zip)
deepsparse-1.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.73 MB)
deepsparse-1.1.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.75 MB)
deepsparse-1.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.75 MB)
deepsparse-1.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.73 MB)
deepsparse-1.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(39.73 MB)
deepsparse-1.1.0.tar.gz(39.44 MB)
deepsparse_api_demo.tar.gz(69.80 MB)
v1.0.2 (Jul 13, 2022)
This is a patch release for 1.0.0 that contains the following changes:

Question answering pipeline pre-processing now exactly matches the SparseML training pre-processing. Previously, differences between the logic of the two were leading to minor drops in accuracy.

Source code(tar.gz)
Source code(zip)
deepsparse-1.0.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.18 MB)
deepsparse-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.18 MB)
deepsparse-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.2.tar.gz(37.89 MB)
deepsparse_api_demo.tar.gz(68.29 MB)
v1.0.1 (Jul 7, 2022)
This is a patch release for 1.0.0 that contains the following changes:

Crashes with an assertion failure no longer happen in the following cases:

during model compilation for a convolution with a 1x1 kernel with 2x2 convolution strides.

when setting the num_streams parameter to fewer than the number of NUMA nodes.

The engine no longer enters an infinite loop when an operation has multiple inputs coming from the same source.

Error messaging improved for installation failures of non-supported operating systems.

Supported transformers datasets version capped for compatibility with pipelines.
Source code(tar.gz)
Source code(zip)
deepsparse-1.0.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.18 MB)
deepsparse-1.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.18 MB)
deepsparse-1.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.1.tar.gz(37.89 MB)
deepsparse_api_demo.tar.gz(68.29 MB)
v1.0.0(Jul 1, 2022)
New Features:

Support added for running multiple models with the same engine when using the Elastic Scheduler.

When using the Elastic Scheduler, the caller can now use the num_streams argument to tune the number of requests that are processed in parallel.

Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.

Documentation additions made for transformers, yolov5, torchvision, and serving that focus on model deployment for the given integrations.

AWS SageMaker example created.

Changes:

Click as a root dependency added as the new preferred route for CLI invocation and arg management.

Performance:

Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions. Gains reach up to 20% on BERT and 45% on ResNet-50.

Resolved Issues:

When a layer operates on a dataset larger than 2GB, potential crashes no longer happen.

Assertion error addressed for Reduce operations where the reduction axis is of length 1.

Rare assertion failure addressed related to Tensor Columns.

When running the DeepSparse Engine on a system with a non-uniform system topology, model compilation now properly terminates.

Known Issues:

In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel with 2x2 convolution strides; hotfix forthcoming.

The engine will crash with an assertion failure when setting the num_streams parameter to fewer than the number of NUMA nodes; hotfix forthcoming.

In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.

Source code(tar.gz)
Source code(zip)
deepsparse-1.0.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.17 MB)
deepsparse-1.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.17 MB)
deepsparse-1.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(38.16 MB)
deepsparse-1.0.0.tar.gz(37.89 MB)
deepsparse_api_demo.tar.gz(68.29 MB)
v0.12.2(Jun 2, 2022)
This is a patch release for 0.12.0 that contains the following changes:

Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.
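
As a rough illustration of the cap described above, the check below accepts a Protobuf version string only when its major version is below 4. The helper names are hypothetical and are not part of DeepSparse. In a requirements file the same constraint is normally written as protobuf<4.0.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".", 1)[0])


def protobuf_version_ok(version: str) -> bool:
    """True when the version satisfies the '< 4.0' cap from the release note."""
    return major_version(version) < 4


if __name__ == "__main__":
    for candidate in ("3.20.1", "4.21.0"):
        status = "allowed" if protobuf_version_ok(candidate) else "blocked"
        print(candidate, status)
```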

Source code(tar.gz)
Source code(zip)
deepsparse-0.12.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.24 MB)
deepsparse-0.12.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.23 MB)
deepsparse-0.12.2.tar.gz(36.99 MB)
deepsparse_api_demo.tar.gz(67.42 MB)
v0.12.1(May 5, 2022)
This is a patch release for 0.12.0 that contains the following changes:

Improper label mapping no longer crashes for validation flows within DeepSparse transformers.

DeepSparse Server now exposes proper routes for SageMaker.

Dependency issue with DeepSparse Server no longer installs an old version of a library that caused crashing issues in some use cases.

Source code(tar.gz)
Source code(zip)
deepsparse-0.12.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.24 MB)
deepsparse-0.12.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.23 MB)
deepsparse-0.12.1.tar.gz(36.99 MB)
deepsparse_api_demo.tar.gz(67.42 MB)
v0.12.0(Apr 22, 2022)
New Features:

Documentation:

SparseServer.UI: a Streamlit app for deploying the DeepSparse Server for exploring the inference performance of BERT on the question answering task.

DeepSparse Server README: deepsparse.server capabilities, including single model and multi-model inferencing.

Twitter NLP Inference Examples added.

Changes:

Performance:

Speedup for large batch sizes when using sync mode on AMD EPYC processors.

AVX2 improvements:

Up to 40% speedup out of the box for dense quantized models.

Up to 20% speedup for pruned quantized BERT, ResNet-50 and MobileNet.

Speedup from sparsity realized for ConvInteger operators.

Model compilation time decreased on systems with many cores.

Multi-stream Scheduler: certain computations that were executed during runtime are now precomputed.

Hugging Face Transformers integration updated to latest state from upstream main branch.

Documentation:

DeepSparse README: references to deepsparse.server, deepsparse.benchmark, and Transformer pipelines.

DeepSparse Benchmark README: highlights of deepsparse.benchmark CLI command.

Transformers 🤗 Inference Pipelines: examples included on how to run inference via Python for several NLP tasks.

Resolved Issues:

When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations, which previously caused very poor performance.

Users executing arch.bin now receive a correct architecture profile of their system.

Known Issues:

When running the DeepSparse Engine on a system with a non-uniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1.
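
A minimal sketch of applying this workaround from Python, assuming the engine reads the variable when it is loaded or when a model is compiled (the release note does not say exactly when). The helper name apply_compile_workaround is hypothetical. Only the variable name and value come from the note.

```python
import os

WORKAROUND_VAR = "NM_SERIAL_UNIT_GENERATION"


def apply_compile_workaround(env=None):
    """Set the workaround variable in `env` (defaults to this process's environment)."""
    target = os.environ if env is None else env
    target[WORKAROUND_VAR] = "1"
    return target


# Apply the setting first; importing deepsparse and compiling the model
# would follow this call (assumption: the variable is read at that point).
apply_compile_workaround()
```

Exporting the variable in the shell before starting the Python process has the same effect and avoids any question of import order.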

Source code(tar.gz)
Source code(zip)
deepsparse-0.12.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.25 MB)
deepsparse-0.12.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.23 MB)
deepsparse-0.12.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.23 MB)
deepsparse-0.12.0.tar.gz(36.99 MB)
deepsparse_api_demo.tar.gz(67.42 MB)
v0.11.2(Mar 23, 2022)
This is a patch release for 0.11.0 that contains the following changes:

Fixed an assertion error that would occur when using deepsparse.benchmark on AMD machines with the argument -pin none.

Known Issues:

When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine disables optimizations, resulting in very poor performance.
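
One way to stay clear of this issue on affected versions is to choose a sequence length that is a multiple of 4. The sketch below rounds a desired length up to the next multiple. It assumes that padding inputs to the longer length is acceptable for the model in question. The helper next_multiple is hypothetical and is not a DeepSparse function.

```python
def next_multiple(value: int, multiple: int = 4) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-value // multiple) * multiple


if __name__ == "__main__":
    for seq_len in (128, 129, 383):
        print(seq_len, "->", next_multiple(seq_len))
```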

Source code(tar.gz)
Source code(zip)
deepsparse-0.11.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.91 MB)
deepsparse-0.11.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.91 MB)
deepsparse-0.11.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.89 MB)
deepsparse-0.11.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.89 MB)
deepsparse-0.11.2.tar.gz(36.65 MB)
deepsparse_api_demo.tar.gz(67.08 MB)
v0.11.1(Mar 21, 2022)
This is a patch release for 0.11.0 that contains the following changes:

When running NanoDet-Plus-m, the DeepSparse Engine will no longer fail with an assertion (See #279).

The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool multi-process-benchmark.py to function correctly. This script allows users to measure performance using multiple separate processes in parallel.

Fixed a performance regression on BERT batch size 1 sequence length 128 models.

Source code(tar.gz)
Source code(zip)
deepsparse-0.11.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.91 MB)
deepsparse-0.11.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.91 MB)
deepsparse-0.11.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.89 MB)
deepsparse-0.11.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.89 MB)
deepsparse-0.11.1.tar.gz(36.65 MB)
deepsparse_api_demo.tar.gz(67.07 MB)
v0.11.0(Mar 11, 2022)
New Features:

High-performance sparse quantized convolutional neural networks supported on AVX2 systems.

CCX detection added to the DeepSparse Engine for AMD systems.

deepsparse.server integration and CLIs added with Hugging Face transformers pipelines support.

Changes:

Performance improvements made for:

FP32 sparse BERT models

batch size 1 networks

quantized sparse BERT models

Pooling operations

Resolved Issues:

When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.

Hugging Face transformers validation flows for QQP now give correct accuracy metrics.

PyTorch downloaded for YOLO model stubs now supported.

Known Issues:

When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.

Source code(tar.gz)
Source code(zip)
deepsparse-0.11.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.90 MB)
deepsparse-0.11.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.90 MB)
deepsparse-0.11.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.88 MB)
deepsparse-0.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.88 MB)
deepsparse-0.11.0.tar.gz(36.64 MB)
deepsparse_api_demo.tar.gz(67.07 MB)
v0.10.0(Feb 3, 2022)
New Features:

Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.

NM_SPOOF_ARCH environment variable added for testing different architectural configurations.

Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers.

deepsparse.benchmark application is now usable from the command-line after installing deepsparse to simplify benchmarking.

deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.

Changes:

More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.

Tensor columns improved, leading to speedups of 5 to 20% in pruned YOLO (larger batch size), BERT (smaller batch size), MobileNet, and ResNet models.

Sparse quantized network performance improved on machines that do not support VNNI instructions.

Performance improved for dense BERT with large batch sizes.

Resolved Issues:

Possible crashes eliminated for:

Pooling operations with small image sizes

Rarely, networks containing convolution or GEMM operations

Some models with many residual connections

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.10.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.66 MB)
deepsparse-0.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.66 MB)
deepsparse-0.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.64 MB)
deepsparse-0.10.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(37.64 MB)
deepsparse-0.10.0.tar.gz(37.40 MB)
deepsparse_api_demo.tar.gz(67.85 MB)
v0.9.1(Dec 14, 2021)
This is a patch release for 0.9.0 that contains the following changes:

YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.

GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.

Broadcasted inputs to elementwise operators no longer fail with an assertion error.

Int64 multiplications no longer fail with an illegal instruction on AVX2.

Source code(tar.gz)
Source code(zip)
deepsparse-0.9.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.38 MB)
deepsparse-0.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.38 MB)
deepsparse-0.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.37 MB)
deepsparse-0.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.37 MB)
deepsparse-0.9.1.tar.gz(36.13 MB)
deepsparse_api_demo.tar.gz(66.60 MB)
v0.9.0(Dec 1, 2021)
New Features:

Support optimized for resize operators with coordinate transformations of pytorch_half_pixel and align_corners.

Up-to-date version check implemented for DeepSparse.

YOLACT and DeepSparse integration added in examples/dbolya-yolact.

Changes:

The parameter for the number of sockets to use has been removed; the Python interface now takes only the number of cores as a parameter.

Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:

The softmax operator can now take advantage of tensor columns.

Inference batch sizes that are not divisible by 16 are now supported.

Various performance improvements made to:

certain networks, such as YOLOv5, on AVX2 systems.

dense convolutions on some AVX-512 systems.

API docs recompiled.

Resolved Issues:

In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.

Known Issues:

YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.

In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.

Source code(tar.gz)
Source code(zip)
deepsparse-0.9.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.37 MB)
deepsparse-0.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.37 MB)
deepsparse-0.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.36 MB)
deepsparse-0.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.36 MB)
deepsparse-0.9.0.tar.gz(36.12 MB)
deepsparse_api_demo.tar.gz(66.59 MB)
v0.8.0(Oct 26, 2021)
New Features:

Tensor columns have been optimized, improving the performance of some networks.

This includes but is not limited to pruned and quantized YOLOv5s and BERT.

For networks with subgraphs comprised of low-compute operations.

Batch size must be a multiple of 16.

Reduce operators have been further optimized in the Engine.

C++ API support is available for the DeepSparse Engine.

Changes:

Performance improvements made for low-precision (8 and 16-bit) datatypes on AVX2.

Resolved Issues:

Rarely, when several data arrangement operators were in a row, e.g., Reshape, Transpose, or Slice, assertion errors occurred.

When Pad operators were not followed by convolution or pooling, assertion errors occurred.

CPU threads migrated between cores when running benchmarks.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.8.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.69 MB)
deepsparse-0.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.69 MB)
deepsparse-0.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.68 MB)
deepsparse-0.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(36.68 MB)
deepsparse-0.8.0.tar.gz(33.95 MB)
deepsparse_api_demo.tar.gz(66.90 MB)
v0.7.0(Sep 13, 2021)
New Features:

Operators optimized for Engine support:

Where*

Cast*

IntegerMatMul*

QLinearMatMul*

Gather (for scalar indices)

*optimized only for AVX-512 support

Flag created to disable any batch size overrides by setting the environment variable NM_DISABLE_BATCH_OVERRIDE=1.

Warnings display when emulating quantized operations on machines without VNNI instructions.

Support added for Python 3.9.

Support added for ONNX versions 1.8 - 1.10.

Changes:

Performance improvements made for sparse quantized transformer models.

Documentation updates made for examples/ultralytics-yolo to include YOLOv5.

Resolved Issues:

A crash could result from an uninitialized memory read. A check is now in place before the memory is accessed.

Engine output_shape functions corrected on multi-socket systems when the output dimensions are not statically known.

Known Issues:

BERT models with quantized embeds currently segfault on AVX2 machines. Workaround is to run on a VNNI-compatible machine.
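
To judge whether a machine qualifies for this workaround, one option on x86 Linux is to look for a VNNI flag in /proc/cpuinfo. The sketch below is not part of DeepSparse. It assumes the kernel reports the feature as avx512_vnni or avx_vnni on the flags line, and the function names are hypothetical.

```python
VNNI_FLAGS = {"avx512_vnni", "avx_vnni"}


def has_vnni(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line lists a VNNI feature flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return bool(VNNI_FLAGS & set(value.split()))
    return False


def host_has_vnni(path: str = "/proc/cpuinfo") -> bool:
    """Check the current host; returns False if the file cannot be read."""
    try:
        with open(path) as handle:
            return has_vnni(handle.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("VNNI available:", host_has_vnni())
```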

Source code(tar.gz)
Source code(zip)
deepsparse-0.7.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(34.27 MB)
deepsparse-0.7.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(34.27 MB)
deepsparse-0.7.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(34.25 MB)
deepsparse-0.7.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(34.25 MB)
deepsparse-0.7.0.tar.gz(31.53 MB)
v0.6.1(Aug 11, 2021)
This is a patch release for 0.6.0 that contains the following changes:

Users no longer experience crashes

when running the ReduceSum operation in the DeepSparse Engine.

when running operations on tensors that are 8- or 16-bit integers, or booleans, on AVX2.

Source code(tar.gz)
Source code(zip)
deepsparse-0.6.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.78 MB)
deepsparse-0.6.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.78 MB)
deepsparse-0.6.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.76 MB)
deepsparse-0.6.1.tar.gz(30.05 MB)
v0.6.0(Jul 30, 2021)
New Features:

DeepSparse Engine optimized for Sparse FP32 BERT.

Optimized BERT model collection now in the SparseZoo.

Performance improvement example includes 5x increased throughput on PruneBERT (281 seq/sec) compared to dense BERT (53 seq/sec) at batch size 32 and sequence length 128 (AWS c5.12xlarge).

Optimized Tanh operator support provided.

Hugging Face transformers pipeline APIs added for NLP models.

Hugging Face transformers examples added for benchmarking, deploying, and sample application.

Ultralytics YOLOv5 example support added.
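
The "5x" figure quoted for PruneBERT in the list above follows directly from the two throughput numbers in that item. A short calculation, using only those values, makes the ratio explicit. The function name speedup is illustrative.

```python
def speedup(optimized_throughput: float, baseline_throughput: float) -> float:
    """Ratio of optimized to baseline throughput (both in the same units)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized_throughput / baseline_throughput


if __name__ == "__main__":
    # 281 seq/sec (PruneBERT) vs. 53 seq/sec (dense BERT), as quoted in the notes.
    print(f"{speedup(281, 53):.2f}x")  # prints 5.30x
```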

Changes:

Performance improvements made for:

all networks when running on multi-socket machines, especially those with large outputs.

batched Softmax and Reduce operators with many threads available.

Reshape operators when multiple dimensions are combined into one or one dimension is split into multiple.

stacked matrix multiplications by supporting more input layouts.

YOLOv3 example integration was generalized to ultralytics-yolo in support of both V3 and V5.

Resolved Issues:

Engine now runs on architectures with more than one NUMA node per socket.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.6.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.78 MB)
deepsparse-0.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.78 MB)
deepsparse-0.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.76 MB)
deepsparse-0.6.0.tar.gz(30.05 MB)
v0.5.1(Jun 30, 2021)
This is a patch release for 0.5.0 that contains the following changes:

Resolved an issue that caused a performance regression on YOLOv5 and could have affected the correctness of some models.

Source code(tar.gz)
Source code(zip)
deepsparse-0.5.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.1.tar.gz(29.66 MB)
v0.5.0(Jun 28, 2021)
New Features:

None

Changes:

Performance optimizations implemented for binary elementwise operations, where both inputs come from the same source buffer. One of the inputs may have intermediate unary operations.

Performance optimizations implemented for binary elementwise operations where one of the inputs is a constant scalar.

Small performance improvement for large batch sizes (> 64) on quantized ResNet.

Resolved Issues:

Assertion on deepsparse num_sockets removed; it previously caused a crash when too many sockets were requested.

Rare assertion failure fixed when a nonlinearity appeared between an elementwise addition and a convolution or GEMM.

Broken URLs for classification and detection examples updated in the contained READMEs.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.5.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(32.16 MB)
deepsparse-0.5.0.tar.gz(29.66 MB)
v0.4.0(Jun 4, 2021)
New Features:

New operator support implemented for Expand.

Slice operator support added for positive step sizes. Only slice operations that operate on a single axis are supported. Previously, slice was only supported for constant tensors and step size equal to one.

Changes:

Memory usage of compiled models reduced.

Memory layout for matrix multiplications in Transformers optimized.

Precision for swish and sigmoid operations improved.

Runtime performance improved for some networks whose outputs are immediately preceded by transpose operators.

Runtime performance of softmax operations improved.

Readme redesigned for better clarity on the repository's purpose.

Resolved Issues:

With the multi-stream scheduler, selecting more threads than the number of cores on the system no longer causes a performance hit.

Neural Magic dependencies now upgrade to the intended bug-fix versions instead of minor versions.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.4.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.98 MB)
deepsparse-0.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.98 MB)
deepsparse-0.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.98 MB)
deepsparse-0.4.0.tar.gz(29.48 MB)
v0.3.1(May 14, 2021)
This is a patch release for 0.3.0 that contains the following changes:

Docs updated for new Discourse and Slack links

Check added for supported Python version so DeepSparse does not improperly install on unsupported systems

Source code(tar.gz)
Source code(zip)
deepsparse-0.3.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.1.tar.gz(28.69 MB)
v0.3.0(Apr 30, 2021)
New Features:

Multi-stream scheduler added as a configurable option to the engine.

Changes:

Errors related to setting the NUMA memory policy are now issued as warnings.

Improved compilation times for sparse networks.

Performance improvements made for: networks with large outputs and multi-socket machines; ResNet-50 v1 quantized and kernel sparsity GEMMs.

Copy operations and placement of quantization operations within network optimized.

Version is now loaded from the version.py file; the default build on branches is now nightly.

cpu.py file and related APIs added to DeepSparse repo instead of copying over from backend.

Install errors for unsupported systems added for end users running on non-Linux systems.

YOLOv3 batch 64 quantized now has a speedup of 16% in the DeepSparse Engine.

Resolved Issues:

An assertion is no longer triggered when more sockets or threads than available are requested.

Resolved assertion when performing Concat operations on constant buffers.

Engine no longer crashes when the output of a QLinearMatMul operation has a dimension not divisible by 4.

The engine now starts without crashing on Windows Subsystem for Linux and Docker for Windows or Docker for Mac.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.3.0-cp36-cp36m-manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.0-cp37-cp37m-manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.0-cp38-cp38-manylinux2014_x86_64.whl(31.18 MB)
deepsparse-0.3.0.tar.gz(28.69 MB)
v0.2.0(Mar 31, 2021)
New Features:

None

Changes:

Dense convolutions on AVX2 systems were optimized, improving performance for many non-pruned networks. In particular, this results in a speed improvement for batch size 64 ResNet-50 of up to 28% on Intel AVX2 systems and up to 39% on AMD AVX2 systems.

Operations to shuffle activations in engine optimized, resulting in up to 14% speed improvement for batch size 64 pruned quantized MobileNetV1.

Performance improvements made for networks with large output arrays.

Resolved Issues:

Engine no longer fails with an assert when running some quantized networks.

Some Resize operators were not optimized if they had a ROI input.

Memory leak addressed on multi-socket systems when batch size > 1.

Docs and readme corrections made for minor issues and broken links.

Makefile no longer deletes files for docs compilation and cleaning.

Known Issues:

In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.

In some cases, models running complicated pre- or post-processing steps could diminish DeepSparse Engine performance by up to a factor of 10 due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:

Enable thread binding

If that does not give performance benefit or you want to try additional options:

Use the numactl utility to prevent the process from running on hyperthreads.

Manually set the thread affinity in Python as follows:

import os
from deepsparse.cpu import cpu_architecture

ARCH = cpu_architecture()

if ARCH.vendor == "GenuineIntel":
    os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
elif ARCH.vendor == "AuthenticAMD":
    os.sched_setaffinity(0, range(0, 2*ARCH.num_physical_cores(), 2))
else:
    raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")

Source code(tar.gz)
Source code(zip)
deepsparse-0.2.0-cp36-cp36m-manylinux2014_x86_64.whl(30.42 MB)
deepsparse-0.2.0-cp37-cp37m-manylinux2014_x86_64.whl(30.42 MB)
deepsparse-0.2.0-cp38-cp38-manylinux2014_x86_64.whl(30.42 MB)
v0.1.1(Mar 1, 2021)
This is a patch release for 0.1.0 that contains the following changes:

Docs updates: tagline, overview, update to use sparsification for verbiage

Examples updated to use new ResNet-50 pruned_quant moderate model from the SparseZoo

Nightly build dependencies now match on major.minor and not full version

Benchmarking script added for reproducing ResNet-50 numbers

Small (3-5%) performance improvement for pruned quantized ResNet-50 models, for batch size greater than 16

Reduced memory footprint for networks with sparse fully connected layers

Improved performance on multi-socket systems when batch size is larger than 1

Source code(tar.gz)
Source code(zip)
deepsparse-0.1.1-cp36-cp36m-manylinux2014_x86_64.whl(30.39 MB)
deepsparse-0.1.1-cp37-cp37m-manylinux2014_x86_64.whl(30.39 MB)
deepsparse-0.1.1-cp38-cp38-manylinux2014_x86_64.whl(30.39 MB)
v0.1.0(Feb 4, 2021)
Welcome to our initial release on GitHub! Older release notes can be found here.

New Features:

Operator support enabled:

QLinearAdd

2D QLinearMatMul when the second operand is constant

Multi-stream support added for concurrent requests.

Examples for benchmarking, classification flows, detection flows, and Flask servers added.

Jupyter Notebooks for classification and detection flows added.

Makefile flows and utilities implemented for GitHub repo structure.

Changes:

Software packaging updated to reflect new GitHub distribution channel, from file naming conventions to license enforcement removal.

Initial startup message updated with improved language.

Distribution now manylinux2014 compliant; support for Ubuntu 16.04 deprecated.

QuantizeLinear operations now use division instead of scaling by reciprocal for small quantization scales.

Small performance improvements made on some quantized networks with nontrivial activation zero points.

Resolved Issues:

Networks with sparse quantized convolutions and nontrivial activation zero points now have consistent correct results.

Crash no longer occurs for some models where a quantized depthwise convolution follows a non-depthwise quantized convolution.

Known Issues:

None

Source code(tar.gz)
Source code(zip)
deepsparse-0.1.0-cp36-cp36m-manylinux2014_x86_64.whl(32.06 MB)
deepsparse-0.1.0-cp37-cp37m-manylinux2014_x86_64.whl(32.06 MB)
deepsparse-0.1.0-cp38-cp38-manylinux2014_x86_64.whl(32.06 MB)