SPARSEDNN
**If you want to use this repo, please send me an email at [email protected] or raise a GitHub issue.**
Fast sparse deep learning on CPUs. This is the kernel library generator described in the paper: https://arxiv.org/abs/2101.07948
Python API: run `python fastsparse.py`. It has minimal required dependencies and should work anywhere.
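As a rough illustration of how a generated kernel can be called from Python, here is a hedged ctypes sketch. The library name (`kernel.so`), the exported symbol `mm`, the argument order, and the matrix sizes are assumptions for illustration only; see `fastsparse.py` for the actual interface.

```python
# Hypothetical sketch: calling a generated sparse-matmul kernel through ctypes.
# "kernel.so", the symbol name "mm", and the argument order are assumptions;
# fastsparse.py defines the real interface.
import ctypes
import numpy as np

lib = ctypes.CDLL("./kernel.so")        # kernel compiled for one fixed sparsity pattern
lib.mm.restype = None
lib.mm.argtypes = [
    ctypes.POINTER(ctypes.c_float),     # dense input B
    ctypes.POINTER(ctypes.c_float),     # output C
]

B = np.random.rand(128, 128).astype(np.float32)   # dense operand
C = np.zeros((128, 128), dtype=np.float32)        # result buffer

# The sparse operand is assumed to be baked into the generated code, so only
# the dense matrices cross the ctypes boundary on every call; that per-call
# marshalling is the overhead noted below for small matrices.
lib.mm(
    B.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    C.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
)
```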
C++ API: check out `driver_cpu.cpp`, or run `autotune_cpu_random.sh 128 128 128 0`. This requires cnpy to read numpy files, so make sure you can link against cnpy.
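Since the C++ side reads its inputs from numpy files via cnpy, a minimal sketch of producing such files is below. The file names and the 128x128x128 shapes only mirror the autotune example above; the names actually expected by `driver_cpu.cpp` and the script are assumptions.

```python
# Sketch: writing dense .npy inputs for the C++ driver (read with cnpy).
# File names and shapes are assumptions that mirror the autotune example above.
import numpy as np
import scipy.sparse as sp

M, K, N = 128, 128, 128
A = sp.random(M, K, density=0.1, format="csr", dtype=np.float32)  # random sparse operand
B = np.random.rand(K, N).astype(np.float32)                       # dense operand

np.save("A.npy", A.toarray())   # cnpy reads plain .npy arrays
np.save("B.npy", B)
```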
The Python API incurs some call overhead from using ctypes. This is noticeable for smaller matrices but negligible for large ones. All benchmarks in the arXiv paper were done with the C++ API.
Work that is not yet open sourced: a kernel generator for sparse convolutions (as described in the arXiv paper) using implicit convolution, a lightweight inference engine for end-to-end results, and sparse int8 kernels. If you are interested in any of these, please email.
FAQs:
- How does this compare to Neural Magic? Last time I checked, the DeepSparse library does not let you run kernel-level benchmarks. If you care about end-to-end neural network acceleration, you should definitely go with Neural Magic if they happen to support your model.
- Future work? This is not exactly along the lines of my PhD thesis, so I work on this sparingly. If you want to contribute to this repo, you could make a PyTorch or TensorFlow custom op with the Python or C++ API (a rough sketch follows this list). However, it is unclear how gradients would work, and the op would have to be compiled for a fixed sparsity pattern, which current PyTorch/TensorFlow frameworks might not support that well.
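For concreteness, here is a hedged sketch of what such a PyTorch wrapper could look like. The `run_generated_kernel` function is a placeholder, not part of this repo; it stands in for a call into a generated kernel (e.g. through the ctypes interface above) and substitutes `torch.sparse.mm` so the sketch runs on its own. Gradients are left unimplemented for the reasons above.

```python
# Rough sketch of a PyTorch custom op around a fixed-pattern sparse matmul.
# run_generated_kernel is a placeholder, NOT part of this repo; it stands in
# for a call into a generated kernel and falls back to torch.sparse.mm here.
import torch

def run_generated_kernel(A_sparse: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # Placeholder: a real wrapper would dispatch to the compiled kernel
    # specialized for A_sparse's fixed sparsity pattern.
    return torch.sparse.mm(A_sparse, B)

class FixedSparseMM(torch.autograd.Function):
    @staticmethod
    def forward(ctx, A_sparse, B):
        return run_generated_kernel(A_sparse, B)

    @staticmethod
    def backward(ctx, grad_out):
        # Gradients are the open question mentioned above: the sparse weight's
        # pattern is fixed at compile time, and grad w.r.t. B would need a
        # transposed kernel. Left unimplemented (inference only).
        raise NotImplementedError("backward is not supported for the fixed-pattern kernel")

# Inference-only usage with a stand-in ~90% sparse weight.
A_dense = torch.rand(128, 128)
A_dense[A_dense < 0.9] = 0.0
A = A_dense.to_sparse()
B = torch.rand(128, 64)
with torch.no_grad():
    C = FixedSparseMM.apply(A, B)
```

Wrapping the call in `torch.autograd.Function` leaves room to add a hand-written backward later, but until then the op is inference only.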