Fast sparse deep learning on CPUs

Overview

SPARSEDNN

**If you want to use this repo, please send me an email: [email protected], or raise a GitHub issue.**

Fast sparse deep learning on CPUs. This is the kernel library generator described in the paper: https://arxiv.org/abs/2101.07948

Python API: python fastsparse.py. Minimal required dependencies. Should work anywhere.

C++ API: check out driver_cpu.cpp, or run autotune_cpu_random.sh 128 128 128 0. This requires cnpy to read numpy files, so make sure that you can link to cnpy.

The Python API has some overhead from ctypes; it is noticeable for smaller matrices but negligible for large ones. The benchmarks in the arXiv paper were all run with the C++ API.
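For reference, the ctypes pattern behind this kind of Python wrapper looks roughly like the sketch below. The library path, symbol name, and argument layout here are hypothetical, not the actual fastsparse.py interface; the per-call marshalling across the Python/C boundary is where the overhead mentioned above comes from.

    # Hypothetical sketch of calling a generated SpMM kernel through ctypes.
    # The sparse matrix is baked into the generated shared library, so only the
    # dense input and output buffers are passed at call time.
    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./spmm_kernel.so")        # generated kernel (hypothetical path)
    lib.spmm.restype = None
    lib.spmm.argtypes = [
        ctypes.POINTER(ctypes.c_float),          # dense input B
        ctypes.POINTER(ctypes.c_float),          # output C
        ctypes.c_int,                            # number of B/C columns
    ]

    B = np.random.rand(128, 128).astype(np.float32)
    C = np.zeros((128, 128), dtype=np.float32)
    lib.spmm(B.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
             C.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
             B.shape[1])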

Work that is not yet open sourced: a kernel generator for sparse convolutions using implicit convolution (as described in the arXiv paper), a lightweight inference engine for end-to-end results, and sparse int8 kernels. If you are interested in any of these, please email.

FAQs:

  1. How does this compare to Neural Magic? Last time I checked, the DeepSparse library does not let you run kernel-level benchmarks. If you care about end-to-end neural network acceleration, you should definitely go with Neural Magic if they happen to support your model.
  2. Future work? This is not exactly along the lines of my PhD thesis, so I work on it sparingly. If you want to contribute to this repo, you could build a PyTorch or TensorFlow custom op on top of the Python or C++ API (see the sketch after this list). However, it is unclear how gradients would work, and the op would have to be compiled for a fixed sparsity pattern, which the current PyTorch/TensorFlow frameworks might not support that well.
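As a starting point, an inference-only PyTorch wrapper might look like the sketch below. The spmm_kernel callable stands in for whatever function you build around a generated kernel (for example, the ctypes pattern above); it is hypothetical, and the weight sparsity pattern is fixed when the kernel is generated.

    # Hypothetical inference-only wrapper; no autograd support, since gradients
    # through a fixed-pattern generated kernel are an open question (see above).
    import torch

    class FixedSparseLinear(torch.nn.Module):
        def __init__(self, spmm_kernel):
            super().__init__()
            self.spmm_kernel = spmm_kernel   # callable wrapping one generated kernel

        def forward(self, x):
            # The generated kernel works on float32 numpy buffers.
            y = self.spmm_kernel(x.detach().cpu().numpy())
            return torch.from_numpy(y)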

Comments
  • bash autotune_cpu_random_int8.cpu 16 16 16 seems to fail the comparison

    On branch amx_sparse

    case1:

    sparsednn$ bash autotune_cpu_random_int8.cpu 128 128 128
    density 0.101318359375
    False
    Generating X86 vector intrinsics
    (128, 128)
    Reduced A dimension 128
    128 128
     == Load shared library ==
    == at 20.56021 milliseconds ==
     445553
     == spmm microkernel ==
    == at 0.00243 milliseconds ==
     == 445553 reps ==
    Difference:  0
    (array([], dtype=int64), array([], dtype=int64))
    Best Runtime 100000
    

    case2:

    sparsednn$ bash autotune_cpu_random_int8.cpu 16 16 16
    density 0.15625
    False
    Generating X86 vector intrinsics
    (16, 16)
    Reduced A dimension 12
    16 16
     == Load shared library ==
    == at 12.99022 milliseconds ==
     4538234
     == spmm microkernel ==
    == at 0.00009 milliseconds ==
     == 4538234 reps ==
    Difference:  160
    (array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 14,
           14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  0,
            1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]))
    Best Runtime 100000
    
    opened by yi1zhao 1
  • It seems that the quantization parameters are missing

    Hello, thank you for your contribution to sparse.

    In autotune_cpu_random.sh, you use driver_cpu.cpp to call the spmm API. The issue is that the input parameters seem to lack scale and zero_point data pointers.

    Details: the input parameter struct is:

    struct thread_data {
            const int8_t * __restrict__ AB_val;
            const int * __restrict__ AB_bias;
            const int8_t * __restrict__ BC;
            int8_t * AC;
            int start;
            int end;
    };
    

    If the external interface of SPMM is:

    s8xs8->s8.
    

    Then its internal logic is:

    s8xs8->s32->fp32->s8.
    

    The requantization formula from DST(s32) to DST(s8) is:

    dst_int8 = (scale_src * scale_weights / scale_dst) * dst_int32.
    

    See for details: https://oneapi-src.github.io/oneDNN/dev_guide_attributes_quantization.html

    opened by yi1zhao 1
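As an aside on the issue above: a minimal numpy sketch of the s8 x s8 -> s32 -> f32 -> s8 flow it describes, assuming per-tensor scales with zero points of zero (all values hypothetical).

    # Illustration of the requantization step: int32 accumulators are rescaled
    # to float and rounded back to int8 using
    # dst = (scale_src * scale_weights / scale_dst) * acc.
    import numpy as np

    def requantize(acc_s32, scale_src, scale_weights, scale_dst):
        scaled = (scale_src * scale_weights / scale_dst) * acc_s32.astype(np.float32)
        return np.clip(np.rint(scaled), -128, 127).astype(np.int8)

    A = np.random.randint(-128, 128, size=(16, 16), dtype=np.int8)
    B = np.random.randint(-128, 128, size=(16, 16), dtype=np.int8)
    acc = A.astype(np.int32) @ B.astype(np.int32)      # s8 x s8 -> s32
    C = requantize(acc, scale_src=0.05, scale_weights=0.02, scale_dst=0.1)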
  • Compilation error when running command: autotune_cpu_random.sh

    Hello, thank you for your contribution to sparse.

    I ran the spmm example, but the compiler reported an error when running ./autotune_cpu_random.sh 128 128 128 1:

    ~/sparsednn$ ./autotune_cpu_random.sh 128 128 128 1
    Generating X86 vector intrinsics
    (128, 128)
    Reduced A dimension 128
    /opt/intel/oneapi/compiler/2021.4.0/modulefiles/icc: line 100: syntax error: unexpected end of file
    ./autotune_cpu_random.sh: line 24: ./test: No such file or directory
    
    opened by yi1zhao 1
Owner
Ziheng Wang