Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices
Abstract
For practical deep neural network design on mobile devices, it is essential to consider the constraints imposed by the available computational resources and by the inference latency of the target application. Among network acceleration approaches, pruning is widely adopted to balance computational cost and accuracy: unimportant connections can be removed either channel-wise or at the level of individual weights with minimal impact on model accuracy. Channel pruning immediately yields a significant latency reduction, while unstructured weight pruning is more flexible for balancing latency and accuracy. In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW), which achieves a better Pareto frontier between latency and accuracy than previous model compression approaches. To fully optimize this trade-off, we develop a tailored multi-objective evolutionary algorithm within the JCW framework, so that a single search yields optimal candidate architectures for various deployment requirements. Extensive experiments demonstrate that JCW achieves a better latency/accuracy trade-off than various state-of-the-art pruning methods on the ImageNet classification dataset.
Framework
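As a rough, illustrative sketch only (not the actual JCW implementation), the snippet below filters candidate architectures, each described here by hypothetical per-layer channel counts and weight sparsities together with their latency and accuracy, down to the Pareto frontier of latency and accuracy, which is the trade-off the multi-objective evolutionary search in JCW optimizes. All names and the candidate encoding are assumptions for illustration.

```python
# Illustrative only: Pareto-frontier filtering over (latency, accuracy) candidates.
# The candidate fields are hypothetical and not the JCW data format.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    channels: List[int]    # per-layer channel counts (assumed encoding)
    sparsity: List[float]  # per-layer weight sparsity (assumed encoding)
    latency_ms: float      # measured or predicted latency
    accuracy: float        # validation accuracy

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated in (lower latency, higher accuracy)."""
    front = []
    for c in candidates:
        dominated = any(
            (o.latency_ms <= c.latency_ms and o.accuracy >= c.accuracy)
            and (o.latency_ms < c.latency_ms or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

if __name__ == "__main__":
    pool = [
        Candidate([64, 128], [0.5, 0.7], 160.0, 69.2),
        Candidate([48, 96],  [0.6, 0.8], 194.0, 69.7),
        Candidate([64, 128], [0.4, 0.6], 300.0, 69.5),  # dominated by the second candidate
    ]
    for c in pareto_front(pool):
        print(c.latency_ms, c.accuracy)
```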
Evaluation
ResNet18
| Method | Latency (ms) | Accuracy (%) |
| --- | --- | --- |
| Uniform 1x | 537 | 69.8 |
| DMCP | 341 | 69.7 |
| APS | 363 | 70.3 |
| JCW | 160 | 69.2 |
| JCW | 194 | 69.7 |
| JCW | 196 | 69.9 |
| JCW | 224 | 70.2 |
MobileNetV1
| Method | Latency (ms) | Accuracy (%) |
| --- | --- | --- |
| Uniform 1x | 167 | 70.9 |
| Uniform 0.75x | 102 | 68.4 |
| Uniform 0.5x | 53 | 64.4 |
| AMC | 94 | 70.7 |
| Fast | 61 | 68.4 |
| AutoSlim | 99 | 71.5 |
| AutoSlim | 55 | 67.9 |
| USNet | 102 | 69.5 |
| USNet | 53 | 64.2 |
| JCW | 31 | 69.1 |
| JCW | 39 | 69.9 |
| JCW | 43 | 69.8 |
| JCW | 54 | 70.3 |
| JCW | 69 | 71.4 |
MobileNetV2
| Method | Latency (ms) | Accuracy (%) |
| --- | --- | --- |
| Uniform 1x | 114 | 71.8 |
| Uniform 0.75x | 71 | 69.8 |
| Uniform 0.5x | 41 | 65.4 |
| APS | 110 | 72.8 |
| APS | 64 | 69.0 |
| DMCP | 83 | 72.4 |
| DMCP | 45 | 67.0 |
| DMCP | 43 | 66.1 |
| Fast | 89 | 72.0 |
| Fast | 62 | 70.2 |
| JCW | 30 | 69.1 |
| JCW | 40 | 69.9 |
| JCW | 44 | 70.8 |
| JCW | 59 | 72.2 |
Requirements
- torch
- torchvision
- numpy
- scipy
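As an optional sanity check (not part of the repository), the following snippet verifies that the listed dependencies are importable and prints their versions:

```python
# Optional sanity check that the required packages are installed.
import torch
import torchvision
import numpy
import scipy

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("numpy:", numpy.__version__)
print("scipy:", scipy.__version__)
```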
Usage
JCW works in two steps: a search step and a training step. The search step searches for the layer-wise channel numbers and weight sparsities of Pareto-optimal models; the training step then trains the searched models with ADMM. The rest of this section walks through a simple example for ResNet18.
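Since the training step relies on ADMM, here is a minimal, illustrative sketch of the general ADMM weight-pruning idea: alternate a penalized weight update, a projection of an auxiliary copy onto the target sparsity, and a dual update. It is not the training code of this repository; the function names, hyper-parameters, and single-layer setting are assumptions.

```python
# Minimal, illustrative sketch of ADMM-style weight pruning for a single layer.
# Not the training code of this repository; names and hyper-parameters are assumed.
import torch

def project_to_sparsity(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so that a `sparsity` fraction is zero."""
    k = int(w.numel() * (1.0 - sparsity))  # number of weights to keep
    if k <= 0:
        return torch.zeros_like(w)
    threshold = w.abs().flatten().topk(k).values.min()
    return torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

def admm_step(W, Z, U, sparsity, grad_fn, rho=1e-3, lr=1e-2):
    """One ADMM round: penalized weight update, sparse projection, dual update."""
    W = W - lr * (grad_fn(W) + rho * (W - Z + U))  # W-update (one gradient step)
    Z = project_to_sparsity(W + U, sparsity)       # Z-update (projection onto sparsity set)
    U = U + W - Z                                  # dual update
    return W, Z, U

if __name__ == "__main__":
    W = torch.randn(64, 64)
    Z, U = project_to_sparsity(W, 0.7), torch.zeros_like(W)
    grad_fn = lambda w: 2 * (w - 1.0)  # toy "task gradient" standing in for the real loss
    for _ in range(10):
        W, Z, U = admm_step(W, Z, U, sparsity=0.7, grad_fn=grad_fn)
    print("nonzero fraction of Z:", (Z != 0).float().mean().item())
```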
The search step
- Modify the configuration file

  First, open the file `experiments/res18-search.yaml`:

  ```bash
  vim experiments/res18-search.yaml
  ```

  Go to the 44th line and find the following code:

  ```yaml
  DATASET:
    data: ImageNet
    root: /path/to/imagenet
    ...
  ```

  and modify the `root` property of `DATASET` to the path of the ImageNet dataset on your machine.

- Apply the search

  After modifying the configuration file, start the search by running:

  ```bash
  python emo_search.py --config experiments/res18-search.yaml | tee experiments/res18-search.log
  ```

  After searching, the search results will be saved in `experiments/search.pth`. A small snippet for inspecting this file is sketched below.
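The exact contents of `experiments/search.pth` depend on the repository's code; as a hedged example, it can be loaded with `torch.load` and inspected like this (the dictionary structure is an assumption):

```python
# Hedged example: inspect the saved search results.
# The structure of the loaded object is an assumption; adjust to what you find.
import torch

results = torch.load("experiments/search.pth", map_location="cpu")
print(type(results))
if isinstance(results, dict):
    for key, value in results.items():
        print(key, type(value))
```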
The training step
After searching, we can train the searched models as follows:

- Modify the base configuration file

  Open the file `experiments/res18-train.yaml`:

  ```bash
  vim experiments/res18-train.yaml
  ```

  Go to the 5th line and find the following code:

  ```yaml
  root: &root /path/to/imagenet
  ```

  and modify the `root` property to the path of the ImageNet dataset on your machine.

- Generate configuration files for training

  After modifying the base configuration file, generate the configuration files for training by running:

  ```bash
  python scripts/generate_training_configs.py --base-config experiments/res18-train.yaml --search-result experiments/search.pth --output ./train-configs
  ```

  After running the above command, the training configuration files will be written to `./train-configs/model-{id}/train.yaml`.

- Apply the training

  After generating the configuration files, run the following command to train a specific model, replacing the placeholder paths with the configuration directory generated in the previous step:

  ```bash
  python train.py --config xxxx/xxx/train.yaml | tee xxx/xxx/train.log
  ```

  A sketch for training every generated configuration in sequence follows below.
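If you want to train every searched model in sequence, a small driver like the one below can loop over the generated configuration files; it simply wraps the `python train.py --config ...` command documented above and is not part of the repository:

```python
# Hedged convenience script: train each generated configuration in sequence.
# It only wraps the `python train.py --config ...` command documented above.
import glob
import subprocess

for config in sorted(glob.glob("./train-configs/model-*/train.yaml")):
    log = config.replace("train.yaml", "train.log")
    with open(log, "w") as f:
        subprocess.run(["python", "train.py", "--config", config], stdout=f, check=True)
```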