1xN Pattern for Pruning Convolutional Neural Networks
This is a PyTorch re-implementation of "1xN Pattern for Pruning Convolutional Neural Networks" (paper). A more formal project will be released as soon as we are granted authorization by Alibaba Group.
1) 1×N Block Pruning
Requirements
- Python 3.7
- PyTorch >= 1.0.1
- CUDA = 10.0.0
Code Running
To reproduce our experiments, run the following command (set --arch to mobilenet_v1, mobilenet_v2, mobilenet_v3_large, or mobilenet_v3_small, and --N to 2, 4, 8, 16, or 32):
```bash
python imagenet.py \
    --gpus 0 \
    --arch mobilenet_v1 \
    --job_dir ./experiment/ \
    --data_path [DATA_PATH] \
    --pretrained_model [PRETRAIN_MODEL_PATH] \
    --pr_target 0.5 \
    --N 4 \
    --conv_type BlockL1Conv \
    --train_batch_size 256 \
    --eval_batch_size 256 \
    --rearrange
```
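Here, --N controls the block width and --conv_type BlockL1Conv selects the block-wise L1 importance criterion. The snippet below is a minimal, hypothetical sketch of that idea for a single convolution weight: it groups every N consecutive output kernels that share the same input channel into a 1×N block, scores each block by its L1 norm, and masks out the lowest-scoring fraction. The exact grouping, thresholding, and layer handling are defined by the BlockL1Conv implementation in this repository; this sketch only illustrates the criterion.
```python
import torch

def block_l1_scores(weight: torch.Tensor, N: int) -> torch.Tensor:
    """L1 score of each 1xN block of a conv weight with shape
    (C_out, C_in, kh, kw): N consecutive output kernels that share
    the same input channel form one block (illustrative grouping)."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % N == 0, "C_out must be divisible by N"
    blocks = weight.abs().reshape(c_out // N, N, c_in, kh, kw)
    # Sum over the N kernels in a block and over the spatial dims.
    return blocks.sum(dim=(1, 3, 4))           # shape: (C_out // N, C_in)

def block_prune_mask(weight: torch.Tensor, N: int, pr: float) -> torch.Tensor:
    """Binary mask that zeroes the `pr` fraction of 1xN blocks with the
    smallest L1 scores (sketch of a --pr_target-style threshold applied
    to a single layer)."""
    c_out, c_in, kh, kw = weight.shape
    scores = block_l1_scores(weight, N)         # (C_out // N, C_in)
    k = int(scores.numel() * pr)                # number of blocks to prune
    thresh = scores.flatten().kthvalue(k).values if k > 0 else scores.min() - 1
    keep = (scores > thresh).float()            # 1 = keep block, 0 = prune
    # Broadcast the block mask back to the full weight shape.
    mask = keep.unsqueeze(1).unsqueeze(-1).unsqueeze(-1)
    return mask.expand(c_out // N, N, c_in, kh, kw).reshape_as(weight)

# Example: zero out 50% of the 1x4 blocks of one convolution layer.
conv = torch.nn.Conv2d(32, 64, kernel_size=3)
with torch.no_grad():
    conv.weight.mul_(block_prune_mask(conv.weight, N=4, pr=0.5))
```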
Accuracy Performance
Table 1: Performance comparison of our 1×N block sparsity against weight pruning and filter pruning (p = 50%).
MobileNet-V1 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
Weight Pruning | 70.764 | 89.592 | Pruned Model |
Filter Pruning | 65.348 | 86.264 | Pruned Model |
1 x 2 Block | 70.281 | 89.370 | Pruned Model |
1 x 4 Block | 70.052 | 89.056 | Pruned Model |
1 x 8 Block | 69.908 | 89.027 | Pruned Model |
1 x 16 Block | 69.559 | 88.933 | Pruned Model |
1 x 32 Block | 69.541 | 88.801 | Pruned Model |
MobileNet-V2 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
Weight Pruning | 71.146 | 89.872 | Pruned Model |
Filter Pruning | 66.730 | 87.190 | Pruned Model |
1 x 2 Block | 70.233 | 89.417 | Pruned Model |
1 x 4 Block | 60.706 | 89.165 | Pruned Model |
1 x 8 Block | 69.372 | 88.862 | Pruned Model |
1 x 16 Block | 69.352 | 88.708 | Pruned Model |
1 x 32 Block | 68.762 | 88.425 | Pruned Model |
MobileNet-V3-small | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
Weight Pruning | 66.376 | 86.868 | Pruned Model |
Filter Pruning | 59.054 | 81.713 | Pruned Model |
1 x 2 Block | 65.380 | 86.060 | Pruned Model |
1 x 4 Block | 64.465 | 85.495 | Pruned Model |
1 x 8 Block | 64.101 | 85.274 | Pruned Model |
1 x 16 Block | 63.126 | 84.203 | Pruned Model |
1 x 32 Block | 62.881 | 83.982 | Pruned Model |
MobileNet-V3-large | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
Weight Pruning | 72.897 | 91.093 | Pruned Model |
Filter Pruning | 69.137 | 89.097 | Pruned Model |
1 x 2 Block | 72.120 | 90.677 | Pruned Model |
1 x 4 Block | 71.935 | 90.458 | Pruned Model |
1 x 8 Block | 71.478 | 90.163 | Pruned Model |
1 x 16 Block | 71.112 | 90.129 | Pruned Model |
1 x 32 Block | 70.769 | 89.696 | Pruned Model |
More links to pruned models under different pruning rates, together with their training logs, can be found for MobileNet-V2 and ResNet-50.
Evaluate our models
To verify the performance of our pruned models, download a pruned model from the links above and run the following command (set --arch to mobilenet_v1, mobilenet_v2, mobilenet_v3_large, or mobilenet_v3_small to match the downloaded model):
```bash
python imagenet.py \
    --gpus 0 \
    --arch mobilenet_v1 \
    --data_path [DATA_PATH] \
    --conv_type DenseConv \
    --evaluate [PRUNED_MODEL_PATH] \
    --eval_batch_size 256
```
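Before running the full evaluation, you can optionally sanity-check that a downloaded checkpoint really follows the 1×N pattern. The snippet below is a rough, hypothetical check that assumes the checkpoint stores a plain state_dict of dense convolution weights (the key names and layout of the released files may differ): it reports the fraction of 1×N blocks that are entirely zero.
```python
import torch

def block_sparsity(state_dict, N=4):
    """Fraction of all-zero 1xN blocks over all 4-D conv weights
    (illustrative check only; assumes dense weights with zeroed blocks)."""
    zero_blocks, total_blocks = 0, 0
    for name, w in state_dict.items():
        if not torch.is_tensor(w) or w.dim() != 4 or w.shape[0] % N != 0:
            continue
        c_out, c_in, kh, kw = w.shape
        scores = w.abs().reshape(c_out // N, N, c_in, kh, kw).sum(dim=(1, 3, 4))
        zero_blocks += (scores == 0).sum().item()
        total_blocks += scores.numel()
    return zero_blocks / max(total_blocks, 1)

ckpt = torch.load("pruned_model.pth", map_location="cpu")   # hypothetical filename
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"1xN block sparsity: {block_sparsity(state_dict, N=4):.2%}")
```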
Arguments
optional arguments:
-h, --help show this help message and exit
--gpus Select gpu_id to use. default:[0]
--data_path The directory where the data is stored.
--job_dir The directory where the summaries will be stored.
--resume Load the model from the specified checkpoint.
--pretrain_model Path of the pre-trained model.
--pruned_model Path of the pruned model to evaluate.
--arch Architecture of the model. For ImageNet: mobilenet_v1, mobilenet_v2, mobilenet_v3_small, mobilenet_v3_large
--num_epochs The number of epochs to train. default:180
--train_batch_size Batch size for training. default:256
--eval_batch_size Batch size for validation. default:100
--momentum Momentum for Momentum Optimizer. default:0.9
--lr LR Learning rate. default:1e-2
--lr_decay_step The interval of learning rate decay for CIFAR. default:100 150
--lr_decay_freq The frequency of learning rate decay for ImageNet. default:30
--weight_decay Weight decay. default:4e-5
--lr_type Learning rate scheduler. default: cos. optional: exp/cos/step/fixed
--use_dali If this parameter exists, use the DALI module to load ImageNet data (speeds up training).
--conv_type Importance criterion of filters. Default: BlockL1Conv. optional: BlockRandomConv, DenseConv
--pr_target Pruning rate. default:0.5
--full If this parameter exists, prune the fully-connected layer.
--N Number of consecutive kernels grouped into one 1×N block for removal (see paper for details).
--rearrange If this parameter exists, filters will be rearranged (see paper for details).
--export_onnx If this parameter exists, export the ONNX model.
2) Filter Rearrangement
Table 2: Performance studies of our 1×N block sparsity with and without filter rearrangement (p = 50%).
N = 2 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
w/o Rearrange | 69.900 | 89.296 | Pruned Model |
Rearrange | 70.233 | 89.417 | Pruned Model |
N = 4 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
w/o Rearrange | 69.521 | 88.920 | Pruned Model |
Rearrange | 69.579 | 88.944 | Pruned Model |
N = 8 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
w/o Rearrange | 69.206 | 88.608 | Pruned Model |
Rearrange | 69.372 | 88.862 | Pruned Model |
N = 16 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
w/o Rearrange | 68.971 | 88.399 | Pruned Model |
Rearrange | 69.352 | 88.708 | Pruned Model |
N = 32 | Top-1 Acc. (%) | Top-5 Acc. (%) | Model Link |
---|---|---|---|
w/o Rearrange | 68.431 | 88.315 | Pruned Model |
Rearrange | 68.762 | 88.425 | Pruned Model |
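As a rough intuition for why rearrangement helps: reordering the output filters of a layer lets filters with similar importance fall into the same 1×N block, so block pruning discards fewer salient weights. The sketch below is a simplified, hypothetical version that sorts filters by their L1 norm; the released --rearrange option implements the ordering actually used in the paper, and in a real network the permutation must also be propagated to the following layer so the output is unchanged.
```python
from typing import Tuple
import torch

def rearrange_filters(weight: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Reorder output filters by ascending L1 norm so that filters of
    similar magnitude land in the same 1xN block (simplified sketch)."""
    norms = weight.abs().sum(dim=(1, 2, 3))   # per-filter L1 norm
    perm = torch.argsort(norms)
    return weight[perm], perm

# The same permutation has to be applied to the input channels of the
# next layer to keep the network function intact, e.g.:
#   next_weight = next_weight[:, perm, :, :]
```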
3) Encoding and Decoding Efficiency
Performance and latency comparison
Our sparse convolution implementation has been released to the TVM community.
To verify the performance and latency of our pruned models, convert them to ONNX and run the following command:
```bash
python model_tune.py \
    --onnx_path [ONNX_MODEL_PATH] \
    --bsr 4 \
    --bsc 1 \
    --sparsity 0.5
```
For detailed tuning settings, please refer to TVM.
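If you train your own pruned model instead of downloading one, the --export_onnx flag of imagenet.py produces the ONNX file expected by --onnx_path. For reference, a manual export of a standard torchvision MobileNet-V2 looks roughly like the sketch below; the input resolution, opset version, and tensor names here are illustrative assumptions, not the exact settings used by this repository.
```python
import torch
import torchvision

# Hypothetical manual export; the repository's --export_onnx flag
# performs an equivalent step for the pruned model.
model = torchvision.models.mobilenet_v2()
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)    # assumed ImageNet input size
torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2_pruned.onnx",              # path passed to --onnx_path
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```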
4) Contact
For any problem regarding this code re-implementation, please contact the first author at [email protected] or the third author at [email protected].
For any problem regarding the sparse convolution implementation, please contact the second author at [email protected].