A general framework for inferring CNNs efficiently. Reduce the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy.

Rainforest Wang

Last update: Oct 28, 2022


Overview

GFNet-Pytorch (NeurIPS 2020)

This repo contains the official code and pre-trained models for the glance and focus network (GFNet).

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Citation

@inproceedings{NeurIPS2020_7866,
        title = {Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification},
       author = {Wang, Yulin and Lv, Kangchen and Huang, Rui and Song, Shiji and Yang, Le and Huang, Gao},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
         year = {2020},
}

Update on 2020/10/08: Release Pre-trained Models and the Inference Code on ImageNet.

Update on 2020/12/28: Release Training Code.

Introduction

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically cropped from the original image. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.

Results

Top-1 accuracy on ImageNet vs. Multiply-Adds

Top-1 accuracy on ImageNet vs. Inference Latency (ms) on an iPhone XS Max

Visualization

Pre-trained Models

Backbone CNNs	Patch Size	T	Links
ResNet-50	96x96	5	Tsinghua Cloud / Google Drive
ResNet-50	128x128	5	Tsinghua Cloud / Google Drive
DenseNet-121	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-169	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-201	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-600MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-800MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-1.6GF	96x96	5	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	96x96	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	128x128	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.25)	128x128	3	Tsinghua Cloud / Google Drive
EfficientNet-B2	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	144x144	4	Tsinghua Cloud / Google Drive

What is contained in the checkpoints:

**.pth.tar
├── model_name: name of the backbone CNNs (e.g., resnet50, densenet121)
├── patch_size: size of image patches (i.e., H' or W' in the paper)
├── model_prime_state_dict, model_state_dict, fc, policy: state dictionaries of the four components of GFNets
├── model_flops, policy_flops, fc_flops: Multiply-Adds of a single forward pass through the encoder, the patch proposal network and the classifier
├── flops: a list containing the Multiply-Adds corresponding to each length of the input sequence during inference
├── anytime_classification: results of anytime prediction (in Top-1 accuracy)
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list, [0] and [1] correspond to the two coordinates of a curve)
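
As a quick way to inspect these fields, the following minimal sketch summarizes a loaded checkpoint. The key names are taken from the list above; the helper names `load_checkpoint` and `summarize_checkpoint` are hypothetical and are not part of this repo.

```python
from typing import Any, Dict

# Keys listed in the checkpoint description above.
EXPECTED_KEYS = [
    "model_name", "patch_size",
    "model_prime_state_dict", "model_state_dict", "fc", "policy",
    "model_flops", "policy_flops", "fc_flops",
    "flops", "anytime_classification",
    "dynamic_threshold", "budgeted_batch_classification",
]


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a **.pth.tar checkpoint onto the CPU (hypothetical helper)."""
    import torch  # imported lazily; assumes PyTorch is installed (see Requirements)
    return torch.load(path, map_location="cpu")


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the small metadata fields and report any expected key that is absent."""
    missing = [key for key in EXPECTED_KEYS if key not in ckpt]
    return {
        "model_name": ckpt.get("model_name"),
        "patch_size": ckpt.get("patch_size"),
        # 'flops' holds one Multiply-Adds entry per input-sequence length.
        "num_sequence_lengths": len(ckpt["flops"]) if "flops" in ckpt else None,
        "missing_keys": missing,
    }
```

For example, `summarize_checkpoint(load_checkpoint(PATH_TO_CHECKPOINTS))` would show at a glance whether a self-trained checkpoint lacks fields such as `flops` or `dynamic_threshold`.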

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2
pyyaml 5.3.1 (for RegNets)

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 0

Read the confidence thresholds saved in pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 1

Determine confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 2
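
To make the role of the confidence thresholds more concrete, here is a minimal sketch of a generic early-exit rule: a sample stops at the first step whose confidence reaches that step's threshold, and the average Multiply-Adds follow from a per-step cost list such as `flops`. This is an illustration under that assumption, not the code in inference.py; the function names and input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def exit_step(confidences: Sequence[float], thresholds: Sequence[float]) -> int:
    """Return the index of the first step whose confidence reaches its threshold.

    If no step qualifies, the sample exits at the last step.
    """
    if not confidences or len(confidences) != len(thresholds):
        raise ValueError("need at least one step and exactly one threshold per step")
    for step, (conf, thresh) in enumerate(zip(confidences, thresholds)):
        if conf >= thresh:
            return step
    return len(confidences) - 1


def evaluate_early_exit(
    confidences: List[Sequence[float]],  # [sample][step] confidence of the prediction
    correct: List[Sequence[bool]],       # [sample][step] whether that prediction is right
    thresholds: Sequence[float],         # one threshold per step
    flops: Sequence[float],              # cumulative cost of stopping at each step
) -> Tuple[float, float]:
    """Return (accuracy, average cost) under the early-exit rule above."""
    steps = [exit_step(sample, thresholds) for sample in confidences]
    n = len(steps)
    accuracy = sum(correct[i][t] for i, t in enumerate(steps)) / n
    avg_flops = sum(flops[t] for t in steps) / n
    return accuracy, avg_flops
```

Sweeping the thresholds and recording each (average cost, accuracy) pair is one way to trace an accuracy vs. Multiply-Adds curve.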

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
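
Before launching a long run, it can help to verify this layout. The sketch below only checks that train/ and val/ exist and contain class folders; the helper name `check_imagenet_layout` is hypothetical, and the requirement that both splits hold the same class folders is an assumption based on the usual folder-per-class convention.

```python
from pathlib import Path
from typing import Dict, List


def check_imagenet_layout(root: str) -> Dict[str, List[str]]:
    """Return the sorted class-folder names found under train/ and val/."""
    root_path = Path(root)
    classes: Dict[str, List[str]] = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each sub-folder is treated as one class.
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not classes[split]:
            raise ValueError(f"no class folders found in {split_dir}")
    # Assumption: both splits are expected to contain the same classes.
    if classes["train"] != classes["val"]:
        raise ValueError("train/ and val/ contain different class folders")
    return classes
```

Calling `check_imagenet_layout(PATH_TO_DATASET)` raises an error describing the first problem it finds, or returns the class-folder names for both splits.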

Training

Here we take training ResNet-50 (96x96, T=5) as an example. All the initialization models and stage-1/2 checkpoints used can be found in Tsinghua Cloud / Google Drive. Currently, this link includes ResNet and MobileNet-V3; we will update it as soon as possible. If you need other help, feel free to contact us.
The results in the paper are based on 2 Tesla V100 GPUs. For most experiments, up to 4 Titan Xp GPUs may be enough.

Training stage 1, the initializations of the global encoder (model_prime) and the local encoder (model) are required:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 1 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --model_prime_path PATH_TO_CHECKPOINTS --model_path PATH_TO_CHECKPOINTS

Training stage 2, a stage-1 checkpoint is required:

CUDA_VISIBLE_DEVICES=0 python train.py --data_url PATH_TO_DATASET --train_stage 2 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS

Training stage 3, a stage-2 checkpoint is required:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 3 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS
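
The three commands above differ only in the stage number and in which checkpoint arguments they need. The following sketch assembles them as argument lists and enforces those requirements. It uses only the flags shown above; the helper `build_train_command` is hypothetical and not part of the repo.

```python
from typing import List, Optional


def build_train_command(
    stage: int,
    data_url: str,
    model_arch: str = "resnet50",
    patch_size: int = 96,
    T: int = 5,
    print_freq: int = 10,
    model_prime_path: Optional[str] = None,
    model_path: Optional[str] = None,
    checkpoint_path: Optional[str] = None,
) -> List[str]:
    """Build the train.py argument list for one of the three training stages."""
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2 or 3")
    cmd = [
        "python", "train.py",
        "--data_url", data_url,
        "--train_stage", str(stage),
        "--model_arch", model_arch,
        "--patch_size", str(patch_size),
        "--T", str(T),
        "--print_freq", str(print_freq),
    ]
    if stage == 1:
        # Stage 1 needs the initializations of both encoders.
        if not (model_prime_path and model_path):
            raise ValueError("stage 1 requires model_prime_path and model_path")
        cmd += ["--model_prime_path", model_prime_path, "--model_path", model_path]
    else:
        # Stages 2 and 3 resume from the checkpoint of the previous stage.
        if not checkpoint_path:
            raise ValueError(f"stage {stage} requires checkpoint_path")
        cmd += ["--checkpoint_path", checkpoint_path]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, env=dict(os.environ, CUDA_VISIBLE_DEVICES="0,1,2,3"), check=True)` to select GPUs in the same way as the commands above.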

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our code of MobileNet-V3 and EfficientNet is from here. Our code of RegNet is from here.

To Do

Update the code for visualizing.
Update the code for mixed precision training.

Comments

very nice work!

Great work and a great paper. I have a question: take the first-step classifier fc as an example. I know fc is an RNN, but I don't understand what is passed from one fc to the next through h1C. Thanks, very nice!

opened by ljwwwiop 1
How do you plot the MobileNet-V3, EfficientNet, etc. in Figure 4

Thanks for your great work, which inspires me a lot. I understand the process of budgeted batch classification: you set a threshold for every classifier, so you can calculate the FLOPs across the different stages with a rate. But for the baselines in Figure 4, such as MobileNet-V3 and EfficientNet, how do you get results at multiple FLOPs with different accuracies? Do you ensemble them with early exit, or use other settings? Thank you.

opened by Liuyang829 1
Where does your 'dynamic_threshold' come from?

I found that you have dynamic_threshold = checkpoint['dynamic_threshold '] in inference.py, and this code is necessary when eval_mode=1. But where does 'dynamic_threshold' come from? You did not save this value when you saved your checkpoint.

opened by Chauncey-Wang 0
How to set flops in inference.py

I am interested in your project and your paper. I found flops = checkpoint['flops'] in inference.py. But when I trained a new model myself, I couldn't find 'flops' anywhere in the process of saving the checkpoint. I think I can set this parameter manually, but what should I base it on? Looking forward to your reply.

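Hi, while running your code I noticed flops = checkpoint['flops'] in inference.py. However, the training and checkpoint-saving code never sets flops, so I would like to set it manually during inference. What criteria should I use to fill in this list? Looking forward to your reply.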

opened by Chauncey-Wang 0
How to reproduce this code?

Thanks for your great work, which inspires me a lot. I'm a beginner and I plan to use GFNet to classify my data. What platform does your code need to run on, and how does it work? Ubuntu 18.04? What software do you use? I hope you can reply, thank you.

opened by DoubleHui97 0
Test Code

Hello, thanks for your great work, it's really fascinating. Can you release the code for visualization? I mean, how can we see the sequence of patches in the network?

opened by AminOwfsKi 0

