Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
The official implementation of Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
Introduction
TL;DR: Arch-Net is a family of neural networks built from simple and efficient operators. When an Arch-Net is produced, less common network constructs, such as Layer Normalization and Embedding layers, are eliminated progressively through label-free Blockwise Model Distillation, while sub-eight-bit quantization is performed at the same time to maximize performance. For the classification task, only 30k unlabeled images randomly sampled from the ImageNet dataset are needed.
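As a rough sketch of this procedure (a hypothetical illustration only: `FakeQuantConv2d` and `distill_block` are invented names, and the repo's actual quantizers and training loop may differ), each student block built from simple operators is trained to reproduce the output of the corresponding teacher block on unlabeled images, with weights fake-quantized to a low bit width:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantConv2d(nn.Conv2d):
    """Conv2d whose weights are uniformly fake-quantized to w_bits in the
    forward pass (hypothetical sketch; the repo's quantizers may differ)."""
    def __init__(self, *args, w_bits=2, **kwargs):
        super().__init__(*args, **kwargs)
        self.w_bits = w_bits

    def forward(self, x):
        qmax = 2 ** (self.w_bits - 1) - 1
        scale = self.weight.detach().abs().max().clamp(min=1e-8) / qmax
        # Straight-through estimator: quantized values in the forward pass,
        # identity gradient in the backward pass.
        w_q = self.weight + (
            (self.weight / scale).round().clamp(-qmax, qmax) * scale - self.weight
        ).detach()
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

def distill_block(teacher_block, student_block, loader, steps=1000, lr=1e-3):
    """Label-free blockwise distillation: the student block is trained to
    match the teacher block's output with an MSE loss."""
    opt = torch.optim.Adam(student_block.parameters(), lr=lr)
    teacher_block.eval()
    for _, (images, _) in zip(range(steps), loader):
        with torch.no_grad():
            target = teacher_block(images)  # teacher activations are the only supervision
        loss = F.mse_loss(student_block(images), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Since the supervision comes from the teacher's intermediate activations rather than labels, the procedure is label-free, which is why a small unlabeled subset of ImageNet suffices.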
Main Results
ImageNet Classification
Model | Bit Width | Top-1 (%) | Top-5 (%) |
---|---|---|---|
Arch-Net_Resnet18 | 32w32a | 69.76 | 89.08 |
Arch-Net_Resnet18 | 2w4a | 68.77 | 88.66 |
Arch-Net_Resnet34 | 32w32a | 73.30 | 91.42 |
Arch-Net_Resnet34 | 2w4a | 72.40 | 91.01 |
Arch-Net_Resnet50 | 32w32a | 76.13 | 92.86 |
Arch-Net_Resnet50 | 2w4a | 74.56 | 92.39 |
Arch-Net_MobilenetV1 | 32w32a | 68.79 | 88.68 |
Arch-Net_MobilenetV1 | 2w4a | 67.29 | 88.07 |
Arch-Net_MobilenetV2 | 32w32a | 71.88 | 90.29 |
Arch-Net_MobilenetV2 | 2w4a | 69.09 | 89.13 |
Multi30k Machine Translation
Model | Translation Direction | Bit Width | BLEU |
---|---|---|---|
Transformer | English to German | 32w32a | 32.44 |
Transformer | English to German | 2w4a | 33.75 |
Transformer | English to German | 4w4a | 34.35 |
Transformer | English to German | 8w8a | 36.44 |
Transformer | German to English | 32w32a | 30.32 |
Transformer | German to English | 2w4a | 32.50 |
Transformer | German to English | 4w4a | 34.34 |
Transformer | German to English | 8w8a | 34.05 |
Dependencies
python == 3.6
Refer to requirements.txt for the full list of packages.
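Install them with the standard pip workflow:
pip3 install -r requirements.txt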
Data Preparation
Download the ImageNet and Multi30k data (Google Drive or BaiduYun, code: 8brd) and put them in ./arch-net/data/ as follows:
./data/
├── imagenet
│ ├── train
│ ├── val
├── multi30k
Download the teacher models from Google Drive or BaiduYun (code: 57ew) and put them in ./arch-net/models/teacher/pretrained_models/
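Once the data is in place, you can sanity-check the ImageNet layout (a quick check assuming the standard torchvision ImageFolder convention of one subdirectory per class; this snippet is not part of the repo's scripts):

```python
from torchvision import datasets

# Expects data/imagenet/train/<class>/<image> and data/imagenet/val/<class>/<image>
train_set = datasets.ImageFolder('./data/imagenet/train')
val_set = datasets.ImageFolder('./data/imagenet/val')
print(f'{len(train_set.classes)} classes, {len(train_set)} train / {len(val_set)} val images')
```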
Get Started
ImageNet Classification (using archnet_resnet18 as an example)
Train and evaluate:
cd ./train_imagenet
python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn
To evaluate with models you have already trained:
python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn --evaluate
Machine Translation
Train an arch-net_transformer of 2w4a:
cd ./train_transformer
python3 train_archnet_transformer.py --translate_direction en2de --teacher_model_path ../models/teacher/pretrained_models/transformer_en_de.chkpt --data_pkl ../data/multi30k/m30k_ende_shr.pkl --batch_size 48 --final_epochs 50 --weight_bit 2 --feature_bit 4 --lr 1e-3 --weight_decay 1e-6 --label_smoothing
- For an arch-net_transformer of 8w8a, use a learning rate of 1e-3 and a weight decay of 1e-4.
Evaluate:
cd ./evaluate
python3 translate.py --data_pkl ./data/multi30k/m30k_ende_shr.pkl --model path_to_the_output_directory/model_max_acc.chkpt
- To get the BLEU score of the results, go to this website, then upload 'predictions.txt' from the output directory and 'gt_en.txt' or 'gt_de.txt' from ./arch-net/data_gt/multi30k/; alternatively, compute BLEU locally as sketched below.
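A local-scoring sketch, assuming the sacrebleu package is installed (not part of the repo's scripts):

```python
import sacrebleu

# predictions.txt: one model translation per line
# gt_de.txt / gt_en.txt: line-aligned reference translations
with open('predictions.txt') as f:
    hypotheses = [line.strip() for line in f]
with open('gt_de.txt') as f:  # use gt_en.txt for the German-to-English direction
    references = [line.strip() for line in f]

print(sacrebleu.corpus_bleu(hypotheses, [references]).score)
```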
Citation
If you find this project useful for your research, please consider citing the paper:
@misc{xu2021archnet,
title={Arch-Net: Model Distillation for Architecture Agnostic Model Deployment},
author={Weixin Xu and Zipeng Feng and Shuangkang Fang and Song Yuan and Yi Yang and Shuchang Zhou},
year={2021},
eprint={2111.01135},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Acknowledgements
attention-is-all-you-need-pytorch
Contact
If you have any questions, feel free to open an issue or contact us at [email protected].