PyTorch source code for Distilling Knowledge by Mimicking Features

Guo-Hua Wang

Last update: Dec 17, 2022

LSHFM.detection

This is the PyTorch source code for Distilling Knowledge by Mimicking Features. This project contains the code for object detection with feature mimicking. For image classification, please visit LSHFM.classification.

Dependencies

python
pytorch 1.7.1
torchvision 0.8.2
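
As a quick sanity check before training, the installed package versions can be compared against the ones listed above. This is only a minimal sketch: it assumes the packages were installed under their usual distribution names (`torch`, `torchvision`), and the helper name `check_versions` is made up for illustration.

```python
from importlib import metadata

# Versions listed in the dependency section above.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local builds may carry a suffix such as "+cu110"; compare the public part only.
        public = installed.split("+")[0] if installed else None
        if public != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in check_versions().items():
    print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means both packages match. Other versions may still work, but they have not been stated as supported here.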

Prepare the dataset

Please prepare the COCO and VOC datasets yourself. Then edit the get_data_path function in src/dataset/coco_utils.py and src/dataset/voc_utils.py so that it returns the location of your local copy.
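
The exact signature of get_data_path is not shown here, so the following is only a hypothetical sketch of what such a function could look like after editing. The dictionary `DATA_ROOTS`, the placeholder paths, and the `name` argument are all assumptions; adapt them to the real function in the two files.

```python
import os

# Hypothetical dataset roots; replace the placeholders with your local paths.
DATA_ROOTS = {
    "coco": "/path/to/coco",
    "voc": "/path/to/VOCdevkit",
}


def get_data_path(name, roots=DATA_ROOTS):
    """Return the root directory of dataset `name`, failing early if it is missing."""
    if name not in roots:
        raise KeyError(f"unknown dataset {name!r}; known: {sorted(roots)}")
    path = os.path.expanduser(roots[name])
    if not os.path.isdir(path):
        raise FileNotFoundError(f"dataset directory not found: {path}")
    return path
```

Failing early with a clear message is a design choice of this sketch; it avoids discovering a wrong path only after the data loaders have started.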

Run

You can run the experiments by

PORT=4444 bash experiments/[script name].sh 0,1,2,3
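
The same launch can be scripted, for example to queue several experiments. The sketch below assumes that the trailing `0,1,2,3` is a comma-separated list of GPU ids and that `PORT` is read by the script from the environment; neither is stated explicitly above. `build_run_command` is a hypothetical helper.

```python
import os


def build_run_command(script_name, gpu_ids, port=4444):
    """Return (argv, env) for launching one experiment script."""
    gpu_arg = ",".join(str(g) for g in gpu_ids)
    argv = ["bash", f"experiments/{script_name}", gpu_arg]
    # Copy the current environment and add PORT, as in the shell command above.
    env = dict(os.environ, PORT=str(port))
    return argv, env


argv, env = build_run_command("voc0712_fasterrcnn_r50_baseline.sh", [0, 1, 2, 3])
print("PORT=" + env["PORT"], " ".join(argv))
```

Passing the result to `subprocess.run(argv, env=env, check=True)` from the repository root would start the run.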

The training set contains VOC2007 trainval and VOC2012 trainval, while the testing set is VOC2007 test.

We train all models for 24 epochs, and the learning rate decays at the 18th and 22nd epochs.
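
The schedule can be written down as a step function. This is a minimal sketch under assumptions: the base learning rate of 0.02 and the decay factor of 0.1 are placeholders that are not given above, and epochs are counted from 0 so that the decay takes effect once the epoch index reaches 18 and 22. The closed form matches what `torch.optim.lr_scheduler.MultiStepLR` computes for `milestones=[18, 22]`.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr=0.02, milestones=(18, 22), gamma=0.1):
    """Learning rate used during `epoch` (0-indexed) under step decay."""
    # Count how many milestones have been reached and decay once per milestone.
    num_decays = bisect_right(milestones, epoch)
    return base_lr * gamma ** num_decays


for epoch in (0, 17, 18, 21, 22, 23):
    print(epoch, lr_at_epoch(epoch))
```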

Faster R-CNN

Before you run the KD experiments, please make sure the teacher model weights have been saved in pretrained. You can first run the ResNet101 baseline and the VGG16 baseline to train the teacher models, then move the resulting models to pretrained and edit --teacher-ckpt in the training shell scripts. Alternatively, you can download voc0712_fasterrcnn_r101_83.6 and voc0712_fasterrcnn_vgg16fpn_79.0 directly and move them to pretrained.

ResNet101 baseline: voc0712_fasterrcnn_r101_baseline.sh
ResNet50 baseline: voc0712_fasterrcnn_r50_baseline.sh
ResNet50@ResNet101 L2: voc0712_fasterrcnn_r50_r101_l2.sh
ResNet50@ResNet101 LSH: voc0712_fasterrcnn_r50_r101_lsh.sh
ResNet50@ResNet101 LSHL2: voc0712_fasterrcnn_r50_r101_lshl2.sh
VGG16 baseline: voc0712_fasterrcnn_vgg16fpn_baseline.sh
VGG11 baseline: voc0712_fasterrcnn_vgg11fpn_baseline.sh
VGG11@VGG16 L2: voc0712_fasterrcnn_vgg11fpn_vgg16fpn_l2.sh
VGG11@VGG16 LSH: voc0712_fasterrcnn_vgg11fpn_vgg16fpn_lsh.sh
VGG11@VGG16 LSHL2: voc0712_fasterrcnn_vgg11fpn_vgg16fpn_lshl2.sh

Method	ResNet50@ResNet101	VGG11@VGG16
Teacher	83.6	79.0
Student	82.0	75.1
L2	83.0	76.8
LSH	82.6	76.7
LSHL2	83.0	77.2

RetinaNet

As mentioned in the Faster R-CNN section, please make sure the teacher models are in pretrained. You can download the teacher models voc0712_retinanet_r101_83.0.ckpt and voc0712_retinanet_vgg16fpn_76.6.ckpt.

ResNet101 baseline: voc0712_retinanet_r101_baseline.sh
ResNet50 baseline: voc0712_retinanet_r50_baseline.sh
ResNet50@ResNet101 L2: voc0712_retinanet_r50_r101_l2.sh
ResNet50@ResNet101 LSHL2: voc0712_retinanet_r50_r101_lshl2.sh
VGG16 baseline: voc0712_retinanet_vgg16fpn_baseline.sh
VGG11 baseline: voc0712_retinanet_vgg11fpn_baseline.sh
VGG11@VGG16 L2: voc0712_retinanet_vgg11fpn_vgg16fpn_l2.sh
VGG11@VGG16 LSHL2: voc0712_retinanet_vgg11fpn_vgg16fpn_lshl2.sh

Method	ResNet50@ResNet101	VGG11@VGG16
Teacher	83.0	76.6
Student	82.5	73.2
L2	82.6	74.8
LSHL2	83.0	75.2

We find that training with LSH KD easily leads to a NaN loss.
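
To read the two results tables side by side, the gain of each distillation variant over the plain student can be computed directly from the reported numbers. The snippet below only restates the table values; the dictionary layout and the function name `gains_over_student` are illustrative choices.

```python
# Scores copied from the tables above: (ResNet50@ResNet101, VGG11@VGG16).
RESULTS = {
    "Faster R-CNN": {
        "Student": (82.0, 75.1),
        "L2": (83.0, 76.8),
        "LSH": (82.6, 76.7),
        "LSHL2": (83.0, 77.2),
    },
    "RetinaNet": {
        "Student": (82.5, 73.2),
        "L2": (82.6, 74.8),
        "LSHL2": (83.0, 75.2),
    },
}


def gains_over_student(table):
    """Return {method: (gain_r50, gain_vgg11)} relative to the Student row."""
    student = table["Student"]
    return {
        method: tuple(round(score - base, 1) for score, base in zip(scores, student))
        for method, scores in table.items()
        if method != "Student"
    }


for detector, table in RESULTS.items():
    print(detector, gains_over_student(table))
```

In both detectors, LSHL2 gives the largest reported gain for the VGG11@VGG16 pair and ties or beats L2 for ResNet50@ResNet101.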

Visualize

Visualize the ground truth label:

python src/visual.py --dataset voc07 --idx 1 --gt

Visualize the model prediction:

python src/visual.py --dataset voc07 --idx 2 --model fasterrcnn_resnet50_fpn --checkpoint results/voc0712/fasterrcnn_resnet50_fpn/2020-12-11_20\:14\:09/model_13.pth
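
The backslashes in the checkpoint path only escape the colons for the shell; the directory name itself contains plain colons. The sketch below shows this with `shlex` and then parses the flags with a hypothetical `argparse` parser. The parser covers only the flags that appear in the two commands above, and its defaults are assumptions, not the actual interface of src/visual.py.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser covering only the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="visual.py")
    parser.add_argument("--dataset", default="voc07")
    parser.add_argument("--idx", type=int, default=0)
    parser.add_argument("--gt", action="store_true")
    parser.add_argument("--model", default=None)
    parser.add_argument("--checkpoint", default=None)
    return parser


command = (
    "python src/visual.py --dataset voc07 --idx 2 "
    "--model fasterrcnn_resnet50_fpn "
    r"--checkpoint results/voc0712/fasterrcnn_resnet50_fpn/2020-12-11_20\:14\:09/model_13.pth"
)

# shlex.split applies POSIX shell quoting rules, so "\:" becomes ":".
tokens = shlex.split(command)[2:]  # drop "python" and the script path
args = build_parser().parse_args(tokens)
print(args.idx, args.model)
print(args.checkpoint)
```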

Citing this repository

If you find this code useful in your research, please consider citing us:

@article{LSHFM,
  title={Distilling knowledge by mimicking features},
  author={Wang, Guo-Hua and Ge, Yifan and Wu, Jianxin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
}

Acknowledgement

This project is based on https://github.com/pytorch/vision/tree/master/references/detection. Since this project targets object detection, the code for segmentation and keypoint detection has been removed.
