Instance-conditional Knowledge Distillation for Object Detection

This is a MegEngine implementation of the paper "Instance-conditional Knowledge Distillation for Object Detection", based on MegEngine Models.

The PyTorch implementation, based on Detectron2, will be released soon.

Instance-Conditional Knowledge Distillation for Object Detection,
Zijian Kang, Peizhen Zhang, Xiangyu Zhang, Jian Sun, Nanning Zheng
In: Proc. Advances in Neural Information Processing Systems (NeurIPS), 2021
[arXiv]

Requirements

Installation

To run the code, prepare a CUDA environment and then:

  1. Install dependencies.
pip3 install --upgrade pip
pip3 install -r requirements.txt
  2. Prepare the MS-COCO 2017 dataset and put it in a directory with the following structure:
/path/to/
    |->coco
    |    |annotations
    |    |train2017
    |    |val2017

Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. European Conference on Computer Vision (ECCV), 2014.
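
To sanity-check the layout, commands like the following should list the expected sub-directories and annotation files (the root path is illustrative; substitute your own):

ls /path/to/coco
# expect: annotations  train2017  val2017
ls /path/to/coco/annotations
# expect instances_train2017.json and instances_val2017.json, among others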

Usage

Train baseline models

Following MegEngine Models:

python3 train.py -f distill_configs/retinanet_res50_coco_1x_800size.py -n 8 \
                       -d /data/Datasets

train.py arguments:

  • -f, config file for the network.
  • -n, number of devices (GPUs) to use.
  • -w, pretrained backbone weights.
  • -b, training batch size, default is 2.
  • -d, dataset root, default is /data/datasets.
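
For example, to start from specific backbone weights with a larger per-GPU batch size, the documented flags can be combined as below (the weight path and batch size are purely illustrative; the dataset root matches the layout shown above):

python3 train.py -f distill_configs/retinanet_res50_coco_1x_800size.py -n 8 \
    -w /path/to/backbone_weights.pkl -b 4 -d /path/to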

Train with distillation

python3 train_distill_icd.py -f distill_configs/retinanet_res50_coco_1x_800size.py \ 
    -n 8 -l -d /data/Datasets -tf configs/retinanet_res101_coco_3x_800size.py \
    -df distill_configs/ICD.py \
    -tw _model_zoo/retinanet_res101_coco_3x_800size_41dot4_73b01887.pkl

train_distill_icd.py arguments:

  • -f, config file for the student network.
  • -w, pretrained backbone weights.
  • -tf, config file for the teacher network.
  • -tw, pretrained weights for the teacher.
  • -df, config file for the distillation module, distill_configs/ICD.py by default.
  • -l, use the inheriting strategy (load pretrained parameters).
  • -n, number of devices (GPUs) to use.
  • -b, training batch size, default is 2.
  • -d, dataset root, default is /data/datasets.

Note that backbone_pretrained is set in the distill configs, so backbone weights are loaded automatically and -w can be omitted. Checkpoints will be saved to a log-xxx directory.

Evaluate

python3 test.py -f distill_configs/retinanet_res50_coco_3x_800size.py -n 8 \
     -w log-of-xxx/epoch_17.pkl -d /data/Datasets/

test.py arguments:

  • -f, config file for the network.
  • -n, number of devices (GPUs) to use.
  • -w, model weights to evaluate.
  • -d, dataset root, default is /data/datasets.

Examples and Results

Steps

  1. Download the pretrained teacher model to _model_zoo directory.
  2. Train baseline or distill with ICD.
  3. Evaluate checkpoints (use the last checkpoint by default).
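
A compact sketch of how these steps chain together for RetinaNet (the log directory and epoch names are illustrative and will differ per run):

mkdir -p _model_zoo
# 1. place the downloaded teacher weights here, e.g.
#    _model_zoo/retinanet_res101_coco_3x_800size_41dot4_73b01887.pkl
# 2. run one of the train_distill_icd.py commands listed below
# 3. evaluate the resulting checkpoint with the student config
python3 test.py -f distill_configs/retinanet_res50_coco_1x_800size.py -n 8 \
    -w log-of-xxx/epoch_17.pkl -d /data/Datasets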

Example of Common Detectors

RetinaNet

Command:

python3 train_distill_icd.py -f distill_configs/retinanet_res50_coco_1x_800size.py \
    -n 8 -l -d /data/Datasets -tf configs/retinanet_res101_coco_3x_800size.py \
    -df distill_configs/ICD.py \
    -tw _model_zoo/retinanet_res101_coco_3x_800size_41dot4_73b01887.pkl

FCOS

Command:

python3 train_distill_icd.py -f distill_configs/fcos_res50_coco_1x_800size.py \
    -n 8 -l -d /data/Datasets -tf configs/fcos_res101_coco_3x_800size.py \
    -df distill_configs/ICD.py \
    -tw _model_zoo/fcos_res101_coco_3x_800size_44dot3_f38e8df1.pkl

ATSS

Command:

python3 train_distill_icd.py -f distill_configs/atss_res50_coco_1x_800size.py \
    -n 8 -l -d /data/Datasets -tf configs/atss_res101_coco_3x_800size.py \
    -df distill_configs/ICD.py \
    -tw _model_zoo/atss_res101_coco_3x_800size_44dot7_9181687e.pkl

Results (AP on MS-COCO):

Model       Baseline   +ICD
RetinaNet   36.8       40.3
FCOS        40.0       43.3
ATSS        39.6       43.0

Notice

  • Results of this implementation are mainly for demonstration; please refer to the Detectron2 version for reproduction.

  • We simply adopt the hyperparameters from the Detectron2 version; further tuning could be helpful.

  • There is a known CUDA memory issue related to MegEngine: the actual memory consumption is much larger than the theoretical value due to memory fragmentation. This is expected to be fixed in a future version of MegEngine. Until then, lowering the per-GPU batch size with -b is a possible workaround (see the sketch below).
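
A minimal sketch of that workaround, reusing the RetinaNet distillation command from above (the batch size of 1 is only an example and may affect final accuracy):

python3 train_distill_icd.py -f distill_configs/retinanet_res50_coco_1x_800size.py \
    -n 8 -l -b 1 -d /data/Datasets -tf configs/retinanet_res101_coco_3x_800size.py \
    -df distill_configs/ICD.py \
    -tw _model_zoo/retinanet_res101_coco_3x_800size_41dot4_73b01887.pkl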

Acknowledgement

This repo is modified from MegEngine Models. We also refer to PyTorch, DETR and Detectron2 for some implementations.

License

This repo is licensed under the Apache License, Version 2.0 (the "License").

Citation

@inproceedings{kang2021icd,
    title={Instance-conditional Knowledge Distillation for Object Detection},
    author={Zijian Kang and Peizhen Zhang and Xiangyu Zhang and Jian Sun and Nanning Zheng},
    year={2021},
    booktitle={NeurIPS},
}
Comments
  • Can you provide me Pascal VOC configs?

    Hello. I'd like to try to replicate your great distillation method on Pascal VOC. However, this repository only provides MS-COCO configurations. You used an extra 6k iterations for auxiliary-task warm-up on the Pascal VOC dataset; how can I reproduce these training details?

    Is it alright to train the teacher standalone and then distill the student? These are my configs.

    Teacher (standalone training):

    _BASE_: "../Base-RCNN-FPN.yaml"
    MODEL:
      WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl"
      MASK_ON: False
      RESNETS:
        DEPTH: 101
      ROI_HEADS:
        NUM_CLASSES: 20
    INPUT:
      MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
      MIN_SIZE_TEST: 800
    DATASETS:
      TRAIN: ('voc_2007_trainval', 'voc_2012_trainval')
      TEST: ('voc_2007_test')
    SOLVER:
      STEPS: (12000, 16000)
      MAX_ITER: 18000  # 17.4 epochs

    Student (distillation):

    _BASE_: "../Base-RCNN-FPN.yaml"
    MODEL:
      WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
      MASK_ON: False
      RESNETS:
        DEPTH: 50
      ROI_HEADS:
        NUM_CLASSES: 20
      DISTILLER:
        MODEL_LOAD_OFFICIAL: False
        MODEL_DISTILLER_CONFIG: 'PascalVOC-Detection/faster_rcnn_R_101_FPN.yaml'
        INS_ATT_MIMIC:
          WEIGHT_VALUE: 3.0
        INS:
          INPUT_FEATS: ['p2', 'p3', 'p4', 'p5', 'p6']
          MAX_LABELS: 100
    INPUT:
      MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
      MIN_SIZE_TEST: 800
    DATASETS:
      TRAIN: ('voc_2007_trainval', 'voc_2012_trainval')
      TEST: ('voc_2007_test',)
    SOLVER:
      STEPS: (12000, 16000)
      MAX_ITER: 18000  # 17.4 epochs
      CLIP_GRADIENTS: {"ENABLED": True}

    opened by seoha-kim 3
  • Problem reproducing the baseline?

    Hello, I reproduced the baseline with the following command:

    train_baseline.py --num-gpus 8 --config-file configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml

    The final AP I get is only 37.292, while this code reports 37.4. Were some hyperparameters changed somewhere?

    opened by GuoYi0 3
  • Unsatisfactory results with 1 GPU

    My experimental environment does not have 8 GPUs, so following the official Detectron2 instructions I adapted the 8-GPU setup to a single GPU and used the following commands:

    python train_baseline.py --num-gpus 1 --config-file configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025
    python3 train_distill.py --num-gpus 1 --resume --config-file configs/Distillation-ICD/retinanet_R_50_R101_icd_FPN_1x.yaml OUTPUT_DIR output/icd_retinanet SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

    The baseline reaches AP = 25.149 and the distilled model AP = 35.860. The baseline is far below the 37.4 reported in the paper, and the distilled result also falls short of 39.9. Is this normal? How can I get close to the paper's numbers on a single GPU? Many thanks!

    opened by wokeyide1999 3
  • AttributeError: BASE_LR_END

    Hello, the command python train_baseline.py --num-gpus 1 --config-file configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml runs fine and produces model_final.pth, but the command python3 train_distill.py --num-gpus 1 --config-file configs/Distillation-ICD/retinanet_R_50_R101_icd_FPN_1x.yaml OUTPUT_DIR output/icd_retinanet fails with the error below. Could you tell me what the cause might be? Many thanks!

    Traceback (most recent call last):
      File "train_distill.py", line 462, in <module>
        launch(
      File "/home/sh/detectron2-main/detectron2/engine/launch.py", line 82, in launch
        main_func(*args)
      File "train_distill.py", line 447, in main
        do_train(cfg, model, teacher, resume=args.resume)
      File "train_distill.py", line 152, in do_train
        teacher_sche = build_lr_scheduler(cfg.MODEL.DISTILLER, teacher_opt)
      File "/home/sh/detectron2-main/detectron2/solver/build.py", line 275, in build_lr_scheduler
        end_value = cfg.SOLVER.BASE_LR_END / cfg.SOLVER.BASE_LR
      File "/home/sh/anaconda3/lib/python3.8/site-packages/yacs/config.py", line 141, in __getattr__
        raise AttributeError(name)
    AttributeError: BASE_LR_END
    Segmentation fault (core dumped)

    opened by wokeyide1999 3
  • What is "scale information"?

    1. As shown in the paper's figure, what operation does "scale information" refer to?
    2. Are there experimental results that use only Identification and Localization, without scale information?
    3. Using Identification, Localization, and scale information together gains only 0.5 AP over using Localization and scale information. Does this mean the auxiliary task designed in the paper still has room for improvement, i.e. that the combination of Identification and Localization is not optimal?

    opened by 2021SH 2