Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

Jia Research Lab

Last update: Dec 23, 2022

MSAD

Multi-Scale Aligned Distillation for Low-Resolution Detection

Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia

This project provides an implementation of the CVPR 2021 paper "Multi-Scale Aligned Distillation for Low-Resolution Detection", based on Detectron2. MSAD aims to detect objects from low-resolution rather than high-resolution images, and can obtain performance comparable to detection on high-resolution input. We use Slimmable Neural Networks for the pretrained backbone weights.

Installation

This project is built on Detectron2 and can be set up as follows.

Install Detectron2 following its installation instructions.
Set up the dataset following the expected directory structure.
Copy this project to /path/to/detectron2/projects/MSAD
Download the Slimmable Networks code from its GitHub repository, together with the pretrained Slimmable ResNet-50 weights.
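
As a quick sanity check after copying the project, the layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of the project: the helper name `missing_msad_files` is hypothetical, and the file list only covers paths that this README mentions, so a complete result does not guarantee a working installation.

```python
from pathlib import Path

# Files this README refers to explicitly; illustrative, not exhaustive.
EXPECTED_FILES = (
    "train_net_T.py",
    "train_net_S.py",
    "configs/Base-SLRESNET-FCOS.yaml",
)


def missing_msad_files(detectron2_root):
    """Return the expected files that are absent under <root>/projects/MSAD."""
    project_dir = Path(detectron2_root) / "projects" / "MSAD"
    return [name for name in EXPECTED_FILES if not (project_dir / name).is_file()]
```

Calling `missing_msad_files("/path/to/detectron2")` returns an empty list when all three files are in place, and the names of the missing ones otherwise.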

Pretrained Weight

Move the pretrained weights to your target path.
Modify the weight path in configs/Base-SLRESNET-FCOS.yaml to point to that location.
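
The edit to the config can also be scripted. The sketch below assumes the file stores the path under a `WEIGHTS:` key, as Detectron2 configs usually do under `MODEL`; that key name, the helper `set_weights_path`, and the sample text are assumptions for illustration, not taken from the actual Base-SLRESNET-FCOS.yaml.

```python
import re

# Matches one "WEIGHTS: <value>" line and captures everything up to the value.
WEIGHTS_LINE = re.compile(r"^([ \t]*WEIGHTS:[ \t]*).*$", flags=re.MULTILINE)


def set_weights_path(config_text, new_path):
    """Return config_text with the first WEIGHTS value replaced by new_path."""
    updated, count = WEIGHTS_LINE.subn(
        lambda m: f'{m.group(1)}"{new_path}"', config_text, count=1
    )
    if count == 0:
        raise ValueError("no WEIGHTS key found in config text")
    return updated


# Hypothetical config fragment, used only to demonstrate the helper.
SAMPLE = 'MODEL:\n  WEIGHTS: "/old/location/weights.pth"\n'
print(set_weights_path(SAMPLE, "/data/pretrained/weights.pth"))
```

Working on the text rather than a parsed YAML tree keeps comments and key order intact, at the cost of only handling the simple one-line form of the key.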

Teacher Training

To train a teacher model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file <projects/MSAD/configs/config.yaml> --num-gpus 8

For example, to launch MSAD teacher training (1x schedule) with a Slimmable-ResNet-50 backbone at 0.25 width on 8 GPUs and save the model to "/data/SLR025-50-T", execute:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file projects/MSAD/configs/SLR025-50-T.yaml --num-gpus 8 OUTPUT_DIR /data/SLR025-50-T

Student Training

To train a student model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file <projects/MSAD/configs/config.yaml> --num-gpus 8

For example, to launch MSAD student training (1x schedule) with a Slimmable-ResNet-50 backbone at 0.25 width on 8 GPUs and save the model to "/data/SLR025-50-S", assuming the teacher weights are saved at "/data/SLR025-50-T/model_final.pth", execute:

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file projects/MSAD/configs/MSAD-R50-S025-1x.yaml --num-gpus 8 MODEL.WEIGHTS /data/SLR025-50-T/model_final.pth OUTPUT_DIR /data/SLR025-50-S

Evaluation

To evaluate a pretrained teacher or student model with 8 GPUs, run the corresponding command:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint
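
The training and evaluation commands above share one shape: a role-specific script, `--config-file`, `--num-gpus`, an optional `--eval-only`, and trailing `KEY VALUE` config overrides. The following sketch assembles such a command as an argument list; the helper `build_command` and its parameters are hypothetical conveniences, not part of the project.

```python
import shlex

# Script paths as given in this README, relative to /path/to/detectron2.
SCRIPTS = {
    "teacher": "projects/MSAD/train_net_T.py",
    "student": "projects/MSAD/train_net_S.py",
}


def build_command(role, config_file, num_gpus=8, eval_only=False, overrides=None):
    """Assemble the argv list for a teacher or student run."""
    if role not in SCRIPTS:
        raise ValueError(f"role must be one of {sorted(SCRIPTS)}")
    cmd = ["python3", SCRIPTS[role], "--config-file", config_file,
           "--num-gpus", str(num_gpus)]
    if eval_only:
        cmd.append("--eval-only")
    # Config overrides are passed as trailing KEY VALUE pairs.
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return cmd


# Reproduces the teacher training example from this README.
teacher_cmd = build_command(
    "teacher",
    "projects/MSAD/configs/SLR025-50-T.yaml",
    overrides={"OUTPUT_DIR": "/data/SLR025-50-T"},
)
print(shlex.join(teacher_cmd))
```

The resulting list can be handed to `subprocess.run(teacher_cmd, cwd="/path/to/detectron2")`, which avoids shell quoting issues when paths contain spaces.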

Results

We provide results on the COCO val set with pretrained models. In the following table, capacity refers to the backbone FLOPs. For brevity, the FLOPs of Slimmable ResNet-50 at width 1.0 with high-resolution input (800, 1333) are taken as 1x.

Method	Backbone	Capacity	Schedule	Width	Role	Resolution	Box AP	Download
FCOS	Slimmable-R50	1.25x	1x	1.00	Teacher	H & L	42.8	model | metrics
FCOS	Slimmable-R50	0.25x	1x	1.00	Student	L	39.9	model | metrics
FCOS	Slimmable-R50	0.70x	1x	0.75	Teacher	H & L	41.2	model | metrics
FCOS	Slimmable-R50	0.14x	1x	0.75	Student	L	38.8	model | metrics
FCOS	Slimmable-R50	0.31x	1x	0.50	Teacher	H & L	38.4	model | metrics
FCOS	Slimmable-R50	0.06x	1x	0.50	Student	L	35.7	model | metrics
FCOS	Slimmable-R50	0.08x	1x	0.25	Teacher	H & L	33.2	model | metrics
FCOS	Slimmable-R50	0.02x	1x	0.25	Student	L	30.3	model | metrics
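
One way to read the capacity column is sketched below. It assumes that a teacher's capacity is the sum of one high-resolution and one low-resolution backbone pass, and that the low-resolution pass costs about a quarter of the high-resolution one. Both are inferences from the 1.25x and 0.25x entries at width 1.00, not statements from the paper, so treat the check as illustrative arithmetic only.

```python
# (width, teacher capacity, student capacity, teacher AP, student AP), from the table.
ROWS = [
    (1.00, 1.25, 0.25, 42.8, 39.9),
    (0.75, 0.70, 0.14, 41.2, 38.8),
    (0.50, 0.31, 0.06, 38.4, 35.7),
    (0.25, 0.08, 0.02, 33.2, 30.3),
]

# Assumption: cost of a low-resolution pass relative to a high-resolution pass.
LOW_RES_FRACTION = 0.25


def predicted_student_capacity(teacher_capacity):
    """Estimate the low-resolution capacity from a teacher's H & L capacity."""
    high_res = teacher_capacity / (1.0 + LOW_RES_FRACTION)
    return round(high_res * LOW_RES_FRACTION, 2)


def summarize(rows):
    """Return (width, predicted student capacity, listed capacity, AP gap) per width."""
    return [
        (width, predicted_student_capacity(t_cap), s_cap, round(t_ap - s_ap, 1))
        for width, t_cap, s_cap, t_ap, s_ap in rows
    ]


for row in summarize(ROWS):
    print(row)
```

Under these assumptions the predicted student capacities agree with the listed ones at two decimals, and the teacher-to-student gap stays between 2.4 and 2.9 Box AP across widths.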

Citing MSAD

Please consider citing MSAD in your publications if it helps your research.

@inproceedings{qi2021msad,
  title={Multi-Scale Aligned Distillation for Low-Resolution Detection},
  author={Qi, Lu and Kuen, Jason and Gu, Jiuxiang and Lin, Zhe and Wang, Yi and Chen, Yukang and Li, Yanwei and Jia, Jiaya},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Comments

Some questions about the implementation details
Hi, thanks for your nice work and open source project. I have the following questions about the implementation details on teacher part:

How did you get the inference performance (e.g. 42.5 mAP in Table 2(a)) of the trained C-FF (Crossing Feature-level Fusion) teacher? Is it an ensemble of two inference parts (high/low-resolution inputs), an ensemble of three parts (low/high/fusion), or only from the fusion part?

Does the teacher's detection head for the fused features (returned by C-FF module) share exactly the same learnable weights with the head used for high/low resolution inputs?

Thanks a lot for your reply.
opened by v-qjqs 18

Implemented error in training process

Thanks for your excellent work! I encountered the following problem during training.

Environment: 2080Ti x 4 + CUDA 10.2 + PyTorch 1.6 + Detectron2

The error is:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

It seems that some parameters are not used in producing the loss. After I added find_unused_parameters=True in the class DefaultTrainer(TrainerBase) in Detectron2, the code ran normally. However, when I train SLR025-50-S.yaml with the teacher weights you provide, the training loss seems abnormal compared to the provided log:

[04/13 17:33:38] d2.engine.train_loop INFO: Starting training from iteration 0
[04/13 17:34:33] d2.utils.events INFO:  eta: 15:20:23  iter: 19  total_loss: 2.931  loss_fcos_cls_st: 1.145  loss_fcos_loc_st: 0.9628  loss_fcos_ctr_st: 0.7237  loss_kd: 0.1189  time: 0.6158  data_time: 2.0520  lr: 0.00019981  max_mem: 5806M
[04/13 17:34:45] d2.utils.events INFO:  eta: 15:20:54  iter: 39  total_loss: 2.825  loss_fcos_cls_st: 0.9993  loss_fcos_loc_st: 0.7467  loss_fcos_ctr_st: 0.6875  loss_kd: 0.3699  time: 0.6174  data_time: 0.0149  lr: 0.00039961  max_mem: 5806M
[04/13 17:34:57] d2.utils.events INFO:  eta: 15:20:42  iter: 59  total_loss: 2.879  loss_fcos_cls_st: 0.9837  loss_fcos_loc_st: 0.6015  loss_fcos_ctr_st: 0.6765  loss_kd: 0.6186  time: 0.6161  data_time: 0.0160  lr: 0.00059941  max_mem: 5806M
[04/13 17:35:10] d2.utils.events INFO:  eta: 15:25:02  iter: 79  total_loss: 3.017  loss_fcos_cls_st: 0.9883  loss_fcos_loc_st: 0.5809  loss_fcos_ctr_st: 0.6759  loss_kd: 0.7888  time: 0.6172  data_time: 0.0161  lr: 0.00079921  max_mem: 5806M
[04/13 17:35:22] d2.utils.events INFO:  eta: 15:23:48  iter: 99  total_loss: 3.152  loss_fcos_cls_st: 0.9831  loss_fcos_loc_st: 0.5779  loss_fcos_ctr_st: 0.6779  loss_kd: 0.9303  time: 0.6178  data_time: 0.0148  lr: 0.00099901  max_mem: 5806M
[04/13 17:35:35] d2.utils.events INFO:  eta: 15:27:29  iter: 119  total_loss: 3.209  loss_fcos_cls_st: 0.9703  loss_fcos_loc_st: 0.5653  loss_fcos_ctr_st: 0.6757  loss_kd: 1.001  time: 0.6205  data_time: 0.0162  lr: 0.0011988  max_mem: 5807M
[04/13 17:35:48] d2.utils.events INFO:  eta: 15:30:00  iter: 139  total_loss: 3.268  loss_fcos_cls_st: 0.9778  loss_fcos_loc_st: 0.5575  loss_fcos_ctr_st: 0.6778  loss_kd: 1.052  time: 0.6219  data_time: 0.0153  lr: 0.0013986  max_mem: 5807M
[04/13 17:36:00] d2.utils.events INFO:  eta: 15:29:48  iter: 159  total_loss: 3.273  loss_fcos_cls_st: 0.9318  loss_fcos_loc_st: 0.5516  loss_fcos_ctr_st: 0.6738  loss_kd: 1.123  time: 0.6228  data_time: 0.0152  lr: 0.0015984  max_mem: 5807M
[04/13 17:36:12] d2.utils.events INFO:  eta: 15:24:32  iter: 179  total_loss: 3.388  loss_fcos_cls_st: 0.9453  loss_fcos_loc_st: 0.5504  loss_fcos_ctr_st: 0.6754  loss_kd: 1.195  time: 0.6191  data_time: 0.0154  lr: 0.0017982  max_mem: 5807M

opened by ruiningTang 11
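
When comparing a run against a reference log like the one quoted above, it can help to pull a single metric out of the log lines. The sketch below parses `key: value` pairs from lines in the format shown; the helper names are hypothetical, and the two sample lines are copied from the log in this issue.

```python
import re

# "key: number" followed by whitespace or end of line, so that
# "eta: 15:20:23" and "max_mem: 5806M" are skipped.
PAIR = re.compile(r"(\w+): (\d+(?:\.\d+)?(?:e[-+]?\d+)?)(?=\s|$)")


def parse_metrics(line):
    """Return the numeric key/value pairs found in one log line."""
    return {key: float(value) for key, value in PAIR.findall(line)}


def metric_series(lines, key):
    """Return (iteration, value) pairs for one metric across log lines."""
    series = []
    for line in lines:
        metrics = parse_metrics(line)
        if "iter" in metrics and key in metrics:
            series.append((int(metrics["iter"]), metrics[key]))
    return series


LOG = [
    "[04/13 17:34:33] d2.utils.events INFO:  eta: 15:20:23  iter: 19  total_loss: 2.931  loss_fcos_cls_st: 1.145  loss_fcos_loc_st: 0.9628  loss_fcos_ctr_st: 0.7237  loss_kd: 0.1189  time: 0.6158  data_time: 2.0520  lr: 0.00019981  max_mem: 5806M",
    "[04/13 17:36:12] d2.utils.events INFO:  eta: 15:24:32  iter: 179  total_loss: 3.388  loss_fcos_cls_st: 0.9453  loss_fcos_loc_st: 0.5504  loss_fcos_ctr_st: 0.6754  loss_kd: 1.195  time: 0.6191  data_time: 0.0154  lr: 0.0017982  max_mem: 5807M",
]
print(metric_series(LOG, "loss_kd"))  # [(19, 0.1189), (179, 1.195)]
```

Running the same extraction on the provided reference log gives two series that can be compared iteration by iteration.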

Low accuracy on Teacher model.

Hi, this is good work and the idea is very novel! The student model is fine, but when I train the teacher model, its mAP is only around 20. The config files are R50-T.yaml and SLR100-50-T.yaml. The batch size was reduced to 4, and the learning rate was also reduced by 50%. The machine has 2x 3090.

Did you run into similar problems during training? Thanks

opened by JingBo-L 2
How can I run evaluation on CPU?

Hello, my computer has no GPU, so running python3 projects/MSAD/train_net_T.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint fails with an error. How can I run the evaluation on CPU?

opened by singing4you 0
