Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia


This project provides an implementation of the CVPR 2021 paper "Multi-Scale Aligned Distillation for Low-Resolution Detection", built on Detectron2. MSAD aims to detect objects from low-resolution rather than high-resolution images, while reaching performance comparable to that obtained at high resolution. We use pretrained weights from Slimmable Neural Networks.

Installation

This project is based on Detectron2 and can be set up as follows.

  • Install Detectron2 following the instructions.
  • Set up the dataset following the structure.
  • Copy this project to /path/to/detectron2/projects/MSAD
  • Download the slimmable networks from GitHub. The slimmable ResNet-50 pretrained weight link is here.

Pretrained Weight

  • Move the pretrained weight to your target path
  • Modify the weight path in configs/Base-SLRESNET-FCOS.yaml

Teacher Training

To train a teacher model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file <projects/MSAD/configs/config.yaml> --num-gpus 8

For example, to launch MSAD teacher training (1x schedule) with a Slimmable-ResNet-50 backbone at 0.25 width on 8 GPUs and save the model to "/data/SLR025-50-T", one should execute:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file projects/MSAD/configs/SLR025-50-T.yaml --num-gpus 8 OUTPUT_DIR /data/SLR025-50-T 

Student Training

To train a student model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file <projects/MSAD/configs/config.yaml> --num-gpus 8

For example, to launch MSAD student training (1x schedule) with a Slimmable-ResNet-50 backbone at 0.25 width on 8 GPUs and save the model to "/data/SLR025-50-S", assuming the teacher weights are saved at "/data/SLR025-50-T/model_final.pth", one should execute:

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file projects/MSAD/configs/MSAD-R50-S025-1x.yaml --num-gpus 8 MODEL.WEIGHTS /data/SLR025-50-T/model_final.pth OUTPUT_DIR /data/SLR025-50-S

Evaluation

To evaluate a pretrained teacher or student model with 8 GPUs, run:

cd /path/to/detectron2
python3 projects/MSAD/train_net_T.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint

or

cd /path/to/detectron2
python3 projects/MSAD/train_net_S.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint
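
The training and evaluation commands above all share one shape: a script (train_net_T.py or train_net_S.py), a config file, a GPU count, an optional --eval-only flag, and trailing KEY VALUE pairs that override config entries. As a minimal sketch, the helper below assembles such a command as an argument list. The function name build_msad_command and its parameters are hypothetical and not part of this project; only the script paths, flags, and config keys come from the commands shown above.

```python
import shlex
from typing import Dict, List, Optional


def build_msad_command(
    role: str,
    config_file: str,
    num_gpus: int = 8,
    eval_only: bool = False,
    overrides: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argument list for an MSAD run (hypothetical helper)."""
    # Script paths are taken from the commands in this README.
    scripts = {
        "teacher": "projects/MSAD/train_net_T.py",
        "student": "projects/MSAD/train_net_S.py",
    }
    if role not in scripts:
        raise ValueError(f"role must be 'teacher' or 'student', got {role!r}")

    cmd = ["python3", scripts[role], "--config-file", config_file,
           "--num-gpus", str(num_gpus)]
    if eval_only:
        cmd.append("--eval-only")
    # Trailing KEY VALUE pairs override config entries, e.g. OUTPUT_DIR.
    for key, value in (overrides or {}).items():
        cmd += [key, value]
    return cmd


# Rebuild the teacher training example from this README.
teacher_cmd = build_msad_command(
    "teacher",
    "projects/MSAD/configs/SLR025-50-T.yaml",
    overrides={"OUTPUT_DIR": "/data/SLR025-50-T"},
)
print(shlex.join(teacher_cmd))
```

The resulting list can be passed to subprocess.run with cwd set to the Detectron2 checkout, which mirrors the cd /path/to/detectron2 step without relying on shell quoting.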

Results

We provide results on the COCO val set together with pretrained models. In the following table, capacity denotes the backbone FLOPs. For brevity, the FLOPs of Slimmable ResNet-50 at width 1.0 with high-resolution input (800, 1333) are taken as 1x.

Method  Backbone       Capacity  Sched  Width  Role     Resolution  Box AP  Download
FCOS    Slimmable-R50  1.25x     1x     1.00   Teacher  H & L       42.8    model | metrics
FCOS    Slimmable-R50  0.25x     1x     1.00   Student  L           39.9    model | metrics
FCOS    Slimmable-R50  0.70x     1x     0.75   Teacher  H & L       41.2    model | metrics
FCOS    Slimmable-R50  0.14x     1x     0.75   Student  L           38.8    model | metrics
FCOS    Slimmable-R50  0.31x     1x     0.50   Teacher  H & L       38.4    model | metrics
FCOS    Slimmable-R50  0.06x     1x     0.50   Student  L           35.7    model | metrics
FCOS    Slimmable-R50  0.08x     1x     0.25   Teacher  H & L       33.2    model | metrics
FCOS    Slimmable-R50  0.02x     1x     0.25   Student  L           30.3    model | metrics
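
The capacity column can be sanity-checked with simple arithmetic. The sketch below rests on assumptions inferred from the table itself, not on statements from the paper: backbone FLOPs scale roughly with the square of the width multiplier, low-resolution input costs about 0.25x the high-resolution input (the width-1.00 student row), and a teacher that processes both inputs (H & L) pays for both. The names approx_capacity and LOW_RES_FACTOR are hypothetical.

```python
# Assumption: relative cost of low-resolution input, read off the
# width-1.00 student row of the results table (0.25x).
LOW_RES_FACTOR = 0.25


def approx_capacity(width: float, role: str) -> float:
    """Estimate backbone capacity relative to width 1.0 at high resolution."""
    base = width ** 2  # assumption: FLOPs grow with the square of the width
    if role == "teacher":  # teacher sees both high- and low-resolution input
        return base * (1.0 + LOW_RES_FACTOR)
    if role == "student":  # student sees low-resolution input only
        return base * LOW_RES_FACTOR
    raise ValueError(f"unknown role: {role!r}")


# Capacity values copied from the results table.
REPORTED = {
    (1.00, "teacher"): 1.25, (1.00, "student"): 0.25,
    (0.75, "teacher"): 0.70, (0.75, "student"): 0.14,
    (0.50, "teacher"): 0.31, (0.50, "student"): 0.06,
    (0.25, "teacher"): 0.08, (0.25, "student"): 0.02,
}

for (width, role), reported in REPORTED.items():
    estimate = approx_capacity(width, role)
    print(f"width={width:.2f} {role:7s} estimate={estimate:.4f} reported={reported:.2f}")
```

Under these assumptions the estimates round to the reported values at two decimal places, which suggests the capacity column follows this rule of thumb; the actual FLOPs should still be taken from the table.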

Citing MSAD

Please consider citing MSAD in your publications if it helps your research.

@inproceedings{qi2021msad,
  title={Multi-Scale Aligned Distillation for Low-Resolution Detection},
  author={Lu Qi and Jason Kuen and Jiuxiang Gu and Zhe Lin and Yi Wang and Yukang Chen and Yanwei Li and Jiaya Jia},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
Comments
  • Some questions about the implementation details

    Hi, thanks for your nice work and open source project. I have the following questions about the implementation details of the teacher:

    1. How did you get the inference performance (e.g. 42.5 mAP in Table 2(a)) of the trained C-FF (Crossing Feature-level Fusion) teacher? Is it an ensemble of two inference parts (high/low resolution inputs), an ensemble of three parts (low/high/fusion), or only the fusion part?

    2. Does the teacher's detection head for the fused features (returned by the C-FF module) share exactly the same learnable weights with the head used for the high/low resolution inputs?

    Thanks a lot for your reply.

    opened by v-qjqs 18
  • Implemented error in training process

    Thanks for your excellent work! I encountered the following problems during training.

    Environment: 2080Ti x 4 + CUDA 10.2 + PyTorch 1.6 + Detectron2

    The error is:

    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

    It seems that some parameters are unused. After I added find_unused_parameters=True in the class DefaultTrainer(TrainerBase) in Detectron2, the code ran normally. However, when I train SLR025-50-S.yaml with the teacher weights you provide, the training loss seems abnormal compared to the provided log:

    [04/13 17:33:38] d2.engine.train_loop INFO: Starting training from iteration 0
    [04/13 17:34:33] d2.utils.events INFO:  eta: 15:20:23  iter: 19  total_loss: 2.931  loss_fcos_cls_st: 1.145  loss_fcos_loc_st: 0.9628  loss_fcos_ctr_st: 0.7237  loss_kd: 0.1189  time: 0.6158  data_time: 2.0520  lr: 0.00019981  max_mem: 5806M
    [04/13 17:34:45] d2.utils.events INFO:  eta: 15:20:54  iter: 39  total_loss: 2.825  loss_fcos_cls_st: 0.9993  loss_fcos_loc_st: 0.7467  loss_fcos_ctr_st: 0.6875  loss_kd: 0.3699  time: 0.6174  data_time: 0.0149  lr: 0.00039961  max_mem: 5806M
    [04/13 17:34:57] d2.utils.events INFO:  eta: 15:20:42  iter: 59  total_loss: 2.879  loss_fcos_cls_st: 0.9837  loss_fcos_loc_st: 0.6015  loss_fcos_ctr_st: 0.6765  loss_kd: 0.6186  time: 0.6161  data_time: 0.0160  lr: 0.00059941  max_mem: 5806M
    [04/13 17:35:10] d2.utils.events INFO:  eta: 15:25:02  iter: 79  total_loss: 3.017  loss_fcos_cls_st: 0.9883  loss_fcos_loc_st: 0.5809  loss_fcos_ctr_st: 0.6759  loss_kd: 0.7888  time: 0.6172  data_time: 0.0161  lr: 0.00079921  max_mem: 5806M
    [04/13 17:35:22] d2.utils.events INFO:  eta: 15:23:48  iter: 99  total_loss: 3.152  loss_fcos_cls_st: 0.9831  loss_fcos_loc_st: 0.5779  loss_fcos_ctr_st: 0.6779  loss_kd: 0.9303  time: 0.6178  data_time: 0.0148  lr: 0.00099901  max_mem: 5806M
    [04/13 17:35:35] d2.utils.events INFO:  eta: 15:27:29  iter: 119  total_loss: 3.209  loss_fcos_cls_st: 0.9703  loss_fcos_loc_st: 0.5653  loss_fcos_ctr_st: 0.6757  loss_kd: 1.001  time: 0.6205  data_time: 0.0162  lr: 0.0011988  max_mem: 5807M
    [04/13 17:35:48] d2.utils.events INFO:  eta: 15:30:00  iter: 139  total_loss: 3.268  loss_fcos_cls_st: 0.9778  loss_fcos_loc_st: 0.5575  loss_fcos_ctr_st: 0.6778  loss_kd: 1.052  time: 0.6219  data_time: 0.0153  lr: 0.0013986  max_mem: 5807M
    [04/13 17:36:00] d2.utils.events INFO:  eta: 15:29:48  iter: 159  total_loss: 3.273  loss_fcos_cls_st: 0.9318  loss_fcos_loc_st: 0.5516  loss_fcos_ctr_st: 0.6738  loss_kd: 1.123  time: 0.6228  data_time: 0.0152  lr: 0.0015984  max_mem: 5807M
    [04/13 17:36:12] d2.utils.events INFO:  eta: 15:24:32  iter: 179  total_loss: 3.388  loss_fcos_cls_st: 0.9453  loss_fcos_loc_st: 0.5504  loss_fcos_ctr_st: 0.6754  loss_kd: 1.195  time: 0.6191  data_time: 0.0154  lr: 0.0017982  max_mem: 5807M
    
    opened by ruiningTang 11
  • Low accuracy on Teacher model.

    Hi, this is good work and the idea is very novel! The student model is OK, but when I train the teacher model, its mAP is only around 20. The config files are R50-T.yaml and SLR100-50-T.yaml. The batch size was reduced to 4, and the learning rate was also reduced by 50%. The machine has 2x 3090.

    Did you run into similar problems during training? Thanks

    opened by JingBo-L 2
  • How can I run evaluation on CPU?

    Hello, my computer has no GPU, so running python3 projects/MSAD/train_net_T.py --config-file <config.yaml> --num-gpus 8 --eval-only MODEL.WEIGHTS model_checkpoint raises an error. How can I run evaluation on CPU?

    opened by singing4you 0
Owner
Jia Research Lab
Research lab focusing on CV led by Prof. Jiaya Jia