Improving Object Detection by Label Assignment Distillation
This is the official implementation of the WACV 2022 paper Improving Object Detection by Label Assignment Distillation. We provide the code for Label Assignment Distillation (LAD), training logs, and several model checkpoints.
Table of Contents
- Introduction
- Installation
- Usage
- Experiments
- Citation
Introduction
This is the official repository for the paper Improving Object Detection by Label Assignment Distillation.
Figure 1: (a) Soft Label Distillation concept; (b) Label Assignment Distillation concept.
- Distillation in object detection is typically achieved by mimicking the teacher's output directly, such as soft-label distillation or feature mimicking (Fig. 1a).
- We propose the concept of Label Assignment Distillation (LAD), which solves the label assignment problem from a distillation perspective, allowing the student to learn from the teacher's knowledge without directly mimicking its outputs (Fig. 1b). LAD is very general and can be applied to many dynamic label assignment methods; a minimal code sketch is given after this list. The following figure shows a concrete example of how to adapt Probabilistic Anchor Assignment (PAA) to LAD.
Figure: Probabilistic Anchor Assignment (PAA) (left) vs. Label Assignment Distillation (LAD) based on PAA (right).
- We demonstrate a number of advantages of LAD, notably that it is simple and effective, flexible enough to use with most detectors, and complementary to other distillation techniques.
- We further introduce Co-learning dynamic Label Assignment Distillation (CoLAD), which allows two networks to be trained mutually based on a dynamic switching criterion. We show that two networks trained with CoLAD are significantly better than if each were trained individually, given the same initialization.
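To make the idea concrete, below is a minimal PyTorch-style sketch of a single LAD training step, assuming a generic dense-detector interface. The names `lad_train_step`, `assign_fn`, and `loss_fn` are illustrative placeholders, not this repository's actual API.

```python
# A minimal sketch of one LAD training step (illustrative only; not the repo's API).
# Assumptions: `teacher` and `student` are dense detectors returning per-anchor
# classification and regression outputs; `assign_fn` solves the label assignment
# (e.g. PAA-style) from predictions and ground truth; `loss_fn` is a standard
# detection loss computed against the assigned targets.
import torch


def lad_train_step(teacher, student, assign_fn, loss_fn,
                   images, gt_boxes, gt_labels, optimizer):
    # 1) Teacher forward pass; no gradients flow through the teacher.
    with torch.no_grad():
        t_cls, t_reg = teacher(images)

    # 2) Solve the label assignment problem using the *teacher's* predictions,
    #    e.g. by ranking anchors with PAA's per-anchor score and fitting a GMM.
    targets = assign_fn(t_cls, t_reg, gt_boxes, gt_labels)

    # 3) Train the student on the teacher-derived assignment. The student never
    #    mimics the teacher's raw outputs, only its assignment decisions.
    s_cls, s_reg = student(images)
    loss = loss_fn(s_cls, s_reg, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point the sketch illustrates is that the teacher influences training only through the assignment it produces, which is what distinguishes LAD from soft-label distillation.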
Installation
- Create environment:
```bash
conda create -n lad python=3.7 -y
conda activate lad
```
- Install dependencies:
```bash
conda install pytorch=1.7.0 torchvision cudatoolkit=10.2 -c pytorch -y
pip install openmim future tensorboard sklearn timm==0.3.4
mim install mmcv-full==1.2.5
mim install mmdet==2.10.0
pip install -e ./
```
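As an optional sanity check that the pinned versions were picked up correctly, a small snippet like the following can be run (this is only a convenience helper, not part of the repository):

```python
# Quick environment check for the pinned versions (optional helper).
import torch
import mmcv
import mmdet

print("torch:", torch.__version__)        # expected 1.7.0
print("mmcv-full:", mmcv.__version__)     # expected 1.2.5
print("mmdet:", mmdet.__version__)        # expected 2.10.0
print("CUDA available:", torch.cuda.is_available())
```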
Usage
Train the model
```bash
#!/usr/bin/env bash
set -e

export GPUS=2
export CUDA_VISIBLE_DEVICES=0,2

CFG="configs/lad/paa_lad_r50_r101p1x_1x_coco.py"
WORKDIR="/checkpoints/lad/paa_lad_r50_r101p1x_1x_coco"

mim train mmdet $CFG --work-dir $WORKDIR \
    --gpus=$GPUS --launcher pytorch --seed 0 --deterministic
```
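The config referenced above follows mmdetection's Python-dict config convention. The snippet below is only a rough illustrative outline of what a student-plus-teacher config might look like; the teacher-related key names and the checkpoint path are assumptions, so consult the actual file under `configs/lad/` for the authoritative keys.

```python
# Illustrative outline only; teacher-related key names are assumptions.
# See configs/lad/paa_lad_r50_r101p1x_1x_coco.py for the real config.
_base_ = '../paa/paa_r50_fpn_1x_coco.py'   # hypothetical base config path

# Hypothetical checkpoint path of the pretrained PAA-R101 teacher.
teacher_ckpt = '/checkpoints/paa/paa_r101_fpn_1x_coco.pth'

model = dict(
    # Student: ResNet-50 backbone (as in Table 2).
    backbone=dict(depth=50),
    # Teacher: a pretrained PAA-R101 used only to solve the label assignment.
    teacher_ckpt=teacher_ckpt,            # assumed key name
    teacher_backbone=dict(depth=101),     # assumed key name
)
```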
Test the model
```bash
#!/usr/bin/env bash
set -e

export GPUS=2
export CUDA_VISIBLE_DEVICES=0,2

CFG="configs/lad/paa_lad_r50_r101p1x_1x_coco.py"
CKPT="/checkpoints/lad/paa_lad_r50_r101p1x_1x_coco/epoch_12.pth"

mim test mmdet $CFG --checkpoint $CKPT --gpus $GPUS --launcher pytorch --eval bbox
```
Experiments
1. A Pilot Study - Preliminary Comparison.
Table 2: Comparison of the student PAA-R50 trained with Soft-Label distillation, Label Assignment Distillation (LAD), and their combination (SoLAD) on the COCO validation set.

Method | Teacher | Student | γ (focal loss) | mAP | Improve | Config | Download |
---|---|---|---|---|---|---|---|
Baseline | None | PAA-R50 | 2 | 40.4 | - | config | model / log |
Soft-Label (KL loss) | PAA-R101 | PAA-R50 | 0.5 | 41.3 | +0.9 | config | model / log |
LAD (ours) | PAA-R101 | PAA-R50 | 2 | 41.6 | +1.2 | config | model / log |
SoLAD (ours) | PAA-R101 | PAA-R50 | 0.5 | 42.4 | +2.0 | config | model / log |
2. A Pilot Study - Does LAD need a bigger teacher network?
Table 3: Comparison of Soft-Label distillation and Label Assignment Distillation (LAD) on the COCO validation set. The teacher uses a ResNet-50 backbone and the student uses ResNet-101. 1× and 2× denote the standard training schedules (12 and 24 epochs, respectively).

Method | Teacher | Student | mAP | Improve | Config | Download |
---|---|---|---|---|---|---|
Baseline | None | PAA-R50 | 40.4 | - | config | model / log |
Baseline (1×) | None | PAA-R101 | 42.6 | - | config | model / log |
Baseline (2×) | None | PAA-R101 | 43.5 | +0.9 | config | model / log |
Soft-Label | PAA-R50 | PAA-R101 | 40.4 | -2.2 | config | model / log |
LAD (ours) | PAA-R50 | PAA-R101 | 43.3 | +0.7 | config | model / log |
3. Comparison with State-of-the-art Label Assignment Methods
We use a PAA-R50 model pretrained on COCO with the 3× multi-scale schedule as the initial teacher. The teacher achieves 43.3 AP on the minival set. We train our network with the COP branch, and the post-processing steps are similar to PAA.
- Table 7.1: Backbone ResNet-101, 2× training schedule, multi-scale training. Results are evaluated on the COCO test set.
Method | AP | AP50 | AP75 | APs | APm | APl | Download |
---|---|---|---|---|---|---|---|
FCOS | 41.5 | 60.7 | 45.0 | 24.4 | 44.8 | 51.6 | |
NoisyAnchor | 41.8 | 61.1 | 44.9 | 23.4 | 44.9 | 52.9 | |
FreeAnchor | 43.1 | 62.2 | 46.4 | 24.5 | 46.1 | 54.8 | |
SAPD | 43.5 | 63.6 | 46.5 | 24.9 | 46.8 | 54.6 | |
MAL | 43.6 | 61.8 | 47.1 | 25.0 | 46.9 | 55.8 | |
ATSS | 43.6 | 62.1 | 47.4 | 26.1 | 47.0 | 53.6 | |
AutoAssign | 44.5 | 64.3 | 48.4 | 25.9 | 47.4 | 55.0 | |
PAA | 44.8 | 63.3 | 48.7 | 26.5 | 48.8 | 56.3 | |
OTA | 45.3 | 63.5 | 49.3 | 26.9 | 48.8 | 56.1 | |
IQDet | 45.1 | 63.4 | 49.3 | 26.7 | 48.5 | 56.6 | |
CoLAD (ours) | 46.0 | 64.4 | 50.6 | 27.9 | 49.9 | 57.3 | config / model / log |
4. Appendix - Ablation Study of Conditional Objectness Prediction (COP)
Table 1 (Appendix): Comparison of different auxiliary predictions: IoU, Implicit Object Prediction (IOP), and Conditional Objectness Prediction (COP), with ResNet-18 and ResNet-50 backbones. Experiments on the COCO validation set.
| IoU | IOP | COP | ResNet-18 | ResNet-50 |
|---|---|---|---|---|
| ✓ | | | 35.8 (config / model / log) | 40.4 (config / model / log) |
| ✓ | ✓ | | 36.7 (config / model / log) | 41.6 (config / model / log) |
| ✓ | | ✓ | 36.9 (config / model / log) | 41.6 (config / model / log) |
| | ✓ | | 36.6 (config / model / log) | 41.1 (config / model / log) |
| | | ✓ | 36.9 (config / model / log) | 41.2 (config / model / log) |
Citation
Please cite the paper in your publications if it helps your research:
```
@misc{nguyen2021improving,
    title={Improving Object Detection by Label Assignment Distillation},
    author={Chuong H. Nguyen and Thuy C. Nguyen and Tuan N. Tang and Nam L. H. Phan},
    year={2021},
    eprint={2108.10520},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
- On Sep 25, 2021, we found a concurrent (unpublished) work by Jianfeng Wang that shares the key idea of Label Assignment Distillation. However, both works are independent and original. We would like to acknowledge his work and thank him for his help in clarifying the issue.