TOOD: Task-aligned One-stage Object Detection (ICCV 2021 Oral)


Paper

Introduction

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose Task-aligned One-stage Object Detection (TOOD), which explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between learning task-interactive and task-specific features, as well as greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training, via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves 51.1 AP at single-model single-scale testing. This surpasses recent one-stage detectors such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP) by a large margin, with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD in better aligning the tasks of object classification and localization.
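
To make TAL concrete, here is a minimal sketch of its anchor alignment metric (an illustration, not the repository's exact assigner code); the default exponents alpha=1 and beta=6 match the train_cfg shown in the config later on this page.

import torch

def task_alignment_metric(cls_scores, ious, alpha=1.0, beta=6.0):
    # t = s^alpha * u^beta, computed per (anchor, gt) pair:
    #   cls_scores: (num_anchors, num_gts) predicted scores for each gt's class
    #   ious:       (num_anchors, num_gts) IoUs between predicted boxes and gts
    # A large t requires an anchor to do well on *both* tasks at once; TAL picks
    # the top-k anchors per gt under this metric as positive samples.
    return cls_scores.pow(alpha) * ious.pow(beta)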

Method overview

[Figure: Parallel head vs. T-Head]

[Figure: TOOD method overview]
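
As a rough illustration of the T-Head idea (a sketch of layer attention only, not the repository's exact TOODHead module): the task-interactive features produced by the N stacked convs are re-weighted per layer, i.e., per group of channels, rather than per individual channel as in ordinary channel attention.

import torch
import torch.nn as nn

class LayerAttentionSketch(nn.Module):
    # Computes one scalar weight per stacked layer (N weights in total) and
    # scales each layer's (B, C, H, W) feature map by its weight.
    def __init__(self, channels, num_layers=6):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels * num_layers, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, num_layers, 1),
            nn.Sigmoid())

    def forward(self, feats):
        # feats: list of N tensors, each (B, C, H, W), from the stacked convs
        x = torch.cat(feats, dim=1)                      # (B, N*C, H, W)
        w = self.attn(x.mean(dim=(2, 3), keepdim=True))  # (B, N, 1, 1)
        return [f * w[:, i:i + 1] for i, f in enumerate(feats)]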

Prerequisites

  • MMDetection version 2.14.0.

  • Please see get_started.md for installation and the basic usage of MMDetection.
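
A quick way to confirm the environment (a minimal sketch; it assumes MMDetection and its dependencies were installed following get_started.md):

import torch
import mmcv
import mmdet

# This repository targets MMDetection 2.14.0.
print('torch:', torch.__version__)
print('mmcv:', mmcv.__version__)
print('mmdet:', mmdet.__version__)  # expected: 2.14.0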

Train

# Assume that you are under the root directory of this project,
# that you have activated your virtual environment if needed,
# and that the COCO dataset is in 'data/coco/'.

./tools/dist_train.sh configs/tood/tood_r50_fpn_1x_coco.py 4
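
The same run can also be launched from Python; below is a hedged sketch that mirrors what MMDetection 2.x's tools/train.py does for a single GPU (helper signatures may vary slightly across versions):

from mmcv import Config
from mmdet.apis import train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector

cfg = Config.fromfile('configs/tood/tood_r50_fpn_1x_coco.py')
cfg.work_dir = './work_dirs/tood_r50_fpn_1x_coco'  # where logs/checkpoints are written
cfg.gpu_ids = range(1)  # single GPU here; dist_train.sh covers the multi-GPU case
cfg.seed = None

model = build_detector(cfg.model)
model.init_weights()
datasets = [build_dataset(cfg.data.train)]
model.CLASSES = datasets[0].CLASSES  # so saved checkpoints carry class names
train_detector(model, datasets, cfg, distributed=False, validate=True)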

Inference

./tools/dist_test.sh configs/tood/tood_r50_fpn_1x_coco.py work_dirs/tood_r50_fpn_1x_coco/epoch_12.pth 4 --eval bbox
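
For single-image inference from Python, the standard mmdet.apis helpers can be used (a sketch; the image path is just an example):

from mmdet.apis import init_detector, inference_detector

config_file = 'configs/tood/tood_r50_fpn_1x_coco.py'
checkpoint_file = 'work_dirs/tood_r50_fpn_1x_coco/epoch_12.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')  # any test image
model.show_result('demo/demo.jpg', result, score_thr=0.3, out_file='result.jpg')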

Models

For your convenience, we provide the following trained TOOD models. All models are trained with a mini-batch of 16 images.

Model                          Anchor        MS train  DCN  Lr schd  AP (minival)  AP (test-dev)  Config  Download
TOOD_R_50_FPN_1x               Anchor-free   No        No   1x       42.5          42.7           config  google / baidu
TOOD_R_50_FPN_anchor_based_1x  Anchor-based  No        No   1x       42.4          42.8           config  google / baidu
TOOD_R_101_FPN_2x              Anchor-free   Yes       No   2x       46.2          46.7           config  google / baidu
TOOD_X_101_FPN_2x              Anchor-free   Yes       No   2x       47.6          48.5           config  google / baidu
TOOD_R_101_dcnv2_FPN_2x        Anchor-free   Yes       Yes  2x       49.2          49.6           config  google / baidu
TOOD_X_101_dcnv2_FPN_2x        Anchor-free   Yes       Yes  2x       50.5          51.1           config  google / baidu

[0] All results are obtained with a single model and without any test-time data augmentation such as multi-scale testing or flipping.
[1] dcnv2 denotes deformable convolutional networks v2.
[2] Refer to the config files in configs/tood/ for more details.
[3] Extraction code of baidu netdisk: tood.

Acknowledgement

Thanks to the MMDetection team for the wonderful open-source project!

Citation

If you find TOOD useful in your research, please consider citing:

@inproceedings{feng2021tood,
    title={TOOD: Task-aligned One-stage Object Detection},
    author={Feng, Chengjian and Zhong, Yujie and Gao, Yu and Scott, Matthew R and Huang, Weilin},
    booktitle={ICCV},
    year={2021}
}
Comments
  • Layer Attention instead of Channel Attention?

    Why did you choose Layer Attention instead of normal Channel Attention?
    The task-interactive features are concatenated after N consecutive conv layers, so Channel Attention could further separate individual channels for specific tasks, whereas Layer Attention also separates along the channel dimension but only in groups of 6 (one group per conv layer)?

    opened by iumyx2612 5
  • How about the ATSS assigner as the initial static assignment method?

    In the initial stage of training, the task-alignment metric scores are so small that their values are almost zero, because both the IoUs and the classification scores are low. From my point of view, is ATSS used to select positive samples close to the GT center points in order to accelerate model convergence in early training? (A sketch of this two-stage assignment schedule appears after this list.)

    opened by ttjjmm 2
  • TOOD ONNX file request

    Hello, I was not able to export an ONNX file successfully using mmdet. Is there a TOOD ONNX file available? I would like to visualize the network structure for learning, thank you!

    opened by hjfdsssdg 2
  • I changed the number of ratios and now the model cannot train; what else do I need to modify?

    I changed ratios=[1] to ratios=[2.444, 3.182, 1.574, 1.721, 0.994, 1.163, 0.751, 0.534] and then got an error like this:

    2022-03-29 15:12:23,844 - mmdet - INFO - workflow: [('train', 1)], max: 100 epochs
    2022-03-29 15:12:23,844 - mmdet - INFO - Checkpoints will be saved to E:\Object-Detection\Github\radar-detection\work_dirs\radar_tood by HardDiskBackend.
    D:\App\anaconda\envs\swin-t\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ..\c10/core/TensorImpl.h:1156.)
      return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    Traceback (most recent call last):
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 30, in run_iter
        **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmdet\models\detectors\base.py", line 248, in train_step
        losses = self(**data)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmcv\runner\fp16_utils.py", line 98, in new_func
        return old_func(*args, **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmdet\models\detectors\base.py", line 172, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmdet\models\detectors\single_stage.py", line 84, in forward_train
        gt_labels, gt_bboxes_ignore)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmdet\models\dense_heads\base_dense_head.py", line 330, in forward_train
        outs = self(x)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "D:\App\anaconda\envs\swin-t\lib\site-packages\mmdet\models\dense_heads\tood_head.py", line 263, in forward
        b, h, w, 4).permute(0, 3, 1, 2) / stride[0]
    RuntimeError: shape '[8, 32, 168, 4]' is invalid for input of size 1376256
    

    And this is my config file:

    dataset_type = 'CocoDataset'
    data_root = 'data/coco/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=8,
        workers_per_gpu=1,
        train=dict(
            type='CocoDataset',
            ann_file='E:/Object-Detection/data_radar/devkit/voc07_train.json',
            img_prefix='E:/Object-Detection/data_radar/devkit/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True),
                dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
            ],
            classes=('loose_l', 'loose_s', 'poor_l', 'porous')),
        val=dict(
            type='CocoDataset',
            ann_file='E:/Object-Detection/data_radar/devkit/voc07_val.json',
            img_prefix='E:/Object-Detection/data_radar/devkit/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ],
            classes=('loose_l', 'loose_s', 'poor_l', 'porous')),
        test=dict(
            type='CocoDataset',
            ann_file='E:/Object-Detection/data_radar/devkit/voc07_test.json',
            img_prefix='E:/Object-Detection/data_radar/devkit/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ],
            classes=('loose_l', 'loose_s', 'poor_l', 'porous')))
    evaluation = dict(interval=1, metric='bbox')
    optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[8, 11])
    runner = dict(type='EpochBasedRunner', max_epochs=100)
    checkpoint_config = dict(interval=10)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='SetEpochInfoHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    opencv_num_threads = 0
    mp_start_method = 'fork'
    model = dict(
        type='TOOD',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch',
            init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            start_level=1,
            add_extra_convs='on_output',
            num_outs=5),
        bbox_head=dict(
            type='TOODHead',
            num_classes=4,
            in_channels=256,
            stacked_convs=6,
            feat_channels=256,
            anchor_type='anchor_based',
            anchor_generator=dict(
                type='AnchorGenerator',
                ratios=[2.444, 3.182, 1.574, 1.721, 0.994, 1.163, 0.751, 0.534],
                octave_base_scale=1,
                scales_per_octave=1,
                strides=[8, 16, 32, 64, 128]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            initial_loss_cls=dict(
                type='FocalLoss',
                use_sigmoid=True,
                activated=True,
                gamma=2.0,
                alpha=0.25,
                loss_weight=1.0),
            loss_cls=dict(
                type='QualityFocalLoss',
                use_sigmoid=True,
                activated=True,
                beta=2.0,
                loss_weight=1.0),
            loss_bbox=dict(type='GIoULoss', loss_weight=2.0)),
        train_cfg=dict(
            initial_epoch=4,
            initial_assigner=dict(type='ATSSAssigner', topk=9),
            assigner=dict(type='TaskAlignedAssigner', topk=13),
            alpha=1,
            beta=6,
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        test_cfg=dict(
            nms_pre=1000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.6),
            max_per_img=100))
    classes = ('loose_l', 'loose_s', 'poor_l', 'porous')
    work_dir = './work_dirs\radar_tood'
    auto_resume = False
    gpu_ids = [0]
    
    opened by joeyslv 0
  • Plot result

    Hi, thanks for your wonderful work. I have a question: how do you plot the detection results in Figure 1, especially the prediction score map and the localization map?

    opened by qdd1234 9
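
Regarding the ATSS question above: a schematic sketch of the two-stage sample assignment implied by the training config (initial_epoch=4, ATSSAssigner with topk=9 for warm-up, then TaskAlignedAssigner with topk=13). This illustrates the schedule only; it is not the repository's actual assigner code.

def select_assigner(epoch, initial_epoch=4):
    # Early in training, classification scores s and IoUs u are both near zero,
    # so the alignment metric t = s**alpha * u**beta is uninformative; a static,
    # center-based ATSS assignment warms the model up before TAL takes over.
    if epoch < initial_epoch:
        return dict(type='ATSSAssigner', topk=9)       # static warm-up assigner
    return dict(type='TaskAlignedAssigner', topk=13)   # dynamic, t-driven assigner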