Oriented Object Detection: Oriented RepPoints + Swin Transformer/ReResNet

Overview

Oriented RepPoints for Aerial Object Detection


This repository contains the implementation of "Oriented RepPoints + Swin Transformer/ReResNet".

Introduction

Based on the Oriented RepPoints detector with a Swin Transformer backbone, this entry achieved 3rd place on Task 1 and 2nd place on Task 2 of the 2021 challenge on Learning to Understand Aerial Images (LUAI), held at ICCV 2021. Detailed information is given in the paper "LUAI Challenge 2021 on Learning to Understand Aerial Images" (ICCVW 2021).

New Feature

  • Backbone: adds Swin Transformer and ReResNet
  • Data augmentation: adds Mosaic (4- or 9-image), MixUp, HSV, RandomPerspective, and RandomScaleCrop (a hedged config sketch follows this list)
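
The augmentations above map onto transform and dataset-flag names that appear in this repo's DOTA configs (they are present but commented out by default). A minimal sketch of enabling them; the parameter values shown are illustrative, not tuned:

```python
# Hedged sketch: enabling the added augmentations in this repo's
# mmdetection-style config. Transform and flag names are taken from the
# shipped DOTA configs; values here are illustrative only.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='HSVAugment', hgain=0.015, sgain=0.7, vgain=0.4),
    # Mosaic + RandomPerspective + RandomScaleCrop in a single transform
    dict(type='Poly_Mosaic_RandomPerspective',
         mosaic_ratio=0.5, ifcrop=True,
         degrees=0, translate=0.1, scale=0.2, shear=0, perspective=0.0),
    dict(type='MixUp', mixup_ratio=0.5),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

data = dict(
    train=dict(
        pipeline=train_pipeline,
        Mosaic4=True,   # build 4-image mosaics
        Mosaic9=False,  # alternatively, 9-image mosaics
        Mixup=True))    # pair images for MixUp in the dataset
```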

Installation

Please refer to install.md for installation and dataset preparation.

Getting Started

This repo is based on mmdetection. Please see GetStart.md for the basic usage.
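
As a quick start, training uses the standard mmdetection entry point. A hedged single-GPU example with the DOTA v1 ResNet-50 config that ships with the repo (see GetStart.md for the authoritative commands and multi-GPU variants):

```shell
# Hypothetical invocation; consult GetStart.md for exact usage.
python tools/train.py configs/dota/r50_dotav1.py
```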

Results and Models

The results on the DOTA test-dev set are shown in the table below (Baidu extraction passwords: aabb / swin / ABCD). For more detailed results, please see the paper.

| Model | Backbone | MS | DataAug | DOTAv1 mAP | DOTAv2 mAP | Download |
|-------|----------|----|---------|------------|------------|----------|
| OrientedRepPoints | R-50 | - | - | 75.68 | - | baidu(aabb) |
| OrientedRepPoints | R-101 | ✓ | - | 76.21 | - | baidu(aabb) |
| OrientedRepPoints | R-101 | ✓ | ✓ | 78.12 | - | baidu(aabb) |
| OrientedRepPoints | SwinT-tiny | - | - | - | - | - |

ImageNet-1K and ImageNet-22K Pretrained Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 22K model | 1K model |
|------|----------|------------|-------|-------|---------|-------|-----|-----------|----------|
| Swin-T | ImageNet-1K | 224x224 | 81.2 | 95.5 | 28M | 4.5G | 755 | - | github/baidu(swin)/config |
| Swin-S | ImageNet-1K | 224x224 | 83.2 | 96.2 | 50M | 8.7G | 437 | - | github/baidu(swin)/config |
| Swin-B | ImageNet-1K | 224x224 | 83.5 | 96.5 | 88M | 15.4G | 278 | - | github/baidu(swin)/config |
| Swin-B | ImageNet-1K | 384x384 | 84.5 | 97.0 | 88M | 47.1G | 85 | - | github/baidu(swin)/test-config |
| Swin-B | ImageNet-22K | 224x224 | 85.2 | 97.5 | 88M | 15.4G | 278 | github/baidu(swin) | github/baidu(swin)/test-config |
| Swin-B | ImageNet-22K | 384x384 | 86.4 | 98.0 | 88M | 47.1G | 85 | github/baidu(swin) | github/baidu(swin)/test-config |
| Swin-L | ImageNet-22K | 224x224 | 86.3 | 97.9 | 197M | 34.5G | 141 | github/baidu(swin) | github/baidu(swin)/test-config |
| Swin-L | ImageNet-22K | 384x384 | 87.3 | 98.2 | 197M | 103.9G | 42 | github/baidu(swin) | github/baidu(swin)/test-config |
| ReResNet50 | ImageNet-1K | 224x224 | 71.20 | 90.28 | - | - | - | - | google/baidu(ABCD)/log |

The mAOE results on the DOTAv1 val set are shown in the table below (Baidu extraction password: aabb).

| Model | Backbone | mAOE | Download |
|-------|----------|------|----------|
| OrientedRepPoints | R-50 | 5.93° | baidu(aabb) |

Note:

  • Without ground truth for the test subset, the mAOE orientation evaluation is computed on the val subset (the original train subset is used for training).
  • The orientation (angle) of an aerial object is defined as in the figure below; for details of mAOE, please see the paper. The evaluation code is mAOE_evaluation.py (a hedged sketch of the metric follows this note).

[Figure: definition of the orientation angle of an aerial object]
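
For intuition, mAOE averages the absolute angle difference between matched predicted and ground-truth oriented boxes per class, then averages over classes. A minimal sketch, assuming predictions are already matched to ground truth and angles are in degrees; the authoritative implementation is mAOE_evaluation.py:

```python
import numpy as np

def mean_absolute_orientation_error(angle_errors_per_class):
    """Hedged sketch of mAOE. `angle_errors_per_class` maps each class to
    the absolute angle differences (degrees) between matched predicted and
    ground-truth boxes; matching and angle conventions may differ from
    mAOE_evaluation.py."""
    per_class_aoe = []
    for errors in angle_errors_per_class.values():
        errors = np.asarray(errors, dtype=np.float64) % 180.0
        # Orientation is periodic modulo 180 deg: an error of 178 deg
        # is really a 2 deg error.
        errors = np.minimum(errors, 180.0 - errors)
        per_class_aoe.append(errors.mean())
    return float(np.mean(per_class_aoe))

# Toy example: per-class means are 2.0 and 6.0, so mAOE = 4.0
print(mean_absolute_orientation_error({
    'plane': [2.0, 178.0],
    'ship': [4.0, 8.0],
}))
```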

Visual results

Visual results of the learned points and the oriented bounding boxes. The visualization code is show_learning_points_and_boxes.py; a hedged drawing sketch follows the examples below.

  • Learning points

[Image: learned points]

  • Oriented bounding boxes

[Image: oriented bounding boxes]
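
Independent of the repo's script, oriented boxes are usually rendered as closed four-point polygons. A minimal sketch with OpenCV (assumed available; show_learning_points_and_boxes.py remains the reference):

```python
import cv2
import numpy as np

def draw_oriented_boxes(img, polys, color=(0, 255, 0), thickness=2):
    """Hedged sketch: draw oriented boxes given as four (x, y) corners each.
    The repo's own visualization is show_learning_points_and_boxes.py."""
    for poly in polys:
        pts = np.asarray(poly, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(img, [pts], isClosed=True, color=color,
                      thickness=thickness)
    return img

# Example: one rotated box on a blank canvas
canvas = np.zeros((256, 256, 3), dtype=np.uint8)
box = [(60, 100), (180, 60), (200, 120), (80, 160)]
cv2.imwrite('oriented_box_demo.png', draw_oriented_boxes(canvas, [box]))
```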

Citation

@article{Li2021oriented,
  title={Oriented RepPoints for Aerial Object Detection},
  author={Wentong Li and Jianke Zhu},
  journal={arXiv preprint arXiv:2105.11111},
  year={2021}
}

Acknowledgements

This project uses utility functions from other wonderful open-source projects. Special thanks to the authors of:

OrientedRepPoints

Swin-Transformer-Object-Detection

ReDet

Comments
  • ConnectionResetError: connection reset by peer.

    When I used a GPU for training, this error occurred before the end of an epoch, and it recurred after several retries. Have you encountered this before? How can it be solved? Looking forward to your reply.

    opened by xc-chengdu 2
  • Accuracy issue

    Hello author! Sorry to bother you. I tried running this code with mmrotate (DOTA 1.0, ResNet-50 backbone, pretrained weights loaded) and trained for 48 epochs; the mAP I got is 59, which matches what mmrotate officially reports but not your 75. So I wonder whether something went wrong when mmrotate integrated the code. Quite confused!

    opened by chentp-1183 3
  • Training error: RuntimeError: CUDA error: too many resources requested for launch

    Full error message:

    ReResNet Orientation: 8 Fix Params: False
    2022-06-25 00:20:44,437 - mmdet - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
    CUDA available: True
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.7.r11.7/compiler.31294372_0
    GPU 0: NVIDIA GeForce RTX 2080 Ti
    GCC: gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
    PyTorch: 1.4.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CUDA Runtime 10.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.3
      - Magma 2.5.1
      - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 
    
    TorchVision: 0.5.0
    OpenCV: 4.6.0
    MMCV: 0.6.2
    MMDetection: 1.1.0+258d792
    MMDetection Compiler: GCC 11.2
    MMDetection CUDA Compiler: 11.7
    ------------------------------------------------------------
    
    2022-06-25 00:20:44,437 - mmdet - INFO - Distributed training: False
    2022-06-25 00:20:44,437 - mmdet - INFO - Config:
    /home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/configs/dota/r50_dotav1.py
    work_dir = 'work_dirs/r50_dotav1/'
    
    # model settings
    norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
    
    model = dict(
        type='OrientedRepPointsDetector',
        pretrained='torchvision://resnet50', 
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            style='pytorch',
        ),
        neck=
            dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            start_level=1,
            add_extra_convs=True,
            num_outs=5,
            norm_cfg=norm_cfg
            ),
        bbox_head=dict(
            type='OrientedRepPointsHead',
            num_classes=16,
            in_channels=256,
            feat_channels=256,
            point_feat_channels=256,
            stacked_convs=3,
            num_points=9,
            gradient_mul=0.3,
            point_strides=[8, 16, 32, 64, 128],
            point_base_scale=2,
            norm_cfg=norm_cfg,
            loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
            loss_rbox_init=dict(type='GIoULoss', loss_weight=0.375),
            loss_rbox_refine=dict(type='GIoULoss', loss_weight=1.0),
            loss_spatial_init=dict(type='SpatialBorderLoss', loss_weight=0.05),
            loss_spatial_refine=dict(type='SpatialBorderLoss', loss_weight=0.1),
            top_ratio=0.4,))
    # training and testing settings
    train_cfg = dict(
        init=dict(
        assigner=dict(type='PointAssigner', scale=4, pos_num=1),  # only one positive sample is selected for each gt box
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        refine=dict(
            assigner=dict(
            type='MaxIoUAssigner',  # pre-assign to select more samples for sample selection
                pos_iou_thr=0.1,
                neg_iou_thr=0.1,
                min_pos_iou=0,
                ignore_iof_thr=-1),
            allowed_border=-1,
            pos_weight=-1,
            debug=False))
    
    test_cfg = dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='rnms', iou_thr=0.4),
        max_per_img=2000)
    
    # dataset settings
    dataset_type = 'DotaDatasetv1'
    data_root = '/home/r/文档/WPW/Remote/DataSets/Dota-v1.5/' #'data/dataset_demo_split/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='CorrectRBBox', correct_rbbox=True, refine_rbbox=True),
        dict(type='PolyResize',
            img_scale=[(1333, 768), (1333, 1280)],
            keep_ratio=True,
            multiscale_mode='range',
            clamp_rbbox=False),
        dict(type='PolyRandomFlip', flip_ratio=0.5),
        #dict(type='HSVAugment', hgain=0.015, sgain=0.7, vgain=0.4),
        #dict(type='PolyRandomRotate', rotate_ratio=0.5, angles_range=180, auto_bound=False),
        dict(type='Pad', size_divisor=32),
        #dict(type='Poly_Mosaic_RandomPerspective', mosaic_ratio=0.5, ifcrop=True, degrees=0, translate=0.1, scale=0.2, shear=0, perspective=0.0),
        #dict(type='MixUp', mixup_ratio=0.5),
        dict(type='PolyImgPlot', img_save_path=work_dir, save_img_num=16, class_num=15, thickness=2),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])]
    
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1024, 1024),
            flip=False,
            transforms=[
                dict(type='PolyResize', keep_ratio=True),
                dict(type='PolyRandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']), 
                dict(type='Collect', keys=['img']),
            ])
    ]
    
    data = dict(
        imgs_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type=dataset_type,
            ann_file=data_root + 'trainval_split/' + 'trainval.json',
            img_prefix=data_root + 'trainval_split/' + 'images/',
            pipeline=train_pipeline,
            Mosaic4=False,
            Mosaic9=False,
            Mixup=False),
        val=dict(
            type=dataset_type,
            ann_file=data_root + 'trainval_split/' + 'trainval.json',
            img_prefix=data_root + 'trainval_split/' + 'images/',
            pipeline=test_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=data_root + 'test_split/' + 'test.json',
            img_prefix=data_root + 'test_split/' + 'images/',
            pipeline=test_pipeline))
    
    evaluation = dict(interval=1, metric='bbox')
    # optimizer
    optimizer = dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
                    paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                     'relative_position_bias_table': dict(decay_mult=0.),
                                                     'norm': dict(decay_mult=0.)}))
    optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
    # learning policy
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=1.0 / 3,
        step=[24, 32, 38])
    checkpoint_config = dict(interval=20)
    # yapf:disable
    log_config = dict(
        interval=1,          # log once every n iterations
        hooks=[
            dict(type='TextLoggerHook')
        ])
    # yapf:enable
    # runtime settings
    total_epochs = 40
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None#'work_dirs/orientedreppoints_r50_demo/latest.pth'
    workflow = [('train', 1)]
    
    
    2022-06-25 00:20:44,666 - mmdet - INFO - load model from: torchvision://resnet50
    2022-06-25 00:20:44,779 - mmdet - WARNING - The model and loaded state dict do not match exactly
    
    unexpected key in source state_dict: fc.weight, fc.bias
    
    loading annotations into memory...
    Done (t=4.07s)
    creating index...
    index created!
    2022-06-25 00:20:50,462 - mmdet - INFO - Start running, host: r@4508, work_dir: /home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/work_dirs/r50_dotav1
    2022-06-25 00:20:50,462 - mmdet - INFO - workflow: [('train', 1)], max: 40 epochs
    Traceback (most recent call last):
      File "tools/train.py", line 154, in <module>
        main()
      File "tools/train.py", line 143, in main
        train_detector(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/apis/train.py", line 105, in train_detector
        _non_dist_train(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/apis/train.py", line 244, in _non_dist_train
        runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
      File "/home/r/miniconda3/envs/orientedreppoints/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home/r/miniconda3/envs/orientedreppoints/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 34, in train
        outputs = self.batch_processor(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/apis/train.py", line 75, in batch_processor
        losses = model(**data)
      File "/home/r/miniconda3/envs/orientedreppoints/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/r/miniconda3/envs/orientedreppoints/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/r/miniconda3/envs/orientedreppoints/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/fp16/decorators.py", line 49, in new_func
        return old_func(*args, **kwargs)
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/models/detectors/base.py", line 147, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/models/detectors/orientedreppoints_detector.py", line 31, in forward_train
        losses = self.bbox_head.loss(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/models/anchor_heads/orientedreppoints_head.py", line 388, in loss
        cls_reg_targets_refine = refine_pointset_target(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/bbox/pointset_target.py", line 148, in refine_pointset_target
        all_proposal_weights, pos_inds_list, neg_inds_list, all_gt_inds) = multi_apply(
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/utils/misc.py", line 24, in multi_apply
        return tuple(map(list, zip(*map_results)))
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/bbox/pointset_target.py", line 190, in refine_pointset_target_single
        assign_result = bbox_assigner.assign(proposals, gt_rbboxes,
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/bbox/assigners/max_iou_assigner.py", line 80, in assign
        assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
      File "/home/r/文档/WPW/Remote/Projects/OrientedRepPoints_DOTA/mmdet/core/bbox/assigners/max_iou_assigner.py", line 92, in assign_wrt_overlaps
        assigned_gt_inds = overlaps.new_full((num_bboxes,),
    RuntimeError: CUDA error: too many resources requested for launch
    
    opened by Strontia 1
  • DOTA Task 1 evaluation issue

    Hello. Testing on the DOTA v1 test set with the latest.pth weights I trained with swin-t in this project produces a Task_merged.zip of a full 40 MB and a result.pkl of 600 MB, and the accuracy reported after uploading to the official server is abnormal. I tried downloading the weights from https://github.com/LiWentomng/OrientedRepPoints and loading them into your project: Task_merged.zip is 5 MB, result.pkl is 16 MB, and the accuracy matches the original paper. I don't know where my reproduction went wrong; may I ask what your test weights and files look like?

    opened by zack2020-star 8