Location-Sensitive Visual Recognition with Cross-IOU Loss

Overview

The trained models are temporarily unavailable, but you can train the models yourself with reasonable computational resources.

Location-Sensitive Visual Recognition with Cross-IOU Loss

by Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang and Qi Tian

The code to train and evaluate the proposed LSNet is available here. For more technical details, please refer to our arXiv paper.

The location-sensitive visual recognition tasks, including object detection, instance segmentation, and human pose estimation, can be formulated as localizing an anchor point (in red) and a set of landmarks (in green). Our work aims to offer a unified framework for these tasks.

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named the location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing LSNet lies in its ability to fit various scales, for which we design a novel loss function named the cross-IOU loss, which computes the cross-IOU of each anchor-landmark pair to approximate the global IOU between the prediction and the ground truth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition. Evaluated on the MS-COCO dataset, LSNet sets a new state-of-the-art accuracy for anchor-free object detection (53.5% box AP) and instance segmentation (40.2% mask AP), and shows promising performance in detecting multi-scale human poses.
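
The intuition behind the cross-IOU loss can be illustrated with a minimal sketch: each anchor-to-landmark offset is decomposed into four non-negative components (along the +x, -x, +y and -y directions), and the predicted and ground-truth component vectors are compared element-wise. The snippet below is a simplified, hypothetical illustration only (the function name and details are not the repository's actual code; see code/mmdet/models/losses/cross_iou_loss.py for the real implementation):

    # Simplified, hypothetical sketch of a cross-IoU-style loss in PyTorch.
    # The actual implementation lives in code/mmdet/models/losses/cross_iou_loss.py.
    import torch

    def cross_iou_loss_sketch(pred, target, eps=1e-6):
        """pred, target: (N, K, 2) offsets from the anchor point to K landmarks."""
        def decompose(v):
            # Split each 2-D offset into four non-negative components
            # along the +x, -x, +y and -y directions.
            x, y = v[..., 0], v[..., 1]
            return torch.stack([x.clamp(min=0), (-x).clamp(min=0),
                                y.clamp(min=0), (-y).clamp(min=0)], dim=-1)
        p, t = decompose(pred), decompose(target)
        # Per-landmark cross-IoU: ratio of the element-wise minimum to the
        # element-wise maximum of the two 4-D component vectors.
        iou = (torch.min(p, t).sum(dim=-1) + eps) / (torch.max(p, t).sum(dim=-1) + eps)
        # Average over landmarks and turn the similarity into a loss.
        return 1.0 - iou.mean()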

If you encounter any problems in using our code, please contact Kaiwen Duan: [email protected]

Bbox AP(%) on COCO test-dev

Method Backbone epoch MStrain AP AP50 AP75 APS APM APL
Anchor-based:
Libra R-CNN X-101-64x4d 12 N 43.0 64.0 47.0 25.3 45.6 54.6
AB+FSAF* X-101-64x4d 18 Y 44.6 65.2 48.6 29.7 47.1 54.6
FreeAnchor* X-101-32x8d 24 Y 47.3 66.3 51.5 30.6 50.4 59.0
GFLV1* X-101-32x8d 24 Y 48.2 67.4 52.6 29.2 51.7 60.2
ATSS* X-101-64x4d-DCN 24 Y 50.7 68.9 56.3 33.2 52.9 62.4
PAA* X-101-64x4d-DCN 24 Y 51.4 69.7 57.0 34.0 53.8 64.0
GFLV2* R2-101-DCN 24 Y 53.3 70.9 59.2 35.7 56.1 65.6
YOLOv4-P7* CSP-P7 450 Y 56.0 73.3 61.2 38.9 60.0 68.6
Anchor-free:
ExtremeNet* HG-104 200 Y 43.2 59.8 46.4 24.1 46.0 57.1
RepPointsV1* R-101-DCN 24 Y 46.5 67.4 50.9 30.3 49.7 57.1
SAPD X-101-64x4d-DCN 24 Y 47.4 67.4 51.1 28.1 50.3 61.5
CornerNet* HG-104 200 Y 42.1 57.8 45.3 20.8 44.8 56.7
DETR R-101 500 Y 44.9 64.7 47.7 23.7 49.5 62.3
CenterNet* HG-104 190 Y 47.0 64.5 50.7 28.9 49.9 58.9
CPNDet* HG-104 100 Y 49.2 67.4 53.7 31.0 51.9 62.4
BorderDet* X-101-64x4d-DCN 24 Y 50.3 68.9 55.2 32.8 52.8 62.3
FCOS-BiFPN X-101-32x8-DCN 24 Y 50.4 68.9 55.0 33.2 53.0 62.7
RepPointsV2* X-101-64x4d-DCN 24 Y 52.1 70.1 57.5 34.5 54.6 63.6
LSNet R-50 24 Y 44.8 64.1 48.8 26.6 47.7 55.7
LSNet X-101-64x4d 24 Y 48.2 67.6 52.6 29.6 51.3 60.5
LSNet X-101-64x4d-DCN 24 Y 49.6 69.0 54.1 30.3 52.8 62.8
LSNet-CPV X-101-64x4d-DCN 24 Y 50.4 69.4 54.5 31.0 53.3 64.0
LSNet-CPV R2-101-DCN 24 Y 51.1 70.3 55.2 31.2 54.3 65.0
LSNet-CPV* R2-101-DCN 24 Y 53.5 71.1 59.2 35.2 56.4 65.8

A comparison between LSNet and the state-of-the-art methods in object detection on the MS-COCO test-dev set. LSNet surpasses all competitors in the anchor-free group. The abbreviations are: ‘R’ – ResNet, ‘X’ – ResNeXt, ‘HG’ – Hourglass network, ‘R2’ – Res2Net, ‘CPV’ – corner point verification, ‘MStrain’ – multi-scale training, * – multi-scale testing.

Segm AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APS APM APL
Pixel-based:
YOLACT R-101 48 31.2 50.6 32.8 12.1 33.3 47.1
TensorMask R-101 72 37.1 59.3 39.4 17.1 39.1 51.6
Mask R-CNN X-101-32x4d 12 37.1 60.0 39.4 16.9 39.9 53.5
HTC X-101-64x4d 20 41.2 63.9 44.7 22.8 43.9 54.6
DetectoRS* X-101-64x4d 40 48.5 72.0 53.3 31.6 50.9 61.5
Contour-based:
ExtremeNet HG-104 100 18.9 44.5 13.7 10.4 20.4 28.3
DeepSnake DLA-34 120 30.3 - - - - -
PolarMask X-101-64x4d-DCN 24 36.2 59.4 37.7 17.8 37.7 51.5
LSNet X-101-64x4d-DCN 30 37.6 64.0 38.3 22.1 39.9 49.1
LSNet R2-101-DCN 30 38.0 64.6 39.0 22.4 40.6 49.2
LSNet* X-101-64x4d-DCN 30 39.7 65.5 41.3 25.5 41.3 50.4
LSNet* R2-101-DCN 30 40.2 66.2 42.1 25.8 42.2 51.0

Comparison of LSNet with the state-of-the-art methods on the instance segmentation task on the COCO test-dev set. Our LSNet achieves state-of-the-art accuracy for contour-based instance segmentation. ‘R’ - ResNet, ‘X’ - ResNeXt, ‘HG’ - Hourglass, ‘R2’ - Res2Net, * - multi-scale testing.

Keypoints AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APM APL
Heatmap-based:
CenterNet-jd DLA-34 320 57.9 84.7 63.1 52.5 67.4
OpenPose VGG-19 - 61.8 84.9 67.5 58.0 70.4
Pose-AE HG 300 62.8 84.6 69.2 57.5 70.6
CenterNet-jd HG104 150 63.0 86.8 69.6 58.9 70.4
Mask R-CNN R-50 28 63.1 87.3 68.7 57.8 71.4
PersonLab R-152 >1000 66.5 85.5 71.3 62.3 70.0
HRNet HRNet-W32 210 74.9 92.5 82.8 71.3 80.9
Regression-based:
CenterNet-reg [66] DLA-34 320 51.7 81.4 55.2 44.6 63.0
CenterNet-reg [66] HG-104 150 55.0 83.5 59.7 49.4 64.0
LSNet w/ obj-box X-101-64x4d-DCN 60 55.7 81.3 61.0 52.9 60.5
LSNet w/ kps-box X-101-64x4d-DCN 20 59.0 83.6 65.2 53.3 67.9

Comparison of LSNet with the state-of-the-art methods on the pose estimation task on the COCO test-dev set. LSNet predicts the keypoints by regression. ‘obj-box’ and ‘kps-box’ denote the object bounding boxes and the keypoint-boxes, respectively. For LSNet w/ kps-box, we fine-tune the model from LSNet w/ obj-box for another 20 epochs.

Visualization

Some location-sensitive visual recognition results on the MS-COCO validation set.

We compare with CenterNet to show that our LSNet w/ ‘obj-box’ tends to predict more human poses at small scales, which are not annotated in the dataset. Only pose results with scores higher than 0.3 are shown for both methods.

Left: LSNet uses the object bounding boxes to assign training samples. Right: LSNet uses the keypoint-boxes to assign training samples. Although LSNet with keypoint-boxes achieves a higher AP score, its ability to perceive multi-scale human instances is weakened.

Preparation

The master branch works with PyTorch 1.5.0.

The dataset directory should be like this:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   │   ├── test2017

Generate extreme point annotations from the segmentation masks:

  • cd code/tools
  • python gen_coco_lsvr.py
  • cd ..
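
For reference, the core idea of this conversion is to pick, for each instance polygon, the top-, left-, bottom- and right-most contour points. The snippet below is a simplified sketch of that idea, not the actual gen_coco_lsvr.py code, whose output format may differ:

    # Hypothetical sketch: extract extreme points from a COCO-style polygon.
    # gen_coco_lsvr.py performs the real conversion; its details may differ.
    import numpy as np

    def extreme_points(polygon):
        """polygon: flat list [x1, y1, x2, y2, ...] of contour coordinates."""
        pts = np.asarray(polygon, dtype=np.float32).reshape(-1, 2)
        top    = pts[pts[:, 1].argmin()]  # smallest y (topmost in image coordinates)
        left   = pts[pts[:, 0].argmin()]  # smallest x
        bottom = pts[pts[:, 1].argmax()]  # largest y
        right  = pts[pts[:, 0].argmax()]  # largest x
        return top, left, bottom, right

    # Example (assuming a standard COCO annotation dict 'ann'):
    # top, left, bottom, right = extreme_points(ann['segmentation'][0])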

Installation

1. Installing cocoapi
  • cd cocoapi/pycocotools
  • python setup.py develop
  • cd ../..
2. Installing mmcv
  • cd mmcv
  • pip install -e .
  • cd ..
3. Installing mmdet
  • python setup.py develop

Training and Evaluation

Our LSNet is based on mmdetection. Please refer to the mmdetection guide on training and evaluation with existing datasets.
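
For example, training is typically launched through the standard mmdetection entry point; the config name below is taken from the issues section, and the exact path in your checkout may differ:

  • python code/tools/train.py <config file, e.g. lsnet_segm_r50_fpn_1x_coco.py>

Evaluation follows the corresponding mmdetection test script and options described in that guide.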

Comments
  • RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23

    I met a problem during training; the environment is:

    sys.platform: linux
    Python: 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
    CUDA available: True
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
    GPU 0,1: GeForce GTX 1080 Ti
    GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    PyTorch: 1.5.0
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.2
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
    • CuDNN 7.6.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.6.0
    OpenCV: 4.5.2
    MMCV: 0.6.2
    MMDetection: 2.2.0+unknown
    MMDetection Compiler: GCC 5.4
    MMDetection CUDA Compiler: 10.1

    I followed the instructions in the README and the MMDetection documentation, but training always crashes with the following error:

    2021-05-27 16:58:06,202 - mmdet - INFO - Epoch [1][1250/10687] lr: 1.000e-02, eta: 2 days, 6:48:27, time: 1.571, data_time: 0.010, memory: 3447, loss_cls: 0.4738, loss_bbox_init: 0.0343, loss_bbox_refine: 0.0738, loss_pose_init: 0.7736, loss_pose_refine: 1.4652, loss: 2.8207, grad_norm: 1.2611

    Traceback (most recent call last):
      File "code/tools/train.py", line 159, in <module>
        main()
      File "code/tools/train.py", line 155, in main
        meta=meta)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/apis/train.py", line 128, in train_detector
        runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/runner/epoch_based_runner.py", line 32, in train
        **kwargs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/parallel/data_parallel.py", line 31, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/base.py", line 237, in train_step
        losses = self(**data)
      File "/home/gy/anaconda3/envs/lsnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/fp16/decorators.py", line 51, in new_func
        return old_func(*args, **kwargs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/base.py", line 172, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/lsnet.py", line 55, in forward_train
        gt_masks, gt_labels, gt_bboxes_ignore)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 472, in forward_train
        losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1374, in loss
        label_channels=label_channels)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 978, in get_targets
        unmap_outputs=unmap_outputs)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/utils/misc.py", line 54, in multi_apply
        return tuple(map(list, zip(*map_results)))
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 831, in _target_single
        gt_bboxes_ignore, gt_labels)
      File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/bbox/assigners/atss_assigner.py", line 110, in assign
        self.topk, dim=0, largest=False)
    RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23

    Many thanks to anyone who can help me :)

    opened by Yuanye-F 10
  • A problem when I run your code

    When running the code, I ran into a problem that I cannot solve.

    The error is: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 9, 7, 5]], which is output 0 of AsStridedBackward, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    The mmdet version is 2.2.0 and the PyTorch version is 1.7.0.

    Thanks a lot for your help.

    opened by xiehousen 6
  • Error with segmentation

    Thanks for your great work! When I tested the code on the segmentation task, I got the following bug report:

    2021-05-14 21:13:25,373 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
    Traceback (most recent call last):
      File "tools/train.py", line 159, in <module>
        main()
      File "tools/train.py", line 155, in main
        meta=meta)
      File "/home/bit/ming7/LSNet/mmdet/apis/train.py", line 128, in train_detector
        runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
      File "/home/bit/ming7/LSNet/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home/bit/ming7/LSNet/mmcv/mmcv/runner/epoch_based_runner.py", line 27, in train
        for i, data_batch in enumerate(data_loader):
      File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
        return self._process_data(data)
      File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    TypeError: __init__() missing 4 required positional arguments: 'casting', 'from_', 'to', and 'i'
    

    It seems that there is something wrong with my annotations (refer to this url), but I have no idea how to solve it. Hope to get your reply, thanks!

    opened by ming71 4
  • A question about a code detail in the cross-IOU loss

    https://github.com/Duankaiwen/LSNet/blob/9b51ffcea4215ad981595ae7bb544b93e4b5f6fd/code/mmdet/models/losses/cross_iou_loss.py#L63

    Hello, and thank you for open-sourcing this excellent work. While studying the code, I do not quite understand the stride=9 used here when loss_type == 'segm'. For the segmentation task, what is the meaning and purpose of this stride, and why is it not needed when loss_type == 'bbox'?

    Happy New Year, and best wishes for your work!

    opened by dongdongyee 2
  • errors on training custom instance segmentation dataset

    I have prepared a custom instance segmentation dataset that contains 5 classes (not counting background). It works fine with the original MMDetection framework (e.g., DetectoRS, Mask R-CNN, HTC), but when I modified lsnet_segm_r50_fpn_1x_coco.py to train on this dataset, the system reported the following errors:

    File "/data2/lixuan/workspace/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1299, in loss gt_polygons, gt_bboxes = self.process_polygons(gt_masks, cls_scores) File "/data2/lixuan/workspace/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1742, in process_polygons gt_polygons_stack = torch.stack(gt_polygons) RuntimeError: stack expects a non-empty TensorList

    I checked lsnet_head.py and found that gt_masks is empty:

    def forward_train(self,
                      x,
                      img_metas,
                      gt_bboxes,
                      gt_extremes = None,
                      gt_keypoints = None,
                      gt_masks = None,
                      gt_labels = None,
                      gt_bboxes_ignore=None,
                      proposal_cfg = None,
                      **kwargs):
        outs = self(x)
        print(gt_masks)
        input()
    

    results: [PolygonMasks(num_masks=0, height=800, width=1088)]

    What causes this error, and how can I solve it?

    opened by kklots 2
  • The more weights

    Thank you for your good work. Can you provide the model weights for instance segmentation (trained with ResNet-50 and ResNet-101 for 12 epochs, without multi-scale training)?

    opened by hero-y 1
  • bbox-pose training results

    config: lsnet_pose_bbox_r50_fpn_1x_coco.py

    2021-05-31 22:31:03,588 - mmdet - INFO - Evaluating bbox...
    Loading and preparing results...
    DONE (t=0.75s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type bbox
    DONE (t=3.70s).
    Accumulating evaluation results...
    DONE (t=0.70s).
    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.448
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.625
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.499
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.151
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.607
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.734
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.186
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.482
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.509
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.150
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.816
    2021-05-31 22:31:08,760 - mmdet - INFO - Evaluating keypoints...
    Loading and preparing results...
    DONE (t=1.40s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type keypoints
    DONE (t=5.90s).
    Accumulating evaluation results...
    DONE (t=0.27s).
    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.401
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.732
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.392
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.389
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.450
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.510
    Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.818
    Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.532
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.469
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.567
    2021-05-31 22:31:16,511 - mmdet - INFO - Saving checkpoint at 12 epochs
    2021-05-31 22:31:16,811 - mmdet - INFO - Epoch(val) [12][16029] bbox_mAP: 0.4480, bbox_mAP_50: 0.6250, bbox_mAP_75: 0.4990, bbox_mAP_s: 0.1510, bbox_mAP_m: 0.6070, bbox_mAP_l: 0.7340, bbox_mAP_copypaste: 0.448 0.625 0.499 0.151 0.607 0.734, keypoints_mAP: 0.4010, keypoints_mAP_50: 0.7320, keypoints_mAP_75: 0.3920, keypoints_mAP_s: 0.3890, keypoints_mAP_m: 0.4500, keypoints_mAP_l: 0.5100, keypoints_mAP_copypaste: 0.401 0.732 0.392 0.389 0.450 0.510

    opened by eeric 1