Location-Sensitive Visual Recognition with Cross-IOU Loss

Kaiwen Duan

Last update: Dec 25, 2022

Overview

The trained models are temporarily unavailable, but you can train the models yourself with the provided code using reasonable computational resources.

Location-Sensitive Visual Recognition with Cross-IOU Loss

by Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang and Qi Tian

The code to train and evaluate the proposed LSNet is available here. For more technical details, please refer to our arXiv paper.

The location-sensitive visual recognition tasks, including object detection, instance segmentation, and human pose estimation, can be formulated into localizing an anchor point (in red) and a set of landmarks (in green). Our work aims to offer a unified framework for these tasks.

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing LSNet lies in the ability to fit various scales, for which we design a novel loss function named cross-IOU loss that computes the cross-IOU of each anchor-landmark pair to approximate the global IOU between the prediction and ground truth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition. Evaluated on the MS-COCO dataset, LSNet sets a new state-of-the-art accuracy for anchor-free object detection (a 53.5% box AP) and instance segmentation (a 40.2% mask AP), and shows promising performance in detecting multi-scale human poses.
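To make the idea of a per-pair cross-IOU more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation (that lives in cross_iou_loss.py). It assumes that each anchor-to-landmark offset is split into four non-negative directional components, that the opposite direction receives a small fraction alpha of the magnitude, and that the loss is one minus the ratio of summed element-wise minima to summed element-wise maxima. The function names, alpha = 0.2, and eps are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[float, float]


def decompose(offset: Offset, alpha: float = 0.2) -> List[float]:
    """Split a 2D anchor-to-landmark offset into four non-negative parts.

    Order: (+x, -x, +y, -y). The direction the offset points in gets the
    full magnitude; the opposite direction gets alpha times the magnitude.
    alpha is an assumed value for this sketch.
    """
    dx, dy = offset
    px, nx = (dx, alpha * dx) if dx >= 0 else (alpha * -dx, -dx)
    py, ny = (dy, alpha * dy) if dy >= 0 else (alpha * -dy, -dy)
    return [px, nx, py, ny]


def cross_iou(pred: Offset, gt: Offset, alpha: float = 0.2,
              eps: float = 1e-9) -> float:
    """Ratio of summed element-wise minima to summed element-wise maxima."""
    p = decompose(pred, alpha)
    g = decompose(gt, alpha)
    inter = sum(min(a, b) for a, b in zip(p, g))
    union = sum(max(a, b) for a, b in zip(p, g))
    return inter / (union + eps)


def cross_iou_loss(preds: Sequence[Offset], gts: Sequence[Offset],
                   alpha: float = 0.2) -> float:
    """Mean of (1 - cross-IOU) over all anchor-landmark pairs (assumed form)."""
    ious = [cross_iou(p, g, alpha) for p, g in zip(preds, gts)]
    return sum(1.0 - iou for iou in ious) / len(ious)
```

Because the score is a ratio, scaling the predicted and ground-truth offsets by the same factor leaves it unchanged, which is the scale-fitting property the abstract refers to: in this sketch, cross_iou((2, 0), (4, 0)) and cross_iou((20, 0), (40, 0)) give the same value.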

If you encounter any problems in using our code, please contact Kaiwen Duan: [email protected]

Bbox AP(%) on COCO test-dev

Method	Backbone	epoch	MS_train	AP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L

Anchor-based:
Libra R-CNN	X-101-64x4d	12	N	43.0	64.0	47.0	25.3	45.6	54.6
AB+FSAF*	X-101-64x4d	18	Y	44.6	65.2	48.6	29.7	47.1	54.6
FreeAnchor*	X-101-32x8d	24	Y	47.3	66.3	51.5	30.6	50.4	59.0
GFLV1*	X-101-32x8d	24	Y	48.2	67.4	52.6	29.2	51.7	60.2
ATSS*	X-101-64x4d-DCN	24	Y	50.7	68.9	56.3	33.2	52.9	62.4
PAA*	X-101-64x4d-DCN	24	Y	51.4	69.7	57.0	34.0	53.8	64.0
GFLV2*	R2-101-DCN	24	Y	53.3	70.9	59.2	35.7	56.1	65.6
YOLOv4-P7*	CSP-P7	450	Y	56.0	73.3	61.2	38.9	60.0	68.6

Anchor-free:
ExtremeNet*	HG-104	200	Y	43.2	59.8	46.4	24.1	46.0	57.1
RepPointsV1*	R-101-DCN	24	Y	46.5	67.4	50.9	30.3	49.7	57.1
SAPD	X-101-64x4d-DCN	24	Y	47.4	67.4	51.1	28.1	50.3	61.5
CornerNet*	HG-104	200	Y	42.1	57.8	45.3	20.8	44.8	56.7
DETR	R-101	500	Y	44.9	64.7	47.7	23.7	49.5	62.3
CenterNet*	HG-104	190	Y	47.0	64.5	50.7	28.9	49.9	58.9
CPNDet*	HG-104	100	Y	49.2	67.4	53.7	31.0	51.9	62.4
BorderDet*	X-101-64x4d-DCN	24	Y	50.3	68.9	55.2	32.8	52.8	62.3
FCOS-BiFPN	X-101-32x8-DCN	24	Y	50.4	68.9	55.0	33.2	53.0	62.7
RepPointsV2*	X-101-64x4d-DCN	24	Y	52.1	70.1	57.5	34.5	54.6	63.6

LSNet	R-50	24	Y	44.8	64.1	48.8	26.6	47.7	55.7
LSNet	X-101-64x4d	24	Y	48.2	67.6	52.6	29.6	51.3	60.5
LSNet	X-101-64x4d-DCN	24	Y	49.6	69.0	54.1	30.3	52.8	62.8
LSNet-CPV	X-101-64x4d-DCN	24	Y	50.4	69.4	54.5	31.0	53.3	64.0
LSNet-CPV	R2-101-DCN	24	Y	51.1	70.3	55.2	31.2	54.3	65.0
LSNet-CPV*	R2-101-DCN	24	Y	53.5	71.1	59.2	35.2	56.4	65.8

A comparison between LSNet and state-of-the-art methods in object detection on the MS-COCO test-dev set. LSNet surpasses all competitors in the anchor-free group. The abbreviations are: ‘R’ – ResNet, ‘X’ – ResNeXt, ‘HG’ – Hourglass network, ‘R2’ – Res2Net, ‘CPV’ – corner point verification, ‘MS_train’ – multi-scale training, * – multi-scale testing.

Segm AP(%) on COCO test-dev

Method	Backbone	epoch	AP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L

Pixel-based:
YOLACT	R-101	48	31.2	50.6	32.8	12.1	33.3	47.1
TensorMask	R-101	72	37.1	59.3	39.4	17.1	39.1	51.6
Mask R-CNN	X-101-32x4d	12	37.1	60.0	39.4	16.9	39.9	53.5
HTC	X-101-64x4d	20	41.2	63.9	44.7	22.8	43.9	54.6
DetectoRS*	X-101-64x4d	40	48.5	72.0	53.3	31.6	50.9	61.5

Contour-based:
ExtremeNet	HG-104	100	18.9	44.5	13.7	10.4	20.4	28.3
DeepSnake	DLA-34	120	30.3	-	-	-	-	-
PolarMask	X-101-64x4d-DCN	24	36.2	59.4	37.7	17.8	37.7	51.5

LSNet	X-101-64x4d-DCN	30	37.6	64.0	38.3	22.1	39.9	49.1
LSNet	R2-101-DCN	30	38.0	64.6	39.0	22.4	40.6	49.2
LSNet*	X-101-64x4d-DCN	30	39.7	65.5	41.3	25.5	41.3	50.4
LSNet*	R2-101-DCN	30	40.2	66.2	42.1	25.8	42.2	51.0

Comparison of LSNet to state-of-the-art methods on the instance segmentation task on the COCO test-dev set. Our LSNet achieves state-of-the-art accuracy for contour-based instance segmentation. ‘R’ - ResNet, ‘X’ - ResNeXt, ‘HG’ - Hourglass, ‘R2’ - Res2Net, * - multi-scale testing.

Keypoints AP(%) on COCO test-dev

Method	Backbone	epoch	AP	AP₅₀	AP₇₅	AP_M	AP_L

Heatmap-based:
CenterNet-jd	DLA-34	320	57.9	84.7	63.1	52.5	67.4
OpenPose	VGG-19	-	61.8	84.9	67.5	58.0	70.4
Pose-AE	HG	300	62.8	84.6	69.2	57.5	70.6
CenterNet-jd	HG104	150	63.0	86.8	69.6	58.9	70.4
Mask R-CNN	R-50	28	63.1	87.3	68.7	57.8	71.4
PersonLab	R-152	>1000	66.5	85.5	71.3	62.3	70.0
HRNet	HRNet-W32	210	74.9	92.5	82.8	71.3	80.9

Regression-based:
CenterNet-reg [66]	DLA-34	320	51.7	81.4	55.2	44.6	63.0
CenterNet-reg [66]	HG-104	150	55.0	83.5	59.7	49.4	64.0

LSNet w/ obj-box	X-101-64x4d-DCN	60	55.7	81.3	61.0	52.9	60.5
LSNet w/ kps-box	X-101-64x4d-DCN	20	59.0	83.6	65.2	53.3	67.9

Comparison of LSNet to state-of-the-art methods on the pose estimation task on the COCO test-dev set. LSNet predicts the keypoints by regression. ‘obj-box’ and ‘kps-box’ denote the object bounding boxes and the keypoint-boxes, respectively. For LSNet w/ kps-box, we fine-tune the model from the LSNet w/ obj-box for another 20 epochs.

Visualization

Some location-sensitive visual recognition results on the MS-COCO validation set.

We compare with CenterNet to show that our LSNet w/ ‘obj-box’ tends to predict more small-scale human poses, which are not annotated in the dataset. Only pose results with scores higher than 0.3 are shown for both methods.

Left: LSNet uses the object bounding boxes to assign training samples. Right: LSNet uses the keypoint-boxes to assign training samples. Although LSNet with keypoint-boxes achieves a higher AP score, its ability to perceive multi-scale human instances is weakened.

Preparation

The master branch works with PyTorch 1.5.0

The dataset directory should be like this:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   │   ├── test2017
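Before generating annotations, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: the helper name missing_coco_dirs and the default root data/coco are assumptions taken from the tree shown above.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the COCO root, per the tree above.
EXPECTED_DIRS = [
    "annotations",
    "images/train2017",
    "images/val2017",
    "images/test2017",
]


def missing_coco_dirs(root: str = "data/coco") -> List[str]:
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

Calling missing_coco_dirs() from the directory that contains data returns an empty list when everything is where the tree says it should be; otherwise it lists the missing paths.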

Generate extreme point annotation from segmentation:

cd code/tools
python gen_coco_lsvr.py
cd ..

Installation

1. Installing cocoapi

cd cocoapi/pycocotools
python setup.py develop
cd ../..

2. Installing mmcv

cd mmcv
pip install -e .
cd ..

3. Installing mmdet

python setup.py develop

Training and Evaluation

Our LSNet is based on mmdetection. Please refer to the mmdetection guide on working with existing datasets for training and evaluation.

Comments

RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23
I ran into a problem during training. The environment is:

sys.platform: linux
Python: 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

GCC 7.3

C++ Version: 201402

Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)

OpenMP 201511 (a.k.a. OpenMP 4.5)

NNPACK is enabled

CPU capability usage: AVX2

CUDA Runtime 10.2

NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37

CuDNN 7.6.5

Magma 2.5.2

Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0
OpenCV: 4.5.2
MMCV: 0.6.2
MMDetection: 2.2.0+unknown
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1

I followed the instructions in the README and the MMDetection documentation, but training always crashes with the following error:

2021-05-27 16:58:06,202 - mmdet - INFO - Epoch [1][1250/10687] lr: 1.000e-02, eta: 2 days, 6:48:27, time: 1.571, data_time: 0.010, memory: 3447, loss_cls: 0.4738, loss_bbox_init: 0.0343, loss_bbox_refine: 0.0738, loss_pose_init: 0.7736, loss_pose_refine: 1.4652, loss: 2.8207, grad_norm: 1.2611

Traceback (most recent call last):
  File "code/tools/train.py", line 159, in <module>
    main()
  File "code/tools/train.py", line 155, in main
    meta=meta)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/apis/train.py", line 128, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/runner/epoch_based_runner.py", line 32, in train
    **kwargs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmcv/mmcv/parallel/data_parallel.py", line 31, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/gy/anaconda3/envs/lsnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/fp16/decorators.py", line 51, in new_func
    return old_func(*args, **kwargs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/detectors/lsnet.py", line 55, in forward_train
    gt_masks, gt_labels, gt_bboxes_ignore)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 472, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1374, in loss
    label_channels=label_channels)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 978, in get_targets
    unmap_outputs=unmap_outputs)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/utils/misc.py", line 54, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 831, in _target_single
    gt_bboxes_ignore, gt_labels)
  File "/home/gy/receive_client/yuanye/LSNet/code/mmdet/core/bbox/assigners/atss_assigner.py", line 110, in assign
    self.topk, dim=0, largest=False)
RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23

Many thanks to anyone who can help me :)
opened by Yuanye-F 10
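For readers hitting the same message: torch.topk raises "k not in range for dimension" when k is larger than the size of the dimension being searched. Whether that is the root cause in this report is not confirmed in the thread. The plain-Python sketch below only illustrates the usual guard of clamping k to the number of available candidates; k_smallest_indices is a hypothetical helper, not code from atss_assigner.py.

```python
from typing import List, Sequence


def k_smallest_indices(distances: Sequence[float], k: int) -> List[int]:
    """Return indices of the k smallest distances.

    k is clamped to len(distances), so asking for more candidates than
    exist returns all of them instead of failing.
    """
    k = min(k, len(distances))
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:k]
```

In PyTorch the analogous guard would be to pass min(k, tensor.size(dim)) as the k argument of topk.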
A problem when I run your code

When I run the code, I hit a problem that I cannot solve.

The error is: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 9, 7, 5]], which is output 0 of AsStridedBackward, is at version 6; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The mmdet version is 2.2.0, and the PyTorch version is 1.7.0.

Thanks a lot for your help.

opened by xiehousen 6

Error with segmentation

Thanks for your great work! When I tested the code on the segmentation task, I got the following error:

2021-05-14 21:13:25,373 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "tools/train.py", line 159, in <module>
    main()
  File "tools/train.py", line 155, in main
    meta=meta)
  File "/home/bit/ming7/LSNet/mmdet/apis/train.py", line 128, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/bit/ming7/LSNet/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bit/ming7/LSNet/mmcv/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/bit/anaconda2/envs/lsnet/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
TypeError: __init__() missing 4 required positional arguments: 'casting', 'from_', 'to', and 'i'

Is there something wrong with my annotations (refer to this url)? I have no idea how to solve it. I hope to get your reply, thanks!

opened by ming71 4

Question about a detail of the cross-IOU code

https://github.com/Duankaiwen/LSNet/blob/9b51ffcea4215ad981595ae7bb544b93e4b5f6fd/code/mmdet/models/losses/cross_iou_loss.py#L63

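Hello, and thank you for open-sourcing this excellent work. While studying the code, I could not quite understand stride=9 here when loss_type == 'segm'. For the segmentation task, what is the meaning and purpose of this stride? And why is it not needed when loss_type == 'bbox'?

Happy Lunar New Year, and best wishes for your work!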

opened by dongdongyee 2
errors on training custom instance segmentation dataset
I have prepared a custom instance segmentation dataset, which contains 5 classes (not counting background). It trains fine with the original MMDetection framework (for example DetectoRS, Mask R-CNN, HTC), but when I modified the file lsnet_segm_r50_fpn_1x_coco.py to train on this dataset, the system reports these errors:

  File "/data2/lixuan/workspace/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1299, in loss
    gt_polygons, gt_bboxes = self.process_polygons(gt_masks, cls_scores)
  File "/data2/lixuan/workspace/LSNet/code/mmdet/models/dense_heads/lsnet_head.py", line 1742, in process_polygons
    gt_polygons_stack = torch.stack(gt_polygons)
RuntimeError: stack expects a non-empty TensorList

I checked file lsnet_head.py, and found that the gt_masks is empty:

def forward_train(self, x, img_metas, gt_bboxes, gt_extremes=None,
                  gt_keypoints=None, gt_masks=None, gt_labels=None,
                  gt_bboxes_ignore=None, proposal_cfg=None, **kwargs):
    outs = self(x)
    print(gt_masks)  # debug: inspect the ground-truth masks
    input()          # pause until Enter is pressed

results: [PolygonMasks(num_masks=0, height=800, width=1088)]

What causes this error, and how can I solve it?
opened by kklots 2
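The printed PolygonMasks(num_masks=0, ...) means at least one image reached the head with no ground-truth masks. One possible cause, which is an assumption and not confirmed in this thread, is that the annotation file contains images with no instances for the configured class names, for example because the names in the config do not match the category names in the file. The sketch below checks a COCO-format annotation dictionary for such images; images_without_instances is a hypothetical helper, not part of the repository.

```python
from typing import Dict, List, Set


def images_without_instances(coco: Dict, class_names: List[str]) -> List[int]:
    """Return ids of images with no annotation belonging to class_names.

    coco is a loaded COCO-format annotation dictionary with the standard
    "images", "annotations", and "categories" keys.
    """
    wanted: Set[int] = {
        c["id"] for c in coco["categories"] if c["name"] in class_names
    }
    covered: Set[int] = {
        a["image_id"] for a in coco["annotations"] if a["category_id"] in wanted
    }
    return [img["id"] for img in coco["images"] if img["id"] not in covered]
```

If the returned list is non-empty, those images would yield empty ground truth for the configured classes and are candidates for filtering or relabeling.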
The more weights

Thank you for your good work. Could you provide the model weights for instance segmentation (trained with ResNet-50 and ResNet-101 for 12 epochs without multi-scale training)?

opened by hero-y 1
bbox-pose training results

config: lsnet_pose_bbox_r50_fpn_1x_coco.py

2021-05-31 22:31:03,588 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.75s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=3.70s).
Accumulating evaluation results...
DONE (t=0.70s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.448
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.625
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.499
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.151
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.607
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.734
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.186
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.482
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.509
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.150
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.816
2021-05-31 22:31:08,760 - mmdet - INFO - Evaluating keypoints...
Loading and preparing results...
DONE (t=1.40s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=5.90s).
Accumulating evaluation results...
DONE (t=0.27s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.401
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.732
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.392
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.389
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.510
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.818
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.532
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.469
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.567
2021-05-31 22:31:16,511 - mmdet - INFO - Saving checkpoint at 12 epochs
2021-05-31 22:31:16,811 - mmdet - INFO - Epoch(val) [12][16029] bbox_mAP: 0.4480, bbox_mAP_50: 0.6250, bbox_mAP_75: 0.4990, bbox_mAP_s: 0.1510, bbox_mAP_m: 0.6070, bbox_mAP_l: 0.7340, bbox_mAP_copypaste: 0.448 0.625 0.499 0.151 0.607 0.734, keypoints_mAP: 0.4010, keypoints_mAP_50: 0.7320, keypoints_mAP_75: 0.3920, keypoints_mAP_s: 0.3890, keypoints_mAP_m: 0.4500, keypoints_mAP_l: 0.5100, keypoints_mAP_copypaste: 0.401 0.732 0.392 0.389 0.450 0.510

opened by eeric 1

Owner

Kaiwen Duan

GitHub

A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection

Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection 1. Introduction: a replacement for NMS that selects the optimal set from all bboxes. NMS considers only the bbox scores, and then, based on IoU,

44 Sep 15, 2022

Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

177 Dec 29, 2022

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

147 Dec 5, 2022

Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

11 Oct 21, 2021

《Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching》(CVPR 2020)

This contains the codes for cross-view geo-localization method described in: Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching, CVPR2020.

41 Oct 27, 2022

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

32 Sep 21, 2022

An implementation for the loss function proposed in Decoupled Contrastive Loss paper.

Decoupled-Contrastive-Learning This repository is an implementation for the loss function proposed in Decoupled Contrastive Loss paper. Requirements P

71 Dec 4, 2022

Implement of "Training deep neural networks via direct loss minimization" in PyTorch for 0-1 loss

This is the implementation of "Training deep neural networks via direct loss minimization" published at ICML 2016 in PyTorch. The implementation targe

1 Jan 18, 2022

Cross Quality LFW: A database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments

Cross-Quality Labeled Faces in the Wild (XQLFW) Here, we release the database, evaluation protocol and code for the following paper: Cross Quality LFW

10 Dec 12, 2022

This repository allows you to anonymize sensitive information in images/videos. The solution is fully compatible with the DL-based training/inference solutions that we already published/will publish for Object Detection and Semantic Segmentation.

BMW-Anonymization-Api Data privacy and individuals’ anonymity are and always have been a major concern for data-driven companies. Therefore, we design

148 Dec 21, 2022

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks [Paper] [Project Website] This repository holds the source code, pretra

83 Dec 21, 2022

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral)

DSA^2 F: Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion (CVPR'2021, Oral) This repo is the official imp

46 Dec 21, 2022

A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning

Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

1 Nov 19, 2021

Pull sensitive data from users on windows including discord tokens and chrome data.

Pegasus Pull sensitive data from users on windows including discord tokens and chrome data. Features Discord tokens Geolocation data

44 Dec 31, 2022

Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

70 Jan 3, 2023

CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

UC2 UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu,

28 Dec 30, 2022

This repository contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".

18 Dec 9, 2022

improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

310 Dec 28, 2022

Code for the paper: Adversarial Training Against Location-Optimized Adversarial Patches. ECCV-W 2020.

Adversarial Training Against Location-Optimized Adversarial Patches arXiv | Paper | Code | Video | Slides Code for the paper: Sukrut Rao, David Stutz,

32 Dec 13, 2022