Complete-IoU (CIoU) Loss and Cluster-NMS for Object Detection and Instance Segmentation (YOLACT)

Overview

Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation.

Our paper has been accepted by IEEE Transactions on Cybernetics (TCYB).

This repo is based on YOLACT++.

This is the code for our papers:

@Inproceedings{zheng2020diou,
  author    = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
  title     = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2020},
}

@Article{zheng2021ciou,
  author    = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
  title     = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
  journal   = {IEEE Transactions on Cybernetics},
  year      = {2021},
}

Description of Cluster-NMS and Its Usage

An example diagram of our Cluster-NMS, where X denotes the IoU matrix, computed as X=jaccard(boxes,boxes).triu_(diagonal=1) > nms_thresh after the boxes are sorted by score in descending order. (Here 0 and 1 are used for visualization.)
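
The iteration on the binarized upper-triangular matrix X can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tensor implementation in layers/functions/detection.py: the helper names `iou` and `cluster_nms` and the list-based matrix are chosen here for readability only.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_nms(boxes, scores, nms_thresh=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # X[i][j] = 1 when i < j (upper triangle) and the IoU exceeds the threshold.
    X = [[int(i < j and iou(boxes[order[i]], boxes[order[j]]) > nms_thresh)
          for j in range(n)] for i in range(n)]
    keep = [1] * n  # the first pass therefore reproduces Fast NMS
    for _ in range(n):
        # Zero the rows of suppressed boxes (diag(keep) * X), then suppress
        # box j only if a still-kept, higher-scoring box overlaps it.
        new_keep = [int(not any(keep[i] and X[i][j] for i in range(j)))
                    for j in range(n)]
        if new_keep == keep:  # converged
            break
        keep = new_keep
    return [order[j] for j in range(n) if keep[j]]


# A chain: box 1 overlaps box 0, box 2 overlaps box 1 but not box 0.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (5, 0, 15, 10)]
scores = [0.9, 0.8, 0.7]
kept = cluster_nms(boxes, scores)
print(kept)
```

In this chain, a single pass (Fast NMS) would drop both box 1 and box 2. The second pass restores box 2, because its only suppressor, box 1, has itself been suppressed.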

The inputs of NMS are boxes with size [n,4] and scores with size [80,n] (taking COCO as an example).

There are two ways to run NMS. In the first, all classes have the same number of boxes. We first use top k=200 to select the top 200 detections for every class, so boxes becomes [80,m,4] with m<=200. We then run Cluster-NMS and keep the boxes with scores>0.01. Finally, we return the top 100 boxes across all classes.

In the second, different classes may have different numbers of boxes. We first apply a score threshold (e.g. 0.01) to filter out most low-score detection boxes, so the number of remaining boxes may differ between classes. We then put all the boxes together and sort them by score in descending order. (Note that the same box may appear more than once, because its scores for multiple classes can exceed the threshold 0.01.) Next, we add an offset to every box according to its class label (using torch.arange(0,80)). For example, the coordinates (x1,y1,x2,y2) of all boxes lie in the interval (0,1); after adding the offset, the coordinates of a box belonging to class 61 lie in the interval (60,61). The IoU between boxes of different classes is then 0, because they are treated as different clusters. Finally, we run Cluster-NMS and return the top 100 boxes across all classes. (For this method, please refer to our other repository: https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/detection/detection.py)
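
The class-offset trick can be illustrated with a small plain-Python sketch. It assumes normalized boxes in (0,1) and 0-based class indices (so class 61 has index 60); the helper names `iou` and `add_class_offset` are hypothetical and are not taken from the linked repository.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def add_class_offset(boxes, labels):
    """Shift every coordinate of a box by its 0-based class index."""
    return [tuple(c + label for c in box) for box, label in zip(boxes, labels)]


# The same normalized box is reported for class index 3 (twice) and index 60.
boxes = [(0.1, 0.1, 0.5, 0.5)] * 3
labels = [3, 3, 60]
shifted = add_class_offset(boxes, labels)

same_class_iou = iou(shifted[0], shifted[1])   # boxes still coincide
cross_class_iou = iou(shifted[0], shifted[2])  # boxes no longer intersect
print(same_class_iou, cross_class_iou)
```

Because boxes of different classes no longer intersect after the shift, a single class-agnostic NMS pass over all boxes behaves like per-class NMS.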

Getting Started

1) Newly released! CIoU and Cluster-NMS

  1. YOLACT (See YOLACT)

  2. YOLOv3-pytorch https://github.com/Zzh-tju/ultralytics-YOLOv3-Cluster-NMS

  3. YOLOv5 (Support batch mode Cluster-NMS. It will speed up NMS when turning on test-time augmentation like multi-scale testing.) https://github.com/Zzh-tju/yolov5

  4. SSD-pytorch https://github.com/Zzh-tju/DIoU-SSD-pytorch

2) DIoU and CIoU Losses in Detection Algorithms

DIoU and CIoU losses are incorporated into state-of-the-art detection algorithms, including YOLO v3, SSD and Faster R-CNN. Implementation details and comparisons for each can be found at the following links.

  1. YOLO v3 https://github.com/Zzh-tju/DIoU-darknet

  2. SSD https://github.com/Zzh-tju/DIoU-SSD-pytorch

  3. Faster R-CNN https://github.com/Zzh-tju/DIoU-pytorch-detectron

  4. Simulation Experiment https://github.com/Zzh-tju/DIoU

YOLACT

Code location and options

See the ciou function in layers/modules/multibox_loss.py for our CIoU loss implementation in PyTorch.
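
For orientation, the CIoU loss for a single box pair can be written in a few lines of plain Python. This is a sketch of the published formula (IoU, a normalized center-distance term, and an aspect-ratio term v weighted by alpha), not a copy of the ciou function in this repo. The name `ciou_loss`, the (x1, y1, x2, y2) box layout, and the `eps` stabilizer are assumptions made for this example.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for one predicted and one target box, both (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared center distance over the squared diagonal of the enclosing box.
    rho2 = (((px1 + px2) - (tx1 + tx2)) ** 2 + ((py1 + py2) - (ty1 + ty2)) ** 2) / 4
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


loss_same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
loss_far = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
print(loss_same, loss_far)
```

Unlike the plain IoU loss, which is 1 for every non-overlapping pair, the distance term keeps the loss informative there: the disjoint pair above gets a loss greater than 1 that shrinks as the centers approach.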

Currently, NMS supports two modes (see eval.py):

  1. Cross-class mode, which ignores classes. (cross_class_nms=True, faster than per-class mode but with a slight performance drop.)

  2. Per-class mode. (cross_class_nms=False)

Currently, NMS supports fast_nms, cluster_nms, cluster_diounms, spm, spm_dist, spm_dist_weighted.

See layers/functions/detection.py for our Cluster-NMS implementation in PyTorch.

Installation

In order to use YOLACT++, make sure you compile the DCNv2 code.

  • Clone this repository and enter it:
    git clone https://github.com/Zzh-tju/CIoU.git
    cd CIoU
  • Set up the environment using one of the following methods:
    • Using Anaconda
      • Run conda env create -f environment.yml
    • Manually with pip
      • Set up a Python3 environment (e.g., using virtualenv).
      • Install PyTorch 1.0.1 (or higher) and TorchVision.
      • Install some other packages:
        # Cython needs to be installed before pycocotools
        pip install cython
        pip install opencv-python pillow pycocotools matplotlib 
  • If you'd like to train YOLACT, download the COCO dataset and the 2014/2017 annotations. Note that this script will take a while and dump 21 GB of files into ./data/coco.
    sh data/scripts/COCO.sh
  • If you'd like to evaluate YOLACT on test-dev, download test-dev with this script.
    sh data/scripts/COCO_test.sh
  • If you want to use YOLACT++, compile the deformable convolutional layers (from DCNv2). Make sure you have the latest CUDA toolkit installed from NVIDIA's website.
    cd external/DCNv2
    python setup.py build develop

Evaluation

Here are our YOLACT models (released on May 5th, 2020) along with their FPS on a GTX 1080 Ti and mAP on COCO 2017 val:

Training was carried out on two GTX 1080 Ti GPUs with the command: python train.py --config=yolact_base_config --batch_size=8

| Image Size | Backbone | Loss | NMS | FPS | box AP | mask AP | Weights |
|---|---|---|---|---|---|---|---|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 30.6 | 31.5 | 29.1 | SL1.pth |
| 550 | Resnet101-FPN | CIoU | Fast NMS | 30.6 | 32.1 | 29.6 | CIoU.pth |

To evaluate the model, put the corresponding weights file in the ./weights directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., yolact_base for yolact_base_54_800000.pth).
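
The naming rule above can be sketched as a small helper. It assumes the <config>_<epoch>_<iter>.pth pattern that this README gives for saved weights; `parse_weights_name` is a hypothetical function written for illustration, not part of eval.py, and it does not handle *_interrupt.pth files.

```python
import os
import re


def parse_weights_name(path):
    """Split '<config>_<epoch>_<iter>.pth' into (config, epoch, iteration)."""
    name = os.path.splitext(os.path.basename(path))[0]
    match = re.fullmatch(r"(?P<config>.+)_(?P<epoch>\d+)_(?P<iteration>\d+)", name)
    if match is None:
        raise ValueError(f"unexpected weights file name: {path}")
    return match["config"], int(match["epoch"]), int(match["iteration"])


parsed = parse_weights_name("weights/yolact_base_54_800000.pth")
print(parsed)
```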

Quantitative Results on COCO

# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.

# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json

# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py

# To output a COCO json file for test-dev, make sure you have test-dev downloaded as described above, then run:
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

Qualitative Results on COCO

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display

Cluster-NMS Using Benchmark on COCO

python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark

Hardware

  • 1 GTX 1080 Ti
  • Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 550 | Resnet101-FPN | CIoU | Fast NMS | 30.6 | 32.1 | 33.9 | 43.0 | 29.6 | 30.9 | 40.3 |
| 550 | Resnet101-FPN | CIoU | Original NMS | 11.5 | 32.5 | 34.1 | 45.1 | 29.7 | 31.0 | 41.7 |
| 550 | Resnet101-FPN | CIoU | Cluster-NMS | 28.8 | 32.5 | 34.1 | 45.2 | 29.7 | 31.0 | 41.7 |
| 550 | Resnet101-FPN | CIoU | SPM Cluster-NMS | 28.6 | 33.1 | 35.2 | 48.8 | 30.3 | 31.7 | 43.6 |
| 550 | Resnet101-FPN | CIoU | SPM + Distance Cluster-NMS | 27.1 | 33.2 | 35.2 | 49.2 | 30.2 | 31.7 | 43.8 |
| 550 | Resnet101-FPN | CIoU | SPM + Distance + Weighted Cluster-NMS | 26.5 | 33.4 | 35.5 | 49.1 | 30.3 | 31.6 | 43.8 |

The following table was evaluated using the pretrained YOLACT weights released by the YOLACT authors (yolact_resnet50_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 550 | Resnet50-FPN | SL1 | Fast NMS | 41.6 | 30.2 | 31.9 | 42.0 | 28.0 | 29.1 | 39.4 |
| 550 | Resnet50-FPN | SL1 | Original NMS | 12.8 | 30.7 | 32.0 | 44.1 | 28.1 | 29.2 | 40.7 |
| 550 | Resnet50-FPN | SL1 | Cluster-NMS | 38.2 | 30.7 | 32.0 | 44.1 | 28.1 | 29.2 | 40.7 |
| 550 | Resnet50-FPN | SL1 | SPM Cluster-NMS | 37.7 | 31.3 | 33.2 | 48.0 | 28.8 | 29.9 | 42.8 |
| 550 | Resnet50-FPN | SL1 | SPM + Distance Cluster-NMS | 35.2 | 31.3 | 33.3 | 48.2 | 28.7 | 29.9 | 42.9 |
| 550 | Resnet50-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 34.2 | 31.8 | 33.9 | 48.3 | 28.8 | 29.9 | 43.0 |

The following table was evaluated using the pretrained YOLACT weights released by the YOLACT authors (yolact_base_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 30.6 | 32.5 | 34.6 | 43.9 | 29.8 | 31.3 | 40.8 |
| 550 | Resnet101-FPN | SL1 | Original NMS | 11.9 | 32.9 | 34.8 | 45.8 | 29.9 | 31.4 | 42.1 |
| 550 | Resnet101-FPN | SL1 | Cluster-NMS | 29.2 | 32.9 | 34.8 | 45.9 | 29.9 | 31.4 | 42.1 |
| 550 | Resnet101-FPN | SL1 | SPM Cluster-NMS | 28.8 | 33.5 | 35.9 | 49.7 | 30.5 | 32.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance Cluster-NMS | 27.5 | 33.5 | 35.9 | 50.2 | 30.4 | 32.0 | 44.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 26.7 | 34.0 | 36.6 | 49.9 | 30.5 | 32.0 | 44.3 |

The following table was evaluated using the pretrained YOLACT++ weights released by the YOLACT authors (yolact_plus_base_54_800000.pth).

| Image Size | Backbone | Loss | NMS | FPS | box AP | box AP75 | box AR100 | mask AP | mask AP75 | mask AR100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 550 | Resnet101-FPN | SL1 | Fast NMS | 25.1 | 35.8 | 38.7 | 45.5 | 34.4 | 36.8 | 42.6 |
| 550 | Resnet101-FPN | SL1 | Original NMS | 10.9 | 36.4 | 39.1 | 48.0 | 34.7 | 37.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | Cluster-NMS | 23.7 | 36.4 | 39.1 | 48.0 | 34.7 | 37.1 | 44.1 |
| 550 | Resnet101-FPN | SL1 | SPM Cluster-NMS | 23.2 | 36.9 | 40.1 | 52.8 | 35.0 | 37.5 | 46.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance Cluster-NMS | 22.0 | 36.9 | 40.2 | 53.0 | 34.9 | 37.5 | 46.3 |
| 550 | Resnet101-FPN | SL1 | SPM + Distance + Weighted Cluster-NMS | 21.7 | 37.4 | 40.6 | 52.5 | 35.0 | 37.6 | 46.3 |

Note:

  • One thing we did that does not appear in the paper: SPM + Distance + Weighted Cluster-NMS. Here the weighted average of box coordinates is only performed for boxes with IoU>0.8. We found that IoU>0.5 does not work well for YOLACT, and IoU>0.9 gives almost the same result as SPM + Distance Cluster-NMS. (Refer to CAD for the details of Weighted-NMS.)

  • The Original NMS implemented by YOLACT is faster than ours because it first applies a score threshold (0.05) to obtain the set of candidate boxes, which makes the subsequent NMS faster (taking YOLACT ResNet101-FPN as an example, 22 ~ 23 FPS with a slight performance drop). To get the same result as our Cluster-NMS, we modified the Original NMS procedure.

  • Note that Torchvision NMS is the fastest, owing to its CUDA implementation and engineering accelerations (such as computing only the upper triangular IoU matrix). However, our Cluster-NMS requires fewer iterations and can also be further accelerated by adopting engineering tricks.

  • Currently, Torchvision NMS uses IoU as its criterion, not DIoU. If we directly replace IoU with DIoU in Original NMS, it costs much more time because of the sequential operation. Cluster-DIoU-NMS significantly speeds up DIoU-NMS and obtains exactly the same result.

  • Torchvision NMS is a function available in Torchvision>=0.3, whereas our Cluster-NMS can be applied to any project that uses an older version of Torchvision, or to other deep learning frameworks, as long as they support matrix operations. It needs no extra imports and no compilation, takes fewer iterations, is fully GPU-accelerated, and gives better performance.
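
The weighted-coordinate step mentioned in the first note can be sketched for a single kept box as follows. The 0.8 IoU cut-off comes from the note above; weighting each contributing box by score × IoU is an assumption made for this example, and `weighted_merge` is a hypothetical helper rather than the repo's implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(kept_box, kept_score, neighbors, merge_thresh=0.8):
    """Average a kept box with neighbors whose IoU with it exceeds merge_thresh.

    neighbors is a list of (box, score) pairs. Assumed weight: score * IoU.
    """
    total = kept_score  # the kept box has IoU 1 with itself
    acc = [c * kept_score for c in kept_box]
    for box, score in neighbors:
        overlap = iou(kept_box, box)
        if overlap > merge_thresh:
            weight = score * overlap
            total += weight
            acc = [a + c * weight for a, c in zip(acc, box)]
    return tuple(a / total for a in acc)


merged = weighted_merge(
    (0, 0, 10, 10), 0.9,
    [((1, 0, 11, 10), 0.6),   # IoU = 90/110, above 0.8, so it contributes
     ((4, 0, 14, 10), 0.8)],  # IoU = 60/140, below 0.8, so it is ignored
)
print(merged)
```

Only the first neighbor pulls the kept box slightly to the right; the second overlaps too little to take part in the average.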

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4

As you can tell, eval.py can do a ton of stuff. Run the --help command to see everything it can do.

python eval.py --help

Training

By default, we train on COCO. Make sure to download the entire dataset using the commands above.

  • To train, grab an ImageNet-pretrained model and put it in ./weights.
    • For Resnet101, download resnet101_reducedfc.pth from here.
    • For Resnet50, download resnet50-19c8e357.pth from here.
    • For Darknet53, download darknet53.pth from here.
  • Run one of the training commands below.
    • Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
    • All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.
# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Multi-GPU Support

YOLACT now supports multiple GPUs seamlessly during training:

  • Before running any of the scripts, run: export CUDA_VISIBLE_DEVICES=[gpus]
    • Where you should replace [gpus] with a comma separated list of the index of each GPU you want to use (e.g., 0,1,2,3).
    • You should still do this if only using 1 GPU.
    • You can check the indices of your GPUs with nvidia-smi.
  • Then, simply set the batch size to 8*num_gpus with the training commands above. The training script will automatically scale the hyperparameters to the right values.
    • If you have memory to spare you can increase the batch size further, but keep it a multiple of the number of GPUs you're using.
    • If you want to assign a different number of images to each GPU, you can use --batch_alloc=[alloc], where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to batch_size.
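
The two batch-size rules above (8 images per GPU by default, and an allocation list that sums to batch_size) can be checked with a short sketch before launching training. `parse_batch_alloc` is a hypothetical helper for illustration; the checks train.py itself performs may differ.

```python
def parse_batch_alloc(alloc, batch_size, num_gpus):
    """Validate a comma-separated per-GPU allocation such as '10,6'."""
    counts = [int(part) for part in alloc.split(",")]
    if len(counts) != num_gpus:
        raise ValueError(f"expected {num_gpus} entries, got {len(counts)}")
    if sum(counts) != batch_size:
        raise ValueError(f"allocation sums to {sum(counts)}, not {batch_size}")
    return counts


num_gpus = 2                 # e.g. CUDA_VISIBLE_DEVICES=0,1
batch_size = 8 * num_gpus    # the default of 8 images per GPU
counts = parse_batch_alloc("10,6", batch_size, num_gpus)
print(batch_size, counts)
```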

Acknowledgments

Thank you to Daniel Bolya for YOLACT & YOLACT++, which are excellent work on real-time instance segmentation.
