SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.

Overview

SOLO: Segmenting Objects by Locations

This project hosts the code for implementing the SOLO algorithms for instance segmentation.

SOLO: Segmenting Objects by Locations,
Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei Li
In: Proc. European Conference on Computer Vision (ECCV), 2020
arXiv preprint (arXiv:1912.04488)

SOLOv2: Dynamic and Fast Instance Segmentation,
Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen
In: Proc. Advances in Neural Information Processing Systems (NeurIPS), 2020
arXiv preprint (arXiv:2003.10152)

Highlights

  • Totally box-free: SOLO is entirely box-free, so it is not restricted by (anchor) box locations and scales, and it naturally benefits from the inherent advantages of FCNs.
  • Direct instance segmentation: Our method takes an image as input and directly outputs instance masks with corresponding class probabilities, in a fully convolutional, box-free, and grouping-free paradigm.
  • High-quality mask prediction: SOLOv2 is able to predict fine and detailed masks, especially at object boundaries.
  • State-of-the-art performance: Our best single model, based on ResNet-101 and deformable convolutions, achieves 41.7% AP on COCO test-dev (without multi-scale testing). A light-weight version of SOLOv2 runs at 31.3 FPS on a single V100 GPU and yields 37.1% AP.

Updates

  • SOLOv2 implemented on detectron2 is released at adet. (07/12/2020)
  • Training is ~1.7x faster for all models. (03/12/2020)
  • SOLOv2 is available. Code and trained models of SOLOv2 are released. (08/07/2020)
  • Light-weight models and R101-based models are available. (31/03/2020)
  • SOLOv1 is available. Code and trained models of SOLO and Decoupled SOLO are released. (28/03/2020)

Installation

This implementation is based on mmdetection (v1.0.0). Please refer to INSTALL.md for installation and dataset preparation.

Models

For your convenience, we provide the following trained models on COCO (more models are coming soon). If you need the models in the PaddlePaddle framework, please refer to paddlepaddle/README.md.

Model                    Multi-scale training   Testing time / im   AP (minival)   Link
SOLO_R50_1x              No                     77ms                32.9           download
SOLO_R50_3x              Yes                    77ms                35.8           download
SOLO_R101_3x             Yes                    86ms                37.1           download
Decoupled_SOLO_R50_1x    No                     85ms                33.9           download
Decoupled_SOLO_R50_3x    Yes                    85ms                36.4           download
Decoupled_SOLO_R101_3x   Yes                    92ms                37.9           download
SOLOv2_R50_1x            No                     54ms                34.8           download
SOLOv2_R50_3x            Yes                    54ms                37.5           download
SOLOv2_R101_3x           Yes                    66ms                39.1           download
SOLOv2_R101_DCN_3x       Yes                    97ms                41.4           download
SOLOv2_X101_DCN_3x       Yes                    169ms               42.4           download

Light-weight models:

Model                             Multi-scale training   Testing time / im   AP (minival)   Link
Decoupled_SOLO_Light_R50_3x       Yes                    29ms                33.0           download
Decoupled_SOLO_Light_DCN_R50_3x   Yes                    36ms                35.0           download
SOLOv2_Light_448_R18_3x           Yes                    19ms                29.6           download
SOLOv2_Light_448_R34_3x           Yes                    20ms                32.0           download
SOLOv2_Light_448_R50_3x           Yes                    24ms                33.7           download
SOLOv2_Light_512_DCN_R50_3x       Yes                    34ms                36.4           download

Disclaimer:

  • Light-weight means a light-weight backbone and head, plus a smaller input size. Please refer to the corresponding config files for details.
  • This is a reimplementation, and the numbers differ slightly from our original paper (within 0.3% in mask AP).

Usage

A quick demo

Once the installation is done, you can download the provided models and use inference_demo.py to run a quick demo.
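
For reference, the demo amounts to a few mmdetection-style API calls. A minimal sketch (the config/checkpoint paths and score threshold below are placeholders; show_result_ins is the SOLO-specific visualization helper shipped in this repo):

    from mmdet.apis import init_detector, inference_detector, show_result_ins

    # Placeholder paths -- point these at your downloaded config/checkpoint.
    config_file = 'configs/solov2/solov2_r50_fpn_8gpu_1x.py'
    checkpoint_file = 'SOLOv2_R50_1x.pth'

    # Build the model and run single-image inference.
    model = init_detector(config_file, checkpoint_file, device='cuda:0')
    result = inference_detector(model, 'demo.jpg')

    # Draw the predicted instance masks onto the image.
    show_result_ins('demo.jpg', result, model.CLASSES,
                    score_thr=0.25, out_file='demo_out.jpg')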

Train with multiple GPUs

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}

Example: 
./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 8

Train with single GPU

python tools/train.py ${CONFIG_FILE}

Example:
python tools/train.py configs/solo/solo_r50_fpn_8gpu_1x.py

Testing

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} --show --out ${OUTPUT_FILE} --eval segm

Example: 
./tools/dist_test.sh configs/solo/solo_r50_fpn_8gpu_1x.py SOLO_R50_1x.pth 8 --show --out results_solo.pkl --eval segm

# single-gpu testing
python tools/test_ins.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --out ${OUTPUT_FILE} --eval segm

Example: 
python tools/test_ins.py configs/solo/solo_r50_fpn_8gpu_1x.py SOLO_R50_1x.pth --show --out results_solo.pkl --eval segm

Visualization

python tools/test_ins_vis.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --save_dir ${SAVE_DIR}

Example: 
python tools/test_ins_vis.py configs/solo/solo_r50_fpn_8gpu_1x.py SOLO_R50_1x.pth --show --save_dir work_dirs/vis_solo

Contributing to the project

Any pull requests or issues are welcome.

Citations

Please consider citing our papers in your publications if the project helps your research. The BibTeX references are as follows.

@inproceedings{wang2020solo,
  title     =  {{SOLO}: Segmenting Objects by Locations},
  author    =  {Wang, Xinlong and Kong, Tao and Shen, Chunhua and Jiang, Yuning and Li, Lei},
  booktitle =  {Proc. Eur. Conf. Computer Vision (ECCV)},
  year      =  {2020}
}

@inproceedings{wang2020solov2,
  title     =  {{SOLOv2}: Dynamic and Fast Instance Segmentation},
  author    =  {Wang, Xinlong and Zhang, Rufeng and Kong, Tao and Li, Lei and Shen, Chunhua},
  booktitle =  {Proc. Advances in Neural Information Processing Systems (NeurIPS)},
  year      =  {2020}
}

License

For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact Xinlong Wang and Chunhua Shen.

Comments
  • bad results on cityscape datasets

    I have trained SOLO on the Cityscapes dataset without modifying any settings except the dataset part. Here is my result. SOLO clearly performs very poorly on Cityscapes. Maybe I need to tune some hyper-parameters. Do you have any cues?

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.075
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.157
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.068
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.063
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.166
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.096
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.158
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.168
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.142
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.335
    
    opened by TengFeiHan0 27
  • Config for evaluating a custom dataset

    Dear @WXinlong ,

    After training on a custom dataset (inherited from CocoDataset, in the same way as cityscapes.py), the visualization looks great, but mAP = 0. I see that test_ins.py calls coco_eval(result_file, eval_types, dataset.coco).

    Is it the right way?

    opened by edwardnguyen1705 7
  • How is the dynamic head more efficient than the vanilla head?

    I read your papers on SOLO and SOLOv2. One part that I didn't understand was the claim that the dynamic version is more efficient than the vanilla version. How is efficiency measured here? If it's measured by computation (i.e. MACs/FLOPs), then since you eventually generate the HxWxS^2 feature map, isn't the efficiency similar? If it's measured by parameters, then the dynamic version is indeed more efficient.
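
    One way to see the claimed saving (a hedged back-of-envelope reading, not the authors' official analysis; every size below is invented for illustration): the vanilla head must densely produce all S^2 mask channels for every image, whereas the dynamic head only convolves the shared mask features with the kernels of locations that survive score filtering, so at inference it instantiates far fewer masks.

        # Rough MAC counts for the final mask-producing step.
        # Hypothetical sizes: E mask-feature channels, an SxS grid,
        # HxW mask resolution, 1x1 dynamic kernels.
        E, S, H, W = 256, 40, 200, 200

        # Vanilla head: predicts all S^2 mask channels densely.
        vanilla_macs = H * W * E * S**2             # ~1.6e10

        # Dynamic head: predicts S^2 kernels on the small SxS grid,
        # then applies only the k kernels that pass the score threshold.
        k = 100                                     # surviving candidates
        kernel_macs = S * S * E * E                 # ~1.0e8
        dynamic_macs = kernel_macs + k * E * H * W  # ~1.1e9

        print(f"vanilla: {vanilla_macs:.1e} MACs, dynamic: {dynamic_macs:.1e} MACs")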

    opened by cmsflash 6
  • RuntimeError: CUDA out of memory occurred when testing

    Command:

        python tools/test_ins.py configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py work_dirs/solov2_light_release_r34_fpn_8gpu_3x/epoch_36.pth --show --out results_solo.pkl --eval segm
    

        [>>>>>>>>>>>>> ] 20/76, 0.3 task/s, elapsed: 59s, ETA: 165s
        Traceback (most recent call last):
        ...
        RuntimeError: CUDA out of memory. Tried to allocate 3.30 GiB (GPU 0; 8.00 GiB total capacity; 973.14 MiB already allocated; 2.13 GiB free; 3.74 GiB reserved in total by PyTorch)

    Then I shrank my test set to 14 images; the same error occurred at [>> ] 2/14.

    Environment: Python 3.7, CUDA 11.1, PyTorch 1.7.0+cu110

    Supplement: The epoch_36.pth file was produced by training on my own dataset. It performed pretty well when tested on single images with inference_demo.py, but fails with this batch-test command.

    opened by zhuaiyi 5
  • from . import nms_cpu, nms_cuda from .soft_nms_cpu import soft_nms_cpu

    git clone https://github.com/WXinlong/SOLO.git
    cd SOLO
    pip install -r requirements/build.txt
    pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
    pip install -v -e .

    These commands all ran successfully; however, there are no nms_cpu, nms_cuda, or soft_nms_cpu modules.

    Backend: PyTorch 1.1.0

    opened by Shanyaodedanshen 4
  • Matrix-NMS implementation

    1. How does Matrix-NMS eliminate the sequential processing inherent in NMS?

    2. What is the difference, in terms of physical meaning or interpretation, between sum_masks and seg_masks?

    3. Why is the intersection defined as inter_matrix = torch.mm(seg_masks, seg_masks.transpose(1, 0))?

    4. Why does iou_matrix = (inter_matrix / (sum_masks_x + sum_masks_x.transpose(1, 0) - inter_matrix)).triu(diagonal=1) use triu(diagonal=1) and the transpose of sum_masks_x?
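
    For anyone else puzzling over the same lines, here is an annotated sketch of the core computation as I read it -- my own commentary on the repo's code, not an official answer. Shapes assumed: N binary masks flattened to (N, H*W) and sorted by descending score; sum_masks holds the per-mask pixel areas (which also answers question 2: seg_masks are the binary masks themselves, sum_masks their areas).

        import torch

        def matrix_nms_sketch(seg_masks, sum_masks, cate_labels, cate_scores, sigma=2.0):
            n = seg_masks.size(0)
            # Q3: a dot product of two binary masks counts the pixels where
            # both are 1, so one matmul yields all pairwise intersections.
            inter = torch.mm(seg_masks, seg_masks.transpose(1, 0))      # (N, N)
            # Q4: |A| + |B| - |A ∩ B| is the union; expanding the areas and
            # adding the transpose pairs every area with every other, and
            # triu(diagonal=1) keeps only pairs (i, j) with i < j, i.e.
            # only a higher-scored mask i may suppress a lower-scored j.
            areas = sum_masks.expand(n, n)
            iou = (inter / (areas + areas.transpose(1, 0) - inter)).triu(diagonal=1)
            # Only same-class pairs suppress each other.
            labels = cate_labels.expand(n, n)
            same_cls = (labels == labels.transpose(1, 0)).float().triu(diagonal=1)
            decay_iou = iou * same_cls
            # Q1: instead of deleting detections one by one, every mask's
            # score is decayed in parallel by its overlap with higher-scored
            # masks, compensated by how much each suppressor was itself
            # overlapped -- so no sequential loop is needed.
            compensate, _ = decay_iou.max(0)
            compensate = compensate.expand(n, n).transpose(1, 0)
            decay = (torch.exp(-sigma * decay_iou ** 2) /
                     torch.exp(-sigma * compensate ** 2)).min(0)[0]
            return cate_scores * decay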

    opened by buttercutter 4
  • About Unified mask feature branch

    Hi, I just read the SOLOv2 paper and found it very inspiring, but I have a question after reading the code.

    In mmdet/models/anchor_heads/solov2_head.py:

        def forward(self, feats, eval=False):
            # Re-arrange the FPN levels (split_feats resizes the
            # outermost levels).
            new_feats = self.split_feats(feats)
            featmap_sizes = [featmap.size()[-2:] for featmap in new_feats]
            upsampled_size = (featmap_sizes[0][0] * 2, featmap_sizes[0][1] * 2)
            # Per level: predict category scores and convolution kernels,
            # with one grid resolution per level (self.seg_num_grids).
            cate_pred, kernel_pred = multi_apply(
                self.forward_single, new_feats,
                list(range(len(self.seg_num_grids))),
                eval=eval, upsampled_size=upsampled_size)
            return cate_pred, kernel_pred
    

    It seems that the code still uses different mask encoders and grid numbers for the features of each FPN level. Is the unified mask feature branch described in the SOLOv2 paper included in this code, or are there any related settings?

    Thank you!

    opened by ntuLC 4
  • CUDA out of memory when trying to inference SOLOv2

    I tried to run SOLOv2 inference with the command python tools/test.py configs/solov2/solov2_r101_dcn_fpn_8gpu_3x.py model_zoo/solov2_r101_dcn_fpn_8gpu_3x.pth --eval segm --json_out coco_json/solov2_r101_dcn_fpn_8gpu_3x_coco_val_results, but I got a CUDA out-of-memory error after evaluating around 1000 images.

    Has anyone encountered this problem as well?

    opened by bowenc0221 4
  • build docker image error

    Step 12/12 : RUN pip install --no-cache-dir -e .
     ---> Running in 10680a3a2218
    Obtaining file:///SOLO
        ERROR: Command errored out with exit status 1:
         command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SOLO/setup.py'"'"'; __file__='"'"'/SOLO/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
             cwd: /SOLO/
        Complete output (8 lines):
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/SOLO/setup.py", line 251, in <module>
            sources=['src/compiling_info.cpp']),
          File "/SOLO/setup.py", line 103, in make_cuda_ext
            raise EnvironmentError('CUDA is required to compile MMDetection!')
        OSError: CUDA is required to compile MMDetection!
        No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
        ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    
    opened by JiageWang 4
  • Question on SOLOv2

    Hello,

    If my understanding is correct, SOLOv2 uses only a single combined feature map (Fig. 3 in the paper), unlike SOLOv1, which uses multi-scale FPN features.

    Is the proposed dynamic head applied to this single combined feature map?

    Thanks

    opened by shwoo93 4
  • Question about training a model

    When I try to train this model, the following error occurs:

        File "../mmdet/models/anchor_heads/solo_head.py", line 203, in loss
            num_ins = flatten_ins_ind_labels.sum()
        RuntimeError: "sum_cuda" not implemented for 'Bool'

    I have already run python setup.py build_ext --inplace.
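
    This looks like a PyTorch version mismatch: the code calls .sum() on a bool tensor, which newer PyTorch builds reject on CUDA. A likely workaround (the usual fix for this class of error, untested here) is to cast before summing:

        # mmdet/models/anchor_heads/solo_head.py, around line 203:
        # before: num_ins = flatten_ins_ind_labels.sum()
        num_ins = flatten_ins_ind_labels.int().sum()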

    opened by Fly-dream12 4
  • SOLOv2 get segmentation mask and classes

    Hi,

    SOLOv2 is a very powerful tool for instance segmentation. I am trying to experiment with its output, but this implementation uses mmdetection APIs that output a set of images with the instances drawn on them. Is there a way to get the actual output from the model, i.e. the classes and segmentation masks? I want to pass an image to the model at inference time and get the raw output. Any help is greatly appreciated.
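
    If it helps: the post-processed result appears to be a plain tuple, not just rendered images. A hedged sketch of unpacking it, with the field order assumed from how this repo's show_result_ins visualization helper reads the result:

        from mmdet.apis import init_detector, inference_detector

        model = init_detector('configs/solov2/solov2_r50_fpn_8gpu_1x.py',
                              'SOLOv2_R50_1x.pth', device='cuda:0')
        result = inference_detector(model, 'demo.jpg')

        # For SOLO/SOLOv2 the first element holds (masks, labels, scores).
        seg_masks, cate_labels, cate_scores = result[0]
        # seg_masks:   (num_inst, H, W) binary instance masks
        # cate_labels: (num_inst,) class indices into model.CLASSES
        # cate_scores: (num_inst,) confidence scores

        keep = cate_scores > 0.3  # hypothetical score threshold
        print([model.CLASSES[i] for i in cate_labels[keep].cpu().numpy()])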

    opened by williamcfrancis 0
  • dynamic conv

    @WXinlong, hello, and thank you very much for your excellent open-source project. When reading your code, I can't find where the convolution is actually performed dynamically. Can you point me to it?
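
    Not the author, but for what it's worth: as I understand the paper, the "dynamic convolution" is an ordinary convolution whose weights come from the kernel branch's output instead of from learned parameters (in the repo it should be the F.conv2d call in the SOLOv2 head that consumes the predicted kernels). A minimal sketch of the idea, with all shapes invented for illustration:

        import torch
        import torch.nn.functional as F

        # Unified mask features from the mask feature branch (batch of 1).
        mask_feats = torch.randn(1, 256, 104, 104)   # (1, E, H/4, W/4)

        # Kernels predicted by the kernel branch for the 20 hypothetical
        # locations that survived score filtering (1x1 kernels of depth E).
        pred_kernels = torch.randn(20, 256)          # (num_inst, E)

        # The "dynamic" step: use the predictions as convolution weights.
        ins_masks = F.conv2d(mask_feats,
                             pred_kernels.view(20, 256, 1, 1)).sigmoid()
        print(ins_masks.shape)                       # torch.Size([1, 20, 104, 104])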

    opened by wafaer 0
  • TypeError: SOLOv2: __init__() got an unexpected keyword argument 'mask_feat_head'

    I wanted to try a simple training run on my own dataset, but got this error. How can I solve it?

    Traceback (most recent call last):
      File "c:\users\echo\mmcv\mmcv\utils\registry.py", line 69, in build_from_cfg
        return obj_cls(**args)
    TypeError: __init__() got an unexpected keyword argument 'mask_feat_head'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "E:\DeepLearning\Pytorch\Projects\SOLO-master\SOLO-master\tools\train.py", line 127, in <module>
        main()
      File "E:\DeepLearning\Pytorch\Projects\SOLO-master\SOLO-master\tools\train.py", line 100, in main
        model = build_detector(
      File "D:\Anaconda\Anaconda3\envs\mmlab\lib\site-packages\mmdet\models\builder.py", line 58, in build_detector
        return DETECTORS.build(
      File "c:\users\echo\mmcv\mmcv\utils\registry.py", line 237, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "c:\users\echo\mmcv\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "c:\users\echo\mmcv\mmcv\utils\registry.py", line 72, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SOLOv2: __init__() got an unexpected keyword argument 'mask_feat_head'

    opened by ZHANGH83 1
  • About training configuration of SOLOv2 on LVIS dataset

    Hi Xinlong, could you provide your SOLOv2 training config file for the LVIS dataset? I ran into some problems when reimplementing the SOLOv2 results on LVIS v0.5 / v1, so I want to use the official config as a reference and retry my experiment.

    opened by DongSky 0
  • Change maximum allowed PyTorch version to 1.10.1

    This is a request to specify that the maximum allowed version for PyTorch should be 1.10.1 or 1.10.2.

    Motivation: In PyTorch 1.11, the C++ THC.h headers were folded into ATen.h. Given this, PyTorch 1.10.x (whether built against CUDA 10.2 or 11.3) should be the highest allowed PyTorch version unless the following CUDA source files are updated:

    mmdet
    ├── ops
    │     ├── nms/src/
    │     │     ├── nms_kernel.cu
    │     ├── roi_align/src/
    │     │     ├── roi_align_kernel.cu
    │     ├── roi_pool/src/
    │     │     ├── roi_pool_kernel.cu
    │     ├── dcn/src/
    │     │     ├── deform_conv_cuda_kernel.cu
    │     │     ├── deform_pool_cuda_kernel.cu
    │     ├── sigmoid_focal_loss/src/
    │     │     ├── sigmoid_focal_loss_cuda.cu
    │     ├── masked_conv/src/
    │     │     ├── masked_conv2d_kernel.cu
    
    
    opened by manueldiaz96 0