SOTR: Segmenting Objects with Transformers [ICCV 2021]

By Ruohao Guo, Dantong Niu, Liao Qu, Zhenbo Li

Introduction

This is the official implementation of SOTR.

Models

COCO Instance Segmentation Baselines with SOTR

| Name          | mask AP | AP_S | AP_M | AP_L | download |
|:--------------|:-------:|:----:|:----:|:----:|:--------:|
| SOTR_R101     |  40.2   | 10.2 | 59.0 | 73.1 |  model   |
| SOTR_R101_DCN |  42.0   | 11.4 | 60.7 | 74.5 |  model   |

Installation & Quick start

  • First install Detectron2 following the official guide: INSTALL.md.

  • Then build SOTR with:

      git clone https://github.com/easton-cau/SOTR
      cd SOTR
      python setup.py build develop
  • Then follow datasets/README.md to set up the datasets (e.g., MS-COCO).

  • Evaluating

    • Download the trained models for COCO.

    • Run the following command:

      python tools/train_net.py \
          --config-file configs/SOTR/R101.yaml \
          --eval-only \
          --num-gpus 4 \
          MODEL.WEIGHTS work_dir/SOTR_R101/SOTR_R101.pth
      
  • Training

    • Run the following command:

      python tools/train_net.py \
          --config-file configs/SOTR/R101.yaml \
          --num-gpus 4
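
  • Single-image inference

    A minimal inference sketch, added here as an editor's illustration rather than an official script; it assumes this fork keeps AdelaiDet's adet.config.get_cfg, and the weight and image paths are illustrative.

      import cv2
      from detectron2.engine import DefaultPredictor
      from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

      cfg = get_cfg()
      cfg.merge_from_file("configs/SOTR/R101.yaml")
      cfg.MODEL.WEIGHTS = "work_dir/SOTR_R101/SOTR_R101.pth"

      predictor = DefaultPredictor(cfg)
      image = cv2.imread("input.jpg")               # BGR, as Detectron2 expects
      outputs = predictor(image)
      print(outputs["instances"].pred_masks.shape)  # (num_instances, H, W)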

Acknowledgement

Thanks to Detectron2 and AdelaiDet for their contributions to the community!

This work was supported by the National Key R&D Program of China (2020YFD0900204) and the Key-Area Research and Development Program of Guangdong Province, China (2020B0202010009).

FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly ([email protected]).

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:

@misc{guo2021sotr,
      title={SOTR: Segmenting Objects with Transformers}, 
      author={Ruohao Guo and Dantong Niu and Liao Qu and Zhenbo Li},
      year={2021},
      eprint={2108.06747},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Questions about testing and training

    Hello, I ran into a few problems while running testing and training, and I hope you can help.
    First I ran

        python tools/train_net.py \
            --config-file configs/SOTR/R101.yaml \
            --eval-only \
            --num-gpus 4 \
            MODEL.WEIGHTS work_dir/SOTR_R101/SOTR_R101.pth

    and got:

        |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
        |:------:|:------:|:------:|:------:|:------:|:------:|
        | 39.730 | 60.303 | 42.707 | 18.045 | 43.414 | 59.794 |

    This differs noticeably from the numbers you report, and I am not sure what went wrong.
    Also, when running the training code, line 52 of tools/train_net.py, super(DefaultTrainer, self).__init__(model, data_loader, optimizer), seems to be broken. Changing it to super(Trainer, self).__init__(cfg) makes that error go away, but then I get

        FloatingPointError: Loss became infinite or NaN at iteration=2!
        loss_dict = {'loss_ins': nan, 'loss_cate': nan}

    My solver settings are:

        SOLVER:
          IMS_PER_BATCH: 4
          BASE_LR: 0.00001
          WARMUP_FACTOR: 0.00001

    How should the NaN problem be resolved? Looking forward to your reply, thank you.
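
    Editor's note: the thread does not record a confirmed fix. One common mitigation for NaN losses in Detectron2-based training is its built-in gradient clipping; a hedged sketch follows (the config path is illustrative, and adet.config.get_cfg is assumed from AdelaiDet):

        from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

        cfg = get_cfg()
        cfg.merge_from_file("configs/SOTR/R101.yaml")
        # The same keys can be set under SOLVER.CLIP_GRADIENTS in the YAML.
        cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
        cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"   # clip the total gradient norm
        cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0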

    opened by Xushibo96 9
  • Inconsistency of test-dev result

    Hi, thanks for your great work!

    I tested your pre-trained model (R-101 3x) on test-dev2017 on the COCO evaluation server. When I extract only the mask results and save them to JSON files, the segmentation score matches the results reported in the paper. However, when I save the mask results together with the box results (generated by lines 511~514 in sotr.py) to JSON files, the scores (AP_s, AP_m, AP_l) differ from the paper.

    This is the result from the JSON file that contains only mask information:

    overall performance
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
    

    This is the result from the JSON file that contains mask information together with box information:

    overall performance
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.552
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
    

    As can be seen, (AP_s, AP_m, AP_l) are (0.102, 0.590, 0.731) in the first result and (0.194, 0.440, 0.552) in the second. I don't know the detailed process of the COCO evaluation server, so I wonder why the segmentation AP changes with the presence or absence of box information.
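
    Editor's note: a likely cause, offered as a hedged explanation rather than the authors' answer, lies in pycocotools: COCO.loadRes() computes each detection's 'area' from the 'bbox' field whenever one is present and falls back to the mask area only otherwise. AP_s/AP_m/AP_l bucket detections by that area, so adding boxes to the JSON switches from mask areas to box areas (w*h) while leaving the overall AP, AP50, and AP75 untouched, which matches the two result sets above.

        # Mirrors the branch order in pycocotools' COCO.loadRes(): a present
        # 'bbox' wins over the segmentation when the detection area is computed.
        from pycocotools import mask as maskUtils

        def detection_area(ann):
            if 'bbox' in ann and ann['bbox'] != []:
                w, h = ann['bbox'][2], ann['bbox'][3]
                return w * h                                   # box area
            return float(maskUtils.area(ann['segmentation']))  # mask area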

    opened by tjqansthd 5
  • About training

    Hello, I ran into a problem when training on a cluster and hope you can help:

    My environment: torch == 1.7.1, torchvision == 0.8.2, detectron == 0.2.1

    Cluster GPU: one V100 with 12 GB of memory

    With IMS_PER_BATCH: 2, BASE_LR: 0.00001, WARMUP_FACTOR: 0.00001, the loss becomes NaN.

    With IMS_PER_BATCH: 4, BASE_LR: 0.00001, WARMUP_FACTOR: 0.00001, I get CUDA out of memory.

    How can I solve this?

    opened by roar-1128 3
  • About SOTR-RT-736

    Hi, I can't find the SOTR-RT-736 model for testing images at high FPS while reproducing your work. Could you help?

    1. Could you provide the SOTR-RT-736 model?
    2. Could you share some details about how to test it? Thank you very much!
    opened by Anglechina 3
  • super(DefaultTrainer, self).__init__(model, data_loader, optimizer)

        super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
        TypeError: __init__() takes 1 positional argument but 4 were given

    Does anyone know how to solve this problem?

    opened by roar-1128 1
  • TypeError when I try to train

    With this command

    python tools/train_net.py \
        --config-file configs/SOTR/R101.yaml \
        --num-gpus 1
    

    I get this error:

    File "tools/train_net.py", line 52, in __init__
        super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
    TypeError: __init__() takes 1 positional argument but 4 were given
    

    My environment

    torch==1.7.1
    torchvision==0.9.2
    detectron2==0.5
    

    I am not sure what to do about it. Maybe try

    detectron2==0.2.1 with torch==1.6
    
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
    
    

    Can you help me with this?
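
    Editor's note: this comes from a Detectron2 API change. From about v0.3 on, DefaultTrainer inherits from TrainerBase rather than SimpleTrainer, so super(DefaultTrainer, self).__init__(model, data_loader, optimizer) lands on TrainerBase.__init__, which takes no arguments. Pinning detectron2==0.2.1 as above restores the old hierarchy; alternatively, a hedged, version-tolerant sketch of the subclass (not the repo's confirmed fix):

        from detectron2.engine import DefaultTrainer

        class Trainer(DefaultTrainer):
            def __init__(self, cfg):
                # Newer Detectron2 builds the model, optimizer, and data loader
                # from cfg internally; pass cfg through instead of prebuilt objects.
                super().__init__(cfg)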

    opened by ChengChen2020 1
  • Demo GPU-mode testing

    Thanks, the CPU test already passes, taking about 0.3 s. But when I tried the GPU version, the runtime was the same as on the CPU. I set parallel to true when initializing VisualizationDemo and ran several timing loops, yet the time stayed comparable to the CPU. I added logging to confirm that AsyncPredictor is indeed being called. Could you offer some guidance? I have verified that the environment is installed correctly.

    opened by Anglechina 1
  • training

        Traceback (most recent call last):
          File "tools/train_net.py", line 218, in <module>
            launch(
          File "/root/miniconda3/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
            main_func(*args)
          File "tools/train_net.py", line 206, in main
            trainer = Trainer(cfg)
          File "tools/train_net.py", line 52, in __init__
            super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
        TypeError: __init__() takes 1 positional argument but 4 were given

    Excuse me, what is this problem?

    opened by 18219716332 0
  • visualize_data

    Hello, author! I used visualize_data.py to visualize images, but the instance-segmentation results include bounding boxes, while the figures in the paper show none. Can you tell me why, and how to adjust it?
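
    Editor's note: one way to drop the boxes when drawing predictions, offered as a hedged sketch using Detectron2's stock Visualizer rather than visualize_data.py itself, is to strip pred_boxes from the predictions before drawing; the Visualizer only draws fields that are present:

        from detectron2.utils.visualizer import Visualizer

        def draw_masks_only(image_rgb, instances, metadata):
            inst = instances.to("cpu")
            if inst.has("pred_boxes"):
                inst.remove("pred_boxes")   # no boxes left for the drawer
            vis = Visualizer(image_rgb, metadata)
            return vis.draw_instance_predictions(inst).get_image()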

    opened by 116022017144 1
  • PositionalEmbedding?

    Hello! I read your code recently and have some questions. First, if xy_pos_emb_shaped=None, the returned tensor is the same as the input, yet the paper says "Position embeddings are added to the blocks to retain positional information, meaning that the position embedding spaces for the column and row are 1*N*C and N*1*C." How is the position information added in that case? Second, the training setting in the yaml files, (1024, 2048), differs from the other models. Is this an unfair comparison?
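
    Editor's note: a minimal sketch of the row/column embeddings as the quoted sentence describes them (an illustration, not the repo's actual module): two learned tables of shapes (1, N, C) and (N, 1, C) broadcast-add into an (N, N, C) grid on top of the input features.

        import torch
        import torch.nn as nn

        class TwinPosEmbed(nn.Module):
            """Column (1, N, C) and row (N, 1, C) embeddings, broadcast-summed."""
            def __init__(self, n: int, c: int):
                super().__init__()
                self.col = nn.Parameter(torch.zeros(1, n, c))
                self.row = nn.Parameter(torch.zeros(n, 1, c))

            def forward(self, x):                 # x: (N, N, C) feature grid
                return x + self.col + self.row    # broadcasts to (N, N, C)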

    opened by yongjiezhu1998 0
  • Can't find cfg.MODEL.SOTR.FPN_SCALE_RANGES

    I encountered the following problem when running SOTR on a new dataset:

        File "/home/xxx/EndoCV2022/detectron2/SOTR/adet/modeling/sotr/sotr.py", line 33, in __init__
            self.scale_ranges = cfg.MODEL.SOTR.FPN_SCALE_RANGES

    I checked the configuration file R50.yaml and found no such variable. Do you know how to solve it?
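
    Editor's note: the key must exist on the config object before sotr.py reads it; AdelaiDet-style projects register such defaults in adet/config/defaults.py. A hedged sketch, where the tuple values are illustrative (borrowed from AdelaiDet's SOLOv2 defaults, not SOTR's confirmed ones):

        from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

        cfg = get_cfg()
        # Register the missing default before merging the YAML; replace the
        # illustrative ranges with the values the model was trained with.
        cfg.MODEL.SOTR.FPN_SCALE_RANGES = ((1, 96), (48, 192), (96, 384),
                                           (192, 768), (384, 2048))
        cfg.merge_from_file("configs/SOTR/R50.yaml")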

    opened by YeahHighly 1