SOTR: Segmenting Objects with Transformers [ICCV 2021]

By Ruohao Guo, Dantong Niu, Liao Qu, Zhenbo Li

Introduction

This is the official implementation of SOTR.

Models

COCO Instance Segmentation Baselines with SOTR

| Name          | mask AP | AP_S | AP_M | AP_L | download |
|:--------------|:-------:|:----:|:----:|:----:|:--------:|
| SOTR_R101     |  40.2   | 10.2 | 59.0 | 73.1 |  model   |
| SOTR_R101_DCN |  42.0   | 11.4 | 60.7 | 74.5 |  model   |

Installation & Quick start

  • First install Detectron2 following the official guide: INSTALL.md.

  • Then build SOTR with:

      git clone https://github.com/easton-cau/SOTR
      cd SOTR
      python setup.py build develop
  • Then follow datasets/README.md to set up the datasets (e.g., MS-COCO).

  • Evaluating

    • Download the trained models for COCO.

    • Run the following command:

      python tools/train_net.py \
          --config-file configs/SOTR/R101.yaml \
          --eval-only \
          --num-gpus 4 \
          MODEL.WEIGHTS work_dir/SOTR_R101/SOTR_R101.pth
      
  • Training

    • Run the following command:

      python tools/train_net.py \
          --config-file configs/SOTR/R101.yaml \
          --num-gpus 4
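
  • Single-image inference

    A minimal inference sketch, added here as an editor's illustration rather than an official script; it assumes this fork keeps AdelaiDet's adet.config.get_cfg, and the weight and image paths are illustrative.

      import cv2
      from detectron2.engine import DefaultPredictor
      from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

      cfg = get_cfg()
      cfg.merge_from_file("configs/SOTR/R101.yaml")
      cfg.MODEL.WEIGHTS = "work_dir/SOTR_R101/SOTR_R101.pth"

      predictor = DefaultPredictor(cfg)
      image = cv2.imread("input.jpg")               # BGR, as Detectron2 expects
      outputs = predictor(image)
      print(outputs["instances"].pred_masks.shape)  # (num_instances, H, W)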

Acknowledgement

Thanks to Detectron2 and AdelaiDet for their contributions to the community!

This work was supported by the National Key R&D Program of China (2020YFD0900204) and the Key-Area Research and Development Program of Guangdong Province, China (2020B0202010009).

FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly ([email protected]).

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:

@misc{guo2021sotr,
      title={SOTR: Segmenting Objects with Transformers}, 
      author={Ruohao Guo and Dantong Niu and Liao Qu and Zhenbo Li},
      year={2021},
      eprint={2108.06747},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Questions about testing and training

    Hello, I ran into a few problems while running testing and training, and I hope you can help.
    First I ran

        python tools/train_net.py \
            --config-file configs/SOTR/R101.yaml \
            --eval-only \
            --num-gpus 4 \
            MODEL.WEIGHTS work_dir/SOTR_R101/SOTR_R101.pth

    and got:

        |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
        |:------:|:------:|:------:|:------:|:------:|:------:|
        | 39.730 | 60.303 | 42.707 | 18.045 | 43.414 | 59.794 |

    This differs noticeably from the numbers you report, and I am not sure what went wrong.
    Also, when running the training code, line 52 of tools/train_net.py, super(DefaultTrainer, self).__init__(model, data_loader, optimizer), seems to be broken. Changing it to super(Trainer, self).__init__(cfg) makes that error go away, but then I get

        FloatingPointError: Loss became infinite or NaN at iteration=2!
        loss_dict = {'loss_ins': nan, 'loss_cate': nan}

    My solver settings are:

        SOLVER:
          IMS_PER_BATCH: 4
          BASE_LR: 0.00001
          WARMUP_FACTOR: 0.00001

    How should the NaN problem be resolved? Looking forward to your reply, thank you.
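
    Editor's note: the thread does not record a confirmed fix. One common mitigation for NaN losses in Detectron2-based training is its built-in gradient clipping; a hedged sketch follows (the config path is illustrative, and adet.config.get_cfg is assumed from AdelaiDet):

        from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

        cfg = get_cfg()
        cfg.merge_from_file("configs/SOTR/R101.yaml")
        # The same keys can be set under SOLVER.CLIP_GRADIENTS in the YAML.
        cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
        cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"   # clip the total gradient norm
        cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0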

    opened by Xushibo96 9
  • Inconsistency of test-dev result

    Hi, thanks for your great work!

    I tested your pre-trained model (R-101 3x) on test-dev2017 on the COCO evaluation server. When I extract only the mask results and save them to JSON files, the segmentation score matches the results reported in the paper. However, when I save the mask results together with the box results (generated by lines 511~514 in sotr.py) to JSON files, the scores (AP_s, AP_m, AP_l) differ from the paper.

    This is the result from the JSON file that contains only mask information:

    overall performance
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
    

    This is the result from the JSON file that contains mask information together with box information:

    overall performance
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.612
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.434
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.552
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.536
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.590
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
    

    As can be seen, (AP_s, AP_m, AP_l) are (0.102, 0.590, 0.731) in the first result and (0.194, 0.440, 0.552) in the second. I don't know the detailed process of the COCO evaluation server, so I wonder why the segmentation AP changes with the presence or absence of box information.
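
    Editor's note: a likely cause, offered as a hedged explanation rather than the authors' answer, lies in pycocotools: COCO.loadRes() computes each detection's 'area' from the 'bbox' field whenever one is present and falls back to the mask area only otherwise. AP_s/AP_m/AP_l bucket detections by that area, so adding boxes to the JSON switches from mask areas to box areas (w*h) while leaving the overall AP, AP50, and AP75 untouched, which matches the two result sets above.

        # Mirrors the branch order in pycocotools' COCO.loadRes(): a present
        # 'bbox' wins over the segmentation when the detection area is computed.
        from pycocotools import mask as maskUtils

        def detection_area(ann):
            if 'bbox' in ann and ann['bbox'] != []:
                w, h = ann['bbox'][2], ann['bbox'][3]
                return w * h                                   # box area
            return float(maskUtils.area(ann['segmentation']))  # mask area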

    opened by tjqansthd 5
  • About training

    Hello, I ran into a problem when training on a cluster and hope you can help:

    My environment: torch == 1.7.1, torchvision == 0.8.2, detectron == 0.2.1

    Cluster GPU: one V100 with 12 GB of memory

    With IMS_PER_BATCH: 2, BASE_LR: 0.00001, WARMUP_FACTOR: 0.00001, the loss becomes NaN.

    With IMS_PER_BATCH: 4, BASE_LR: 0.00001, WARMUP_FACTOR: 0.00001, I get CUDA out of memory.

    How can I solve this?

    opened by roar-1128 3
  • About SOTR-RT-736

    Hi, I can't find the SOTR-RT-736 model for testing images at high FPS while reproducing your work. Could you help?

    1. Could you provide the SOTR-RT-736 model?
    2. Could you share some details about how to test it? Thank you very much!
    opened by Anglechina 3
  • super(DefaultTrainer, self).__init__(model, data_loader, optimizer)

        super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
        TypeError: __init__() takes 1 positional argument but 4 were given

    Does anyone know how to solve this problem?

    opened by roar-1128 1
  • TypeError when I try to train

    With this command

    python tools/train_net.py \
        --config-file configs/SOTR/R101.yaml \
        --num-gpus 1
    

    I get this error:

    File "tools/train_net.py", line 52, in __init__
        super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
    TypeError: __init__() takes 1 positional argument but 4 were given
    

    My environment

    torch==1.7.1
    torchvision==0.9.2
    detectron2==0.5
    

    I am not sure what to do about it. Maybe try

    detectron2==0.2.1 with torch==1.6
    
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
    
    

    Can you help me with this?
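
    Editor's note: this comes from a Detectron2 API change. From about v0.3 on, DefaultTrainer inherits from TrainerBase rather than SimpleTrainer, so super(DefaultTrainer, self).__init__(model, data_loader, optimizer) lands on TrainerBase.__init__, which takes no arguments. Pinning detectron2==0.2.1 as above restores the old hierarchy; alternatively, a hedged, version-tolerant sketch of the subclass (not the repo's confirmed fix):

        from detectron2.engine import DefaultTrainer

        class Trainer(DefaultTrainer):
            def __init__(self, cfg):
                # Newer Detectron2 builds the model, optimizer, and data loader
                # from cfg internally; pass cfg through instead of prebuilt objects.
                super().__init__(cfg)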

    opened by ChengChen2020 1
  • Demo GPU-mode testing

    Thanks, the CPU test already passes, taking about 0.3 s. But when I tried the GPU version, the runtime was the same as on the CPU. I set parallel to true when initializing VisualizationDemo and ran several timing loops, yet the time stayed comparable to the CPU. I added logging to confirm that AsyncPredictor is indeed being called. Could you offer some guidance? I have verified that the environment is installed correctly.

    opened by Anglechina 1
  • training

        Traceback (most recent call last):
          File "tools/train_net.py", line 218, in <module>
            launch(
          File "/root/miniconda3/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
            main_func(*args)
          File "tools/train_net.py", line 206, in main
            trainer = Trainer(cfg)
          File "tools/train_net.py", line 52, in __init__
            super(DefaultTrainer, self).__init__(model, data_loader, optimizer)
        TypeError: __init__() takes 1 positional argument but 4 were given

    Excuse me, what is this problem?

    opened by 18219716332 0
  • visualize_data

    Hello, author! I used visualize_data.py to visualize images, but the instance-segmentation results include bounding boxes, while the figures in the paper show none. Can you tell me why, and how to adjust it?
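
    Editor's note: one way to drop the boxes when drawing predictions, offered as a hedged sketch using Detectron2's stock Visualizer rather than visualize_data.py itself, is to strip pred_boxes from the predictions before drawing; the Visualizer only draws fields that are present:

        from detectron2.utils.visualizer import Visualizer

        def draw_masks_only(image_rgb, instances, metadata):
            inst = instances.to("cpu")
            if inst.has("pred_boxes"):
                inst.remove("pred_boxes")   # no boxes left for the drawer
            vis = Visualizer(image_rgb, metadata)
            return vis.draw_instance_predictions(inst).get_image()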

    opened by 116022017144 1
  • PositionalEmbedding?

    Hello! I read your code recently and have some questions. First, if xy_pos_emb_shaped=None, the returned tensor is the same as the input, yet the paper says "Position embeddings are added to the blocks to retain positional information, meaning that the position embedding spaces for the column and row are 1*N*C and N*1*C." How is the position information added in that case? Second, the training setting in the yaml files, (1024, 2048), differs from the other models. Is this an unfair comparison?
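
    Editor's note: a minimal sketch of the row/column embeddings as the quoted sentence describes them (an illustration, not the repo's actual module): two learned tables of shapes (1, N, C) and (N, 1, C) broadcast-add into an (N, N, C) grid on top of the input features.

        import torch
        import torch.nn as nn

        class TwinPosEmbed(nn.Module):
            """Column (1, N, C) and row (N, 1, C) embeddings, broadcast-summed."""
            def __init__(self, n: int, c: int):
                super().__init__()
                self.col = nn.Parameter(torch.zeros(1, n, c))
                self.row = nn.Parameter(torch.zeros(n, 1, c))

            def forward(self, x):                 # x: (N, N, C) feature grid
                return x + self.col + self.row    # broadcasts to (N, N, C)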

    opened by yongjiezhu1998 0
  • Can't find cfg.MODEL.SOTR.FPN_SCALE_RANGES

    I encountered the following problem when running SOTR on a new dataset:

        File "/home/xxx/EndoCV2022/detectron2/SOTR/adet/modeling/sotr/sotr.py", line 33, in __init__
            self.scale_ranges = cfg.MODEL.SOTR.FPN_SCALE_RANGES

    I checked the configuration file R50.yaml and found no such variable. Do you know how to solve it?
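
    Editor's note: the key must exist on the config object before sotr.py reads it; AdelaiDet-style projects register such defaults in adet/config/defaults.py. A hedged sketch, where the tuple values are illustrative (borrowed from AdelaiDet's SOLOv2 defaults, not SOTR's confirmed ones):

        from adet.config import get_cfg  # assumption: AdelaiDet-style config helper

        cfg = get_cfg()
        # Register the missing default before merging the YAML; replace the
        # illustrative ranges with the values the model was trained with.
        cfg.MODEL.SOTR.FPN_SCALE_RANGES = ((1, 96), (48, 192), (96, 384),
                                           (192, 768), (384, 2048))
        cfg.merge_from_file("configs/SOTR/R50.yaml")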

    opened by YeahHighly 1