Repository of 3D Object Detection with Pointformer (CVPR2021)

Zhuofan Xia

Last update: Jan 6, 2023

Overview

3D Object Detection with Pointformer

This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This work is built on top of the MMDetection3D toolbox and includes the models and results on the SUN RGB-D and ScanNet datasets reported in the paper.

More model results on the KITTI and nuScenes datasets will be released soon.

Installation and Usage

The code is developed with MMDetection3D v0.6.1 and works well with v0.14.0.

Dependencies

NVIDIA GPU + CUDA 10.2
Python 3.8 (Anaconda is recommended)
PyTorch == 1.8.0
mmcv-full == 1.3.7
mmdet == 2.11.0
mmsegmentation == 0.13.0
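Because the versions above are pinned exactly, it can help to check the environment before installing mmdet3d. Below is a minimal sketch using the standard library, assuming the PyPI distribution names shown in the dictionary (PyTorch installs as torch); PINNED and check_versions are hypothetical names, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Dependencies list; distribution names are assumptions.
PINNED = {
    "torch": "1.8.0",
    "mmcv-full": "1.3.7",
    "mmdet": "2.11.0",
    "mmsegmentation": "0.13.0",
}


def check_versions(pins):
    """Return {name: (expected, found)} for every package that does not match.

    found is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu102" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```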

Installation

Install dependencies following their guidelines.
Clone and install mmdet3d in develop mode.

git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
python setup.py develop

Copy the files in this repo into the corresponding directories of mmdet3d.
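A minimal sketch of this step in Python, assuming the repository's folders mirror mmdet3d's layout (for example, a configs/pointformer folder landing in mmdetection3d/configs/pointformer, matching the config path used below). merge_into is a hypothetical helper; it overwrites files that share a relative path and skips .git.

```python
import shutil
from pathlib import Path


def merge_into(src_repo, mmdet3d_root):
    """Copy every file under src_repo to the same relative path under
    mmdet3d_root, creating folders as needed. Returns the copied paths."""
    src_repo, mmdet3d_root = Path(src_repo), Path(mmdet3d_root)
    copied = []
    for path in sorted(src_repo.rglob("*")):
        rel = path.relative_to(src_repo)
        # Skip directories themselves and anything inside the git metadata.
        if not path.is_file() or ".git" in rel.parts:
            continue
        target = mmdet3d_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # existing files with the same name are replaced
        copied.append(rel.as_posix())
    return copied
```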

Training and Testing

Download the pretrained weights from Google Drive or Tsinghua Cloud and put them in the checkpoints folder. Taking votenet_ptr_sunrgbd-3d-10class as an example:

# Training (the last argument is the number of GPUs)
bash -x tools/dist_train.sh configs/pointformer/votenet_ptr_sunrgbd-3d-10class.py 8

# Testing (config, checkpoint, number of GPUs, then the evaluation metric)
bash tools/dist_test.sh configs/pointformer/votenet_ptr_sunrgbd-3d-10class.py checkpoints/votenet_ptr_sunrgbd-3d-10class.pth 8 --eval mAP

Results

SUN RGB-D

classes	AP_0.25	AR_0.25	AP_0.50	AR_0.50
bed	0.8343	0.9515	0.5556	0.7029
table	0.5353	0.8705	0.2344	0.4604
sofa	0.6588	0.9171	0.4979	0.6715
chair	0.7681	0.8700	0.5664	0.6703
toilet	0.9117	0.9931	0.5538	0.7103
desk	0.2458	0.8050	0.0754	0.3395
dresser	0.3626	0.8028	0.2357	0.4908
night_stand	0.6701	0.9020	0.4525	0.6196
bookshelf	0.3383	0.6809	0.0968	0.2624
bathtub	0.7821	0.8980	0.4259	0.5510
Overall	0.6107	0.8691	0.3694	0.5479

ScanNet

classes	AP_0.25	AR_0.25	AP_0.50	AR_0.50
cabinet	0.4548	0.7930	0.1757	0.4435
bed	0.8839	0.9506	0.8006	0.8889
chair	0.9011	0.9386	0.7562	0.8136
sofa	0.8915	0.9794	0.6619	0.8041
table	0.6763	0.8714	0.4858	0.6971
door	0.5413	0.7216	0.2107	0.4283
window	0.4821	0.7021	0.1504	0.2979
bookshelf	0.5255	0.8701	0.4422	0.7273
picture	0.1815	0.3649	0.0748	0.1351
counter	0.6210	0.8654	0.2333	0.3846
desk	0.6859	0.9370	0.3774	0.6535
curtain	0.5522	0.7910	0.3156	0.4627
refrigerator	0.5215	0.9649	0.4028	0.7193
showercurtrain	0.6709	0.9643	0.1941	0.5000
toilet	0.9922	1.0000	0.8210	0.8793
sink	0.6361	0.7347	0.4119	0.5000
bathtub	0.8710	0.8710	0.8375	0.8387
garbagebin	0.4762	0.7264	0.2244	0.4604
Overall	0.6425	0.8359	0.4209	0.5908
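The Overall rows appear to be the unweighted mean of the per-class values, which is the usual definition of mAP. A small Python check using the AP columns copied from the two tables above, assuming that definition; overall is a hypothetical helper name.

```python
from statistics import mean

# Per-class AP values copied from the tables above (same class order).
SUNRGBD = {
    "AP_0.25": [0.8343, 0.5353, 0.6588, 0.7681, 0.9117, 0.2458, 0.3626, 0.6701, 0.3383, 0.7821],
    "AP_0.50": [0.5556, 0.2344, 0.4979, 0.5664, 0.5538, 0.0754, 0.2357, 0.4525, 0.0968, 0.4259],
}
SCANNET = {
    "AP_0.25": [0.4548, 0.8839, 0.9011, 0.8915, 0.6763, 0.5413, 0.4821, 0.5255, 0.1815,
                0.6210, 0.6859, 0.5522, 0.5215, 0.6709, 0.9922, 0.6361, 0.8710, 0.4762],
    "AP_0.50": [0.1757, 0.8006, 0.7562, 0.6619, 0.4858, 0.2107, 0.1504, 0.4422, 0.0748,
                0.2333, 0.3774, 0.3156, 0.4028, 0.1941, 0.8210, 0.4119, 0.8375, 0.2244],
}
# The "Overall" rows as reported in the tables.
OVERALL = {
    "SUN RGB-D": {"AP_0.25": 0.6107, "AP_0.50": 0.3694},
    "ScanNet": {"AP_0.25": 0.6425, "AP_0.50": 0.4209},
}


def overall(per_class):
    # Unweighted mean over classes, rounded to the tables' four decimals.
    return {metric: round(mean(values), 4) for metric, values in per_class.items()}


print(overall(SUNRGBD), overall(SCANNET))
```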

For more experimental details, please refer to the paper.

Acknowledgement

This code is based on MMDetection3D.

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Pan_2021_CVPR,
    author    = {Pan, Xuran and Xia, Zhuofan and Song, Shiji and Li, Li Erran and Huang, Gao},
    title     = {3D Object Detection With Pointformer},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {7463-7472}
}

@misc{pan20203d,
  title={3D Object Detection with Pointformer}, 
  author={Xuran Pan and Zhuofan Xia and Shiji Song and Li Erran Li and Gao Huang},
  year={2020},
  eprint={2012.11409},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Comments

A quick question about the model config files

Hello authors,

I really appreciate this work, and thanks for your efforts in releasing the code.

I wonder if you could help me understand one of the model setups in your config file.

From this line, it appears that the Local-Global Transformer (i.e., decoder) is not used in your models for indoor object detection. Could you please confirm if this is the correct setting?

Thank you!

opened by azshue 2
Regarding a comment issue in the code 'LocalTransformer', personally I think the comment may be wrong.

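output_features = F.max_pool2d(transformed_feats, kernel_size=[1, ns])  # (B, C, npoint)

In the paper, the authors state that LocalTransformer applies a max-pool operation over each local region. If so, shouldn't the line above be changed to output_features = F.max_pool2d(transformed_feats, kernel_size=[1, np])? Wouldn't the preceding reshaping code also need a corresponding change? If I have misunderstood, please correct me. Many thanks!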

opened by zbaishancha 0
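The question hinges on which axis kernel_size=[1, ns] reduces. Below is a minimal NumPy sketch that mimics the defaults of F.max_pool2d (stride equal to kernel_size, no padding, floor mode) so the shapes can be checked without PyTorch. It assumes transformed_feats is laid out as (B, C, npoint, ns), i.e. ns grouped neighbours per centroid as in PointNet++-style grouping; that layout is an assumption, not confirmed from the repository, and max_pool2d here is a hypothetical stand-in rather than the PyTorch function.

```python
import numpy as np


def max_pool2d(x, kernel_size):
    """NumPy stand-in for F.max_pool2d with default stride/padding on (B, C, H, W)."""
    kh, kw = kernel_size
    b, c, h, w = x.shape
    ho, wo = h // kh, w // kw
    # Split H and W into non-overlapping windows and take the max inside each one.
    windows = x[:, :, : ho * kh, : wo * kw].reshape(b, c, ho, kh, wo, kw)
    return windows.max(axis=(3, 5))


# Assumed layout: C-dim features for ns neighbours around each of npoint centroids.
B, C, npoint, ns = 2, 4, 5, 3
rng = np.random.default_rng(0)
transformed_feats = rng.standard_normal((B, C, npoint, ns))

pooled = max_pool2d(transformed_feats, kernel_size=[1, ns])
print(pooled.shape)  # (2, 4, 5, 1)
```

Under that assumed layout, the [1, ns] kernel takes one maximum over the ns neighbours of each of the npoint regions, giving one feature vector per region.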
Can't reproduce the reported performance with the latest mmdetection3d

Hi,

Thanks for your excellent work!

I added your code to the latest mmdetection3d repo, but I could not get results similar to those reported in the paper. Which version of mmdetection3d did you use, and why doesn't the latest one work?

opened by xiaobaishu0097 2
Request of Code and Training Logs on nuScenes Dataset

Hi! Dear Pointformer Authors from Gao Huang's lab,

Congratulations on your paper being published at CVPR 2021. It is a very inspiring work for me, and it is great that you have released your code! I am a PhD student at Nankai University and am interested in your work. I spent a lot of time searching for the code and results of Pointformer on the nuScenes dataset, but I could not find anything.

As described in your paper, you implemented your method with the OpenPCDet toolbox. Could you send me a copy of the code and the corresponding training log on the nuScenes dataset? It would be very useful for reproducing your work!

Many thanks, Yu-Huan Wu

opened by yuhuan-wu 2
LinformerEncoder Layer: No Linearisation?

Hello,

Sorry if this is a silly question, but looking at your code in ptr_base.py line 90, the LinformerEncoder layer doesn't seem to implement linear attention at all; instead, it seems to perform regular multi-head attention. Is this the case, and if not, where in the LinformerEncoder layers does the linearisation take place?

Thanks,

Josh

opened by JBKnights 1
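For readers following this question, a minimal NumPy sketch of the difference being asked about. Standard attention builds an n x n score matrix, while Linformer (Wang et al., 2020) first projects keys and values along the sequence axis with k x n matrices E and F, so the scores are only n x k. This illustrates the Linformer technique with a single head and random, untrained projections; it is not the code in ptr_base.py, and all names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Standard scaled dot-product attention: scores are (n, n).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v, scores.shape


def linformer_attention(q, k, v, e_proj, f_proj):
    # Linformer-style: compress keys/values from n rows to k rows first,
    # so the score matrix is only (n, k).
    k_low, v_low = e_proj @ k, f_proj @ v
    scores = q @ k_low.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_low, scores.shape


n, d, k_dim = 1024, 64, 128
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((k_dim, n)) / np.sqrt(n)  # learned in Linformer; random here
F = rng.standard_normal((k_dim, n)) / np.sqrt(n)

out_full, full_shape = full_attention(q, k, v)
out_lin, lin_shape = linformer_attention(q, k, v, E, F)
print(full_shape, lin_shape)  # (1024, 1024) (1024, 128)
```

With k fixed, the score matrix grows linearly in n rather than quadratically, which is the "linearisation" the question refers to.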

Owner

Zhuofan Xia

Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information by Masato Tamura, Hiroki Ohashi, and Tomoaki Yosh

105 Dec 23, 2022

Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

OTA: Optimal Transport Assignment for Object Detection This project provides an implementation for our CVPR2021 paper "OTA: Optimal Transport Assignme

217 Jan 3, 2023

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals, CVPR2021

End-to-End Object Detection with Learnable Proposal, CVPR2021

1.2k Dec 27, 2022

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

5 Dec 10, 2022

Yolo object detection - Yolo object detection with python

How to run download required files make build_image make download Docker versio

3 Jan 26, 2022

MOT-Tracking-by-Detection-Pipeline - For Tracking-by-Detection format MOT (Multi Object Tracking), is it a framework that separates Detection and Tracking processes?

MOT-Tracking-by-Detection-Pipeline: Tracking-by-Detection-style MOT (Multi Object Trac

41 Nov 23, 2022

This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales

Intro This is the repository for CVPR2021 Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales Vehicle Sam

39 Jul 21, 2022

Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

115 Dec 23, 2022

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

305 Dec 16, 2022

Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

443 Dec 6, 2022

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

CLOCs is a novel Camera-LiDAR Object Candidates fusion network. It provides a low-complexity multi-modal fusion framework that improves the performance of single-modality detectors. CLOCs operates on the combined output candidates of any 3D and any 2D detector, and is trained to produce more accurate 3D and 2D detection results.

254 Dec 16, 2022

Object Detection and Multi-Object Tracking

1.6k Jan 4, 2023

Object tracking and object detection are applied to track golf putts in real time and display stats/games.

Putting_Game Object tracking and object detection are applied to track golf putts in real time and display stats/games. Works best with the Perfect Prac

1 Dec 29, 2021

Auto-Lama combines object detection and image inpainting to automate object removals

Auto-Lama Auto-Lama combines object detection and image inpainting to automate object removals. It is built on top of DETR from Facebook Research and

44 Dec 9, 2022

Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

51 Nov 17, 2022

This is a repository for a No-Code object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operating systems.

OpenVINO Inference API This is a repository for an object detection inference API using the OpenVINO. It's supported on both Windows and Linux Operati

68 Nov 24, 2022

This repository allows you to anonymize sensitive information in images/videos. The solution is fully compatible with the DL-based training/inference solutions that we already published/will publish for Object Detection and Semantic Segmentation.

BMW-Anonymization-Api Data privacy and individuals’ anonymity are and always have been a major concern for data-driven companies. Therefore, we design

148 Dec 21, 2022

Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)

Official PyTorch Implementation for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'2021, Oral Presentation) HOTR: End-to-

114 Nov 28, 2022
