Group-Free 3D Object Detection via Transformers

Overview


By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

[teaser figure]

Updates

  • April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
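To make the group-free idea above concrete, below is a minimal, hypothetical PyTorch sketch (not the module shipped in this repository; class and variable names are illustrative). Each object candidate attends to all point features through cross-attention, so no hand-crafted point-to-object grouping is needed; the full model stacks several such decoder stages (e.g. L6 or L12) and fuses the predictions made at different stages.

import torch
import torch.nn as nn

class GroupFreeDecoderStageSketch(nn.Module):
    """Illustrative decoder stage: object candidates attend to all points."""

    def __init__(self, feat_dim=288, num_heads=8):
        super().__init__()
        # Self-attention among object candidates, then cross-attention from
        # the candidates to the full set of point features (no point grouping).
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads)
        self.norm1 = nn.LayerNorm(feat_dim)
        self.norm2 = nn.LayerNorm(feat_dim)

    def forward(self, obj_feats, point_feats):
        # obj_feats:   (num_candidates, batch, feat_dim)
        # point_feats: (num_points, batch, feat_dim)
        out, _ = self.self_attn(obj_feats, obj_feats, obj_feats)
        obj_feats = self.norm1(obj_feats + out)
        out, _ = self.cross_attn(obj_feats, point_feats, point_feats)
        return self.norm2(obj_feats + out)

if __name__ == "__main__":
    stages = nn.ModuleList(GroupFreeDecoderStageSketch() for _ in range(6))  # e.g. L6
    obj = torch.randn(256, 2, 288)   # e.g. O256 candidates, batch size 2
    pts = torch.randn(1024, 2, 288)  # down-sampled point features from the backbone
    for stage in stages:
        # In the paper, each stage also produces box predictions, and the
        # predictions of different stages are fused for the final detections.
        obj = stage(obj, pts)
    print(obj.shape)  # torch.Size([256, 2, 288])

The actual implementation in the repository is considerably richer (per-stage prediction heads and losses, learned query-point sampling, etc.), but the attention pattern above is the core of the group-free design.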

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

| Method | backbone | mAP@0.25 | mAP@0.5 | Model |
|--------|----------|----------|---------|-------|
| HGNet | GU-net | 61.3 | 34.4 | - |
| GSDN | MinkNet | 62.8 | 34.8 | waiting for release |
| 3D-MPA | MinkNet | 64.2 | 49.2 | waiting for release |
| VoteNet | PointNet++ | 62.9 | 39.9 | official repo |
| MLCVNet | PointNet++ | 64.5 | 41.4 | official repo |
| H3DNet | PointNet++ | 64.4 | 43.4 | official repo |
| H3DNet | 4xPointNet++ | 67.2 | 48.1 | official repo |
| Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) | model |
| Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) | model |
| Ours(L12, O256) | PointNet++w2× | 68.8 (68.3*) | 52.1 (51.1*) | model |
| Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) | 52.8 (52.3*) | model |

SUN RGB-D

| Method | backbone | inputs | mAP@0.25 | mAP@0.5 | Model |
|--------|----------|--------|----------|---------|-------|
| VoteNet | PointNet++ | point | 59.1 | 35.8 | official repo |
| MLCVNet | PointNet++ | point | 59.8 | - | official repo |
| HGNet | GU-net | point | 61.6 | - | - |
| H3DNet | 4xPointNet++ | point | 60.1 | 39.0 | official repo |
| imVoteNet | PointNet++ | point+RGB | 63.4 | - | official repo |
| Ours(L6, O256) | PointNet++ | point | 62.8 (62.6*) | 42.3 (42.0*) | model |

Notes:

  • * means the result is averaged over 5 evaluation runs, since the randomness of the algorithm is considerable.

Install

Requirements

  • Ubuntu 16.04
  • Anaconda with python=3.6
  • pytorch>=1.3
  • torchvision with pillow<7
  • cuda=10.1
  • trimesh>=2.35.39,<2.35.40
  • networkx>=2.2,<2.3
  • compile the CUDA layers for PointNet++, which are used in the backbone network: sh init.sh
  • others: pip install termcolor opencv-python tensorboard (a quick sanity-check sketch follows this list)
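As a convenience, the following is a small, hypothetical sanity-check script (not part of the repository) that confirms the pinned dependencies above import correctly and that CUDA is visible before compiling the PointNet++ layers:

import PIL
import networkx
import torch
import torchvision
import trimesh

# Version expectations mirror the requirement list above.
print("pytorch:", torch.__version__)            # expected >= 1.3
print("torchvision:", torchvision.__version__)
print("pillow:", PIL.__version__)               # expected < 7
print("trimesh:", trimesh.__version__)          # expected 2.35.39
print("networkx:", networkx.__version__)        # expected 2.2.x
print("CUDA available:", torch.cuda.is_available())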

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L12, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]

Acknowledgements

We thank the authors of votenet for their flexible codebase.

License

The code is released under MIT License (see LICENSE file for details).

Comments
  • 5-times evaluation

    Hi, thank you for releasing your codebase!

    I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?

    Further, did you notice some variations between training runs?

    opened by Divadi 5
  • Question about results reproduction

    Hi, thanks for the nice work.

    I train your network on SUN RGBD dataset with the training script: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 --dataset sunrgbd --data_root .

    I obtain the following results:

    [08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363  1head_: 0.6320  2head_: 0.6202  3head_: 0.6132  4head_: 0.6163  last_: 0.6164   proposal_: 0.6108
    [08/24 23:51:00 group-free]: IoU[0.5]:  0head_: 0.4328  1head_: 0.4388  2head_: 0.4095  3head_: 0.4329  4head_: 0.4441  last_: 0.4282   proposal_: 0.3599
    

    Question 1: There are several results (0head_, 1head_, 2head_, 3head_, 4head_, proposal_); which one should be reported in the paper?
    Question 2: These results fall short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2). I'm not sure what's going wrong.

    Thank you, and I look forward to your reply.

    opened by yikaiw 3
  • Training without Instance Segmentation

    Is it possible to train the group-free network without access to per-point instance labels, i.e. using only 3D bounding boxes? The loss calculation seems to depend on instance labels, as far as I can tell.

    opened by egeozsoy 3
  • Visualizations on cross-attention weight in different decoder stages

    Dear authors, thank you for your good work. I would like to know how to visualize the results in Figure 5. Can you provide the corresponding visualization code?

    opened by jlqzzz 2
  • About voting

    Thanks for your great work. You mentioned in Appendix A1.2 that you incorporated voting into the framework, but no corresponding experiment or code seems to be available in the paper or in this repo.

    opened by yzheng97 1
  • Some files missed on SUNRGBD

    I have followed the process under sunrgbd, and the resulting dataset runs with votenet, but it fails to run with Group-Free-3D. The following files are missing: all_obbs_modified_nearest_has_empty.pkl, all_pc_modified_nearest_has_empty.pkl, all_point_labels_nearest_has_empty.pkl. Can you provide the related files? Thanks 😀

    opened by densechen 1
  • Eval: Some classes output NaN because of Npos=0

    Hi! I am trying to evaluate on all ScanNet classes (485). Since some classes are very rare, running eval_det_cls for them produces NaN because npos=0. Can you recommend a fix for this?

    opened by ishitamed19 1
  • the eval ap lower than the train ap

    Dear authors, the result of running eval_avg.py is not as good as the result of evaluating during training; the performance drops by about 4%. Is this due to overfitting?

    opened by sunmaosheng755 0
  • Question about label assign

    Hi, thanks for sharing this excellent work. In the paper, you mention that you manually assign the object candidates to ground truth. Could you please explain this in a bit more detail and point out where the code is?

    opened by XuyangBai 0
  • Loss become nan when at about 300 epoch

    Thanks for your excellent work!

    I encountered a problem during training. Since I only have one GPU, I modified train_dist.py into a single-GPU version (I just removed the code related to distributed training). Screenshot from 2021-05-11 09-31-19

    I would like to know whether anything else needs to be modified, and whether you have any suggestions about this problem. Thanks very much!

    opened by adrien-Chen 0
  • Question about the size_cls_agnostic

    I found that 'size_gts' is used to supervise the predicted object size when size_cls_agnostic is set to True. Could you explain the reason for using 'size_gts' instead of 'box3d_size' as the supervision signal?

    opened by yuanzhen2020 0
  • Code Question

    First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections and noticed a few things I don't quite understand.

    In your paper you state that you use random scaling between 0.9 and 1.1 as augmentation on the ScanNet dataset; however, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied?

    Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming up about 1% short on mAP at both 0.25 and 0.5 IoU. I'm wondering whether this may be because I'm only using a single GPU for training. As far as I understand, you do not sync the batch norm across GPUs, so the smaller per-GPU batch may actually be beneficial to training?

    And when I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. I was wondering whether there is a specific reason for choosing this value, as it seems quite low to me, and I would have thought a power of 2 would be more in line with most architectures.

    I would really appreciate it if I could gain your insights on this.

    opened by Giutear 2
  • Inference Queries

    @stupidZZ Thanks for open-sourcing the codebase. I have a few queries:

    1. How do we run on a custom point cloud dataset? Should we preprocess it into one of the supported formats?
    2. How do we visualize the results shown in the paper? Can you please share the visualization code?
    3. i am able to run the model and getting the following results hwo to validate the results by metrics and visualization dict_keys(['sa1_inds', 'sa1_xyz', 'sa1_features', 'sa2_inds', 'sa2_xyz', 'sa2_features', 'sa3_xyz', 'sa3_features', 'sa4_xyz', 'sa4_features', 'fp2_features', 'fp2_xyz', 'fp2_inds', 'seed_inds', 'seed_xyz', 'seed_features', 'seeds_obj_cls_logits', 'query_points_xyz', 'query_points_feature', 'query_points_sample_inds', 'proposal_base_xyz', 'proposal_objectness_scores', 'proposal_center', 'proposal_heading_scores', 'proposal_heading_residuals_normalized', 'proposal_heading_residuals', 'proposal_pred_size', 'proposal_sem_cls_scores', '0head_base_xyz', '0head_objectness_scores', '0head_center', '0head_heading_scores', '0head_heading_residuals_normalized', '0head_heading_residuals', '0head_pred_size', '0head_sem_cls_scores', '1head_base_xyz', '1head_objectness_scores', '1head_center', '1head_heading_scores', '1head_heading_residuals_normalized', '1head_heading_residuals', '1head_pred_size', '1head_sem_cls_scores', '2head_base_xyz', '2head_objectness_scores', '2head_center', '2head_heading_scores', '2head_heading_residuals_normalized', '2head_heading_residuals', '2head_pred_size', '2head_sem_cls_scores', '3head_base_xyz', '3head_objectness_scores', '3head_center', '3head_heading_scores', '3head_heading_residuals_normalized', '3head_heading_residuals', '3head_pred_size', '3head_sem_cls_scores', '4head_base_xyz', '4head_objectness_scores', '4head_center', '4head_heading_scores', '4head_heading_residuals_normalized', '4head_heading_residuals', '4head_pred_size', '4head_sem_cls_scores', 'last_base_xyz', 'last_objectness_scores', 'last_center', 'last_heading_scores', 'last_heading_residuals_normalized', 'last_heading_residuals', 'last_pred_size', 'last_sem_cls_scores', 'point_clouds', 'center_label', 'heading_class_label', 'heading_residual_label', 'size_class_label', 'size_residual_label', 'size_gts', 'sem_cls_label', 'box_label_mask', 'point_obj_mask', 'point_instance_label', 'scan_idx', 'max_gt_bboxes', 'points_hard_topk4_pos_ratio', 'points_hard_topk4_neg_ratio', 'points_hard_topk4_upper_recall_ratio', 'query_points_generation_loss', 'proposal_objectness_label', 'proposal_objectness_mask', 'proposal_object_assignment', 'proposal_pos_ratio', 'proposal_neg_ratio', 'proposal_objectness_loss', 'last_objectness_label', 'last_objectness_mask', 'last_object_assignment', 'last_pos_ratio', 'last_neg_ratio', 'last_objectness_loss', '0head_objectness_label', '0head_objectness_mask', '0head_object_assignment', '0head_pos_ratio', '0head_neg_ratio', '0head_objectness_loss', '1head_objectness_label', '1head_objectness_mask', '1head_object_assignment', '1head_pos_ratio', '1head_neg_ratio', '1head_objectness_loss', '2head_objectness_label', '2head_objectness_mask', '2head_object_assignment', '2head_pos_ratio', '2head_neg_ratio', '2head_objectness_loss', '3head_objectness_label', '3head_objectness_mask', '3head_object_assignment', '3head_pos_ratio', '3head_neg_ratio', '3head_objectness_loss', '4head_objectness_label', '4head_objectness_mask', '4head_object_assignment', '4head_pos_ratio', '4head_neg_ratio', '4head_objectness_loss', 'sum_heads_objectness_loss', 'proposal_center_loss', 'proposal_heading_cls_loss', 'proposal_heading_reg_loss', 'proposal_size_reg_loss', 'proposal_box_loss', 'proposal_sem_cls_loss', 'last_center_loss', 'last_heading_cls_loss', 'last_heading_reg_loss', 'last_size_reg_loss', 'last_box_loss', 'last_sem_cls_loss', '0head_center_loss', '0head_heading_cls_loss', 
'0head_heading_reg_loss', '0head_size_reg_loss', '0head_box_loss', '0head_sem_cls_loss', '1head_center_loss', '1head_heading_cls_loss', '1head_heading_reg_loss', '1head_size_reg_loss', '1head_box_loss', '1head_sem_cls_loss', '2head_center_loss', '2head_heading_cls_loss', '2head_heading_reg_loss', '2head_size_reg_loss', '2head_box_loss', '2head_sem_cls_loss', '3head_center_loss', '3head_heading_cls_loss', '3head_heading_reg_loss', '3head_size_reg_loss', '3head_box_loss', '3head_sem_cls_loss', '4head_center_loss', '4head_heading_cls_loss', '4head_heading_reg_loss', '4head_size_reg_loss', '4head_box_loss', '4head_sem_cls_loss', 'sum_heads_box_loss', 'sum_heads_sem_cls_loss', 'loss', 'batch_gt_map_cls']) Thanks in advance
    opened by abhigoku10 0
  • There is no "demo.py"

    I wonder how the results from different stages are ensembled in this method. This part of the code is not provided, even though it should be very important according to the paper. Even in the evaluation and test stages, the loss is an average of the losses of the different stages rather than the loss of a final estimated result, which I don't think is reasonable.

    opened by MandyDongrs 3