Group-Free 3D Object Detection via Transformers

Overview


By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

[teaser figure]

Updates

  • April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representation from an irregular point cloud, existing methods usually take a point grouping step to assign the points to an object candidate so that a PointNet-like network could be used to derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme decrease the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from the 3D point cloud. Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in the Transformers, where the contribution of each point is automatically learned in the network training. With an improved attention stacking scheme, our method fuses object features in different stages and generates more accurate object detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
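To make the group-free idea above concrete, below is a minimal, hypothetical PyTorch sketch (not the module shipped in this repository; class and variable names are illustrative). Each object candidate attends to all point features through cross-attention, so no hand-crafted point-to-object grouping is needed; the full model stacks several such decoder stages (e.g. L6 or L12) and fuses the predictions made at different stages.

import torch
import torch.nn as nn

class GroupFreeDecoderStageSketch(nn.Module):
    """Illustrative decoder stage: object candidates attend to all points."""

    def __init__(self, feat_dim=288, num_heads=8):
        super().__init__()
        # Self-attention among object candidates, then cross-attention from
        # the candidates to the full set of point features (no point grouping).
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads)
        self.norm1 = nn.LayerNorm(feat_dim)
        self.norm2 = nn.LayerNorm(feat_dim)

    def forward(self, obj_feats, point_feats):
        # obj_feats:   (num_candidates, batch, feat_dim)
        # point_feats: (num_points, batch, feat_dim)
        out, _ = self.self_attn(obj_feats, obj_feats, obj_feats)
        obj_feats = self.norm1(obj_feats + out)
        out, _ = self.cross_attn(obj_feats, point_feats, point_feats)
        return self.norm2(obj_feats + out)

if __name__ == "__main__":
    stages = nn.ModuleList(GroupFreeDecoderStageSketch() for _ in range(6))  # e.g. L6
    obj = torch.randn(256, 2, 288)   # e.g. O256 candidates, batch size 2
    pts = torch.randn(1024, 2, 288)  # down-sampled point features from the backbone
    for stage in stages:
        # In the paper, each stage also produces box predictions, and the
        # predictions of different stages are fused for the final detections.
        obj = stage(obj, pts)
    print(obj.shape)  # torch.Size([256, 2, 288])

The actual implementation in the repository is considerably richer (per-stage prediction heads and losses, learned query-point sampling, etc.), but the attention pattern above is the core of the group-free design.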

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

| Method | backbone | mAP@0.25 | mAP@0.5 | Model |
|--------|----------|----------|---------|-------|
| HGNet | GU-net | 61.3 | 34.4 | - |
| GSDN | MinkNet | 62.8 | 34.8 | waiting for release |
| 3D-MPA | MinkNet | 64.2 | 49.2 | waiting for release |
| VoteNet | PointNet++ | 62.9 | 39.9 | official repo |
| MLCVNet | PointNet++ | 64.5 | 41.4 | official repo |
| H3DNet | PointNet++ | 64.4 | 43.4 | official repo |
| H3DNet | 4xPointNet++ | 67.2 | 48.1 | official repo |
| Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) | model |
| Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) | model |
| Ours(L12, O256) | PointNet++w2× | 68.8 (68.3*) | 52.1 (51.1*) | model |
| Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) | 52.8 (52.3*) | model |

SUN RGB-D

| Method | backbone | inputs | mAP@0.25 | mAP@0.5 | Model |
|--------|----------|--------|----------|---------|-------|
| VoteNet | PointNet++ | point | 59.1 | 35.8 | official repo |
| MLCVNet | PointNet++ | point | 59.8 | - | official repo |
| HGNet | GU-net | point | 61.6 | - | - |
| H3DNet | 4xPointNet++ | point | 60.1 | 39.0 | official repo |
| imVoteNet | PointNet++ | point+RGB | 63.4 | - | official repo |
| Ours(L6, O256) | PointNet++ | point | 62.8 (62.6*) | 42.3 (42.0*) | model |

Notes:

  • * means the result is averaged over 5 evaluation runs, since the randomness of the algorithm is considerable.

Install

Requirements

  • Ubuntu 16.04
  • Anaconda with python=3.6
  • pytorch>=1.3
  • torchvision with pillow<7
  • cuda=10.1
  • trimesh>=2.35.39,<2.35.40
  • networkx>=2.2,<2.3
  • compile the CUDA layers for PointNet++, which are used in the backbone network: sh init.sh
  • others: pip install termcolor opencv-python tensorboard (a quick sanity-check sketch follows this list)
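As a convenience, the following is a small, hypothetical sanity-check script (not part of the repository) that confirms the pinned dependencies above import correctly and that CUDA is visible before compiling the PointNet++ layers:

import PIL
import networkx
import torch
import torchvision
import trimesh

# Version expectations mirror the requirement list above.
print("pytorch:", torch.__version__)            # expected >= 1.3
print("torchvision:", torchvision.__version__)
print("pillow:", PIL.__version__)               # expected < 7
print("trimesh:", trimesh.__version__)          # expected 2.35.39
print("networkx:", networkx.__version__)        # expected 2.2.x
print("CUDA available:", torch.cuda.is_available())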

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L12, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]

Acknowledgements

We thank the authors of votenet for their flexible codebase.

License

The code is released under MIT License (see LICENSE file for details).

Comments
  • 5-times evaluation

    Hi, thank you for releasing your codebase!

    I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?

    Further, did you notice some variations between training runs?

    opened by Divadi 5
  • Question about results reproduction

    Hi, thanks for the nice work.

    I train your network on SUN RGBD dataset with the training script: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 --dataset sunrgbd --data_root .

    I obtain the following results:

    [08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363  1head_: 0.6320  2head_: 0.6202  3head_: 0.6132  4head_: 0.6163  last_: 0.6164   proposal_: 0.6108
    [08/24 23:51:00 group-free]: IoU[0.5]:  0head_: 0.4328  1head_: 0.4388  2head_: 0.4095  3head_: 0.4329  4head_: 0.4441  last_: 0.4282   proposal_: 0.3599
    

    Question 1: There are several results (0head_, 1head_, 2head_, 3head_, 4head_, proposal_); which one should be reported in the paper?
    Question 2: These results fall short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2). I'm not sure what's going wrong.

    Thank you, and I look forward to your reply.

    opened by yikaiw 3
  • Training without Instance Segmentation

    Is it possible to train the group-free network without access to per-point instance labels, i.e. using only 3D bounding boxes? The loss calculation seems to depend on instance labels, as far as I can tell.

    opened by egeozsoy 3
  • Visualizations on cross-attention weight in different decoder stages

    Dear authors, thank you for your good work. I would like to know how to visualize the results in Figure 5. Can you provide the corresponding visualization code?

    opened by jlqzzz 2
  • About voting

    Thanks for your great work. You mentioned in Appendix A1.2 that you incorporated voting into the framework, but no corresponding experiment or code seems to be available in the paper or in this repo.

    opened by yzheng97 1
  • Some files missed on SUNRGBD

    I have followed the process under sunrgbd, and the resulting dataset runs with votenet, but it fails to run with Group-Free-3D. The following files are missing: all_obbs_modified_nearest_has_empty.pkl, all_pc_modified_nearest_has_empty.pkl, all_point_labels_nearest_has_empty.pkl. Can you provide the related files? Thanks 😀

    opened by densechen 1
  • Eval: Some classes output NaN because of Npos=0

    Hi! I am trying to evaluate on all ScanNet classes (485). Since some classes are very rare, running eval_det_cls for them produces NaN because npos=0. Can you recommend a fix for this?

    opened by ishitamed19 1
  • the eval ap lower than the train ap

    Dear authors, the result of running eval_avg.py is not as good as the result of evaluating during training; the performance drops by about 4%. Is this due to overfitting?

    opened by sunmaosheng755 0
  • Question about label assign

    Hi, thanks for sharing this excellent work. In the paper, you mention that you manually assign the object candidates to ground truth. Could you please explain this in a bit more detail and point out where the code is?

    opened by XuyangBai 0
  • Loss become nan when at about 300 epoch

    Thanks for your excellent work!

    I encountered a problem during training. Since I only have one GPU, I modified train_dist.py into a single-GPU version (I just removed the code related to distributed training). Screenshot from 2021-05-11 09-31-19

    I would like to know whether anything else needs to be modified, and whether you have any suggestions about this problem. Thanks very much!

    opened by adrien-Chen 0
  • Question about the size_cls_agnostic

    I found that 'size_gts' is used to supervise the predicted object size when size_cls_agnostic is set to True. Could you explain the reason for using 'size_gts' instead of 'box3d_size' as the supervision signal?

    opened by yuanzhen2020 0
  • Code Question

    First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections and noticed a few things I don't quite understand.

    In your paper you state that you use random scaling between 0.9 and 1.1 as augmentation on the ScanNet dataset; however, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied?

    Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming up about 1% short on mAP at both 0.25 and 0.5 IoU. I'm wondering whether this may be because I'm only using a single GPU for training. As far as I understand, you do not sync the batch norm across GPUs, so the smaller per-GPU batch may actually be beneficial to training?

    And when I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. I was wondering whether there is a specific reason for choosing this value, as it seems quite low to me, and I would have thought a power of 2 would be more in line with most architectures.

    I would really appreciate it if I could gain your insights on this.

    opened by Giutear 2
  • Inference Queries

    @stupidZZ Thanks for open-sourcing the codebase. I have a few queries:

    1. How do we run on a custom point cloud dataset? Should we preprocess it into one of the supported formats?
    2. How do we visualize the results shown in the paper? Can you please share the visualization code?
    3. i am able to run the model and getting the following results hwo to validate the results by metrics and visualization dict_keys(['sa1_inds', 'sa1_xyz', 'sa1_features', 'sa2_inds', 'sa2_xyz', 'sa2_features', 'sa3_xyz', 'sa3_features', 'sa4_xyz', 'sa4_features', 'fp2_features', 'fp2_xyz', 'fp2_inds', 'seed_inds', 'seed_xyz', 'seed_features', 'seeds_obj_cls_logits', 'query_points_xyz', 'query_points_feature', 'query_points_sample_inds', 'proposal_base_xyz', 'proposal_objectness_scores', 'proposal_center', 'proposal_heading_scores', 'proposal_heading_residuals_normalized', 'proposal_heading_residuals', 'proposal_pred_size', 'proposal_sem_cls_scores', '0head_base_xyz', '0head_objectness_scores', '0head_center', '0head_heading_scores', '0head_heading_residuals_normalized', '0head_heading_residuals', '0head_pred_size', '0head_sem_cls_scores', '1head_base_xyz', '1head_objectness_scores', '1head_center', '1head_heading_scores', '1head_heading_residuals_normalized', '1head_heading_residuals', '1head_pred_size', '1head_sem_cls_scores', '2head_base_xyz', '2head_objectness_scores', '2head_center', '2head_heading_scores', '2head_heading_residuals_normalized', '2head_heading_residuals', '2head_pred_size', '2head_sem_cls_scores', '3head_base_xyz', '3head_objectness_scores', '3head_center', '3head_heading_scores', '3head_heading_residuals_normalized', '3head_heading_residuals', '3head_pred_size', '3head_sem_cls_scores', '4head_base_xyz', '4head_objectness_scores', '4head_center', '4head_heading_scores', '4head_heading_residuals_normalized', '4head_heading_residuals', '4head_pred_size', '4head_sem_cls_scores', 'last_base_xyz', 'last_objectness_scores', 'last_center', 'last_heading_scores', 'last_heading_residuals_normalized', 'last_heading_residuals', 'last_pred_size', 'last_sem_cls_scores', 'point_clouds', 'center_label', 'heading_class_label', 'heading_residual_label', 'size_class_label', 'size_residual_label', 'size_gts', 'sem_cls_label', 'box_label_mask', 'point_obj_mask', 'point_instance_label', 'scan_idx', 'max_gt_bboxes', 'points_hard_topk4_pos_ratio', 'points_hard_topk4_neg_ratio', 'points_hard_topk4_upper_recall_ratio', 'query_points_generation_loss', 'proposal_objectness_label', 'proposal_objectness_mask', 'proposal_object_assignment', 'proposal_pos_ratio', 'proposal_neg_ratio', 'proposal_objectness_loss', 'last_objectness_label', 'last_objectness_mask', 'last_object_assignment', 'last_pos_ratio', 'last_neg_ratio', 'last_objectness_loss', '0head_objectness_label', '0head_objectness_mask', '0head_object_assignment', '0head_pos_ratio', '0head_neg_ratio', '0head_objectness_loss', '1head_objectness_label', '1head_objectness_mask', '1head_object_assignment', '1head_pos_ratio', '1head_neg_ratio', '1head_objectness_loss', '2head_objectness_label', '2head_objectness_mask', '2head_object_assignment', '2head_pos_ratio', '2head_neg_ratio', '2head_objectness_loss', '3head_objectness_label', '3head_objectness_mask', '3head_object_assignment', '3head_pos_ratio', '3head_neg_ratio', '3head_objectness_loss', '4head_objectness_label', '4head_objectness_mask', '4head_object_assignment', '4head_pos_ratio', '4head_neg_ratio', '4head_objectness_loss', 'sum_heads_objectness_loss', 'proposal_center_loss', 'proposal_heading_cls_loss', 'proposal_heading_reg_loss', 'proposal_size_reg_loss', 'proposal_box_loss', 'proposal_sem_cls_loss', 'last_center_loss', 'last_heading_cls_loss', 'last_heading_reg_loss', 'last_size_reg_loss', 'last_box_loss', 'last_sem_cls_loss', '0head_center_loss', '0head_heading_cls_loss', 
'0head_heading_reg_loss', '0head_size_reg_loss', '0head_box_loss', '0head_sem_cls_loss', '1head_center_loss', '1head_heading_cls_loss', '1head_heading_reg_loss', '1head_size_reg_loss', '1head_box_loss', '1head_sem_cls_loss', '2head_center_loss', '2head_heading_cls_loss', '2head_heading_reg_loss', '2head_size_reg_loss', '2head_box_loss', '2head_sem_cls_loss', '3head_center_loss', '3head_heading_cls_loss', '3head_heading_reg_loss', '3head_size_reg_loss', '3head_box_loss', '3head_sem_cls_loss', '4head_center_loss', '4head_heading_cls_loss', '4head_heading_reg_loss', '4head_size_reg_loss', '4head_box_loss', '4head_sem_cls_loss', 'sum_heads_box_loss', 'sum_heads_sem_cls_loss', 'loss', 'batch_gt_map_cls']) Thanks in advance
    opened by abhigoku10 0
  • There is no "demo.py"

    I wonder how the results from different stages are ensembled in this method. This part of the code is not provided, even though it should be very important according to the paper. Even in the evaluation and test stages, the loss is an average of the losses of the different stages rather than the loss of a final estimated result, which I don't think is reasonable.

    opened by MandyDongrs 3