Improving 3D Object Detection with Channel-wise Transformer

Hualian Sheng

Last update: Dec 20, 2022

Overview

"Improving 3D Object Detection with Channel-wise Transformer"

Thanks to OpenPCDet: this implementation of CT3D is mainly based on pcdet v0.3. Our paper appeared at ICCV 2021.

Overview of CT3D. The raw points are first fed into the RPN to generate 3D proposals. The raw points and their corresponding proposals are then processed by the channel-wise Transformer, which consists of a proposal-to-point encoding module and a channel-wise decoding module. Specifically, the proposal-to-point encoding module modulates each point feature with global proposal-aware context information. The channel-wise decoding module then transforms the encoded point features into an effective proposal feature representation for confidence prediction and box regression.

Setting	AP@R11	AP@R40	Download
Only Car	86.06	85.79	model-car
3-Category (Car)	85.04	84.97	model-3cat
3-Category (Pedestrian)	56.28	55.58	-
3-Category (Cyclist)	71.71	71.88	-
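
The two AP columns differ in how many recall positions the precision-recall curve is sampled at. The sketch below is not the official KITTI evaluation code: it assumes a precomputed list of (recall, precision) pairs, uses a hypothetical helper `interpolated_ap`, and runs on made-up numbers. It only illustrates the sampling difference: R11 averages over recalls 0, 0.1, ..., 1.0, while R40 averages over 1/40, 2/40, ..., 1.

```python
def interpolated_ap(pr_points, recall_positions):
    """Average the interpolated precision over the given recall positions.

    pr_points: list of (recall, precision) pairs.
    The interpolated precision at recall r is the highest precision among
    all points whose recall is >= r (0 if no such point exists).
    """
    total = 0.0
    for r in recall_positions:
        candidates = [p for rec, p in pr_points if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / len(recall_positions)


R11 = [i / 10 for i in range(11)]      # 0.0, 0.1, ..., 1.0
R40 = [i / 40 for i in range(1, 41)]   # 1/40, 2/40, ..., 1.0

# Toy precision-recall curve (made-up numbers, for illustration only).
toy_curve = [(0.2, 1.0), (0.5, 0.9), (0.8, 0.6)]
ap_r11 = interpolated_ap(toy_curve, R11)
ap_r40 = interpolated_ap(toy_curve, R40)
```

Because R11 includes the recall position 0, the same curve generally yields different values under the two protocols, so numbers from the two columns should not be compared with each other.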

1. Recommended Environment

Linux (tested on Ubuntu 16.04)
Python 3.6+
PyTorch 1.1 or higher (tested on PyTorch 1.6)
CUDA 9.0 or higher (PyTorch 1.3+ needs CUDA 9.2+)

2. Set the Environment

pip install -r requirement.txt
python setup.py develop

3. Data Preparation

Prepare KITTI dataset and road planes

# Download KITTI and organize it into the following form:
├── data
│   ├── kitti
│   │   ├── ImageSets
│   │   ├── training
│   │   │   ├── calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   ├── testing
│   │   │   ├── calib & velodyne & image_2

# Generate data infos:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
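
Before generating the data infos, it can help to confirm that the directory layout matches the tree above. The following is a minimal sketch using only the standard library; `missing_kitti_dirs` is a hypothetical helper, not part of pcdet, and it assumes the dataset root is `data/kitti`.

```python
from pathlib import Path

# Sub-directories expected by the layout above; "planes" is optional
# and therefore not checked.
REQUIRED = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(root):
    """Return the expected KITTI directories that do not exist under root."""
    root = Path(root)
    expected = [root / "ImageSets"]
    for split, subdirs in REQUIRED.items():
        expected += [root / split / name for name in subdirs]
    return [str(p.relative_to(root)) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_kitti_dirs("data/kitti")
    print("missing:", missing if missing else "none")
```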

Prepare Waymo dataset

# Download Waymo and organize it into the following form:
├── data
│   ├── waymo
│   │   ├── ImageSets
│   │   ├── raw_data
│   │   │   ├── segment-xxxxxxxx.tfrecord
│   │   │   ├── ...
│   │   ├── waymo_processed_data
│   │   │   ├── segment-xxxxxxxx/
│   │   │   ├── ...
│   │   ├── pcdet_gt_database_train_sampled_xx/
│   │   ├── pcdet_waymo_dbinfos_train_sampled_xx.pkl

# Install tf 2.1.0
# Install the official waymo-open-dataset by running the following command:
pip3 install --upgrade pip
pip3 install waymo-open-dataset-tf-2-1-0 --user

# Extract point cloud data from tfrecord and generate data infos:
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml

4. Train

Train with a single GPU

python train.py --cfg_file ${CONFIG_FILE}

# e.g.,
python train.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml

Train with multiple GPUs or multiple machines

bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file ${CONFIG_FILE}
# or 
bash scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_GPUS} --cfg_file ${CONFIG_FILE}

# e.g.,
bash scripts/dist_train.sh 8 --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml

5. Test

Test with a pretrained model:

python test.py --cfg_file ${CONFIG_FILE} --ckpt ${CKPT}

# e.g., 
python test.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml --ckpt output/kitti_models/second_ct3d/default/kitti_val.pth
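
Since PyTorch 1.6, `torch.save` writes a zip-based archive by default, and older PyTorch versions cannot load that format. If `test.py` fails to read a checkpoint, a quick check like the one below shows which format a `.pth` file uses. This is a sketch using only the standard library; `is_zip_checkpoint` is a hypothetical helper, and the path is simply the one from the example above.

```python
import os
import zipfile


def is_zip_checkpoint(path):
    """True if the file is a zip archive (the default torch.save format
    since PyTorch 1.6), False for the older non-zip format."""
    return zipfile.is_zipfile(path)


if __name__ == "__main__":
    ckpt = "output/kitti_models/second_ct3d/default/kitti_val.pth"
    if not os.path.isfile(ckpt):
        print("checkpoint not found:", ckpt)
    elif is_zip_checkpoint(ckpt):
        print("zip-based checkpoint: load it with PyTorch 1.6 or newer")
    else:
        print("legacy (non-zip) checkpoint")
```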

Comments

Three questions about the code and paper
Hello @hlsheng1 ! After reading your paper and code, I want to ask some questions.

Why did you transform the coordinates of points from cartesian coordinate system to spherical coordinate system in proposal-to-point embedding module? Did experiments show better performance in spherical coordinate system than in cartesian coordinate system?

CT3D is trained with more epochs and a smaller learning rate compared to the vanilla SECOND in OpenPCDet. Is this because Transformer converges slower than other networks?

I can't fully understand the proposed extended channel-wise re-weighting. In my opinion, it's more like head-wise re-weighting for the shape of scores_1 is (batch_size * num_roi, num_head, num_query, num_key). Besides, extended channel-wise re-weighting doesn't introduce more learnable parameters. Why would the performance be better?

I am looking forward to your reply. Thank you in advance!
opened by rkotimi 5

About KITTI val results

Hi,

I used your 3 classes checkpoint for kitti, but couldn't reproduce the results reported in your paper.

Here are the results I got for second_ct3d_3cat on kitti validation set:

Car AP@0.70, 0.70, 0.70:
bbox AP:98.0948, 89.4857, 89.1712
bev  AP:90.2502, 88.1758, 87.7788
3d   AP:89.1084, 85.0401, 78.7598
aos  AP:98.06, 89.41, 89.04
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:98.9425, 94.9858, 92.7375
bev  AP:95.9154, 91.3542, 89.2926
3d   AP:92.3391, 84.9711, 82.9065
aos  AP:98.91, 94.88, 92.58
Car AP@0.70, 0.50, 0.50:
bbox AP:98.0948, 89.4857, 89.1712
bev  AP:98.0871, 89.4169, 89.1425
3d   AP:98.0695, 89.3947, 89.1037
aos  AP:98.06, 89.41, 89.04
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:98.9425, 94.9858, 92.7375
bev  AP:98.8692, 94.9542, 94.7880
3d   AP:98.8598, 94.9019, 94.6973
aos  AP:98.91, 94.88, 92.58
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:73.0723, 69.4016, 66.8569
bev  AP:64.2310, 59.8402, 55.7597
3d   AP:61.7407, 56.2790, 52.5120
aos  AP:69.04, 65.17, 62.02
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:73.6869, 69.9239, 67.0737
bev  AP:64.4088, 59.1779, 54.8622
3d   AP:61.0537, 55.5749, 51.0978
aos  AP:69.22, 65.19, 61.73
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:73.0723, 69.4016, 66.8569
bev  AP:76.8913, 72.4136, 70.3777
3d   AP:76.8213, 72.3334, 70.1651
aos  AP:69.04, 65.17, 62.02
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:73.6869, 69.9239, 67.0737
bev  AP:77.9570, 73.8559, 71.3994
3d   AP:77.8785, 73.7270, 70.7240
aos  AP:69.22, 65.19, 61.73
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:93.4912, 77.7671, 76.2917
bev  AP:90.9435, 73.6842, 71.2105
3d   AP:85.0440, 71.7085, 68.0511
aos  AP:93.30, 77.56, 76.00
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:95.4022, 80.6753, 77.1761
bev  AP:92.5957, 75.3982, 71.3149
3d   AP:89.0081, 71.8798, 67.9090
aos  AP:95.24, 80.43, 76.85
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:93.4912, 77.7671, 76.2917
bev  AP:91.9032, 77.3141, 73.3950
3d   AP:91.9032, 77.3141, 73.3950
aos  AP:93.30, 77.56, 76.00
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:95.4022, 80.6753, 77.1761
bev  AP:93.7097, 78.2383, 73.9397
3d   AP:93.7097, 78.2383, 73.9455
aos  AP:95.24, 80.43, 76.85

opened by kts707 4

Why is find_unused_parameters=True needed?

If I do not set find_unused_parameters=True, training fails with this error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

Do you know the cause of this? I didn't find any unused parameter. If I don't want to set find_unused_parameters=True, is there an alternative method?

opened by Yzichen 3
Some questions about training CT3D on three classes on the KITTI dataset

I trained CT3D on the three KITTI classes with two 2080Ti GPUs and a batch size of 2 per GPU, but the car mAP on the val set is 79.38 (11 recall positions) and 83.48 (40 recall positions). Does the batch size affect the performance? The cfg second_ct3d_3cat.yaml is missing num_points. Why is num_classes set to 1 in the Transformer module (line 138 in the cfg), and why do the learning rate and number of epochs differ from the paper? Is the cfg file correct? I also cannot run test.py with your ckpt: the .pth file appears to be a zip file, so maybe I should change the PyTorch version to 1.6.

opened by czy-0326 3
Question about point selection

Thanks for your impressive work. In the roi_head of CT3D, you select points by horizontal limit. Why don't you limit the height of points in the vertical direction?

opened by Eaphan 2
Incompatible spconv versions
This repo doesn't specify the version of spconv. I installed the latest spconv, and got this error when running python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml:

File "CT3D/pcdet/datasets/processor/data_processor.py", line 51, in transform_points_to_voxels
    voxel_generator = VoxelGenerator(
TypeError: __init__() missing 1 required positional argument: 'num_point_features'

I found that spconv has deprecated VoxelGenerator in the newest version and now uses spconv.pytorch.utils.PointToVoxel instead. PointToVoxel takes different parameters from VoxelGenerator, so it is not easy to refactor every use of VoxelGenerator. Which version of spconv do you use to run the code? Thank you.
opened by Co1lin 2
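
When reporting or debugging an spconv mismatch like this one, it helps to state exactly which build is installed. The following is a sketch using `importlib.metadata` (Python 3.8+). The candidate distribution names are assumptions, since spconv 2.x wheels are published under CUDA-specific names, and `first_installed_version` is a hypothetical helper.

```python
from importlib import metadata


def first_installed_version(names):
    """Return (name, version) for the first installed distribution, else None."""
    for name in names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Candidate names are assumptions; adjust them to match your CUDA version.
    candidates = ["spconv", "spconv-cu102", "spconv-cu113", "spconv-cu117"]
    found = first_installed_version(candidates)
    print(found if found else "spconv not found in installed package metadata")
```
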
A question about SCORE_THRESH setting of POST_PROCESSING

Thanks for your great work. How do you set SCORE_THRESH when you train the model on all trainsets and submit the prediction results to the KITTI benchmark evaluation: 0.7 or 0.81? Also, what is the reasoning behind changing score_thresh? My guess is that it removes some false-positive predictions.

opened by Eaphan 2
Differences between only car and 3classes training?

I notice that the provided config file only trains the 'car' class. Is there any difference between training with all 3 classes and training only the 'car' class? Can I use the provided code for 3-class training?

opened by tdzdog 2
Question about attention encoder

Hi, thanks for your brilliant work! I have a question about the attention encoder layer. When a box contains fewer than 256 points, all-zero vectors are used to pad the features. Do these zero vectors participate in the attention calculation when the features pass through the encoder layer? Do they influence the result?

opened by Tu1016 1
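
Whether zero-padded points affect attention depends on whether they are masked. The sketch below does not reproduce the CT3D encoder. It is a plain-Python illustration on made-up scores, assuming the key projection has no bias term, that a padded key with score 0 still receives non-zero weight under softmax unless it is explicitly masked out.

```python
import math


def softmax(scores, mask=None):
    """Softmax over a list of scores.

    mask[i] == True marks a padded position that must receive zero weight.
    Assumes at least one position is not masked.
    """
    if mask is None:
        mask = [False] * len(scores)
    kept = [s for s, m in zip(scores, mask) if not m]
    top = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if m else math.exp(s - top) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Two real keys followed by two zero-padded keys. The dot product of a
# query with an all-zero key is 0, so the padded scores are 0.
scores = [2.0, 1.0, 0.0, 0.0]
padding = [False, False, True, True]

unmasked = softmax(scores)           # padded keys still get some weight
masked = softmax(scores, padding)    # padded keys get exactly zero weight
```
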
bash scripts/dist_train.sh 4 --cfg_file cfgs/kitti_models/second_ct3d.yaml error

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/3/error.json

Hello, this problem appears when I run the bash command to train. What is the cause, and how can I solve it? Thank you!

opened by pengweiweiwei 1
import module error

ImportError: /data/code/Det_Code/3D_Det/CT3D/pcdet/ops/pointnet2/pointnet2_stack/pointnet2_stack_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _Z25voxel_query_wrapper_stackiiiiifiiiN2at6TensorES0_S0_S0│ S0

opened by zaiquanyang 1
ct3d_head

Hi, I am confused about the code snippet 'cur_points = batch_dict['points'][(batch_dict['points'][:, 0] == bs_idx)][:, 1:5]' at line 149 in ct3d_head.py. What does this operation do?

opened by bigbird11 4
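
For readers with the same question, the indexing pattern can be read in two steps: a boolean mask keeps the rows whose first column equals `bs_idx`, and the slice `1:5` then drops that first column. The plain-Python sketch below mirrors this on a small list. It assumes each row is laid out as [batch_idx, x, y, z, intensity], which is an assumption about the batch format and not taken from ct3d_head.py; `points_of_sample` is a hypothetical helper.

```python
# Each row is assumed to be [batch_idx, x, y, z, intensity].
points = [
    [0, 1.0, 2.0, 0.5, 0.1],
    [1, 4.0, 5.0, 0.2, 0.3],
    [0, 7.0, 8.0, 0.9, 0.7],
]


def points_of_sample(points, bs_idx):
    """Keep rows whose first column equals bs_idx, then drop that column.

    Mirrors points[points[:, 0] == bs_idx][:, 1:5] on a tensor.
    """
    return [row[1:5] for row in points if row[0] == bs_idx]


# Rows 0 and 2 belong to sample 0; the batch index column is removed.
cur_points = points_of_sample(points, bs_idx=0)
```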
