Improving 3D Object Detection with Channel-wise Transformer

Overview

"Improving 3D Object Detection with Channel-wise Transformer"

Thanks to OpenPCDet; this implementation of CT3D is mainly based on pcdet v0.3. Our paper can be downloaded here: ICCV2021.

Overview of CT3D. The raw points are first fed into the RPN to generate 3D proposals. The raw points and their corresponding proposals are then processed by the channel-wise Transformer, which is composed of the proposal-to-point encoding module and the channel-wise decoding module. Specifically, the proposal-to-point encoding module modulates each point feature with global proposal-aware context information. The channel-wise decoding module then transforms the encoded point features into an effective proposal feature representation for confidence prediction and box regression.

Model                     AP@R11   AP@R40   Download
Only Car                  86.06    85.79    model-car
3-Category (Car)          85.04    84.97    model-3cat
3-Category (Pedestrian)   56.28    55.58    -
3-Category (Cyclist)      71.71    71.88    -
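
The two AP columns differ in how many recall positions the interpolated precision is averaged over. As a rough illustration (a minimal sketch, not the KITTI evaluation code used by this repo), R11 samples recall at 0.0, 0.1, ..., 1.0, while R40 samples at 1/40, 2/40, ..., 1.0. The function name and the toy precision-recall curve below are made up for the example.

```python
def interpolated_ap(recalls, precisions, num_positions):
    """Average interpolated precision over evenly spaced recall positions.

    num_positions=11 samples recall at 0.0, 0.1, ..., 1.0 (R11);
    num_positions=40 samples recall at 1/40, 2/40, ..., 1.0 (R40).
    """
    if num_positions == 11:
        positions = [i / 10 for i in range(11)]
    elif num_positions == 40:
        positions = [i / 40 for i in range(1, 41)]
    else:
        raise ValueError("only 11 or 40 recall positions are sketched here")

    total = 0.0
    for r in positions:
        # Interpolated precision: the best precision reached at any recall >= r.
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r - 1e-9]
        total += max(candidates, default=0.0)
    return total / len(positions)


# Toy precision-recall curve (made-up numbers, not CT3D results).
recalls = [0.2, 0.4, 0.6, 0.8]
precisions = [1.0, 0.9, 0.8, 0.5]
ap_r11 = interpolated_ap(recalls, precisions, 11)
ap_r40 = interpolated_ap(recalls, precisions, 40)
print(f"AP@R11 = {ap_r11:.4f}, AP@R40 = {ap_r40:.4f}")
```

Because R40 drops the recall-0 position and samples more densely, the two numbers for the same detector generally differ, which is why both are reported.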

1. Recommended Environment

  • Linux (tested on Ubuntu 16.04)
  • Python 3.6+
  • PyTorch 1.1 or higher (tested on PyTorch 1.6)
  • CUDA 9.0 or higher (PyTorch 1.3+ needs CUDA 9.2+)
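
The version requirements above can be checked before building the extensions. The helper below is only a sketch: `parse_version` and `meets_minimum` are hypothetical names introduced for this example, and the version strings passed in are illustrative.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '1.6.0+cu101' into (1, 6, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (zero-padded compare)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


# Python itself can be checked directly.
python_ok = sys.version_info >= (3, 6)

# Illustrative strings; with PyTorch installed, torch.__version__ could be
# passed as the first argument instead.
print(python_ok, meets_minimum("1.6.0+cu101", "1.1"), meets_minimum("9.0", "9.2"))
```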

2. Set the Environment

pip install -r requirement.txt
python setup.py develop

3. Data Preparation

# Download KITTI and organize it into the following form:
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2

# Generate data infos:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
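
Before generating the infos, it can help to confirm that the KITTI folders match the tree above. The following is a small sketch using only the standard library; `missing_kitti_dirs` and `REQUIRED_DIRS` are hypothetical names, and the optional `planes` folder is deliberately not checked. The demo builds a throwaway layout in a temporary directory instead of touching real data.

```python
import tempfile
from pathlib import Path

# Sub-folders expected under data/kitti, taken from the tree above.
REQUIRED_DIRS = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(root):
    """Return the relative paths of required folders that do not exist."""
    root = Path(root)
    missing = []
    if not (root / "ImageSets").is_dir():
        missing.append("ImageSets")
    for split, names in REQUIRED_DIRS.items():
        for name in names:
            if not (root / split / name).is_dir():
                missing.append(f"{split}/{name}")
    return missing


# Demo on a temporary directory: first empty, then fully populated.
with tempfile.TemporaryDirectory() as tmp:
    kitti_root = Path(tmp) / "data" / "kitti"
    kitti_root.mkdir(parents=True)
    missing_before = missing_kitti_dirs(kitti_root)

    (kitti_root / "ImageSets").mkdir()
    for split, names in REQUIRED_DIRS.items():
        for name in names:
            (kitti_root / split / name).mkdir(parents=True)
    missing_after = missing_kitti_dirs(kitti_root)

print(missing_before)
print(missing_after)
```

On a real checkout, calling `missing_kitti_dirs("data/kitti")` from the repository root would list whatever is still absent.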
# Download Waymo and organize it into the following form:
├── data
│   ├── waymo
│   │   │── ImageSets
│   │   │── raw_data
│   │   │   │── segment-xxxxxxxx.tfrecord
│   │   │   │── ...
│   │   │── waymo_processed_data
│   │   │   │── segment-xxxxxxxx/
│   │   │   │── ...
│   │   │── pcdet_gt_database_train_sampled_xx/
│   │   │── pcdet_waymo_dbinfos_train_sampled_xx.pkl

# Install tf 2.1.0
# Install the official waymo-open-dataset by running the following command:
pip3 install --upgrade pip
pip3 install waymo-open-dataset-tf-2-1-0 --user

# Extract point cloud data from tfrecord and generate data infos:
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml

4. Train

  • Train with a single GPU
python train.py --cfg_file ${CONFIG_FILE}

# e.g.,
python train.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml
  • Train with multiple GPUs or multiple machines
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file ${CONFIG_FILE}
# or 
bash scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_GPUS} --cfg_file ${CONFIG_FILE}

# e.g.,
bash scripts/dist_train.sh 8 --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml

5. Test

  • Test with a pretrained model:
python test.py --cfg_file ${CONFIG_FILE} --ckpt ${CKPT}

# e.g., 
python test.py --cfg_file tools/cfgs/kitti_models/second_ct3d.yaml --ckpt output/kitti_models/second_ct3d/default/kitti_val.pth
Comments
  • Three questions about the code and paper

    Hello @hlsheng1 ! After reading your paper and code, I want to ask some questions.

    1. Why did you transform the point coordinates from the Cartesian coordinate system to the spherical coordinate system in the proposal-to-point embedding module? Did experiments show better performance with spherical coordinates than with Cartesian coordinates?
    2. CT3D is trained with more epochs and a smaller learning rate than the vanilla SECOND in OpenPCDet. Is this because the Transformer converges more slowly than other networks?
    3. I can't fully understand the proposed extended channel-wise re-weighting. In my opinion, it is more like head-wise re-weighting, because the shape of scores_1 is (batch_size * num_roi, num_head, num_query, num_key). Besides, extended channel-wise re-weighting doesn't introduce more learnable parameters. Why would the performance be better?

    I am looking forward to your reply. Thank you in advance!

    opened by rkotimi 5
  • About KITTI val results

    Hi,

    I used your 3 classes checkpoint for kitti, but couldn't reproduce the results reported in your paper.

    Here are the results I got for second_ct3d_3cat on kitti validation set:

    Car AP@0.70, 0.70, 0.70:
    bbox AP:98.0948, 89.4857, 89.1712
    bev  AP:90.2502, 88.1758, 87.7788
    3d   AP:89.1084, 85.0401, 78.7598
    aos  AP:98.06, 89.41, 89.04
    Car AP_R40@0.70, 0.70, 0.70:
    bbox AP:98.9425, 94.9858, 92.7375
    bev  AP:95.9154, 91.3542, 89.2926
    3d   AP:92.3391, 84.9711, 82.9065
    aos  AP:98.91, 94.88, 92.58
    Car AP@0.70, 0.50, 0.50:
    bbox AP:98.0948, 89.4857, 89.1712
    bev  AP:98.0871, 89.4169, 89.1425
    3d   AP:98.0695, 89.3947, 89.1037
    aos  AP:98.06, 89.41, 89.04
    Car AP_R40@0.70, 0.50, 0.50:
    bbox AP:98.9425, 94.9858, 92.7375
    bev  AP:98.8692, 94.9542, 94.7880
    3d   AP:98.8598, 94.9019, 94.6973
    aos  AP:98.91, 94.88, 92.58
    Pedestrian AP@0.50, 0.50, 0.50:
    bbox AP:73.0723, 69.4016, 66.8569
    bev  AP:64.2310, 59.8402, 55.7597
    3d   AP:61.7407, 56.2790, 52.5120
    aos  AP:69.04, 65.17, 62.02
    Pedestrian AP_R40@0.50, 0.50, 0.50:
    bbox AP:73.6869, 69.9239, 67.0737
    bev  AP:64.4088, 59.1779, 54.8622
    3d   AP:61.0537, 55.5749, 51.0978
    aos  AP:69.22, 65.19, 61.73
    Pedestrian AP@0.50, 0.25, 0.25:
    bbox AP:73.0723, 69.4016, 66.8569
    bev  AP:76.8913, 72.4136, 70.3777
    3d   AP:76.8213, 72.3334, 70.1651
    aos  AP:69.04, 65.17, 62.02
    Pedestrian AP_R40@0.50, 0.25, 0.25:
    bbox AP:73.6869, 69.9239, 67.0737
    bev  AP:77.9570, 73.8559, 71.3994
    3d   AP:77.8785, 73.7270, 70.7240
    aos  AP:69.22, 65.19, 61.73
    Cyclist AP@0.50, 0.50, 0.50:
    bbox AP:93.4912, 77.7671, 76.2917
    bev  AP:90.9435, 73.6842, 71.2105
    3d   AP:85.0440, 71.7085, 68.0511
    aos  AP:93.30, 77.56, 76.00
    Cyclist AP_R40@0.50, 0.50, 0.50:
    bbox AP:95.4022, 80.6753, 77.1761
    bev  AP:92.5957, 75.3982, 71.3149
    3d   AP:89.0081, 71.8798, 67.9090
    aos  AP:95.24, 80.43, 76.85
    Cyclist AP@0.50, 0.25, 0.25:
    bbox AP:93.4912, 77.7671, 76.2917
    bev  AP:91.9032, 77.3141, 73.3950
    3d   AP:91.9032, 77.3141, 73.3950
    aos  AP:93.30, 77.56, 76.00
    Cyclist AP_R40@0.50, 0.25, 0.25:
    bbox AP:95.4022, 80.6753, 77.1761
    bev  AP:93.7097, 78.2383, 73.9397
    3d   AP:93.7097, 78.2383, 73.9455
    aos  AP:95.24, 80.43, 76.85
    
    opened by kts707 4
  • Why is find_unused_parameters=True needed?

    I find that if I don't set find_unused_parameters=True, training reports the following error:

    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

    Do you know the cause of this? I didn't find any unused parameter. If I don't want to set find_unused_parameters=True, is there an alternative method?

    opened by Yzichen 3
  • Some questions about training CT3D on three classes on the KITTI dataset

    I trained CT3D on the three KITTI classes with two 2080Ti GPUs and a batch size of 2 per GPU, but the Car mAP on the val set is 79.38 (11 recall positions) and 83.48 (40 recall positions). Does the batch size affect the performance? The config second_ct3d_3cat.yaml is missing num_points. Why is num_classes set to 1 in the Transformer module (line 138 of the config), and why do the learning rate and number of epochs differ from the paper? Is the config file correct? I also cannot run test.py with your checkpoint: the .pth file turns into a zip file, so maybe I should change the PyTorch version to 1.6.

    opened by czy-0326 3
  • Question about point selection

    Thanks for your impressive work. In the roi_head of CT3D, you select points by horizontal limit. Why don't you limit the height of points in the vertical direction?

    opened by Eaphan 2
  • Incompatible spconv versions

    This repo doesn't specify the version of spconv. I installed the latest spconv, and got this error when running python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml:

    File "CT3D/pcdet/datasets/processor/data_processor.py", line 51, in transform_points_to_voxels
        voxel_generator = VoxelGenerator(
    TypeError: __init__() missing 1 required positional argument: 'num_point_features'
    

    I found that spconv has deprecated VoxelGenerator in the newest version and now uses spconv.pytorch.utils.PointToVoxel instead. PointToVoxel takes different parameters from VoxelGenerator, so it is not easy to refactor every use of VoxelGenerator. Which version of spconv did you use to run the code? Thank you.

    opened by Co1lin 2
  • A question about SCORE_THRESH setting of POST_PROCESSING

    Thanks for your great work. How do you set SCORE_THRESH when you train the model on all of the training data and submit the predictions to the KITTI benchmark: 0.7 or 0.81? Also, what is the reasoning behind changing score_thresh? I guess you did it to remove some false-positive predictions.

    opened by Eaphan 2
  • Differences between car-only and 3-class training?

    I notice that you only train the 'car' class in the provided config file. Is there any difference between training with all 3 classes and training with only the 'car' class? Can I use the provided code for 3-class training?

    opened by tdzdog 2
  • Question about attention encoder

    Hi, and thanks for your brilliant work! I have a question about the attention encoder layer. When a box contains fewer than 256 points, all-zero vectors are used to pad the features. Do these zero vectors take part in the attention computation when the features pass through the encoder layer? Do they have any influence?

    opened by Tu1016 1
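
The general effect behind this question can be shown with a toy softmax. This is a sketch of standard attention arithmetic only; it makes no claim about whether CT3D masks padded slots. It assumes a padded key is exactly zero (no bias added by a projection), so its dot-product score is 0, and the scores below are made up.

```python
import math


def softmax(scores, mask=None):
    """Softmax over a list of scores; positions where mask is False get weight 0."""
    if mask is None:
        mask = [True] * len(scores)
    top = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# One query attending to 2 real points and 2 zero-padded slots.
# A score of 0 is not "minus infinity", so padding still receives weight.
scores = [2.0, 1.0, 0.0, 0.0]
is_real = [True, True, False, False]

unmasked = softmax(scores)
masked = softmax(scores, is_real)
padding_share = unmasked[2] + unmasked[3]

print("attention given to padding without a mask:", round(padding_share, 4))
print("weights with a key-padding mask:", [round(w, 4) for w in masked])
```

Without a mask the padded slots absorb part of the attention weight. With a mask they receive exactly zero and the real points share the full weight.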
  • bash scripts/dist_train.sh 4 --cfg_file cfgs/kitti_models/second_ct3d.yaml error

    INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
    INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/0/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/1/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/2/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_9f995ce9/none_pelcvjny/attempt_0/3/error.json

    Hello, this problem occurs when I run the bash training command. What is the cause, and how can I solve it? Thank you!

    opened by pengweiweiwei 1
  • import module error

    ImportError: /data/code/Det_Code/3D_Det/CT3D/pcdet/ops/pointnet2/pointnet2_stack/pointnet2_stack_cuda.cpy│ thon-38-x86_64-linux-gnu.so: undefined symbol: _Z25voxel_query_wrapper_stackiiiiifiiiN2at6TensorES0_S0_S0│ S0

    opened by zaiquanyang 1
  • ct3d_head

    Hi, I am confused about the code snippet 'cur_points = batch_dict['points'][(batch_dict['points'][:, 0] == bs_idx)][:, 1:5]' at line 149 in ct3d_head.py. What does this operation do?

    opened by bigbird11 4
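
For readers with the same question, the indexing pattern can be read in two steps: a boolean mask keeps the rows whose first column equals `bs_idx`, and the slice `1:5` keeps columns 1 to 4 of those rows. The sketch below is a plain-Python analogue with made-up data. It assumes the OpenPCDet convention that each row of `points` is `[batch_idx, x, y, z, intensity]`; `select_sample_points` is a hypothetical helper, not a function from the repo.

```python
# Each row: [batch_idx, x, y, z, intensity] (assumed OpenPCDet layout for KITTI).
points = [
    [0, 1.0, 2.0, 0.5, 0.10],
    [1, 4.0, 0.0, 0.2, 0.30],
    [0, 3.0, 1.0, 0.7, 0.20],
    [1, 5.0, 2.0, 0.1, 0.40],
]


def select_sample_points(points, bs_idx):
    """Plain-Python analogue of points[points[:, 0] == bs_idx][:, 1:5]."""
    # Keep rows that belong to sample bs_idx, then drop the batch index column.
    return [row[1:5] for row in points if row[0] == bs_idx]


cur_points = select_sample_points(points, 0)
print(cur_points)
```

Under that assumption, the result is the set of points of one sample in the batch with the batch index removed.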
Owner
Hualian Sheng