Voxel Transformer for 3D object detection

Last update: Dec 25, 2022

Overview

Voxel Transformer

This repository is a reproduction of Voxel Transformer (VoTr) for 3D object detection.

The code is mainly based on OpenPCDet.

Introduction

We provide code and training configurations for VoTr-SSD/TSD on the KITTI and Waymo Open datasets. Checkpoints will not be released.

Important notes: VoTr generally takes a long time to converge (more than 60 epochs on Waymo), and reproduction requires a GPU with large memory (32 GB). Please follow the instructions strictly and train for a sufficient number of epochs. If you don't have a 32 GB GPU, you can decrease the attention SIZE parameters in the yaml files, but this may harm performance.
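A rough back-of-the-envelope estimate shows why smaller attention sizes reduce memory. The sketch below assumes that, for each of N non-empty voxels, an attention layer gathers K attended key features and K value features of C channels each, stored as float32. The function name and all numbers are illustrative assumptions and are not taken from the repository's configs.

```python
def gathered_attention_bytes(num_voxels, attend_size, channels, bytes_per_value=4):
    """Rough size of the gathered key and value tensors, each of shape (N, K, C)."""
    per_tensor = num_voxels * attend_size * channels * bytes_per_value
    return 2 * per_tensor  # one tensor for keys, one for values


if __name__ == "__main__":
    # Hypothetical layer: 100k non-empty voxels, 64 channels.
    for k in (48, 32, 16):
        mib = gathered_attention_bytes(100_000, k, 64) / 2**20
        print(f"attend_size={k:3d}: ~{mib:.0f} MiB per layer (forward only)")
```

The estimate grows linearly with the number of attended voxels per query, so halving the attention size roughly halves this part of the footprint. Actual usage is higher because intermediate activations are also kept for the backward pass.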

Requirements

The code is tested in the following environment:

Ubuntu 18.04
Python 3.6
PyTorch 1.5
CUDA 10.1
OpenPCDet v0.3.0
spconv v1.2.1
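Before building, it can help to print what is actually installed and compare it with the list above. The following is a minimal sketch; the helper names are hypothetical, and it assumes the packages are importable as `torch`, `spconv`, and `pcdet` and expose a `__version__` attribute (older releases may not, in which case "unknown" is reported).

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's version, "unknown" if it has none, or None if missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report_environment(module_names=("torch", "spconv", "pcdet")):
    """Collect interpreter and package versions into a dict for printing."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in module_names:
        report[name] = installed_version(name) or "not installed"
    return report


if __name__ == "__main__":
    for name, version in report_environment().items():
        print(f"{name:8s} {version}")
```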

Installation

a. Clone this repository.

git clone https://github.com/PointsCoder/VOTR.git

b. Install the dependent libraries as follows:

Install the dependent python libraries:

pip install -r requirements.txt

Install the SparseConv library; we use the implementation from [spconv].
- If you use PyTorch 1.1, make sure you install spconv v1.0 (commit 8da6f96) instead of the latest version.
- If you use PyTorch 1.3+, you need to install spconv v1.2. As mentioned by the author of spconv, you need to use their docker image if you use PyTorch 1.4+.
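The version rules above can be written down as a small lookup. This is only a sketch that encodes the notes in this README; the function names are hypothetical, and versions that the notes do not mention (for example PyTorch 1.2) are reported as not covered rather than guessed.

```python
def parse_major_minor(version):
    """Turn a version string such as '1.10.0+cu113' into (1, 10)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def spconv_advice(torch_version):
    """Map a PyTorch version string to the spconv note in this README."""
    mm = parse_major_minor(torch_version)
    if mm == (1, 1):
        return "spconv v1.0 (commit 8da6f96)"
    if mm >= (1, 4):
        return "spconv v1.2, using the spconv author's docker"
    if mm >= (1, 3):
        return "spconv v1.2"
    return "not covered by the notes above"


if __name__ == "__main__":
    for v in ("1.1.0", "1.3.1", "1.5.0"):
        print(v, "->", spconv_advice(v))
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where "1.10" sorts before "1.3".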

c. Compile CUDA operators by running the following command:

python setup.py develop

Training

All the models are trained on Tesla V100 GPUs (32 GB). The KITTI config of votr_ssd is for training with a single GPU; the other configs are for training with 8 GPUs. If you train with a different number of GPUs, you need to adjust the number of training epochs accordingly to attain decent performance.

The performance of VoTr is quite unstable on KITTI. If you cannot reproduce the results, remember to run training multiple times.

models

# votr_ssd.yaml: single-stage votr backbone replacing the spconv backbone
# votr_tsd.yaml: two-stage votr with pv-head

training votr_ssd on kitti

CUDA_VISIBLE_DEVICES=0 python train.py --cfg_file cfgs/kitti_models/votr_ssd.yaml

training other models

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_train.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml

testing

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_test.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml --eval_all

Citation

If you find this project useful in your research, please consider citing:

@article{mao2021voxel,
  title={Voxel Transformer for 3D Object Detection},
  author={Mao, Jiageng and Xue, Yujing and Niu, Minzhe and others},
  journal={ICCV},
  year={2021}
}

Comments

got a RuntimeError, need help plz

Hi, I ported the code to OpenPCDet v0.52.0 but got a RuntimeError. Could you help me, please? Error:

Traceback (most recent call last):
  File "train.py", line 202, in <module>
    main()
  File "train.py", line 171, in main
    merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
  File "/home/featurize/OpenPCDet/tools/train_utils/train_utils.py", line 118, in train_model
    dataloader_iter=dataloader_iter
  File "/home/featurize/OpenPCDet/tools/train_utils/train_utils.py", line 52, in train_one_epoch
    loss.backward()
  File "/environment/miniconda3/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/environment/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4611, 64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

my environment :

ubuntu 20.04
cuda 11.3
python 3.7.10
torch 1.10.0+cu113
spconv-cu113 2.1.21
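The message "is at version 1; expected version 0" means that a tensor ReLU saved for its backward pass was modified in place before `backward()` ran, typically by an in-place operator such as `+=` applied to the ReLU output. The pure-Python toy below only illustrates the idea of a version counter; it is a simplified model with hypothetical class names, not PyTorch's implementation, and it does not locate the offending line in this code base (the hint in the error, `torch.autograd.set_detect_anomaly(True)`, is the tool for that).

```python
class ToyTensor:
    """Holds a value and a version counter that in-place edits increment."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def add_(self, amount):
        """In-place update: same object, bumped version."""
        self.value += amount
        self.version += 1
        return self


class ToyReluBackward:
    """Remembers the ReLU output and the version it had when it was saved."""

    def __init__(self, output):
        self.saved = output
        self.saved_version = output.version

    def backward(self, grad):
        if self.saved.version != self.saved_version:
            raise RuntimeError(
                "saved tensor is at version {}; expected version {}".format(
                    self.saved.version, self.saved_version))
        # ReLU gradient: pass grad through where the output was positive.
        return grad if self.saved.value > 0 else 0.0


def toy_relu(x):
    out = ToyTensor(max(x.value, 0.0))
    return out, ToyReluBackward(out)


if __name__ == "__main__":
    out, node = toy_relu(ToyTensor(2.0))
    out.add_(1.0)  # in-place edit after ReLU saved its output
    try:
        node.backward(1.0)
    except RuntimeError as err:
        print("in-place:", err)

    out, node = toy_relu(ToyTensor(2.0))
    shifted = ToyTensor(out.value + 1.0)  # out-of-place: a new object
    print("out-of-place grad:", node.backward(1.0))
```

The out-of-place variant leaves the saved output untouched, which is why replacing an in-place update with one that creates a new tensor usually resolves this class of error.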

opened by kellen5l 7

Performance of this reproduced repo

Nice work reproducing VoTr! Since many details are missing from the paper, how close is the reproduced version to the original? May I ask how this reproduced version performs? Can it achieve the performance reported in the paper?
Many thanks!

opened by weihaosky 4

RuntimeError. How to run it on spconv 2?

My GPU is an RTX 30 series card. I can't use CUDA 10, so I can't use spconv 1.2.

Although I tried to port the code to a version of OpenPCDet that supports spconv 2, there are still some problems I can't fix.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2962, 64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

opened by Raiden-cn 3

The details are unclear to reproduce the results and the reported performances of VoTr are not convincing

After reading the paper twice, I am still confused about the model details, especially the Voxel Transformer Module (by the way, I am knowledgeable about Transformers).

For example: i) How are the different VoTr building blocks connected? ii) Why is the positional encoding (p_i - p_j)W_pos added to K_j and V_j but not to Q_j? In the original Transformer, the position embedding is first added to the token embedding, which is then converted to Q, K, V with different linear projections. iii) Why is it necessary to extract features on empty voxels? The ablation studies give no relevant evidence. iv) The highest score on the KITTI 3D object detection benchmark is ~85%, while VoTr achieves 89+%. v) ...

Besides, the 'scripts' folder described in the Readme is absent.

opened by auniquesun 2

Error: No such file: 'kitti_dbinfos_train.pkl'

I didn't find this pkl file in your code. How should I get it, please?

Traceback (most recent call last):
  File "train.py", line 211, in <module>
    main()
  File "train.py", line 123, in main
    total_epochs=args.epochs
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/__init__.py", line 48, in build_dataloader
    logger=logger,
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 23, in __init__
    dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/dataset.py", line 32, in __init__
    ) if self.training else None
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/data_augmentor.py", line 21, in __init__
    cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/data_augmentor.py", line 29, in gt_sampling
    logger=self.logger
  File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/database_sampler.py", line 19, in __init__
    with open(str(db_info_path), 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/Disk8T/donght/VOTR/data/kitti/kitti_dbinfos_train.pkl'
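The missing file is a generated artifact rather than something shipped in the source tree. In upstream OpenPCDet it is written by the KITTI data preparation step; the sketch below assumes this fork keeps the same file names and command, which the README above does not confirm. The helper name is hypothetical.

```python
from pathlib import Path

# File names follow upstream OpenPCDet's KITTI preparation step
# (an assumption for this fork).
EXPECTED_KITTI_FILES = (
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
    "kitti_dbinfos_train.pkl",
)


def missing_kitti_files(data_root):
    """Return the expected info files that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_KITTI_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_kitti_files("data/kitti")
    if missing:
        print("Missing:", ", ".join(missing))
        print("Upstream OpenPCDet generates these with:")
        print("  python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos "
              "tools/cfgs/dataset_configs/kitti_dataset.yaml")
    else:
        print("All KITTI info files found.")
```

Running a check like this before `train.py` surfaces the problem immediately instead of inside the data augmentor.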

opened by xibaicai 2

Cuda Out of Memory

Hello,

I have been using the SECOND method, based on a sparse 3D CNN, with around 16 million parameters in my whole model, and I do not get "Cuda Out of Memory". However, when I replace the sparse 3D CNN in the backbone with your VoTr, I get the "Cuda Out of Memory" error even though my model has only around 10 million parameters.

I've also tried to make VoTr even simpler than what it is, but it still gives the "Cuda Out of Memory" error.

I would really appreciate your help.

opened by hamidrezafazlali 2

A question about the paper.

In the introduction of the paper, you write that 'the voxel size as (0.05m,0.05m,0.1m) on the KITTI dataset, the maximum receptive field in the last layer is only (3.65m,3.65m,7.3m)'. Can you tell me why the receptive field is 73 times the voxel size?
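The factor 73 is consistent with the standard receptive-field recurrence for stacked convolutions: each layer adds (kernel - 1) times the current step between neighbouring outputs, and each strided layer multiplies that step. The layer stack used below (one stride-1 convolution followed by three stages of one stride-2 and two stride-1 convolutions, all with kernel size 3) is an assumption chosen because it reproduces the quoted numbers; the source does not state the exact configuration.

```python
def receptive_field(layers):
    """layers: iterable of (kernel_size, stride) along one axis.

    Returns the receptive field, in input voxels, of one output cell.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # growth is scaled by the current step
        jump *= stride                # strided layers widen later steps
    return field


# Assumed stack: one stride-1 conv, then three stages of
# (stride-2 conv + two stride-1 convs), all with kernel size 3.
ASSUMED_LAYERS = [(3, 1)] + [(3, 2), (3, 1), (3, 1)] * 3

if __name__ == "__main__":
    rf = receptive_field(ASSUMED_LAYERS)
    voxel_size = (0.05, 0.05, 0.1)
    print(rf)  # 73
    print(tuple(round(rf * s, 2) for s in voxel_size))  # (3.65, 3.65, 7.3)
```

Under this assumption the field grows as 3, 5, 9, 13, 17, 25, 33, 41, 57, 73 voxels, and 73 voxels of size (0.05 m, 0.05 m, 0.1 m) give (3.65 m, 3.65 m, 7.3 m).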

opened by czy-0326 2

Train epochs on the Waymo open dataset

Hi, I notice that in the paper you mention that VoTr-SSD and VoTr-TSD are trained for 60 and 80 epochs respectively on the Waymo Open dataset, but the provided config files say the opposite. Which setting is correct?

opened by 1349949 2

Pretrained model

Hi @PointsCoder, could you please provide the pre-trained model that you used to produce the metrics in the paper? I used your code to train models, but the results are slightly worse than those you reported.

Thank you!

opened by maudzung 2

Error while training

Hey,

I am training the model from scratch on my __ with 12G of memory. I have decreased the batch size and the attention SIZE parameters (as suggested by the author) to the bare minimum but still keep facing this error.

 File "train.py", line 211, in <module>
   main()
 File "train.py", line 182, in main
   merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
 File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 99, in train_model
   dataloader_iter=dataloader_iter
 File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 19, in train_one_epoch
   batch = next(dataloader_iter)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
   data = self._next_data()
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
   return self._process_data(data)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
   data.reraise()
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
   raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
   data = fetcher.fetch(index)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
   data = [self.dataset[idx] for idx in possibly_batched_index]
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
   data = [self.dataset[idx] for idx in possibly_batched_index]
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 433, in __getitem__
   data_dict = self.prepare_data(data_dict=input_dict)
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/dataset.py", line 142, in prepare_data
   data_dict=data_dict
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 127, in forward
   data_dict = cur_processor(data_dict=data_dict)
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 62, in transform_points_to_voxels
   voxel_output = voxel_generator.generate(points)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 173, in generate
   or self._max_voxels, self._full_mean)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 69, in points_to_voxel
   assert block_filtering is False
AssertionError

Thanks in advance for the help

opened by vardeep-sandhu 2

Dilated Attention

I don't quite understand the implementation of Dilated Attention and the setting of RANGE_SPEC. If I want to get the result of Fig. 3 (the 2D example) in the paper, how should I set the parameters?
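As a generic illustration of dilated attention ranges, the sketch below enumerates 2D offsets from a list of hypothetical (start, end, stride) triples: an offset is kept when its Chebyshev distance from the query lies in [start, end) and every coordinate is a multiple of the stride. This is an assumption-laden toy; it does not reproduce the repository's RANGE_SPEC format, which is not described here, and the triples are not the paper's settings.

```python
import itertools


def dilated_offsets(range_spec, ndim=2):
    """Collect offsets for each (start, end, stride) triple in range_spec."""
    offsets = set()
    for start, end, stride in range_spec:
        axis = range(-(end - 1), end)
        for off in itertools.product(axis, repeat=ndim):
            dist = max(abs(o) for o in off)  # Chebyshev distance to the query
            if start <= dist < end and all(o % stride == 0 for o in off):
                offsets.add(off)
    return sorted(offsets)


if __name__ == "__main__":
    # Hypothetical spec: dense ring at distance 1, sparser samples out to 4.
    spec = [(1, 2, 1), (2, 5, 2)]
    attended = set(dilated_offsets(spec))
    print(len(attended), "attended positions")
    for y in range(-4, 5):
        print("".join(
            "Q" if (x, y) == (0, 0) else "#" if (x, y) in attended else "."
            for x in range(-4, 5)))
```

Near the query every neighbour is attended, while farther away only every second position is sampled, so the attended range grows faster than the number of attended positions.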

opened by Yzichen 2

Wait 30 seconds for next check

I tried to train votr_ssd with a single GPU and a batch size of 2, which takes 50 hours. The problem is that after 50 hours of training the result is wrong and it displays "wait 30 seconds for next check (progress: 3175.0 / 0 minutes): /home/wangtingting/VOTR/output/kitti". Has anyone else had the same problem? Hope you could help me!

opened by 192574151 1

Owner

I may not respond to issues quickly. Send me an e-mail if necessary.
