Voxel Transformer for 3D object detection

Voxel Transformer

This repository is a reproduction of Voxel Transformer (VoTr) for 3D object detection.

The code is mainly based on OpenPCDet.

Introduction

We provide code and training configurations for VoTr-SSD and VoTr-TSD on the KITTI and Waymo Open datasets. Checkpoints will not be released.

Important notes: VoTr generally takes a long time to converge (more than 60 epochs on Waymo), and reproduction requires a GPU with a large memory (32 GB). Please follow the instructions strictly and train for a sufficient number of epochs. If you don't have a 32 GB GPU, you can decrease the attention SIZE parameters in the yaml files, but this may harm performance.
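As a rough illustration of the SIZE reduction mentioned above, the sketch below scales every `SIZE` entry in a nested config by a common factor. The config layout and values in `example_cfg` are placeholders, not copied from this repo's yaml files, and `scale_attention_size` is a hypothetical helper. A real config could be loaded into the same dict/list shape with PyYAML's `yaml.safe_load` before calling it.

```python
import copy


def scale_attention_size(cfg, factor, key="SIZE", minimum=1):
    """Return a deep copy of a nested config with every integer `key` entry scaled."""

    def visit(node):
        if isinstance(node, dict):
            for name, value in node.items():
                is_int = isinstance(value, int) and not isinstance(value, bool)
                if name == key and is_int:
                    # Truncate toward zero, but never go below `minimum`.
                    node[name] = max(minimum, int(value * factor))
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    scaled = copy.deepcopy(cfg)  # leave the caller's config untouched
    visit(scaled)
    return scaled


# Hypothetical fragment: key names and numbers are illustrative assumptions only.
example_cfg = {
    "MODEL": {
        "BACKBONE_3D": {
            "PARAMS": [
                {"ATTENTION": [{"NAME": "local", "SIZE": 48}]},
                {"ATTENTION": [{"NAME": "local", "SIZE": 32}]},
            ]
        }
    }
}

smaller_cfg = scale_attention_size(example_cfg, 0.5)
```

Scaling all entries by one factor is only one possible choice; which layers tolerate a smaller attention size best is not stated in this README.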

Requirements

The code was tested in the following environment:

  • Ubuntu 18.04
  • Python 3.6
  • PyTorch 1.5
  • CUDA 10.1
  • OpenPCDet v0.3.0
  • spconv v1.2.1

Installation

a. Clone this repository.

git clone https://github.com/PointsCoder/VOTR.git

b. Install the dependent libraries as follows:

  • Install the dependent python libraries:
pip install -r requirements.txt 
  • Install the SparseConv library; we use the implementation from [spconv].
    • If you use PyTorch 1.1, make sure you install spconv v1.0 (commit 8da6f96) instead of the latest version.
    • If you use PyTorch 1.3+, you need to install spconv v1.2. As mentioned by the author of spconv, you need to use their docker image if you use PyTorch 1.4+.
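The version rules above can be written down as a small lookup. This is a minimal sketch that encodes only what the list states; `parse_version` and `spconv_hint` are hypothetical helper names, and PyTorch versions the README does not mention return `None`.

```python
def parse_version(text):
    """Turn a version string such as '1.5.0+cu101' into a (major, minor) tuple."""
    core = text.split("+")[0]  # drop a local build tag like '+cu101'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def spconv_hint(torch_version):
    """Map a PyTorch version to the spconv setup named in the installation notes.

    The rules are taken literally from the README; releases it does not
    mention (including anything much newer than 1.5) are not really covered.
    """
    version = parse_version(torch_version)
    if version == (1, 1):
        return "spconv v1.0 (commit 8da6f96)"
    if version >= (1, 4):
        return "spconv v1.2, via the spconv docker image"
    if version >= (1, 3):
        return "spconv v1.2"
    return None  # no recommendation given for this version


if __name__ == "__main__":
    for candidate in ("1.1.0", "1.3.1", "1.5.0+cu101"):
        print(candidate, "->", spconv_hint(candidate))
```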

c. Compile CUDA operators by running the following command:

python setup.py develop

Training

All models were trained on Tesla V100 GPUs (32 GB). The KITTI config of votr_ssd is for training on a single GPU; the other configs are for training on 8 GPUs. If you train with a different number of GPUs, you need to adjust the number of training epochs accordingly to reach decent performance.

The performance of VoTr is quite unstable on KITTI. If you cannot reproduce the results, try running the training multiple times.

  • models
# votr_ssd.yaml: single-stage votr backbone replacing the spconv backbone
# votr_tsd.yaml: two-stage votr with pv-head
  • training votr_ssd on kitti
CUDA_VISIBLE_DEVICES=0 python train.py --cfg_file cfgs/kitti_models/votr_ssd.yaml
  • training other models
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_train.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml
  • testing
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_test.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml --eval_all

Citation

If you find this project useful in your research, please consider citing:

@article{mao2021voxel,
  title={Voxel Transformer for 3D Object Detection},
  author={Mao, Jiageng and Xue, Yujing and Niu, Minzhe and others},
  journal={ICCV},
  year={2021}
}

Comments
  • got a RuntimeError, need help plz

    Hi, I ported the code to OpenPCDet v0.52.0 but got a RuntimeError. Could you help me, please? Error:

    Traceback (most recent call last):
      File "train.py", line 202, in <module>
        main()
      File "train.py", line 171, in main
        merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
      File "/home/featurize/OpenPCDet/tools/train_utils/train_utils.py", line 118, in train_model
        dataloader_iter=dataloader_iter
      File "/home/featurize/OpenPCDet/tools/train_utils/train_utils.py", line 52, in train_one_epoch
        loss.backward()
      File "/environment/miniconda3/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/environment/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4611, 64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
    

    my environment :

    ubuntu 20.04
    cuda 11.3
    python 3.7.10
    torch 1.10.0+cu113
    spconv-cu113 2.1.21
    
    opened by kellen5l 7
  • Performance of this reproduced repo

    Nice work reproducing VOTR! Since many details are missing from the paper, how close is the reproduced version to the original? How does the reproduced version perform, and can it match the performance reported in the paper?
    Many thanks!

    opened by weihaosky 4
  • RuntimeError. How to run it On Spconv2?

    My GPU is an RTX 30-series card. I can't use CUDA 10, so I can't use spconv 1.2.

    I tried to port the code to an OpenPCDet version that supports spconv 2, but there are still some problems I can't fix.

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2962, 64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    opened by Raiden-cn 3
  • The details are unclear to reproduce the results and the reported performances of VoTr are not convincing

    After reading the paper twice, I am still confused about the model details, especially the Voxel Transformer module (for context, I am familiar with Transformers).

    For example: i) How are the different VoTr building blocks connected? ii) Why is the positional encoding computed as (p_i - p_j)W_pos and then added to K_j and V_j, but not to Q_j? In the original Transformer, the position embedding is first added to the token embedding, which is then converted to Q, K, V with different linear projections. iii) Why is it necessary to extract features on empty voxels? The ablation studies give no relevant evidence. iv) The highest score on the KITTI 3D object detection benchmark is ~85%, while VoTr achieves 89+%. v) ...

    Besides, the 'scripts' folder described in the README is absent.

    opened by auniquesun 2
  • Error: No such file: 'kitti_dbinfos_train.pkl'

    I didn't find this pkl file in your code. How can I get it?

    Traceback (most recent call last):
      File "train.py", line 211, in <module>
        main()
      File "train.py", line 123, in main
        total_epochs=args.epochs
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/__init__.py", line 48, in build_dataloader
        logger=logger,
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 23, in __init__
        dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/dataset.py", line 32, in __init__
        ) if self.training else None
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/data_augmentor.py", line 21, in __init__
        cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/data_augmentor.py", line 29, in gt_sampling
        logger=self.logger
      File "/mnt/Disk8T/donght/VOTR/pcdet/datasets/augmentor/database_sampler.py", line 19, in __init__
        with open(str(db_info_path), 'rb') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/mnt/Disk8T/donght/VOTR/data/kitti/kitti_dbinfos_train.pkl'

    opened by xibaicai 2
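Regarding the missing `kitti_dbinfos_train.pkl` reported above: in upstream OpenPCDet this file is not shipped with the code but generated from the raw KITTI data by the dataset preparation step (`python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml`). Assuming this repo keeps that step unchanged, a small pre-flight check like the sketch below can report which generated files are still absent before training starts. The file list and the helper `missing_kitti_files` are assumptions for illustration.

```python
from pathlib import Path

# File names assumed from upstream OpenPCDet's KITTI preparation step.
EXPECTED_FILES = (
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
    "kitti_dbinfos_train.pkl",
)


def missing_kitti_files(kitti_root, expected=EXPECTED_FILES):
    """Return the expected info files that do not exist under `kitti_root`."""
    root = Path(kitti_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # 'data/kitti' mirrors the directory shown in the traceback; adjust as needed.
    missing = missing_kitti_files("data/kitti")
    if missing:
        print("Missing generated files:", ", ".join(missing))
    else:
        print("All expected KITTI info files are present.")
```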
  • Cuda Out of Memory

    Hello,

    I have been using the SECOND method, which is based on a sparse 3D CNN, with around 16 million parameters in the whole model, and I do not get a "CUDA out of memory" error. However, when I replace the sparse 3D CNN backbone with your VoTr, I get the error even though the model then has only around 10 million parameters.

    I have also tried to make VoTr even simpler than it is, but it still gives the "CUDA out of memory" error.

    I would really appreciate your help.

    opened by hamidrezafazlali 2
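On the out-of-memory question above: parameter count says little about activation memory. In sparse attention, each query voxel gathers features from up to SIZE key voxels, so intermediate tensors can grow as (number of voxels × attention size × channels) regardless of how few weights the layer has. The sketch below is back-of-the-envelope arithmetic under that assumption. Whether this repo materializes tensors in exactly this shape is not confirmed here, and all numbers are placeholders rather than measurements.

```python
def attention_gather_bytes(num_voxels, attention_size, channels,
                           bytes_per_value=4, tensors=2):
    """Bytes for `tensors` gathered tensors of shape [num_voxels, attention_size, channels].

    `tensors=2` stands for separate key and value gathers; float32 is assumed.
    """
    return num_voxels * attention_size * channels * bytes_per_value * tensors


# Placeholder numbers chosen only to show the scaling, not taken from the repo.
full = attention_gather_bytes(num_voxels=50_000, attention_size=48, channels=64)
half = attention_gather_bytes(num_voxels=50_000, attention_size=24, channels=64)

if __name__ == "__main__":
    print(f"SIZE=48: {full / 2**30:.2f} GiB per layer (forward activations only)")
    print(f"SIZE=24: {half / 2**30:.2f} GiB per layer (forward activations only)")
```

Under this model, halving the attention size halves the gathered-feature memory, which is consistent with the README's advice to reduce SIZE on smaller GPUs.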
  • A question about the paper.

    In the introduction of the paper, you write that with 'the voxel size as (0.05m,0.05m,0.1m) on the KITTI dataset, the maximum receptive field in the last layer is only (3.65m,3.65m,7.3m)'. Can you tell me why the receptive field is 73 times the voxel size?

    opened by czy-0326 2
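One way to arrive at the factor of 73 asked about above is the standard receptive-field recurrence for stacked convolutions: each layer adds (kernel − 1) × (current stride product) voxels. The sketch below applies it to a hypothetical stack of one 3×3×3 stride-1 convolution followed by three stages, each with one stride-2 and two stride-1 3×3×3 convolutions. This stack is an assumption that happens to give 73; the paper excerpt quoted in the question does not state the exact layer configuration.

```python
def receptive_field(layers):
    """Receptive field (in input voxels, along one axis) of stacked convolutions.

    `layers` is a sequence of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # jump = distance between adjacent outputs, in input voxels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Assumed backbone layout (not confirmed by the source): 1 + 3 * 3 = 10 conv layers.
stage = [(3, 2), (3, 1), (3, 1)]
layers = [(3, 1)] + stage * 3

rf_voxels = receptive_field(layers)  # 73 voxels per axis
voxel_size_m = (0.05, 0.05, 0.1)
rf_metres = tuple(round(rf_voxels * v, 2) for v in voxel_size_m)  # (3.65, 3.65, 7.3)

if __name__ == "__main__":
    print(rf_voxels, rf_metres)
```

Multiplying 73 voxels by the voxel size (0.05 m, 0.05 m, 0.1 m) gives the (3.65 m, 3.65 m, 7.3 m) figure from the quote.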
  • Train epochs on the Waymo open dataset

    Hi, I noticed that the paper says VoTr-SSD and VoTr-TSD are trained for 60 and 80 epochs, respectively, on the Waymo Open dataset, but the provided config files state the opposite. Which setting is correct?

    opened by 1349949 2
  • Pretrained model

    Hi @PointsCoder, could you please provide the pre-trained model that you used to produce the metrics in the paper? I trained models with your code, but the results are slightly worse than those you report.

    Thank you!

    opened by maudzung 2
  • Error while training

    Hey,

    I am training the model from scratch on my __ with 12G of memory. I have decreased the batch size and the attention SIZE parameters (as suggested by the author) to the bare minimum, but I still keep hitting this error.

     File "train.py", line 211, in <module>
       main()
     File "train.py", line 182, in main
       merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
     File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 99, in train_model
       dataloader_iter=dataloader_iter
     File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 19, in train_one_epoch
       batch = next(dataloader_iter)
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
       data = self._next_data()
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
       return self._process_data(data)
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
       data.reraise()
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
       raise self.exc_type(msg)
    AssertionError: Caught AssertionError in DataLoader worker process 0.
    Original Traceback (most recent call last):
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
       data = fetcher.fetch(index)
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
       data = [self.dataset[idx] for idx in possibly_batched_index]
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
       data = [self.dataset[idx] for idx in possibly_batched_index]
     File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 433, in __getitem__
       data_dict = self.prepare_data(data_dict=input_dict)
     File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/dataset.py", line 142, in prepare_data
       data_dict=data_dict
     File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 127, in forward
       data_dict = cur_processor(data_dict=data_dict)
     File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 62, in transform_points_to_voxels
       voxel_output = voxel_generator.generate(points)
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 173, in generate
       or self._max_voxels, self._full_mean)
     File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 69, in points_to_voxel
       assert block_filtering is False
    AssertionError
    

    Thanks in advance for the help

    opened by vardeep-sandhu 2
  • Dilated Attention

    I don’t quite understand the implementation of Dilated Attention and the setting of RANGE_SPEC. If I want to reproduce the result of Fig. 3 (the 2D example) in the paper, how should I set the parameters?

    opened by Yzichen 2
  • Wait 30 seconds for next check

    I tried to train the votr_ssd model on a single GPU with a batch size of 2, which takes 50 hours. The problem is that the result is wrong: after 50 hours of training, it displays "wait 30 seconds for next check (progress: 3175.0 / 0 minutes): /home/wangtingting/VOTR/output/kitti". Has anyone else had the same problem? I hope you can help me!

    opened by 192574151 1
Owner
I may not respond to issues quickly. Send me an e-mail if necessary.