Released code for Objects are Different: Flexible Monocular 3D Object Detection, CVPR21

Overview

MonoFlex

Work in progress.

Installation

This repo has been tested with Ubuntu 20.04, python==3.7, pytorch==1.4.0, and cuda==10.1. Create and activate a conda environment:

conda create -n monoflex python=3.7

conda activate monoflex

Install PyTorch and other dependencies:

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

pip install -r requirements.txt

Build DCNv2 and the project:

cd model/backbone/DCNv2

. make.sh

cd ../../..

python setup.py develop

Data Preparation

Please download the KITTI dataset and organize the data as follows:

#ROOT		
  |training/
    |calib/
    |image_2/
    |label/
    |ImageSets/
  |testing/
    |calib/
    |image_2/
    |ImageSets/

Then modify the paths in config/paths_catalog.py according to your data path.
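
Before editing the paths, it can help to confirm that the data layout matches the tree above. The following is a minimal sketch and is not part of the repository: the helper name `missing_kitti_dirs` is hypothetical, and the expected sub-directories are simply copied from the tree above.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, copied from the tree above.
EXPECTED = {
    "training": ["calib", "image_2", "label", "ImageSets"],
    "testing": ["calib", "image_2", "ImageSets"],
}


def missing_kitti_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    missing = []
    for split, subdirs in EXPECTED.items():
        for name in subdirs:
            path = root / split / name
            if not path.is_dir():
                # Report paths relative to the root, with forward slashes.
                missing.append(path.relative_to(root).as_posix())
    return missing
```

Calling `missing_kitti_dirs("/path/to/ROOT")` returns an empty list when every directory is present, and otherwise lists entries such as `training/calib` that still need to be created.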

Training & Evaluation

Train with one GPU. (TODO: multi-GPU training will be tested further.)

CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --batch_size 8 --config runs/monoflex.yaml --output output/exp

The model is evaluated periodically during training (the interval can be adjusted in the config). You can also evaluate a checkpoint with:

CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --config runs/monoflex.yaml --ckpt YOUR_CKPT  --eval

You can also specify --vis during evaluation to visualize the predicted heatmap and 3D bounding boxes. The pretrained model for the train/val split and the logs are here.
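
To evaluate several checkpoints in a row, the evaluation command above can be wrapped in a short script. This is a sketch under assumptions: `build_eval_command` and `evaluate` are hypothetical helpers, the checkpoint paths are placeholders, and only the flags shown in this README (`--config`, `--ckpt`, `--eval`, `--vis`) are used.

```python
import os
import subprocess
import sys


def build_eval_command(ckpt, config="runs/monoflex.yaml", vis=False):
    """Build the argument list for evaluating one checkpoint."""
    cmd = [sys.executable, "tools/plain_train_net.py",
           "--config", config, "--ckpt", ckpt, "--eval"]
    if vis:
        # Also visualize the predicted heatmap and 3D boxes.
        cmd.append("--vis")
    return cmd


def evaluate(ckpts, gpu="0"):
    """Run the evaluation command once per checkpoint on the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for ckpt in ckpts:
        # check=True stops the loop if one evaluation fails.
        subprocess.run(build_eval_command(ckpt), env=env, check=True)
```

Run from the repository root, `evaluate(["YOUR_CKPT_A", "YOUR_CKPT_B"])` would evaluate the two placeholder checkpoints one after the other.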

Note: we observe an obvious variation in performance across runs, and we are still investigating possible solutions to stabilize the results, though the variation may be an inevitable consequence of the uncertainties used.

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{MonoFlex,
    author    = {Zhang, Yunpeng and Lu, Jiwen and Zhou, Jie},
    title     = {Objects Are Different: Flexible Monocular 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {3289-3298}
}

Acknowledgment

This code borrows heavily from SMOKE; we thank its authors for their contribution.

Comments
  • Test results are lower than the val results reported in Table 1 of the paper

    Hello, I trained on KITTI split1 with a single GPU and batch_size=4, then tested on the 3694 val images:

    [2021-06-17 06:25:15,960] monoflex.inference INFO: metric = R40
    [2021-06-17 06:25:15,961] monoflex.inference INFO:
    Car AP@0.70, 0.70, 0.70:
    bbox AP:93.1365, 88.1290, 80.6364
    bev  AP:27.4066, 19.9626, 16.8124
    3d   AP:19.3602, 13.9580, 11.9678
    aos  AP:93.01, 87.77, 79.92
    Car AP@0.70, 0.50, 0.50:
    bbox AP:93.1365, 88.1290, 80.6364
    bev  AP:61.1731, 45.9274, 40.7388
    3d   AP:55.3238, 41.9672, 35.9666
    aos  AP:93.01, 87.77, 79.92
    Pedestrian AP@0.50, 0.50, 0.50:
    bbox AP:52.8775, 44.4330, 37.8220
    bev  AP:6.5744, 5.0258, 4.0656
    3d   AP:5.9074, 4.2190, 3.2233
    aos  AP:46.37, 38.43, 32.49
    Pedestrian AP@0.50, 0.25, 0.25:
    bbox AP:52.8775, 44.4330, 37.8220
    bev  AP:21.2798, 17.0815, 13.9071
    3d   AP:20.5831, 16.5574, 13.3822
    aos  AP:46.37, 38.43, 32.49
    Cyclist AP@0.50, 0.50, 0.50:
    bbox AP:55.2721, 36.1850, 33.6507
    bev  AP:1.5678, 0.8645, 0.7968
    3d   AP:1.2153, 0.3637, 0.3799
    aos  AP:47.42, 30.75, 28.57
    Cyclist AP@0.50, 0.25, 0.25:
    bbox AP:55.2721, 36.1850, 33.6507
    bev  AP:11.4607, 5.7936, 5.4553
    3d   AP:10.6146, 5.3862, 4.8811
    aos  AP:47.42, 30.75, 28.57
    
    [2021-06-17 06:25:16,132] monoflex.trainer INFO: Total training time: 21:24:51.903272 (0.8307 s / it), best model is achieved at iteration = 91872
    

    This is lower than the val AP3D (R40) reported in Table 1 of the paper (23.64, 17.51, 14.83) by about 4 points. Is something wrong with my settings? Thanks.

    opened by DuZzzs 14
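
The size of the gap can be checked directly from the numbers quoted in this issue. This is a small sketch that uses only the Car 3D AP values at IoU 0.70 from the log and the Table 1 values cited by the poster; the variable names are illustrative.

```python
# Car 3D AP (R40, IoU 0.70) in the order easy, moderate, hard.
reported = [19.3602, 13.9580, 11.9678]  # "3d AP" line of the first Car block in the log
paper = [23.64, 17.51, 14.83]           # Table 1 values as quoted by the poster

# Per-difficulty gap in AP points, rounded to two decimals.
gaps = [round(p - r, 2) for p, r in zip(paper, reported)]
print(gaps)  # [4.28, 3.55, 2.86]
```

The gap is therefore largest on the easy setting and closer to 3 points on the hard setting.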
  • Use multi-gpu to train my dataset, but got the error

    Training my dataset on multiple GPUs fails with:

    ry = i_temp.repeat(N, 1).view(N, -1, 3)
    RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1, 3] because the unspecified dimension size -1 can be any value and is ambiguous

    What can I do about it? I train on 4 GPUs with the argument "--num_gpus=4". Training and validation run normally, but about one hour later the following error occurs:

    Traceback (most recent call last):
      File "tools/plain_train_net.py", line 161, in <module>
        args=(args,),
      File "/zft/code/MonoFlex/engine/launch.py", line 54, in launch
        daemon=False,
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
        while not context.join():
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
        raise Exception(msg)
    Exception:

    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/zft/code/MonoFlex/engine/launch.py", line 89, in _distributed_worker
        main_func(*args)
      File "/zft/code/MonoFlex/tools/plain_train_net.py", line 140, in main
        train(cfg, model, device, distributed)
      File "/zft/code/MonoFlex/tools/plain_train_net.py", line 84, in train
        arguments,
      File "/zft/code/MonoFlex/engine/trainer.py", line 109, in do_train
        loss_dict, log_loss_dict = model(images, targets)
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/zft/code/MonoFlex/model/detector.py", line 34, in forward
        loss_dict, log_loss_dict = self.heads(features, targets)
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/zft/code/MonoFlex/model/head/detector_head.py", line 21, in forward
        loss_dict, log_loss_dict = self.loss_evaluator(x, targets)
      File "/root/anaconda3/envs/monoflex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/zft/code/MonoFlex/model/head/detector_loss.py", line 271, in __call__
        pred_targets, preds, reg_nums, weights = self.prepare_predictions(targets_variables, predictions)
      File "/zft/code/MonoFlex/model/head/detector_loss.py", line 153, in prepare_predictions
        target_corners_3D = self.anno_encoder.encode_box3d(target_rotys_3D, target_dimensions_3D, target_locations_3D)
      File "/zft/code/MonoFlex/model/anno_encoder.py", line 108, in encode_box3d
        ry = self.rad_to_matrix(rotys, N)
      File "/zft/code/MonoFlex/model/anno_encoder.py", line 60, in rad_to_matrix
        ry = i_temp.repeat(N, 1).view(N, -1, 3)
    RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1, 3] because the unspecified dimension size -1 can be any value and is ambiguous

    opened by Mandylove1993 2
  • Is "clip_grad_norm_" necessary during training?

    Hello, I see that the training code uses gradient clipping. Is it necessary? Would the gradients vanish without it?

    opened by utc1205 0
  • Instance level

    Hello, when reading the label data, if I change the filter conditions for the instance level, does that change the distance range the trained model focuses on? And will the level-filtering hyperparameter affect the evaluation process?

    opened by shanqiu24 0
  • numba.cuda.cudadrv.error.NvvmSupportError: No supported GPU compute capabilities found. Please check your cudatoolkit version matches your CUDA version.

    sys: ubuntu 20.04
    GPU: GF 1080
    driver: NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
    cuda: Cuda compilation tools, release 10.1, V10.1.243
    python: 3.7.13 haa1d7c7_1
    torch:
      torch 1.7.0+cu110 pypi_0 pypi
      torchaudio 0.4.0 py37 pytorch
      torchvision 0.8.1+cu110 pypi_0 pypi
    cudatoolkit: cudatoolkit 10.1.243 h6bb024c_0

    opened by biubiu3721 0
  • Question about decoding depth from keypoints

    https://github.com/zhangyp15/MonoFlex/blob/ec6da017c325451b7d997d89e323083fa8430ada/model/anno_encoder.py#L178

    Why is f_u used to decode the depth here instead of f_v? The object height should correspond to the y-axis in the camera coordinate system.

    opened by lurenlym 0
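
For context, the pinhole relation behind this question can be sketched as follows. This is a generic illustration rather than the repository's code: `depth_from_height` and the sample numbers are hypothetical. An object of real height H at depth z spans roughly h = f_v * H / z pixels vertically, so z = f_v * H / h. When a camera has f_u equal to f_v, which is commonly the case for KITTI's rectified calibration, either focal length gives the same depth.

```python
def depth_from_height(focal_px, height_m, pixel_height):
    """Pinhole depth estimate: z = f * H / h."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * height_m / pixel_height


# Illustrative values only (not taken from the dataset or the repository).
f_u = f_v = 720.0  # focal lengths in pixels, assumed equal here
H = 1.5            # object height in metres
h = 36.0           # projected object height in pixels

z = depth_from_height(f_v, H, h)
print(z)  # 30.0
```

If f_u and f_v differed, the vertical focal length f_v would be the one that matches a height measured along the image y-axis.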
  • How to reproduce the AP score on the KITTI test dataset?

    I ran the official MonoFlex code and pretrained model on the KITTI test dataset and got the score shown in the attached image.

    There is a big gap between that score and MonoFlex's official score. How can I reproduce the score on the KITTI test leaderboard, and how do I generate predictions on the KITTI test dataset?

    opened by techshoww 0