
Overview

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?"

Install // Datasets // Experiments // Models // License // Reference

Full video

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Installation

We recommend using docker (see nvidia-docker2 instructions) to have a reproducible environment. To set up your environment, type the following in a terminal (only tested on Ubuntu 18.04):

git clone https://github.com/TRI-ML/dd3d.git
cd dd3d
# If you want to use docker (recommended)
make docker-build # CUDA 10.2
# Alternative docker image for cuda 11.1
# make docker-build DOCKERFILE=Dockerfile-cu111

Please check the version of your nvidia driver and cuda compatibility to determine which Dockerfile to use.

All commands below are listed as if run directly inside our container. To run any of them in a container, you can either start the container in interactive mode with make docker-dev, which lands you in a shell where you can type those commands, or you can do it in one step:

# single GPU
make docker-run COMMAND="<some-command>"
# multi GPU
make docker-run-mpi COMMAND="<some-command>"
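
For example, assuming the experiment configs shipped with this repo, the overfit run described later in this README can be launched in one step:

# Illustrative one-step run of the KITTI overfit experiment inside the container.
make docker-run COMMAND="./scripts/train.py +experiments=dd3d_kitti_dla34_overfit"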

If you want to use features related to AWS (for caching the output directory) and Weights & Biases (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables before building the docker image:

export AWS_SECRET_ACCESS_KEY="<something>"
export AWS_ACCESS_KEY_ID="<something>"
export AWS_DEFAULT_REGION="<something>"
export WANDB_ENTITY="<something>"
export WANDB_API_KEY="<something>"

You should also enable these features in the configuration, e.g. by setting WANDB.ENABLED and SYNC_OUTPUT_DIR_S3.ENABLED.
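
These flags can also be switched on per run via Hydra command-line overrides; a sketch (substitute your own S3 bucket and prefix):

# Sketch: enable W&B logging and S3 output syncing for a single run.
./scripts/train.py +experiments=dd3d_kitti_dla34 WANDB.ENABLED=True SYNC_OUTPUT_DIR_S3.ENABLED=True SYNC_OUTPUT_DIR_S3.ROOT_IN_S3=s3://<bucket>/<prefix>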

Datasets

By default, datasets are assumed to be downloaded in /data/datasets/<dataset-name> (can be a symbolic link). The dataset root is configurable by DATASET_ROOT.
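
For example, if your data lives elsewhere, a symbolic link (or a per-run DATASET_ROOT override) keeps the expected layout; the paths below are placeholders:

# Point the default location at an existing download (placeholder paths).
mkdir -p /data/datasets
ln -s /path/to/your/KITTI3D /data/datasets/KITTI3D
# Or override the dataset root for a single run.
./scripts/train.py +experiments=dd3d_kitti_dla34 DATASET_ROOT=/path/to/your/datasets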

KITTI

The KITTI 3D dataset used in our experiments can be downloaded from the KITTI website. For convenience, we provide the standard splits used in 3DOP for training and evaluation:

# download a standard splits subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D

The dataset must be organized as follows:

<DATASET_ROOT>
    └── KITTI3D
        ├── mv3d_kitti_splits
        │   ├── test.txt
        │   ├── train.txt
        │   ├── trainval.txt
        │   └── val.txt
        ├── testing
        │   ├── calib
        |   │   ├── 000000.txt
        |   │   ├── 000001.txt
        |   │   └── ...
        │   └── image_2
        │       ├── 000000.png
        │       ├── 000001.png
        │       └── ...
        └── training
            ├── calib
            │   ├── 000000.txt
            │   ├── 000001.txt
            │   └── ...
            ├── image_2
            │   ├── 000000.png
            │   ├── 000001.png
            │   └── ...
            └── label_2
                ├── 000000.txt
                ├── 000001.txt
                └── ...
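
Assuming you downloaded the standard 3D object detection archives (images, calibration, and labels) from the KITTI website, extracting them into the dataset root produces the training/ and testing/ folders above:

# Archive names follow the KITTI 3D object benchmark downloads.
mkdir -p /data/datasets/KITTI3D
unzip data_object_calib.zip -d /data/datasets/KITTI3D
unzip data_object_image_2.zip -d /data/datasets/KITTI3D
unzip data_object_label_2.zip -d /data/datasets/KITTI3D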

nuScenes

The nuScenes dataset (v1.0) can be downloaded from the nuScenes website. The dataset must be organized as follows:

<DATASET_ROOT>
    └── nuScenes
        ├── samples
        │   ├── CAM_FRONT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg
        │   │   ├── ...
        │   │  
        │   ├── CAM_FRONT_LEFT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg
        │   │   ├── ...
        │   │  
        │   ├── ...
        │  
        ├── v1.0-trainval
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-test
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-mini
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...

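A sketch of how the archives from the nuScenes website map to this layout (archive names may differ slightly by release):

# Extract the metadata and image (blob) archives into the same root directory.
mkdir -p /data/datasets/nuScenes
tar -xzf v1.0-trainval_meta.tgz -C /data/datasets/nuScenes
tar -xzf v1.0-test_meta.tgz -C /data/datasets/nuScenes
tar -xzf v1.0-mini.tgz -C /data/datasets/nuScenes
# ...plus the camera blob archives (e.g. v1.0-trainval01_blobs.tgz) that populate samples/.
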
Pre-trained DD3D models

The DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:

| backbone | download |
|:--------:|:--------:|
| DLA34    | model    |
| V2-99    | model    |

(Optional) Eigen-clean subset of KITTI raw.

To train our Pseudo-Lidar detector, we curated a new subset of the KITTI (raw) dataset and used it to fine-tune its depth network. This subset can be downloaded here. Each row contains a left and right image pair. The KITTI raw dataset can be downloaded here.

Validating installation

To validate and visualize the dataloader (including data augmentation), run the following:

./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4

To validate the entire training loop (including evaluation and visualization), run the overfit experiment (trained on test set):

./scripts/train.py +experiments=dd3d_kitti_dla34_overfit
| experiment | backbone | train mem. (GB) | train time (hr) | train log | Box AP (%) | BEV AP (%) | download |
|:----------:|:--------:|:---------------:|:---------------:|:---------:|:----------:|:----------:|:--------:|
| config     | DLA-34   | 6               | 0.25            | log       | 84.54      | 88.83      | model    |

Experiments

Configuration

We use hydra to configure experiments, specifically following this pattern to organize and compose configurations. The experiments under configs/experiments describe the delta from the default configuration, and can be run as follows:

# omit the '.yaml' extension from the experiment file.
./scripts/train.py +experiments=<experiment-file> <config-override>

The configuration is modularized into various components, such as datasets, backbones, evaluators, and visualizers.
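
Any key in the composed config (printed at the start of training, e.g. the SOLVER.* and INPUT.* groups) can be overridden the same way; a hypothetical example combining a few such overrides:

# Hypothetical override sketch; key names follow the composed config shown in the training log.
./scripts/train.py +experiments=dd3d_kitti_dla34 SOLVER.BASE_LR=0.001 SOLVER.MAX_ITER=30000 INPUT.RESIZE.MIN_SIZE_TEST=384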

Using multiple GPUs

The training script supports (single-node) multi-GPU training and evaluation via mpirun. This is most conveniently executed by the make docker-run-mpi command (see above). Internally, the IMS_PER_BATCH parameters of the optimizer and the evaluator denote the total batch size, which is sharded across the available GPUs during training or evaluation. They must be set to a multiple of the number of available GPUs.
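
For example, on a single node with 8 GPUs, an illustrative run (batch sizes chosen as multiples of 8) would be:

# Illustrative multi-GPU run; both batch sizes are divisible by the 8 available GPUs.
make docker-run-mpi COMMAND="./scripts/train.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=64 TEST.IMS_PER_BATCH=80"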

Evaluation

One can run only evaluation using the pretrained models:

./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model>
# use smaller batch size for single-gpu
./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model> TEST.IMS_PER_BATCH=4

Gradient accumulation

If you have insufficient GPU memory for an experiment, you can use gradient accumulation by configuring ACCUMULATE_GRAD_BATCHES, at the cost of longer training time. For instance, if the experiment requires at least 400GB of GPU memory (e.g., V2-99 on KITTI) and you have only 128GB (e.g., 8 x 16G GPUs), then you can update parameters at every 4th step:

# The original batch size is 64.
./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4

Models

All experiments here use 8 A100 40G GPUs, with gradient accumulation when more GPU memory is needed. We subsample the nuScenes validation set by a factor of 8 (2Hz ⟶ 0.25Hz) to save training time.

KITTI

| experiment | backbone | train mem. (GB) | train time (hr) | train log | Box AP (%) | BEV AP (%) | download |
|:----------:|:--------:|:---------------:|:---------------:|:---------:|:----------:|:----------:|:--------:|
| config     | DLA-34   | 256             | 4.5             | log       | 16.92      | 24.77      | model    |
| config     | V2-99    | 400             | 9.0             | log       | 23.90      | 32.01      | model    |

nuScenes

| experiment | backbone | train mem. (GB) | train time (hr) | train log | mAP (%) | NDS | download |
|:----------:|:--------:|:---------------:|:---------------:|:---------:|:-------:|:---:|:--------:|
| config     | DLA-34   | TBD             | TBD             | TBD       | TBD     | TBD | TBD      |
| config     | V2-99    | TBD             | TBD             | TBD       | TBD     | TBD | TBD      |

License

The source code is released under the MIT license. We note that some code in this repository is adapted from the following repositories:

Reference

@inproceedings{park2021dd3d,
  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},
  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  primaryClass = {cs.CV},
  year = {2021},
}
Comments
  • Question about the releasing time

    Hello, @dennis-park-TRI! Thanks for your excellent work on DD3D, which shows great results on the benchmark. I am wondering when you will open-source your code.

    opened by gujiaqivadin 5
  • Classes Pedestrian and Cyclist much lower than paper

    I used the released weights on GitHub (KITTI DLA-34 and KITTI V2-99) and tried both to evaluate on the validation set. I can get similar results on the Car class, but the other classes are all close to zero. I know that TTA and different evaluation splits cause some variance, but it should not be this large. Are there any missing settings that could cause this issue? The tables below are my evaluation results:

    | Car AP [email protected] |      | Paper | KITTI Submit | KITTI DLA | KITTI V2 |
    |------------------------|------|:-----:|:------------:|:---------:|:--------:|
    | BEV AP                 | Easy | 30.98 | 32.35        | 31.65     | 40.70    |
    |                        | Med  | 22.56 | 23.41        | 24.43     | 32.04    |
    |                        | Hard | 20.03 | 20.42        | 21.72     | 28.54    |
    | 3D AP                  | Easy | 23.22 | 23.19        | 22.56     | 30.38    |
    |                        | Med  | 16.34 | 16.87        | 16.98     | 23.73    |
    |                        | Hard | 14.20 | 14.36        | 14.93     | 20.88    |

    | Pedestrian AP [email protected] |      | Paper | KITTI Submit | KITTI DLA | KITTI V2 |
    |-------------------------------|------|:-----:|:------------:|:---------:|:--------:|
    | BEV AP                        | Easy | 15.90 | 18.58        | 0.04      | 0.538    |
    |                               | Med  | 10.85 | 12.51        | 0.03      | 0.056    |
    |                               | Hard | 8.05  | 10.65        | 0.01      | 0.028    |
    | 3D AP                         | Easy | 13.91 | 16.64        | 0.007     | 0.017    |
    |                               | Med  | 9.30  | 11.04        | 0.008     | 0.018    |
    |                               | Hard | 8.05  | 9.38         | 0.009     | 0.019    |

    | Cyclist AP [email protected] |      | Paper | KITTI Submit | KITTI DLA | KITTI V2 |
    |----------------------------|------|:-----:|:------------:|:---------:|:--------:|
    | BEV AP                     | Easy | 3.20  | 9.20         | 0.44      | 0.447    |
    |                            | Med  | 1.99  | 5.69         | 0.27      | 0.229    |
    |                            | Hard | 1.79  | 5.20         | 0.28      | 0.258    |
    | 3D AP                      | Easy | 2.39  | 7.52         | 0.13      | 0.126    |
    |                            | Med  | 1.52  | 4.79         | 0.12      | 0.123    |
    |                            | Hard | 1.31  | 4.22         | 0.11      | 0.111    |

    I also have another question about the published results: why is there a difference between the arXiv paper and the KITTI submission?

    Thank you so much.

    opened by Chen-Bo-Yang 2
  • 'ValueError: cannot reshape array of size 14 into shape (4)' when running scripts

    Hi,

    Whenever I run evaluation on a sample of the KITTI dataset, I get this error. I also get the same error when running the following script: ./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4. Here are my terminal logs from running the above command:

    No protocol specified
    No protocol specified
    No protocol specified
    /usr/local/lib/python3.8/dist-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'visualize_dataloader': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
      warnings.warn(msg, UserWarning)
    [11/02 13:22:33 tridet.utils.s3]: Downloading initial weights:
    [11/02 13:22:33 tridet.utils.s3]:   src: https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_dla34-y1urdmir-20210422_165446-model_final-remapped.pth
    [11/02 13:22:33 tridet.utils.s3]:   dst: /tmp/tmpwxys0idg.pth
    835it [00:00, 25902.64it/s]
    [11/02 13:28:08 tridet.utils.hydra.callbacks]: Rank of current process: 0. World size: 1
    [11/02 13:28:08 tridet.utils.setup]: Working Directory: /workspace/dd3d/outputs/2021-11-02/13-22-33
    [11/02 13:28:08 tridet.utils.setup]: Full config:
    {
      "WANDB": {
        "ENABLED": false,
        "DRYRUN": false,
        "PROJECT": "dd3d",
        "GROUP": null,
        "TAGS": [
          "kitti-val",
          "dla34",
          "bn"
        ]
      },
      "EVAL_ONLY": false,
      "EVAL_ON_START": false,
      "ONLY_REGISTER_DATASETS": false,
      "OUTPUT_ROOT": "./outputs",
      "SYNC_OUTPUT_DIR_S3": {
        "ENABLED": false,
        "ROOT_IN_S3": "???",
        "PERIOD": 1000
      },
      "DATASET_ROOT": "/data/datasets/",
      "TMP_DIR": "/tmp/",
      "DATASETS": {
        "TRAIN": {
          "NAME": "kitti_3d_train",
          "CANONICAL_BOX3D_SIZES": [
            [
              1.61876949,
              3.89154523,
              1.52969237
            ],
            [
              0.62806586,
              0.82038497,
              1.76784787
            ],
            [
              0.56898187,
              1.77149234,
              1.7237099
            ],
            [
              1.9134491,
              5.15499603,
              2.18998422
            ],
            [
              2.61168401,
              9.22692319,
              3.36492722
            ],
            [
              0.5390196,
              1.08098042,
              1.28392158
            ],
            [
              2.36044838,
              15.56991038,
              3.5289238
            ],
            [
              1.24489164,
              2.51495357,
              1.61402478
            ]
          ],
          "DATASET_MAPPER": "default",
          "NUM_CLASSES": 5,
          "MEAN_DEPTH_PER_LEVEL": [
            32.594,
            15.178,
            8.424,
            5.004,
            4.662
          ],
          "STD_DEPTH_PER_LEVEL": [
            14.682,
            7.139,
            4.345,
            2.399,
            2.587
          ]
        },
        "TEST": {
          "NAME": "kitti_3d_val",
          "NUSC_SAMPLE_AGGREGATE_IN_INFERENCE": false,
          "DATASET_MAPPER": "default"
        }
      },
      "FE": {
        "FPN": {
          "IN_FEATURES": [
            "level3",
            "level4",
            "level5"
          ],
          "OUT_FEATURES": null,
          "OUT_CHANNELS": 256,
          "NORM": "FrozenBN",
          "FUSE_TYPE": "sum"
        },
        "BUILDER": "build_fcos_dla_fpn_backbone_p67",
        "BACKBONE": {
          "NAME": "DLA-34",
          "OUT_FEATURES": [
            "level3",
            "level4",
            "level5"
          ],
          "NORM": "FrozenBN"
        },
        "OUT_FEATURES": null
      },
      "DD3D": {
        "IN_FEATURES": null,
        "NUM_CLASSES": 5,
        "FEATURE_LOCATIONS_OFFSET": "none",
        "SIZES_OF_INTEREST": [
          64,
          128,
          256,
          512
        ],
        "INFERENCE": {
          "DO_NMS": true,
          "DO_POSTPROCESS": true,
          "DO_BEV_NMS": false,
          "BEV_NMS_IOU_THRESH": 0.3,
          "NUSC_SAMPLE_AGGREGATE": false
        },
        "FCOS2D": {
          "_VERSION": "v2",
          "NORM": "BN",
          "NUM_CLS_CONVS": 4,
          "NUM_BOX_CONVS": 4,
          "USE_DEFORMABLE": false,
          "USE_SCALE": true,
          "BOX2D_SCALE_INIT_FACTOR": 1.0,
          "LOSS": {
            "ALPHA": 0.25,
            "GAMMA": 2.0,
            "LOC_LOSS_TYPE": "giou"
          },
          "INFERENCE": {
            "THRESH_WITH_CTR": true,
            "PRE_NMS_THRESH": 0.05,
            "PRE_NMS_TOPK": 1000,
            "POST_NMS_TOPK": 100,
            "NMS_THRESH": 0.75
          }
        },
        "FCOS3D": {
          "NORM": "FrozenBN",
          "NUM_CONVS": 4,
          "USE_DEFORMABLE": false,
          "USE_SCALE": true,
          "DEPTH_SCALE_INIT_FACTOR": 0.3,
          "PROJ_CTR_SCALE_INIT_FACTOR": 1.0,
          "PER_LEVEL_PREDICTORS": false,
          "SCALE_DEPTH_BY_FOCAL_LENGTHS": true,
          "SCALE_DEPTH_BY_FOCAL_LENGTHS_FACTOR": 500.0,
          "MEAN_DEPTH_PER_LEVEL": [
            32.594,
            15.178,
            8.424,
            5.004,
            4.662
          ],
          "STD_DEPTH_PER_LEVEL": [
            14.682,
            7.139,
            4.345,
            2.399,
            2.587
          ],
          "MIN_DEPTH": 0.1,
          "MAX_DEPTH": 80.0,
          "CANONICAL_BOX3D_SIZES": [
            [
              1.61876949,
              3.89154523,
              1.52969237
            ],
            [
              0.62806586,
              0.82038497,
              1.76784787
            ],
            [
              0.56898187,
              1.77149234,
              1.7237099
            ],
            [
              1.9134491,
              5.15499603,
              2.18998422
            ],
            [
              2.61168401,
              9.22692319,
              3.36492722
            ],
            [
              0.5390196,
              1.08098042,
              1.28392158
            ],
            [
              2.36044838,
              15.56991038,
              3.5289238
            ],
            [
              1.24489164,
              2.51495357,
              1.61402478
            ]
          ],
          "CLASS_AGNOSTIC_BOX3D": false,
          "PREDICT_ALLOCENTRIC_ROT": true,
          "PREDICT_DISTANCE": false,
          "LOSS": {
            "SMOOTH_L1_BETA": 0.05,
            "MAX_LOSS_PER_GROUP_DISENT": 20.0,
            "CONF_3D_TEMPERATURE": 1.0,
            "WEIGHT_BOX3D": 2.0,
            "WEIGHT_CONF3D": 1.0
          },
          "PREPARE_TARGET": {
            "CENTER_SAMPLE": true,
            "POS_RADIUS": 1.5
          }
        }
      },
      "VIS": {
        "DATALOADER_ENABLED": true,
        "DATALOADER_PERIOD": 1000,
        "DATALOADER_MAX_NUM_SAMPLES": 10,
        "PREDICTIONS_ENABLED": true,
        "PREDICTIONS_MAX_NUM_SAMPLES": 20,
        "D2": {
          "DATALOADER": {
            "ENABLED": true,
            "SCALE": 1.0,
            "COLOR_MODE": "image"
          },
          "PREDICTIONS": {
            "ENABLED": true,
            "SCALE": 1.0,
            "COLOR_MODE": "image",
            "THRESHOLD": 0.4
          }
        },
        "BOX3D": {
          "DATALOADER": {
            "ENABLED": true,
            "SCALE": 1.0,
            "RENDER_LABELS": true
          },
          "PREDICTIONS": {
            "ENABLED": true,
            "SCALE": 1.0,
            "RENDER_LABELS": true,
            "THRESHOLD": 0.5,
            "MIN_DEPTH_CENTER": 0.0
          }
        }
      },
      "INPUT": {
        "FORMAT": "BGR",
        "AUG_ENABLED": true,
        "RESIZE": {
          "ENABLED": true,
          "MIN_SIZE_TRAIN": [
            288,
            304,
            320,
            336,
            352,
            368,
            384,
            400,
            416,
            448,
            480,
            512,
            544,
            576
          ],
          "MIN_SIZE_TRAIN_SAMPLING": "choice",
          "MAX_SIZE_TRAIN": 10000,
          "MIN_SIZE_TEST": 384,
          "MAX_SIZE_TEST": 100000
        },
        "CROP": {
          "ENABLED": false,
          "TYPE": "relative_range",
          "SIZE": [
            0.9,
            0.9
          ]
        },
        "RANDOM_FLIP": {
          "ENABLED": true,
          "HORIZONTAL": true,
          "VERTICAL": false
        },
        "COLOR_JITTER": {
          "ENABLED": true,
          "BRIGHTNESS": [
            0.2,
            0.2
          ],
          "SATURATION": [
            0.2,
            0.2
          ],
          "CONTRAST": [
            0.2,
            0.2
          ]
        }
      },
      "MODEL": {
        "DEVICE": "cuda",
        "META_ARCHITECTURE": "DD3D",
        "PIXEL_MEAN": [
          103.53,
          116.28,
          123.675
        ],
        "PIXEL_STD": [
          57.375,
          57.12,
          58.395
        ],
        "CKPT": "/tmp/tmpwxys0idg.pth",
        "BOX2D_ON": true,
        "BOX3D_ON": true,
        "DEPTH_ON": false,
        "CHECKPOINT": ""
      },
      "DATALOADER": {
        "TRAIN": {
          "NUM_WORKERS": 12,
          "FILTER_EMPTY_ANNOTATIONS": true,
          "SAMPLER": "RepeatFactorTrainingSampler",
          "REPEAT_THRESHOLD": 0.4,
          "ASPECT_RATIO_GROUPING": false
        },
        "TEST": {
          "NUM_WORKERS": 4,
          "SAMPLER": "InferenceSampler"
        }
      },
      "SOLVER": {
        "IMS_PER_BATCH": 4,
        "BASE_LR": 0.002,
        "MOMENTUM": 0.9,
        "NESTEROV": false,
        "WEIGHT_DECAY": 0.0001,
        "WEIGHT_DECAY_NORM": 0.0,
        "BIAS_LR_FACTOR": 1.0,
        "WEIGHT_DECAY_BIAS": 0.0001,
        "GAMMA": 0.1,
        "LR_SCHEDULER_NAME": "WarmupMultiStepLR",
        "STEPS": [
          21500,
          24000
        ],
        "WARMUP_FACTOR": 0.0001,
        "WARMUP_ITERS": 2000,
        "WARMUP_METHOD": "linear",
        "CLIP_GRADIENTS": {
          "ENABLED": false,
          "CLIP_TYPE": "value",
          "CLIP_VALUE": 1.0,
          "NORM_TYPE": 2.0
        },
        "CHECKPOINT_PERIOD": 2000,
        "MIXED_PRECISION_ENABLED": true,
        "DDP_FIND_UNUSED_PARAMETERS": false,
        "ACCUMULATE_GRAD_BATCHES": 1,
        "SYNCBN_USE_LOCAL_WORKERS": false,
        "MAX_ITER": 25000
      },
      "TEST": {
        "ENABLED": true,
        "EVAL_PERIOD": 2000,
        "EVAL_ON_START": false,
        "ADDITIONAL_EVAL_STEPS": [],
        "IMS_PER_BATCH": 80,
        "AUG": {
          "ENABLED": true,
          "MIN_SIZES": [
            320,
            384,
            448,
            512,
            576
          ],
          "MAX_SIZE": 100000,
          "FLIP": true
        }
      },
      "USE_TEST": false,
      "EVALUATORS": {
        "KITTI3D": {
          "IOU_THRESHOLDS": [
            0.5,
            0.7
          ],
          "ONLY_PREPARE_SUBMISSION": false
        }
      }
    }
    [11/02 13:28:08 tridet.data.datasets.kitti_3d]: KITTI-3D dataset(s): kitti_3d_train, kitti_3d_val 
    Error executing job with overrides: ['+experiments=dd3d_kitti_dla34', 'SOLVER.IMS_PER_BATCH=4']
    multiprocessing.pool.RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
        return list(map(*args))
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 123, in _read_calibration_file
        P_20 = calibration.loc[2].values[1:].reshape(-1, 4).astype(np.float64)
    ValueError: cannot reshape array of size 14 into shape (4)
    """
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "./scripts/visualize_dataloader.py", line 26, in main
        dataset_names = register_datasets(cfg)
      File "/workspace/dd3d/tridet/data/datasets/__init__.py", line 19, in register_datasets
        dataset_names.extend(register_kitti_3d_datasets(required_datasets, cfg))
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/__init__.py", line 41, in register_kitti_3d_datasets
        fn(name, **kwargs)
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 312, in register_kitti_3d_metadata
        dataset_dicts = DatasetCatalog.get(dataset_name)
      File "/usr/local/lib/python3.8/dist-packages/detectron2/data/catalog.py", line 58, in get
        return f()
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 298, in build_monocular_kitti3d_dataset
        dataset = KITTI3DMonocularDataset(root_dir, mv3d_split, class_names, sensors, box2d_from_box3d, max_num_items)
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 283, in __init__
        self._kitti_dset = KITTI3DDataset(root_dir, mv3d_split, class_names, sensors, box2d_from_box3d, max_num_items)
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 73, in __init__
        self.calibration_table = self._parse_calibration_files()
      File "/workspace/dd3d/tridet/data/datasets/kitti_3d/build.py", line 95, in _parse_calibration_files
        (_proc.map(self._read_calibration_file, calibration_files))
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
        raise self._value
    ValueError: cannot reshape array of size 14 into shape (4)
    
    
    opened by komzy 2
  • where to change the train class num 5 to 3

    Hi, I find that DD3D is trained with 5 classes (Car, Pedestrian, Cyclist, Van, Truck), but I only want to train 3 classes (Car, Pedestrian, Cyclist). Where should I modify the code to train 3 classes? I am looking forward to your reply! Thank you very much!

    opened by 694376965 0
  • Found a bug in image_list.py in the function from_tensors()

    @dennis-park-TRI I found a bug in image_list.py, in the function from_tensors(). When padding the image, I think the intrinsics of the image (for example cx, cy) should also be adjusted. Am I misunderstanding it, or is it indeed wrong? Looking forward to your reply. Thank you...

    opened by lqs19881030 0
  • Generating Validation Folder

    For anyone who had to generate the validation folder, here's what I used.

    import os
    import shutil

    root = ''  # path to the KITTI3D dataset root
    with open(os.path.join(root, "mv3d_kitti_splits", "val.txt")) as _f:
        lines = _f.readlines()
    split = [line.rstrip("\n") for line in lines]

    for sub in ['calib', 'image_2', 'label_2']:
        # create the destination folder first, otherwise shutil.copyfile fails
        os.makedirs(os.path.join(root, 'val', sub), exist_ok=True)
        for file in split:
            if sub == 'calib' or sub == 'label_2':
                file += '.txt'
            else:
                file += '.png'
            shutil.copyfile(os.path.join(root, 'training', sub, file),
                            os.path.join(root, 'val', sub, file))

    opened by tom-bu 0
  • Multi-node training

    Hi there, thank you so much for this release! When trying to run multi-node training, I can see that this repo is equipped to do it, based on the following lines: https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/tridet/utils/setup.py#L57 https://github.com/TRI-ML/dd3d/blob/da25b614a29344830c96c2848c02a15b35380c4b/Makefile#L42

    Have you trained using multiple nodes (not just multiple GPUs), where you have to provide two different IP addresses from within the docker containers provided in this repo? And has this worked for you? When I execute training on two different machines, the code hangs and I don't see any terminal printouts...

    Thank you in advance!

    opened by EphChem 0
  • Guidance on Pre training

    Hello, thanks for sharing your excellent work! I would like to adapt your work to another model, for example, using ResNet-50 as the backbone and refactoring the head branches. Could you please share the pre-training code or give me some advice? Thanks!

    opened by Watebear 0
  • Box pose transformation from camera 2 to velodyne may not be correct.

    Hi, I found that the code at https://github.com/TRI-ML/dd3d/blob/86d8660c29612b79836dad9b6c39972ac2ca1557/tridet/data/datasets/kitti_3d/build.py#L260 may not be correct. If we want to get the box pose in the velodyne frame, I think the code should be box_pose = pose_0V.inverse() * pose_02 * box_pose. I am not sure whether this is correct, so I hope you can double-check. Thank you!

    opened by AlfredQin 0
  • fix bugs to run training script

    Fixed some trivial errors I encountered when running the training script.

    To reproduce the errors, I just ran the following commands from README.md:

    1. make docker-build
    2. ./scripts/train.py +experiments=dd3d_kitti_dla34_overfit

    please check.

    opened by yudai-kato-aisin 0