Weakly Supervised Learning of Rigid 3D Scene Flow

Overview

This repository provides code and data to train and evaluate a weakly supervised method for rigid 3D scene flow estimation. It represents the official implementation of the paper:

Weakly Supervised Learning of Rigid 3D Scene Flow

Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal
IGP, ETH Zurich | NVIDIA Toronto AI Lab | Guibas Lab, Stanford University

For more information, please see the project webpage.

Environment Setup

Note: the code in this repo has been tested on Ubuntu 16.04/20.04 with Python 3.7, CUDA 10.1/10.2, PyTorch 1.7.1 and MinkowskiEngine 0.5.1. It may work for other setups, but has not been tested.

Before proceeding, make sure CUDA is installed and set up correctly.

After cloning this repository, you can proceed by setting up and activating a virtual environment with Python 3.7. If you are using a version of CUDA other than 10.1, change the PyTorch installation instruction below accordingly.

export CXX=g++-7
conda config --append channels conda-forge
conda create --name rigid_3dsf python=3.7
source activate rigid_3dsf
conda install --file requirements.txt
conda install -c open3d-admin open3d=0.9.0.0
conda install -c intel scikit-learn
conda install pytorch==1.7.1 torchvision cudatoolkit=10.1 -c pytorch

You can then proceed to install the MinkowskiEngine library for sparse tensors:

pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps

Our repository also includes a PyTorch implementation of the Chamfer distance in ./utils/chamfer_distance, which will be compiled on the first run.
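
For reference, the Chamfer distance between two point clouds is the mean distance from each point to its nearest neighbor in the other cloud, accumulated over both directions. A minimal dense PyTorch sketch of the metric (not the repo's compiled implementation, whose API may differ):

import torch

def chamfer_distance(p1, p2):
    # Symmetric Chamfer distance between point clouds p1 (N, 3) and p2 (M, 3).
    d = torch.cdist(p1, p2)  # (N, M) pairwise Euclidean distances
    # nearest neighbor in p2 for each point of p1, and vice versa
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# usage: chamfer_distance(torch.rand(1024, 3), torch.rand(2048, 3))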

To test whether PyTorch and MinkowskiEngine are installed correctly, please run

python -c "import torch, MinkowskiEngine"

which should run without an error message.
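
A slightly more thorough sanity check also verifies the versions and that a dummy sparse tensor can be constructed (a sketch, not part of the repo):

import torch
import MinkowskiEngine as ME

print("torch", torch.__version__, "| ME", ME.__version__,
      "| CUDA available:", torch.cuda.is_available())

# Dummy sparse tensor: coordinates are integer (batch_idx, x, y, z) rows.
coords = torch.IntTensor([[0, 0, 0, 0], [0, 1, 1, 1]])
feats = torch.rand(2, 3)
x = ME.SparseTensor(features=feats, coordinates=coords)
print(x.F.shape)  # torch.Size([2, 3])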

Data

We provide the preprocessed data of flying_things_3d (108 GB), stereo_kitti (500 MB), lidar_kitti (~160 MB), semantic_kitti (78 GB), and waymo_open (50 GB) used for training and evaluating our model.

To download a single dataset please run:

bash ./scripts/download_data.sh name_of_the_dataset

To download all datasets simply run:

bash ./scripts/download_data.sh

The data will be downloaded and extracted to ./data/name_of_the_dataset/.
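
Each sample is stored as an .npz archive; for semantic_kitti it holds the two point clouds together with semantic, instance, and motion labels and the two poses (the exact keys are listed in the comments below). A minimal inspection sketch, with an illustrative sample path:

import numpy as np

# Illustrative path; the exact directory layout depends on the dataset.
sample = np.load("./data/semantic_kitti/11/000000_000001.npz")
for key in sample.files:
    print(key, sample[key].shape)
# semantic_kitti keys: pc1, pc2, sem_label_s, sem_label_t, inst_label_s,
# inst_label_t, mot_label_s, mot_label_t, pose_s, pose_t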

Pretrained models

We provide the checkpoints of the models trained on flying_things_3d or semantic_kitti, which we use in our main evaluations.

To download these models please run:

bash ./scripts/download_pretrained_models.sh

Additionally, we provide all the models used in the ablation studies and the model fine-tuned on waymo_open.

To download these models please run:

bash ./scripts/download_pretrained_models_ablations.sh

All the models will be downloaded and extracted to ./logs/dataset_used_for_training/.

Evaluation with pretrained models

Our method with pretrained weights can be evaluated using the ./eval.py script. The configuration parameters of the evaluation can be set with the *.yaml configuration files located in ./configs/eval/. We provide a configuration file for each dataset used in our paper. For all evaluations, please first download the pretrained weights and the corresponding data. Note: if the data or pretrained models are saved to a non-default path, the config files also have to be adapted accordingly.
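
The config files are plain YAML, so they can be edited by hand or inspected programmatically. A minimal sketch for locating the entries to adapt (assumes PyYAML; the key names are whatever the file defines):

import yaml

with open("./configs/eval/eval_lidar_kitti.yaml") as f:
    cfg = yaml.safe_load(f)
# Print the full config to find the data root and checkpoint path entries.
print(yaml.dump(cfg, default_flow_style=False))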

FlyingThings3D

To evaluate our backbone + scene flow head on FlyingThings3D please run:

python eval.py ./configs/eval/eval_flying_things_3d.yaml

This should recreate the results from Table 1 of our paper (EPE3D: 0.052 m).
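
EPE3D is the mean end-point error, i.e. the average Euclidean distance between the predicted and ground-truth flow vectors. A one-line PyTorch sketch of the metric (the repo's evaluation code may differ in details such as masking):

import torch

def epe3d(flow_pred, flow_gt):
    # Mean L2 distance between predicted and GT flow vectors of shape (N, 3).
    return (flow_pred - flow_gt).norm(dim=1).mean()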

stereoKITTI

To evaluate our backbone + scene flow head on stereoKITTI please run:

python eval.py ./configs/eval/eval_stereo_kitti.yaml

This should again recreate the results from Table 1 of our paper (EPE3D: 0.042 m).

lidarKITTI

To evaluate our full weakly supervised method on lidarKITTI please run:

python eval.py ./configs/eval/eval_lidar_kitti.yaml

This should recreate the results for Ours++ on lidarKITTI (w/o ground) from Table 2 of our paper (EPE3D: 0.094 m). To recreate other results on lidarKITTI please change the ./configs/eval/eval_lidar_kitti.yaml file accordingly.

semanticKITTI

To evaluate our full weakly supervised method on semanticKITTI please run:

python eval.py ./configs/eval/eval_semantic_kitti.yaml

This should recreate the results of our full model on semanticKITTI (w/o ground) from Table 4 of our paper. To recreate other results on semanticKITTI please change the ./configs/eval/eval_semantic_kitti.yaml file accordingly.

waymo open

To evaluate our fine-tuned model on waymo open please run:

python eval.py ./configs/eval/eval_waymo_open.yaml

This should recreate the results for Ours++ (fine-tuned) from Table 9 of the appendix. To recreate other results on waymo open please change the ./configs/eval/eval_waymo_open.yaml file accordingly.

Training our method from scratch

Our method can be trained using the ./train.py script. The configuration parameters of the training process can be set using the config files located in ./configs/train/.

Training our backbone with full supervision on FlyingThings3D

To train our backbone network and scene flow head under full supervision (corresponds to Sec. 4.3 of our paper) please run:

python train.py ./configs/train/train_fully_supervised.yaml

The checkpoints and tensorboard data will be saved to ./logs/logs_FlyingThings3D_ME. If you run out of GPU memory with the default settings, please adapt batch_size and acc_iter_size in ./configs/default.yaml to e.g. 4 and 2, respectively.
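
Here batch_size is the number of samples per forward pass, while acc_iter_size accumulates gradients over several forward passes before each optimizer step, so the effective batch size is their product. A self-contained sketch of the idea (an assumption about what acc_iter_size does; the actual loop lives in train.py):

import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
acc_iter_size = 2  # accumulate gradients over 2 mini-batches per step

optimizer.zero_grad()
for i in range(8):
    x, y = torch.randn(4, 3), torch.randn(4, 1)  # stand-in mini-batch
    loss = torch.nn.functional.mse_loss(model(x), y) / acc_iter_size
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % acc_iter_size == 0:
        optimizer.step()  # effective batch size = batch_size * acc_iter_size
        optimizer.zero_grad()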

Training under weak supervision on semanticKITTI

To train our full method under weak supervision on semanticKITTI please run:

python train.py ./configs/train/train_weakly_supervised.yaml

The checkpoints and tensorboard data will be saved to ./logs/logs_SemanticKITTI_ME. If you run out of GPU memory with the default settings, please adapt batch_size and acc_iter_size in ./configs/default.yaml to e.g. 4 and 2, respectively.

Citation

If you found this code or paper useful, please consider citing:

@misc{gojcic2021weakly3dsf,
  title = {Weakly {S}upervised {L}earning of {R}igid {3D} {S}cene {F}low},
  author = {Gojcic, Zan and Litany, Or and Wieser, Andreas and Guibas, Leonidas J and Birdal, Tolga},
  year = {2021},
  eprint = {2102.08945},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}

Contact

If you run into any problems or have questions, please create an issue or contact Zan Gojcic.

Acknowledgments

In this project we use parts of several other official open-source implementations. We thank the respective authors for open-sourcing their methods.

Comments
  • Invalid in_feat_size 0 with Cuda 11

    When using CUDA 11, our model returns the following error:

    File "/home/zgojcic/anaconda3/envs/rigid_3dsf/lib/python3.7/site-packages/MinkowskiEngine-0.5.1-py3.7-linux-x86_64.egg/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
        coordinate_manager._manager,
    RuntimeError: /home/zgojcic/Documents/Rigid3DSceneFlow/MinkowskiEngine/src/convolution_gpu.cu:85, assertion (in_feat.size(0) == p_map_manager->size(in_key)) failed. Invalid in_feat size 0 != 5296
    

    It seems that this is due to the combination of CUDA 11 with MinkowskiEngine. The issue is currently under investigation: https://github.com/NVIDIA/MinkowskiEngine/issues/330

    Until it is solved, we suggest using CUDA 10.2 or 10.1.

    opened by zgojcic 9
  • System memory usage increase in training

    Hi, when I run python train.py ./configs/train/train_weakly_supervised.yaml to train the network from scratch on our own dataset, the system memory usage slowly increases until it maxes out the system memory, and then the training crashes. I have 16 GB of system memory, and training can only go on for a little more than one epoch with ~16000 training samples. I tried lowering num_workers to 4 and the batch size to 2, but neither seemed to resolve the issue.

    opened by Alt216 5
  • Pretrained model link expired

    Hi! Thank you for your wonderful work. I tried downloading the checkpoints with the script you provided. However, it seems the link hosting the models has expired, and I cannot find the pre-trained models. Could you please tell me how I can get them?

    Thank you very much for your kind help!

    opened by weihao1115 4
  • About FG rigid transformation estimation

    Hi @zgojcic thanks for sharing this inspiring work!

    Since the output of DBSCAN is unordered and even the number of clusters might differ, how did you determine the corresponding FG instances to compute the rigid transformation? And will the instance segmentation result be good enough for DBSCAN at the beginning of the training stage?

    Another small question: why do you use a different tau for Eq. 9 and Eq. 10? Is the reason that they compute the softmax over different numbers of correspondences?

    Best, Xuyang

    opened by XuyangBai 4
  • The file of model

    Hello! I want to convert the model_best.pt file into an .onnx file; however, I cannot figure out the input parameters of the network. It would help a lot if you could tell me the input parameters of the network, or provide the .onnx file of the model.

    Thank you!

    opened by a962097364 3
  • Question about data preprocessing for semanticKITTI

    Hi, thanks for sharing. I wonder why you remove the points that are invisible to the front camera for semanticKITTI. Based on your paper, it doesn't seem necessary. Does it make any difference if I use the full point cloud for testing or training?

    opened by liamlin5566 2
  • Results visualization

    Hi,

    I saw the closed issue regarding this topic, but it was not clear to me. Where can I find the points that need to be converted? Could you please provide the code for doing this, before using Keyshot?

    opened by alexandruiliescu1997 2
  • Question about computing Rotation matrix

    Hi, thanks for the nice work.

    When computing the rotation matrix after SVD, the following line transposes both V and U: https://github.com/zgojcic/Rigid3DSceneFlow/blob/7fa57e3ddccf605dca63ded04825bba2272cae4a/lib/utils.py#L335 But I think only U should be transposed here. Is this an error? (For the textbook formulation, see the Kabsch sketch after this comment list.)

    I found another implementation in the source code: https://github.com/zgojcic/Rigid3DSceneFlow/blob/7fa57e3ddccf605dca63ded04825bba2272cae4a/lib/model/minkowski/MinkowskiFlow.py#L344 Here only U is transposed.

    opened by hi-zhengcheng 2
  • Preprocessing on ego-motion of waymo-open dataset?

    Hi, thank you for the great work.

    As you mentioned in the paper, the coordinate systems for KITTI and for Waymo are different. To account for the difference, you said you have "transform[ed] these points into a coordinate system centered at the location of the LiDAR sensor in the KITTI setup". I wonder if you applied any transformation to the ego-motion data of the waymo_open dataset (i.e., R and t) as well, since the direction of the ego vehicle is along the z-axis for the KITTI dataset and along the x-axis for the Waymo dataset. Also, the ego transformation does not necessarily have to be "aligned" with the absolute coordinate frame. If I blindly apply the transformation (that I used for the point clouds) to the ego coordinates, the absolute R is not aligned with the world coordinate frame; in other words, R is quite different from the identity matrix I. However, both the estimated R matrix and the GT R matrix for the KITTI dataset are almost aligned with the world coordinate frame; namely, they are very close to identity. Have you applied any other tricks to bring R back to I in the training process? Could you explain a bit more how you translated the ego motion from Waymo to KITTI? Thank you in advance.

    opened by Young-woong-Cho 1
  • The coordinate system of "pose" in semanticKITTI npz file

    Hi, thanks for the nice work. I currently want to run a test on semanticKITTI. How is the 'pose' in the npz file defined and obtained? For example, I downloaded the complete tar archive and loaded the data in 11/000000_000001.npz. For frame_000000, data['pose_s'] is an identity matrix, corresponding to the global coordinate system, and data['pose_t'] is the same as line 1 of the pose.txt provided by semanticKITTI. Is this pose in the camera coordinate system or the LiDAR coordinate system? I saw in the supplementary material that the LiDAR data should be projected into the camera coordinate system. Is the pose here also in the camera coordinate system?

    I have tried several ways, but I can't convert the two frames of point clouds into one coordinate system through data['pose_t'] (in the former example, data['pose_s'] is the identity matrix). Can you give me some hints?

    opened by MaxChanger 1
  • Semantic_kitti dataset processing

    This is a wonderful work! I have a question about how datasets such as semantic_kitti are processed. When I print out the contents of one of the npz files in the semantic_kitti dataset I get: ['pc1', 'pc2', 'sem_label_s', 'sem_label_t', 'inst_label_s', 'inst_label_t', 'mot_label_s', 'mot_label_t', 'pose_s', 'pose_t']. I wonder what each of these is and how they were obtained. Thanks!

    opened by Alt216 0
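
For reference (see the rotation-matrix comment above), the textbook Kabsch solution for the rotation aligning two centered point sets transposes only U. A minimal PyTorch sketch, not a verbatim copy of the repository code:

import math
import torch

def kabsch_rotation(p, q):
    # Rotation R minimizing sum ||R p_i - q_i||^2 for centered sets (N, 3).
    # H = P^T Q; with H = U S V^T, R = V diag(1, 1, det(V U^T)) U^T.
    h = p.transpose(0, 1) @ q
    u, s, v = torch.svd(h)  # torch.svd returns V, not V^T
    d = torch.det(v @ u.transpose(0, 1))  # guard against reflections
    diag = torch.diag(torch.tensor([1.0, 1.0, d.item()]))
    return v @ diag @ u.transpose(0, 1)

# Quick check: recover a known rotation about the z-axis.
c, s = math.cos(0.3), math.sin(0.3)
R_true = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
p = torch.randn(100, 3)
q = p @ R_true.transpose(0, 1)  # q_i = R_true @ p_i
print(torch.allclose(kabsch_rotation(p, q), R_true, atol=1e-4))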