Weakly Supervised Learning of Rigid 3D Scene Flow

Overview

This repository provides code and data to train and evaluate a weakly supervised method for rigid 3D scene flow estimation. It represents the official implementation of the paper:

Weakly Supervised Learning of Rigid 3D Scene Flow

Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal
IGP, ETH Zurich | NVIDIA Toronto AI Lab | Guibas Lab, Stanford University

For more information, please see the project webpage.

Environment Setup

Note: the code in this repo has been tested on Ubuntu 16.04/20.04 with Python 3.7, CUDA 10.1/10.2, PyTorch 1.7.1 and MinkowskiEngine 0.5.1. It may work for other setups, but has not been tested.

Before proceeding, make sure CUDA is installed and set up correctly.

After cloning this repository, you can proceed by setting up and activating a virtual environment with Python 3.7. If you are using a version of CUDA other than 10.1, change the PyTorch installation instruction below accordingly.

export CXX=g++-7
conda config --append channels conda-forge
conda create --name rigid_3dsf python=3.7
source activate rigid_3dsf
conda install --file requirements.txt
conda install -c open3d-admin open3d=0.9.0.0
conda install -c intel scikit-learn
conda install pytorch==1.7.1 torchvision cudatoolkit=10.1 -c pytorch

You can then proceed to install the MinkowskiEngine library for sparse tensors:

pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps

Our repository also includes a PyTorch implementation of the Chamfer distance in ./utils/chamfer_distance, which will be compiled on the first run.
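
For reference, the Chamfer distance between two point clouds is the mean distance from each point to its nearest neighbor in the other cloud, accumulated over both directions. A minimal dense PyTorch sketch of the metric (not the repo's compiled implementation, whose API may differ):

import torch

def chamfer_distance(p1, p2):
    # Symmetric Chamfer distance between point clouds p1 (N, 3) and p2 (M, 3).
    d = torch.cdist(p1, p2)  # (N, M) pairwise Euclidean distances
    # nearest neighbor in p2 for each point of p1, and vice versa
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# usage: chamfer_distance(torch.rand(1024, 3), torch.rand(2048, 3))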

To test whether PyTorch and MinkowskiEngine are installed correctly, please run

python -c "import torch, MinkowskiEngine"

which should run without an error message.
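
A slightly more thorough sanity check also verifies the versions and that a dummy sparse tensor can be constructed (a sketch, not part of the repo):

import torch
import MinkowskiEngine as ME

print("torch", torch.__version__, "| ME", ME.__version__,
      "| CUDA available:", torch.cuda.is_available())

# Dummy sparse tensor: coordinates are integer (batch_idx, x, y, z) rows.
coords = torch.IntTensor([[0, 0, 0, 0], [0, 1, 1, 1]])
feats = torch.rand(2, 3)
x = ME.SparseTensor(features=feats, coordinates=coords)
print(x.F.shape)  # torch.Size([2, 3])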

Data

We provide the preprocessed data of flying_things_3d (108 GB), stereo_kitti (500 MB), lidar_kitti (~160 MB), semantic_kitti (78 GB), and waymo_open (50 GB) used for training and evaluating our model.

To download a single dataset please run:

bash ./scripts/download_data.sh name_of_the_dataset

To download all datasets simply run:

bash ./scripts/download_data.sh

The data will be downloaded and extracted to ./data/name_of_the_dataset/.
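
Each sample is stored as an .npz archive; for semantic_kitti it holds the two point clouds together with semantic, instance, and motion labels and the two poses (the exact keys are listed in the comments below). A minimal inspection sketch, with an illustrative sample path:

import numpy as np

# Illustrative path; the exact directory layout depends on the dataset.
sample = np.load("./data/semantic_kitti/11/000000_000001.npz")
for key in sample.files:
    print(key, sample[key].shape)
# semantic_kitti keys: pc1, pc2, sem_label_s, sem_label_t, inst_label_s,
# inst_label_t, mot_label_s, mot_label_t, pose_s, pose_t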

Pretrained models

We provide the checkpoints of the models trained on flying_things_3d or semantic_kitti, which we use in our main evaluations.

To download these models please run:

bash ./scripts/download_pretrained_models.sh

Additionally, we provide all the models used in the ablation studies and the model fine-tuned on waymo_open.

To download these models please run:

bash ./scripts/download_pretrained_models_ablations.sh

All the models will be downloaded and extracted to ./logs/dataset_used_for_training/.

Evaluation with pretrained models

Our method with pretrained weights can be evaluated using the ./eval.py script. The configuration parameters of the evaluation can be set with the *.yaml configuration files located in ./configs/eval/. We provide a configuration file for each dataset used in our paper. For all evaluations, please first download the pretrained weights and the corresponding data. Note: if the data or pretrained models are saved to a non-default path, the config files also have to be adapted accordingly.
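
The config files are plain YAML, so they can be edited by hand or inspected programmatically. A minimal sketch for locating the entries to adapt (assumes PyYAML; the key names are whatever the file defines):

import yaml

with open("./configs/eval/eval_lidar_kitti.yaml") as f:
    cfg = yaml.safe_load(f)
# Print the full config to find the data root and checkpoint path entries.
print(yaml.dump(cfg, default_flow_style=False))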

FlyingThings3D

To evaluate our backbone + scene flow head on FlyingThings3D please run:

python eval.py ./configs/eval/eval_flying_things_3d.yaml

This should recreate the results from Table 1 of our paper (EPE3D: 0.052 m).
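
EPE3D is the mean end-point error, i.e. the average Euclidean distance between the predicted and ground-truth flow vectors. A one-line PyTorch sketch of the metric (the repo's evaluation code may differ in details such as masking):

import torch

def epe3d(flow_pred, flow_gt):
    # Mean L2 distance between predicted and GT flow vectors of shape (N, 3).
    return (flow_pred - flow_gt).norm(dim=1).mean()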

stereoKITTI

To evaluate our backbone + scene flow head on stereoKITTI please run:

python eval.py ./configs/eval/eval_stereo_kitti.yaml

This should again recreate the results from Table 1 of our paper (EPE3D: 0.042 m).

lidarKITTI

To evaluate our full weakly supervised method on lidarKITTI please run:

python eval.py ./configs/eval/eval_lidar_kitti.yaml

This should recreate the results for Ours++ on lidarKITTI (w/o ground) from Table 2 of our paper (EPE3D: 0.094 m). To recreate other results on lidarKITTI please change the ./configs/eval/eval_lidar_kitti.yaml file accordingly.

semanticKITTI

To evaluate our full weakly supervised method on semanticKITTI please run:

python eval.py ./configs/eval/eval_semantic_kitti.yaml

This should recreate the results of our full model on semanticKITTI (w/o ground) from Table 4 of our paper. To recreate other results on semanticKITTI please change the ./configs/eval/eval_semantic_kitti.yaml file accordingly.

waymo open

To evaluate our fine-tuned model on waymo open please run:

python eval.py ./configs/eval/eval_waymo_open.yaml

This should recreate the results for Ours++ (fine-tuned) from Table 9 of the appendix. To recreate other results on waymo open please change the ./configs/eval/eval_waymo_open.yaml file accordingly.

Training our method from scratch

Our method can be trained using the ./train.py script. The configuration parameters of the training process can be set using the config files located in ./configs/train/.

Training our backbone with full supervision on FlyingThings3D

To train our backbone network and scene flow head under full supervision (corresponds to Sec. 4.3 of our paper) please run:

python train.py ./configs/train/train_fully_supervised.yaml

The checkpoints and tensorboard data will be saved to ./logs/logs_FlyingThings3D_ME. If you run out of GPU memory with the default settings, please adapt batch_size and acc_iter_size in ./configs/default.yaml to e.g. 4 and 2, respectively.
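
Here batch_size is the number of samples per forward pass, while acc_iter_size accumulates gradients over several forward passes before each optimizer step, so the effective batch size is their product. A self-contained sketch of the idea (an assumption about what acc_iter_size does; the actual loop lives in train.py):

import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
acc_iter_size = 2  # accumulate gradients over 2 mini-batches per step

optimizer.zero_grad()
for i in range(8):
    x, y = torch.randn(4, 3), torch.randn(4, 1)  # stand-in mini-batch
    loss = torch.nn.functional.mse_loss(model(x), y) / acc_iter_size
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % acc_iter_size == 0:
        optimizer.step()  # effective batch size = batch_size * acc_iter_size
        optimizer.zero_grad()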

Training under weak supervision on semanticKITTI

To train our full method under weak supervision on semanticKITTI please run:

python train.py ./configs/train/train_weakly_supervised.yaml

The checkpoints and tensorboard data will be saved to ./logs/logs_SemanticKITTI_ME. If you run out of GPU memory with the default settings, please adapt batch_size and acc_iter_size in ./configs/default.yaml to e.g. 4 and 2, respectively.

Citation

If you found this code or paper useful, please consider citing:

@misc{gojcic2021weakly3dsf,
  title = {Weakly {S}upervised {L}earning of {R}igid {3D} {S}cene {F}low},
  author = {Gojcic, Zan and Litany, Or and Wieser, Andreas and Guibas, Leonidas J and Birdal, Tolga},
  year = {2021},
  eprint = {2102.08945},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}

Contact

If you run into any problems or have questions, please create an issue or contact Zan Gojcic.

Acknowledgments

In this project we use parts of several other official open-source implementations. We thank the respective authors for open-sourcing their methods.

Comments
  • Invalid in_feat_size 0 with Cuda 11

    When using CUDA 11, our model returns the following error:

    File "/home/zgojcic/anaconda3/envs/rigid_3dsf/lib/python3.7/site-packages/MinkowskiEngine-0.5.1-py3.7-linux-x86_64.egg/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
        coordinate_manager._manager,
    RuntimeError: /home/zgojcic/Documents/Rigid3DSceneFlow/MinkowskiEngine/src/convolution_gpu.cu:85, assertion (in_feat.size(0) == p_map_manager->size(in_key)) failed. Invalid in_feat size 0 != 5296
    

    It seems that this is due to the combination of CUDA 11 with MinkowskiEngine. The issue is currently under investigation: https://github.com/NVIDIA/MinkowskiEngine/issues/330

    Until it is solved, we suggest using CUDA 10.2 or 10.1.

    opened by zgojcic 9
  • System memory usage increase in training

    Hi, when I run python train.py ./configs/train/train_weakly_supervised.yaml to train the network from scratch on our own dataset, the system memory usage slowly increases until it maxes out the system memory, and then the training crashes. I have 16 GB of system memory, and training can only go on for a little more than one epoch with ~16000 training samples. I tried lowering num_workers to 4 and the batch size to 2, but neither seemed to resolve the issue.

    opened by Alt216 5
  • Pretrained model link expired

    Hi! Thank you for your wonderful work. I tried downloading the checkpoints with the script you provided. However, it seems the link hosting the models has expired, and I cannot find the pre-trained models. Could you please tell me how I can get them?

    Thank you very much for your kind help!

    opened by weihao1115 4
  • About FG rigid transformation estimation

    Hi @zgojcic thanks for sharing this inspiring work!

    Since the output of DBSCAN is unordered and even the number of clusters might differ, how did you determine the corresponding FG instances to compute the rigid transformation? And will the instance segmentation result be good enough for DBSCAN at the beginning of the training stage?

    Another small question: why do you use a different tau for Eq. 9 and Eq. 10? Is the reason that they compute the softmax over different numbers of correspondences?

    Best, Xuyang

    opened by XuyangBai 4
  • The file of model

    Hello! I want to convert the model_best.pt file into an .onnx file; however, I cannot figure out the input parameters of the network. It would help a lot if you could tell me the input parameters of the network, or provide the .onnx file of the model.

    Thank you!

    opened by a962097364 3
  • Question about data preprocessing for semanticKITTI

    Hi, thanks for sharing. I wonder why you remove the points that are invisible to the front camera for semanticKITTI. Based on your paper, it doesn't seem necessary. Does it make any difference if I use the full point cloud for testing or training?

    opened by liamlin5566 2
  • Results visualization

    Hi,

    I saw the closed issue regarding this topic, but it was not clear to me. Where can I find the points that need to be converted? Could you please provide the code for doing this, before using Keyshot?

    opened by alexandruiliescu1997 2
  • Question about computing Rotation matrix

    Hi, thanks for the nice work.

    When computing the rotation matrix after SVD, the following line transposes both V and U: https://github.com/zgojcic/Rigid3DSceneFlow/blob/7fa57e3ddccf605dca63ded04825bba2272cae4a/lib/utils.py#L335 But I think only U should be transposed here. Is this an error? (For the textbook formulation, see the Kabsch sketch after this comment list.)

    I found another implementation in the source code: https://github.com/zgojcic/Rigid3DSceneFlow/blob/7fa57e3ddccf605dca63ded04825bba2272cae4a/lib/model/minkowski/MinkowskiFlow.py#L344 Here only U is transposed.

    opened by hi-zhengcheng 2
  • Preprocessing on ego-motion of waymo-open dataset?

    Hi, thank you for the great work.

    As you mentioned in the paper, the coordinate systems for KITTI and for Waymo are different. To account for the difference, you said you have "transform[ed] these points into a coordinate system centered at the location of the LiDAR sensor in the KITTI setup". I wonder if you applied any transformation to the ego-motion data of the waymo_open dataset (i.e., R and t) as well, since the direction of the ego vehicle is along the z-axis for the KITTI dataset and along the x-axis for the Waymo dataset. Also, the ego transformation does not necessarily have to be "aligned" with the absolute coordinate frame. If I blindly apply the transformation (that I used for the point clouds) to the ego coordinates, the absolute R is not aligned with the world coordinate frame; in other words, R is quite different from the identity matrix I. However, both the estimated R matrix and the GT R matrix for the KITTI dataset are almost aligned with the world coordinate frame; namely, they are very close to identity. Have you applied any other tricks to bring R back to I in the training process? Could you explain a bit more how you translated the ego motion from Waymo to KITTI? Thank you in advance.

    opened by Young-woong-Cho 1
  • The coordinate system of "pose" in semanticKITTI npz file

    Hi, thanks for the nice work. I currently want to run a test on semanticKITTI. How is the 'pose' in the npz file defined and obtained? For example, I downloaded the complete tar archive and loaded the data in 11/000000_000001.npz. For frame_000000, data['pose_s'] is an identity matrix, corresponding to the global coordinate system, and data['pose_t'] is the same as line 1 of the pose.txt provided by semanticKITTI. Is this pose in the camera coordinate system or the LiDAR coordinate system? I saw in the supplementary material that the LiDAR data should be projected into the camera coordinate system. Is the pose here also in the camera coordinate system?

    I have tried several ways, but I can't convert the two frames of point clouds into one coordinate system through data['pose_t'] (in the former example, data['pose_s'] is the identity matrix). Can you give me some hints?

    opened by MaxChanger 1
  • Semantic_kitti dataset processing

    This is a wonderful work! I have a question about how datasets such as semantic_kitti are processed. When I print out the contents of one of the npz files in the semantic_kitti dataset I get: ['pc1', 'pc2', 'sem_label_s', 'sem_label_t', 'inst_label_s', 'inst_label_t', 'mot_label_s', 'mot_label_t', 'pose_s', 'pose_t']. I wonder what each of these is and how they were obtained. Thanks!

    opened by Alt216 0
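
For reference (see the rotation-matrix comment above), the textbook Kabsch solution for the rotation aligning two centered point sets transposes only U. A minimal PyTorch sketch, not a verbatim copy of the repository code:

import math
import torch

def kabsch_rotation(p, q):
    # Rotation R minimizing sum ||R p_i - q_i||^2 for centered sets (N, 3).
    # H = P^T Q; with H = U S V^T, R = V diag(1, 1, det(V U^T)) U^T.
    h = p.transpose(0, 1) @ q
    u, s, v = torch.svd(h)  # torch.svd returns V, not V^T
    d = torch.det(v @ u.transpose(0, 1))  # guard against reflections
    diag = torch.diag(torch.tensor([1.0, 1.0, d.item()]))
    return v @ diag @ u.transpose(0, 1)

# Quick check: recover a known rotation about the z-axis.
c, s = math.cos(0.3), math.sin(0.3)
R_true = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
p = torch.randn(100, 3)
q = p @ R_true.transpose(0, 1)  # q_i = R_true @ p_i
print(torch.allclose(kabsch_rotation(p, q), R_true, atol=1e-4))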