Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Visual Inference Lab @TU Darmstadt

Last update: Dec 22, 2022

Overview

3D visualization of estimated depth and scene flow (overlaid on the input image) from temporally consecutive images.
Trained on KITTI in a self-supervised manner, and tested on DAVIS.

This repository is the official PyTorch implementation of the paper:

   Self-Supervised Multi-Frame Monocular Scene Flow
   Junhwa Hur and Stefan Roth
   CVPR, 2021
   Arxiv

Contact: junhwa.hur[at]gmail.com

Installation

The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1, and CUDA 10.1 (other PyTorch + CUDA versions should also be compatible).
Please run the provided conda environment setup file:

conda env create -f environment.yml
conda activate multi-mono-sf

(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):

./install_correlation.sh

After installing it, set the flag --correlation_cuda_enabled=True in the training/evaluation script files.

Dataset

Please download the following two datasets for the experiments:

KITTI Raw Data (synced+rectified data; please refer to MonoDepth2 for a more convenient way to download all of it.)
KITTI Scene Flow 2015 and its Multi-view extension (merge both into the same folder.)

To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:

find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'

We also converted the images in KITTI Scene Flow 2015. Please convert the png images in image_2 and image_3 into jpg and save them into the separate folders image_2_jpg and image_3_jpg.
To save space further, you can delete the velodyne point data in KITTI raw data as we don't need it.
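
The KITTI Scene Flow 2015 conversion described above can be sketched in Python as follows. This is a minimal sketch, not the authors' script: the function names, the `split_dir` argument (for example a training folder that contains image_2 and image_3), the use of Pillow, and the JPEG quality of 92 are all assumptions made for illustration.

```python
from pathlib import Path


def plan_jpg_paths(split_dir):
    """Map each PNG in image_2 / image_3 to its target in image_2_jpg / image_3_jpg."""
    split_dir = Path(split_dir)
    pairs = []
    for cam in ("image_2", "image_3"):
        src_dir = split_dir / cam
        dst_dir = split_dir / f"{cam}_jpg"
        for src in sorted(src_dir.glob("*.png")):
            pairs.append((src, dst_dir / (src.stem + ".jpg")))
    return pairs


def convert_all(split_dir, quality=92):
    """Write the JPEG copies; the original PNG files are left untouched."""
    # Assumption: Pillow is installed. It is imported here so that
    # plan_jpg_paths() can be used for a dry run without it.
    from PIL import Image

    for src, dst in plan_jpg_paths(split_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        Image.open(src).convert("RGB").save(dst, quality=quality)
```

Calling `plan_jpg_paths` first gives a dry run of which files would be written, which is a cheap way to confirm the folder layout before converting anything.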

Training and Inference

The scripts folder contains training/inference scripts.

For self-supervised training, you can simply run the following script files:

Script	Training	Dataset
`./train_selfsup.sh`	Self-supervised	KITTI Split

Fine-tuning is done in two stages: (i) first finding the stopping point using a train/valid split, and then (ii) fine-tuning on all data for the number of iterations found in (i).

Script	Training	Dataset
`./ft_1st_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
`./ft_2nd_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
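
The logic of the two-stage procedure can be illustrated with a small Python sketch. How the provided scripts actually record and select the stopping point is not stated here, so the function name, the "lowest validation error" criterion, and the numbers below are hypothetical.

```python
def find_stopping_point(val_errors):
    """Return the iteration with the lowest validation error (stage i)."""
    if not val_errors:
        raise ValueError("no validation results recorded")
    # Iterating over sorted keys means ties go to the earlier iteration,
    # i.e. the shorter training run.
    return min(sorted(val_errors), key=val_errors.get)


# Hypothetical validation log: iteration -> error on the held-out split.
val_errors = {10000: 0.412, 20000: 0.371, 30000: 0.366, 40000: 0.380}
stop_at = find_stopping_point(val_errors)
print(f"stage (ii): fine-tune on all data for {stop_at} iterations")
```

The second stage then has no validation split to monitor, which is why the iteration count has to be fixed beforehand.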

In the script files, please configure the following paths for your experiments:

DATA_HOME : the directory where the training or test data is located on your local system.
EXPERIMENTS_HOME : your own experiment directory where checkpoints and log files will be saved.
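
Before launching a script, it can help to sanity-check the two paths, since a path that is still placeholder text or that contains spaces may be split into several command-line arguments. The helper below is a hypothetical sketch and is not part of the repository.

```python
import os


def check_paths(data_home, experiments_home):
    """Return a list of problems with the two configured paths (empty if fine)."""
    problems = []
    if not os.path.isdir(data_home):
        problems.append(f"DATA_HOME is not an existing directory: {data_home!r}")
    # EXPERIMENTS_HOME may not exist yet, but its parent should.
    parent = os.path.dirname(os.path.abspath(experiments_home))
    if not os.path.isdir(experiments_home) and not os.path.isdir(parent):
        problems.append(f"EXPERIMENTS_HOME cannot be created under: {parent!r}")
    for name, value in (("DATA_HOME", data_home),
                        ("EXPERIMENTS_HOME", experiments_home)):
        if any(ch.isspace() for ch in value):
            problems.append(f"{name} contains whitespace; quote it in the script")
    return problems
```

An empty list means both values look usable; any returned message points at the variable to fix in the script file.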

To test pretrained models, you can simply run the following script files:

Script	Training	Dataset
`./eval_selfsup_train.sh`	self-supervised	KITTI 2015 Train
`./eval_ft_test.sh`	fine-tuned	KITTI 2015 Test
`./eval_davis.sh`	self-supervised	DAVIS (one scene)
`./eval_davis_all.sh`	self-supervised	DAVIS (all scenes)

To save visualizations of the outputs, please turn on --save_vis=True in the script.
To save output images for KITTI Scene Flow 2015 Benchmark submission, please turn on --save_out=True in the script.

Pretrained Models

The checkpoints folder contains the checkpoints of the pretrained models.

Acknowledgement

Please cite our paper if you use our source code.

@inproceedings{Hur:2021:SSM,  
  Author = {Junhwa Hur and Stefan Roth},  
  Booktitle = {CVPR},  
  Title = {Self-Supervised Multi-Frame Monocular Scene Flow},  
  Year = {2021}  
}

Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast.

Comments

About 'dp_1 = self.sigmoid(dp_1) * 0.3'

Hi! Thanks for making your code open. Could you please explain why the factor 0.3 was chosen? Why not simply dp_1 = self.sigmoid(dp_1)?

https://github.com/visinf/multi-mono-sf/blob/8e71381635c7ad4981055301705e018a884a6085/models/model_monosceneflow.py#L122
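
What the scaling does numerically can be shown with a small sketch: a plain sigmoid lies in (0, 1), whereas the scaled version lies in (0, 0.3). A bound of this form is used in the earlier monodepth work of Godard et al., which caps the predicted disparity at 0.3 times the image width; whether that is the motivation for this particular line is an assumption and is not confirmed in this thread. The function name below is hypothetical.

```python
import math


def scaled_sigmoid(x, max_value=0.3):
    """Squash a raw network output into the open interval (0, max_value)."""
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        s = e / (1.0 + e)
    return max_value * s


for raw in (-5.0, 0.0, 5.0):
    print(f"raw={raw:+.1f} -> {scaled_sigmoid(raw):.4f}")
```

With `max_value=1.0` the same function reproduces the unscaled sigmoid, so the factor only changes the upper bound of the output, not its shape.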

opened by daydreamer2023 14

Scene flow of static background

When evaluating the pre-trained model on a dataset that has camera motion, I noticed that the model predicts large scene flow for background objects that are actually static. As far as I can tell, this is an unintended consequence. Is that correct? In theory at least, this yields invalid results, since non-zero scene flow vectors in world space would mean that the static objects are moving.

I discovered this on my custom dataset, but it can also be seen on the Davis camel scene, in which the camel is moving but the background is static (in world coordinates). Example input frame:

An example scene flow (visualisation) output is as follows: (The red and blue squares have been manually annotated by me for the following reason)

Inspecting the values at the red (corresponding to static objects) and blue (corresponding to dynamic objects) squares, we see that the scene flow is larger at the red than at the blue:

red region: max 0.16379195 , min: 0.024987139

blue region: max 0.053224254 , min: 5.828061e-07

(max and min are across all three sf dimensions)

My use of this model is contingent on its ability to predict the sf of static objects as 0. Is this a known issue, or am I missing something?

Thanks so much!
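
One possible reading of these numbers, assuming the predicted scene flow is expressed in the camera coordinate frame (as the KITTI Scene Flow ground truth is) rather than in world coordinates: a static point then has non-zero scene flow whenever the camera moves. The sketch below separates this camera-induced part from the residual object motion. `R` and `t` are hypothetical inputs that map point coordinates from the first camera frame to the second; they are not produced by this repository and would have to be estimated separately.

```python
def apply_rigid(R, t, P):
    """Transform a 3D point P by rotation R (3x3 nested lists) and translation t."""
    return [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]


def camera_induced_flow(R, t, P):
    """Apparent 3D motion of a static point, as seen from the moving camera."""
    moved = apply_rigid(R, t, P)
    return [moved[i] - P[i] for i in range(3)]


def residual_flow(scene_flow, R, t, P):
    """Scene flow that remains after removing the camera-induced part."""
    rigid = camera_induced_flow(R, t, P)
    return [scene_flow[i] - rigid[i] for i in range(3)]


# Hypothetical example: the camera moves 0.1 units forward along +z without
# rotating, so a static point appears to move 0.1 units towards the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -0.1]
P = [2.0, 1.0, 10.0]
print(camera_induced_flow(R, t, P))
print(residual_flow([0.0, 0.0, -0.1], R, t, P))
```

Under this assumption, only the residual would be expected to be close to zero for static objects, up to estimation error in the depth, scene flow, and camera motion.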
opened by rohaldb 9

Extremely large pts_loss output when setting cudatoolkit=11.2

Hi Junhwa, thanks a lot for the awesome work in the field of scene flow :) When I tried to use your loss in my environment (cudatoolkit=11.2), the pts_loss (s_3) became extremely large, for example, larger than 100000. But when I set the cudatoolkit version to 10.2, the pts_loss output became normal (usually less than 10). I cannot find the reason. Have you ever encountered this kind of situation? Again, thanks a lot.

opened by Yang-Hao-Lin 3
Question about 91875.68 in Eval_SceneFlow_KITTI_Train_Multi

Hi Junhwa. When looking into the class Eval_SceneFlow_KITTI_Train_Multi in your code, I found two odd constants: 91875.68 and 91873.4. They do not make sense to me. Why not use something like "gt_*_mask.sum()" here?

opened by Yang-Hao-Lin 2
Question about the depth metric

Your research (self-mono-sf and multi-mono-sf) is very interesting. Your depth metric does not apply median scaling, but another KITTI depth estimation model, Monodepth2, does.

Why is there a difference in the metric computation method?

Your code: self-mono-sf https://github.com/visinf/self-mono-sf/blob/054df08674a2df96885682e657ca4803c129a364/losses.py#L402 and multi-mono-sf https://github.com/visinf/multi-mono-sf/blob/main/losses.py#L513

Monodepth2 KITTI depth metric https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py#L206
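
For context, median scaling can be sketched as below. Monodepth2 applies it per image when evaluating monocular-trained models, whose depth is only defined up to an unknown scale, and disables it for stereo-trained models. A likely reason for the difference asked about here is that these scene flow models are trained with stereo pairs and therefore predict metric depth, but that is an assumption rather than a statement from the authors. The function names and values are hypothetical.

```python
from statistics import median


def median_scale(pred_depths, gt_depths):
    """Rescale predictions so that their median matches the ground-truth median."""
    ratio = median(gt_depths) / median(pred_depths)
    return [d * ratio for d in pred_depths], ratio


def abs_rel(pred_depths, gt_depths):
    """Mean absolute relative error, one of the standard KITTI depth metrics."""
    return sum(abs(p - g) / g for p, g in zip(pred_depths, gt_depths)) / len(gt_depths)


# Hypothetical values: predictions are correct up to a global scale of 0.5.
gt = [4.0, 8.0, 10.0, 20.0, 40.0]
pred = [0.5 * g for g in gt]
scaled, ratio = median_scale(pred, gt)
print(f"ratio={ratio:.2f}, abs_rel before={abs_rel(pred, gt):.2f}, "
      f"after={abs_rel(scaled, gt):.2f}")
```

Because the scaling uses the ground truth itself, it hides any global scale error, which is why numbers computed with and without it are not directly comparable.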

opened by Doyosae 2
bash train_selfsup.sh error

I cloned the code and even installed it.

When I run "bash train_selfsup.sh" in scripts, the following error appears: "unrecognized arguments: where checkpoints and log files will be saved)/MonoSceneFlow_Multi-selfsup-20210722-135602 Raw dataset path) Raw dataset path)" Is there any additional folder configuration I need to set up?

opened by Doyosae 1
Why do your training datasets need stereo images (left and right views of scenes)?

From Figure 1 in your paper:

I do not see where you use the stereo images, and your evaluation code does not use them (e.g., on the DAVIS dataset).

Can I do self-supervised training without stereo images? Many in-the-wild (open-world) datasets do not have them.

opened by TuringKi 1
./eval_ft_test.sh

Hello, after I modified the dataset path and ran ./eval_ft_test.sh, it reported an error: main.py: error: unrecognized arguments: --validation_dataset_preprocessing_crop=False. Why is this? I hope to get your answer.

opened by cuiyu330 3
