Self-Supervised Multi-Frame Monocular Scene Flow
3D visualization of estimated depth and scene flow (overlayed with input image) from temporally consecutive images.
Trained on KITTI in a self-supervised manner, and tested on DAVIS.
This repository is the official PyTorch implementation of the paper:
Self-Supervised Multi-Frame Monocular Scene Flow
Junhwa Hur and Stefan Roth
CVPR, 2021
Arxiv
- Contact: junhwa.hur[at]gmail.com
Installation
The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1 and CUDA 10.1 (Different Pytorch + CUDA version is also compatible).
Please run the provided conda environment setup file:
conda env create -f environment.yml
conda activate multi-mono-sf
(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):
./install_correlation.sh
After installing it, turn on this flag --correlation_cuda_enabled=True
in training/evaluation script files.
Dataset
Please download the following to datasets for the experiment:
- KITTI Raw Data (synced+rectified data, please refer MonoDepth2 for downloading all data more conveniently.)
- merge KITTI Scene Flow 2015 and Multi-view extension in the same folder.
To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:
find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'
We also converted images in KITTI Scene Flow 2015 as well. Please convert the png images in image_2
and image_3
into jpg and save them into the seperate folder image_2_jpg
and image_3_jpg
.
To save space further, you can delete the velodyne point data in KITTI raw data as we don't need it.
Training and Inference
The scripts folder contains training/inference scripts.
For self-supervised training, you can simply run the following script files:
Script | Training | Dataset |
---|---|---|
./train_selfsup.sh |
Self-supervised | KITTI Split |
Fine-tuning is done with two stages: (i) first finding the stopping point using train/valid split, and then (ii) fune-tuning using all data with the found iteration steps.
Script | Training | Dataset |
---|---|---|
./ft_1st_stage.sh |
Semi-supervised finetuning | KITTI raw + KITTI 2015 |
./ft_2nd_stage.sh |
Semi-supervised finetuning | KITTI raw + KITTI 2015 |
In the script files, please configure these following PATHs for experiments:
DATA_HOME
: the directory where the training or test is located in your local system.EXPERIMENTS_HOME
: your own experiment directory where checkpoints and log files will be saved.
To test pretrained models, you can simply run the following script files:
Script | Training | Dataset |
---|---|---|
./eval_selfsup_train.sh |
self-supervised | KITTI 2015 Train |
./eval_ft_test.sh |
fine-tuned | KITTI 2015 Test |
./eval_davis.sh |
self-supervised | DAVIS (one scene) |
./eval_davis_all.sh |
self-supervised | DAVIS (all scenes) |
- To save visuailization of outputs, please turn on
--save_vis=True
in the script. - To save output images for KITTI Scene Flow 2015 Benchmark submission, please turn on
--save_out=True
in the script.
Pretrained Models
The checkpoints folder contains the checkpoints of the pretrained models.
Acknowledgement
Please cite our paper if you use our source code.
@inproceedings{Hur:2021:SSM,
Author = {Junhwa Hur and Stefan Roth},
Booktitle = {CVPR},
Title = {Self-Supervised Multi-Frame Monocular Scene Flow},
Year = {2021}
}
- Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast