Deep Two-View Structure-from-Motion Revisited

Overview

This repository provides the code for our CVPR 2021 paper Deep Two-View Structure-from-Motion Revisited.

We provide functions for training, validation, and visualization.

Note: some config flags are designed for ablation studies, and we plan to reorganize the code later. Please feel free to open an issue if any part is confusing.

Requirements

Python = 3.6.x
PyTorch >= 1.6.0
CUDA >= 10.1

The remaining dependencies can be installed with

pip install -r requirements.txt

PyTorch versions from 1.1.0 to 1.6.0 should also work, but they disable mixed-precision training, and we have not tested them.
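
The mixed precision in question is PyTorch's native AMP (torch.cuda.amp), which first shipped in 1.6. A minimal, generic sketch of such a training step (not this repository's training loop; model, optimizer, and loader are hypothetical placeholders):

import torch

scaler = torch.cuda.amp.GradScaler()
for images, targets in loader:           # hypothetical data loader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass runs in mixed precision
        loss = model(images, targets)    # hypothetical model returning a loss
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)               # unscales gradients, then steps
    scaler.update()                      # adjust the loss scale for the next step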

To use the RANSAC five-point algorithm, you also need to

cd RANSAC_FiveP

python setup.py install --user

The CUDA extension will be installed as essential_matrix. Tested on Ubuntu with CUDA 10.1.
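
A quick sanity check that the build succeeded (a sketch; torch is imported first since the extension links against PyTorch):

import torch             # load PyTorch first; the extension links against it
import essential_matrix  # should import without error after installation
print("essential_matrix extension loaded")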

Models

Pretrained models are provided here.

KITTI Depth

To reproduce our results, please first download the KITTI RAW data and the 14 GB official depth maps. You should also download the split files provided by us and unzip them into the root of the KITTI raw data. Then, set gt_depth_dir (KITTI_loader.py, L278) to the path of the official KITTI depth maps.
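
For example (the path is a placeholder for your own download location):

# KITTI_loader.py, around L278: point gt_depth_dir at the official depth maps
gt_depth_dir = '/PATH/TO/KITTI/OFFICIAL_DEPTH_MAPS/'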

For training,

python main.py -b 32 --lr 0.0005 --nlabel 128 --fix_flownet \
--data PATH/TO/YOUR/KITTI/DATASET --cfg cfgs/kitti.yml \
--pretrained-depth depth_init.pth.tar --pretrained-flow flow_init.pth.tar

For evaluation,

python main.py -v -b 1 -p 1 --nlabel 128 \
--data PATH/TO/YOUR/KITTI/DATASET --cfg cfgs/kitti.yml \
--pretrained kitti.pth.tar

The default evaluation split is Eigen, where the metric abs_rel should be around 0.053 and rmse should be close to 2.22. If you would like to use the Eigen SfM split, please set cfg.EIGEN_SFM = True and cfg.KITTI_697 = False.
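
For example (a sketch, assuming cfg is the config object loaded from cfgs/kitti.yml; the flag names are as above):

cfg.EIGEN_SFM = True    # evaluate on the Eigen SfM split
cfg.KITTI_697 = False   # disable the default 697-image Eigen test split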

KITTI Pose

For a fair comparison, we use the KITTI odometry evaluation toolbox provided here. Please generate poses sequence by sequence and evaluate the results accordingly.
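
The toolbox expects absolute poses per sequence, so the predicted two-view relative poses must be chained. A minimal, generic sketch (not the repository's code; the composition order depends on the pose convention used):

import numpy as np

def accumulate_poses(rel_poses):
    # rel_poses: list of 4x4 transforms, each the pose of frame t+1 in frame t.
    abs_poses = [np.eye(4)]                  # first camera at the world origin
    for T in rel_poses:
        abs_poses.append(abs_poses[-1] @ T)  # chain successive relative motions
    return abs_poses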

Acknowledgment:

We thank Shihao Jiang and Dylan Campbell for sharing their implementation of the GPU-accelerated RANSAC five-point algorithm. We really appreciate the valuable feedback from our area chairs and reviewers. We would like to thank Charles Loop for helpful discussions and Ke Chen for providing field test images from NVIDIA AV cars.

BibTex:

@inproceedings{wang2021deep,
  title={Deep Two-View Structure-from-Motion Revisited},
  author={Wang, Jianyuan and Zhong, Yiran and Dai, Yuchao and Birchfield, Stan and Zhang, Kaihao and Smolyanskiy, Nikolai and Li, Hongdong},
  booktitle={CVPR},
  year={2021}
}
Comments
  • About RAW data download

    Hi,

    First, thanks for the great work and the nice paper. I have one question about the dataset. The README mentions downloading the RAW data from KITTI, but I'm wondering which scene categories we need (e.g., city, campus, road, ... or all of them?).

    Maybe I missed this information in the paper, but which KITTI scenes are used for training? I only see that the Eigen split is used for depth evaluation, but I'm not sure which data is used for training.

    Thanks again!

    opened by HencyChen 4
  • About the usage of cfg.NORM_TARGET

    Hi, thanks for sharing the impressive work.

    After browsing the code, I am confused about the hyper-parameter NORM_TARGET. Could you kindly explain what this hyper-parameter does?

    Thanks.

    opened by xhchen10 3
  • 'essential_matrix' has no attribute 'initialise'

    I've installed the essential_matrix module as per the README file and am able to import it. However, I ran into "'essential_matrix' has no attribute 'initialise'".

    The installation process was successful. Could you provide some leads on how to solve this?

    Thank you

    opened by nivesh48 1
  • About KITTI gt depth

    Great work! I am confused about the gt_depth_dir.

    • Which ground-truth depth of KITTI should we use to reproduce the results (abs_rel around 0.053 and rmse close to 2.22)? The raw velodyne_points (.bin) in the KITTI RAW data, the projected velodyne_raw (.png), or the groundtruth (.png) in the 14 GB official depth maps? I guess it would be the raw velodyne_points (.bin)?
    • But then, what is the use of the 14 GB official depth maps?

    Looking forward to your reply! Thanks!

    opened by longyangqi 1
  • About using the median value to resolve scale ambiguity when evaluating

    Hi, thanks for sharing the impressive work.

    According to the code at Lines 576-585 of main.py, you use the ratio between the median values of the predicted and GT depth to scale the predicted depth (see the generic sketch after this comment). However, the predicted depth has already been scaled by the GT scale \alpha_gt (see Lines 536-541 in main.py). Hence, I am confused about why rescaling by the ratio of medians is necessary (the performance drops significantly without it).

    Could you kindly help me resolve this confusion? Thank you so much.

    opened by xhchen10 0
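
    For context, the median-scaling step being discussed is standard practice in depth evaluation; a generic sketch of it (not the code at the lines cited above):

    import numpy as np

    def median_scale(pred, gt, mask):
        # Rescale predicted depth so its median matches the ground-truth
        # median over the valid pixels in mask.
        scale = np.median(gt[mask]) / np.median(pred[mask])
        return pred * scale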
  • Missing dependencies in requirements.txt like minieigen

    Hello,

    Not all dependencies are listed in requirements.txt. Specifically, I have problems installing minieigen. How do you install the package, and which version do you use? sudo apt-get install python3-minieigen installs a version for Python 3.8. I found no way other than potentially building it from source. Is that right?

    opened by SvenDierfeld 3
  • An exception occurred while training: NaN or Inf found in the input tensor

    Hi, when using 2011_09_26/2011_09_26_drive_0002_sync for training, I found that loss_depth (line 392 of the main function) becomes NaN. I suspected the learning rate was too large, but even with --lr 0 the error below still appears. Please give me some advice if you know how to solve this, thanks. (screenshot attached)

    opened by wangxiyuan9 1
  • Evaluation of Pose

    Hello,

    I want to reproduce the results of your pose evaluation. In the process, I was confused by some problems, as below:

    1. Following issue #8, I first predicted rel_pose with your model, transformed the results into abs_pose using the VO evaluation code provided in your answer, and evaluated them with the KITTI odometry evaluation toolbox mentioned in README.md. But the results are not so good, close to those in the issue. I found in the code that the pose is computed by RANSAC with the default settings, which means just using flow matches to get the pose (is that right?). Is the default choice the reason the results are not good, or is the default enough to get the best results? I was wondering how to set the config to get the best results, as in Table 3 of your paper.

    2. As in Table 5, there are many options for computing the pose, but I don't know how to use them in your code. There are many flags in config.py, and some of them are dummies. Which flags should I set to True to use, for example, the best-performing method in Table 5, '5-point' + 'Flow matches' + 'SIFT Loc'?

    @jytime can you help me with them? Thank you very much!

    Best

    opened by CSTXBH 3
  • Running demo.py and evaluate.py causes the same errors

    Hello, thanks for the great work, @jytime.
    I followed the README, but running demo.py and evaluate.py in model.RAFT causes some issues (error screenshots attached). I already downloaded the Sintel dataset and the models, but when I try python evaluate.py --model=models/raft-things.pth --dataset=sintel --mixed_precision, it raises errors. I found that this is caused by line 116 of evaluate.py, "flow_low, flow_pr = model(image1, image2, iters=iters, test_mode=True)"; it seems only one image can be passed as a parameter, otherwise it raises the errors shown in the screenshot. But changing it also doesn't work, and I could not solve it. Could you give me some advice? Thanks very much. My environment: Ubuntu 20.04, CUDA 11.1 (required by the RTX 3060, otherwise it errors), torch==1.8.0; everything else satisfies README.md.

    opened by wangxiyuan9 2