FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

Related tags

Deep Learning FADNet-PP

Overview

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

This repository contains the code (in PyTorch) for the "FADNet++" paper.

Introduction
Usage
Results
Acknowledgement
Contacts

Introduction

We propose an efficient and accurate deep network for disparity estimation named FADNet with three main features:

It exploits efficient 2D based correlation layers with stacked blocks to preserve fast computation.
It combines the residual structures to make the deeper model easier to learn.
It contains multi-scale predictions so as to exploit a multi-scale weight scheduling training technique to improve the accuracy.

Usage

Dependencies

Python2.7
PyTorch(1.2.0+)
torchvision 0.2.0 (higher version may cause issues)
KITTI Stereo
Scene Flow

Package Installation

Execute "sh compile.sh" to compile libraries needed by GANet.
Enter "layers_package" and execute "sh install.sh" to install customized layers, including Channel Normalization layer and Resample layer.

We also release the docker version of this project, which has been configured completely and can be used directly. Please refer to this website for the image.

Usage of Scene Flow dataset
Download RGB cleanpass images and its disparity for three subset: FlyingThings3D, Driving, and Monkaa. Organize them as follows:
- FlyingThings3D_release/frames_cleanpass
- FlyingThings3D_release/disparity
- driving_release/frames_cleanpass
- driving_release/disparity
- monkaa_release/frames_cleanpass
- monkaa_release/disparity
Put them in the data/ folder (or soft link). The *train.sh* defaultly locates the data root path as data/.

Train

We use template scripts to configure the training task, which are stored in exp_configs. One sample "fadnet.conf" is as follows:

net=fadnet
loss=loss_configs/fadnet_sceneflow.json
outf_model=models/${net}-sceneflow
logf=logs/${net}-sceneflow.log

lr=1e-4
devices=0,1,2,3
dataset=sceneflow
trainlist=lists/SceneFlow.list
vallist=lists/FlyingThings3D_release_TEST.list
startR=0
startE=0
batchSize=16
maxdisp=-1
model=none
#model=fadnet_sceneflow.pth

Parameter	Description	Options
net	network architecture name	dispnets, dispnetc, dispnetcss, fadnet, psmnet, ganet
loss	loss weight scheduling configuration file	depends on the training scheme
outf_model	folder name to store the model files	\
logf	log file name	\
lr	initial learning rate	\
devices	GPU device IDs to use	depends on the hardware system
dataset	dataset name to train	sceneflow
train(val)list	sample lists for training/validation	\
startR	the round index to start training (for restarting training from the checkpoint)	\
startE	the epoch index to start training (for restarting training from the checkpoint)	\
batchSize	the number of samples per batch	\
maxdisp	the maximum disparity that the model tries to predict	\
model	the model file path of the checkpoint	\

We have integrated PSMNet and GANet for comparison. The sample configuration files are also given.

To start training, use the following command, dnn=CONFIG_FILE sh train.sh, such as:

dnn=fadnet sh train.sh

You do not need the suffix for CONFIG_FILE.

Evaluation

We have two modes for performance evaluation, test and detect, respectively. test requires that the testing samples should have ground truth of disparity and then reports the average End-point-error (EPE). detect does not require any ground truth for EPE computation. However, detect stores the disparity maps for each sample in the given list.

For the test mode, one can revise test.sh and run sh test.sh. The contents of test.sh are as follows:

net=fadnet
maxdisp=-1
dataset=sceneflow
trainlist=lists/SceneFlow.list
vallist=lists/FlyingThings3D_release_TEST.list

loss=loss_configs/test.json
outf_model=models/test/
logf=logs/${net}_test_on_${dataset}.log

lr=1e-4
devices=0,1,2,3
startR=0
startE=0
batchSize=8
model=models/fadnet.pth
python main.py --cuda --net $net --loss $loss --lr $lr \
               --outf $outf_model --logFile $logf \
               --devices $devices --batch_size $batchSize \
               --trainlist $trainlist --vallist $vallist \
               --dataset $dataset --maxdisp $maxdisp \
               --startRound $startR --startEpoch $startE \
               --model $model

Most of the parameters in test.sh are similar to training. However, you can just ignore parameters, including trainlist, loss, outf_model, since they are not used in the test mode.

For the detect mode, one can revise detect.sh and run sh detect.sh. The contents of detect.sh are as follows:

net=fadnet
dataset=sceneflow

model=models/fadnet.pth
outf=detect_results/${net}-${dataset}/

filelist=lists/FlyingThings3D_release_TEST.list
filepath=data

CUDA_VISIBLE_DEVICES=0 python detecter.py --model $model --rp $outf --filelist $filelist --filepath $filepath --devices 0 --net ${net}

You can revise the value of outf to change the folder that stores the predicted disparity maps.

Finetuning on KITTI datasets and result submission

We re-use the codes in PSMNet to finetune the pretrained models on KITTI datasets and generate disparity maps for submission. Use finetune.sh and submission.sh to do them respectively.

Pretrained Model

Update: 2020/2/6 We released the pre-trained Scene Flow model.

KITTI 2015	Scene Flow	KITTI 2012
/	Google Drive	/

Results

Results on Scene Flow dataset

Model	EPE	GPU Memory during inference (GB)	Runtime (ms) on Tesla V100
FADNet	0.83	3.87	48.1
DispNetC	1.68	1.62	18.7
PSMNet	1.09	13.99	399.3
GANet	0.84	29.1	2251.1

Citation

If you find the code and paper is useful in your work, please cite our conference paper

@inproceedings{wang2020fadnet,
  title={{FADNet}: A Fast and Accurate Network for Disparity Estimation},
  author={Wang, Qiang and Shi, Shaohuai and Zheng, Shizhen and Zhao, Kaiyong and Chu, Xiaowen},
  booktitle={2020 {IEEE} International Conference on Robotics and Automation ({ICRA} 2020)},
  pages={101--107},
  year={2020}
}

Acknowledgement

We acknowledge the following repositories and papers since our project has used some codes of them.

Contacts

[email protected]

Any discussions or concerns are welcomed!

You might also like...

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Build Type Linux MacOS Windows Build Status OpenPose has represented the first real-time multi-person system to jointly detect human body, hand, facia

25.7k Jan 9, 2023

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022

Demo for Real-time RGBD-based Extended Body Pose Estimation paper

Real-time RGBD-based Extended Body Pose Estimation This repository is a real-time demo for our paper that was published at WACV 2021 conference The ou

118 Dec 26, 2022

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation Ported from https://github.com/hzwer/arXiv2020-RIFE Dependencies NumPy

49 Jan 7, 2023

This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" Introduction This repo is official PyTorch implementatio

203 Jan 5, 2023

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

3k Jan 4, 2023

Real-time pose estimation accelerated with NVIDIA TensorRT

trt_pose Want to detect hand poses? Check out the new trt_pose_hand project for real-time hand pose and gesture recognition! trt_pose is aimed at enab

803 Jan 6, 2023

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation YouTube | BiliBili 16X interpolation results from two input images: Introd

28 Dec 9, 2022

An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

简介通过PaddlePaddle框架复现了论文 Real-time Convolutional Neural Networks for Emotion and Gender Classification 中提出的两个模型，分别是SimpleCNN和MiniXception。利用 imdb_crop

8 Mar 11, 2022

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

Related tags

Overview

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

Contents

Introduction

Usage

Dependencies

Package Installation

Train

Evaluation

Finetuning on KITTI datasets and result submission

Pretrained Model

Results

Results on Scene Flow dataset

Citation

Acknowledgement

Contacts

You might also like...

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Demo for Real-time RGBD-based Extended Body Pose Estimation paper

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Real-time pose estimation accelerated with NVIDIA TensorRT

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

Owner

HKBU High Performance Machine Learning Lab

Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020

Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

Face Library is an open source package for accurate and real-time face detection and recognition

Real-Time-Student-Attendence-System - Real Time Student Attendence System

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

A lightweight deep network for fast and accurate optical flow estimation.

VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.