CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching (CVPR 2021)

Overview

CFNet (CVPR 2021)

This is the implementation of the paper CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching, CVPR 2021, by Zhelun Shen, Yuchao Dai, and Zhibo Rao [arXiv].

Our method also obtained 1st place on the stereo task of the Robust Vision Challenge 2020.

The camera-ready version and supplementary materials can be found on the [CVPR official website].

Code has been released.

Abstract

Recently, the ever-increasing capacity of large-scale annotated datasets has led to profound progress in stereo matching. However, most of these successes are limited to a specific dataset and cannot generalize well to other datasets. The main difficulties lie in the large domain differences and unbalanced disparity distribution across a variety of datasets, which greatly limit the real-world applicability of current deep stereo matching models. In this paper, we propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network. First, we propose a fused cost volume representation to deal with the large domain difference. By fusing multiple low-resolution dense cost volumes to enlarge the receptive field, we can extract robust structural representations for initial disparity estimation. Second, we propose a cascade cost volume representation to alleviate the unbalanced disparity distribution. Specifically, we employ a variance-based uncertainty estimation to adaptively adjust the next-stage disparity search space, in this way driving the network to progressively prune out the space of unlikely correspondences. By iteratively narrowing down the disparity search space and improving the cost volume resolution, the disparity estimation is gradually refined in a coarse-to-fine manner. When trained on the same training images and evaluated on the KITTI, ETH3D, and Middlebury datasets with fixed model parameters and hyperparameters, our proposed method achieves state-of-the-art overall performance and obtains 1st place on the stereo task of Robust Vision Challenge 2020.
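The cascade stage hinges on this variance-based uncertainty. A minimal sketch of the idea, assuming a softmax probability volume over disparity hypotheses (the function and tensor names here are illustrative, not the repository's):

    import torch

    def disparity_and_uncertainty(prob, disp_hyp):
        # prob: [B, D, H, W] softmax weights over D disparity hypotheses
        # disp_hyp: [B, D, H, W] disparity value of each hypothesis
        disp = (prob * disp_hyp).sum(dim=1)  # soft-argmin expectation
        var = (prob * (disp_hyp - disp.unsqueeze(1)) ** 2).sum(dim=1)  # variance as uncertainty
        return disp, var

The next-stage search range is then centered on the estimated disparity and widened in proportion to this uncertainty, which is how the cascade prunes unlikely correspondences.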

How to use

Environment

  • Python 3.7.4
  • PyTorch == 1.1.0
  • NumPy == 1.15
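
A quick sanity check for the pinned versions (optional):

    import numpy
    import torch

    print(torch.__version__)  # expect 1.1.0
    print(numpy.__version__)  # expect 1.15.x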

Data Preparation

Download the Scene Flow datasets, KITTI 2012, KITTI 2015, ETH3D, and Middlebury.

KITTI 2015/2012 and Scene Flow

Please place the datasets as described in "./filenames", i.e., "./filenames/sceneflow_train.txt", "./filenames/sceneflow_test.txt", "./filenames/kitticombine.txt".
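
Each line of these list files follows the GWCNet-style layout that this repository appears to inherit: left image, right image, and (for training lists) ground-truth disparity, separated by spaces. The paths below are illustrative only:

    frames_finalpass/TRAIN/A/0000/left/0006.png frames_finalpass/TRAIN/A/0000/right/0006.png disparity/TRAIN/A/0000/left/0006.pfm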

Middlebury/ETH3D

Our folder structure is as follows:

dataset
├── KITTI2015
├── KITTI2012
├── Middlebury
│   └── Adirondack
│       ├── im0.png
│       ├── im1.png
│       └── disp0GT.pfm
└── ETH3D
    └── delivery_area_1l
        ├── im0.png
        ├── im1.png
        └── disp0GT.pfm

Note that we use the full-resolution images of Middlebury for training, as the additional training images do not have a half-resolution version. We down-sample the input images to half resolution during data augmentation. In contrast, we use the half-resolution images and the full-resolution disparity of Middlebury for testing.

Training

Scene Flow Datasets Pretraining

Run the script ./scripts/sceneflow.sh to pre-train on the Scene Flow datasets. Please update DATAPATH in the bash file to your training data path.

To reproduce our pre-training details, you may need to replace the Mish activation function with ReLU; samples are shown in ./models/relu/.
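
For reference, Mish(x) = x * tanh(softplus(x)); a minimal sketch of the module being swapped (the class below is illustrative, not the repository's exact implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Mish(nn.Module):
        # Mish(x) = x * tanh(softplus(x)); swap in nn.ReLU() instead to
        # repeat the first pre-training stage described above.
        def forward(self, x):
            return x * torch.tanh(F.softplus(x))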

Finetuning

Run the script ./scripts/robust.sh to jointly fine-tune the pre-trained model on four datasets, i.e., KITTI 2015, KITTI 2012, ETH3D, and Middlebury. Please update DATAPATH and --loadckpt to your training data path and the pre-trained Scene Flow checkpoint file.

Evaluation

Joint Generalization

Run the scripts ./scripts/eth3d_save.sh, ./scripts/mid_save.sh, and ./scripts/kitti15_save.sh to save PNG predictions on the test sets of the ETH3D, Middlebury, and KITTI 2015 datasets. Note that you may need to update the storage path in save_disp.py, i.e., fn = os.path.join("/home3/raozhibo/jack/shenzhelun/cfnet/pre_picture/", fn.split('/')[-2]).

Cross-domain Generalization

Run the script ./scripts/robust_test.sh to test the cross-domain generalization of the model (Table 3 of the main paper). Please update --loadckpt to the pre-trained Scene Flow checkpoint file.

Pretrained Models

Pre-training model: you can use this checkpoint to reproduce the results we reported in Table 3 of the main paper.

Fine-tuning model: you can use this checkpoint to reproduce the results we reported in the stereo task of the Robust Vision Challenge 2020.

Citation

If you find this code useful in your research, please cite:

@InProceedings{Shen_2021_CVPR,
    author    = {Shen, Zhelun and Dai, Yuchao and Rao, Zhibo},
    title     = {CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {13906-13915}
}

Acknowledgements

Thanks to the excellent works GWCNet, DeepPruner, and HSMNet. Our work is inspired by these works, and parts of our code are migrated from GWCNet, DeepPruner, and HSMNet.

Comments
  • Datasets download and placement can be clearer

    Sorry for bothering you. Your algorithm is very impressive and helpful. But I did not see clearly how to place the data set in "Data Preparation", can you make the download link more clear, or put the datasets you use in google_drive? Thanks a lot for your kind help!

    opened by yujunliuCV 8
  • the question about accuracy in scene flow datasets

    Hello, we tested your model on the Scene Flow dataset, but the EPE is only 0.97. Is there something wrong with my use of the code, or is the EPE just 0.97? Thank you very much.

    opened by cjd24-coder 3
  • About the three stage strategy

    In your paper, you mentioned that "we switch the activation function to Mish and prolong the pre-training process in the SceneFlow dataset for another 15 epochs". So, should the learning rate change during those additional 15 epochs?

    opened by killwy 2
  • When will you release the code?

    Hello, thank you very much for your excellent work. May I ask when you will open-source the code? We need to conduct ablation experiments on your work to verify our method's versatility on an excellent work like yours.

    opened by cjd24-coder 2
  • Problem about pretrained models

    Hello, I heard that there was a problem with this code before. I would like to ask whether the two pretrained models on the web page have been updated now.

    opened by richardlzx 1
  • Evaluation on Middlebury dataset

    Hi,

    Thank you for the fantastic work. I am just wondering if the results reported for Middlebury in Table 3 cover the non-occluded or the occluded regions?

    Thank you.

    opened by WeiQin-C 1
  •  Some errors in the code

    For cfnet.py, first error:

    def generate_search_range(self, sample_count, input_min_disparity, input_max_disparity):
        """
        Description: Generates the disparity search range.
        Returns:
            :min_disparity: Lower bound of the disparity search range.
            :max_disparity: Upper bound of the disparity search range.
        """

        min_disparity = torch.clamp(input_min_disparity - torch.clamp((
                sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp)
        max_disparity = torch.clamp(input_max_disparity + torch.clamp(
                sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp)
    
        return min_disparity, max_disparity
    

    It should be:

        min_disparity = torch.clamp(input_min_disparity - torch.clamp(
                (sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp // 4 - 1)
        max_disparity = torch.clamp(input_max_disparity + torch.clamp(
                sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp // 4 - 1)

    or the same two expressions with max=self.maxdisp // 2 - 1, depending on the stage.

    Second error: in line 643 of cfnet.py, it should be "predmid_s2 = F.upsample(predmid_s2 * 2, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)", not "predmid_s2 = F.upsample(predmid_s2 * 4, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)", presumably because the s2 prediction is at half resolution, so its disparity values should be scaled by 2 when upsampled to full size.

    opened by gangweiX 1
  • About HITNet

    Your paper is great and efficient. I want to make further improvements on the basis of your work. But I have a doubt: in your paper, HITNet's inference time is 0.015 s, but I didn't find official code for HITNet. How can I test its inference time on my GPUs? Can you give some guidance? Thanks!

    opened by gangweiX 1
  • about Table 3 in the paper

    Hello, thanks for the good work. Regarding the cross-domain generalization evaluation of PSMNet in Table 3: the KITTI 2015 D1_all of PSMNet trained on Scene Flow is reported as 16.3, while we got 28.7, which is far from the value in your paper. The pre-trained model from GitHub also performs around 28. We are wondering about the reason for this. Thanks.

    opened by jiaw-z 1
  • Error when robust_test

    I want to evaluate the accuracy of self-trained checkpoints on KITTI by running robust_test.py, but I get the following error:

        loading model /home/rc/20220410StereoMatching/CFNet/checkpoints/sceneflow/pretrained/checkpoint_000009.ckpt
        start at epoch 0
        downscale epochs: [300], downscale rate: 10.0
        setting learning rate to 0.001
        Traceback (most recent call last):
          File "robust_test.py", line 335, in <module>
            train()
          File "robust_test.py", line 163, in train
            loss, scalar_outputs, image_outputs = test_sample(sample, compute_metrics=do_summary)
          File "/home/rc/20220410StereoMatching/CFNet/utils/experiment.py", line 30, in wrapper
            ret = func(*f_args, **f_kwargs)
          File "robust_test.py", line 280, in test_sample
            imgL, imgR, disp_gt = sample['left'], sample['right'], sample['disparity']
        KeyError: 'disparity'

    opened by rebecca0011 0
  • RuntimeError: Legacy autograd function with non-static forward method is deprecated

    During training or robust fine-tuning, there is a problem like this: RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) Does anybody know how to solve it?
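
    For reference, a generic illustration of the new-style torch.autograd.Function that the error message asks for (a plain ReLU here, not CFNet's actual operator):

        import torch

        class NewStyleReLU(torch.autograd.Function):
            @staticmethod
            def forward(ctx, x):
                ctx.save_for_backward(x)
                return x.clamp(min=0)

            @staticmethod
            def backward(ctx, grad_output):
                (x,) = ctx.saved_tensors
                return grad_output * (x > 0).type_as(grad_output)

        # usage: y = NewStyleReLU.apply(x), instead of calling an instance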

    opened by HamedRK89 0
  • Problem to export model from PyTorch to TensorFlow

    Hi, Thank you for your work, I find it really useful and I am trying to embed it for a test in a real-time environment. In order to do that, I want to export the model to TensorFlowLite, so that I could do small changes (like quantization) which is more efficient than doing them with PyTorch. To export the model to TFlite, I first exported it to ONNX and now I'm trying to export it from ONNX to TF with onnx-tf library. I am using opset_version=11, the lowest version compatible with all the PyTorch operations in CFNet.

    However, I faced many problems in my journey. First I had a dimension problem with the conversion to ONNX, so I decided to use a fixed input size for the images (w×h = 512×768). I tested the results with the Middlebury SDK (I resized the input images, ran my ONNX model, and then resized the disparity maps I get), and these results are quite good.

    Then to export to TF, I first had an issue with an unsupported operation:

    RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.

    I tried to add "align_corners=True" inside the upsample functions in the model code, and it solved the problem.

    Right now I am facing another issue, but I didn't find any way to solve it. Here are the logs:

    Traceback (most recent call last):
      File "/path/scripts/../export_TF.py", line 17, in <module>
        tf_rep.export_graph("%s/%s.pb" % (args.outdir, input_model_name))
      File "/path/onnx-tensorflow/onnx_tf/backend_rep.py", line 143, in export_graph
        signatures=self.tf_module.__call__.get_concrete_function(
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
        concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
        self._initialize(args, kwargs, add_initializers_to=initializers)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 785, in _initialize
        self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
        graph_function, _ = self._maybe_define_function(args, kwargs)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
        graph_function = self._create_graph_function(args, kwargs)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3130, in _create_graph_function
        func_graph_module.func_graph_from_py_func(
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
        out = weak_wrapped_fn().__wrapped__(*args, **kwds)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3831, in bound_method_wrapper
        return wrapped_fn(*args, **kwargs)
      File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
        raise e.ag_error_metadata.to_exception(e)
    ValueError: in user code:

        File "/path/onnx-tensorflow/onnx_tf/backend_tf_module.py", line 99, in __call__  *
            output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
        File "/path/onnx-tensorflow/onnx_tf/backend.py", line 347, in _onnx_node_to_tensorflow_op  *
            return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
        File "/path/onnx-tensorflow/onnx_tf/handlers/handler.py", line 58, in handle  *
            cls.args_check(node, **kwargs)
        File "/path/onnx-tensorflow/onnx_tf/handlers/backend/resize.py", line 68, in args_check  *
            x_shape = x.get_shape().as_list()

        ValueError: as_list() is not defined on an unknown TensorShape.

    Using netron.app, I found that node 10454 seemed to have a dimension problem (it corresponds to the upsample operation at line 660 of cfnet.py), so I tried to hardcode all the dimensions with my input size:

    pred1_s2 = F.upsample(pred1_s2 * 2, [512, 768], mode='bilinear', align_corners=True)

    but it didn't resolve my problem at all, and I really don't have any idea how to solve it. My TF version is 2.8.0.

    Did you already try (and succeed) to export the model to TensorFlow, and if so, how did you do it? If not, do you have any idea how I could solve this problem?

    Thank you.

    opened by BigNicoG 0
  • how to obtain the same performance as the given pretrained model

    I tried to train the CFNet model using the code from GitHub, replacing the Mish activation function with ReLU for the first 20 epochs and then switching back to Mish for another 15 epochs, just as the paper described. But the performance of the trained model is far from that of the pretrained model provided in the repository. So what's wrong with my training? Is there any parameter that should be modified? I used ./scripts/sceneflow.sh on two V100 GPUs.

    opened by xzy-yqm 9
  • preprocess for predicting Custom dataset

    Hi. I am thinking of applying your method to my own custom dataset. So, I added the following code to save_disp.py's main with reference to datasets/sceneflow_dataset.py.

    # test one sample
    # @make_nograd_func
    # def test_sample(sample):
    #     model.eval()
    #     disp_ests, pred1_s3_up, pred2_s4 = model(sample['left'].cuda(), sample['right'].cuda())
    #     return disp_ests[-1]
    @make_nograd_func
    def test_sample(left, right):
        model.eval()
        disp_ests, pred1_s3_up, pred2_s4 = model(left.cuda(), right.cuda())
        return disp_ests[-1]
    
    
    if __name__ == '__main__':
        left_img = Image.open("/media/A/left/0.png").convert("RGB")
        right_img = Image.open("/media/A/right/0.png").convert("RGB")
    
        w, h = left_img.size
        crop_w, crop_h = 950, 512
        left_img = left_img.crop((w-crop_w, h-crop_h, w, h))
        right_img = right_img.crop((w-crop_w, h-crop_h, w, h))
    
        processed = get_transform()
        left_img = processed(left_img)
        right_img = processed(right_img)
        test_sample(left_img, right_img)
    
    

    Then I get the following error.

    Mish activation loaded...
    Mish activation loaded...
    Mish activation loaded...
    Mish activation loaded...
    Mish activation loaded...
      File "/home/ubuntu/Apps/CFNet/models/cfnet.py", line 136, in forward
        x = self.firstconv(x)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 3, 3], but got 3-dimensional input of size [3, 512, 950] instead
    

    Probably this is due to wrong input to the preprocessing network.
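
    The traceback suggests the transformed tensors are missing the batch dimension that conv2d expects ([B, 3, H, W]). A minimal fix sketch, assuming get_transform returns a [3, H, W] tensor:

        # add a batch dimension: [3, H, W] -> [1, 3, H, W]
        left_img = processed(left_img).unsqueeze(0)
        right_img = processed(right_img).unsqueeze(0)
        test_sample(left_img, right_img)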

    How can I generate a disparity image with a custom dataset?

    opened by jahad9819jjj 2