Official MegEngine implementation of CREStereo (CVPR 2022 Oral).

Overview

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

This repository contains MegEngine implementation of our paper:


Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
CVPR 2022

arXiv | BibTeX

Datasets

The Proposed Dataset

Download

There are two ways to download the dataset (~400 GB) proposed in our paper:

  • Download using the shell script dataset_download.sh
sh dataset_download.sh

The dataset will be downloaded and extracted to ./stereo_trainset/crestereo

  • Download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually.

Disparity Format

The disparity is saved as a uint16 .png, which can be loaded using OpenCV's imread function:

import cv2
import numpy as np

def get_disp(disp_path):
    # Values are stored as disparity * 32; divide to recover float disparities.
    disp = cv2.imread(disp_path, cv2.IMREAD_UNCHANGED)
    return disp.astype(np.float32) / 32
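Writing a disparity map back in this encoding is the inverse operation. A minimal sketch (encode_disp is an illustrative name, not a repo function; the factor of 32 and the uint16 range follow from the format above):

```python
import numpy as np

def encode_disp(disp):
    # Inverse of get_disp: scale by 32, then clip to the uint16 range.
    return np.clip(np.round(disp * 32.0), 0, 65535).astype(np.uint16)

def decode_disp(raw):
    # Same conversion as get_disp, minus the file I/O.
    return raw.astype(np.float32) / 32

# The round trip is exact at the format's 1/32-pixel resolution.
disp = np.array([[0.0, 1.5, 100.03125]], dtype=np.float32)
assert np.allclose(decode_disp(encode_disp(disp)), disp)
```

The encoded array can then be written as a 16-bit PNG with cv2.imwrite, matching the files in the training set.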

Other Public Datasets

Other public datasets we use include:

Dependencies

CUDA Version: 10.1, Python Version: 3.6.9

  • MegEngine v1.8.2
  • opencv-python v3.4.0
  • numpy v1.18.1
  • Pillow v8.4.0
  • tensorboardX v2.1
python3 -m pip install -r requirements.txt

We also provide a Docker image to run the code quickly:

docker run --gpus all -it -v /tmp:/tmp ylmegvii/crestereo
shotwell /tmp/disparity.png

Inference

Download the pretrained MegEngine model from here and run:

python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

Training

Modify the configurations in cfgs/train.yaml and run the following command:

python3 train.py

You can launch a TensorBoard to monitor the training process:

tensorboard --logdir ./train_log

and navigate to the page at http://localhost:6006 in your browser.

Acknowledgements

Part of the code is adapted from previous works:

We thank all the authors for their awesome repos.

Citation

If you find the code or datasets helpful in your research, please cite:

@misc{Li2022PracticalSM,
      title={Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation},
      author={Jiankun Li and Peisen Wang and Pengfei Xiong and Tao Cai and Ziwei Yan and Lei Yang and Jiangyu Liu and Haoqiang Fan and Shuaicheng Liu},
      year={2022},
      eprint={2203.11483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Is CUDA 11.6 supported?


    This is a really promising project, congratulations and thanks for releasing it!

    I'm trying to run the test script with your Eth3d model and this command: python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

    But the code hangs up and doesn't return from this line in extractor.py:82: self.conv2 = M.Conv2d(128, output_dim, kernel_size=1)

    which is called from load_model in test.py:15: model = Model(max_disp=256, mixed_precision=False, test_mode=True)

    My GPU is an NVIDIA RTX A6000 and the CUDA version on the system is v11.6.

    opened by hasnainv 14
  • Results on Holopix50k dataset


    Hello! Thank you for sharing the code and the model. I tested the pre-trained model on the Holopix50k test dataset, but didn't get results similar to those you showed in the paper. If I want to run the crestereo_eth3d.mge model on this dataset, does it require different parameter settings or pre-processing? How can I get similar results on the Holopix50k dataset? Any advice would be very helpful. Thank you in advance!

    opened by coffeehanjan 4
  • Did you obtain results on Holopix50k with published model?


    I've tried to run the published model with a few images from Holopix50k and got awful results. Can you please tell me how to obtain results similar to the paper? A different model? Different preprocessing?

    opened by shkarupa-alex 3
  • TypeError: pad() got an unexpected keyword argument 'pad_witdth' in test.py


    Good job! May I ask a question?

    I tried to run test.py on a V100 with CUDA version 10.2. The data is from ./img, and I set the size to 1280*720, the same as the original size. But I get the following error:

    File "CREStereo/nets/corr.py", line 42, in get_correlation
        (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate")
    TypeError: pad() got an unexpected keyword argument 'pad_witdth'

    It means that I may have used the wrong type, but I checked the code and did not find the problem. The signature and docstring of MegEngine's pad are:

        def pad(
            src: Tensor,
            pad_width: Tuple[Tuple[int, int], ...],
            mode: str = "constant",
            constant_value: float = 0.0,
        ) -> Tensor:
            r"""Pads the input tensor.

            Args:
                pad_width: A tuple. Each element in the tuple is a tuple of 2 elements
                    representing the padding size on both sides of the current dimension,
                    ``(front_offset, back_offset)``.
                mode: One of the following string values. Default: ``'constant'``

                    * ``'constant'``: Pads with a constant value.
                    * ``'reflect'``: Pads with the reflection of the tensor mirrored on
                      the first and last values of the tensor along each axis.
                    * ``'replicate'``: Pads with the edge values of the tensor.
                constant_value: Fill value for ``'constant'`` padding. Default: 0

            Examples:
                >>> import numpy as np
                >>> inp = Tensor([[1., 2., 3.],[4., 5., 6.]])
                >>> inp
                Tensor([[1. 2. 3.]
                 [4. 5. 6.]], device=xpux:0)
                >>> F.nn.pad(inp, pad_width=((1, 1),), mode="constant")
            """

    I passed the correct tuple type, but the error still occurs.
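    One version-agnostic workaround (a sketch with toy stand-ins, not MegEngine code: pad_old and pad_new below only mimic the two parameter spellings) is to pass the pad widths positionally, which works whether the parameter is named pad_width or the older misspelled pad_witdth:

```python
# Toy stand-ins for the two signatures; the real F.nn.pad operates on tensors.
def pad_old(src, pad_witdth, mode="constant"):   # misspelled keyword (older releases)
    return (src, tuple(pad_witdth), mode)

def pad_new(src, pad_width, mode="constant"):    # corrected keyword
    return (src, tuple(pad_width), mode)

widths = ((0, 0), (0, 0), (1, 1), (2, 2))
# A positional call is accepted by both signatures...
assert pad_old("feat", widths, "replicate") == pad_new("feat", widths, "replicate")
# ...whereas pad_new(src, pad_witdth=widths) raises the TypeError reported above.
```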

    opened by city19992 2
  • MegEngine 1.9.0 causes test.py error


    I have been playing around a bit with the code (thank you so much, by the way. Having heaps of fun with it) and found out that MegEngine 1.9.0 causes test.py to die with the following output:

    Images resized: 1024x1536
    Model Forwarding...
    Traceback (most recent call last):
      File "test.py", line 94, in <module>
        pred = inference(left_img, right_img, model_func, n_iter=20)
      File "test.py", line 45, in inference
        pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
      File "/usr/local/lib/python3.6/dist-packages/megengine/module/module.py", line 149, in __call__
        outputs = self.forward(*inputs, **kwargs)
      File "/home/dgxmartin/workspace/CREStereo/nets/crestereo.py", line 210, in forward
        align_corners=True,
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/vision.py", line 663, in interpolate
        [wscale, Tensor([0, 0], dtype="float32", device=inp.device)], axis=0
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/tensor.py", line 405, in concat
        (result,) = apply(builtin.Concat(axis=axis, comp_node=device.to_c()), *inps)
    TypeError: py_apply expects tensor as inputs
    

    For the time being, the MegEngine version should be pinned to exactly 1.8.2.
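    Until the requirement is updated, a small guard like this (hypothetical, not in the repo; check_megengine_version is an illustrative name) would fail fast with a readable message instead of the cryptic py_apply error:

```python
import re

def check_megengine_version(version):
    # Reject the known-bad 1.9.0 release; 1.8.2 is known to work with test.py.
    nums = tuple(int(x) for x in re.findall(r"\d+", version)[:3])
    if nums == (1, 9, 0):
        raise RuntimeError("MegEngine 1.9.0 breaks test.py; install 1.8.2 instead")

check_megengine_version("1.8.2")  # passes silently
```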

    opened by MartinPeris 1
  • What datasets are used for pretraining?


    The pretrained model works amazingly well on the real-life photos! What datasets are used for pretraining? Can you please provide the training details of the pretrained model? Thanks!

    opened by DY-ATL 1
  • Update requirements.txt to MegEngine v1.9.1


    function.Pad may produce weird NaN values in MegEngine v1.8.2. MegEngine v1.9.0 resolves this but brings more problems, as pointed out in https://github.com/megvii-research/CREStereo/pull/14 .

    The most recent release, v1.9.1, resolves all of these problems; this PR updates the MegEngine version constraint to v1.9.1 or later.

    opened by xxr3376 0
  • Model initialization takes a long time


    I'm running python test.py. In load_model(), model = Model(max_disp=256, mixed_precision=False, test_mode=True) takes a long time, about 30 minutes. My machine: 10900K CPU, RTX 3090, 32 GB RAM.

    top info:

        PID  USER PR NI VIRT   RES    SHR    S %CPU  %MEM TIME+   COMMAND
        8825 xwl  20 0  9.929g 1.122g 250360 S 100.3 3.6  8:01.38 interpreter

    opened by md5xwl 0
  • RuntimeError: bad input shape for polyadic operator


    Has anyone else run into this problem? It happened when I tried to train. I suspect the dataset configuration is wrong; does anyone know how to fix it? Environment: Windows 10, MegEngine (mge) 1.8.2.

    err: failed to load cuda func: cuDeviceGetNvSciSyncAttributes
    2022/08/04 11:50:55 Use 1 GPU(s)
    2022/08/04 11:50:55 Params: 5432948
    2022/08/04 11:50:55 Dataset size: 5000
    Traceback (most recent call last):
      File "c:/CREStereo-master/train.py", line 309, in <module>
        run(args)
      File "c:/CREStereo-master/train.py", line 207, in main
        flow_predictions = model(left, right)
      File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\module\module.py", line 149, in __call__
        outputs = self.forward(*inputs, **kwargs)
      File "c:\CREStereo-master\nets\crestereo.py", line 263, in forward
        out_corrs = corr_fn(flow, None, small_patch=small_patch, iter_mode=True)
      File "c:\CREStereo-master\nets\corr.py", line 25, in __call__
        corr = self.corr_iter(self.fmap1, self.fmap2, flow, small_patch)
      File "c:\CREStereo-master\nets\corr.py", line 72, in corr_iter
        corr = self.get_correlation(
      File "c:\CREStereo-master\nets\corr.py", line 48, in get_correlation
        corr_mean = F.mean(left_feature * right_slid, axis=1, keepdims=True)
      File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 176, in f
        return _elwise(self, value, mode=mode)
      File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 73, in _elwise
        return _elwise_apply(args, mode)
      File "C:\Users\wd\AppData\Local\Programs\Python\Python38\lib\site-packages\megengine\core\tensor\array_method.py", line 36, in _elwise_apply
        (result,) = apply(op, *args)
    RuntimeError: bad input shape for polyadic operator: {2,64,128,96}, {18,64,128,96}

    backtrace: frames 2 through 11 are all null
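    For what it's worth, the failing multiply is reproducible in plain NumPy (a toy illustration, not MegEngine; the shapes are taken from the error above). Elementwise ops need equal or broadcastable shapes, and batch sizes 2 and 18 are neither, so one operand was apparently stacked 9x (18 = 2 x 9, plausibly the shifted windows in the correlation) while the other was not:

```python
import numpy as np

left_feature = np.zeros((2, 64, 128, 96), dtype=np.float32)  # batch of 2
right_slid = np.zeros((18, 64, 128, 96), dtype=np.float32)   # 2 x 9 stacked slices

try:
    _ = left_feature * right_slid  # same shape check the MegEngine elemwise op fails
except ValueError as err:
    print("shape mismatch:", err)
```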

    opened by yangxiaohhh 3
  • Colab or Huggingface demo?


    Thanks for sharing this great work! Would you consider making a Google Colab notebook or Huggingface demo of this code so that the less technically inclined like myself can try it out? Thanks!

    opened by noobtoob4lyfe 0
  • Disparity with uint16 format


    Hi, thanks for your work; it's great. I want to generate a 3D point cloud from the disparity output of your script, and for that I need the disparity in 16 bits. As mentioned in the README, the disparity is saved in 16-bit format, but when I checked test.py lines 105 to 114, I saw that you save the disparity in 8 bits. I commented those lines out to save the raw predicted disparity instead. However, when I tried to produce a 3D point cloud from the disparity, I ended up with a very discrete, layered point cloud (see image), which is most likely caused by using 8 bits instead of 16; it seems that even the raw predicted disparity is 8-bit. I verified my script and the camera calibration to make sure the problem comes from the 8-bit disparity. Could you please let me know whether it is possible to save the disparity correctly in 16 bits?

    Lines 107 to 117:

        disp_vis = inference(left_img, right_img, model_func, n_iter=20)
        # disp_vis = (disp - disp.min()) / (disp.max() - disp.min()) * 255.0
        # disp_vis = disp_vis.astype("uint8")
        # disp_vis = disp_vis.astype(np.uint16)
        # disp_vis = cv2.applyColorMap(disp_vis, cv2.COLORMAP_INFERNO)
        parent_path = os.path.abspath(os.path.join(args.output, os.pardir))
        if not os.path.exists(parent_path):
            os.makedirs(parent_path)
        cv2.imwrite(args.output, disp_vis)


    opened by AliKaramiFBK 7
  • The effect of distortion on results?


    Excuse me: if the input images are distorted, or the stereo-rectified pair still has residual distortion, will recovering depth from disparity be significantly affected? For example, stitching point clouds from multiple frames can produce multiple overlapping layers of points (the rotation and translation poses are not the problem).

    opened by liu6010 0
Owner
MEGVII Research
Power Human with AI. Continuously innovating to expand the boundaries of cognition; extraordinary technology creates product value.