Official MegEngine implementation of CREStereo (CVPR 2022 Oral).

Overview

This repository contains the MegEngine implementation of our paper:

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
CVPR 2022

arXiv | BibTeX

Datasets

The Proposed Dataset

Download

There are two ways to download the dataset (~400 GB) proposed in our paper:

  • Download using the shell script dataset_download.sh:
sh dataset_download.sh

The dataset will be downloaded and extracted to ./stereo_trainset/crestereo.

  • Download from BaiduCloud here (extraction code: aa3g) and extract the tar files manually.

Disparity Format

The disparity maps are saved as uint16 .png files, which can be loaded with OpenCV's imread function:

import cv2
import numpy as np

def get_disp(disp_path):
    disp = cv2.imread(disp_path, cv2.IMREAD_UNCHANGED)
    return disp.astype(np.float32) / 32  # stored as uint16, scaled by 32
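
For a quick sanity check you can color-map the loaded disparity; a minimal sketch (the input path here is illustrative, not a file shipped with the repo):

import cv2
import numpy as np

disp = get_disp("example_disp.png")  # illustrative path
vis = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("disp_vis.png", cv2.applyColorMap(vis, cv2.COLORMAP_JET))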

Other Public Datasets

Other public datasets we use include SceneFlow, Sintel, Middlebury, ETH3D, KITTI 2012/2015, Falling Things, InStereo2K, and HR-VS.

Dependencies

CUDA Version: 10.1, Python Version: 3.6.9

  • MegEngine v1.8.2
  • opencv-python v3.4.0
  • numpy v1.18.1
  • Pillow v8.4.0
  • tensorboardX v2.1
python3 -m pip install -r requirements.txt

We also provide a Docker image to run the code quickly:

docker run --gpus all -it -v /tmp:/tmp ylmegvii/crestereo
shotwell /tmp/disparity.png

Inference

Download the pretrained MegEngine model from here and run:

python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png
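
If you need metric depth, disparity in pixels converts to depth via depth = baseline * focal / disparity on a rectified rig. A hedged sketch using the in-memory pred_disp returned by inference() in test.py (the calibration values are placeholders, and remember to rescale the disparity if you resized the inputs):

import numpy as np

# Placeholder rectified-rig calibration; substitute your own values.
baseline_m, focal_px = 0.12, 1000.0
depth_m = baseline_m * focal_px / np.maximum(pred_disp, 1e-6)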

Training

Modify the configurations in cfgs/train.yaml and run the following command:

python3 train.py

You can launch TensorBoard to monitor the training process:

tensorboard --logdir ./train_log

and navigate to the page at http://localhost:6006 in your browser.

Acknowledgements

Part of the code is adapted from previous works; we thank all the authors for their awesome repos.

Citation

If you find the code or datasets helpful in your research, please cite:

@misc{Li2022PracticalSM,
      title={Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation},
      author={Jiankun Li and Peisen Wang and Pengfei Xiong and Tao Cai and Ziwei Yan and Lei Yang and Jiangyu Liu and Haoqiang Fan and Shuaicheng Liu},
      year={2022},
      eprint={2203.11483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments
  • Is CUDA 11.6 supported?

    This is a really promising project, congratulations and thanks for releasing it!

    I'm trying to run the test script with your ETH3D model and this command: python3 test.py --model_path path_to_mge_model --left img/test/left.png --right img/test/right.png --size 1024x1536 --output disparity.png

    But the code hangs and never returns from this line in extractor.py:82: self.conv2 = M.Conv2d(128, output_dim, kernel_size=1)

    which is called from load_model in test.py:15: model = Model(max_disp=256, mixed_precision=False, test_mode=True)

    My GPU is an NVIDIA RTX A6000 and the CUDA version on the system is v11.6.

    opened by hasnainv 14
  • Results on Holopix50k dataset

    Hello! Thank you for sharing the code and the model. I tested the pre-trained model on the Holopix50k test set, but didn't get results similar to those shown in the paper. If I want to run the crestereo_eth3d.mge model on this dataset, does it require different parameter settings or pre-processing? How can I get similar results on Holopix50k? Any advice would be very helpful. Thank you in advance! [four result images attached]

    opened by coffeehanjan 4
  • Did you obtain results on Holopix50k with the published model?

    I've tried running the published model on a few images from Holopix50k and got awful results. Can you please tell me how to obtain results similar to the paper's? A different model / different preprocessing?

    opened by shkarupa-alex 3
  • TypeError: pad() got an unexpected keyword argument 'pad_witdth' in test.py

    Good job! May I ask a question?

    I tried to run test.py on a V100 with CUDA version 10.2. The data is from ./img, and I set the size to 1280x720, the same as the original size. But I get the following error:

    File "CREStereo/nets/corr.py", line 42, in get_correlation (0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate") TypeError: pad() got an unexpected keyword argument 'pad_witdth'

    It means that I may have used the wrong type, but I checked the signature of MegEngine's pad and did not find the problem:

        def pad(
            src: Tensor,
            pad_width: Tuple[Tuple[int, int], ...],
            mode: str = "constant",
            constant_value: float = 0.0,
        ) -> Tensor:
            r"""Pads the input tensor.

            Args:
                pad_width: A tuple. Each element in the tuple is a tuple of 2 elements,
                    representing the padding size on both sides of the current dimension,
                    ``(front_offset, back_offset)``.
                mode: One of the following string values. Default: ``'constant'``

                    * ``'constant'``: Pads with a constant value.
                    * ``'reflect'``: Pads with the reflection of the tensor mirrored on
                      the first and last values of the tensor along each axis.
                    * ``'replicate'``: Pads with the edge values of the tensor.
                constant_value: Fill value for ``'constant'`` padding. Default: 0

            Examples:
                >>> import numpy as np
                >>> inp = Tensor([[1., 2., 3.],[4., 5., 6.]])
                >>> inp
                Tensor([[1. 2. 3.]
                 [4. 5. 6.]], device=xpux:0)
                >>> F.nn.pad(inp, pad_width=((1, 1),), mode="constant")
            """

    I pass pad_width as a tuple of the right type, but something still goes wrong.
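
    A hedged aside (not from the thread): the error suggests the repo's call uses the keyword spelled pad_witdth, which matched the misspelled parameter name in older MegEngine releases, while newer releases renamed it to pad_width. Passing the widths positionally should sidestep the mismatch:

        # Sketch of a version-agnostic call: positional pad widths avoid the
        # pad_witdth / pad_width keyword-spelling mismatch across MegEngine versions.
        import megengine.functional as F
        x = F.nn.pad(x, ((0, 0), (0, 0), (pady, pady), (padx, padx)), mode="replicate")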

    opened by city19992 2
  • MegEngine 1.9.0 causes test.py error

    I have been playing around a bit with the code (thank you so much, by the way; I'm having heaps of fun with it) and found out that MegEngine 1.9.0 causes test.py to die with the following output:

    Images resized: 1024x1536
    Model Forwarding...
    Traceback (most recent call last):
      File "test.py", line 94, in <module>
        pred = inference(left_img, right_img, model_func, n_iter=20)
      File "test.py", line 45, in inference
        pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
      File "/usr/local/lib/python3.6/dist-packages/megengine/module/module.py", line 149, in __call__
        outputs = self.forward(*inputs, **kwargs)
      File "/home/dgxmartin/workspace/CREStereo/nets/crestereo.py", line 210, in forward
        align_corners=True,
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/vision.py", line 663, in interpolate
        [wscale, Tensor([0, 0], dtype="float32", device=inp.device)], axis=0
      File "/usr/local/lib/python3.6/dist-packages/megengine/functional/tensor.py", line 405, in concat
        (result,) = apply(builtin.Concat(axis=axis, comp_node=device.to_c()), *inps)
    TypeError: py_apply expects tensor as inputs
    

    For the time being, the MegEngine version should be pinned to exactly 1.8.2.
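
    A minimal pin for that workaround (assuming a pip-managed environment; note the later PR in this thread list moves the requirement to v1.9.1 or later):

        python3 -m pip install megengine==1.8.2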

    opened by MartinPeris 1
  • What datasets are used for pretraining?

    The pretrained model works amazingly well on real-life photos! What datasets are used for pretraining? Can you please provide the training details of the pretrained model? Thanks!

    opened by DY-ATL 1
  • Update requirements.txt to MegEngine v1.9.1

    function.Pad may lead to some weird NaNs in MegEngine v1.8.2. MegEngine v1.9.0 resolves this but brings more problems, as pointed out in https://github.com/megvii-research/CREStereo/pull/14.

    The most recent release, v1.9.1, resolves all of these problems, so this PR updates the MegEngine version constraint to v1.9.1 or later.

    opened by xxr3376 0
  • Datasets in training and schedule

    https://github.com/megvii-research/CREStereo/blob/ad3a1613bdedd88b93247e5f002cb7c80799762d/train.py#L147 Thank you for supplying this code and training procedure! In the paper (and the git readme) you say you train on other datasets as well ([SceneFlow], [Sintel], [Middlebury], [ETH3D], [KITTI 2012/2015], [Falling Things], [InStereo2K], [HR-VS]), yet in train.py you only refer to your CRES dataset. Can you elaborate? Do you train on the other datasets before? After?

    Thank you!

    opened by orram 0
  • WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}

    Thank you for the excellent work! I ran into a problem while finetuning the model on my own data. It gets stuck at step 2, in flow_predictions = model(left, right): after one optimizer.step().clear_grad(), the network cannot run inference on any image. I used gdb to debug and found that it gets stuck at random layers in the network forward pass.

    I checked that my data is correct; even with the same data, the model gets stuck after one optimizer.step().clear_grad(). Do you have any suggestions?

    After upgrading MegEngine 1.9.1 -> 1.11.1 the model trains without getting stuck. However, the first time optimizer.step().clear_grad() runs, it prints:

    WRN Not FormattedTensorValue input for AttachGrad op: AttachGradValue{key=grad_1}, (49342:49342) Handle{ptr=0x5616b860dd58, name="update_block.encoder.conv.bias"}

    The parameter updates are abnormal and the results are worse. Has anyone met the same problem, or does anyone have a suggestion?

    opened by Eatmelonboy 0
  • The GPU memory usage is too large

    @zsc Thank you for sharing! As your paper says, you can train with batch size 16 on 8 2080Ti GPUs using the PyTorch framework. But when I try to train your network, GPU memory usage is as high as 8.5 GB with batch size 1. What could be the problem?
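
    A hedged aside (an assumption, not a confirmed fix): the Model constructor used in test.py exposes a mixed_precision flag, which may reduce memory use if the training config enables it:

        # Assumption: mixed precision may lower GPU memory; not verified for training here.
        model = Model(max_disp=256, mixed_precision=True, test_mode=False)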

    opened by zyl1336110861 0
  • CREStereo not able to run inside thread with Python

    I do not seem to be able to run inference with CREStereo inside a thread using Python's threading module. Below is a minimal example using the test.py script from this repo. It loads the pretrained model and runs inference in a child thread (lines 96-98). Also attached is a screenshot of the error that appears when this is run.

    import os
    
    import megengine as mge
    import megengine.functional as F
    import argparse
    import numpy as np
    import cv2
    
    from nets import Model
    
    #NOTE: added threading import statement
    import threading
    
    def load_model(model_path):
        print("Loading model:", os.path.abspath(model_path))
        pretrained_dict = mge.load(model_path)
        model = Model(max_disp=256, mixed_precision=False, test_mode=True)
    
        model.load_state_dict(pretrained_dict["state_dict"], strict=True)
    
        model.eval()
        return model
    
    
    def inference(left, right, model, n_iter=20):
        imgL = left.transpose(2, 0, 1)
        imgR = right.transpose(2, 0, 1)
        imgL = np.ascontiguousarray(imgL[None, :, :, :])
        imgR = np.ascontiguousarray(imgR[None, :, :, :])
    
        imgL = mge.tensor(imgL).astype("float32")
        imgR = mge.tensor(imgR).astype("float32")
    
        imgL_dw2 = F.nn.interpolate(
            imgL,
            size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
            mode="bilinear",
            align_corners=True,
        )
        imgR_dw2 = F.nn.interpolate(
            imgR,
            size=(imgL.shape[2] // 2, imgL.shape[3] // 2),
            mode="bilinear",
            align_corners=True,
        )
        pred_flow_dw2 = model(imgL_dw2, imgR_dw2, iters=n_iter, flow_init=None)
    
        pred_flow = model(imgL, imgR, iters=n_iter, flow_init=pred_flow_dw2)
        pred_disp = F.squeeze(pred_flow[:, 0, :, :]).numpy()
    
        return pred_disp
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="A demo to run CREStereo.")
        parser.add_argument(
            "--model_path",
            default="crestereo_eth3d.mge",
            help="The path of pre-trained MegEngine model.",
        )
        parser.add_argument(
            "--left", default="img/test/left.png", help="The path of left image."
        )
        parser.add_argument(
            "--right", default="img/test/right.png", help="The path of right image."
        )
        parser.add_argument(
            "--size",
            default="1024x1536",
            help="The image size for inference. Te default setting is 1024x1536. \
                            To evaluate on ETH3D Benchmark, use 768x1024 instead.",
        )
        parser.add_argument(
            "--output", default="disparity.png", help="The path of output disparity."
        )
        args = parser.parse_args()
    
    assert os.path.exists(args.model_path), "The model path does not exist."
    assert os.path.exists(args.left), "The left image path does not exist."
    assert os.path.exists(args.right), "The right image path does not exist."
    
        model_func = load_model(args.model_path)
        left = cv2.imread(args.left)
        right = cv2.imread(args.right)
    
        assert left.shape == right.shape, "The input images have inconsistent shapes."
    
        in_h, in_w = left.shape[:2]
    
        print("Images resized:", args.size)
        eval_h, eval_w = [int(e) for e in args.size.split("x")]
        left_img = cv2.resize(left, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
        right_img = cv2.resize(right, (eval_w, eval_h), interpolation=cv2.INTER_LINEAR)
    
        #NOTE: put inference in a thread here
        inference_thread = threading.Thread(target=inference, args=(left_img, right_img, model_func,))
        inference_thread.start()
        inference_thread.join()
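
    A hedged workaround sketch (my assumption, not from the thread): doing the work in a separate process, with the model created inside that process, avoids sharing MegEngine state across threads:

        # Sketch: run load_model/inference from the script above in a child
        # process so MegEngine state never crosses a thread boundary.
        # Assumes the default "fork" start method on Linux.
        import multiprocessing as mp

        def worker(model_path, left, right, out_queue):
            model = load_model(model_path)  # build the model inside this process
            out_queue.put(inference(left, right, model))

        out_queue = mp.Queue()
        p = mp.Process(target=worker, args=(args.model_path, left_img, right_img, out_queue))
        p.start()
        pred_disp = out_queue.get()  # fetch before join to avoid a queue deadlock
        p.join()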
    
    opened by thomasw2 0
  • Model size and number of params?

    Hey, great job!

    Have you ever compared the model size and number of parameters with other SOTA works, such as LEAStereo, RAFT-Stereo, etc.? Your model seems very smart.
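
    A hedged way to check (assuming load_model from this repo's test.py and a local copy of the pretrained weights):

        # Sketch: count parameters by summing the element count of each tensor.
        from test import load_model

        model = load_model("crestereo_eth3d.mge")
        n_params = sum(p.numpy().size for p in model.parameters())
        print(f"Parameters: {n_params / 1e6:.2f} M")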

    opened by philleer 0