RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

Overview

This repository contains the source code for our paper:

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching
Lahav Lipson, Zachary Teed and Jia Deng

@article{lipson2021raft,
  title={{RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching}},
  author={Lipson, Lahav and Teed, Zachary and Deng, Jia},
  journal={arXiv preprint arXiv:2109.07547},
  year={2021}
}

Requirements

The code has been tested with PyTorch 1.7 and CUDA 10.2.

conda env create -f environment.yaml
conda activate raftstereo

Required Data

To evaluate or train RAFT-Stereo, you will need to download the required datasets.

To download the ETH3D and Middlebury test datasets for the demos, run

chmod ug+x download_datasets.sh && ./download_datasets.sh

By default, stereo_datasets.py will search for the datasets in the locations shown below. You can create symbolic links in the datasets folder pointing to wherever the datasets were downloaded (see the example after the layout).

├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── lakeside_1l
        ├── ...
        ├── tunnel_3s
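
For example, a minimal sketch of creating the symbolic links (the download location /path/to/downloads is a placeholder; adjust the paths to your setup):

mkdir -p datasets
ln -s /path/to/downloads/FlyingThings3D datasets/FlyingThings3D
ln -s /path/to/downloads/Middlebury datasets/Middlebury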

Demos

Pretrained models can be downloaded by running

chmod ug+x download_models.sh && ./download_models.sh

or downloaded from Google Drive.

You can demo a trained model on pairs of images. To predict stereo for Middlebury, run

python demo.py --restore_ckpt models/raftstereo-sceneflow.pth

Or for ETH3D:

python demo.py --restore_ckpt models/raftstereo-eth3d.pth -l=datasets/ETH3D/*/im0.png -r=datasets/ETH3D/*/im1.png

Using our fastest model:

python demo.py --restore_ckpt models/raftstereo-realtime.pth  --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru 

To save the disparity values as .npy files, run any of the demos with the --save_numpy flag.
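
The saved arrays can be loaded back with NumPy; a minimal sketch (the output path is hypothetical and depends on the input image names):

    import numpy as np

    disp = np.load("demo_output/im0.npy")  # hypothetical path to a saved disparity map
    print(disp.shape, float(disp.min()), float(disp.max()))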

Converting Disparity to Depth

If the camera focal length f (in pixels) and the camera baseline B are known, disparity predictions d can be converted to depth values using

depth = f * B / d

Note that the units of the focal length are pixels, not millimeters.
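
A minimal sketch of this conversion (the focal length and baseline values below are placeholders):

    import numpy as np

    disp = np.load("disparity.npy")  # hypothetical disparity map saved with --save_numpy
    fx = 700.0                       # focal length in pixels (placeholder)
    baseline = 0.12                  # baseline in meters (placeholder)
    valid = disp > 0                 # guard against division by zero
    depth = np.zeros_like(disp)
    depth[valid] = fx * baseline / disp[valid]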

Evaluation

To evaluate a trained model on a validation set (e.g. Middlebury), run

python evaluate_stereo.py --restore_ckpt models/raftstereo-middlebury.pth --dataset middlebury_H

Training

Our model is trained on two RTX-6000 GPUs using the following command. Training logs will be written to runs/, which can be visualized using TensorBoard.

python train_stereo.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision

To train using significantly less memory, change --n_downsample 2 to --n_downsample 3. This will slightly reduce accuracy.

(Optional) Faster Implementation

We provide a faster CUDA implementation of the correlation volume which works with mixed precision feature maps.

cd sampler && python setup.py install && cd ..

Running demo.py, train_stereo.py or evaluate_stereo.py with --corr_implementation reg_cuda together with --mixed_precision will speed up the model without impacting performance.

To significantly decrease memory consumption on high resolution images, use --corr_implementation alt. This implementation is slower than the default, however.

Comments
  • Question about learning rate

    We are preparing to cite your paper in our new work. While reproducing it, I noticed that the maximum learning rate for training on SceneFlow is 0.0002 in the provided code, while the paper says SceneFlow is trained with a minimum learning rate of 1e-4. Should I keep the 0.0002 from the code to reproduce your work?

    Thanks!

    opened by David-Zhao-1997 7
  • Question about Training Schedule

    Thanks for sharing such excellent work!
    The paper mentions: "Final models are trained on synthetic data for 200k steps with a batch size of 8." SceneFlow consists of about 35k training images.
    Can we conclude that you trained for a total of 200k * 8 / 35k ≈ 45 epochs?
    GANet, DSMNet, etc. train for only 10-20 epochs. Have you compared performance when training for fewer than 20 epochs?
    How long does it take to train 200k steps with your GPU configuration?

    Thank you for your reply!

    opened by AbnerCSZ 6
  • Frozen Batch Norm

    Thanks for the great work, Lahav and team. Iterative refinement has been missing in stereo matching, and the multi-level correlation lookup volume is a great contribution.

    I have been using this repo in my application with great success. However, training is not very stable: the model seems to suffer some kind of mode collapse and predicts the same output for all inputs.

    I'm just checking all the loose ends and came across the normalisation part: https://github.com/princeton-vl/RAFT-Stereo/blob/5c13878b617177da139cfeba79ac15b39b351963/train_stereo.py#L151

    1. Why is batch norm frozen during training? Doesn't this defeat the purpose of adding batch norm in the first place?
    2. In the paper, instance norm is used instead of batch norm for the context encoder; can you expand on this implementation detail? How does this impact the model when we use a shared encoder for speed-up?
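
    For context (an editorial sketch, not the repo's code): "freezing" batch norm in PyTorch usually means keeping the BN modules in eval mode, so the stored running statistics are used and are no longer updated per batch:

        import torch.nn as nn

        def freeze_bn(model: nn.Module):
            # put every BatchNorm layer in eval mode: normalization uses the
            # running statistics, which are no longer updated by the batch
            for m in model.modules():
                if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                    m.eval()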
    opened by ppyht2 5
  • Questions about different augmentors for sparse and dense gt.

    Thanks for sharing your excellent work! Why do you provide two different ways of augmenting data? The sparse one does not have yjitter or asymmetric color augmentation. Also, why do you provide a new function resize_sparse_flow_map instead of using cv2.resize(flow, interpolation=cv2.INTER_NEAREST) for resizing the sparse ground truth?

    opened by zhujiagang 5
  • ONNX export failed: Couldn't export Python operator CorrSampler

    When converting the model to ONNX, I found that the CorrSampler operator could not be exported. Can you provide some suggestions?

    Looking forward to your reply!

    opened by sunmooncode 4
  • Problem about runtime of base model

    Hi, thank you for sharing the awesome work. I have run the base model raftstereo-middlebury.pth without any refinement on my custom dataset. The precision of the results is pretty good, but the runtime of model prediction does not seem to match that described in the article. I added some time-analysis snippets to demo.py.

    The command predicting stereo on the custom data is: python demo.py --restore_ckpt models/raftstereo-middlebury.pth -l=output_xvisio/rect_cam0/*.jpg -r=output_xvisio/rect_cam1/*.jpg --corr_implementation alt --mixed_precision

    The configuration of my local machine:

    • image resolution: 640 x 400
    • CPU: Intel® Core™ i7-8700K CPU @ 3.70GHz × 12
    • GPU: NVIDIA GeForce GTX 1080

    The configuration of my server machine:

    • image resolution: 640 x 400
    • CPU: Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
    • GPU: NVIDIA Tesla P100

    Could you help me figure out what mistake I made that caused this problem? I really appreciate your help!

    opened by fangchuan 4
  • What is raftstereo-middlebury.pth trained on?

    Hi @lahavlipson, Thank you for your great work!

    I'm wondering what datasets you used to train raftstereo-middlebury.pth? Why do you recommend it for in-the-wild images?

    opened by nikitakaraevv 3
  • Question About Inference Time.

    Hi, thank you for sharing the amazing work. It shows better performance on the leaderboard by a large margin. I'm testing RAFT-Stereo, but I can't make the fastest model reach ~26 fps on the KITTI 2015 test data, so I want to ask for the configuration that achieves this.

    I measure time like this:

                padder = InputPadder(image1.shape, divis_by=32)
                image1, image2 = padder.pad(image1, image2)
                start_time = time.time()
                _, flow_up = model(image1, image2, iters=args.valid_iters, test_mode=True)
                print("forward time: ", time.time()-start_time)
                file_stem = imfile1.split('\\')[-1]
    
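    (Aside: CUDA kernels execute asynchronously, so wall-clock timing around the forward pass can be misleading unless the device is synchronized; a minimal sketch of a more robust measurement, reusing the hypothetical variables above:)

        import time
        import torch

        torch.cuda.synchronize()   # wait for pending GPU work before starting the clock
        start_time = time.time()
        _, flow_up = model(image1, image2, iters=args.valid_iters, test_mode=True)
        torch.cuda.synchronize()   # make sure the forward pass has actually finished
        print("forward time: ", time.time() - start_time)
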

    The setup is an RTX 3070, CUDA 11, cuDNN 8, Windows 10, KITTI (1248x384), with the following script:

    python.exe demo.py
    --restore_ckpt models/raftstereo-realtime.pth
    --shared_backbone
    --n_downsample 3
    --n_gru_layers 2
    --slow_fast_gru
    --mixed_precision
    --corr_implementation reg_cuda
    -l=${kitti_test}
    -r=${kitti_test}


    I get about ~170 ms per stereo pair. Which GPU did you use, and is there anything wrong with my testing?

    opened by fafancier 3
  • Question about disparity Up & Down

    Hi,
    I want to compute the up-down disparity and then fuse it with the left-right disparity. My solution is:

        disparity_x = RAFT_x(Img_Left, Img_Right)
        disparity_y = RAFT_y(Img_Up, Img_Down)
        disparity_out = disparity_x * ratio_x + disparity_y * ratio_y

    In RAFT_y I modified the correlation function to correlate in the y-direction, but the result is not good. Could you give me some advice?

    Thanks

    opened by excllent123 2
  • confidence map

    Hello, is it possible to extract a confidence map for all disparity values? For 3D stitching and SLAM, such a confidence map would greatly improve the overall result. Do you have any suggestions on how to extract one from the RAFT pipeline itself, or would you recommend a different approach? Thank you for making your work available to all!

    opened by gpuartifact 2
  • Question about fine-tuning on Middlebury

    Hi,
    Paper section 4.4 (Middlebury) says: "After pre-training on Sceneflow [23], we fine-tune on 384x1000 random crops of the 23 Middlebury training images for 4000 steps with a batch size of 2, using 22 update iterations during training." But official_train.txt only contains 10 Middlebury training images. Could you help me point out which is right?

    Thanks

    opened by excllent123 2
  • Training Datasets and schedule

    https://github.com/princeton-vl/RAFT-Stereo/blob/0e2a12746143a7552e30ef2f4b1d4c3214388a1a/train_stereo.py#L222 Hi there, thank you for supplying such clear code! I have a question regarding the training procedure: as I understand it, the training suggested on the Git page includes only SceneFlow (and fine-tuning on Middlebury), with no reference to the other datasets you cite in the paper, Falling Things and TartanAir. Do you use them in any additional training? Can you clarify? Thank you!

    opened by orram 0
  • PyTorch UserWarning

    /home/user/anaconda3/envs/raftstereo/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

    Does this warning have an impact on the results? If so, how can it be resolved? Thanks!

    opened by lyhloveyou 0
  • Why divide the correlation by sqrt(D)?

    https://github.com/princeton-vl/RAFT-Stereo/blob/0e2a12746143a7552e30ef2f4b1d4c3214388a1a/core/corr.py#L156 Hi, what does this line mean? The paper says the correlation is the dot product between feature vectors, but here it is divided by this square root. Is there any meaning to it? Can the sqrt be replaced by something else?
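
    One plausible reading (an assumption, not confirmed by the authors): dividing by sqrt(D) is the same normalization used in scaled dot-product attention, keeping correlation magnitudes roughly independent of the feature dimension D. A minimal sketch with hypothetical shapes:

        import torch

        fmap1 = torch.randn(2, 256, 48, 64)  # (B, D, H, W) left features, hypothetical
        fmap2 = torch.randn(2, 256, 48, 64)  # (B, D, H, W) right features, hypothetical
        B, D, H, W = fmap1.shape

        # all-pairs correlation along each image row, scaled by sqrt(D)
        corr = torch.einsum('bdhi,bdhj->bhij', fmap1, fmap2) / D**0.5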

    opened by steven9046 0
  • occlusion detection

    Hi, RAFT-Stereo produces disparities even in regions that are occluded from one of the cameras of the stereo pair. As a result, the disparities are good when the scene is similar to the training set, but in terms of generalization it produces more errors in those partially occluded regions that are less similar to the training data.

    In SGM implementations, a left-right / right-left consistency check is used to implicitly find occluded regions. In SGM this check is done within the cost cube computed in a single run (only left-right or only right-left) by switching the cost lookup direction.

    Where in the RAFT-Stereo implementation would you suggest implementing an equivalent occlusion check without having to run the full disparity computation twice? Thanks.
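
    For reference, the classical two-pass left-right consistency check mentioned above can be sketched as follows (this assumes both disparity maps are available, i.e. it is not the single-pass variant being asked about):

        import numpy as np

        def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
            # warp the right-view disparity into the left view and compare;
            # pixels that disagree by more than thresh are likely occluded
            h, w = disp_left.shape
            xs = np.tile(np.arange(w), (h, 1))
            xr = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
            disp_right_warped = np.take_along_axis(disp_right, xr, axis=1)
            return np.abs(disp_left - disp_right_warped) <= thresh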

    opened by gpuartifact 0
  • Retraining RAFT

    Hi. I tried RAFT-Stereo on my data and it works fine, but the objects in my case are thin and I'm facing depth issues (zig-zag-like structures). I saw in one issue that you suggested using higher-resolution images or retraining on a dataset with thin objects. As I don't have ground truth for my dataset, can you suggest any open-source dataset with more thin objects so I can retrain RAFT-Stereo? Thanks!

    opened by jayes97 0
  • Question about excluding "seasonsforest_winter_easy" from the "tartan_air" training dataset

    Hi,

    I noticed that you excluded this subset from tartan_air. Could you share the reason why, and why only "easy" and not "hard"?

    Thanks!

    opened by deephog 1
Owner
Princeton Vision & Learning Lab