SMD-Nets: Stereo Mixture Density Networks

Overview

This repository contains a PyTorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021) by Fabio Tosi, Yiyi Liao, Carolin Schmitt and Andreas Geiger.

Contributions:

  • A novel learning framework for stereo matching that exploits compactly parameterized bimodal mixture densities as output representation and can be trained using a simple likelihood-based loss function (sketched after this list). Our formulation avoids bleeding artifacts at depth discontinuities and provides a measure of aleatoric uncertainty.

  • A continuous function formulation aimed at estimating disparities at arbitrary spatial resolution with constant memory footprint.

  • A new large-scale synthetic binocular stereo dataset with ground truth disparities at 3840×2160 resolution, comprising photo-realistic renderings of indoor and outdoor environments.
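
Concretely, in the Laplacian case the network predicts a mixture weight and two modes per pixel, and training minimizes the negative log-likelihood of the ground-truth disparity d* under the mixture. A sketch, with notation mirroring the loss from lib/model/losses.py quoted in the comments below:

$$\mathcal{L} = -\log\big(\pi\,\mathrm{Lap}(d^\ast \mid \mu_1, b_1) + (1 - \pi)\,\mathrm{Lap}(d^\ast \mid \mu_2, b_2)\big), \qquad \mathrm{Lap}(d \mid \mu, b) = \frac{1}{2b}\, e^{-|d - \mu| / b}$$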

For more details, please check:

[Paper] [Supplementary] [Poster] [Video] [Blog]

If you find this code useful in your research, please cite:

@INPROCEEDINGS{Tosi2021CVPR,
  author = {Fabio Tosi and Yiyi Liao and Carolin Schmitt and Andreas Geiger},
  title = {SMD-Nets: Stereo Mixture Density Networks},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
} 

Requirements

This code was tested with Python 3.8, PyTorch 1.8, CUDA 11.2 and Ubuntu 20.04.
All our experiments were performed on a single NVIDIA Titan V100 GPU.
Requirements can be installed using the following script:

pip install -r requirements.txt

Datasets

We create our synthetic dataset, UnrealStereo4K, using the popular game engine Unreal Engine combined with the open-source plugin UnrealCV.

UnrealStereo4K

Our photo-realistic synthetic passive binocular UnrealStereo4K dataset consists of images of 8 static scenes, including indoor and outdoor environments. We rendered stereo pairs at 3840×2160 resolution for each scene with pixel-accurate ground truth disparity maps (aligned with both the left and the right images!) and ground truth poses.

You can automatically download the entire synthetic binocular stereo dataset using the download_data.sh script in the scripts folder. Alternatively, you can download each scene individually:

UnrealStereo4K_00000.zip [74 GB]
UnrealStereo4K_00001.zip [73 GB]
UnrealStereo4K_00002.zip [74 GB]
UnrealStereo4K_00003.zip [73 GB]
UnrealStereo4K_00004.zip [72 GB]
UnrealStereo4K_00005.zip [74 GB]
UnrealStereo4K_00006.zip [67 GB]
UnrealStereo4K_00007.zip [76 GB]
UnrealStereo4K_00008.zip [16 GB] - contains only 200 stereo pairs, used as an out-of-domain test set

Warning! All the RGB images are PNG files at 8 Mpx. This notably slows down training due to the expensive data-loading operations. We therefore suggest converting the images to raw binary files to speed up the process (remember to edit the filenames accordingly). You can use the following code to convert the stereo images (the Image0 and Image1 folders) to raw format offline:

import cv2

img_path = "/path/to/the/image.png"
img = cv2.imread(img_path, -1)                   # load the PNG with all channels
img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)      # drop the alpha channel
out = open(img_path.replace("png", "raw"), 'wb')
img.tofile(out)                                  # write the raw pixel buffer
out.close()
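
Reading a raw file back requires knowing the image shape, since raw files have no header. A minimal sketch, assuming full-resolution 3840×2160 RGB images stored as 8-bit values (adapt the shape if you saved something else):

import numpy as np

raw_path = "/path/to/the/image.raw"
img = np.fromfile(raw_path, dtype=np.uint8)  # flat, headerless pixel buffer
img = img.reshape(2160, 3840, 3)             # height, width, channels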

Training

All training and testing scripts are provided in the scripts folder.
As an example, use the following command to train SMD-Nets on our UnrealStereo4K dataset.

python apps/train.py --dataroot $dataroot \
                     --checkpoints_path $checkpoints_path \
                     --training_file $training_file \
                     --testing_file $testing_file \
                     --results_path $results_path \
                     --mode $mode \
                     --name $name \
                     --batch_size $batch_size \
                     --num_epoch $num_epoch \
                     --learning_rate $learning_rate \
                     --gamma $gamma \
                     --crop_height $crop_height \
                     --crop_width $crop_width \
                     --num_sample_inout $num_sample_inout \
                     --aspect_ratio $aspect_ratio \
                     --sampling $sampling \
                     --output_representation $output_representation \
                     --backbone $backbone

For a detailed description of the training options, please take a look at lib/options.py.

In order to monitor and visualize the training process, you can start a tensorboard session with:

tensorboard --logdir checkpoints

Evaluation

Use the following command to evaluate a trained SMD-Net on our UnrealStereo4K dataset.

python apps/test.py --dataroot $dataroot \
                    --testing_file $testing_file \
                    --results_path $results_path \
                    --mode $mode \
                    --batch_size 1 \
                    --superes_factor $superes_factor \
                    --aspect_ratio $aspect_ratio \
                    --output_representation $output_representation \
                    --load_checkpoint_path $checkpoints_path \
                    --backbone $backbone 

Warning! Computing the soft edge error (SEE) on KITTI requires instance segmentation maps from the KITTI 2015 dataset.

Stereo Ultra High-Resolution: if you want to estimate a disparity map at arbitrary spatial resolution from a low-resolution stereo pair at test time, just use a different value for the superes_factor parameter (e.g. 2, 4, 8 ... 32!). Below, a comparison between our model using the PSMNet backbone at 128 Mpx resolution (top) and the original PSMNet at 0.5 Mpx resolution (bottom), both taking stereo pairs at 0.5 Mpx resolution as input.
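
To give an intuition for how the continuous formulation supports this, disparities can be queried at arbitrary sub-pixel locations; below is a minimal sketch of building such a query grid (the helper name and conventions are illustrative, not the repository's API):

import torch

# Illustrative only: build (y, x) pixel coordinates, in input-image units,
# for an output grid `factor` times denser than the input resolution.
def make_query_grid(height, width, factor):
    ys = torch.arange(height * factor, dtype=torch.float32) / factor
    xs = torch.arange(width * factor, dtype=torch.float32) / factor
    yy = ys.view(-1, 1).expand(-1, xs.numel())
    xx = xs.view(1, -1).expand(ys.numel(), -1)
    return torch.stack([yy, xx], dim=-1).reshape(-1, 2)

# e.g. a 0.5 Mpx input (960x540) queried at factor 16 gives a ~128 Mpx output
points = make_query_grid(540, 960, 16)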

Pretrained models

You can download pre-trained models on our UnrealStereo4K dataset from the following links:

Qualitative results

Disparity Visualization. Some qualitative results of the proposed SMD-Nets using PSMNet as stereo backbone. From left to right, the input image from the UnrealStereo4K test set, the predicted disparity and the corresponding error map. Please zoom-in to better perceive details near depth boundaries.

Point Cloud Visualization. Below, instead, we show point cloud visualizations on UnrealStereo4K for both the passive binocular stereo and the active depth datasets, adopting HSMNet as backbone. From left to right, the reference image, the results obtained using a standard disparity regression (i.e., disparity point estimate), a unimodal Laplacian distribution and our bimodal Laplacian mixture distribution. Note that our bimodal representation notably alleviates bleeding artifacts near object boundaries compared to both disparity regression and the unimodal formulation.

Contacts

For questions, please send an email to [email protected]

Acknowledgements

We thank the authors who shared the code of their works. In particular:

  • Jia-Ren Chang for providing the code of PSMNet.
  • Gengshan Yang for providing the code of HSMNet.
  • Clement Godard for providing the code of Monodepth (extended to Stereodepth).
  • Shunsuke Saito for providing the code of PIFu.

Comments
  •  The Err is nan

    Hi, thanks for your great work. When I use my own dataset, collected with the CARLA simulator and a baseline of 50 cm, the error becomes NaN. Can you give me some suggestions? I am using the default settings. Thanks a lot.

    opened by raozhongyu 4
  • About the crop size

    When training, I set the crop size to 512×512 by default, but I have a problem with the pooling layer:

    lib/backbone/PSMNet/submodule.py
    output_branch1 = self.branch1(output_skip)
    RuntimeError: Given input size: (128x32x32). Calculated output size: (128x0x0). Output size is too small

    You set a larger size when training, right? Or is there another problem? Thank you.

    opened by yinhanxi 3
  • Understanding dataset Extrinsics folder

    Hi, thanks for open sourcing your code!

    Looking at the dataset extrinsics files (e.g., 00000/Extrinsics0/00000.txt) the file contains two lines:

    1920.000000 0.000000 1920.000000 0.000000 1920.000000 1080.000000 0.000000 0.000000 1.000000
    0.469532 0.882916 0.000000 -2.295430 -0.168801 0.089768 -0.981554 -0.117832 0.866629 -0.460871 -0.191186 5.537117
    

    I'm interpreting the first line as the flattened 3x3 intrinsics matrix, and the second line as the flattened 3x4 pose matrix. Is this interpretation correct?

    If so, it seems like for a single scene (e.g., scene 00000) the position of the right camera relative to the left camera changes for different frames. E.g., if T_left_0, T_right_0, T_left_100, T_right_100 are the poses read from 00000/Extrinsics0/00000.txt, 00000/Extrinsics1/00000.txt, 00000/Extrinsics0/00100.txt, 00000/Extrinsics1/00100.txt respectively and converted to 4x4 homogeneous transformations, then np.linalg.inv(T_left_0) @ T_right_0 != np.linalg.inv(T_left_100) @ T_right_100. I expected the stereo cameras to stay at the same position relative to each other. Can you please tell me if I'm doing something incorrect here?
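
    For concreteness, here is a minimal sketch of the relative-pose check above (load_extrinsics is a hypothetical helper matching the two-line file layout described earlier):

    import numpy as np

    def load_extrinsics(path):
        # hypothetical helper: line 1 is a flattened 3x3 intrinsics matrix,
        # line 2 a flattened 3x4 pose, per the interpretation above
        with open(path) as f:
            lines = f.read().strip().split("\n")
        K = np.array(lines[0].split(), dtype=np.float64).reshape(3, 3)
        P = np.array(lines[1].split(), dtype=np.float64).reshape(3, 4)
        T = np.vstack([P, [0.0, 0.0, 0.0, 1.0]])  # 4x4 homogeneous pose
        return K, T

    _, T_left_0 = load_extrinsics("00000/Extrinsics0/00000.txt")
    _, T_right_0 = load_extrinsics("00000/Extrinsics1/00000.txt")
    T_rel_0 = np.linalg.inv(T_left_0) @ T_right_0  # right camera in left frame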

    opened by saryazdi 2
  • Possible negative loss value in bimodal_loss

    The idea of bimodal mixture density is brilliant! However, I noticed that the loss function can, in theory, produce negative values in certain circumstances, and I actually got negative losses during training.

    In SMD-Nets/lib/model/losses.py we have:

    def laplacian(mu, b, labels):
        return 0.5 * torch.exp(-(torch.abs(mu - labels) / b)) / b

    def bimodal_loss(mu0, mu1, sigma0, sigma1, w0, w1, labels, dist="gaussian"):
        return - torch.log(w0 * distribution(mu0, sigma0, labels, dist) + \
                           w1 * distribution(mu1, sigma1, labels, dist))

    Negative values come out of -log(x) whenever x > 1, and here x is driven by exp(-|diff|/b)/b. For example, with diff = b = 0.01 and w0 = 1, the loss value would be -log(1 * 0.5 * exp(-0.01/0.01)/0.01) = -2.91. Plotting y = 0.5 * exp(-|x|/b)/b shows that the density exceeds 1 near its mode whenever b is small (its peak is 1/(2b), which is greater than 1 for any b < 0.5).

    I wonder whether it is normal to have negative values during your training, or whether there are further restrictions applied on mu, sigma and w0 to prevent this from happening.
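
    For reference, the value in the example above is easy to reproduce with the laplacian definition quoted from lib/model/losses.py:

    import torch

    def laplacian(mu, b, labels):
        return 0.5 * torch.exp(-(torch.abs(mu - labels) / b)) / b

    # diff = |mu - labels| = 0.01, b = 0.01, w0 = 1 (unimodal limit)
    mu, b, labels = torch.tensor(0.01), torch.tensor(0.01), torch.tensor(0.02)
    loss = -torch.log(1.0 * laplacian(mu, b, labels))
    print(loss.item())  # ≈ -2.91, i.e. a negative loss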

    opened by JagerU 2
  • Predict Disparity on Custom Data using Pre-trained Model

    Your results look fantastic! I would like to test on my own images, but I had some trouble modifying the provided test script to do so. Are there instructions for running a pre-trained model on custom data with no ground truth?

    Thanks, David

    opened by djc26 0
  • The way of resizing GT

    Thanks for sharing your excellent work! It seems that GT is resized with nearest-neighbor interpolation in your code, and there is no scale-jittering augmentation in your implementation. When the GT is dense, nearest neighbor leads to fewer flying points at object boundaries, but is less precise than bilinear interpolation elsewhere. I am not clear what influence bilinear interpolation would have on your method, especially at the boundaries.

    opened by zhujiagang 0
  • Why do you put the disparity ground truth values in the samples?

    Hi @fabiotosi92, thanks for your work! I saw that you put the disparity ground truth in the samples variable but didn't use it in the query operation. May I ask why? Or is there something I am misunderstanding?

    Thanks!

    opened by suyuzhang 0
  • Given groups=1, weight of size [16, 3, 3, 3], expected input[4, 4, 2048, 1536] to have 3 channels, but got 4 channels instead

    Due to network problems I only downloaded the 00008 data, using it as training and validation set, but got this error:

    /home/rc/anaconda3/envs/DL/bin/python3.7 /home/rc/StereoMatching/SMD-Nets/apps/train.py --dataroot /home/rc/StereoMatching/SMD-Nets/Dataset/UnrealStereo4K_00008 --checkpoints_path /home/rc/StereoMatching/SMD-Nets/checkpoints --training_file /home/rc/StereoMatching/SMD-Nets/filenames/train2.txt --testing_file /home/rc/StereoMatching/SMD-Nets/filenames/test2.txt --results_path /home/rc/StereoMatching/SMD-Nets/output --name train --mode passive --batch_size 2 --num_epoch 1 --learning_rate 1e-4 --gamma 0.1 --crop_height 2048 --crop_width 1536 --num_sample_inout 5000 --aspect_ratio 1. --sampling dda --output_representation bimodal --backbone HSMNet
    train data size:  92
    test data size:  10
    /home/rc/StereoMatching/SMD-Nets/checkpoints/train
    Start Training
    Traceback (most recent call last):
      File "/home/rc/StereoMatching/SMD-Nets/apps/train.py", line 196, in <module>
        train(opt)
      File "/home/rc/StereoMatching/SMD-Nets/apps/train.py", line 113, in train
        errors = net.forward(left, right, sample, labels=labels)
      File "/home/rc/StereoMatching/SMD-Nets/lib/model/SMDHead.py", line 108, in forward
        self.filter(left, right)
      File "/home/rc/StereoMatching/SMD-Nets/lib/model/SMDHead.py", line 23, in filter
        self.feat_list = self.stereo_network(left, right)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/rc/StereoMatching/SMD-Nets/lib/backbone/HSMNet/hsm.py", line 60, in forward
        conv4, conv3, conv2, conv1, enc0, enc1, enc2, enc3 = self.feature_extraction(torch.cat([left, right], 0))
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/rc/StereoMatching/SMD-Nets/lib/backbone/HSMNet/utils.py", line 80, in forward
        conv1 = self.convbnrelu1_1(x)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/rc/StereoMatching/SMD-Nets/lib/backbone/HSMNet/utils.py", line 154, in forward
        outputs = self.cbr_unit(inputs)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
        input = module(input)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/rc/anaconda3/envs/DL/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
        self.padding, self.dilation, self.groups)
    RuntimeError: Given groups=1, weight of size [16, 3, 3, 3], expected input[4, 4, 2048, 1536] to have 3 channels, but got 4 channels instead
    
    Process finished with exit code 1
    

    The parameters I used are shown in the command above.
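
    A likely cause: the UnrealStereo4K PNGs carry a fourth (alpha) channel, which matches the "got 4 channels" in the traceback; a minimal sketch of a workaround, mirroring the RGBA-to-RGB conversion suggested in the Datasets section above:

    import cv2

    img = cv2.imread(img_path, -1)  # PNGs in this dataset load with 4 channels
    if img.ndim == 3 and img.shape[2] == 4:
        img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)  # drop alpha -> 3 channels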

    opened by rebecca0011 0
Owner
Fabio Tosi
Postdoc Researcher at University of Bologna - Computer Science and Engineering