SMD-Nets: Stereo Mixture Density Networks
This repository contains a PyTorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021) by Fabio Tosi, Yiyi Liao, Carolin Schmitt and Andreas Geiger.
Contributions:

- A novel learning framework for stereo matching that exploits compactly parameterized bimodal mixture densities as output representation and can be trained using a simple likelihood-based loss function. Our simple formulation lets us avoid bleeding artifacts at depth discontinuities and provides a measure for aleatoric uncertainty (see the loss sketch right after this list).
- A continuous function formulation aimed at estimating disparities at arbitrary spatial resolution with constant memory footprint.
- A new large-scale synthetic binocular stereo dataset with ground truth disparities at 3840×2160 resolution, comprising photo-realistic renderings of indoor and outdoor environments.
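For reference, the bimodal output representation is a two-component Laplacian mixture trained by minimizing its negative log-likelihood. The snippet below is a minimal PyTorch sketch of such a loss; the function name, tensor layout and numerical stabilization are our own assumptions and may differ from the exact implementation in this repository:

import torch

def bimodal_laplacian_nll(pi, mu0, mu1, b0, b1, gt, eps=1e-8):
    # pi        : (N,) weight of the first mode, in (0, 1)
    # mu0, mu1  : (N,) the two disparity modes
    # b0, b1    : (N,) positive Laplacian scale parameters
    # gt        : (N,) ground truth disparities
    # Per-mode Laplacian densities evaluated at the ground truth
    p0 = torch.exp(-torch.abs(gt - mu0) / b0) / (2.0 * b0)
    p1 = torch.exp(-torch.abs(gt - mu1) / b1) / (2.0 * b1)
    # Mixture likelihood and mean negative log-likelihood over all samples
    return -torch.log(pi * p0 + (1.0 - pi) * p1 + eps).mean()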
For more details, please check:
[Paper] [Supplementary] [Poster] [Video] [Blog]
If you find this code useful in your research, please cite:
@INPROCEEDINGS{Tosi2021CVPR,
author = {Fabio Tosi and Yiyi Liao and Carolin Schmitt and Andreas Geiger},
title = {SMD-Nets: Stereo Mixture Density Networks},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}
Requirements
This code was tested with Python 3.8, PyTorch 1.8, CUDA 11.2 and Ubuntu 20.04.
All our experiments were performed on a single NVIDIA Titan V100 GPU.
Requirements can be installed using the following script:
pip install -r requirements.txt
Datasets
We create our synthetic dataset, UnrealStereo4K, using the popular game engine Unreal Engine combined with the open-source plugin UnrealCV.
UnrealStereo4K
Our photo-realistic synthetic passive binocular UnrealStereo4K dataset consists of images of 8 static scenes, including indoor and outdoor environments. We rendered stereo pairs at 3840×2160 resolution for each scene with pixel-accurate ground truth disparity maps (aligned with both the left and the right images!) and ground truth poses.
You can automatically download the entire synthetic binocular stereo dataset using the download_data.sh script in the scripts folder. Alternatively, you can download each scene individually:
UnrealStereo4K_00000.zip [74 GB]
UnrealStereo4K_00001.zip [73 GB]
UnrealStereo4K_00002.zip [74 GB]
UnrealStereo4K_00003.zip [73 GB]
UnrealStereo4K_00004.zip [72 GB]
UnrealStereo4K_00005.zip [74 GB]
UnrealStereo4K_00006.zip [67 GB]
UnrealStereo4K_00007.zip [76 GB]
UnrealStereo4K_00008.zip [16 GB] - It contains 200 stereo pairs only, used as out-of-domain test set
Warning! All the RGB images are PNG files at 8 Mpx, which notably slows down training due to expensive data loading. We therefore suggest converting the images to raw binary files to speed up the process (remember to edit the filenames accordingly). You can use the following code to convert the stereo images (the Image0 and Image1 folders) offline to a raw format:
import cv2

img_path = "/path/to/the/image.png"  # placeholder path to one PNG image

# Read the PNG with all channels, then drop the alpha channel
img = cv2.imread(img_path, -1)
img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)

# Dump the raw uint8 pixel buffer next to the original image
with open(img_path.replace("png", "raw"), "wb") as out:
    img.tofile(out)
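When loading these raw files in the dataloader, the flat pixel buffer has to be reshaped back to its original layout. A minimal sketch, assuming the uint8 RGB dump produced above and the 3840×2160 resolution of UnrealStereo4K (the path is a placeholder):

import numpy as np

raw_path = "/path/to/the/image.raw"

# The raw dump stores uint8 pixels row-major: 2160 rows x 3840 columns x 3 channels
img = np.fromfile(raw_path, dtype=np.uint8).reshape(2160, 3840, 3)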
Training
All training and testing scripts are provided in the scripts folder.
As an example, use the following command to train SMD-Nets on our UnrealStereo4K dataset:
python apps/train.py --dataroot $dataroot \
--checkpoints_path $checkpoints_path \
--training_file $training_file \
--testing_file $testing_file \
--results_path $results_path \
--mode $mode \
--name $name \
--batch_size $batch_size \
--num_epoch $num_epoch \
--learning_rate $learning_rate \
--gamma $gamma \
--crop_height $crop_height \
--crop_width $crop_width \
--num_sample_inout $num_sample_inout \
--aspect_ratio $aspect_ratio \
--sampling $sampling \
--output_representation $output_representation \
--backbone $backbone
For a detailed description of the training options, please take a look at lib/options.py.
In order to monitor and visualize the training process, you can start a tensorboard session with:
tensorboard --logdir checkpoints
Evaluation
Use the following command to evaluate a trained SMD-Net on our UnrealStereo4K dataset:
python apps/test.py --dataroot $dataroot \
--testing_file $testing_file \
--results_path $results_path \
--mode $mode \
--batch_size 1 \
--superes_factor $superes_factor \
--aspect_ratio $aspect_ratio \
--output_representation $output_representation \
--load_checkpoint_path $checkpoints_path \
--backbone $backbone
Warning! The soft edge error (SEE) on the KITTI dataset requires instance segmentation maps from the KITTI 2015 dataset.
Stereo Ultra High-Resolution: if you want to estimate a disparity map at arbitrary spatial resolution given a low-resolution stereo pair at testing time, simply use a different value for the superes_factor parameter (e.g., 2, 4, 8, ..., 32). Below is a comparison of our model using the PSMNet backbone at 128 Mpx resolution (top) and the original PSMNet at 0.5 Mpx resolution (bottom), both taking stereo pairs at 0.5 Mpx resolution as input.
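Under the hood, the continuous formulation lets the network be queried at arbitrary continuous image coordinates, so a higher output resolution simply corresponds to a denser grid of query points. The sketch below illustrates the idea; the function name and the normalized-coordinate convention are our own assumptions, not the repository's API:

import numpy as np

def make_query_grid(height, width, superes_factor=4):
    # Build continuous (x, y) query coordinates, normalized to [0, 1],
    # for an output superes_factor times denser than the input resolution.
    out_h, out_w = int(height * superes_factor), int(width * superes_factor)
    ys = (np.arange(out_h) + 0.5) / out_h
    xs = (np.arange(out_w) + 0.5) / out_w
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    # Shape (out_h * out_w, 2): one continuous query point per output pixel
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)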
Pretrained models
You can download pre-trained models on our UnrealStereo4K dataset from the following links:
- PSMNet + SMD Head (fine-tuned on KITTI)
Qualitative results
Disparity Visualization. Some qualitative results of the proposed SMD-Nets using PSMNet as stereo backbone. From left to right, the input image from the UnrealStereo4K test set, the predicted disparity and the corresponding error map. Please zoom-in to better perceive details near depth boundaries.
Point Cloud Visualization. Below we show point cloud visualizations on UnrealStereo4K for both the passive binocular stereo and the active depth datasets, adopting HSMNet as backbone. From left to right: the reference image, the results obtained using a standard disparity regression (i.e., a disparity point estimate), a unimodal Laplacian distribution, and our bimodal Laplacian mixture distribution. Note that our bimodal representation notably alleviates bleeding artifacts near object boundaries compared to both disparity regression and the unimodal formulation.
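One way to see why the bimodal representation avoids bleeding: instead of regressing a single value that averages foreground and background disparities near a boundary, the network can place one mode on each surface, and the point estimate snaps to the dominant mode. A hedged sketch of such a mode-selection step (variable names are ours; the exact rule used in this repository may differ):

import torch

def dominant_mode_disparity(pi, mu0, mu1):
    # Select, per pixel, the mode with the larger mixture weight
    return torch.where(pi > 0.5, mu0, mu1)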
Contacts
For questions, please send an email to [email protected]
Acknowledgements
We thank the authors who shared the code of their works. In particular: