Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv), BMVC 2021

Overview

DIFFNet

This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv), BMVC 2021.

A new backbone for self-supervised depth estimation.

If you find this work useful, please consider citing it.

@inproceedings{diffnet_bmvc,
    title     = {Self-Supervised Monocular Depth Estimation with Internal Feature Fusion},
    author    = {Hang Zhou and David Greenwood and Sarah Taylor},
    booktitle = {The British Machine Vision Conference (BMVC)},
    month     = {November},
    year      = {2021}}

** Paper, implementation details and trained models are coming soon **

Comparison with other methods

Evaluation on selected hard cases:

Trained weights

Acknowledgement

Thanks to the authors for their works:

Comments
  • Loss

    Hello, when I run the code I wonder whether you used the uncertain_mask and flipping_loss options, because I can't reproduce the accuracy from your paper at the 1024x320 resolution. Thanks for your reply.
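    For reference, one generic reading of a flip-based consistency term is sketched below: the disparity predicted on a horizontally flipped input, flipped back, is encouraged to agree with the prediction on the original image. This is only an illustration of the idea, not necessarily how flipping_loss is implemented in this repo, and disp_net is a placeholder for any network that returns a single disparity map.

    import torch
    import torch.nn.functional as F

    def flip_consistency_loss(disp_net, image):
        # Predictions on the original and on a horizontally flipped copy,
        # mapped back to the original orientation, should agree.
        disp = disp_net(image)                                # (B, 1, H, W)
        disp_flipped = disp_net(torch.flip(image, dims=[3]))  # flip along width
        disp_flipped_back = torch.flip(disp_flipped, dims=[3])
        return F.l1_loss(disp, disp_flipped_back)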

    opened by wangcong607 7
  • Missing Keys in Pretrained Weights

    Hi @brandleyzhou, thank you for your great work!

    I met the following problem when testing your pretrained models:

    Exception has occurred: RuntimeError
    Error(s) in loading state_dict for HRDepthDecoder:
    	Missing key(s) in state_dict: "convs.up_x9_0.conv.conv.weight", "convs.up_x9_0.conv.conv.bias", "convs.up_x9_1.conv.conv.weight", "convs.up_x9_1.conv.conv.bias", "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "convs.dispConvScale0.conv.weight", "convs.dispConvScale0.conv.bias", "convs.dispConvScale1.conv.weight", "convs.dispConvScale1.conv.bias", "convs.dispConvScale2.conv.weight", "convs.dispConvScale2.conv.bias", "convs.dispConvScale3.conv.weight", "convs.dispConvScale3.conv.bias", "decoder.0.conv.conv.weight", "decoder.0.conv.conv.bias", "decoder.1.conv.conv.weight", "decoder.1.conv.conv.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.2.conv_se.weight", "decoder.2.conv_se.bias", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.3.conv_se.weight", "decoder.3.conv_se.bias", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.4.conv_se.weight", "decoder.4.conv_se.bias", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight", "decoder.5.conv_se.weight", "decoder.5.conv_se.bias", "decoder.6.conv.weight", "decoder.6.conv.bias", "decoder.7.conv.weight", "decoder.7.conv.bias", "decoder.8.conv.weight", "decoder.8.conv.bias", "decoder.9.conv.weight", "decoder.9.conv.bias". 
    

    The pretrained weights are downloaded from this repository page. Specifically, I was testing two pretrained models:

    Could you please have a look at this and upload the complete models? Thanks in advance!
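    Until complete checkpoints are available, a quick way to see exactly which parameter names disagree is to diff the checkpoint keys against a freshly built decoder; the checkpoint path below is hypothetical, and the constructor calls mirror the ones used later in this thread.

    import torch
    import networks  # repo module, as used elsewhere in this thread

    encoder = networks.test_hr_encoder.hrnet18(False)
    encoder.num_ch_enc = [64, 18, 36, 72, 144]
    decoder = networks.HRDepthDecoder(encoder.num_ch_enc, [0])

    # Hypothetical path: point this at the downloaded decoder checkpoint
    state_dict = torch.load("diffnet_640x192/depth.pth", map_location="cpu")

    model_keys = set(decoder.state_dict().keys())
    ckpt_keys = set(state_dict.keys())
    print("in model but not in checkpoint:", sorted(model_keys - ckpt_keys)[:10])
    print("in checkpoint but not in model:", sorted(ckpt_keys - model_keys)[:10])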

    opened by ldkong1205 5
  • Saved trained models

    Hi, thank you for sharing your amazing code. I ran the training code and it trained for 20 epochs, but I don't know where the models are saved. Also, does your code save each epoch's results separately, or only the last epoch? And one last question: where can I change the number of epochs for training?
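    For anyone with the same question: assuming the trainer follows the monodepth2-style layout this codebase builds on, checkpoints are typically written to <log_dir>/<model_name>/models/weights_<epoch>/, one folder per saved epoch, and the epoch count is a command-line option (a --num_epochs flag appears in the multi-GPU issue further down). A small sketch for locating them, with both names treated as placeholders:

    import glob
    import os

    log_dir = "weights_logs"    # whatever was passed as --log_dir
    model_name = "mono_model"   # whatever was passed as --model_name

    # Assumed monodepth2-style layout: one weights_<epoch> folder per saved epoch
    for weights_dir in sorted(glob.glob(os.path.join(log_dir, model_name, "models", "weights_*"))):
        print(weights_dir, os.listdir(weights_dir))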

    opened by MohsenMoradiArt 5
  • Cannot reproduce results mentioned in the paper

    Hi, I trained your model with 640x192 and 1024x320 input sizes, but the results are different from what you reported in the paper.

    Here are the results I got:

    | Resolution | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
    |:----------:|:-------:|:------:|:----:|:--------:|:---:|:---:|:---:|
    | 640x192 | 0.108 | 0.792 | 4.589 | 0.186 | 0.889 | 0.963 | 0.982 |
    | 1024x320 | 0.103 | 0.909 | 4.642 | 0.183 | 0.899 | 0.965 | 0.982 |

    And here are the results mentioned in the paper:

    (Screenshots of the paper's result tables omitted.)

    I don't know what causes this difference, because when I used your pre-trained weights for evaluation I got the same results as yours. Do you have any idea why? Maybe the code has changed slightly, or could a different version of PyTorch cause this?
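    Self-supervised training from scratch also has some run-to-run variance, so it is worth pinning the obvious sources of randomness before comparing against the paper. The snippet below is a generic reproducibility sketch, not something the authors are known to have used, and identical seeds still do not guarantee identical numbers across PyTorch/CUDA versions.

    import random
    import numpy as np
    import torch

    def seed_everything(seed=0):
        # Fix the common sources of nondeterminism in a PyTorch training run
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False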

    opened by ArminMasoumian 4
  • Changing Input Size

    Hi, the provided code gives results for the 640x192 image size. Where can I change it to the original input size (1024x320) and train with that? Also, it seems that you add internal feature fusion to the original HRNet; I would like to remove that and test with the original HRNet. In "test_hr_encoder.py" I tried to remove "mixed_features" and only return "features", but then I get an error in the decoder. Is there any way to train your model with the original HRNet?

    opened by ArminMasoumian 4
  • Environment

    Hi, thank you for sharing your nice work.

    Could you share the environment setting such as versions of packages for this work?

    I cannot reproduce the results of this paper even when using the pretrained model provided in this repo.

    | | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
    |:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
    | My evaluation | 0.1024 | 0.7632 | 4.482 | 1.799 | 0.8954 | 0.9645 | 0.9831 |
    | Paper | 0.102 | 0.764 | 4.483 | 0.180 | 0.896 | 0.965 | 0.983 |

    opened by seb-le 4
  • Issue downloading the HRNet weights pretrained on ImageNet

    First of all, thank you for sharing this cool work.

    I ran into an error when running start2train.sh.

    The error (screenshot omitted) occurs while downloading the HRNet weights pretrained on ImageNet.

    I ran start2train.sh on another computer with a different IP address, but the same error occurred.

    Thank you.
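    If the automatic download keeps failing, a common workaround is to fetch the ImageNet checkpoint by hand (for example from the official HRNet-Image-Classification release) and load it from a local path. In the sketch below, the file name, the meaning of the False flag, and the use of strict=False are assumptions, not the repo's documented procedure.

    import torch
    import networks  # repo module, as used elsewhere in this thread

    # Assumed local copy of the HRNet-W18 ImageNet checkpoint, downloaded manually
    ckpt_path = "hrnetv2_w18_imagenet_pretrained.pth"

    encoder = networks.test_hr_encoder.hrnet18(False)  # assuming False skips the auto-download
    state_dict = torch.load(ckpt_path, map_location="cpu")
    missing, unexpected = encoder.load_state_dict(state_dict, strict=False)
    print("missing keys:", len(missing), "unexpected keys:", len(unexpected))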

    opened by seb-le 4
  • Training and testing issue

    Hi,

    When I try to test a single image by running "sh test_sample.sh", I get this error: "ModuleNotFoundError: No module named 'hr_networks'".

    Would you please let me know how I can get "hr_networks"?

    Also, when I tried to train the model, this error popped up:

    from .hrnet_config import MODEL_CONFIGS
      File "/media/armin/DATA/DIFFNet/networks/hrnet_config.py", line 5, in <module>
        from yacs.config import CfgNode as CN
    ModuleNotFoundError: No module named 'yacs'

    Am I missing something here?

    opened by ArminMasoumian 2
  • Where is the supplementary material mentioned in the paper?

    At the bottom of page 9 it says "The corresponding images are shown in the supplementary material.", but I can't find a supplementary material section in this paper. Is there a misunderstanding on my part? Thanks for your time.

    opened by Shiwen615 2
  • About model's FPS

    Hello, thank you for your good work!

    I'm measuring DIFFNet's FPS on an RTX 2080 Ti to compare our works fairly, but the FPS I get for DIFFNet and Monodepth2 are very different from those reported in your paper. Could I get the code you used to calculate FPS, please?

    I measured the fps with the following code.

    import torch
    import networks

    # Build the DIFFNet encoder/decoder and move them to the GPU in eval mode
    en = networks.test_hr_encoder.hrnet18(False)
    en.num_ch_enc = [64, 18, 36, 72, 144]
    de = networks.HRDepthDecoder(en.num_ch_enc, [0])
    device = torch.device('cuda')
    en.to(device)
    en.eval()
    de.to(device)
    de.eval()

    optimal_batch_size = 1
    dummy_input = torch.randn(optimal_batch_size, 3, 192, 640, dtype=torch.float).to(device)
    repetitions = 10000
    total_time = 0
    print("start calculate")
    with torch.no_grad():
        for rep in range(repetitions):
            starter = torch.cuda.Event(enable_timing=True)
            ender = torch.cuda.Event(enable_timing=True)
            starter.record()
            _ = de(en(dummy_input))
            ender.record()
            torch.cuda.synchronize()
            curr_time = starter.elapsed_time(ender) / 1000  # milliseconds -> seconds
            if rep != 0:  # discard the first iteration as warm-up
                total_time += curr_time
    repetitions = repetitions - 1  # one iteration was discarded
    print(total_time)
    Throughput = (repetitions * optimal_batch_size) / total_time
    print('Final FPS:', Throughput, ' total_time:', total_time)
    print("weight num: ", sum(p.numel() for p in en.parameters()) + sum(p.numel() for p in de.parameters()))
    

    And the following results were obtained for each model.

    | Model | FPS |
    |:------:|:------:|
    | DIFFNet | 34.92 |
    | Monodepth2 | 282.25 |
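    For comparison, a simpler wall-clock measurement with an explicit warm-up phase tends to give steadier numbers than per-iteration CUDA events; the sketch below reuses the same encoder/decoder construction as above, and whether it matches the paper's timing protocol is an open question.

    import time
    import torch
    import networks

    device = torch.device("cuda")
    en = networks.test_hr_encoder.hrnet18(False)
    en.num_ch_enc = [64, 18, 36, 72, 144]
    de = networks.HRDepthDecoder(en.num_ch_enc, [0])
    en.to(device).eval()
    de.to(device).eval()

    dummy = torch.randn(1, 3, 192, 640, device=device)
    with torch.no_grad():
        for _ in range(50):          # warm-up: let cuDNN pick kernels and clocks settle
            _ = de(en(dummy))
        torch.cuda.synchronize()
        start = time.time()
        n = 1000
        for _ in range(n):
            _ = de(en(dummy))
        torch.cuda.synchronize()
    print("FPS:", n / (time.time() - start))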

    opened by big-chan 2
  • The layer names in the decoder model and in depth.pth from diffnet_1024x320_ttr do not match

    Hello, it seems that the layer names in the decoder model and those in depth.pth from diffnet_1024x320_ttr do not match, which causes an error when running evaluate_depth.py.

    opened by czh0001 2
  • Training

    Thanks for your work. Here are some details I want to ask you about. My environment: torch 1.7.1+cu110, torchaudio 0.7.2, torchsummary 1.5.1, torchvision 0.8.2+cu110. I found that when I set the initial learning rate to 1e-4 for the first 14 epochs and then 1e-5 for the last 5 epochs, my experimental results are very different from yours. Is this due to a different PyTorch version, or is something wrong in my training process?
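    For reference, the schedule described (1e-4 for the first 14 epochs, then 1e-5) corresponds to a standard Adam + StepLR setup; the sketch below only illustrates that schedule with a dummy module standing in for the encoder and decoder, and is not claimed to match the repo's trainer.

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 8)  # placeholder for the encoder + decoder parameters
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # 1e-4 for epochs 0-13, then dropped by 10x to 1e-5 for the remaining epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=14, gamma=0.1)

    for epoch in range(19):
        # ... one training epoch would go here ...
        scheduler.step()
        print(epoch, scheduler.get_last_lr())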

    opened by ljy199712 3
  • Test file missing

    Thanks for your work on DIFFNet! I want to evaluate my training results on my PC, but the file "splits/eigen/gt_depths.npz" is required and I can't find it in the repository. Could you please provide this file? Thanks!
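    In monodepth2-style codebases this file is usually exported from the raw KITTI velodyne scans rather than shipped with the repo; the sketch below shows that export, assuming a kitti_utils.generate_depth_map helper like monodepth2's is available, with all paths as placeholders.

    import os
    import numpy as np
    from kitti_utils import generate_depth_map  # assumed helper, as in monodepth2

    data_path = "kitti_data"                     # root of the raw KITTI recordings (placeholder)
    split_dir = os.path.join("splits", "eigen")

    with open(os.path.join(split_dir, "test_files.txt")) as f:
        lines = f.read().splitlines()

    gt_depths = []
    for line in lines:
        folder, frame_id, _ = line.split()
        calib_dir = os.path.join(data_path, folder.split("/")[0])
        velo_path = os.path.join(data_path, folder, "velodyne_points", "data",
                                 "{:010d}.bin".format(int(frame_id)))
        # Project the LiDAR scan into camera 2 to get a sparse ground-truth depth map
        gt_depths.append(generate_depth_map(calib_dir, velo_path, 2, True).astype(np.float32))

    np.savez_compressed(os.path.join(split_dir, "gt_depths.npz"),
                        data=np.array(gt_depths, dtype=object))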

    opened by Renatusphere 1
  • Cityscapes model

    Hi. First, thank you for opening your nice paper and source code.

    Could you share checkpoints that were pretrained on Cityscapes and fine-tuned on KITTI (i.e., CS → K)?

    I would like to check whether the DIFFNet model I pretrained on Cityscapes is correct.

    Thanks!

    opened by seb-le 1
  • About torch::jit::trace

    Hello, thank you for sharing your work. I want to use LibTorch to deploy this network in C++, but when using torch::jit::trace() I get an error (running test_sample.py on its own works fine; error screenshots omitted). Because torch::jit::trace() cannot handle a dictionary output, I changed the output of depth_decoder to a list. There is also a line "import hr_networks" in test_sample.py, but I could not find hr_networks, and I don't know if this affects torch::jit::trace().

    Thank you very much!
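    A common workaround when torch::jit::trace cannot handle a dict output is to wrap the encoder and decoder in a small module whose forward returns a plain tensor; the sketch below does that on the Python side before exporting, and the output key ("disp", 0) is an assumption based on monodepth2-style decoders, not a documented interface of this repo.

    import torch
    import torch.nn as nn
    import networks

    class TraceableDIFFNet(nn.Module):
        """Wraps encoder + decoder so the traced graph returns a tensor, not a dict."""
        def __init__(self, encoder, decoder):
            super().__init__()
            self.encoder = encoder
            self.decoder = decoder

        def forward(self, x):
            outputs = self.decoder(self.encoder(x))
            # Key name is an assumption; use whatever key test_sample.py reads
            # for the full-resolution disparity.
            return outputs[("disp", 0)]

    encoder = networks.test_hr_encoder.hrnet18(False)
    encoder.num_ch_enc = [64, 18, 36, 72, 144]
    decoder = networks.HRDepthDecoder(encoder.num_ch_enc, [0])
    model = TraceableDIFFNet(encoder, decoder).eval()

    traced = torch.jit.trace(model, torch.randn(1, 3, 192, 640))
    traced.save("diffnet_traced.pt")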

    opened by Hugo699 3
  • Multi-GPU training hangs

    Hello, when I start multi-GPU training I run the following command: python -m torch.distributed.launch --nproc_per_node=2 train.py --split eigen_zhou --learning_rate 1e-4 --height 320 --width 1024 --scheduler_step_size 14 --batch_size 2 --model_name mono_model --png --data_path ../4_monodepth2/data/KITTI/ --num_epochs 40 --log_dir weights_logs

    If I set --nproc_per_node=1 it runs fine on a single GPU, but with --nproc_per_node=2 it only prints the messages up to the point where distributed training is initialized and then gets stuck. From nvidia-smi I can see both GPUs at 100% utilization, but training does not start (weight_logs also does not get created).

    I have attached a screenshot of where it gets stuck (omitted). Can you please help me figure out what this might be?

    Thank you for your time.
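    As a first diagnostic, it can help to check whether plain NCCL communication between the two GPUs works at all, independent of the training code; the script below is a generic sanity check (not part of this repo) to be launched with python -m torch.distributed.launch --nproc_per_node=2 check_dist.py. If this also hangs at the all_reduce, disabling GPU peer-to-peer transfers with the NCCL_P2P_DISABLE=1 environment variable is a common workaround to try.

    import argparse
    import torch
    import torch.distributed as dist

    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # injected by torch.distributed.launch
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")

    x = torch.ones(1, device="cuda") * dist.get_rank()
    dist.all_reduce(x)  # hangs here too if inter-GPU NCCL traffic is the problem
    print("rank", dist.get_rank(), "all_reduce result:", x.item())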

    opened by tushardmaske 2
Owner
Hang
Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments

Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments Paper: arXiv (ICRA 2021) Video : https://youtu.be/CC

Sachini Herath 68 Jan 3, 2023
Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Self-Supervised Multi-Frame Monocular Scene Flow 3D visualization of estimated depth and scene flow (overlayed with input image) from temporally conse

Visual Inference Lab @TU Darmstadt 85 Dec 22, 2022
[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

EPCDepth EPCDepth is a self-supervised monocular depth estimation model, whose supervision is coming from the other image in a stereo pair. Details ar

Rui Peng 110 Dec 23, 2022
Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR

Official implementation for paper "Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR"

Ziyue Feng 72 Dec 9, 2022
the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

NeurAI 4 Jul 27, 2022
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency[ECCV 2020]

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency(ECCV 2020) This is an official python implementati

null 304 Jan 3, 2023
Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

FeiyiFANG 5 Dec 13, 2021
arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

Andrej 671 Dec 31, 2022
Listing arxiv - Personalized list of today's articles from ArXiv

Personalized list of today's articles from ArXiv Print and/or send to your gmail

Lilianne Nakazono 5 Jun 17, 2022
Arxiv harvester - Poor man's simple harvester for arXiv resources

Poor man's simple harvester for arXiv resources This modest Python script takes

Patrice Lopez 5 Oct 18, 2022
The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

SpaceML 92 Nov 30, 2022
Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

ASFormer: Transformer for Action Segmentation This repo provides training & inference code for BMVC 2021 paper: ASFormer: Transformer for Action Segme

null 42 Dec 23, 2022
The pytorch implementation of SOKD (BMVC2021).

Semi-Online Knowledge Distillation Implementations of SOKD. Requirements This repo was tested with Python 3.8, PyTorch 1.5.1, torchvision 0.6.1, CUDA

null 4 Dec 19, 2021
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 11 Oct 21, 2021
[BMVC2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations"

DomainMix [BMVC2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations" [paper] [de

Wenhao Wang 17 Dec 20, 2022
Code for BMVC2021 "MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation"

MOS-Multi-Task-Face-Detect Introduction This repo is the official implementation of "MOS: A Low Latency and Lightweight Framework for Face Detection,

null 104 Dec 8, 2022
This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection, built on SECOND.

3D-CVF This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object

YecheolKim 97 Dec 20, 2022
a practicable framework used in Deep Learning. So far UDL only provide DCFNet implementation for the ICCV paper (Dynamic Cross Feature Fusion for Remote Sensing Pansharpening)

UDL UDL is a practicable framework used in Deep Learning (computer vision). Benchmark codes, results and models are available in UDL, please contact @

Xiao Wu 11 Sep 30, 2022