Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction

Overview

Official PyTorch code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction.
Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe, Elisa Ricci.
We apply Transformers to depth prediction and surface normal estimation.

Prepare Pretrained Model

We choose R50+ViT-B_16 as our encoder.

wget https://storage.googleapis.com/vit_models/imagenet21k/R50%2BViT-B_16.npz -O R50+ViT-B_16.npz
mkdir -p ./model/vit_checkpoint/imagenet21k
mv R50+ViT-B_16.npz ./model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz
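
To sanity-check the download, a minimal sketch (not part of the repo) that opens the checkpoint with numpy:

import numpy as np

# the .npz checkpoint is a dict-like archive of named weight arrays
# for the hybrid R50+ViT-B_16 encoder
weights = np.load('./model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz')
print(len(weights.files), 'arrays')  # number of stored parameters
print(weights.files[:5])             # a few parameter names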

Prepare Dataset

Prepare NYU Depth V2

mkdir -p pytorch/dataset/nyu_depth_v2
python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP pytorch/dataset/nyu_depth_v2/sync.zip
cd pytorch/dataset/nyu_depth_v2
unzip sync.zip
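
To confirm the extraction, a small sketch that counts the scene folders; it assumes sync.zip unpacks into a sync/ directory of per-scene folders (the layout used by the BTS preprocessing):

import glob

# count extracted NYU scene folders; the path layout is an assumption
scenes = glob.glob('pytorch/dataset/nyu_depth_v2/sync/*/')
print(len(scenes), 'scene folders')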

Prepare KITTI

cd dataset
mkdir kitti_dataset
cd kitti_dataset
### images: move kitti_archives_to_download.txt into kitti_dataset first, then:
wget -i kitti_archives_to_download.txt

### label
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_depth_annotated.zip
unzip data_depth_annotated.zip
cd train
mv * ../
cd ..  
rm -r train
cd val
mv * ../
cd ..
rm -r val
rm data_depth_annotated.zip
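
After these moves, each *_sync drive folder should sit directly under kitti_dataset with its projected ground truth inside. A quick sketch to verify; the glob pattern assumes the standard KITTI depth-annotation layout (proj_depth/groundtruth/image_02):

import glob

# count annotated depth maps; the directory pattern is an assumption
# based on the standard KITTI depth-annotation layout
gts = glob.glob('dataset/kitti_dataset/*_sync/proj_depth/groundtruth/image_02/*.png')
print(len(gts), 'annotated depth maps')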

Environment

pip install -r requirement.txt

Run

Train

CUDA_VISIBLE_DEVICES=0,1,2,3 python bts_main.py arguments_train_nyu.txt
CUDA_VISIBLE_DEVICES=0,1,2,3 python bts_main.py arguments_train_eigen.txt
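
Judging by the tracebacks in the comments below, bts_main.py spawns one worker per visible GPU via torch.multiprocessing, so set CUDA_VISIBLE_DEVICES to match the GPUs you actually have.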

Test: pick the best result

CUDA_VISIBLE_DEVICES=1 python bts_test.py arguments_test_nyu.txt
python ../utils/eval_with_pngs.py --pred_path vis_att_bts_nyu_v2_pytorch_att/raw/ --gt_path ../../dataset/nyu_depth_v2/official_splits/test/ --dataset nyu --min_depth_eval 1e-3 --max_depth_eval 10 --eigen_crop
CUDA_VISIBLE_DEVICES=1 python bts_test.py arguments_test_eigen.txt
python ../utils/eval_with_pngs.py --pred_path vis_att_bts_eigen_v2_pytorch_att/raw/ --gt_path ./dataset/kitti_dataset/ --dataset kitti --min_depth_eval 1e-3 --max_depth_eval 80 --do_kb_crop --garg_crop
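
For reference, a minimal sketch (not the repo's script) of the standard metrics such eval scripts report; the clipping mirrors the --min_depth_eval/--max_depth_eval flags above, and 'gt.png' is a placeholder file name:

import numpy as np
from PIL import Image

def depth_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    # mask out invalid ground truth, clip predictions to the eval range
    mask = (gt > min_depth) & (gt < max_depth)
    gt, pred = gt[mask], np.clip(pred[mask], min_depth, max_depth)
    thresh = np.maximum(gt / pred, pred / gt)
    a1, a2, a3 = [(thresh < 1.25 ** k).mean() for k in (1, 2, 3)]
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    return abs_rel, rmse, a1, a2, a3

# KITTI annotated depth PNGs store metres * 256 as uint16
gt = np.asarray(Image.open('gt.png'), dtype=np.float32) / 256.0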

Debug

CUDA_VISIBLE_DEVICES=1 python bts_main.py arguments_train_nyu_debug.txt

Download Pretrained Model

sh scripts/download_TransDepth_model.sh kitti_depth

sh scripts/download_TransDepth_model.sh nyu_depth

sh scripts/download_TransDepth_model.sh nyu_surfacenormal

Reference

BTS

ViT

Do's code

Visualization Results

We provide visualization results for all tasks. link

Comments
  • Pretrained Encoder link

    In the README, the pretrained encoder cmd should be:

    wget https://storage.googleapis.com/vit_models/imagenet21k/R50%2BViT-B_16.npz
    

    not the current:

    wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz
    
    opened by BrianPugh 8
  • Inconsistency between the text and code

    Hi,

    Thanks for the great work. In Fig. 2 of the paper, it is written that "*" stands for convolution. For example, I_{r-->r}^{i} * f_{r} in Eq. (8) means these two maps get convolved together. However, in the code you just use an element-wise multiplication between these two feature maps.

    My second question is about unfolding. It seems that after unfolding the input variable (https://github.com/ygjwd12345/TransDepth/blob/0a7422c6d816429b9f3fc4cca19d93de8cd1ab8a/pytorch/AttentionGraphCondKernel.py#L101), we get an output with the same spatial size but nine channels for each channel we already had. I was wondering whether spatial content is preserved by this kind of unfolding: if we sample the top-right corner of the spatial maps, do all the channels come from the same spatial location in the original map?
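
    To illustrate both points, a minimal standalone sketch (not the repo's code): an element-wise product keeps the two maps separate per pixel, and F.unfold with kernel 3 and padding 1 preserves spatial correspondence, which the assert below checks.

    import torch
    import torch.nn.functional as F

    # element-wise product of two same-shape maps (what the code does);
    # this is not a convolution of one map by the other
    a = torch.randn(1, 9, 4, 4)
    f = torch.randn(1, 9, 4, 4)
    elementwise = a * f                    # shape stays (1, 9, 4, 4)

    # unfold keeps the spatial size and stacks each pixel's 3x3
    # neighbourhood into 9 channels per input channel
    x = torch.arange(16.).reshape(1, 1, 4, 4)
    patches = F.unfold(x, kernel_size=3, padding=1).view(1, 9, 4, 4)
    # channel 4 is the centre tap, so every channel at pixel (i, j)
    # comes from the 3x3 neighbourhood of (i, j) in the original map
    assert torch.equal(patches[0, 4], x[0, 0])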

    Thanks,

    opened by Mathilda88 7
  • Problems during training

    Hello, I only have one GPU. When I try to train on the NYU dataset with the following command

    CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu.txt

    I get the following error:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/home/ace/PycharmProjects/TransDepth-main/pytorch/bts_main.py", line 439, in main_worker
        var_sum = np.sum(var_sum)
      File "<__array_function__ internals>", line 6, in sum
      File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2248, in sum
        initial=initial, where=where)
      File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
      File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
        return self.numpy()
    TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

    I found some solutions but none of them worked. How can I solve this? I sincerely look forward to your reply.
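
    A plausible fix, assuming var_sum is a list of CUDA scalar tensors as in the BTS codebase this repo builds on: move the values to host before the numpy reduction.

    # hypothetical patch for bts_main.py, not the authors' fix:
    # convert each CUDA scalar tensor to a Python float first
    var_sum = [var.sum().item() for var in model.parameters() if var.requires_grad]
    var_sum = np.sum(var_sum)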

    opened by ZYCheng777 7
  • How to get NYUv2 normal GT of training set?

    Hi, thanks for your work. I notice that you refer to https://github.com/MARSLab-UMN/TiltedImageSurfaceNormal in issue https://github.com/ygjwd12345/TransDepth/issues/11 for the NYUv2 dataset. However, that project only provides the images and surface normal GT for the test set. I wonder how I can get the normal GT for the NYUv2 training data so that the dataloader can work: https://github.com/ygjwd12345/TransDepth/blob/main/surface_normal/dataset_loader/dataset_loader_nyud_2.py#L35

    opened by mt-cly 6
  • Performance gap between baseline method: BTS

    Hi, thanks for the great work. When I read your paper, I found: "We choose the ResNet-50 with the same prediction head as our baseline", but there are no details about the decoder head design, so I came to GitHub to figure it out. I see that your method builds on BTS and uses its decoder:

    https://github.com/ygjwd12345/TransDepth/blob/3ae116f045243f24c72a4fc558634d0cf823fd1b/pytorch/bts.py#L347

    So, "We choose the ResNet-50 with the same prediction head as our baseline" means you replace the BTS encoder with ResNet-50, and preserve other setting the same. I recently reproduced the BTS with their official code, so I am a little bit familiar with its quantitative results. Although the result of the baseline on the NYU dataset is similar to the one reported in BTS, when it comes to the KITTI, I find that your baseline result is much lower than the one reported in BTS. As follows:

    NYU (Abs rel / RMSE / a1 / a2 / a3):
    TransDepth baseline (Table 2): 0.118 / 0.414 / 0.866 / 0.979 / 0.995
    BTS ResNet-50 (Table 5): 0.119 / 0.419 / 0.865 / 0.975 / 0.993

    KITTI (Abs rel / RMSE / a1 / a2 / a3):
    TransDepth baseline (Table 1): 0.106 / 3.981 / 0.888 / 0.967 / 0.986
    BTS ResNet-50 (Table 6): 0.061 / 2.803 / 0.954 / 0.992 / 0.998

    May I ask if I misunderstood, or did you use a different setting from the BTS?

    opened by zhyever 6
  • NYU Surface normal dataset

    Thanks for the great work!

    I have a question about the NYU surface normal dataset. I downloaded and unzipped it following the description (commands quoted below), but I cannot find the surface normals. Can you help me?

    Thanks.

    mkdir -p pytorch/dataset/nyu_depth_v2
    python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP pytorch/dataset/nyu_depth_v2/sync.zip
    cd pytorch/dataset/nyu_depth_v2
    unzip sync.zip
    
    opened by ChenyangLEI 5
  • The download link has something wrong!

    In the download_from_gdrive.py file, the link "https://docs.google.com/uc?export=download1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP" seems to be wrong: I can't access the website, and the error code is 400. Thanks for your help!
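
    For reference, Google Drive direct-download URLs normally take the form https://docs.google.com/uc?export=download&id=<FILE_ID>; the URL quoted above appears to have lost the &id= separator between the export parameter and the file ID.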

    opened by huacong 3
  • [Memory Address Question] How to control gpu memory usage in this code?

    Thank you for your excellent work. I encountered a CUDA out-of-memory error while running your code, which I think is caused by a lack of GPU memory. I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not disappear. Do you happen to know how to control this?

    Below is the full error text.

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
        model = BtsModel(args)
      File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
        self.encoder = ViT_seg(config_vit, img_size=[params.input_height,params.input_width], num_classes=config_vit.n_classes).cuda()
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
        module._apply(fn)
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
        module._apply(fn)
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
        module._apply(fn)
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
        param_applied = fn(param)
      File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
        return self._apply(lambda t: t.cuda(device))
    RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)

    opened by sjg02122 3
  • About AGD

    Excuse me, the current code does not seem to use AGD, because it is commented out in bts.py:

    def forward(self, x, focal, rank=0):
        skip_feat = self.encoder(x)
        # for i in range(len(skip_feat)):
        #     print(skip_feat[i].shape)
        # skip_feat[5] = self.AttentionGraphCondKernel(skip_feat[2], skip_feat[3], skip_feat[4], skip_feat[5], rank)
        return self.decoder(skip_feat, focal)

    But I found that training works as well as it does with AGD. Why is that?

    opened by kuangqi93 2
  • Kitti Pretrained download Error

    Thank you for your excellent work.

    Currently, if I try to download a checkpoint using the .sh file in your scripts, it fails with a 404 error. Could you check the pretrained file link?

    The following is the full text of the error.

    Note: available models are kitti_depth, nyu_depth, and nyu_surfacenormal
    Specified [kitti_dpeth]
    WARNING: timestamping does nothing in combination with -O. See the manual for details.

    --2022-09-27 08:03:41-- http://disi.unitn.it/~hao.tang/uploads/models/TransDepth/kitti_dpeth_pretrained.tar.gz
    Resolving disi.unitn.it (disi.unitn.it)... 193.205.194.4
    Connecting to disi.unitn.it (disi.unitn.it)|193.205.194.4|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2022-09-27 08:03:42 ERROR 404: Not Found.

    opened by sjg02122 1
  • How do I train the model on a windows machine?

    I already solved several errors arising from the code not being directly compatible with Windows, but the train command now gives an error which I can't seem to fix. Can you help me out with this?

    I am using Torch version: 1.8.0 and CUDA: 11.4

    opened by shreyash0502 1
  • Improve OS related lines and add .gitignore

    Hi, when I try to run your code on Windows, there are some OS-incompatibility errors related to the os.system(command) function. I replaced the "make directory" command with os.makedirs() and the "cp files" command with util.copyfile() to make your code more robust.

    I also added a Python .gitignore.

    opened by syKevinPeng 0