Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

Qin Wang

Last update: Nov 30, 2022

Overview

CorDA

Prerequisite

Please create and activate the following conda environment:

# It may take several minutes for conda to solve the environment
conda env create -f environment.yml
conda activate corda

The code was tested on a V100 GPU with 16 GB of memory.

Train a CorDA model

# Train for the SYNTHIA2Cityscapes task
bash run_synthia_stereo.sh
# Train for the GTA2Cityscapes task
bash run_gta.sh

Test the trained model

bash shells/eval_syn2city.sh
bash shells/eval_gta2city.sh

Pre-trained models are provided (Google Drive). Please put them in ./checkpoint.

The provided SYNTHIA2Cityscapes model achieves 56.3 mIoU (16 classes) at the end of the training.
The provided GTA2Cityscapes model achieves 57.7 mIoU (19 classes) at the end of the training.
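
For readers unfamiliar with the metric, the following minimal sketch shows how per-class IoU and mIoU are commonly derived from a confusion matrix. It is a plain-Python illustration, not the repository's evaluation code; the function names and the toy matrix are hypothetical, and skipping classes that never occur is one possible convention among several.

```python
def per_class_iou(confusion):
    """Return the IoU of each class from a square confusion matrix.

    confusion[i][j] counts pixels with ground-truth class i predicted as class j.
    A class with no ground-truth and no predicted pixels gets None.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        gt_pixels = sum(confusion[c])
        pred_pixels = sum(confusion[r][c] for r in range(num_classes))
        union = gt_pixels + pred_pixels - true_positive
        ious.append(true_positive / union if union > 0 else None)
    return ious


def mean_iou(ious):
    """Average the IoU values, skipping classes that never occur (None)."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy 3-class example with hypothetical pixel counts.
toy_confusion = [
    [8, 2, 0],
    [1, 6, 1],
    [0, 0, 2],
]
toy_ious = per_class_iou(toy_confusion)
toy_miou = mean_iou(toy_ious)
print(toy_ious, toy_miou)
```

Note that a class which is present in the ground truth but never predicted correctly gets an IoU of 0 and still counts towards the mean in this sketch.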

Reported Results on SYNTHIA2Cityscapes

Method	mIoU*(13)	mIoU(16)
CBST	48.9	42.6
FDA	52.5	-
DADA	49.8	42.6
DACS	54.8	48.3
CorDA	62.8	55.0

Citation

Please cite our work if you find it useful.

@article{wang2021domain,
  title={Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation},
  author={Wang, Qin and Dai, Dengxin and Hoyer, Lukas and Fink, Olga and Van Gool, Luc},
  journal={arXiv preprint arXiv:2104.13613},
  year={2021}
}

Acknowledgement

DACS (official implementation) is used as our codebase and our DA baseline.
SFSU (official implementation) is the source of the stereo Cityscapes depth estimation.

Data links

Download links
- Stereo Depth Estimation for Cityscapes
- Mono Depth Estimation for GTA
- SYNTHIA Depth and images: SYNTHIA-RAND-CITYSCAPES (CVPR16)

Dataset Folder Structure Tree

For questions regarding the code, please contact [email protected] .

Comments

Training on a custom dataset without ground truth label

From what I understand after reading your paper, you do not need ground truth label data on the target domain to train the pseudo labels. However, when I look at cityscapes_loader, it seems I need to supply the ground truth seg maps as well.

I am trying to train the network on a custom dataset (which has only depth maps, with ground truth seg maps only on the source domain), but it looks like I cannot get away without providing them. Do you have any thoughts on this?

opened by chophilip21 6
Confusion about the 'depth' of Cityscapes

Hello, nice work, but I have some questions.

In 'data/cityscapes_loader.py', lines 181-183:

depth = cv2.imread(depth_path, flags=cv2.IMREAD_ANYDEPTH).astype(np.float32) / 256. + 1.
if depth.shape != lbl.shape:
    depth = cv2.resize(depth, lbl.shape[::-1], interpolation=cv2.INTER_NEAREST)
# Monocular depth: in disparity form 0 - 65535

(1) Why is the depth calculated as x/256+1?
(2) Is it the depth or the disparity? The official Cityscapes documentation says disparity = (x-1)/256.

Thank you!

opened by ganyz 6
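
To make the two formulas in this question concrete, here is a small sketch that places them side by side for a raw 16-bit pixel value. It only restates the arithmetic quoted above; it does not settle which encoding the provided depth files use. The function names are hypothetical, the treatment of a raw value of 0 as invalid follows the Cityscapes convention, and the baseline and focal length passed in the usage lines are made-up illustrative numbers, not calibration values.

```python
def decode_loader_style(raw_value):
    """Decoding as written in the quoted loader lines: x / 256 + 1."""
    return raw_value / 256.0 + 1.0


def decode_cityscapes_disparity(raw_value):
    """Cityscapes disparity convention: (x - 1) / 256, where 0 marks an invalid pixel."""
    if raw_value == 0:
        return None
    return (raw_value - 1.0) / 256.0


def disparity_to_depth(disparity, baseline_m, focal_px):
    """Standard stereo relation: depth = baseline * focal length / disparity."""
    return baseline_m * focal_px / disparity


for raw in (0, 257, 65535):
    print(raw, decode_loader_style(raw), decode_cityscapes_disparity(raw))

# Illustrative camera parameters only (assumed, not taken from any calibration file).
print(disparity_to_depth(2.0, baseline_m=0.5, focal_px=100.0))
```

The loader-style decoding maps the raw range 0 - 65535 to values of at least 1, so it never produces zero, whereas the disparity convention reserves the raw value 0 for invalid measurements.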
gta2city

When I tried to reproduce your GTA2Cityscapes performance, the mIoU only reached about 54.8 after 250,000 iterations. I didn't change anything except using CUDA 10.2. Could you please provide the training log of your GTA2Cityscapes run? Thanks a lot!!

opened by xiaoachen98 6
Question about the pretrained parameters of backbone

Thanks for sharing the code; it brings an amazing improvement to this field.

I notice that you use a backbone pretrained on MSCOCO, the same as DACS. Have you tried a backbone pretrained on ImageNet? If yes, could you please provide the corresponding results?

opened by super233 4
About intrinsics used in GTA depth estimation

Thanks a lot for your fantastic work. Following the depth estimation procedure you mentioned in issue #7, I went to https://playing-for-benchmarks.org. However, its camera calibration does not include the intrinsic matrix directly, which is needed for Monodepth2 depth estimation. Would you kindly share the GTA intrinsics you used for depth estimation? Or is there a way to convert GTA's projection matrix to an intrinsic matrix?

opened by Ichinose0code 2

Why does the class train have 0 IoU? What could have happened?

I downloaded your pretrained model and ran the evaluation, but the IoU of the train class is 0.0.

(yy_corda) ailab@ailab:/media/ailab/data/yy/corda$ bash shells/eval_gta2city.sh
./checkpoint/gta
Found 500 val images
Evaluating, found 500 batches.
100 processed
200 processed
300 processed
400 processed
500 processed
class  0 road         IU 94.81
class  1 sidewalk     IU 62.18
class  2 building     IU 88.03
class  3 wall         IU 33.09
class  4 fence        IU 43.51
class  5 pole         IU 39.93
class  6 traffic_light IU 49.46
class  7 traffic_sign IU 54.68
class  8 vegetation   IU 88.01
class  9 terrain      IU 47.67
class 10 sky          IU 89.22
class 11 person       IU 68.22
class 12 rider        IU 39.21
class 13 car          IU 90.25
class 14 truck        IU 51.43
class 15 bus          IU 58.37
class 16 train        IU 0.00
class 17 motorcycle   IU 40.38
class 18 bicycle      IU 57.42
meanIOU: 0.5767768805758403

I trained my own model and tested it with eval_syn2city.py. There, 3 classes have an IoU of 0.0 because those classes are missing in the source domain. But when I download the pretrained model and run eval_gta2city.sh, one class, train, still has an IoU of 0. So I want to know why. Is it because the train class does not appear in the Cityscapes dataset, so its IoU is 0?

opened by yuheyuan 2
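
As a sanity check on the log above, the reported meanIOU can be reproduced by averaging the 19 printed per-class values, which shows that the zero for train is included in the mean. The sketch below only redoes this arithmetic with the numbers copied from the log; it does not explain why train scores 0, and the variable names are hypothetical.

```python
# Per-class IoU values (in percent) copied from the evaluation log above.
class_iou = {
    "road": 94.81, "sidewalk": 62.18, "building": 88.03, "wall": 33.09,
    "fence": 43.51, "pole": 39.93, "traffic_light": 49.46, "traffic_sign": 54.68,
    "vegetation": 88.01, "terrain": 47.67, "sky": 89.22, "person": 68.22,
    "rider": 39.21, "car": 90.25, "truck": 51.43, "bus": 58.37,
    "train": 0.00, "motorcycle": 40.38, "bicycle": 57.42,
}

# Mean over all 19 classes, as reported by the evaluation script (about 57.68).
mean_all = sum(class_iou.values()) / len(class_iou)

# Mean over the classes with a non-zero IoU, for comparison only.
nonzero = [v for v in class_iou.values() if v > 0.0]
mean_nonzero = sum(nonzero) / len(nonzero)

print(f"mean over {len(class_iou)} classes: {mean_all:.2f}")
print(f"mean over {len(nonzero)} non-zero classes: {mean_nonzero:.2f}")
```

The first mean matches the logged meanIOU up to the rounding of the printed per-class values, so the 57.7 mIoU figure already accounts for the zero on train.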

How to obtain your depth datasets?
Hi, thanks for your great work!

It would be great if you can elaborate more on how you obtain the monocular depth estimation.

I understand that you've uploaded the dataset, but it would be really helpful if I know exactly how you've done it.

From your paper, in the ablation study part: "We would like to highlight that for both stereo and monocular depth estimations, only stereo pairs or image sequences from the same dataset are used to train and generate the pseudo depth estimation model. As no data from external datasets is used, and stereo pairs and image sequences are relatively easy to obtain, our proposal of using self-supervised depth have the potential to be effectively realized in real-world applications."

So I imagine you get your monocular depth pseudo ground truth by:

(1) Downloading target domain videos (here Cityscapes. Btw, where do you get Cityscapes videos?)
(2) Training a Monodepth2 model on those videos (for how long?)
(3) Using the model to get the pseudo ground truth
(4) Repeating the same steps for the source domain (GTA 5 or Synthia)

Am I getting it right? And are there any other important points you want to highlight when calculating such depth labels?

Regards, Tu
opened by tudragon154203 2
How to continue training?

When I use a script like

CUDA_VISIBLE_DEVICES=0 python3 -u trainUDA_gta.py --config ./configs/configUDA_gta2city.json --name UDA-gta --resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth | tee ./gta-corda.log

It would run again but the new checkpoint would be saved.

opened by ygjwd12345 2

Warning: optimizer contains a parameter group with duplicate parameters

I followed your code and trained a model, but its results are not as good as expected.

I evaluated the model you shared:

bash shells/eval_syn2city.sh

Your shared model, syn2city, 19 classes: meanIoU 0.4771
The model I trained, syn2city, 19 classes: meanIoU only 0.46.7

During training I see the following warning, so I want to know whether it may cause the drop in results.

/home/ailab/anaconda3/envs/yy_CORDA/lib/python3.7/site-packages/torch/optim/sgd.py:68: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
  super(SGD, self).__init__(params, defaults)
D_init tensor(134.8489, device='cuda:0', grad_fn=<DivBackward0>) D tensor(134.5171, device='cuda:0', grad_fn=<DivBackward0>)

opened by yuheyuan 1

deeplabv2_synthia.py may have extra indentation

In the forward code, return out has an extra level of indentation:

    def forward(self, x):
        out = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list)-1):
            out += self.conv2d_list[i+1](x)
            return out

This is the code in your repository:

class Classifier_Module(nn.Module):

    def __init__(self, dilation_series, padding_series, num_classes):
        super(Classifier_Module, self).__init__()
        self.conv2d_list = nn.ModuleList()
        for dilation, padding in zip(dilation_series, padding_series):
            self.conv2d_list.append(nn.Conv2d(256, num_classes, kernel_size=3, stride=1, padding=padding, dilation=dilation, bias = True))

        for m in self.conv2d_list:
            m.weight.data.normal_(0, 0.01)

    def forward(self, x):
        out = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list)-1):
            out += self.conv2d_list[i+1](x)
            return out

I think the forward below is the intended one. Your code uses a list containing four elements, so if return out is indented inside the loop, the function returns during the first iteration and the third and fourth convolutions are never used.

    def forward(self, x):
        out = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list)-1):
            out += self.conv2d_list[i+1](x)
        return out

    self._make_pred_layer(Classifier_Module, [6, 12, 18, 24], [6, 12, 18, 24], NUM_OUTPUT[task])

    def _make_pred_layer(self, block, dilation_series, padding_series, num_classes):
        return block(dilation_series,padding_series,num_classes)

opened by yuheyuan 1
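
The effect of the indentation discussed in this issue can be checked without PyTorch. The sketch below replaces the four convolution branches with plain hypothetical functions that scale their input by 1, 2, 3 and 4, so the sum makes it visible which branches contributed. It illustrates the control flow only and makes no claim about which variant the released checkpoints were trained with.

```python
def forward_return_inside_loop(branches, x):
    """Variant quoted from the repository: return is indented inside the loop."""
    out = branches[0](x)
    for i in range(len(branches) - 1):
        out += branches[i + 1](x)
        return out  # exits during the first iteration


def forward_return_after_loop(branches, x):
    """Variant proposed in the issue: return after the loop has finished."""
    out = branches[0](x)
    for i in range(len(branches) - 1):
        out += branches[i + 1](x)
    return out


# Hypothetical stand-ins for the four dilated convolutions.
branches = [lambda x, k=k: k * x for k in (1, 2, 3, 4)]

inside = forward_return_inside_loop(branches, 1)  # 1 + 2
after = forward_return_after_loop(branches, 1)    # 1 + 2 + 3 + 4
print(inside, after)
```

With the return inside the loop only the first two branches are summed; moving it out of the loop sums all four.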

Checkpoint links fail

I can't download the checkpoint files from your links. When I open the Google Drive link, the file size is shown as 2 GB, but the downloaded file is only 0 B.

opened by xiaoachen98 0

Releases(iccv)

iccv(Sep 30, 2021)

Source code(tar.gz)
Source code(zip)
poster_iccv21.pdf(2.01 MB)

Owner

Qin Wang

PhD student @ ETH Zürich

GitHub

