Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

Yi-Hsuan Tsai

Last update: Dec 30, 2022

Related tags

Deep Learning computer-vision deep-learning pytorch generative-adversarial-network semantic-segmentation adversarial-learning domain-adaptation

Overview

PyTorch implementation of our method for adapting semantic segmentation from a synthetic dataset (source domain) to a real dataset (target domain). Based on this implementation, our result ranked 3rd in the VisDA Challenge.

Contact: Yi-Hsuan Tsai (wasidennis at gmail dot com) and Wei-Chih Hung (whung8 at ucmerced dot edu)

Paper

Learning to Adapt Structured Output Space for Semantic Segmentation
Yi-Hsuan Tsai*, Wei-Chih Hung*, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang and Manmohan Chandraker
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight) (* indicates equal contribution).

Please cite our paper if you find it useful for your research.

@inproceedings{Tsai_adaptseg_2018,
  author = {Y.-H. Tsai and W.-C. Hung and S. Schulter and K. Sohn and M.-H. Yang and M. Chandraker},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  title = {Learning to Adapt Structured Output Space for Semantic Segmentation},
  year = {2018}
}

Example Results

Quantitative Results

Installation

Install PyTorch from http://pytorch.org with Python 2 and CUDA 8.0
NEW Add the LS-GAN objective to improve the performance
- Usage: add --gan LS option during training (see below for more details)
PyTorch 0.4 with Python 3 and CUDA 8.0
- Usage: replace the training and evaluation codes with the ones in the pytorch_0.4 folder
- Update: tensorboard is provided by adding --tensorboard in the command
- Note: the single-level model works as expected, while the multi-level model requires smaller weights, e.g., --lambda-adv-target1 0.00005 --lambda-adv-target2 0.0005. We will investigate this issue soon.
Clone this repo

git clone https://github.com/wasidennis/AdaptSegNet
cd AdaptSegNet

Dataset

Download the GTA5 Dataset as the source domain, and put it in the data/GTA5 folder
Download the Cityscapes Dataset as the target domain, and put it in the data/Cityscapes folder

Pre-trained Models

Please find our pre-trained models using ResNet-101 on three benchmark settings here
They include baselines (without adaptation and with feature adaptation) and our models (single-level and multi-level)

Testing

NEW Update results using LS-GAN and using Synscapes as the source domain
- Performance: check the appendix of the updated arXiv paper (updated on 10/17/2019)
- Pre-trained models
Download the pre-trained multi-level GTA5-to-Cityscapes model and put it in the model folder
Test the model and results will be saved in the result folder

python evaluate_cityscapes.py --restore-from ./model/GTA2Cityscapes_multi-ed35151c.pth

Or, test the VGG-16 based model Model Link

python evaluate_cityscapes.py --model DeeplabVGG --restore-from ./model/GTA2Cityscapes_vgg-ac4ac9f6.pth

Compute the IoU on Cityscapes (thanks to the code from VisDA Challenge)

python compute_iou.py ./data/Cityscapes/data/gtFine/val result/cityscapes
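For reference, here is a minimal sketch of how per-class IoU and mean IoU are commonly computed from a confusion matrix. It is not the code in compute_iou.py; the function names, the toy labels, and the use of 255 as an ignore label are illustrative assumptions.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count (ground-truth, prediction) pairs over flat label lists."""
    hist = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:
            continue  # ignored pixels do not contribute (assumed convention)
        hist[g][p] += 1
    return hist


def per_class_iou(hist):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); None if class c never appears."""
    n = len(hist)
    ious = []
    for c in range(n):
        tp = hist[c][c]
        fn = sum(hist[c]) - tp                         # class c missed
        fp = sum(hist[r][c] for r in range(n)) - tp    # wrongly predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Average over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy example with three classes and one ignored pixel.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
hist = confusion_matrix(gt, pred, num_classes=3)
ious = per_class_iou(hist)
miou = mean_iou(ious)
```

In practice the confusion matrix would be accumulated over all validation images before the per-class IoU is taken.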

Training Examples

NEW Train the GTA5-to-Cityscapes model (single-level with LS-GAN)

python train_gta2cityscapes_multi.py --snapshot-dir ./snapshots/GTA2Cityscapes_single_lsgan \
                                     --lambda-seg 0.0 \
                                     --lambda-adv-target1 0.0 --lambda-adv-target2 0.01 \
                                     --gan LS
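To illustrate what switching the adversarial objective can mean, here is a minimal sketch contrasting the vanilla GAN loss (binary cross-entropy on a discriminator logit) with a least-squares loss on the same output. It is plain Python on a single scalar; the function names are hypothetical, and how the training script wires up --gan LS internally is an assumption rather than something stated in this README.

```python
import math


def vanilla_gan_loss(logit, target):
    """Binary cross-entropy on a raw discriminator logit (target is 0.0 or 1.0)."""
    # Numerically stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))].
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def ls_gan_loss(output, target):
    """Least-squares objective: squared distance between output and target label."""
    return (output - target) ** 2


# An undecided discriminator output (0.0) with target label 1.0.
bce_at_zero = vanilla_gan_loss(0.0, 1.0)   # equals log(2)
ls_at_zero = ls_gan_loss(0.0, 1.0)         # equals 1.0

# A very confident output: cross-entropy is near zero, while the
# least-squares loss still penalizes the distance from the target.
bce_confident = vanilla_gan_loss(10.0, 1.0)
ls_confident = ls_gan_loss(10.0, 1.0)
```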

Train the GTA5-to-Cityscapes model (multi-level)

python train_gta2cityscapes_multi.py --snapshot-dir ./snapshots/GTA2Cityscapes_multi \
                                     --lambda-seg 0.1 \
                                     --lambda-adv-target1 0.0002 --lambda-adv-target2 0.001

Train the GTA5-to-Cityscapes model (single-level)

python train_gta2cityscapes_multi.py --snapshot-dir ./snapshots/GTA2Cityscapes_single \
                                     --lambda-seg 0.0 \
                                     --lambda-adv-target1 0.0 --lambda-adv-target2 0.001
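The training commands differ only in their weights. As a rough sketch, assuming the flags act as scalar weights on an auxiliary segmentation loss and on two adversarial losses (one per output level), the combination could look like the following. The function name, the argument names, and the loss values are made up for illustration and are not taken from the training script.

```python
def generator_objective(seg_main, seg_aux, adv_aux, adv_main,
                        lambda_seg, lambda_adv_target1, lambda_adv_target2):
    """Weighted sum of segmentation and adversarial terms (illustrative only)."""
    # Source-domain term: main segmentation loss plus weighted auxiliary loss.
    source_term = seg_main + lambda_seg * seg_aux
    # Target-domain term: adversarial losses on the two output levels.
    target_term = lambda_adv_target1 * adv_aux + lambda_adv_target2 * adv_main
    return source_term + target_term


# Hypothetical loss values.
losses = dict(seg_main=1.0, seg_aux=2.0, adv_aux=0.7, adv_main=0.7)

# Multi-level setting: both levels contribute.
multi = generator_objective(lambda_seg=0.1, lambda_adv_target1=0.0002,
                            lambda_adv_target2=0.001, **losses)

# Single-level setting: the auxiliary terms are switched off by zero weights.
single = generator_objective(lambda_seg=0.0, lambda_adv_target1=0.0,
                             lambda_adv_target2=0.001, **losses)
```

Under this reading, the single-level commands reuse the multi-level script and simply zero out the auxiliary branch.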

Related Implementation and Dataset

Y.-H. Tsai, K. Sohn, S. Schulter, and M. Chandraker. Domain Adaptation for Structured Output via Discriminative Patch Representations. In ICCV, 2019. (Oral) [paper] [project] [Implementation Guidance]
W.-C. Hung, Y.-H Tsai, Y.-T. Liou, Y.-Y. Lin, and M.-H. Yang. Adversarial Learning for Semi-supervised Semantic Segmentation. In BMVC, 2018. [paper] [code]
Y.-H. Chen, W.-Y. Chen, Y.-T. Chen, B.-C. Tsai, Y.-C. Frank Wang, and M. Sun. No More Discrimination: Cross City Adaptation of Road Scene Segmenters. In ICCV 2017. [paper] [project]

Acknowledgment

This code is heavily borrowed from Pytorch-Deeplab.

Note

The model and code are available for non-commercial research purposes only.

10/2019: update performance and training/evaluation code for using LS-GAN and Synscapes (special thanks to Yan-Ting Liu for helping with the experiments)
01/2019: update the training code for PyTorch 0.4
07/23/2018: update evaluation code for PyTorch 0.4
06/04/2018: update pretrained VGG-16 model
02/2018: code released

Comments

Couldn't reproduce the result reported in the paper

Hi, thanks for the code! I am trying to reproduce your result using your code, but I only get an mIoU of 40 after 45000 iterations. The result gets even worse after 65000 iterations. Is the method very sensitive to the number of training iterations? Also, are there other things to keep in mind when using your code?

opened by YifeiAI 23

Bug on deeplab classifier model

class Classifier_Module(nn.Module):
    def __init__(self, inplanes, dilation_series, padding_series, num_classes):
        super(Classifier_Module, self).__init__()
        self.conv2d_list = nn.ModuleList()
        for dilation, padding in zip(dilation_series, padding_series):
            self.conv2d_list.append(
                nn.Conv2d(inplanes, num_classes, kernel_size=3, stride=1, padding=padding, dilation=dilation, bias=True))

        for m in self.conv2d_list:
            m.weight.data.normal_(0, 0.01)

    def forward(self, x):
        out = self.conv2d_list[0](x)
        for i in range(len(self.conv2d_list) - 1):
            out += self.conv2d_list[i + 1](x)
            return out

Why is the "return out" statement inside the for loop? In my understanding, out is the sum over the pyramid layers, so the return should be outside the for loop.

opened by lianqing11 15
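The effect of the indentation can be shown with a small stand-alone sketch. It uses plain Python callables instead of nn.Conv2d layers, so the names and numbers are illustrative assumptions only: with the return inside the loop, the function exits during the first iteration and only the first two branches are summed.

```python
def forward_return_inside(branches, x):
    """Mirrors the quoted forward(): the return sits inside the loop body."""
    out = branches[0](x)
    for i in range(len(branches) - 1):
        out += branches[i + 1](x)
        return out  # exits during the first iteration


def forward_return_outside(branches, x):
    """Variant that sums every branch before returning."""
    out = branches[0](x)
    for i in range(len(branches) - 1):
        out += branches[i + 1](x)
    return out


# Four stand-in "branches" that scale the input by 1, 2, 3 and 4.
branches = [lambda x, k=k: k * x for k in (1, 2, 3, 4)]

inside = forward_return_inside(branches, 1)    # 1 + 2
outside = forward_return_outside(branches, 1)  # 1 + 2 + 3 + 4
```

Note that with a single branch the first variant never enters the loop and returns None.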

About baseline result with res101-deeplabv2

Hi @wasidennis, I used your provided res101 backbone and trained DeepLab v2 on GTA with your code. Then I evaluated the model on Cityscapes, but I only got 31.1, which is far from your baseline result of 36.6. Did I miss anything? Looking forward to your reply.

opened by shanyuhu 7

Training takes a long time?

Hi,

Been trying to replicate the gta -> cityscapes results on an AWS instance (p2.xlarge). I am using docker with cuda 8.0 and pytorch 0.4.1. Running the following:

python train_gta2cityscapes_multi.py --snapshot-dir ./snapshots/GTA2Cityscapes_single_lsgan \
                                     --lambda-seg 0.0 \
                                     --lambda-adv-target1 0.0 --lambda-adv-target2 0.01 \
                                     --gan LS

It takes about 5.7 seconds per iteration. Given that the model converges at 120k iterations, it's gonna take me more than a week to train it, which sounds insane. Is there something wrong here or are those the expected times?
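The estimate can be checked with a quick calculation, using only the two numbers quoted in this post (5.7 seconds per iteration and 120k iterations):

```python
seconds_per_iteration = 5.7
iterations = 120_000

total_seconds = seconds_per_iteration * iterations
total_hours = total_seconds / 3600
total_days = total_seconds / 86_400  # seconds in a day

print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

This gives 190 hours, or roughly 7.9 days, which is consistent with "more than a week".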

opened by defqoon 6

RuntimeError: cuda runtime error (2) : out of memory

Hello. I tried to execute your code, but it gives an out-of-memory error. I am working with an NVIDIA Titan Xp, which has 12GB of memory. It seems that the input sizes of the source and target images are quite big ((1280,720) and (1024,512)).

Can I ask which GPU devices you use for training?

opened by wgchang 6

Understanding the weights for multi-level adversarial learning

Hello, this is great work. I saw in the README that in the PyTorch 0.4 version the multi-level model needs the adversarial learning weights to be 0.0005 and 0.00005 respectively. I would like to know how you came up with these values. I am asking because I am working on a different dataset, and it would be very helpful to understand how to choose reasonable values for the adversarial learning weights.

opened by AshStuff 5

Error when training using VGG-16 model

@wasidennis @m3phisto @hfslyc, I am trying to run the code (VGG source only). But when I run it as you mentioned in a previous thread, with --lambda-adv-target1 0 --lambda-adv-target2, I get this error:

  File "train_gta2cityscapes_multi.py", line 311, in main
    pred1, pred2 = model(images)
ValueError: not enough values to unpack (expected 2, got 1)

Are there any modifications that have to be made for the source-only (without adaptation) experiment? If so, I would be grateful if you could share the code.

opened by alphjheon 5

How is the trend of loss changing?

I trained on my own data using this method. loss_adv1 increases, while loss_seg1 and loss_D1 decrease. In this situation, should I make LAMBDA_ADV_TARGET1 larger?

opened by Sunting78 5

RuntimeError: there are no graph nodes that require computing gradients

Hi, thank you for sharing your code. I ran into the following problem:

Traceback (most recent call last):
  File "train_gta2cityscapes_multi.py", line 412, in <module>
    main()
  File "train_gta2cityscapes_multi.py", line 304, in main
    loss.backward()
  File "/home/yaxing/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/yaxing/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
RuntimeError: there are no graph nodes that require computing gradients

Note that I never changed the code. The error happens during training rather than at the beginning. It has occurred three times (iter: 25281/25000, 8500/25000, 600/25000). PyTorch is 0.2.0_4 with Python 2.7.

Please forgive me for bothering you.

opened by yaxingwang 5

feature space adaptation

Hi, thanks for the great work and the code. In the original implementation, I only found the output space adaptation, since both outputs from deeplab_multi come from the classifier layers and are then upsampled for the later alignment. So if I want to implement real feature space adaptation, I should take the feature maps right before the classifier layers, right? One more question: the output from layer5 is, in my opinion, not shallow enough. Would aligning features from very shallow layers help the domain adaptation?

Looking forward to your reply.

opened by jianingwangind 4

Train the model on other dataset.

Hello! I recently trained your model successfully on GTA5 and Cityscapes, and I want to try it on another dataset. I am a newbie in this domain and I ran into a problem. The new dataset's labels are black-and-white images, and I suspect there may be some problems with the labels. What does "label2train" mean in the json file? I really cannot understand these numbers ([0, 255], [1, 255], [2, 255], [3, 255], [4, 255], [5, 255], [6, 255], [7, 0], [8, 1], [9, 255], [10, 255], [11, 2], [12, 3], [13, 4], [14, 255], [15, 255], [16, 255], [17, 5], [18, 255], [19, 6], [20, 7], [21, 8], [22, 9], [23, 10], [24, 11], [25, 12], [26, 13], [27, 14], [28, 15], [29, 255], [30, 255], [31, 16], [32, 17], [33, 18], [-1, 255]).

Some of them are also applied in __init__ in gta5_dataset.py.
Maybe my problem is very stupid. I am really a newbie just want to learn more. Thank you very much!!
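One plausible reading of these pairs, assuming each one is [raw label ID, training ID] and that 255 marks pixels ignored during training (the convention used by the Cityscapes label definitions), is a lookup table applied to every pixel of a label image. The sketch below uses a subset of the quoted pairs on a tiny nested-list "image"; the remap helper is hypothetical and not the repository's code.

```python
IGNORE_LABEL = 255  # assumed meaning: pixel excluded from loss and evaluation

# Subset of the pairs quoted above: raw label ID -> training ID.
label2train = [[7, 0], [8, 1], [11, 2], [12, 3], [13, 4]]


def remap(label_rows, pairs, ignore_label=IGNORE_LABEL):
    """Replace each raw ID by its training ID; unmapped IDs become ignore_label."""
    id_to_train = {raw_id: train_id for raw_id, train_id in pairs}
    return [[id_to_train.get(value, ignore_label) for value in row]
            for row in label_rows]


raw = [[7, 7, 8],
       [0, 11, 13]]
remapped = remap(raw, label2train)
```

Under the same assumption, a black-and-white label image would need its own table, for example [[0, 0], [255, 1]] for a two-class problem; those values are purely illustrative.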

opened by JIANGbb95 4

About the numeric scale of the discriminator loss

Dear author,

I was using this code for my project and found that the discriminator loss is further divided by 2. I am not very familiar with GANs. Could you please explain why it is divided by 2? https://github.com/wasidennis/AdaptSegNet/blob/fca9ff0f09dab45d44bf6d26091377ac66607028/train_gta2cityscapes_multi.py#L362

Thanks a lot!
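One possible reading, offered as an assumption rather than an answer confirmed by the authors in this thread: if the discriminator loss is computed once on source predictions and once on target predictions, halving each term makes their sum equal to the average of the two, so the discriminator update keeps the scale of a single loss term. The function below is a hypothetical sketch of that arithmetic, including an assumed iter_size divisor for gradient accumulation.

```python
def discriminator_step_loss(loss_source, loss_target, iter_size=1):
    """Sum of the two halved terms, which equals their mean when iter_size is 1."""
    half_source = loss_source / iter_size / 2
    half_target = loss_target / iter_size / 2
    return half_source + half_target


# Made-up loss values for one source batch and one target batch.
combined = discriminator_step_loss(0.8, 0.4)
average = (0.8 + 0.4) / 2
```

Halving the discriminator loss is also a common way to slow the discriminator down relative to the generator in GAN training code.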

opened by Muming-Zhao 0

data augmentation

Hello author,

Did you use the mirror and scale augmentations in the dataloader? There are arguments for them in the dataloader function, but they do not appear to be used.

thank you

opened by soans1994 0

error when loading the pretrained model

When I load the pretrained model downloaded from

http://vllab.ucmerced.edu/ytsai/CVPR18/DeepLab_resnet_pretrained_init-f81d91e8.pth

torch.load() reports an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data1/xxx/software/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/data1/xxx/software/anaconda3/envs/torch18/lib/python3.8/site-packages/torch/serialization.py", line 779, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 278891 more bytes. The file might be corrupted.

Is the online model incomplete?

opened by TomSheng21 0
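An "unexpected EOF" error from torch.load() often points to a truncated download. One way to check, sketched below, relies on an unverified assumption: that the hex suffix in the filename (f81d91e8) is the leading part of the file's SHA-256 digest, as in the naming convention used for PyTorch model zoo checkpoints. The helper names are hypothetical.

```python
import hashlib
import os
import re


def expected_hash_prefix(path):
    """Return the hex suffix in names like 'model-f81d91e8.pth', or None."""
    match = re.search(r"-([0-9a-f]{8,})\.[^.]+$", os.path.basename(path))
    return match.group(1) if match else None


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_complete(path):
    """True/False if the digest matches the filename suffix, None if no suffix."""
    prefix = expected_hash_prefix(path)
    if prefix is None:
        return None  # nothing to compare against
    return sha256_of(path).startswith(prefix)
```

If the check fails, or the file is smaller than the size reported by the server, downloading it again is the first thing to try.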

Can you share a log file?

Thank you for sharing your amazing code!

I am trying to use a deeper segmentation network. But after changing the segmentation backbone network, the adversarial loss diverges after only a few steps.

Can you share a training log file of single-level adaptation training for both VGG-16 and ResNet-101? Thank you!

opened by drumyseong 0

How much can NUM_STEPS be reduced while still training the model well?

Hello, I want to train the model faster.
Could I decrease NUM_STEPS to 25000? What are the effects of decreasing NUM_STEPS? I would be very grateful for an answer.

opened by farahnazmalekzadeh 0

