Self-supervised Augmentation Consistency for Adapting Semantic Segmentation (CVPR 2021)

Visual Inference Lab @TU Darmstadt

Last update: Dec 21, 2022

Overview

Self-supervised Augmentation Consistency
for Adapting Semantic Segmentation

This repository contains the official implementation of our paper:

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
Nikita Araslanov and Stefan Roth
To appear at CVPR 2021. [arXiv preprint]


We obtain state-of-the-art accuracy in adapting semantic segmentation by enforcing consistency across photometric and similarity transformations. We use neither style transfer nor adversarial training.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de

Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, CUDA >=10.0. At least two Titan X GPUs (12 GB) or equivalent are required for VGG-16; ResNet-101 and VGG-16/FCN need four.

create conda environment:

conda create --name da-sac
source activate da-sac

install PyTorch >=1.4 (see PyTorch instructions). For example,

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

install the dependencies:

pip install -r requirements.txt

download data (Cityscapes, GTA5, SYNTHIA) and create symlinks in the ./data folder, as follows:

./data/cityscapes -> <symlink to Cityscapes>
./data/cityscapes/gtFine2/
./data/cityscapes/leftImg8bit/

./data/game -> <symlink to GTA>
./data/game/labels_cs
./data/game/images

./data/synthia  -> <symlink to SYNTHIA>
./data/synthia/labels_cs
./data/synthia/RGB
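
Before training, it can help to confirm that the symlinks resolve to the expected sub-directories. The following is a minimal sketch, not part of the repository: the helper name missing_data_dirs is hypothetical, and the directory list is simply copied from the layout above.

```python
import os

# Sub-directories from the layout above, relative to the data root.
EXPECTED_DIRS = [
    "cityscapes/gtFine2",
    "cityscapes/leftImg8bit",
    "game/labels_cs",
    "game/images",
    "synthia/labels_cs",
    "synthia/RGB",
]


def missing_data_dirs(data_root="./data"):
    """Return the expected sub-directories that do not exist under data_root.

    os.path.isdir follows symlinks, so a symlink pointing to a valid
    directory counts as present.
    """
    return [
        d for d in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(data_root, *d.split("/")))
    ]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty return value only means the directories exist; it does not check that the labels inside were converted to train IDs.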

Note that all ground-truth label IDs (Cityscapes, GTA5 and SYNTHIA) should be converted to Cityscapes train IDs. The label directories in the above example (gtFine2, labels_cs) therefore refer not to the original labels, but to these converted semantic maps.
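
As an illustration of this conversion, here is a minimal sketch for labels that use the original Cityscapes label IDs. The table follows the label definitions in the Cityscapes scripts; the function name to_train_ids is hypothetical, and the plain nested-list representation is an assumption made to keep the example dependency-free (in practice one would vectorise this over image arrays). SYNTHIA uses its own ID scheme and needs a different table, which is not shown here.

```python
# Cityscapes label ID -> train ID for the 19 evaluated classes.
ID_TO_TRAIN_ID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}

# Every other label ID is mapped to the "ignore" value.
IGNORE_LABEL = 255


def to_train_ids(label_map):
    """Convert a 2D list of Cityscapes label IDs to train IDs."""
    return [
        [ID_TO_TRAIN_ID.get(pixel, IGNORE_LABEL) for pixel in row]
        for row in label_map
    ]


if __name__ == "__main__":
    # road (7), sidewalk (8) and an unlabeled pixel (0)
    print(to_train_ids([[7, 8, 0]]))
```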

Training

Training from ImageNet initialisation proceeds in three steps:

Training the baseline (ABN)
Generating the weights for importance sampling
Training with augmentation consistency from the ABN baseline

1. Training the baseline (ABN)

Here the inputs are ImageNet-pretrained models available from the official PyTorch repository. We provide the links to those models for convenience.

Backbone	Link
ResNet-101	resnet101-5d3b4d8f.pth (171M)
VGG-16	vgg16_bn-6c64b313.pth (528M)

By default, these models should be placed in ./models/pretrained/ (though configurable with MODEL.INIT_MODEL).

To run the training:

bash ./launch/train.sh [gta|synthia] [resnet101|vgg16|vgg16fcn] base

where the first argument specifies the source domain and the second determines the network architecture. The third argument, base, instructs the script to train the baseline.

If you would like to skip this step, you can use our pre-trained models:

Source domain: GTA5

Backbone	Arch.	IoU (val)	Link	MD5
ResNet-101	DeepLabv2	40.8	baseline_abn_e040.pth (336M)	`9fe17[...]c11fc`
VGG-16	DeepLabv2	37.1	baseline_abn_e115.pth (226M)	`d4ffc[...]ef755`
VGG-16	FCN	36.7	baseline_abn_e040.pth (1.1G)	`aa2e9[...]bae53`

Source domain: SYNTHIA

Backbone	Arch.	IoU (val)	Link	MD5
ResNet-101	DeepLabv2	36.3	baseline_abn_e090.pth (336M)	`b3431[...]d1a83`
VGG-16	DeepLabv2	34.4	baseline_abn_e070.pth (226M)	`3af24[...]5b24e`
VGG-16	FCN	31.6	baseline_abn_e040.pth (1.1G)	`5f457[...]e4b3a`

Tip: You can download these files (as well as the final models below) with tools/download_baselines.sh:

cp tools/download_baselines.sh snapshots/cityscapes/baselines/
cd snapshots/cityscapes/baselines/
bash ./download_baselines.sh
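
The MD5 column in the tables above shows only the first and last five characters of each checksum. A minimal sketch for checking a downloaded file against such a truncated value is given below; the helper names are hypothetical and not part of the repository.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_truncated_md5(path, prefix, suffix):
    """Check a file against a checksum given as 'prefix[...]suffix'."""
    md5 = file_md5(path)
    return md5.startswith(prefix) and md5.endswith(suffix)
```

For example, matches_truncated_md5("baseline_abn_e040.pth", "9fe17", "c11fc") would check the ResNet-101 baseline for GTA5. A match on ten characters is a sanity check rather than a full verification.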

2. Generating weights for importance sampling

To generate the weights you need to

generate mask predictions with your baseline (see inference below);
run tools/compute_image_weights.py, which reads in those predictions and counts the predictions per class.
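
To make the counting step concrete, the sketch below counts predicted pixels per class for individual masks and sums them over a set of masks. It is an illustration only and not the code of tools/compute_image_weights.py: the function names are hypothetical, masks are assumed to be 2D lists of train IDs with 255 as the ignore value, and how the script turns counts into sampling weights is not covered.

```python
from collections import Counter

IGNORE_LABEL = 255  # assumed ignore value in the predicted masks


def class_pixel_counts(mask, num_classes=19):
    """Count the pixels predicted for each class in one mask."""
    counts = Counter(
        pixel for row in mask for pixel in row if pixel != IGNORE_LABEL
    )
    return [counts.get(c, 0) for c in range(num_classes)]


def dataset_class_counts(masks, num_classes=19):
    """Sum the per-class pixel counts over an iterable of masks."""
    totals = [0] * num_classes
    for mask in masks:
        for c, n in enumerate(class_pixel_counts(mask, num_classes)):
            totals[c] += n
    return totals
```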

If you would like to skip this step, you can use the weights we computed for the ABN baselines above:

Backbone	Arch.	Source: GTA5	Source: SYNTHIA
ResNet-101	DeepLabv2	cs_weights_resnet101_gta.data	cs_weights_resnet101_synthia.data
VGG-16	DeepLabv2	cs_weights_vgg16_gta.data	cs_weights_vgg16_synthia.data
VGG-16	FCN	cs_weights_vgg16fcn_gta.data	cs_weights_vgg16fcn_synthia.data

Tip: The bash script data/download_weights.sh will download all these importance sampling weights into the current directory.

3. Training with augmentation consistency

To train the model with augmentation consistency, we use the same shell script as in step 1, but without the argument base:

bash ./launch/train.sh [gta|synthia] [resnet101|vgg16|vgg16fcn]

Make sure to specify your baseline snapshot with the RESUME bash variable, either set in the environment (export RESUME=...) or directly in the shell script (commented out by default).

We provide our final models for download.

Source domain: GTA5

Backbone	Arch.	IoU (val)	IoU (test)	Link	MD5
ResNet-101	DeepLabv2	53.8	55.7	final_e136.pth (504M)	`59c16[...]5a32f`
VGG-16	DeepLabv2	49.8	51.0	final_e184.pth (339M)	`0accb[...]d5881`
VGG-16	FCN	49.9	50.4	final_e112.pth (1.6G)	`e69f8[...]f729b`

Source domain: SYNTHIA

Backbone	Arch.	IoU (val)	IoU (test)	Link	MD5
ResNet-101	DeepLabv2	52.6	52.7	final_e164.pth (504M)	`a7682[...]db742`
VGG-16	DeepLabv2	49.1	48.3	final_e164.pth (339M)	`c5b31[...]5fdb7`
VGG-16	FCN	46.8	45.8	final_e098.pth (1.6G)	`efb74[...]845cc`

Inference and evaluation

Inference

To run single-scale inference from your snapshot, use infer_val.py. The bash script launch/infer_val.sh provides an easy way to run the inference by specifying a few variables:

# validation/training set
FILELIST=[val_cityscapes|train_cityscapes] 
# configuration used for training
CONFIG=configs/[deeplabv2_vgg16|deeplab_resnet101|fcn_vgg16]_train.yaml
# the following 3 variables effectively specify the path to the snapshot
EXP=...
RUN_ID=...
SNAPSHOT=...
# the snapshot path is defined as
# SNAPSHOT_PATH=snapshots/cityscapes/${EXP}/${RUN_ID}/${SNAPSHOT}.pth
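
The path rule in the last comment can be mirrored in Python, for instance to check that a snapshot exists before launching inference. This is a small sketch under the assumption that the rule is exactly as stated above; the function name and the example values are hypothetical.

```python
def snapshot_path(exp, run_id, snapshot, root="snapshots/cityscapes"):
    """Build the snapshot path following the rule in launch/infer_val.sh."""
    return f"{root}/{exp}/{run_id}/{snapshot}.pth"


if __name__ == "__main__":
    # "my_exp" and "run1" are placeholder values for EXP and RUN_ID.
    print(snapshot_path("my_exp", "run1", "final_e136"))
```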

Evaluation

Please use the official Cityscapes evaluation tool evalPixelLevelSemanticLabeling from the Cityscapes scripts to evaluate your results.
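
For intuition about the reported metric, the sketch below computes a mean IoU from a confusion matrix, with per-class IoU = TP / (TP + FP + FN). It is an illustration and not a replacement for the official tool, whose handling of details may differ. The function names are hypothetical, and predictions are assumed to contain only valid class indices.

```python
def confusion_matrix(pred, target, num_classes, ignore_label=255):
    """Accumulate a confusion matrix; rows are ground truth, columns predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t != ignore_label:  # skip pixels without a valid label
                cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU over the classes that occur in prediction or ground truth."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For a dataset-level score, the confusion matrix would be summed over all images before calling mean_iou, rather than averaging per-image scores.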

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DASAC,
  title     = {Self-supervised Augmentation Consistency for Adapting Semantic Segmentation},
  author    = {Araslanov, Nikita and Roth, Stefan},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
}

Comments

Problem of reproducing

Hi, thanks for your awesome work and sharing code.

For ResNet101-DeepLab on the GTAV-to-Cityscapes UDA task, I have rerun your code with the provided pre-trained model and weights for importance sampling. However, the best mIoU on val_cityscapes is only 50.4%, which is worse than the paper result of 53.8%.

Is the problem caused by using 2 GPUs? I ran the experiment on 2 NVIDIA Tesla V100 GPUs with 32 GB memory. I noticed that you set batch_target to 1 per GPU in the code, so the total batch size of target data is smaller than in your setup with 4 GPUs.

If possible, can you provide the complete experiment logs? They will be helpful for me to debug. :-)

opened by super233 5
Do you think knowledge distillation can bring improvement for UDA?

What do you think about "4.3. Distillation to self-supervised model" in ProDA? ProDA without knowledge distillation achieves a result similar to SAC. Do you think knowledge distillation could also improve SAC?

opened by super233 2
Question about the frequency of updating momentum network

Thanks for sharing the code.

Have you investigated the influence of T (the update frequency of the momentum network)? There is no corresponding ablation study in the paper.

Could you please report the results for different values of T?

opened by super233 2
How to evaluate model without using cityscapes scripts protocol

Hi, thanks for your contribution! I really like your paper. I am currently training your model in a slightly different setup (Cityscapes ---> BDD), and I see that your algorithm is already able to run an evaluation on all the validation dataloaders defined in datasets/__init__.py after finishing every epoch.

What I would like to do is to actually evaluate the resulting model on other datasets that do not participate in the training process. In order to do this, I am forced to load your model and checkpoint, and run an evaluation on the datasets that I want. Aside from generating the .txt files on the data folder, how can I run the evaluation without having to use the scripts of cityscapes?

opened by fabriziojpiva 1
freeze_bn

Hi, thanks for providing this great work!!

It seems that in the implementation of DeepLab with ResNet101, self._freeze_bn is called regardless of whether freeze_bn is true or false, so BN is always fixed. Should line 196 be commented out? https://github.com/visinf/da-sac/blob/b6f0a90085e46619b87fbaf854fea897b69de02e/models/deeplabv2.py#L190-L196

Thanks:)

opened by JNNNNYao 1
How to use custom dataset for training?

Hi,

Thanks for the super nice work!! I wonder what modifications I should make if I want to use my own dataset. Suppose I have a source domain dataset A with corresponding images/masks, and a target domain dataset B with corresponding images/masks. In which folder should I put them? Any advice will be appreciated! Thanks.

By the way, have you tried training the model on a relatively small dataset, e.g. 10K images? Would the model still achieve such good performance?

opened by yongshuo-Z 1
Unexpected performance drop on GTA5 to Cityscapes (resnet101).

Hi, @arnike. Thanks for your marvelous work.

When I tried to reproduce the GTA5-to-Cityscapes task, there was an unexpected performance drop (e.g., Epoch 96: 31.4 mIoU, Epoch 98: 32.1 mIoU); the detailed results are in run-.-tag-logits_up_all_mIoU.csv. So far, the best result is 42.9 mIoU (Epoch 182), which lags behind the result reported in the paper.

The best result of the ABN stage is

Epoch >>> 94 <<< Averaging 19 classes: IoU: 0.388 Pr: 0.661 Re: 0.505 [0] Validation / val_cityscapes / Val: 0.23m

Any suggestions for this problem? Your help is highly appreciated.

opened by BinhuiXie 0
Re-initialization of the momentum

Hi,

In your code, there's an argument called reset_teacher, which re-initializes the momentum network denoted as self.slow_net. I would like to know when reset_teacher is set to True, so that the momentum network gets re-initialized to the state of the actual SegmentationNet denoted as self.backbone. From what I understood, the MomentumNet is initialized at the beginning to be equal to SegmentationNet, is then updated using the exponential moving average formula, and never gets re-initialized.

Thank you very much !! :smiley:

opened by yasserben 0
RuntimeError: CUDA error

When trying to train the base model on GTAV, this error occurs every time in the 3rd or 4th epoch (the 4th if I reduce the batch size). The traceback always points to the same line. We need some help here ^-^

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/disk1/hl/anaconda3/envs/da-sac/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/disk1/hl/da-sac/train.py", line 518, in main_worker
    score = time_call(trainer.validation, "Validation / {} / Val: ".format(val_set),
  File "/disk1/hl/da-sac/train.py", line 498, in time_call
    val = func(*args, **kwargs)
  File "/disk1/hl/da-sac/train.py", line 378, in validation
    masks_all = eval_batch(batch)
  File "/disk1/hl/da-sac/train.py", line 358, in eval_batch
    loss, masks = step_func(epoch, batch, train=False, visualise=False)
  File "/disk1/hl/da-sac/train.py", line 151, in step
    losses_ret[key] = val.mean().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

opened by kandysoso 0
Seek help for some little changes

Is there any small trick or possible improvement, such as a different loss function, that could be applied to your code? My coursework this term is to try to improve an existing method on the GTAV->Cityscapes task, and I found yours. I tried a focal loss, but it did not help. Maybe your team has some other ideas or solutions specifically for this task.

opened by Yuuhooow 0
Could you share the latex code of Figure 1 and Table 4?

Figure 1 and Table 4 in your paper are very well drawn. Could you share how you created them, for example the corresponding LaTeX code?

opened by lanyx7 0
