Self-supervised Augmentation Consistency for Adapting Semantic Segmentation (CVPR 2021)

Overview

Self-supervised Augmentation Consistency
for Adapting Semantic Segmentation


This repository contains the official implementation of our paper:

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
Nikita Araslanov and Stefan Roth
To appear at CVPR 2021. [arXiv preprint]


We obtain state-of-the-art accuracy in adapting semantic
segmentation by enforcing consistency across photometric
and similarity transformations. We use neither style transfer
nor adversarial training.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de


Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, and CUDA >=10.0. At least two Titan X GPUs (12 GB) or equivalent are required for VGG-16; ResNet-101 and VGG-16/FCN require four.

  1. Create a conda environment:
conda create --name da-sac
source activate da-sac
  2. Install PyTorch >=1.4 (see the PyTorch instructions). For example,
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
  3. Install the dependencies:
pip install -r requirements.txt
  4. Download the data (Cityscapes, GTA5, SYNTHIA) and create symlinks in the ./data folder, as follows:
./data/cityscapes -> <symlink to Cityscapes>
./data/cityscapes/gtFine2/
./data/cityscapes/leftImg8bit/

./data/game -> <symlink to GTA>
./data/game/labels_cs
./data/game/images

./data/synthia  -> <symlink to SYNTHIA>
./data/synthia/labels_cs
./data/synthia/RGB

Note that all ground-truth label IDs (Cityscapes, GTA5 and SYNTHIA) should be converted to Cityscapes train IDs. The label directories in the above example (gtFine2, labels_cs) therefore refer not to the original labels, but to these converted semantic maps.
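
For reference, here is a minimal conversion sketch. It is not part of this repository; it assumes the GTA5/Cityscapes annotations use the standard Cityscapes label IDs and builds a lookup table from the cityscapesscripts package. The file names and output directory are only illustrative, and SYNTHIA uses its own label set, so its mapping needs to be adapted.

# Hedged sketch: remap Cityscapes/GTA5 label IDs to the 19 Cityscapes train IDs.
# Assumes `pip install cityscapesscripts`; the paths below are placeholders.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels

# 256-entry lookup table: label ID -> train ID (255 = ignore)
lut = np.full(256, 255, dtype=np.uint8)
for label in labels:
    if label.id >= 0 and 0 <= label.trainId < 255:
        lut[label.id] = label.trainId

def convert(src_png, dst_png):
    label_ids = np.array(Image.open(src_png), dtype=np.uint8)
    Image.fromarray(lut[label_ids]).save(dst_png)

# hypothetical input/output files
convert("gtFine/train/aachen/aachen_000000_000019_gtFine_labelIds.png",
        "gtFine2/train/aachen/aachen_000000_000019_gtFine_labelIds.png")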

Training

Training from ImageNet initialisation proceeds in three steps:

  1. Training the baseline (ABN)
  2. Generating the weights for importance sampling
  3. Training with augmentation consistency from the ABN baseline

1. Training the baseline (ABN)

Here the inputs are the ImageNet-pretrained models available from the official PyTorch repository. We provide links to these models for convenience.

| Backbone | Link |
|---|---|
| ResNet-101 | resnet101-5d3b4d8f.pth (171M) |
| VGG-16 | vgg16_bn-6c64b313.pth (528M) |

By default, these models should be placed in ./models/pretrained/ (though configurable with MODEL.INIT_MODEL).

To run the training:

bash ./launch/train.sh [gta|synthia] [resnet101|vgg16|vgg16fcn] base

where the first argument specifies the source domain and the second selects the network architecture. The third argument, base, instructs the script to train the baseline.

If you would like to skip this step, you can use our pre-trained models:

Source domain: GTA5

| Backbone | Arch. | IoU (val) | Link | MD5 |
|---|---|---|---|---|
| ResNet-101 | DeepLabv2 | 40.8 | baseline_abn_e040.pth (336M) | 9fe17[...]c11fc |
| VGG-16 | DeepLabv2 | 37.1 | baseline_abn_e115.pth (226M) | d4ffc[...]ef755 |
| VGG-16 | FCN | 36.7 | baseline_abn_e040.pth (1.1G) | aa2e9[...]bae53 |

Source domain: SYNTHIA

| Backbone | Arch. | IoU (val) | Link | MD5 |
|---|---|---|---|---|
| ResNet-101 | DeepLabv2 | 36.3 | baseline_abn_e090.pth (336M) | b3431[...]d1a83 |
| VGG-16 | DeepLabv2 | 34.4 | baseline_abn_e070.pth (226M) | 3af24[...]5b24e |
| VGG-16 | FCN | 31.6 | baseline_abn_e040.pth (1.1G) | 5f457[...]e4b3a |

Tip: You can download these files (as well as the final models below) with tools/download_baselines.sh:

cp tools/download_baselines.sh snapshots/cityscapes/baselines/
cd snapshots/cityscapes/baselines/
bash ./download_baselines.sh

2. Generating weights for importance sampling

To generate the weights you need to

  1. generate mask predictions with your baseline (see inference below);
  2. run tools/compute_image_weights.py, which reads in those predictions and counts the predictions for each class (a rough counting sketch is shown below).
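
For intuition, the sketch below shows what this counting step might look like; it is not the actual tools/compute_image_weights.py, whose output format and weighting scheme may differ. It counts predicted pixels per train ID over a hypothetical prediction folder and turns the inverse class frequencies into normalised weights.

import glob
import numpy as np
from PIL import Image

NUM_CLASSES = 19  # Cityscapes train IDs
counts = np.zeros(NUM_CLASSES, dtype=np.int64)

# hypothetical folder holding the baseline's predicted masks (train IDs)
for mask_path in glob.glob("predictions/train_cityscapes/*.png"):
    mask = np.array(Image.open(mask_path))
    ids, num = np.unique(mask, return_counts=True)
    valid = ids < NUM_CLASSES            # skip the ignore label (255)
    counts[ids[valid]] += num[valid]

freq = counts / counts.sum()
weights = 1.0 / np.maximum(freq, 1e-8)   # rarer classes get larger weights
weights /= weights.sum()
print(weights)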

If you would like to skip this step, you can use the weights we computed for the ABN baselines above:

| Backbone | Arch. | Source: GTA5 | Source: SYNTHIA |
|---|---|---|---|
| ResNet-101 | DeepLabv2 | cs_weights_resnet101_gta.data | cs_weights_resnet101_synthia.data |
| VGG-16 | DeepLabv2 | cs_weights_vgg16_gta.data | cs_weights_vgg16_synthia.data |
| VGG-16 | FCN | cs_weights_vgg16fcn_gta.data | cs_weights_vgg16fcn_synthia.data |

Tip: The bash script data/download_weights.sh will download all these importance sampling weights into the current directory.

3. Training with augmentation consistency

To train the model with augmentation consistency, we use the same shell script as in step 1, but without the argument base:

bash ./launch/train.sh [gta|synthia] [resnet101|vgg16|vgg16fcn]

Make sure to specify your baseline snapshot via the RESUME bash variable, set either in the environment (export RESUME=...) or directly in the shell script (where it is commented out by default).

We provide our final models for download.

Source domain: GTA5

| Backbone | Arch. | IoU (val) | IoU (test) | Link | MD5 |
|---|---|---|---|---|---|
| ResNet-101 | DeepLabv2 | 53.8 | 55.7 | final_e136.pth (504M) | 59c16[...]5a32f |
| VGG-16 | DeepLabv2 | 49.8 | 51.0 | final_e184.pth (339M) | 0accb[...]d5881 |
| VGG-16 | FCN | 49.9 | 50.4 | final_e112.pth (1.6G) | e69f8[...]f729b |

Source domain: SYNTHIA

| Backbone | Arch. | IoU (val) | IoU (test) | Link | MD5 |
|---|---|---|---|---|---|
| ResNet-101 | DeepLabv2 | 52.6 | 52.7 | final_e164.pth (504M) | a7682[...]db742 |
| VGG-16 | DeepLabv2 | 49.1 | 48.3 | final_e164.pth (339M) | c5b31[...]5fdb7 |
| VGG-16 | FCN | 46.8 | 45.8 | final_e098.pth (1.6G) | efb74[...]845cc |

Inference and evaluation

Inference

To run single-scale inference from your snapshot, use infer_val.py. The bash script launch/infer_val.sh provides an easy way to run the inference by specifying a few variables:

# validation/training set
FILELIST=[val_cityscapes|train_cityscapes] 
# configuration used for training
CONFIG=configs/[deeplabv2_vgg16|deeplab_resnet101|fcn_vgg16]_train.yaml
# the following 3 variables effectively specify the path to the snapshot
EXP=...
RUN_ID=...
SNAPSHOT=...
# the snapshot path is defined as
# SNAPSHOT_PATH=snapshots/cityscapes/${EXP}/${RUN_ID}/${SNAPSHOT}.pth

Evaluation

Please use Cityscapes' official evaluation tool, evalPixelLevelSemanticLabeling from the Cityscapes scripts, to evaluate your results.
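
As a rough illustration only (this is not part of the repository): the evaluation script ships with the cityscapesscripts pip package and reads the dataset and results locations from environment variables. Note that it expects prediction files to contain full Cityscapes label IDs rather than train IDs. The paths below are placeholders.

import os
import subprocess

os.environ["CITYSCAPES_DATASET"] = "./data/cityscapes"    # ground-truth root
os.environ["CITYSCAPES_RESULTS"] = "./results/val_preds"  # your predicted masks

# run the official pixel-level evaluation
subprocess.run(
    ["python", "-m",
     "cityscapesscripts.evaluation.evalPixelLevelSemanticLabeling"],
    check=True,
)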

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DASAC,
  title     = {Self-supervised Augmentation Consistency for Adapting Semantic Segmentation},
  author    = {Araslanov, Nikita and Roth, Stefan},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}
Comments
  • Problem of reproducing

    Hi, thanks for your awesome work and for sharing the code.

    For ResNet101-DeepLab on the GTAV-to-Cityscapes UDA task, I have rerun your code with the provided pre-trained model and weights for importance sampling; however, the best mIoU on val_cityscapes is only 50.4%, which is worse than the 53.8% reported in the paper.

    Is the problem caused by using 2 GPUs? I reproduced the experiment on 2 NVIDIA Tesla V100s with 32 GB memory. However, I noticed that you set batch_target to 1 per GPU in the code, so the total batch size of target data is smaller than in your 4-GPU setting.

    If possible, can you provide the complete experiment logs? They will be helpful for me to debug. :-)

    opened by super233 5
  • Do you think knowledge distillation can bring improvement for UDA?

    What do you think about "4.3. Distillation to self-supervised model" in ProDA? ProDA without knowledge distillation achieves results similar to SAC; do you think knowledge distillation could bring a further improvement for SAC?

    opened by super233 2
  • Question about the frequency of updating the momentum network

    Thanks for sharing the code.

    Have you investigated the influence of T (the frequency of updating the momentum network)? There is no corresponding ablation study in the paper.

    Could you please report results for different settings of T?

    opened by super233 2
  • How to evaluate the model without using the Cityscapes scripts protocol

    Hi, thanks for your contribution! I really like your paper. I am currently training your model in a slightly different setup (Cityscapes ---> BDD), and I see that your algorithm already runs an evaluation on all the validation dataloaders defined in datasets/__init__.py after every epoch.

    What I would like to do is evaluate the resulting model on other datasets that do not participate in the training process. In order to do this, I have to load your model and checkpoint and run an evaluation on the datasets that I want. Aside from generating the .txt files in the data folder, how can I run the evaluation without having to use the Cityscapes scripts?

    opened by fabriziojpiva 1
  • freeze_bn

    Hi, thanks for providing this great work!!

    It seems that in the DeepLab implementation with ResNet101, self._freeze_bn is called regardless of whether freeze_bn is true or false, so BN is always fixed. Should line 196 be commented out? https://github.com/visinf/da-sac/blob/b6f0a90085e46619b87fbaf854fea897b69de02e/models/deeplabv2.py#L190-L196

    Thanks:)

    opened by JNNNNYao 1
  • How to use a custom dataset for training?

    Hi,

    Thanks for the super nice work!! I wonder what modifications I should make if I want to use my own dataset. Suppose I have a source-domain dataset A and a target-domain dataset B, each with corresponding images/masks. Which folders should I put them in? Any advice would be appreciated! Thanks.

    By the way, have you tried training the model on a relatively small dataset, e.g., 10K images? Would the model still achieve such good performance?

    opened by yongshuo-Z 1
  • Unexpected performance drop on GTA5 to Cityscapes (resnet101).

    Hi, @arnike. Thanks for your marvelous work.

    When I tried to reproduce the GTA5-to-Cityscapes task, there was an unexpected performance drop (e.g., Epoch 96: 31.4 mIoU, Epoch 98: 32.1 mIoU); detailed results are in run-.-tag-logits_up_all_mIoU.csv. So far, the best result is 42.9 mIoU (Epoch 182), which lags behind the result reported in the paper.

    And the best result of the ABN stage is:

    Epoch >>> 94 <<<
    Averaging 19 classes:
    IoU: 0.388
     Pr: 0.661
     Re: 0.505
    [0] Validation / val_cityscapes /  Val:  0.23m
    

    Any suggestion for this problem? Your help is highly appreciated.

    opened by BinhuiXie 0
  • Re-initialization of the momentum

    Hi,

    In your code, there is an argument called reset_teacher, which re-initializes the momentum network denoted as self.slow_net. I would like to know when reset_teacher is set to True, so that the momentum network gets re-initialized to the state of the actual segmentation network denoted as self.backbone. From what I understood, the momentum network is initialized at the beginning to be equal to the segmentation network, is then updated using the exponential moving average formula, and never gets re-initialized.

    Thank you very much !! :smiley:

    opened by yasserben 0
  • RuntimeError: CUDA error

    Trying to train the base model on GTAV, this error occurred every time in the 3rd or 4th epoch (the 4th if I turned down the batch size), tracing back to the same line. We need some help here ^-^

    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "/disk1/hl/anaconda3/envs/da-sac/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
        fn(i, *args)
      File "/disk1/hl/da-sac/train.py", line 518, in main_worker
        score = time_call(trainer.validation, "Validation / {} / Val: ".format(val_set),
      File "/disk1/hl/da-sac/train.py", line 498, in time_call
        val = func(*args, **kwargs)
      File "/disk1/hl/da-sac/train.py", line 378, in validation
        masks_all = eval_batch(batch)
      File "/disk1/hl/da-sac/train.py", line 358, in eval_batch
        loss, masks = step_func(epoch, batch, train=False, visualise=False)
      File "/disk1/hl/da-sac/train.py", line 151, in step
        losses_ret[key] = val.mean().item()
    RuntimeError: CUDA error: an illegal memory access was encountered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by kandysoso 0
  • Seek help for some little changes

    Is there any small trick or possible improvement, such as a different loss function, that could be applied to your code? My coursework this term is to try to improve existing work on the GTAV->Cityscapes task, and I found yours. I tried some other focal loss functions, but they did not help. Maybe your team has some other ideas or solutions just for GTAV->Cityscapes.

    opened by Yuuhooow 0
  • Could you share the LaTeX code of Figure 1 and Table 4?

    Figure 1 and Table 4 in your paper are very well drawn. Could you share how you created them, e.g., the LaTeX code for this part?

    opened by lanyx7 0
Owner
Visual Inference Lab @TU Darmstadt
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CAC) Xin Lai*, Zhuotao Tian*, Li Jiang, Shu Liu, Hengshuang Zhao, Li

DV Lab 137 Dec 14, 2022
Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

One Thing One Click One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation (CVPR2021) Code for the paper One Thi

null 44 Dec 12, 2022
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

null 45 Dec 8, 2022
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

null 90 Dec 29, 2022
[ICCV 2021] A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

CodingMan 45 Dec 12, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

NeurAI 4 Jul 27, 2022
Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation

Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation The code of: Context Decoupling Augmentation for Weakly Supervised Semanti

null 54 Dec 12, 2022
Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co

hongbin_xu 127 Jan 4, 2023
EMNLP 2021 Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections Ruiqi Zhong, Kristy Lee*, Zheng Zhang*, Dan Klein EMN

Ruiqi Zhong 42 Nov 3, 2022
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Mozhdeh Gheini 16 Jul 16, 2022
[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

VITA 59 Dec 28, 2022
Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation (CVPR 2021)

Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation Input Image Initial CAM Successive Maps with adversar

Jungbeom Lee 110 Dec 7, 2022
[CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

TorchSemiSeg [CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision by Xiaokang Chen1, Yuhui Yuan2, Gang Zeng1, Jingdong Wang

Chen XiaoKang 387 Jan 8, 2023
Code release for "Transferable Semantic Augmentation for Domain Adaptation" (CVPR 2021)

Transferable Semantic Augmentation for Domain Adaptation Code release for "Transferable Semantic Augmentation for Domain Adaptation" (CVPR 2021) Paper

null 66 Dec 16, 2022
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency[ECCV 2020]

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency(ECCV 2020) This is an official python implementati

null 304 Jan 3, 2023
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
Code for the paper "Adapting Monolingual Models: Data can be Scarce when Language Similarity is High"

Wietse de Vries • Martijn Bartelds • Malvina Nissim • Martijn Wieling Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

Wietse de Vries 5 Aug 2, 2021