An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA)

By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, and Dacheng Tao

This repository is an official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers, which is accepted to ACM MultiMedia 2021.

Introduction

TL;DR. We develop SFA, a domain adaptive object detection method specialized for detection transformers. It contains a domain query-based feature alignment module and a token-wise feature alignment module for global and local feature alignment, respectively, and a bipartite matching consistency loss for improving robustness.

[Figure: SFA framework overview]

Abstract. Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve their cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone only brings limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains. DQFA reduces the domain discrepancy in global feature representations and object relations when deployed in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequence from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. In addition, a novel bipartite matching consistency loss is proposed to enhance the feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
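
For intuition, here is a minimal, illustrative PyTorch sketch of the token-wise adversarial alignment idea: a gradient-reversal layer feeds the transformer's token sequence into a small per-token domain classifier, pushing the detector toward domain-invariant token features. This is only a conceptual sketch under our own naming, not the released SFA implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    # Identity in the forward pass; reverses (and scales) gradients in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class TokenDomainClassifier(nn.Module):
    # Per-token domain classifier used adversarially for sequence feature alignment (our sketch).
    def __init__(self, dim=256, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, tokens, domain_label):
        # tokens: (batch, num_tokens, dim); domain_label: 0 for source, 1 for target
        reversed_tokens = GradientReversal.apply(tokens, self.lambd)
        logits = self.head(reversed_tokens).squeeze(-1)  # (batch, num_tokens)
        target = torch.full_like(logits, float(domain_label))
        return F.binary_cross_entropy_with_logits(logits, target)

During training, such a loss would be computed on encoder (or decoder) token sequences from both domains and added to the detection loss; the reversed gradients encourage domain-invariant features while the classifier tries to separate the domains.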

Main Results

The experimental results and model weights for Cityscapes to Foggy Cityscapes are shown below.

Model                  mAP   mAP@50  mAP@75  mAP@S  mAP@M  mAP@L  Log & Model
SFA-DefDETR            21.5  41.1    20.0    3.9    20.9   43.0   Google Drive
SFA-DefDETR-BoxRefine  23.9  42.6    22.5    3.8    21.6   46.7   Google Drive
SFA-DefDETR-TwoStage   24.1  42.5    22.8    3.8    22.0   48.1   Google Drive

Note:

  1. All SFA models are trained with a total batch size of 4.
  2. "DefDETR" means Deformable DETR (with R50 backbone).
  3. "BoxRefine" means Deformable DETR with iterative box refinement.
  4. "TwoStage" indicates the two-stage Deformable DETR variant.
  5. The original implementation is based on our internal codebase, and there are slight differences in the released code. For example, we only use the intermediate features output by the first encoder and decoder layers for hierarchical feature alignment, to reduce computational costs during training.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n sfa python=3.7 pip

    Then, activate the environment:

    conda activate sfa
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install PyTorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements/requirements.txt
  • Logging using wandb (optional)

    pip install -r requirements/optional.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (all checks should report True)
python test.py

Usage

Dataset preparation

We use the Cityscapes to Foggy Cityscapes adaptation as a demonstration; other domain adaptation benchmarks can be prepared analogously. The Cityscapes and Foggy Cityscapes datasets can be downloaded from here. The annotations in COCO format can be obtained from here. Afterward, please organize the datasets and annotations as follows:

[coco_path]
└─ cityscapes
   └─ leftImg8bit
      └─ train
      └─ val
└─ foggy_cityscapes
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ CocoFormatAnnos
   └─ cityscapes_train_cocostyle.json
   └─ cityscapes_foggy_train_cocostyle.json
   └─ cityscapes_foggy_val_cocostyle.json
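
As a quick, optional sanity check (our own suggestion, not part of the repository), the COCO-style annotation files can be loaded with pycocotools to confirm that they parse and that the category set looks right; the coco_path below is a placeholder for your own [coco_path]:

from pycocotools.coco import COCO

coco_path = "/path/to/coco_path"  # placeholder: replace with your [coco_path]
ann_file = f"{coco_path}/CocoFormatAnnos/cityscapes_foggy_val_cocostyle.json"

coco = COCO(ann_file)  # builds the annotation index
print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])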

Training

As an example, we provide commands for training our SFA on a single node with 4 GPUs for weather adaptation.

Training SFA-DeformableDETR

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr.sh --wandb

Training SFA-DeformableDETR-BoxRefine

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement.sh --wandb

Training SFA-DeformableDETR-TwoStage

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage.sh --wandb

Training Source-only DeformableDETR

Please refer to the source branch.

Evaluation

You can get the config file and pretrained model of SFA (the links are in the "Main Results" section), then run the following command to evaluate it on the Foggy Cityscapes validation set:

<path to config file> --resume <path to pre-trained model> --eval

You can also run distributed evaluation by using ./tools/run_dist_launch.sh or ./tools/run_dist_slurm.sh.
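
For example, assuming the SFA-DefDETR-BoxRefine checkpoint from the table above has been downloaded to exps/sfa_boxrefine/checkpoint.pth (an illustrative path), the evaluation command would look like:

./configs_da/sfa_r50_deformable_detr_plus_iterative_bbox_refinement.sh --resume exps/sfa_boxrefine/checkpoint.pth --eval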

Acknowledgement

This project is based on DETR and Deformable DETR. Thanks for their wonderful work. See LICENSE for more details.

Citing SFA

If you find SFA useful in your research, please consider citing:

@inproceedings{wang2021exploring,
  title={Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers},
  author={Wang, Wen and Cao, Yang and Zhang, Jing and He, Fengxiang and Zha, Zheng-Jun and Wen, Yonggang and Tao, Dacheng},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}
Comments
  • Implementation of Consistency Loss

    Implementation of Consistency Loss

    Hello, so I noticed that there is a slight difference between the paper's formulation of the consistency loss and the actual implementation. Please correct me if I am wrong. The formulations are as follows:

    [screenshots omitted]

    However, I noticed that for the final consistency loss, the actual implementation uses a sum instead of an average across layers, as indicated in equation 12 (the weight_dict entry for this loss type is 1 by default, so the notion of an average is perhaps not incorporated). As for the per-layer consistency loss, the actual implementation uses an average over all M object queries instead of just the sum, as indicated in equation 13.

    [screenshots omitted]

    I am wondering which formulations are the right ones to follow? Thanks.

    opened by michaelku1 5
  • Question about the Paper

    Question about the Paper

    Hello, thanks for your amazing work. I have a question about the part of the paper that explains why a consistency loss is used to constrain the decoder output predictions. It is mentioned in the paper that

    "Since no semantic label is available on the target domain, the object detector is prone to produce inaccurate matches between object queries and ground truth objects on the target domain."

    I don't quite understand what that means. Since you don't have any labels for the target data, how is it possible to do matching when you don't actually have the ground truths? Thanks.

    opened by michaelku1 3
  • About evaluation metric

    About evaluation metric

    I don't know your exact evaluation metric. Normally, IoU=0.50:0.95 is the official evaluation metric. But when I use the iterative and two-stage model that you provided, the result is far lower than that of the paper. If the evaluation metric only considers IoU=0.5, the result is close to your paper, even better. The following are the specific results:

    Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.241
    Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.425
    Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.228
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.038
    Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.220
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.180
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.315
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.341
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.098
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.336
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.625

    opened by Pandaxia8 2
  • Bad results in the task of city2city_foggy

    Bad results in the task of city2city_foggy

    Thanks for the amazing work. I have a problem when I run the demo for the city2city_foggy task: the mAP is only 21% after 24 epochs, which is much lower than the result in the paper. It seems that the model didn't converge. Since I use only one GPU, I didn't use the distributed training mode. I simply ran main_da.py --hda 1 --cmt --with_box_refine --two_stage, and I changed the lr and lr_backbone to 5e-5 and 5e-6 because my batch_size is 1. I also tried lr=1e-4 and lr_backbone=1e-5, but the result is still bad. I wonder whether a pretrained model is used in Deformable DETR.

    opened by pmz1997 2
  •     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

    Thanks for the amazing work. I have a problem when I run the demo for the sim2city task: assert (boxes1[:, 2:] >= boxes1[:, :2]).all() fails in the function generalized_box_iou. After reading the code, I find that boxes1 is the predicted bbox from an MLP layer, so I think the assertion may fail during early training and then break the training.

    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
    assert (boxes2[:, 2:] >= boxes2[:, :2]).all()

    The error info:

    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
    RuntimeError: CUDA error: device-side assert triggered

    opened by cwnuyangyan 2
  • object of type 'int' has no len()

    object of type 'int' has no len()

    I ran this demo but hit an error at line 673 of deformable_detr.py: domain_levels += len(args.hda). args.hda is an int, so the following exception occurs: TypeError: object of type 'int' has no len()

    opened by cwnuyangyan 2
  • target domain dataset gt label?

    target domain dataset gt label?

    I saw that, when preparing the dataset, the label JSON file for the target-domain training set is also in the directory. Did you use the ground-truth labels of the target-domain training set? Isn't this inconsistent with unsupervised training?

    opened by zhouzheng1123 1
  • Why is the class number for bdd100k 9 ?

    Why is the class number for bdd100k 9 ?

    Why is the class number for bdd100k 9? It is the same as for Cityscapes to Foggy Cityscapes. In fact, the bdd100k results are evaluated on 7 classes, so it should be 8, right? This confuses me.

    Thanks a lot.

    opened by kinredon 1
  • about json labels

    about json labels

    Hello, I would like to ask: I want to test on my own dataset, but the labels are all in XML. I don't know how to convert them into the JSON format you need. Can your code directly read XML labels? Looking forward to your reply.

    opened by wlc-git 0
  • About visualization

    About visualization

    Thanks for your wonderful work!

    I've recently been trying to visualize the distribution of the dataset with t-SNE (referring to Transfer-Learning-Library), but the result is not good. I didn't find the relevant code in this project; can you tell me how you implemented it?

    opened by youarefree123 0
  • Hello! A question about the dataset location

    Hello! A question about the dataset location

    Hello, author. Should the datasets and the required JSON annotation files be placed in the same directory as SFA? Is there a naming requirement?

    opened by Chengyunlai 0