Code and models for the ICCV 2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Overview

Robust Object Detection via Instance-Level Temporal Cycle Confusion

This repo contains the implementation of the ICCV 2021 paper, Robust Object Detection via Instance-Level Temporal Cycle Confusion.


Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motion, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on various domain adaptation benchmarks with static images (Cityscapes, Foggy Cityscapes, Sim10K) and large-scale video datasets (BDD100K and Waymo open data).
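For intuition, below is a minimal PyTorch sketch of the instance-level cycle. It is not the authors' implementation: the softmax relaxation of the "most different" matching, the temperature, and all tensor names are assumptions made for illustration.

import torch
import torch.nn.functional as F

def cycle_confusion_loss(feats_t, feats_t1, temperature=0.1):
    # feats_t: (N, D) RoI features from frame t; feats_t1: (M, D) from frame t+1.
    feats_t = F.normalize(feats_t, dim=1)
    feats_t1 = F.normalize(feats_t1, dim=1)
    sim = feats_t @ feats_t1.t()  # (N, M) cosine similarities
    # "Most different" matching, relaxed to a softmax over negative
    # similarity so that the matching stays differentiable (assumption).
    fwd = F.softmax(-sim / temperature, dim=1)      # frame t -> frame t+1
    bwd = F.softmax(-sim.t() / temperature, dim=1)  # frame t+1 -> frame t
    cycle = fwd @ bwd  # (N, N) round-trip probabilities
    # Self-supervision signal: each instance should cycle back to itself.
    target = torch.arange(feats_t.size(0), device=feats_t.device)
    return F.nll_loss(torch.log(cycle + 1e-8), target)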

Installation

Environment

  • CUDA 10.2
  • Python >= 3.7
  • PyTorch >= 1.6
  • The Detectron2 version must match the PyTorch and CUDA versions.

Dependencies

  1. Create a virtual env.
  • python3 -m pip install --user virtualenv
  • python3 -m venv cyc-conf
  • source cyc-conf/bin/activate
  2. Install dependencies.
  • pip install -r requirements.txt

  • Install PyTorch 1.9

pip3 install torch torchvision

Check out the previous PyTorch versions here.

  • Install Detectron2. Build Detectron2 from source (requires gcc & g++ >= 5.4):

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Or install a pre-built Detectron2 (example for CUDA 10.2, PyTorch 1.9):

python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

More details can be found here.
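A quick, optional way to verify the environment (torch.__version__, torch.version.cuda, and detectron2.__version__ are standard attributes of both libraries):

python3 -c "import torch, detectron2; print('torch', torch.__version__, '| cuda', torch.version.cuda, '| detectron2', detectron2.__version__)"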

Data Preparation

BDD100K

  1. Download the BDD100K MOT 2020 dataset (MOT 2020 Images and MOT 2020 Labels) and the detection labels (Detection 2020 Labels) here; a detailed description of the data is available here. Put the BDD100K data under datasets/ in this repo. After downloading the data, the folder structure should look like this:
├── datasets
│   ├── bdd100k
│   │   ├── images
│   │   │    └── track
│   │   │        ├── train
│   │   │        ├── val
│   │   │        └── test
│   │   └── labels
│   │        ├── box_track_20
│   │        │   ├── train
│   │        │   └── val
│   │        └── det_20
│   │            ├── det_train.json
│   │            └── det_val.json
│   ├── waymo

Convert the labels of the MOT 2020 data (train & val sets) into COCO format by running:

python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/val/ -o datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json -m track
python3 datasets/bdd100k2coco.py -i datasets/bdd100k/labels/box_track_20/train/ -o datasets/bdd100k/labels/track/bdd100k_mot_train_coco.json -m track
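As an optional sanity check (this helper is not part of the repo's scripts), load one of the generated files and confirm that it parses as COCO-style JSON:

python3 -c "import json; d = json.load(open('datasets/bdd100k/labels/track/bdd100k_mot_val_coco.json')); print(len(d['images']), 'images,', len(d['annotations']), 'annotations,', len(d['categories']), 'categories')"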
  2. Split the original videos into different domains (time of day). Run the following command:
python3 -m datasets.domain_splits_bdd100k

This script will first extract the domain attributes from the BDD100K detection set and then map them to the tracking set sequences. After these processing steps, you will see two additional folders, domain_splits and per_seq, under datasets/bdd100k/labels/box_track_20. The domain splits of all attributes in the BDD100K detection set can be found at datasets/bdd100k/labels/domain_splits.
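To confirm the split succeeded, you can list the two new folders:

ls datasets/bdd100k/labels/box_track_20/domain_splits
ls datasets/bdd100k/labels/box_track_20/per_seq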

Waymo

  1. Download the Waymo dataset here. Put the Waymo raw data under datasets/ in this repo. After downloading the data, the folder structure should look like this:
├── datasets
│   ├── bdd100k
│   ├── waymo
│   │   └── raw

Convert the raw TFRecord data files into COCO format by running:

python3 -m datasets.waymo2coco

Note that this script takes a long time to run; be prepared to keep it running for over a day.
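Given the runtime, one option (not required by the repo) is to run the conversion in the background and keep a log:

nohup python3 -m datasets.waymo2coco > waymo2coco.log 2>&1 &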

  2. Convert the BDD100K dataset labels into 3 classes (originally 8) to match the 3 classes of the Waymo dataset. Run the following command:
python3 -m datasets.convert_bdd_3cls
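The authoritative class mapping lives in datasets/convert_bdd_3cls.py; purely as an illustration of its shape, an 8-to-3 mapping could look like the sketch below (the exact assignments are assumptions, not necessarily the script's mapping):

# Hypothetical mapping for illustration only; see datasets/convert_bdd_3cls.py
# for the assignments actually used.
BDD100K_TO_WAYMO = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "train": "vehicle", "motorcycle": "vehicle",  # assumed
    "pedestrian": "pedestrian",
    "rider": "cyclist", "bicycle": "cyclist",     # assumed
}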

Get Started

For joint training, run:

python3 -m tools.train_net --config-file [config_file] --num-gpus 8

For evaluation, run:

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only

This command will load the latest checkpoint in the output folder. If you want to specify a different checkpoint or evaluate the pretrained checkpoints, you can run:

python3 -m tools.train_net --config-file [config_file] --num-gpus [num] --eval-only MODEL.WEIGHTS [PATH_TO_CHECKPOINT]
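For example, to train and then evaluate the CycConf model on BDD100K daytime (the config path below is illustrative; the exact paths are given by the Config links in the tables under Benchmark Results):

python3 -m tools.train_net --config-file configs/BDD100K/R50_FPN_daytime_cycle_conf.yaml --num-gpus 8
python3 -m tools.train_net --config-file configs/BDD100K/R50_FPN_daytime_cycle_conf.yaml --num-gpus 8 --eval-only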

Benchmark Results

Dataset Statistics

| Dataset | Split | Seq. | Frames/Seq. | Boxes | Classes |
|---|---|---|---|---|---|
| BDD100K Daytime | train | 757 | 204 | 1.82M | 8 |
| | val | 108 | 204 | 287K | 8 |
| BDD100K Night | train | 564 | 204 | 895K | 8 |
| | val | 71 | 204 | 137K | 8 |
| Waymo Open Data | train | 798 | 199 | 3.64M | 3 |
| | val | 202 | 199 | 886K | 3 |

Out of Domain Evaluation

BDD100K Daytime to Night. The base detector is Faster R-CNN with ResNet-50.

| Model | AP | AP50 | AP75 | APs | APm | APl | Config | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 17.84 | 31.35 | 17.68 | 4.92 | 16.15 | 35.56 | link | link |
| + Rotation | 18.58 | 32.95 | 18.15 | 5.16 | 16.93 | 36.00 | link | link |
| + Jigsaw | 17.47 | 31.22 | 16.81 | 5.08 | 15.80 | 33.84 | link | link |
| + Cycle Consistency | 18.35 | 32.44 | 18.07 | 5.04 | 17.07 | 34.85 | link | link |
| + Cycle Confusion | 19.09 | 33.58 | 19.14 | 5.70 | 17.68 | 35.86 | link | link |

BDD100K Night to Daytime.

| Model | AP | AP50 | AP75 | APs | APm | APl | Config | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 19.14 | 33.04 | 19.16 | 5.38 | 21.42 | 40.34 | link | link |
| + Rotation | 19.07 | 33.25 | 18.83 | 5.53 | 21.32 | 40.06 | link | link |
| + Jigsaw | 19.22 | 33.87 | 18.71 | 5.67 | 22.35 | 38.57 | link | link |
| + Cycle Consistency | 18.89 | 33.50 | 18.31 | 5.82 | 21.01 | 39.13 | link | link |
| + Cycle Confusion | 19.57 | 34.34 | 19.26 | 6.06 | 22.55 | 38.95 | link | link |

Waymo Front Left to BDD100K Night.

| Model | AP | AP50 | AP75 | APs | APm | APl | Config | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 10.07 | 19.62 | 9.05 | 2.67 | 10.81 | 18.62 | link | link |
| + Rotation | 11.34 | 23.12 | 9.65 | 3.53 | 11.73 | 21.60 | link | link |
| + Jigsaw | 9.86 | 19.93 | 8.40 | 2.77 | 10.53 | 18.82 | link | link |
| + Cycle Consistency | 11.55 | 23.44 | 10.00 | 2.96 | 12.19 | 21.99 | link | link |
| + Cycle Confusion | 12.27 | 26.01 | 10.24 | 3.44 | 12.22 | 23.56 | link | link |

Waymo Front Right to BDD100K Night.

| Model | AP | AP50 | AP75 | APs | APm | APl | Config | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 8.65 | 17.26 | 7.49 | 1.76 | 8.29 | 19.99 | link | link |
| + Rotation | 9.25 | 18.48 | 8.08 | 1.85 | 8.71 | 21.08 | link | link |
| + Jigsaw | 8.34 | 16.58 | 7.26 | 1.61 | 8.01 | 18.09 | link | link |
| + Cycle Consistency | 9.11 | 17.92 | 7.98 | 1.78 | 9.36 | 19.18 | link | link |
| + Cycle Confusion | 9.99 | 20.58 | 8.30 | 2.18 | 10.25 | 20.54 | link | link |

Citation

If you find this repository useful for your publications, please consider citing our paper.

@article{wang2021robust,
  title={Robust Object Detection via Instance-Level Temporal Cycle Confusion},
  author={Wang, Xin and Huang, Thomas E and Liu, Benlin and Yu, Fisher and Wang, Xiaolong and Gonzalez, Joseph E and Darrell, Trevor},
  journal={International Conference on Computer Vision (ICCV)},
  year={2021}
}