Cross-Task Consistency Learning Framework for Multi-Task Learning
Tested on
- numpy(v1.19.1)
- opencv-python(v4.4.0.42)
- torch(v1.7.0)
- torchvision(v0.8.0)
- tqdm(v4.48.2)
- matplotlib(v3.3.1)
- seaborn(v0.11.0)
- pandas(v1.1.2)
Data
Cityscapes (CS)
Download the Cityscapes dataset and put it in a subdirectory named `./data/cityscapes`. The folder should have the following subfolders:
- RGB images in folder `leftImg8bit`
- Segmentation labels in folder `gtFine`
- Disparity maps in folder `disparity`
NYU
We use the preprocessed NYUv2 dataset provided by this repo. Download the dataset and put it in `./data/nyu`.
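With both datasets in place, the directory layout should look roughly like this (a sketch based on the folder names above):

```
data/
├── cityscapes/
│   ├── leftImg8bit/
│   ├── gtFine/
│   └── disparity/
└── nyu/
```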
Model
The model consists of one encoder (ResNet) and two decoders, one for each task. The decoders output the predictions for each task ("direct predictions"), which are fed to the TaskTransferNet.
The objective of the TaskTransferNet is to predict the other task given a prediction as input (segmentation prediction -> depth prediction, and vice versa); I refer to these outputs as "transferred predictions".
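As a rough sketch (illustrative only, not the repo's actual modules; channel sizes and decoder shapes are assumptions), the forward path looks like this:

```python
import torch.nn as nn
import torchvision

class CrossTaskModel(nn.Module):
    """Shared ResNet encoder with one decoder per task (illustrative)."""
    def __init__(self, num_classes=7):
        super().__init__()
        resnet = torchvision.models.resnet34(pretrained=False)
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc
        # Stand-in decoders; the real ones upsample back to the input resolution.
        self.seg_decoder = nn.Sequential(
            nn.Conv2d(512, num_classes, 1),
            nn.Upsample(scale_factor=32, mode='bilinear', align_corners=False))
        self.dep_decoder = nn.Sequential(
            nn.Conv2d(512, 1, 1),
            nn.Upsample(scale_factor=32, mode='bilinear', align_corners=False))

    def forward(self, x):
        feats = self.encoder(x)
        return self.seg_decoder(feats), self.dep_decoder(feats)  # direct predictions

class TaskTransferNet(nn.Module):
    """Maps one task's direct prediction to the other task's prediction."""
    def __init__(self, in_ch, out_ch, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_ch, 3, padding=1))

    def forward(self, pred):
        return self.net(pred)  # transferred prediction

# Two transfer nets, one per direction:
# seg2dep = TaskTransferNet(in_ch=7, out_ch=1)   # segmentation -> depth
# dep2seg = TaskTransferNet(in_ch=1, out_ch=7)   # depth -> segmentation
```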
Loss function
When computing the losses, the direct predictions are compared with the targets, while the transferred predictions are compared with the direct predictions so that they "align themselves" (see the sketch below).
The total loss consists of 4 different losses:
- direct segmentation loss: CrossEntropyLoss()
- direct depth loss: L1() or MSE() or logL1() or SmoothL1()
- transferred segmentation loss: CrossEntropyLoss() or KLDivergence()
- transferred depth loss: L1() or SSIM()
* Label smoothing: "smooths" the one-hot target distribution by taking some probability mass from the correct class and distributing it among the other classes.
* SSIM: Structural Similarity Loss
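Putting the four terms together with the default choices (cross-entropy and L1; `alpha` and `gamma` are the weighting flags listed under Flags below), a minimal sketch of the total loss; treating the direct predictions as fixed targets via `detach()` and hard labels via `argmax` are my assumptions:

```python
import torch.nn.functional as F

def total_loss(seg_pred, dep_pred, t_seg_pred, t_dep_pred,
               seg_target, dep_target, alpha=0.01, gamma=0.01):
    # Direct predictions are compared with the ground-truth targets.
    l_seg = F.cross_entropy(seg_pred, seg_target)
    l_dep = F.l1_loss(dep_pred, dep_target)
    # Transferred predictions are compared with the direct predictions,
    # so the two task heads "align themselves".
    l_tseg = F.cross_entropy(t_seg_pred, seg_pred.argmax(dim=1).detach())
    l_tdep = F.l1_loss(t_dep_pred, dep_pred.detach())
    # alpha weights the transferred depth loss, gamma the transferred segmentation loss.
    return l_seg + l_dep + alpha * l_tdep + gamma * l_tseg
```

With the default flags this matches the Cityscapes configuration (alpha = gamma = 0.01).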
Flags
The flags are the same for both datasets. The flags and their usage are described below:
Flag Name | Usage | Comments |
---|---|---|
`input_path` | path to the dataset | default: `data/cityscapes` (CS) or `data/nyu` (NYU) |
`height` | height of prediction | default: 128 (CS) or 288 (NYU) |
`width` | width of prediction | default: 256 (CS) or 384 (NYU) |
`epochs` | # of epochs | default: 250 (CS) or 100 (NYU) |
`enc_layers` | which encoder to use | default: 34; choose from 18, 34, 50, 101, 152 |
`use_pretrain` | toggle on to use pretrained encoder weights | available for both datasets |
`batch_size` | batch size | default: 8 (CS) or 6 (NYU) |
`scheduler_step_size` | step size for the scheduler | default: 80 (CS) or 60 (NYU); note that we use StepLR |
`scheduler_gamma` | decay rate of the scheduler | default: 0.5 |
`alpha` | weight of the transferred depth loss | default: 0.01 (CS) or 0.0001 (NYU) |
`gamma` | weight of the transferred segmentation loss | default: 0.01 (CS) or 0.0001 (NYU) |
`label_smoothing` | amount of label smoothing | default: 0.0 |
`lp` | loss function for the direct depth loss | default: L1; choose from L1, MSE, logL1, smoothL1 |
`tdep_loss` | loss function for the transferred depth loss | default: L1; choose from L1 or SSIM |
`tseg_loss` | loss function for the transferred segmentation loss | default: cross; choose from cross or kl |
`batch_norm` | toggle to enable batch normalization layers in TaskTransferNet | slightly improves the segmentation task |
`wider_ttnet` | toggle to double the # of channels in TaskTransferNet | |
`uncertainty_weights` | toggle to use uncertainty weights (Kendall et al., 2018) | we used this for our best results (see the sketch below) |
`gradnorm` | toggle to use GradNorm (Chen et al., 2018) | |
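For reference, `uncertainty_weights` refers to the homoscedastic uncertainty weighting of Kendall et al. (2018). A minimal sketch in the common learnable-log-variance form (the repo's exact parameterization is an assumption):

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """One learnable log-variance per task loss (illustrative)."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # Scale each loss by a learned precision exp(-s); the additive s term
        # keeps the learned weights from collapsing to zero.
        total = 0.0
        for loss, s in zip(losses, self.log_vars):
            total = total + torch.exp(-s) * loss + s
        return total
```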
Training
Cityscapes
For the Cityscapes dataset, there are two versions of the segmentation task: a 7-class task and a 19-class task (use the flag `num_classes` to switch between them; the default is 7).
So far, the results are near-SOTA for the 7-class segmentation task + depth estimation.
ResNet34 was used as the encoder, with L1() for the direct depth loss and CrossEntropyLoss() for the transferred segmentation loss.
The hyperparameter weights for both transferred losses were 0.01.
I used Adam as my optimizer with an initial learning rate of 0.0001 and trained for 250 epochs with batch size 8. The learning rate was halved every 80 epochs.
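In code, that schedule corresponds roughly to the following sketch (`model` and `train_one_epoch` are placeholders, not names from the repo):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# StepLR halves the learning rate every 80 epochs (scheduler_gamma = 0.5).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=80, gamma=0.5)

for epoch in range(250):
    train_one_epoch(model, optimizer)  # placeholder for the actual training loop
    scheduler.step()
```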
To reproduce the results, run the following:
python main_cross_cs.py --uncertainty_weights
NYU
Our results are SOTA for the NYU dataset.
ResNet34 was used as the encoder, with L1() for the direct depth loss and CrossEntropyLoss() for the transferred segmentation loss.
The hyperparameter weights for both transferred losses were 0.0001.
I used Adam as my optimizer with an initial learning rate of 0.0001 and trained for 100 epochs with batch size 6. The learning rate was halved every 60 epochs.
To reproduce the results, run the following:
python main_cross_nyu.py --uncertainty_weights
Comparisons
Evaluation metrics are the following (sketched in code after the lists):
Segmentation
- Pixel accuracy (Pix Acc): percentage of pixels with the correct label
- mIoU: mean Intersection over Union
Depth
- Absolute Error (Abs)
- Absolute Relative Error (Abs Rel): Absolute error divided by ground truth depth
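For concreteness, a sketch of these four metrics (the masking conventions, `ignore_index` for segmentation and `target > 0` for depth, are my assumptions):

```python
import torch

def seg_metrics(pred, target, num_classes, ignore_index=-1):
    """Pixel accuracy and mIoU from integer label maps."""
    valid = target != ignore_index
    pix_acc = (pred[valid] == target[valid]).float().mean()
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c) & valid).sum().float()
        union = (((pred == c) | (target == c)) & valid).sum().float()
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return pix_acc, torch.stack(ious).mean()  # Pix Acc, mIoU

def depth_metrics(pred, target):
    """Absolute error and absolute relative error over valid pixels."""
    valid = target > 0
    abs_err = (pred[valid] - target[valid]).abs().mean()
    abs_rel = ((pred[valid] - target[valid]).abs() / target[valid]).mean()
    return abs_err, abs_rel  # Abs, Abs Rel
```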
The results are the following:
Cityscapes
Models | mIoU | Pix Acc | Abs | Abs Rel |
---|---|---|---|---|
MTAN | 53.04 | 91.11 | 0.0144 | 33.63 |
KD4MTL | 52.71 | 91.54 | 0.0139 | 27.33 |
PCGrad | 53.59 | 91.45 | 0.0171 | 31.34 |
AdaMT-Net | 62.53 | 94.16 | 0.0125 | 22.23 |
Ours | 66.51 | 93.56 | 0.0122 | 19.40 |
NYU
Models | mIoU | Pix Acc | Abs | Abs Rel |
---|---|---|---|---|
MTAN* | 21.07 | 55.70 | 0.6035 | 0.2472 |
MTAN† | 20.10 | 53.73 | 0.6417 | 0.2758 |
KD4MTL* | 20.75 | 57.90 | 0.5816 | 0.2445 |
KD4MTL† | 22.44 | 57.32 | 0.6003 | 0.2601 |
PCGrad* | 20.17 | 56.65 | 0.5904 | 0.2467 |
PCGrad† | 21.29 | 54.07 | 0.6705 | 0.3000 |
AdaMT-Net* | 21.86 | 60.35 | 0.5933 | 0.2456 |
AdaMT-Net† | 20.61 | 58.91 | 0.6136 | 0.2547 |
Ours† | 30.31 | 63.02 | 0.5954 | 0.2235 |
*: Trained on 3 tasks (segmentation, depth, and surface normal)
†: Trained on 2 tasks (segmentation and depth)
Italic: Reproduced by ourselves
Scores for models trained on 3 tasks on the NYU dataset are shown only for reference.
Papers referenced
MTAN: [paper][github]
KD4MTL: [paper][github]
PCGrad: [paper][github (tensorflow)][github (pytorch)]
AdaMT-Net: [paper]