Overview

Auto-Lambda

This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

We encourage readers to check out our project page, which includes additional discussions and insights not covered in the technical paper.

Multi-task Methods

We implemented all weighting- and gradient-based baselines presented in the paper for two types of computer vision tasks: dense prediction tasks (NYUv2 and CityScapes) and multi-domain classification tasks (CIFAR-100).

Specifically, we cover the implementations of the following multi-task optimisation methods:

Weighting-based:

  • Equal - all task weights fixed to 1
  • Uncertainty - task weighting via homoscedastic uncertainty
  • Dynamic Weight Average (DWA) - task weighting via the rate of loss change (illustrated in the sketch after this list)
  • Auto-Lambda - our proposed method

Gradient-based:

  • GradDrop
  • PCGrad
  • CAGrad
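
To give a flavour of the weighting-based family, below is a minimal sketch of the DWA update, written here for illustration rather than taken from the repository's implementation; it assumes per-task average losses from the two previous epochs and the temperature T = 2 used in the original DWA formulation:

```python
import torch

def dwa_weights(losses_prev, losses_prev2, T=2.0):
    """Dynamic Weight Average: weight each task by the ratio of its two
    most recent epoch losses, softened by temperature T, so tasks whose
    loss decreases more slowly receive larger weights."""
    ratios = torch.tensor([l1 / l2 for l1, l2 in zip(losses_prev, losses_prev2)])
    K = len(losses_prev)                          # number of tasks
    return K * torch.softmax(ratios / T, dim=0)   # weights sum to K

# e.g. three tasks whose losses barely moved, halved, and rose:
print(dwa_weights([0.9, 0.5, 1.2], [1.0, 1.0, 1.0]))
```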

Note: Applying a combination of both weighting and gradient-based methods can further improve performance.
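
For example, Auto-Lambda can be paired with CAGrad; this is an illustrative invocation built from the flags documented below, and corresponds to the Auto-Lambda + CAGrad row in the benchmark:

python trainer_dense.py --dataset nyuv2 --task all --weight autol --grad_method cagrad --gpu 0   # combine Auto-Lambda weighting with CAGrad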

Datasets

We apply the same data pre-processing as in our previous project, MTAN, which experimented on:

  • NYUv2 [3 Tasks] - 13 Class Segmentation + Depth Estimation + Surface Normal. [288 x 384] Resolution.
  • CityScapes [3 Tasks] - 19 Class Segmentation + 10 Class Part Segmentation + Disparity (Inverse Depth) Estimation. [256 x 512] Resolution.

Note: We have included a new task, Part Segmentation, for the CityScapes dataset. The pre-processing file for CityScapes is also included in the dataset folder.

Experiments

All experiments were written in PyTorch 1.7 and can be trained with different flags (hyper-parameters) when running each training script. We briefly introduce some important flags below.

| Flag Name | Usage | Comments |
|---|---|---|
| `network` | choose the multi-task network: `split`, `mtan` | both architectures are based on ResNet-50; only available in dense prediction tasks |
| `dataset` | choose the dataset: `nyuv2`, `cityscapes` | only available in dense prediction tasks |
| `weight` | choose the weighting-based method: `equal`, `uncert`, `dwa`, `autol` | only `autol` behaves differently when set to different primary tasks |
| `grad_method` | choose the gradient-based method: `graddrop`, `pcgrad`, `cagrad` | `weight` and `grad_method` can be applied together |
| `task` | choose the primary task: `seg`, `depth`, `normal` for NYUv2; `seg`, `part_seg`, `disp` for CityScapes; `all` for a combination of all 3 standard tasks | only available in dense prediction tasks |
| `with_noise` | toggle on to add a noise prediction task during training (to evaluate robustness in the auxiliary learning setting) | only available in dense prediction tasks |
| `subset_id` | choose the domain ID for CIFAR-100; choose `-1` for the multi-task learning setting | only available in CIFAR-100 tasks |
| `autol_init` | initialisation of Auto-Lambda; default `0.1` | only available when applying Auto-Lambda |
| `autol_lr` | learning rate of Auto-Lambda; default `1e-4` for NYUv2 and `3e-5` for CityScapes | only available when applying Auto-Lambda |

Training Auto-Lambda in Multi-task / Auxiliary Learning Mode:

python trainer_dense.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK] --weight autol --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar.py --subset_id [PRIMARY_DOMAIN_ID] --weight autol --gpu 0   # for CIFAR-100 dataset
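
For example, with the placeholders filled in (illustrative invocations using the flag values documented above):

python trainer_dense.py --dataset nyuv2 --task seg --weight autol --gpu 0   # auxiliary learning: segmentation as the primary task
python trainer_dense.py --dataset nyuv2 --task all --weight autol --gpu 0   # multi-task learning: all 3 standard tasks
python trainer_cifar.py --subset_id -1 --weight autol --gpu 0   # multi-task learning across all CIFAR-100 domains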

Training in Single-task Learning Mode:

python trainer_dense_single.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK]  --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar_single.py --subset_id [PRIMARY_DOMAIN_ID] --gpu 0   # for CIFAR-100 dataset

Note: All experiments in the original paper were trained from scratch without pre-training.

Benchmark

For the standard 3 tasks on NYUv2 (without the noise prediction task) in the multi-task learning setting with the Split architecture, the benchmark results are as follows.

| Method | Sem. Seg. (mIoU) | Depth (aErr.) | Normal (mDist.) | Delta MTL |
|---|---|---|---|---|
| Single | 43.37 | 52.24 | 22.40 | - |
| Equal | 44.64 | 43.32 | 24.48 | +3.57% |
| DWA | 45.14 | 43.06 | 24.17 | +4.58% |
| GradDrop | 45.39 | 43.23 | 24.18 | +4.65% |
| PCGrad | 45.15 | 42.38 | 24.13 | +5.09% |
| Uncertainty | 45.98 | 41.26 | 24.09 | +6.50% |
| CAGrad | 46.14 | 41.91 | 23.52 | +7.05% |
| Auto-Lambda | 47.17 | 40.97 | 23.68 | +8.21% |
| Auto-Lambda + CAGrad | 48.26 | 39.82 | 22.81 | +11.07% |

Note: The results were averaged across three random seeds. You should expect an error range of less than ±1%.
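
Here, Delta MTL is the relative multi-task performance improvement over the single-task baseline averaged across tasks, where lower is better for the two error metrics (aErr., mDist.). A minimal sketch of this computation (illustrative code with our own naming), which reproduces the +8.21% of the Auto-Lambda row:

```python
def delta_mtl(mtl_metrics, single_metrics, lower_better):
    """Average relative improvement over single-task baselines, in percent.
    lower_better[i] is True for error metrics such as aErr. and mDist."""
    total = 0.0
    for m, s, lb in zip(mtl_metrics, single_metrics, lower_better):
        sign = -1.0 if lb else 1.0          # a drop in an error metric counts as a gain
        total += sign * (m - s) / s
    return 100.0 * total / len(mtl_metrics)

# Auto-Lambda row vs. the Single row above:
print(delta_mtl([47.17, 40.97, 23.68], [43.37, 52.24, 22.40],
                [False, True, True]))       # ≈ +8.21
```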

Citation

If you found this code/work useful in your own research, please consider citing the following:

@article{liu2022auto-lambda,
  title={Auto-Lambda: Disentangling Dynamic Task Relationships},
  author={Liu, Shikun and James, Stephen and Davison, Andrew J and Johns, Edward},
  journal={arXiv preprint arXiv:2202.03091},
  year={2022}
}

Acknowledgement

We would like to thank @Cranial-XIX for his clean implementation of the gradient-based optimisation methods.

Contact

If you have any questions, please contact [email protected].

Comments
  • failed to download the NYU npy dataset

    Hello, I have tried many times to download the NYU npy file from the Dropbox link, but it always fails in the last few minutes.

    I am confused by this because the code only supports the npy format... maybe there is some pre-processing code for NYUv2?

    With best regards

    opened by buble-pie 9
  • Loss becomes NaN

    Hello, and thank you very much for your work. Recently I tried to use Auto-Lambda to optimise my own model. Because my model contains some parameters that cannot be differentiated, I changed the gradient computation to the following: model_params = [p for p in self.model.parameters() if p.requires_grad]; gradients = torch.autograd.grad(loss, shared_params, retain_graph=True, allow_unused=True). However, this leaves some gradient entries as None. I added a simple if-check to skip the layers whose gradient is None, and the code runs, but during training the loss grows exponentially and eventually becomes NaN (within the first ten iterations the loss goes from about 31.6 to 408.9 to 4.4e4 to 1.1e15, and is nan from then on). Could you give me any advice on this?

    opened by raozhongyu 7
  • Applying a combination of both weighting and gradient-based methods can further improve performance?

    As we know, weighting-based methods search for different task weights, and these weights act on the loss; back-propagating the loss then acts on the gradients. In other words, weighting-based methods affect the gradients, and gradient-based methods also assign different weights to each gradient. It looks like both weighting- and gradient-based methods serve the same purpose.

    I have three questions about combining weighting and gradient-based methods:

    1. What is the motivation for combining both weighting- and gradient-based methods?
    2. If both serve the same purpose, why is it still necessary to combine them?
    3. Why can the combination of both further improve performance?

    Thank you very much.

    opened by puhan123 7
  • Question about the multi-task learning

    Hi,

    Thank you so much for sharing the code!

    I am trying to reproduce your results and just wanted to double-check that the following command is for multi-task learning, not for auxiliary learning: python trainer_dense.py --network split --dataset nyuv2 --task all --weight autol --gpu 3. In other words, will this command give the result of Split Multi-Task Auto-Lambda?

    Thanks!

    opened by heendung 6
  • batch size changes after update_metric

    Hi, I have a small question: why does the batch size of the model output change after this line:

    https://github.com/lorenmt/auto-lambda/blob/24591b7ff0d4498b18bd4b4c85c41864bdfc800a/trainer_dense.py#L214

    It seems that before this line the batch size is 4, and afterwards it becomes 3.

    Since I want to use the output of the model after loss.backward() in the next epoch, it becomes a problem if the batch size changes.

    would you kindly give me some idea on this?

    With best regards

    opened by buble-pie 5
  • Question about the compute_hessian function

    Hello, I have read your paper; thank you very much for your contribution to multi-task learning. I have a few questions I hope you can answer about the compute_hessian function in auto_lambda.py: 1. it first applies p += eps * d and differentiates with respect to self.meta_weights; 2. it then applies p -= 2 * eps * d and differentiates with respect to self.meta_weights again; 3. finally it applies p += eps * d and computes hessian = [(p - n) / (2. * eps) for p, n in zip(d_weight_p, d_weight_n)]. (1) I don't understand: if p first adds eps * d, then subtracts 2 * eps * d, then adds eps * d again, isn't that equivalent to p being unchanged? (2) d_model contains the network weights after the update with the val_loss, so I don't understand the purpose of p += eps * d. (3) Since I am not deeply familiar with this area, and compute_hessian appears to be the core of the algorithm, I could not work out what this code is doing; I would be very grateful for your guidance. Many thanks.

    opened by E18301194 4
  • Normalisation of depth data and depth prediction performance

    Hi,

    Congratulations on a great paper :)

    Thanks a lot for making your code open-source. I went through your code and it seems that you normalise the input RGB data to the [-1, 1] range, whilst the depth data is normalised to [-1, max]. It was my understanding that for Cityscapes the depth data would be normalised to [-1, 1] after using the map_disparity function, and the RGB data normalised to ImageNet stats if, for instance, using pre-trained weights. Am I wrong?

    I also have a general question about training depth prediction models on Cityscapes. I have tried various flavours of models (DeepLabV3, HRNet), and yet training a single-task depth prediction network seems to yield overly smooth depth maps, with the loss converging very early in training regardless of the learning rate (1e-3, 1e-4 etc. for ADAM). For reference, the RGB data is normalised using ImageNet stats (using encoders pre-trained on ImageNet) and the depth data is normalised either to [-1, 1] or to [-1, max] (using your disparity mapping functions).

    I was wondering if you could comment based on your experience with this dataset? That would be very helpful. The same networks have been tested on the 19-class segmentation problem.

    Many thanks

    opened by fbragman 2