This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

Shikun Liu

Last update: Dec 20, 2022

Auto-Lambda

We encourage readers to check out our project page, which includes further discussions and insights not covered in our technical paper.

Multi-task Methods

We implemented all weighting and gradient-based baselines presented in the paper for computer vision tasks: Dense Prediction Tasks (for NYUv2 and CityScapes) and Multi-domain Classification Tasks (for CIFAR-100).

Specifically, we have implemented the following multi-task optimisation methods:

Weighting-based:

Equal - All task weightings are 1. --weight equal
Uncertainty - https://arxiv.org/abs/1705.07115 --weight uncert
Dynamic Weight Average - https://arxiv.org/abs/1803.10704 --weight dwa
Auto-Lambda - Our approach. --weight autol
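As an illustration of how a weighting-based method turns losses into task weightings, here is a minimal sketch of the Dynamic Weight Average rule as described in the DWA paper linked above: each weighting is a softmax over the ratio of the two most recent losses, scaled so the weightings sum to the number of tasks. The function name `dwa_weights` and its arguments are hypothetical and are not taken from this repository; the temperature of 2 follows the DWA paper.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Hypothetical helper: DWA weightings from the last two epochs' losses."""
    num_tasks = len(prev_losses)
    # Relative descending rate of each task: L_k(t-1) / L_k(t-2).
    ratios = [a / b for a, b in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    # Softmax scaled by the number of tasks, so the weightings sum to K.
    return [num_tasks * e / total for e in exps]


# Task 0 halved its loss, task 1 barely improved, so task 1 gets more weight.
weights = dwa_weights(prev_losses=[0.5, 0.9], prev_prev_losses=[1.0, 1.0])
print(weights)
```

With equal loss ratios the rule reduces to the Equal baseline, where every weighting is 1.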

Gradient-based:

GradDrop - https://arxiv.org/abs/2010.06808 --grad_method graddrop
PCGrad - https://arxiv.org/abs/2001.06782 --grad_method pcgrad
CAGrad - https://arxiv.org/abs/2110.14048 --grad_method cagrad
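To make the gradient-based family concrete, the following is a minimal sketch of the PCGrad projection rule from the paper linked above, written over plain Python lists rather than PyTorch tensors. It is an illustration only, not the implementation used in this repository; the names `dot` and `pcgrad` are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Hypothetical helper: combine per-task gradients with PCGrad projection."""
    rng = rng or random.Random(0)
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # PCGrad visits the other tasks in random order
        for j in others:
            g_j = task_grads[j]
            inner = dot(g, g_j)
            if inner < 0:
                # Conflict: remove the component of g that points against g_j.
                scale = inner / dot(g_j, g_j)
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    # The final update direction is the sum of the projected task gradients.
    return [sum(column) for column in zip(*projected)]


# Two conflicting task gradients (their dot product is negative).
print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))
```

Gradients that do not conflict (non-negative dot product) pass through unchanged, so in that case the result is the plain sum.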

Note: Applying a combination of both weighting and gradient-based methods can further improve performance.

Datasets

We applied the same data pre-processing as in our previous project, MTAN, which experimented on:

NYUv2 [3 Tasks] - 13 Class Segmentation + Depth Estimation + Surface Normal. [288 x 384] Resolution.
CityScapes [3 Tasks] - 19 Class Segmentation + 10 Class Part Segmentation + Disparity (Inverse Depth) Estimation. [256 x 512] Resolution.

Note: We have included a new task, Part Segmentation, for the CityScapes dataset. The pre-processing file for CityScapes is also included in the dataset folder.

Experiments

All experiments were written in PyTorch 1.7 and can be run with different flags (hyper-parameters) passed to each training script. We briefly introduce some important flags below.

| Flag Name | Usage | Comments |
|---|---|---|
| `network` | choose multi-task network: `split, mtan` | both architectures are based on ResNet-50; only available in dense prediction tasks |
| `dataset` | choose dataset: `nyuv2, cityscapes` | only available in dense prediction tasks |
| `weight` | choose weighting-based method: `equal, uncert, dwa, autol` | only `autol` will behave differently when set to different primary tasks |
| `grad_method` | choose gradient-based method: `graddrop, pcgrad, cagrad` | `weight` and `grad_method` can be applied together |
| `task` | choose primary tasks: `seg, depth, normal` for NYUv2, `seg, part_seg, disp` for CityScapes, `all`: a combination of all standard 3 tasks | only available in dense prediction tasks |
| `with_noise` | toggle on to add noise prediction task for training (to evaluate robustness in auxiliary learning setting) | only available in dense prediction tasks |
| `subset_id` | choose domain ID for CIFAR-100, choose `-1` for the multi-task learning setting | only available in CIFAR-100 tasks |
| `autol_init` | initialisation of Auto-Lambda, default `0.1` | only available when applying Auto-Lambda |
| `autol_lr` | learning rate of Auto-Lambda, default `1e-4` for NYUv2 and `3e-5` for CityScapes | only available when applying Auto-Lambda |
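The flags above for the dense prediction setting could be wired together roughly as follows. This is a hypothetical sketch built only from the table, not the argument parser shipped in the training scripts: the function names, the `none` choice for `grad_method`, and every default other than `autol_init` and `autol_lr` are assumptions.

```python
import argparse

# Primary tasks allowed per dataset, as listed in the flag table.
VALID_TASKS = {
    "nyuv2": {"seg", "depth", "normal", "all"},
    "cityscapes": {"seg", "part_seg", "disp", "all"},
}


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical dense-prediction flags")
    parser.add_argument("--network", choices=["split", "mtan"], default="split")  # assumed default
    parser.add_argument("--dataset", choices=["nyuv2", "cityscapes"], default="nyuv2")  # assumed default
    parser.add_argument("--weight", choices=["equal", "uncert", "dwa", "autol"], default="equal")  # assumed default
    # "none" is an assumed placeholder for "no gradient-based method".
    parser.add_argument("--grad_method", choices=["none", "graddrop", "pcgrad", "cagrad"], default="none")
    parser.add_argument("--task", default="all")  # assumed default
    parser.add_argument("--with_noise", action="store_true")
    parser.add_argument("--autol_init", type=float, default=0.1)
    parser.add_argument("--autol_lr", type=float, default=1e-4)
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    # The valid primary tasks depend on the chosen dataset.
    if args.task not in VALID_TASKS[args.dataset]:
        raise ValueError(f"task '{args.task}' is not defined for {args.dataset}")
    return args


args = parse(["--dataset", "cityscapes", "--task", "part_seg",
              "--weight", "autol", "--autol_lr", "3e-5"])
print(args.dataset, args.task, args.weight, args.autol_lr)
```

Checking `task` against the dataset after parsing keeps the dataset-specific task names in one place instead of encoding them in `choices`.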

Training Auto-Lambda in Multi-task / Auxiliary Learning Mode:

python trainer_dense.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK] --weight autol --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar.py --subset_id [PRIMARY_DOMAIN_ID] --weight autol --gpu 0   # for CIFAR-100 dataset

Training in Single-task Learning Mode:

python trainer_dense_single.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK]  --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar_single.py --subset_id [PRIMARY_DOMAIN_ID] --gpu 0   # for CIFAR-100 dataset

Note: All experiments in the original paper were trained from scratch without pre-training.

Benchmark

For the standard 3 tasks in NYUv2 (without the noise prediction task) in the multi-task learning setting with the Split architecture, please refer to the results below.

| Method | Sem. Seg. (mIOU) | Depth (aErr.) | Normal (mDist.) | Delta MTL |
|---|---|---|---|---|
| Single | 43.37 | 52.24 | 22.40 | - |
| Equal | 44.64 | 43.32 | 24.48 | +3.57% |
| DWA | 45.14 | 43.06 | 24.17 | +4.58% |
| GradDrop | 45.39 | 43.23 | 24.18 | +4.65% |
| PCGrad | 45.15 | 42.38 | 24.13 | +5.09% |
| Uncertainty | 45.98 | 41.26 | 24.09 | +6.50% |
| CAGrad | 46.14 | 41.91 | 23.52 | +7.05% |
| Auto-Lambda | 47.17 | 40.97 | 23.68 | +8.21% |
| Auto-Lambda + CAGrad | 48.26 | 39.82 | 22.81 | +11.07% |

Note: The results were averaged across three random seeds. You should expect an error range of less than +/-1%.
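The Delta MTL column is not defined in this README. The values in the table are consistent with the average relative improvement over the Single baseline, with the sign flipped for metrics where lower is better. The sketch below reproduces two rows under that assumption; treating Depth (aErr.) and Normal (mDist.) as lower-is-better is also an assumption, and the names `delta_mtl`, `SINGLE` and `HIGHER_IS_BETTER` are hypothetical.

```python
# Single-task baseline from the benchmark table.
SINGLE = {"seg": 43.37, "depth": 52.24, "normal": 22.40}
# Assumption: mIOU is higher-is-better, the two error metrics are lower-is-better.
HIGHER_IS_BETTER = {"seg": True, "depth": False, "normal": False}


def delta_mtl(multi, single=SINGLE):
    """Average relative improvement over the single-task baseline, in percent."""
    total = 0.0
    for task, base in single.items():
        sign = 1.0 if HIGHER_IS_BETTER[task] else -1.0
        total += sign * (multi[task] - base) / base
    return 100.0 * total / len(single)


equal = {"seg": 44.64, "depth": 43.32, "normal": 24.48}
auto_lambda = {"seg": 47.17, "depth": 40.97, "normal": 23.68}

print(f"Equal:       {delta_mtl(equal):+.2f}%")        # +3.57%
print(f"Auto-Lambda: {delta_mtl(auto_lambda):+.2f}%")  # +8.21%
```

Because the three relative changes are averaged, a large gain on depth can outweigh a small loss on surface normals, which is why every multi-task row has a positive Delta MTL despite a worse normal score than Single.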

Citation

If you found this code/work to be useful in your own research, please consider citing the following:

@article{liu2022auto-lambda,
  title={Auto-Lambda: Disentangling Dynamic Task Relationships},
  author={Liu, Shikun and James, Stephen and Davison, Andrew J and Johns, Edward},
  journal={arXiv preprint arXiv:2202.03091},
  year={2022}
}

Acknowledgement

We would like to thank @Cranial-XIX for his clean implementation of gradient-based optimisation methods.

Contact

If you have any questions, please contact [email protected].

Comments

failed to download the NYU npy dataset

Hello, I tried many times to download the NYU npy file from the Dropbox link, but it always failed in the last few minutes.

I am confused about the problem because the code only supports the npy format. Maybe there is some pre-processing code for NYU?

With best regards

opened by buble-pie 9
Loss function becomes NaN

Hello, thank you very much for your work. Recently I tried to use Auto-Lambda to optimise my own model. Because my model contains some non-differentiable parameters, I changed the gradient computation to the following:

model_params = [ p for p in self.model.parameters() if p.requires_grad ]
gradients = torch.autograd.grad(loss, shared_params,retain_graph=True,allow_unused=True)

However, this leaves some layers with a gradient of None. I simply added an if check to skip the layers whose gradient is None, and the code runs, but during training the loss grows exponentially and eventually becomes NaN:

0%| | 1/11807 [01:52<369:46:10, 112.75s/it]tensor(31.6492, device='cuda:0', grad_fn=)
0%| | 2/11807 [01:53<259:48:03, 79.23s/it] tensor(408.9402, device='cuda:0', grad_fn=)
0%| | 3/11807 [01:54<182:54:59, 55.79s/it]tensor(43848.0703, device='cuda:0', grad_fn=)
0%| | 4/11807 [01:55<129:05:55, 39.38s/it]tensor(1.1228e+15, device='cuda:0', grad_fn=)
0%| | 5/11807 [01:56<91:24:46, 27.88s/it] tensor(nan, device='cuda:0', grad_fn=)
0%| | 6/11807 [01:58<65:00:37, 19.83s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 7/11807 [01:59<46:34:15, 14.21s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 8/11807 [02:00<33:45:58, 10.30s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 9/11807 [02:01<24:43:29, 7.54s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 10/11807 [02:02<18:18:33, 5.59s/it]Traceback (most recent call last):

I would like to ask whether you have any suggestions.

opened by raozhongyu 7
Applying a combination of both weighting and gradient-based methods can further improve performance?

As we know, weight-based methods search for different task weights, and task weights will act on the loss. Then the loss back-propagation will act on the gradient. In other words, weight-based methods have an effect on the gradient, and gradient-based methods also give different weights to each gradient. It looks like both weighting and gradient-based methods serve the same purpose.

I have three questions about the combination of both weighting and gradient-based methods :

What is the motivation for the combination of both weighting and gradient-based methods?

It looks like both weighting and gradient-based methods serve the same purpose. Why is it still necessary to combine both weighting and gradient-based methods?

And why the combination of both can further improve performance?

Thank you very much.
opened by puhan123 7
Question about the multi-task learning

Hi,

Thank you so much for sharing the code!

I am trying to reproduce your results and just wanted to double-check whether the following command is for multi-task learning, not for auxiliary learning:

python trainer_dense.py --network split --dataset nyuv2 --task all --weight autol --gpu 3

In other words, will this command give the result of Split Multi-Task Auto-Lambda?

Thanks!

opened by heendung 6
batch size changes after update_metric

Hi, I have a small question here, why the batch size of the model output changes after this line:

https://github.com/lorenmt/auto-lambda/blob/24591b7ff0d4498b18bd4b4c85c41864bdfc800a/trainer_dense.py#L214

it seems before this line the batch is 4, then it comes to 3.

Since I want to use the output of the model after loss.backward() in the next epoch, it becomes a problem if the batch size changes.

would you kindly give me some idea on this?

With best regards

opened by buble-pie 5
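Question about the compute_hessian function

Hello, I have read your paper, and thank you very much for your contribution to multi-task learning. I have a few questions that I hope you can answer. In the compute_hessian function in auto_lambda.py:

1. First, after p += eps * d, the gradient with respect to self.meta_weights is computed.
2. Then, after p -= 2 * eps * d, the gradient with respect to self.meta_weights is computed again.
3. Finally, after p += eps * d, the result is hessian = [(p - n) / (2. * eps) for p, n in zip(d_weight_p, d_weight_n)]

(1) I do not understand: p is first increased by eps * d, then decreased by 2 * eps * d, then increased by eps * d again. Doesn't that mean p is unchanged?
(2) d_model is the network weights after the update with the most important val_loss. I do not understand the purpose of p += eps * d.
(3) As I am not deeply familiar with this area: compute_hessian should be the core algorithm, but I could not work out what this piece of code does. I would be very grateful for your guidance.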

opened by E18301194 4
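The three steps described in the compute_hessian question above match the general pattern of a central finite-difference approximation of a Hessian-vector product. The toy sketch below illustrates that pattern on plain Python lists. It is not the code from auto_lambda.py, and the names `task_losses`, `grad_wrt_task_weights` and `finite_difference_hvp` are hypothetical. It assumes the total loss is a weighted sum of task losses, so the gradient with respect to each task weighting is simply that task's loss.

```python
def task_losses(w):
    # Two toy task losses that share the parameters w.
    return [(w[0] - 1.0) ** 2 + w[1] ** 2, (w[0] + w[1]) ** 2]


def grad_wrt_task_weights(w):
    # loss = sum_k lambda_k * L_k(w), so d loss / d lambda_k = L_k(w).
    return task_losses(w)


def finite_difference_hvp(w, d, eps=1e-3):
    """Approximate d/dw [d loss / d lambda] applied to direction d."""
    for i in range(len(w)):
        w[i] += eps * d[i]          # step 1: evaluate at w + eps * d
    g_pos = grad_wrt_task_weights(w)
    for i in range(len(w)):
        w[i] -= 2.0 * eps * d[i]    # step 2: evaluate at w - eps * d
    g_neg = grad_wrt_task_weights(w)
    for i in range(len(w)):
        w[i] += eps * d[i]          # step 3: restore the original w
    return [(p - n) / (2.0 * eps) for p, n in zip(g_pos, g_neg)]


w = [0.5, -2.0]
d = [1.0, 0.5]
print(finite_difference_hvp(w, d))  # close to [-3.0, -4.5], the directional derivatives
print(w)                            # parameters are back at (approximately) [0.5, -2.0]
```

In this sketch the parameters do end up unchanged, but the gradient is evaluated at two different points along the way, and the difference between those two evaluations is what approximates the second-order term without forming a full Hessian.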
Normalisation of depth data and depth prediction performance

Hi,

Congratulations on a great paper :)

Thanks a lot for making your code open-source. I went through your code and it seems like you normalise the input RGB data to [-1, 1] scale whilst the depth data is normalised to [-1, max]. It was my understanding that for Cityscapes, the depth data would be normalised to [-1, 1] after using the map_disparity function and the RGB data normalised to ImageNet stats if for instance using pre-trained weights. Am I wrong?

I also have a general question about training depth prediction models from Cityscapes. I have tried various flavours of models (DeepLabV3, HRNet) and yet, training single-task depth prediction network seems to yield overly smooth depth maps with convergence of the loss occurring very soon in training regardless of the learning rate (1e-3, 1e-4 etc. for ADAM). For reference, the RGB data is normalised using ImageNet stats (using pre-trained encoders on ImageNet) and the depth data is either normalised to [-1, 1] or [-1, max] (using your disparity mapping functions)

I was wondering if you could comment based on your experience on the dataset? This would be very helpful. These same networks have been tested on the 19-class segmentation problem.

Many thanks

opened by fbragman 2

Owner

Shikun Liu

Ph.D. Student, The Dyson Robotics Lab at Imperial College.

GitHub https://shikun.io/projects/auto-lambda

