MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks

Meta Research

Last update: Oct 10, 2022

Overview

Introduction

This repo contains the PyTorch implementation of MetaBalance, along with an example main file showing how to call it, for the paper:

MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks
The Web Conference, 2022.
Yun He, Xue Feng, Cheng Cheng, Geng Ji, Yunsong Guo and James Caverlee.
Meta AI and Texas A&M University.
A majority of this work was done while the first author was interning at Meta AI.

In many personalized recommendation scenarios, the generalization ability of a target task can be improved by learning additional auxiliary tasks alongside it on a multi-task network. However, this approach often suffers from a serious optimization imbalance problem. On the one hand, one or more auxiliary tasks might have a larger influence than the target task and even dominate the network weights, resulting in worse recommendation accuracy for the target task. On the other hand, the influence of one or more auxiliary tasks might be too weak to assist the target task. More challenging still, this imbalance changes dynamically throughout the training process and varies across parts of the same network. We propose a new method, MetaBalance, that balances auxiliary losses by directly manipulating their gradients w.r.t. the shared parameters of the multi-task network. Specifically, in each training iteration and adaptively for each part of the network, the gradient of an auxiliary loss is carefully reduced or enlarged to have a magnitude closer to that of the target loss gradient, preventing auxiliary tasks from being so strong that they dominate the target task or so weak that they cannot help it. Moreover, the proximity between the gradient magnitudes can be flexibly adjusted to adapt MetaBalance to different scenarios. The experiments show that our proposed method achieves a significant improvement of 8.34% in terms of NDCG@10 over the strongest baseline on two real-world datasets.
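As a rough illustration of the rescaling idea in the paragraph above, the following sketch shrinks or enlarges an auxiliary gradient so that its norm moves toward the norm of the target gradient. It is not the code from this repo: the function names, the `relax_factor` parameter, and the linear interpolation rule are assumptions made for illustration, and plain Python lists stand in for the gradient tensors of one shared parameter.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def balance_auxiliary_gradient(grad_target, grad_aux, relax_factor=0.7, eps=1e-12):
    """Rescale grad_aux so its magnitude moves toward that of grad_target.

    relax_factor (hypothetical name) controls how close the magnitudes get:
    1.0 gives grad_aux the target's magnitude, 0.0 leaves it unchanged.
    The direction of grad_aux is never changed, only its length.
    """
    norm_target = l2_norm(grad_target)
    norm_aux = l2_norm(grad_aux)
    # Ratio that would make the auxiliary norm equal to the target norm.
    scale = norm_target / (norm_aux + eps)
    # Interpolate between the fully rescaled and the original gradient.
    return [g * scale * relax_factor + g * (1.0 - relax_factor) for g in grad_aux]


if __name__ == "__main__":
    grad_target = [3.0, 4.0]       # norm 5
    strong_aux = [30.0, 40.0]      # norm 50, would dominate the target
    weak_aux = [0.3, 0.4]          # norm 0.5, too weak to help
    for name, aux in (("strong", strong_aux), ("weak", weak_aux)):
        balanced = balance_auxiliary_gradient(grad_target, aux, relax_factor=0.7)
        print(name, l2_norm(aux), "->", l2_norm(balanced))
```

In a real multi-task network this step would be applied separately to each shared parameter in every training iteration, since the imbalance differs across parts of the network and changes over time.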

Acknowledgement

The technique for calculating the moving average of gradient magnitudes in this paper was learned from https://github.com/ItzikMalkiel/MTAdam, whose first author is Itzik Malkiel. Thanks to them!
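For readers unfamiliar with the idea, a moving average of gradient magnitudes can be sketched as an exponential moving average over per-step gradient norms. The class name, the `beta` decay value, and the zero initialization below are assumptions for illustration only; the MTAdam repository and this repo's source are the reference for the actual details.

```python
class MovingAverageMagnitude:
    """Exponential moving average of a gradient norm (illustrative sketch)."""

    def __init__(self, beta=0.9):
        self.beta = beta    # hypothetical decay rate; closer to 1 is smoother
        self.value = 0.0    # assumed zero initialization, no bias correction

    def update(self, grad_norm):
        """Blend the newest gradient norm into the running average."""
        self.value = self.beta * self.value + (1.0 - self.beta) * grad_norm
        return self.value


if __name__ == "__main__":
    ema = MovingAverageMagnitude(beta=0.9)
    # A noisy sequence of gradient norms; the average changes more smoothly.
    for step, norm in enumerate([10.0, 2.0, 12.0, 1.0, 11.0]):
        print(step, norm, ema.update(norm))
```

Smoothing the magnitudes this way means a single unusually large or small gradient does not swing the rescaling ratio between tasks.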

Citation

TBD

License

See the LICENSE file for more details. The project is licensed under CC-BY-NC.

Comments

memory leak

Thanks for sharing, this is a simple and interesting way to use auxiliary losses.

When using it on a large dataset I get a memory leak: it uses up more and more CUDA memory until it crashes. I think this is because the graph is not cleared, due to loss.backward(retain_graph=True).

The obvious next step is to clear the graph with loss.backward(retain_graph=False), but then I get an error that the variables have been modified (image below). I assume this is an intentional part of MetaBalance, but I can't find where it happens, and I can't find a way to clear the graph manually.

Any tips?

opened by wassname 1

could you please provide the pre-processing code for the two datasets

As far as I know, different behaviors in the two datasets (click, add to cart, ...) do not appear in the same log, so how do you construct the MTL task dataset? Thanks for your answer.

opened by neil-yc 0
