Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

Overview

The Stem Cell Hypothesis

Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

Installation

Run the following setup script. Feel free to install a GPU-enabled PyTorch (torch>=1.6.0).

python3 -m venv env
source env/bin/activate
ln -sf "$(which python2)" env/bin/python
pip install -e .

Data Pre-processing

Download OntoNotes 5 (LDC2013T19.tgz) and put it into the following directory:

mkdir -p ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/
cp LDC2013T19.tgz ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz

That's all. ELIT will automatically do the rest for you the first time you run the training script.

Experiments

Here we demonstrate how to experiment with BERT-base but feel free to replace the transformer and task name in the script path for other experiments. Our scripts are grouped by transformers and tasks with clear semantics.

Single Task Learning

The following script will train STL-POS with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/single/pos.py

Multi-Task Learning

The following script will train MTL-5 with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/joint/all.py

Pruning Experiments

The following script will train STL-POS-DP with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/gate/pos.py

You can monitor the pruning process in real time via tensorboard:

tensorboard --logdir=data/model/mtl/ontonotes_bert_base_en/gated/pos/0/runs --samples_per_plugin images=1000

which will show how the heads gradually get claimed in http://localhost:6007/#images:

gates

Once 3 runs are finished, you can visualize the overlap of head utilization across runs via:

python3 stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_rgb.py

which will generate the following figure (1a):

Similarly, Figure 1g is generated with stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_tasks_gray.py.

15-models-average

Probing Experiments

Once a model is trained, you can probe its representations via the scripts in stem_cell_hypothesis/en_bert_base/head. For example, to probe STL-POS performance, run:

python3 stem_cell_hypothesis/en_bert_base/head/pos.py
python3 stem_cell_hypothesis/en_bert_base/head/vis/pos.py

which generates Figure 4:

pos-probe

You may need to manually change the path and update new results in the scripts.

To probe the unsupervised BERT performance for a single task, e.g., SRL, run:

python3 stem_cell_hypothesis/en_bert_base/head/srl_dot.py

which generates Figure 3:

srl-probe-static

Although not included in the paper due to page limitation, experiments of Chinese, BERT-large, ALBERT, etc. are uploaded to stem_cell_hypothesis. Feel free to run them for your interest.

Citation

If you use this repository in your research, please kindly cite our EMNLP2021 paper:

@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}
You might also like...
Code to reproduce the experiments in the paper
Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Transformer Based Multi-Source Domain Adaptation Dustin Wright and Isabelle Augenstein To appear in EMNLP 2020. Read the preprint: https://arxiv.org/a

Related resources for our EMNLP 2021 paper Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”
Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

codes for
codes for "Scheduled Sampling Based on Decoding Steps for Neural Machine Translation" (long paper of EMNLP-2022)

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation (EMNLP-2021 main conference) Contents Overview Background Quick to Use Furth

[CVPR 2021]
[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.
Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

Neural Magic Eye Preprint | Project Page | Colab Runtime Official PyTorch implementation of the preprint paper "NeuralMagicEye: Learning to See and Un

Learning Intents behind Interactions with Knowledge Graph for Recommendation, WWW2021

Learning Intents behind Interactions with Knowledge Graph for Recommendation This is our PyTorch implementation for the paper: Xiang Wang, Tinglin Hua

Behind the Curtain: Learning Occluded Shapes for 3D Object Detection

Behind the Curtain: Learning Occluded Shapes for 3D Object Detection Acknowledgement We implement our model, BtcDet, based on [OpenPcdet 0.3.0]. Insta

Owner
Emory NLP
NLP Research Laboratory at Emory University
Emory NLP
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation This repo is the official implementation of "MHFormer: Multi-Hypothesis Transforme

Vegetabird 281 Jan 7, 2023
STEM: An approach to Multi-source Domain Adaptation with Guarantees

STEM: An approach to Multi-source Domain Adaptation with Guarantees Introduction This is the official implementation of ``STEM: An approach to Multi-s

null 5 Dec 19, 2022
Repository for "Improving evidential deep learning via multi-task learning," published in AAAI2022

Improving evidential deep learning via multi task learning It is a repository of AAAI2022 paper, “Improving evidential deep learning via multi-task le

deargen 11 Nov 19, 2022
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

null 75 Dec 16, 2022
Exploration-Exploitation Dilemma Solving Methods

Exploration-Exploitation Dilemma Solving Methods Medium article for this repo - HERE In ths repo I implemented two techniques for tackling mentioned t

Aman Mishra 6 Jan 25, 2022
Breaking the Dilemma of Medical Image-to-image Translation

Breaking the Dilemma of Medical Image-to-image Translation Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field

Kid Liet 86 Dec 21, 2022
Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression.

Spatio-Temporal Entropy Model A Pytorch Reproduction of Spatio-Temporal Entropy Model (STEM) for end-to-end leaned video compression. More details can

null 16 Nov 28, 2022
A Player for Kanye West's Stem Player. Sort of an emulator.

Stem Player Player Stem Player Player Usage Download the latest release here Optional: install ffmpeg, instructions here NOTE: DOES NOT ENABLE DOWNLOA

null 119 Dec 28, 2022
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System This repository contains the PyTorch im

Libo Qin 25 Sep 6, 2022
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Libo Qin 12 Sep 26, 2021