# Awesome Efficient PLM Papers
Must-read papers on improving the efficiency of pre-trained language models.
The paper list is mainly maintained by Lei Li and Shuhuai Ren.
## Knowledge Distillation
- **DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter** *NeurIPS 2019 Workshop*
  Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf [pdf] [project]
- **Patient Knowledge Distillation for BERT Model Compression** *EMNLP 2019*
  Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
- **Well-Read Students Learn Better: On the Importance of Pre-training Compact Models** *Preprint*
  Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova [pdf] [project]
- **TinyBERT: Distilling BERT for Natural Language Understanding** *Findings of EMNLP 2020*
  Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu [pdf] [project]
- **BERT-of-Theseus: Compressing BERT by Progressive Module Replacing** *EMNLP 2020*
  Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou [pdf] [project]
- **MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers** *NeurIPS 2020*
  Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou [pdf] [project]
- **BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance** *EMNLP 2020*
  Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin [pdf] [project]
- **MixKD: Towards Efficient Distillation of Large-scale Language Models** *ICLR 2021*
  Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin [pdf]
- **Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains** *ACL-IJCNLP 2021*
  Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang [pdf]
- **MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation** *ACL-IJCNLP 2021*
  Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh [pdf]
- **Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor** *ACL-IJCNLP 2021*
  Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu [pdf] [project]
- **Weight Distillation: Transferring the Knowledge in Neural Network Parameters** *ACL-IJCNLP 2021*
  Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu [pdf]
- **Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation** *ACL-IJCNLP 2021*
  Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou [pdf]
- **MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers** *Findings of ACL-IJCNLP 2021*
  Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei [pdf] [project]
- **One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers** *Findings of ACL-IJCNLP 2021*
  Chuhan Wu, Fangzhao Wu, Yongfeng Huang [pdf]
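The papers above explore many distillation objectives; as a reference point, the sketch below shows the generic logit-distillation recipe (soft targets from a frozen teacher mixed with hard-label cross-entropy). It is a minimal illustration rather than any listed paper's released code, and `student`, `teacher`, `batch`, and `optimizer` are placeholder objects assumed to exist.

```python
# Minimal sketch of vanilla logit distillation; not the recipe of any
# specific paper above. `student`, `teacher`, `batch`, and `optimizer`
# are placeholders supplied by the caller.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer,
                      temperature=2.0, alpha=0.5):
    """One training step mixing soft-label KL and hard-label cross-entropy."""
    inputs, labels = batch
    with torch.no_grad():                 # the teacher stays frozen
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)

    # Soft targets: KL between temperature-scaled distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard targets: ordinary cross-entropy on gold labels.
    hard = F.cross_entropy(student_logits, labels)

    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Many of the works above additionally match hidden states, attention maps, or self-attention relations rather than logits alone; the mixing weight and temperature here are just illustrative defaults.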
## Dynamic Early Exiting
- **DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference** *ACL 2020*
  Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin [pdf] [project]
- **FastBERT: a Self-distilling BERT with Adaptive Inference Time** *ACL 2020*
  Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju [pdf] [project]
- **The Right Tool for the Job: Matching Model and Instance Complexities** *ACL 2020*
  Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith [pdf] [project]
- **A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models** *NAACL 2021*
  Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He [pdf] [project]
- **CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade** *Preprint*
  Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun [pdf] [project]
- **Early Exiting BERT for Efficient Document Ranking** *SustaiNLP 2020*
  Ji Xin, Rodrigo Nogueira, Yaoliang Yu, Jimmy Lin [pdf] [project]
- **BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression** *EACL 2021*
  Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin [pdf] [project]
- **Accelerating BERT Inference for Sequence Labeling via Early-Exit** *ACL 2021*
  Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang [pdf] [project]
- **BERT Loses Patience: Fast and Robust Inference with Early Exit** *NeurIPS 2020*
  Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei [pdf] [project]
- **Early Exiting with Ensemble Internal Classifiers** *Preprint*
  Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu [pdf]
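As a rough illustration of the shared idea behind these methods, the sketch below attaches a classifier to every encoder layer and stops as soon as the prediction entropy falls below a threshold. The module lists `encoder_layers` and `exit_heads` are hypothetical; the listed papers differ in how the exits are trained and in the exit criterion (entropy, patience, calibrated confidence, and so on).

```python
# Minimal sketch of entropy-based early exiting at inference time, in the
# spirit of the internal-classifier methods above; `encoder_layers` and
# `exit_heads` are hypothetical nn.ModuleLists, not a library API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_exit_forward(hidden_states, encoder_layers, exit_heads,
                       entropy_threshold=0.3):
    """Run layers one at a time; stop once an internal classifier is confident."""
    probs, depth = None, 0
    for depth, (layer, head) in enumerate(zip(encoder_layers, exit_heads), start=1):
        hidden_states = layer(hidden_states)
        logits = head(hidden_states[:, 0])            # classify the [CLS] position
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.max() < entropy_threshold:         # every example is confident
            break                                     # skip the remaining layers
    return probs, depth                               # depth = layers actually used
```

For brevity this exits the whole batch at once; per-example routing, as used in the papers above, is what yields most of the practical savings.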
## Quantization
- **Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT** *AAAI 2020*
  Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer [pdf] [project]
- **TernaryBERT: Distillation-aware Ultra-low Bit BERT** *EMNLP 2020*
  Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu [pdf] [project]
- **Q8BERT: Quantized 8Bit BERT** *NeurIPS 2019 Workshop*
  Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat [pdf] [project]
- **BinaryBERT: Pushing the Limit of BERT Quantization** *ACL-IJCNLP 2021*
  Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King [pdf] [project]
- **I-BERT: Integer-only BERT Quantization** *ICML 2021*
  Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer [pdf] [project]
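For orientation, the snippet below applies PyTorch's stock post-training dynamic INT8 quantization to a BERT classifier. It is only a baseline for comparison, not the method of any paper above (which go further with Hessian-aware mixed precision, ternary/binary weights, and integer-only kernels), and it assumes the Hugging Face `transformers` package is installed.

```python
# Baseline post-training dynamic INT8 quantization with PyTorch's built-in
# API; assumes Hugging Face `transformers` is available. Only a reference
# point, not the scheme of any paper above.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Store nn.Linear weights in INT8 and quantize activations on the fly;
# the resulting model is a drop-in replacement for CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```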
## Pruning
- **Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned** *ACL 2019*
  Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov [pdf] [project]
- **Are Sixteen Heads Really Better than One?** *NeurIPS 2019*
  Paul Michel, Omer Levy, Graham Neubig
- **The Lottery Ticket Hypothesis for Pre-trained BERT Networks** *NeurIPS 2020*
  Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin [pdf] [project]
- **Movement Pruning: Adaptive Sparsity by Fine-Tuning** *NeurIPS 2020*
  Victor Sanh, Thomas Wolf, Alexander M. Rush
- **Reducing Transformer Depth on Demand with Structured Dropout** *ICLR 2020*
  Angela Fan, Edouard Grave, Armand Joulin [pdf]
- **When BERT Plays the Lottery, All Tickets Are Winning** *EMNLP 2020*
  Sai Prasanna, Anna Rogers, Anna Rumshisky
- **Structured Pruning of a BERT-based Question Answering Model** *Preprint*
  J.S. McCarley, Rishav Chakravarti, Avirup Sil [pdf]
- **Structured Pruning of Large Language Models** *EMNLP 2020*
  Ziheng Wang, Jeremy Wohlwend, Tao Lei
- **Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm** *NAACL 2021*
  Dongkuan Xu, Ian E.H. Yen, Jinxi Zhao, Zhibin Xiao [pdf]
- **Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization** *ACL 2021*
  Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen [pdf] [project]
## Contribution
If you find any related work not included in the list, do not hesitate to open a pull request to help us complete it.