HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
This repository contains the code for the EMNLP 2021 paper "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".
Requirements
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
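After the build finishes, you can optionally sanity-check the installation before training. This check is not part of the original scripts; it only confirms that apex's mixed-precision module imports correctly.

```bash
# Optional sanity check (not from the original repo):
# confirm that apex and its amp module are importable in the current environment.
python -c "from apex import amp; print('apex amp available:', amp is not None)"
```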
Download checkpoints
Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.
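After these downloads, ./pretrained_ckpt/ should look roughly like the sketch below. The file and folder names are illustrative assumptions based on common BERT/TinyBERT release conventions, not the exact names from this repository; check the training scripts for the paths they actually expect.

```
pretrained_ckpt/
├── vocab.txt                  # BERT-base (uncased) vocabulary file (name assumed)
├── bert-base-uncased/         # pre-trained BERT-base checkpoint (name assumed)
└── tinybert_general_distill/  # extracted 2nd general distillation TinyBERT checkpoint (name assumed)
```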
Prepare dataset
Download the GLUE dataset (containing MNLI) using the script from HERE, and put the files into ./dataset/glue/.
Download the Amazon Reviews dataset from HERE, and extract it into ./dataset/amazon_review/.
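The resulting dataset layout might look like the sketch below. The task and domain subfolder names are illustrative assumptions (MNLI from GLUE and typical Amazon review domains); refer to the data-loading code for the exact structure it expects.

```
dataset/
├── glue/
│   └── MNLI/            # GLUE task used in the paper (folder name assumed)
└── amazon_review/
    ├── books/           # example review domains (names assumed)
    ├── dvd/
    ├── electronics/
    └── kitchen/
```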
Train the teacher model (BERT$_{\rm B}$-single) on single-domain data
bash train_domain.sh
Distill the student model (BERT$_{\rm S}$) with TinyBERT-KD on single-domain data
bash finetune_domain.sh
Train the teacher model (HRKD-teacher) on multi-domain data
bash train_multi_domain.sh
Then copy the resulting checkpoints into the specified directories (see the beginning of finetune_multi_domain.py for more details).
Distill the student model (BERT$_{\rm S}$) with our HRKD on multi-domain data
bash finetune_multi_domain.sh
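For reference, the sketch below simply restates the order of the four steps above as a single shell session; only the script names given in this README are used, and copying the teacher checkpoints between steps remains a manual action.

```bash
# Single-domain pipeline
bash train_domain.sh           # 1. train the BERT-base single-domain teachers
bash finetune_domain.sh        # 2. distill the student with TinyBERT-KD

# Multi-domain pipeline
bash train_multi_domain.sh     # 3. train the HRKD teacher on multiple domains
# (copy the teacher checkpoints to the paths expected by finetune_multi_domain.py)
bash finetune_multi_domain.sh  # 4. distill the student with HRKD
```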
Reference
If you find this code helpful for your research, please cite the following paper.
@inproceedings{dong2021hrkd,
title = {{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression},
author = {Chenhe Dong and Yaliang Li and Ying Shen and Minghui Qiu},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}