Versatile Generative Language Model
This is the implementation of the paper:
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning. Zhaojiang Lin, Andrea Madotto, Pascale Fung. Findings of EMNLP 2020. [PDF]
If you use any source codes or datasets included in this toolkit in your work, please cite the following paper. The bibtex is listed below:
@article{lin2020exploring,
  title={Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning},
  author={Lin, Zhaojiang and Madotto, Andrea and Fung, Pascale},
  journal={arXiv preprint arXiv:2004.03829},
  year={2020}
}
Abstract
Fine-tuning pre-trained generative language models on down-stream language generation tasks has shown promising results. However, it comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this work, we propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model. Experiments on five diverse language generation tasks show that, by using only an additional 2-3% of parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
Versatile Generative Language Model (VLM): VLM is composed of three components: a pre-trained language model backbone (e.g., GPT-2) and two kinds of specialized parameters for each generation task, namely low-rank residual adapters and task embeddings.
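The sketch below illustrates the adapter idea in PyTorch; the class and variable names are our own shorthand, not the repository's actual implementation. Each frozen GPT-2 layer gets a small bottleneck whose output is added back to the hidden states, so each task only stores these adapter weights plus a task embedding.

```python
import torch
import torch.nn as nn

class LowRankResidualAdapter(nn.Module):
    """Bottleneck adapter applied residually after a frozen transformer layer
    (illustrative sketch, not the repository's exact class)."""

    def __init__(self, hidden_size: int, bottleneck: int = 300):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        nn.init.zeros_(self.up.weight)                  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The frozen backbone's output passes through unchanged, plus a small
        # task-specific correction learned by the bottleneck.
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

# Per-task parameters: one adapter per layer plus one task embedding per task.
# Only these (roughly 2-3% of GPT-2's parameters) are trained and stored per task.
tasks = ["mt", "summarization", "dialogue", "qa", "nlg"]
hidden_size, n_layers = 768, 12  # GPT-2 small
adapters = nn.ModuleDict({
    task: nn.ModuleList([LowRankResidualAdapter(hidden_size) for _ in range(n_layers)])
    for task in tasks
})
task_embeddings = nn.Embedding(len(tasks), hidden_size)
```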
Dependency
Check the packages needed or simply run the command
❱❱❱ pip install -r requirements.txt
Experiments
Dataset
Download the preprocessed datasets
Reproducibility
We provide the trained checkpoint of our VLM.
Test the model: choose one task from (mt, summarization, dialogue, qa, nlg).
❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path
Fine-tune GPT-2
Train machine translation:
❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json
Test machine translation:
❱❱❱ python ./evaluate.py --task mt --no_sample --max_history=2 --model_checkpoint runs/$model_checkpoint
Check run.sh to run other tasks
VLM: train adapters and task embeddings
Train machine translation without knowledge distillation:
❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005
Train machine translation using sentence level knowledge distillation:
❱❱❱ python ./sentence_distiller.py --task mt --max_history=2 --model_checkpoint runs/$fully_finetuned_gpt2_checkpoint --no_sample
❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005 --distillation
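For context, sentence-level knowledge distillation here means the fully fine-tuned GPT-2 (the teacher) first decodes outputs for the training inputs, and the adapter student is then trained on those generated sentences instead of the gold targets. Below is a rough sketch of the teacher-decoding step, assuming a Huggingface checkpoint directory; the path and prompt handling are placeholders, not the repository's actual code.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical path to a fully fine-tuned teacher checkpoint.
teacher_dir = "runs/mt_fully_finetuned_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(teacher_dir)
teacher = GPT2LMHeadModel.from_pretrained(teacher_dir).eval()

def distill_target(source_sentence: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode the teacher's output for one source sentence; these
    generations replace the gold targets when training with --distillation."""
    inputs = tokenizer(source_sentence, return_tensors="pt")
    output_ids = teacher.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the continuation beyond the source prompt.
    continuation = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(continuation, skip_special_tokens=True)
```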
Test machine translation:
❱❱❱ python ./evaluate.py --task mt --no_sample --adapter_bottleneck 300 --model_checkpoint runs/$model_checkpoint
Check run.sh to run other tasks
Combine all the adapters and task embeddings into a single model
Edit line 68 of combine_all.py to provide the list of checkpoints.
❱❱❱ python combine_all.py
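Conceptually, combine_all.py collects the small task-specific tensors from each per-task checkpoint into one file on top of the shared, frozen backbone. The sketch below conveys the idea only; the checkpoint paths and parameter-name filters are assumptions, not the script's actual logic.

```python
import torch

# Illustrative per-task checkpoints -- in the repository this is the list you
# edit around line 68 of combine_all.py.
task_checkpoints = {
    "mt": "runs/mt_adapter/pytorch_model.bin",
    "summarization": "runs/summarization_adapter/pytorch_model.bin",
}

combined = {}
for task, path in task_checkpoints.items():
    state = torch.load(path, map_location="cpu")
    # Keep only the task-specific tensors (adapters and task embedding),
    # namespaced by task; the frozen GPT-2 backbone is shared and stored once.
    for name, tensor in state.items():
        if "adapter" in name or "task_embedding" in name:
            combined[f"{task}.{name}"] = tensor

torch.save(combined, "runs/vlm_all_tasks.bin")
```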
Test to see if the results are the same:
❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path
The above scripts illustrate how to train VLM continuously when tasks arrive sequentially.
Multitask training of VLM
When all the tasks are available at the same time:
❱❱❱ python ./train_vlm.py --gradient_accumulation_steps=16 --train_batch_size=1 --valid_batch_size=1 --n_epochs 3
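One way to picture multitask training: each optimization step draws a batch from one of the tasks and updates only that task's adapter and task embedding while the backbone stays frozen. The sampler below is a generic sketch with placeholder per-task dataloaders, not the repository's training loop.

```python
import random

def multitask_batches(loaders: dict, steps: int):
    """Yield (task, batch) pairs by sampling a task uniformly at each step.
    `loaders` maps task names to torch DataLoaders (placeholders here)."""
    iterators = {task: iter(loader) for task, loader in loaders.items()}
    for _ in range(steps):
        task = random.choice(list(iterators))
        try:
            batch = next(iterators[task])
        except StopIteration:
            # Restart a task's loader when it runs out of batches.
            iterators[task] = iter(loaders[task])
            batch = next(iterators[task])
        yield task, batch
```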
Acknowledgement
This repository is implemented based on Huggingface.