This repository contains the data and code for our EMNLP 2021 paper "Models and Datasets for Cross-Lingual Summarisation". Please contact me at [email protected] with any questions.
Please cite this paper if you use our code or data.
@InProceedings{clads-emnlp,
author = "Laura Perez-Beltrachini and Mirella Lapata",
title = "Models and Datasets for Cross-Lingual Summarisation",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
year = "2021",
address = "Punta Cana, Dominican Republic",
}
The XWikis Corpus
You can create the corpus using the instructions below. The original XWikis corpus is available at XWikis.
Instructions to re-create our corpus and extract other languages are available here.
Cross-lingual Summarisation Code
Our code is based on Fairseq and mBART/mBART50. Our clone of Fairseq, together with the code extensions that implement our models, is available here. Instructions to pre-process the data and to train and evaluate our models are available here.