Official Implementation of "DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization."



Code for AAAI 2022 paper: DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization.

Pre-trained Models

We release two versions of pre-trained models.

  • DialogLM is based on UniLMv2. It comes in two versions, with and without sparse attention, which handle dialogues of different lengths.
  • DialogLED builds on the Longformer-Encoder-Decoder (LED) architecture. It is further trained on a large amount of long dialogue data, with window-based denoising as the pre-training task. Its base and large versions can be used directly through HuggingFace.
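Below is a minimal sketch of loading a DialogLED checkpoint through the HuggingFace `transformers` library. It assumes that the `LEDTokenizer` and `LEDForConditionalGeneration` classes work with the released checkpoints, which is how the issue list below uses them. The names `first_token_global_mask` and `generate_with_dialogled` are hypothetical helpers, and so is the `max_new_tokens` default. Putting global attention only on the first token is common LED usage, not a documented DialogLED requirement. One more assumption: the pre-trained checkpoints were trained with window-based denoising, so without fine-tuning their outputs are likely to resemble the input dialogue rather than a summary.

```python
from typing import List

# Checkpoint ID as quoted in the issue list below; look up the base
# checkpoint ID on HuggingFace if you need the base version.
DIALOGLED_LARGE = "MingZhong/DialogLED-large-5120"


def first_token_global_mask(input_ids: List[List[int]]) -> List[List[int]]:
    """Build an LED global_attention_mask: 1 = global, 0 = local (windowed).

    Only position 0 of each sequence is marked global. This is an assumed,
    commonly used setting, not a DialogLED-specific requirement.
    """
    return [[1 if pos == 0 else 0 for pos in range(len(seq))] for seq in input_ids]


def generate_with_dialogled(text: str,
                            model_name: str = DIALOGLED_LARGE,
                            max_new_tokens: int = 256) -> str:
    """Hypothetical helper: load a DialogLED checkpoint and decode one output."""
    # Imported lazily so the mask helper above works without torch/transformers.
    import torch
    from transformers import LEDForConditionalGeneration, LEDTokenizer

    tokenizer = LEDTokenizer.from_pretrained(model_name)
    model = LEDForConditionalGeneration.from_pretrained(model_name)
    model.eval()

    enc = tokenizer(text, return_tensors="pt", truncation=True)
    global_mask = torch.tensor(first_token_global_mask(enc["input_ids"].tolist()))

    with torch.no_grad():
        output_ids = model.generate(
            enc["input_ids"],
            attention_mask=enc["attention_mask"],
            global_attention_mask=global_mask,
            max_new_tokens=max_new_tokens,  # assumed length budget
        )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

For example, `generate_with_dialogled(dialogue_text)` downloads the large checkpoint on first use and returns one decoded string. Swap in a fine-tuned checkpoint directory as `model_name` for summarization.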


Please download the five datasets we used in our paper here (AMI, ICSI, QMSum, ForeverDreaming, TVMegaSite).

Finetuning for Downstream Tasks

Please go to the corresponding folders to apply the models to downstream tasks involving long dialogues.


  • URLs for unilm vocab.txt are deprecated

    The vocab.txt links in are deprecated. The checkpoint links need to be updated to the latest ones according to [link].

            'unilm-large-cased': "",
            'unilm-base-cased': "",
            'unilm1-large-cased': "",
            'unilm1-base-cased': "",
            'unilm1.2-base-uncased': ""

    Thus, links in,,, need to be updated as well.

    opened by TheBestHu 0
  • Pre-trained and fine-tuned models not generating dialogue summaries.

    Hello, I'm trying to generate summaries of dialogues using the pre-trained model (MingZhong/DialogLED-large-5120) as well as a fine-tuned one (on the AMI dataset). Neither produces meaningful summaries; instead, each just replicates part of the dialogue. I'm using an adaptation of the AllenAI example (allenai/led-large-16384-arxiv) with the HuggingFace LEDTokenizer and LEDForConditionalGeneration classes:

    import torch
    from transformers import LEDTokenizer, LEDForConditionalGeneration

    LONG_DIALOGUE = """Here goes my long dialogue."""

    tokenizer = LEDTokenizer.from_pretrained("AMI_DialogLED_large/")
    input_ids = tokenizer(LONG_DIALOGUE, return_tensors="pt").input_ids.to("cuda")
    # Local attention everywhere, global attention on the first token only
    global_attention_mask = torch.zeros_like(input_ids)
    global_attention_mask[:, 0] = 1

    model = LEDForConditionalGeneration.from_pretrained("AMI_DialogLED_large/", return_dict_in_generate=True).to("cuda")
    sequences = model.generate(input_ids, global_attention_mask=global_attention_mask).sequences

    summary = tokenizer.batch_decode(sequences)
    print(summary)

    With this code, the output is gibberish dialogue rather than an actual summary like the model outputs you show. Is there another way to generate summaries, or should the model be used differently?



    opened by dafraile 0
  • hyperparameters to replicate experiments

    Hi! Thank you for sharing the DialogLM and DialogLED implementations. Would it be possible to release the hyperparameters used for ForeverDreaming and TVMegaSite? I see that the DialogLM_UniLM folder uses TVMegaSite as an example and provides its hyperparameters. Does fine-tuning on ForeverDreaming use slightly different ones? In particular, knowing num_training_steps and the batch size would be very useful for replicating the results presented in the paper.

    opened by Bobby-Hua 0
  • error while running the script

    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1199) of binary: /usr/bin/python3
    Traceback (most recent call last):
      File "/usr/lib/python3.7/", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/", line 193, in <module>
        main()
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/", line 189, in main
        launch(args)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/", line 174, in launch
        run(args)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/", line 755, in run
        )(*cmd_args)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/", line 131, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/", line 247, in launch_agent
        failures=result.failures,
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    opened by manish142000 2
  • How is DialogLM able to process 5,120 tokens without the hybrid attention approach?

    As mentioned in the research paper -

    DIALOGLM is obtained by further pre-training UNILMbase with the window-based denoising method. Its maximum input length is 5,120 and the tokens exceeding this length is truncated in the experiments. DIALOGLM-sparse additionally introduces the hybrid attention approach in the pre-training process of DIALOGLM, so its maximum length is increased to 8,192 tokens.

    I don't understand how DialogLM can process 5,120 tokens without the hybrid attention approach. Since it uses UniLMv2 as its backbone, shouldn't the maximum number of tokens it can process be 512?

    opened by Akshayextreme 0
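The positional question in the last issue can be illustrated with a small sketch. A BERT-style Transformer with learned absolute position embeddings can only index as many positions as its embedding table has rows. If the backbone's table has 512 rows, as the question assumes, longer inputs need a larger table. The Longformer paper describes one known way to build it: initialize the new table by copying the original 512 embeddings repeatedly, then continue training. The sketch below shows only that copy step, on plain Python lists. The function name `tile_position_embeddings` is hypothetical, and whether DialogLM extends its positions this way is an assumption that the quoted paper text does not confirm.

```python
from typing import List

Vector = List[float]


def tile_position_embeddings(table: List[Vector], new_max_len: int) -> List[Vector]:
    """Extend a learned absolute position-embedding table to new_max_len rows.

    Row i of the new table is a copy of row (i mod old_len), so the original
    embeddings are repeated block by block. Each row is copied, not aliased.
    """
    old_len = len(table)
    if old_len == 0:
        raise ValueError("table must contain at least one position")
    if new_max_len < old_len:
        raise ValueError("new_max_len must not be smaller than the current table")
    return [list(table[i % old_len]) for i in range(new_max_len)]


if __name__ == "__main__":
    # Toy 512-row table where each 1-dim embedding stores its own position.
    original = [[float(i)] for i in range(512)]
    extended = tile_position_embeddings(original, 5120)
    print(len(extended), extended[512], extended[5119])
```

Running the script prints `5120 [0.0] [511.0]`. The copy step only gives the new positions a starting point; presumably further pre-training on long inputs is what makes them useful. Separately, dense self-attention over 5,120 tokens computes (5120/512)^2 = 100 times as many attention scores per head and layer as over 512 tokens. That is the cost that sparse or hybrid attention is meant to reduce.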
