The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Overview

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization

License: CC BY 4.0

[Paper] accepted at EMNLP 2021:

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization, by Tiezheng Yu*, Wenliang Dai*, Zihan Liu, and Pascale Fung.

Paper Abstract

Multimodal abstractive summarization (MAS) models that summarize videos (vision modality) and their corresponding transcripts (text modality) are able to extract the essential information from massive multimodal data on the Internet. Recently, large-scale generative pre-trained language models (GPLMs) have been shown to be effective in text generation tasks. However, existing MAS models cannot leverage GPLMs' powerful generation ability. To fill this research gap, we aim to study two research questions: 1) how to inject visual information into GPLMs without hurting their generation ability; and 2) where is the optimal place in GPLMs to inject the visual information? In this paper, we present a simple yet effective method to construct vision guided (VG) GPLMs for the MAS task using attention-based add-on layers to incorporate visual information while maintaining their original text generation ability. Results show that our best model significantly surpasses the prior state-of-the-art model by 5.7 ROUGE-1, 5.3 ROUGE-2, and 5.1 ROUGE-L scores on the How2 dataset, and our visual guidance method contributes 83.6% of the overall improvement. Furthermore, we conduct thorough ablation studies to analyze the effectiveness of various modality fusion methods and fusion locations.
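
The central mechanism, attention-based add-on layers that let the text representations attend to video features, can be pictured with a minimal PyTorch sketch. The class name, dimensions, and the exact residual form below are illustrative assumptions, not the layers from the paper:

```python
import torch
import torch.nn as nn

class VisionGuidedFusion(nn.Module):
    """Illustrative text-to-vision cross-attention add-on layer.

    Text hidden states act as queries; video features act as keys/values.
    A residual connection preserves the GPLM's original text representation.
    All dimensions here are assumptions.
    """
    def __init__(self, text_dim=1024, video_dim=2048, num_heads=8):
        super().__init__()
        # Project video features into the text embedding space.
        self.video_proj = nn.Linear(video_dim, text_dim)
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.layer_norm = nn.LayerNorm(text_dim)

    def forward(self, text_states, video_feats):
        # text_states: (batch, text_len, text_dim); video_feats: (batch, video_len, video_dim)
        v = self.video_proj(video_feats)
        attn_out, _ = self.cross_attn(query=text_states, key=v, value=v)
        # The residual keeps the original text representation intact.
        return self.layer_norm(text_states + attn_out)

fusion = VisionGuidedFusion()
text = torch.randn(2, 50, 1024)
video = torch.randn(2, 30, 2048)
print(fusion(text, video).shape)  # torch.Size([2, 50, 1024])
```

Such a layer can be inserted at different points of the encoder or decoder stack, which is exactly the fusion-location question the ablation studies explore.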

If your work is inspired by our paper or code, please cite it. Thanks!

TODO

Evaluation

We release the summaries generated by different models in ./evaluation/results. All the evaluation metrics can be computed by following ./evaluation/README.md.
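
For a quick sanity check outside the provided scripts, ROUGE can also be computed with the rouge-score package; this is an illustrative alternative, not the evaluation setup described in ./evaluation/README.md:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="the reference summary",       # gold summary
    prediction="the generated summary",   # model output
)
for name, s in scores.items():
    print(f"{name}: F1 = {s.fmeasure:.4f}")
```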

Prepare dataset

You can get the dataset from the How2 dataset GitHub. We recommend choosing option 1: download a pre-packaged version.
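
After unpacking, the transcripts are plain tokenized text files; the path below is the one referenced in the comments further down, while the video-feature file name and NumPy format are assumptions (How2 ships the features in several packagings):

```python
import numpy as np

# Transcript path as referenced in the issue threads below.
train_src_path = "./dataset/sum_train/tran.tok.txt"
with open(train_src_path, encoding="utf-8") as f:
    transcripts = [line.strip() for line in f]

# Hypothetical feature file name; check the How2 release for the actual format.
video_feats = np.load("./dataset/sum_train/video_features.npy")
print(len(transcripts), video_feats.shape)
```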

Run fine-tuning

  • Make a directory for saving the PyTorch Lightning logs: mkdir lightning_logs
  • An example of running the BART text-only model: ./scripts/Bart_text_only.sh
  • An example of running the BART multimodal model: ./scripts/Bart_multimodal.sh (a minimal sketch of the underlying Trainer call follows below)
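
Under the hood, these scripts drive a PyTorch Lightning Trainer; src/run.py calls trainer.fit(model, train_dataloader=..., val_dataloaders=...), as visible in the traceback quoted in the comments below. The toy module here is a hypothetical stand-in, only to show the call pattern:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the repo's summarization LightningModule;
# it only illustrates the Trainer/fit pattern the scripts launch.
class ToySummarizer(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", nn.functional.mse_loss(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

data = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
trainer = pl.Trainer(max_epochs=1)
trainer.fit(ToySummarizer(), DataLoader(data, batch_size=16), DataLoader(data, batch_size=16))
```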

Run inference

  • An example of running the BART multimodal model: ./scripts/test_Bart_multimodal.sh (see the text-only sketch below for a quick reference point)
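
For comparison, a minimal text-only summarization inference sketch using the Hugging Face transformers API; the checkpoint name is an example, not a released VG-GPLM, and the multimodal model requires the repo's own scripts:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"  # example checkpoint, not a VG-GPLM release
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

transcript = "first , lay out all of your ingredients and preheat the oven ..."  # a How2-style transcript
inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
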
Comments
  • Question about the data

    Thank you for your amazing work. I am trying to reproduce it, but when I train the multimodal model, an error occurs. It seems to be about loading the extracted video features. I found that How2 provides several files of video features; I wonder which file is used, or did you preprocess these features?

    opened by Fr0zenCrane 1
  • Question about the code

    Hi, thanks for your awesome work! I have a question about the code: https://github.com/HLTCHKUST/VG-GPLMs/blob/d27a485c3da4869a6a9f2ff902a9c64129b778b1/src/data_preprocess/data_builder.py#L30 Why did you ignore the first word in the src and tgt? Looking forward to your reply.

    opened by haruhi-sudo 1
  • About the version of transformers

    Hello,

    I tried to run your code and got an error saying that 'force_bos_token_to_be_generated' does not exist in BartConfig.

    I think it is because of the version of transformers I used.

    I used transformers version 4.11.0 and it works well with the text-only BART. The error occurred when I ran the multimodal task.

    Could you let me know the version of transformers you used in your work?

    Also, I want to know why you used './dataset/sum_devtest/tran.tok.txt' instead of './dataset/sum_train/tran.tok.txt' for your 'train_src_path'.

    Thank you

    opened by jyson24 1
  • Unsupported operation when running t5_multimodal model

    I'm running the t5-multimodal model on our computer. However, I received the following error after loading the data:

        Validation sanity check: 0it [00:00, ?it/s]
        Validation sanity check: 0%| | 0/10 [00:00<?, ?it/s]
        Traceback (most recent call last):
          File "/nas02/homes/yang22-1000062/VG-GPLMs/./src/run.py", line 155, in <module>
            trainer.fit(model, train_dataloader=summary_data.train_loader, val_dataloaders=summary_data.val_loader)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 498, in fit
            self.dispatch()
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 545, in dispatch
            self.accelerator.start_training(self)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
            self.training_type_plugin.start_training(trainer)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
            self._results = trainer.run_train()
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 606, in run_train
            self.run_sanity_check(self.lightning_module)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 855, in run_sanity_check
            _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 724, in run_evaluation
            output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 164, in evaluation_step
            output = self.trainer.accelerator.validation_step(args)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
            return self.training_type_plugin.validation_step(*args)
          File "/home/nas02home/conda/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 297, in validation_step
            return self.model(*args, **kwargs)
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
            return forward_call(*input, **kwargs)
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1002, in forward
            self._sync_buffers()
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1585, in _sync_buffers
            self._sync_module_buffers(authoritative_rank)
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1589, in _sync_module_buffers
            self._default_broadcast_coalesced(authoritative_rank=authoritative_rank)
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1610, in _default_broadcast_coalesced
            self._distributed_broadcast_coalesced(
          File "/home/nas02home/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1526, in _distributed_broadcast_coalesced
            dist._broadcast_coalesced(
        RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.

    For other models, the program runs smoothly. How can I solve this?

    opened by chlorane 0
  • `image_len` is not used?

    https://github.com/HLTCHKUST/VG-GPLMs/blob/ecd40e8d884123666a339ef9d2968b178610b898/src/models/modeling_bart.py#L749

    Is `image_len` not used when calculating the attention?

    opened by FutureWithoutEnding 5
  • How to get the "noise image feature"?

    In the code, the noise image feature is loaded from a file. How can I generate the noise image feature?

    The paper says it is generated from a uniform distribution from 0 to 3. Why 0 and 3, and how were these two values chosen? And why use a uniform distribution?

    Can you show me some code for this?

    opened by FutureWithoutEnding 2
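
For context on that last question: the paper's description, noise drawn from a uniform distribution between 0 and 3, corresponds to a one-line NumPy call; the feature shape below is an assumption:

```python
import numpy as np

# Shape (num_frames, feature_dim) is an assumption; the paper only
# specifies the uniform range [0, 3].
noise_feats = np.random.uniform(low=0.0, high=3.0, size=(100, 2048))
np.save("noise_features.npy", noise_feats)  # hypothetical output file name
```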
Owner

CAiRE