SimCLS

Code for our paper: "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", ACL 2021

1. How to Install

Requirements

  • python3
  • conda create --name env --file spec-file.txt
  • pip3 install -r requirements.txt

Description of Codes

  • main.py -> training and evaluation procedure
  • model.py -> models
  • data_utils.py -> dataloader
  • utils.py -> utility functions
  • preprocess.py -> data preprocessing

Workspace

The following directories should be created before running our experiments.

  • ./cache -> storing model checkpoints
  • ./result -> storing evaluation results
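
A minimal way to create them, sketched in Python (a plain mkdir works just as well):

import os

# Create the checkpoint and result directories the experiments expect.
for d in ("./cache", "./result"):
    os.makedirs(d, exist_ok=True)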

2. Preprocessing

We use the CNN/DailyMail (CNNDM) and XSum datasets for our experiments.

For data preprocessing, please run

python preprocess.py --src_dir [path of the raw data] --tgt_dir [output path] --split [train/val/test] --cand_num [number of candidate summaries]
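
For example, a hypothetical invocation for the CNNDM test split (the paths are placeholders; the paper generates 16 candidates per sample):

python preprocess.py --src_dir ./cnndm/raw --tgt_dir ./cnndm/processed --split test --cand_num 16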

src_dir should contain the following files (using test split as an example):

  • test.source
  • test.source.tokenized
  • test.target
  • test.target.tokenized
  • test.out
  • test.out.tokenized

Each line of these files should contain one sample. In particular, the candidate summaries for a single data sample should be placed on consecutive lines in test.out and test.out.tokenized.
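
As a sanity check, here is a minimal sketch (not repo code) of how such a file can be grouped back into per-sample candidate lists, assuming exactly cand_num candidates per sample:

def read_candidates(path, cand_num):
    # Group consecutive lines of test.out into per-sample candidate lists:
    # lines[0:cand_num] belong to sample 0, lines[cand_num:2*cand_num] to sample 1, ...
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    return [lines[i:i + cand_num] for i in range(0, len(lines), cand_num)]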

The preprocessing procedure will store the processed data as separate json files in tgt_dir.

We have provided an example file in ./example.

3. How to Run

Hyper-parameter Setting

You may specify the hyper-parameters in main.py.

Train

python main.py --cuda --gpuid [list of gpuid] -l
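
For example, assuming --gpuid accepts a space-separated list of device ids:

python main.py --cuda --gpuid 0 1 2 3 -l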

Fine-tune

python main.py --cuda --gpuid [list of gpuid] -l --model_pt [model path]

Evaluate

python main.py --cuda --gpuid [single gpu] -e --model_pt [model path]
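
For example, with a single GPU and a checkpoint produced by training (the exact file name under ./cache depends on the run):

python main.py --cuda --gpuid 0 -e --model_pt [model path]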

4. Results

CNNDM

        ROUGE-1   ROUGE-2   ROUGE-L
BART    44.39     21.21     41.28
Ours    46.67     22.15     43.54

XSum

          ROUGE-1   ROUGE-2   ROUGE-L
Pegasus   47.10     24.53     39.23
Ours      47.61     24.57     39.44

Our model outputs on these datasets can be found in ./output.

Comments
  • Any try on other generation tasks?

    Hi, thanks for your great work. I am wondering, have you tried this general idea on other NLG tasks like dialogue or NMT? Hoping to get some insights from you!

    opened by Hannibal046 4
  • Code about candidate generation

    Hi yixinL7, very beautiful work. Would you mind sharing your code for candidate generation?

    I tried to use the BartForConditionalGeneration model from Hugging Face to reproduce your results, but it always generates repeated sentences within a beam.

    Thanks!

    opened by ShangQingTu 4
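    For reference, the paper generates candidates with diverse beam search, which avoids the near-duplicate beams described above. A minimal sketch with Hugging Face transformers (the checkpoint and generation hyper-parameters here are assumptions, not repo code):

    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    article = "..."  # one source document
    inputs = tokenizer(article, truncation=True, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=16,
        num_beam_groups=16,       # one beam per group for maximal diversity
        diversity_penalty=1.0,    # penalize tokens already used by other groups
        num_return_sequences=16,  # 16 candidate summaries per article
        max_length=140,
    )
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)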
  • A question about Table 1: Results on CNNDM

    Thanks for your insightful work. However, I am confused by some details in Table 1. Is the model which derives the 'Max' result also trained by contrastive learning, or are its outputs simply sampled from different beam search processes?

    opened by Doragd 4
  • The difference between the loss function and the loss code

    Thanks for your excellent work. I have a question about the loss computation. Is there any difference between the loss function in the paper and the code? [equation screenshots from the paper and the code omitted]

    It seems the code computes + i * lambda instead of + (j - i) * lambda. Did I miss something?

    opened by Jexxie 3
  • Doubt regarding inputs to preprocess.py

    Hello there, first of all thank you so much for releasing your code as open source so others like me can learn from it. I saw that the preprocess.py script requires many file inputs, including the candidate summaries. But those are generated by the model, right? I couldn't find them in the data. Also, in the example json file, I noticed that both the untokenized and tokenized articles seem to be sentence-tokenized, so what is the difference?

    opened by ramgj28 3
  • About TotalLoss in model.RankingLoss

    Thank you for the nice research; the paper is very readable and intuitive.

    But while reading the code, a question came to mind.

    https://github.com/yixinL7/SimCLS/blob/1f08d260dce0668241e9d2fb9eed57cc6b0e60f2/model.py#L7-L10

    I think the variable TotalLoss at line 10 will always be 0. Of course, it will be changed in the following lines by the candidate and summary scores. Is it just for initialization before the for loop?

    Thanks

    opened by moon-jong 2
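    For reference, a minimal sketch (not the repo's exact code) of the pairwise margin ranking loss described in the paper: the zero-valued initial loss plays the role of the initializer asked about above, and the (j - i) * lambda margin from the paper shows up as gap * margin.

    import torch

    def ranking_loss(scores, margin=0.01):
        # scores: (batch, n_candidates), candidates sorted by ROUGE, best first.
        total = torch.zeros(1, device=scores.device)  # zero init, as asked above
        n = scores.size(1)
        for gap in range(1, n):
            pos = scores[:, :-gap]  # candidates ranked `gap` positions higher ...
            neg = scores[:, gap:]   # ... should outscore these by gap * margin
            loss_fn = torch.nn.MarginRankingLoss(margin * gap)
            total = total + loss_fn(pos, neg, torch.ones_like(pos))
        return total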
  • About the CNN/DM and XSum datasets

    Hi, I am a newbie in NLP, and I don't know what the raw data of the CNN/DM and XSum datasets looks like. I looked up those two links and have no idea about them; I'd be glad if you could tell me more. Thanks!

    opened by windhxs 0
  • Can we use a pretrained or trained model to generate a summary from our own input?

    Following your instructions, I've reproduced your results on the dataset. Can you give me some instructions on how to use SimCLS to summarize a document? There are PyTorch and Hugging Face versions, but I really want to use your repo (the official repo) directly.

    opened by tungphamMTA 0
  • Train set and test set ranking distribution difference

    Hi, since the model used for CNNDM is facebook/bart-large-cnn, the model was actually fine-tuned on the CNNDM training set. Considering neural models' amazing capacity for memorization, the candidates generated on the training set for the evaluation model should be nearly perfect. Do I understand this correctly? How do you avoid this so as to generate useful data for ranking? And was Pegasus also fine-tuned on CNNDM before generating summary candidates? Thanks.

    opened by Hannibal046 7
  • model.pt question

    Hi, thank you for your nice work. I'm trying to rebuild your model, but I have a question. After I ran the training part, "python main.py --cuda --gpuid [list of gpuid] -l", I got some config.txt files. Then when I ran the evaluation part, "python main.py --cuda --gpuid [single gpu] -e --model_pt [model path]", it needed the model path, but I have no idea where it comes from. Should it come from the training part? I only got some config and log files in the cache folder. I would appreciate it if you could give me some help.

    opened by vandawn 1
  • Where are candidate summaries created?

    Hello, I'm trying to train and evaluate this model on a new dataset (SAMSum). To do that, I need to first generate and score candidate summaries using the generative model. Where is the code that was used to do this? The README for this repo seems to start from a dataset that already includes candidate summaries and their ROUGE scores. I just want to make sure I do that step correctly. Thanks!

    opened by crichardson332 1
  • About the BS and MS metrics

    Hi, I have a question about the BS (BERTScore) and MS (MoverScore) metrics. For BERTScore, I get a score of 0.88 when I use rescale_with_baseline=True; with rescale_with_baseline=False, the score drops to around 0.46. Both results differ from yours. I'd be glad if you could tell me more. Thanks!

    opened by JingqiWei 2
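    For reference, a minimal sketch of computing BERTScore with the bert-score package (whether the reported numbers were rescaled with a baseline is not stated here, so treat the flag as something to experiment with):

    from bert_score import score

    cands = ["a generated summary ..."]  # system outputs, one per sample
    refs = ["a reference summary ..."]   # matching references
    # rescale_with_baseline linearly rescales the raw similarity scores,
    # which changes the absolute numbers substantially without changing
    # the relative ordering of systems.
    P, R, F1 = score(cands, refs, lang="en", rescale_with_baseline=True)
    print(F1.mean().item())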
Owner: Yixin Liu