The source code for the ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

Overview

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

This repository provides the implementation details for the ACL 2021 main conference paper:

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data. [paper]

1. Data Preparation

In this work, we carried out persona-based dialogue generation experiments under a persona-dense scenario (English PersonaChat) and a persona-sparse scenario (Chinese PersonalDialog), with the assistance of a series of auxiliary inference datasets. Here we summarize the key information of these datasets and provide the links to download these datasets if they are directly accessible.
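
For orientation, the ConvAI2 PersonaChat files (e.g., train_self_original_no_cands.txt) interleave line-numbered persona sentences with tab-separated query/response pairs; a short illustrative snippet (schematic, not verbatim from the corpus):

     1 your persona: i am employed by the us postal service.
     2 your persona: i have a german shepherd named barnaby.
     3 hi , how are you today ?	i am good , just got back from walking my dog .
     4 what kind of dog do you have ?	a german shepherd , his name is barnaby .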

2. How to Run

The setup.sh script lists the dependencies needed to run this project; simply running ./setup.sh will install them. Here we take the English PersonaChat dataset as an example to illustrate how to run the dialogue generation experiments. Generally, there are three steps: preprocessing (tokenization), training, and inference:

  • Preprocessing

     python preprocess.py --dataset_type convai2 \
     --trainset ./data/ConvAI2/train_self_original_no_cands.txt \
     --testset ./data/ConvAI2/valid_self_original_no_cands.txt \
     --nliset ./data/ConvAI2/ \
     --encoder_model_name_or_path ./pretrained_models/bert/bert-base-uncased/ \
     --max_source_length 64 \
     --max_target_length 32
    

    We have provided some data examples (dozens of lines) in the ./data directory to show the data format. preprocess.py reads the different datasets and tokenizes the raw data into sequences of vocab IDs to facilitate model training. --dataset_type can be either convai2 (for English PersonaChat) or ecdt2019 (for Chinese PersonalDialog). Finally, the tokenized data is saved as a series of JSON files.
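
    As a rough sketch of what this step amounts to (illustrative only, assuming the Hugging Face BertTokenizer rather than the exact logic inside preprocess.py):

     from transformers import BertTokenizer

     # the same local BERT checkpoint passed via --encoder_model_name_or_path
     tokenizer = BertTokenizer.from_pretrained("./pretrained_models/bert/bert-base-uncased/")
     ids = tokenizer.encode("sorry to hear that. my dad is an army soldier.",
                            max_length=64, truncation=True)
     print(ids)  # a list of vocab IDs, starting with 101 ([CLS])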

  • Model Training

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --do_train \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder2_model ./pretrained_models/bert/bert-base-uncased/ \
     --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 \
     --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --learning_rate 7e-6 \
     --batch_size 32
    

    Here we initialize the encoder and both decoders from the same downloaded BERT checkpoint. More parameter settings can be found in bertoverbert.py.
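
    Conceptually, BoB composes one BERT encoder with two BERT-based decoders. A minimal sketch of such an initialization using stock Hugging Face classes (an illustration under that assumption; the repo uses its own model classes under xlibs):

     from transformers import BertConfig, BertModel, BertLMHeadModel

     checkpoint = "./pretrained_models/bert/bert-base-uncased/"
     encoder = BertModel.from_pretrained(checkpoint)

     dec_config = BertConfig.from_pretrained(checkpoint)
     dec_config.is_decoder = True           # left-to-right self-attention
     dec_config.add_cross_attention = True  # attend to the encoder's hidden states
     decoder1 = BertLMHeadModel.from_pretrained(checkpoint, config=dec_config)
     decoder2 = BertLMHeadModel.from_pretrained(checkpoint, config=dec_config)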

  • Evaluations

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/  \
     --do_evaluation --do_predict \
     --eval_epoch 7
    

    Empirically, in the PersonaChat experiment with default hyperparameters, the best-performing checkpoint is usually found between epoch 5 and epoch 9. If training goes well, you should see results like:

     Perplexity on test set is 21.037 and 7.813.
    

    where 21.037 is the perplexity from the first decoder and 7.813 is the final perplexity from the second decoder. The generated results are written to test_result.tsv; here is an example generated from the above checkpoint:

     persona:i'm terrified of scorpions. i am employed by the us postal service. i've a german shepherd named barnaby. my father drove a car for nascar.
     query:sorry to hear that. my dad is an army soldier.
     gold:i thank him for his service.
     response_from_d1:that's cool. i'm a train driver.
     response_from_d2:that's cool. i'm a bit of a canadian who works for america.  
    

    where d1 and d2 denote the first and second BERT decoders, respectively.
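
    For reference, these perplexities are the standard exponentiated average per-token cross-entropy on the test set, computed separately from each decoder's language-modeling loss; a minimal sketch, assuming the summed negative log-likelihood and token count have been accumulated per decoder:

     import math

     def perplexity(total_nll, total_tokens):
         # ppl = exp(mean per-token negative log-likelihood)
         return math.exp(total_nll / total_tokens)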

  • Computing Infrastructure:

    • The released code was tested on NVIDIA Tesla V100 32G and NVIDIA PCIe A100 40G GPUs. Note that with batch_size=32, training the BoB model requires at least 20 GB of GPU memory.

MISC

  • Built upon 🤗 Transformers.

  • Bibtex:

      @inproceedings{song-etal-2021-bob,
          title = "BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data",
          author = "Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, Ting Liu",
          booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL-2021)",
          month = "Aug",
          year = "2021",
          address = "Online",
          publisher = "Association for Computational Linguistics",
      }
      
  • Email: [email protected].

Comments
  • How to calculate and evaluate the ppl of D1 and D2.


    First of all, I would like to know how to calculate the ppl of d1 and d2.

    I also have a question about how to judge the d1 and d2 ppl values. In "How to Run", it is written:

    Empirically, in the PersonaChat experiment with default hyperparameters, the best-performing checkpoint is usually found between epoch 5 and epoch 9. If training goes well, you should see results like: Perplexity on test set is 21.037 and 7.813. where 21.037 is the perplexity from the first decoder and 7.813 is the final perplexity from the second decoder.

    However, as the number of epochs increases, the d2 ppl keeps decreasing, and by epoch 49 it drops to 1.957. (My result for epoch 7 was "Perplexity on test set is 27.675 and 22.045.", so my values may differ from other people's.) Admittedly, at epoch 49 the d1 ppl worsened to 249.0; but as long as the model's final output, the d2 score, improves, do we even need to worry about the d1 score? Please tell us why you decided that epoch 7 ("Perplexity on test set is 21.037 and 7.813.") is optimal.

    opened by iyo-0713 9
  • Question about the PersonaChat data


    Hi, in the paper, Table 1 shows 7,801 dialogues in the test set, but I cannot find them in the data folder here. Does that number refer to the original PersonaChat data?

    thanks

    opened by xiaolan98 4
  • Some points about `ul_training` I don't fully understand

    Thanks to the authors for open-sourcing this! I'd like to ask the authors three questions about ul_training.

    Regarding the Unlikelihood Training part of the paper: for Equation (9), the goal is to generate the hypothesis from premise-hypothesis data, but here BERT is used as an encoder whose inputs are $\overline{P}$ and $\overline{R}$, which makes the term $\overline{\mathcal{R}}_{<i}$ hard to interpret. Perhaps what the authors mean is that, during prediction, later tokens are predicted one by one from the preceding tokens, independent of the form of the attention_mask.

    During ul_training, according to Equations (9) and (10), in theory both $\overline{P}$ and $\overline{R}$ should be fed as input; but analyzing the code below, I find that the whole BERT forward pass only uses $\overline{R}$ and never uses $\overline{P}$:

    # fetch the premise and hypothesis inputs separately; premise and hypothesis data are kept apart
    def prepare_inference_batch(pos_batch, neg_batch):
        pos_pre_input_ids = pos_batch['pre']['input_ids']
        pos_pre_attention_mask = pos_batch['pre']['attention_mask']
        pos_pre_type_ids = pos_batch['pre']['token_type_ids'] * 0 + 1
    
        pos_hyp_input_ids = pos_batch['hyp']['input_ids']
        pos_hyp_attention_mask = pos_batch['hyp']['attention_mask']
        pos_hyp_type_ids = pos_batch['hyp']['token_type_ids'] * 0
    
        neg_pre_input_ids = neg_batch['pre']['input_ids']
        neg_pre_attention_mask = neg_batch['pre']['attention_mask']
        neg_pre_type_ids = neg_batch['pre']['token_type_ids'] * 0 + 1
    
        neg_hyp_input_ids = neg_batch['hyp']['input_ids']
        neg_hyp_attention_mask = neg_batch['hyp']['attention_mask']
        neg_hyp_type_ids = neg_batch['hyp']['token_type_ids'] * 0
    
        return pos_pre_input_ids, pos_pre_attention_mask, pos_pre_type_ids, pos_hyp_input_ids, pos_hyp_attention_mask, pos_hyp_type_ids, neg_pre_input_ids, neg_pre_attention_mask, neg_pre_type_ids, neg_hyp_input_ids, neg_hyp_attention_mask, neg_hyp_type_ids
    
    # ...some unimportant steps omitted...
    # the data above is used as the model input
    inference_data_dict = prepara_inference_dict(pos_batch, neg_batch)

    # ...some unimportant steps omitted...

    # During training, the hypothesis data serves as both the decoder input and the labels,
    # and the premise data is passed as `persona_input_ids`; but since `encoder_hidden_states=None`,
    # `persona_input_ids` is never used anywhere in the BERT computation.
    if ul_training:
        decoder_input_ids=inference_dict['neg_hyp_input_ids']
        hyp_attention_mask=inference_dict['neg_hyp_attention_mask']
        mask_flag = torch.Tensor.bool(1 - hyp_attention_mask)
        labels = decoder_input_ids.masked_fill(mask_flag, -100)
        persona_input_ids=inference_dict['neg_pre_input_ids']
    
        ul_outputs = self.decoder2(
                        input_ids=decoder_input_ids,
                        attention_mask=hyp_attention_mask,
                        encoder_hidden_states=None,
                        encoder_attention_mask=None,
                        inputs_embeds=None,
                        labels=labels,
                        output_attentions=output_attentions,
                        output_hidden_states=output_hidden_states,
                        return_dict=return_dict,
                        per_input_ids=persona_input_ids,
                        ul_training=ul_training,
                        **kwargs_decoder2)
    

    When computing the ul_training loss, Equations (9) and (10) show that the two cases should be computed differently, yet in the code both go through exactly the same computation, namely the snippet below. In theory, the expectation is that the model can predict $\overline{R}$ for entailed pairs $(\overline{P}, \overline{R})$ but cannot correctly predict $\overline{R}$ for contradicted pairs:

    ul_scores = -prediction_scores
    shifted_prediction_scores = ul_scores[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    loss_fct = CrossEntropyLoss()
    lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
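
    For comparison, the standard unlikelihood objective of Welleck et al. (2020) penalizes $-\log(1 - p)$ of the negative tokens rather than negating the prediction scores; a minimal sketch of that formulation (my own illustration, not the repo's code; assumes logits and labels are already shifted as above):

    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, labels, ignore_index=-100):
        # logits: (batch, seq, vocab); labels: (batch, seq) with -100 on padding
        probs = F.softmax(logits, dim=-1)
        mask = labels.ne(ignore_index).float()
        p_target = probs.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
        # push down the probability assigned to the contradictory target tokens
        loss = -torch.log((1.0 - p_target).clamp(min=1e-6))
        return (loss * mask).sum() / mask.sum()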
    

    Those are my three questions. I think using NLI to improve the model's inference ability is a great idea, and I hope to add it to my own model, but while reading the paper and the code these doubts came up; perhaps I've misunderstood something. I hope the authors can reply. Many thanks, and happy new year!

    opened by cingtiye 3
  • `per_input_ids=persona_input_ids` seems to be unused when `ul_training`

    For the following code:

     if ul_training:
         decoder_input_ids=inference_dict['neg_hyp_input_ids']
         hyp_attention_mask=inference_dict['neg_hyp_attention_mask']
         mask_flag = torch.Tensor.bool(1 - hyp_attention_mask)
         labels = decoder_input_ids.masked_fill(mask_flag, -100)
         persona_input_ids=inference_dict['neg_pre_input_ids']

         ul_outputs = self.decoder2(
             input_ids=decoder_input_ids,
             attention_mask=hyp_attention_mask,
             encoder_hidden_states=None,
             encoder_attention_mask=None,
             inputs_embeds=None,
             labels=labels,
             output_attentions=output_attentions,
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
             per_input_ids=persona_input_ids,
             ul_training=ul_training,
             **kwargs_decoder2,
         )
    

    Because encoder_hidden_states=None, only the self-attention branch is executed; the cross-attention code related to the encoder cannot run. Thus per_input_ids=persona_input_ids in the code above seems to be unused.

    So what, then, is the hypothesis generated from?

    opened by cingtiye 3
  • Question about Dist.1 / 2


    When I run the evaluation script:

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --do_evaluation --do_predict \
     --eval_epoch 7

    The results are shown below:

     Distinct-1 (hypothesis, hypothesis_2, reference): 0.0442, 0.0393, 0.1048
     Distinct-2 (hypothesis, hypothesis_2, reference): 0.162, 0.1362, 0.4824

    Why can't the Dist.1/2 of hypothesis_2 (3.93, 13.62) reach the numbers in Table 3 (8.40, 36.08)? Or how should I set the arguments to obtain the results in Table 3?

    Thanks!

    opened by zhliu0106 3
  • Could you provide your model?


    Hi, I am researching persona chatbots. I'm trying to train the BoB model for my research, but it doesn't work well... so I would like to ask you to publish your trained model.

    opened by Exe-dev 1
  • Error when training with DDP

    Hi, I want to train the code with DDP(model), but I get the following error:

    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

    I have already set find_unused_parameters=True and upgraded my torch version.

    How should I modify the code to fix this? Thanks.

    opened by cingtiye 1
  • Can't use model.generate when beam_size>1


    When I set beam_size>1 to generate sequences, the error missing 1 required positional argument: 'token_type_ids' occurs. How can I generate utterances with beam_size>1?

    Code:

    generated_2 = model.generate(
        input_ids=input_ids,
        token_type_ids=token_type_ids,
        attention_mask=attention_mask,
        num_beams=5,
        length_penalty=1.0,
        min_length=3,
        max_length=32,
        no_repeat_ngram_size=1,
        use_decoder2=True,
        per_input_ids=persona_input_ids
        )
    

    Error message:

     BoB\xlibs\generation_utils.py in beam_search(self, input_ids, beam_scorer, logits_processor, max_length, pad_token_id, eos_token_id, use_decoder2, **model_kwargs)
         966
         967     while cur_len < max_length:
     --> 968         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
         969
         970         outputs_1, outputs_2 = self(**model_inputs, return_dict=True)

     TypeError: prepare_inputs_for_generation() missing 1 required positional argument: 'token_type_ids'

    opened by Exe-dev 1
  • File name "xlibs.modeling_tf_auto" not found

    I met this error during preprocessing and model training. I think xlibs doesn't include modeling_tf_auto.py.

    Command:

    python bertoverbert.py --do_train --encoder_model ./pretrained_models/bert/bert-base-uncased/ --decoder_model ./pretrained_models/bert/bert-base-uncased/ --decoder2_model ./pretrained_models/bert/bert-base-uncased/ --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 --dumped_token ./data/ConvAI2/convai2_tokenized/ --learning_rate 7e-6 --batch_size 32

    Error message:

     Traceback (most recent call last):
       File "bertoverbert.py", line 26, in <module>
         from xlibs import AdamW
       File "PATH\BoB\xlibs\__init__.py", line 126, in <module>
         from .pipelines import (
       File "PATH\BoB\xlibs\pipelines.py", line 48, in <module>
         from .modeling_tf_auto import (
     ModuleNotFoundError: No module named 'xlibs.modeling_tf_auto'

    opened by Exe-dev 1
  • Question about CUDA out of memory


    I ran the code on a 10 GB GPU, but it fails with CUDA out of memory. I tried reducing batch_size to 1, but it still doesn't work.

    opened by slptongji 0
  • Question about nliset of PersonaChat


    Hi, the ./data/ConvAI2 folder contains two example data files, nli_positive.tsv and nli_negative.tsv. Where can I get the full files? I downloaded the MNLI dataset but didn't find files in this format. Do I need to process the MNLI dataset to produce nli_positive.tsv and nli_negative.tsv, and is there code for that procedure?

    Thanks a lot

    opened by xiaolan98 2
  • How to get the results of other metrics?


    Hi, thanks for the great work. The paper reports results for the D.AVG, p.Ent, p.Ctd, Delta P, and C.Score metrics, but this repo contains no evaluation implementations for them. How can I obtain results for these metrics? It is hard to reproduce this work without those evaluation scripts. Looking forward to your reply. Thanks.

    opened by SkyAndCloud 0