ProphetNet-X
-
This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model called ProphetNet for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction.
We have released the ProphetNet baselines for the GLGE benchmark (A New General Language Generation Evaluation Benchmark) here. Have a try! :)
We provide the ProphetNet-X family of models for Chinese (ProphetNet-Zh), multi-lingual text (ProphetNet-Multi), English open-domain dialog (ProphetNet-Dialog-En), Chinese open-domain dialog (ProphetNet-Dialog-Zh), and code generation (ProphetNet-Code). The details are described in the ProphetNet-X paper.
This repo is still under development; feel free to report bugs and we will fix them ~
What's new
ProphetNet-X models are released!
Try new ProphetNet pretrained models for Chinese, English Dialog, Chinese Dialog, Multi-lingual, and Code Generation.
The only difference between the ProphetNet-X models is the vocabulary file. Simply modify one model file and you can evaluate your idea with all the pretrained models and fine-tuning scripts!
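As a quick illustration of this vocabulary-only difference, the sketch below loads a model's dictionary with the fairseq API. It is only a sketch: vocab.txt stands for whichever dictionary file the checkpoint ships with (the one later passed to --srcdict/--tgtdict in the preprocessing step), and that file name is an assumption here.

from fairseq.data import Dictionary

# Load the dictionary a given ProphetNet-X checkpoint expects.
# 'vocab.txt' is a placeholder for that model's dictionary file
# (the file passed to fairseq-preprocess via --srcdict/--tgtdict below).
vocab = Dictionary.load('vocab.txt')
print('vocabulary size:', len(vocab))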
Future updates
- ProphetNet pretrained models for bio-medical text.
- ProphetNet pretrained models for protein.
- New ProphetNet models for long document modeling.
- New algorithms for Transformer/ProphetNet to reduce inference latency without hurting the results.
- New ProphetNet models for non-autoregressive generation.
- New ProphetNet models for natural language understanding tasks.
Dependency
- pip install torch==1.3.0
- pip install fairseq==v0.9.0
- pip install tensorboardX==1.7
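A minimal environment check (a sketch, not part of the official scripts) that the pinned versions are installed and that a GPU is visible for the --fp16 training used below:

import torch
import fairseq

# Verify the pinned dependency versions from the list above.
print('torch version:', torch.__version__)      # expected: 1.3.0
print('fairseq version:', fairseq.__version__)  # expected: 0.9.0
# The --fp16 training flag below requires a CUDA device.
print('CUDA available:', torch.cuda.is_available())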
Pre-trained Models
We have released the following checkpoints for the pre-trained models described in the ProphetNet-X paper.
ProphetNet-X is based on ProphetNet, which also serves as the ProphetNet-En model.
Recommended Checkpoints:
- ProphetNet-En [link]
- ProphetNet-Zh [link]
- ProphetNet-Multi [link]
- ProphetNet-Dialog-En [link]
- ProphetNet-Dialog-Zh [link]
- ProphetNet-Code [link]
Expired Checkpoints:
How to use
The procedure includes 1) Tokenize, 2) Binarize, 3) Finetune, 4) Inference.
ProphetNet is implemented on top of Fairseq, so you can refer to the Fairseq Manual for more details.
For all the ProphetNet-X models, the only difference is the dictionary, which means different tokenizers should be used.
We take ProphetNet-En as an example:
Tokenize. Prepare your train.src, train.tgt, valid, and test sets. The input and output of each sample occupy one line in the .src and .tgt files, respectively.
Use the bert-uncased tokenizer to tokenize your data into word pieces:
from transformers import BertTokenizer

def bert_uncased_tokenize(fin, fout):
    # Load the BERT uncased tokenizer once, then tokenize the file line by line.
    tok = BertTokenizer.from_pretrained('bert-base-uncased')
    with open(fin, 'r', encoding='utf-8') as fin, open(fout, 'w', encoding='utf-8') as fout:
        for line in fin:
            word_pieces = tok.tokenize(line.strip())
            new_line = " ".join(word_pieces)
            fout.write('{}\n'.format(new_line))

bert_uncased_tokenize('train.src', 'tokenized_train.src')
bert_uncased_tokenize('train.tgt', 'tokenized_train.tgt')
bert_uncased_tokenize('valid.src', 'tokenized_valid.src')
bert_uncased_tokenize('valid.tgt', 'tokenized_valid.tgt')
bert_uncased_tokenize('test.src', 'tokenized_test.src')
bert_uncased_tokenize('test.tgt', 'tokenized_test.tgt')
Binarize it with fairseq-preprocess:
fairseq-preprocess \
--user-dir prophetnet \
--task translation_prophetnet \
--source-lang src --target-lang tgt \
--trainpref tokenized_train --validpref tokenized_valid --testpref tokenized_test \
--destdir processed --srcdict vocab.txt --tgtdict vocab.txt \
--workers 20
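Optionally, you can sanity-check the binarized data. The sketch below (an illustration, not part of the official pipeline) verifies that the dictionary written to processed/ matches the vocab.txt you passed in, assuming the standard fairseq-preprocess output layout (dict.src.txt for --source-lang src).

import os
from fairseq.data import Dictionary

# fairseq-preprocess copies the provided dictionary into the destination directory.
src_dict = Dictionary.load(os.path.join('processed', 'dict.src.txt'))
vocab = Dictionary.load('vocab.txt')
print('dict.src.txt size:', len(src_dict))
print('vocab.txt size   :', len(vocab))
assert len(src_dict) == len(vocab), 'dictionary mismatch'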
Fine-tune with fairseq-train.
--disable-ngram-loss: only keep the loss of the next (first) token.
--ngram: number of future tokens to predict. The provided pretrained checkpoint predicts 2 future tokens, so set it to 2 to stay consistent.
If your device does not support float16, remove --fp16.
DATA_DIR=processed
USER_DIR=./prophetnet
ARCH=ngram_transformer_prophet_large
CRITERION=ngram_language_loss
SAVE_DIR=./model
TENSORBOARD_LOGDIR=./logs
PRETRAINED_MODEL=pretrained_checkpoints/prophetnet_en.pt
fairseq-train \
--fp16 \
--user-dir $USER_DIR --task translation_prophetnet --arch $ARCH \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.00001 --min-lr 1e-09 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion $CRITERION --label-smoothing 0.1 \
--update-freq 1 --max-tokens 1400 --max-sentences 7 \
--num-workers 4 \
--load-from-pretrained-model $PRETRAINED_MODEL \
--ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 512 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--save-dir $SAVE_DIR \
--keep-last-epochs 10 \
--tensorboard-logdir $TENSORBOARD_LOGDIR \
$DATA_DIR
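Training progress can be followed in the TensorBoard logs under ./logs. After training, a quick way to inspect a saved checkpoint is sketched below; it assumes the standard fairseq checkpoint layout, and model/checkpoint5.pt matches the CHECK_POINT used in the inference step that follows.

import torch

# Load a fine-tuned checkpoint on CPU and report its size.
ckpt = torch.load('model/checkpoint5.pt', map_location='cpu')
print('checkpoint entries:', list(ckpt.keys()))  # fairseq stores weights plus training state
n_params = sum(p.numel() for p in ckpt['model'].values())
print('parameters: {:.1f}M'.format(n_params / 1e6))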
Inference with fairseq-generate to generate targets for the given processed test files. Alternatively, you can use fairseq-interactive to generate answers for your typed-in text (which should also be tokenized first).
BEAM=5
LENPEN=1.5
CHECK_POINT=./model/checkpoint5.pt
TEMP_FILE=fairseq_outputs.txt
OUTPUT_FILE=sorted_outputs.txt
fairseq-generate processed --path $CHECK_POINT --user-dir prophetnet --task translation_prophetnet --batch-size 80 --gen-subset test --beam $BEAM --num-workers 4 --no-repeat-ngram-size 3 --lenpen $LENPEN 2>&1 > $TEMP_FILE
grep ^H $TEMP_FILE | cut -c 3- | sort -n | cut -f3- | sed "s/ ##//g" > $OUTPUT_FILE
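The shell pipeline above sorts the hypotheses back into the original test-set order and merges the BERT word pieces. An equivalent Python sketch, assuming fairseq-generate's "H-<id>\t<score>\t<hypothesis>" output lines, is:

def sort_and_detokenize(fairseq_output, sorted_output):
    # Collect hypothesis lines, printed by fairseq-generate as "H-<id>\t<score>\t<text>".
    hyps = []
    with open(fairseq_output, 'r', encoding='utf-8') as fin:
        for line in fin:
            if line.startswith('H-'):
                sample_id, _score, hyp = line.rstrip('\n').split('\t', 2)
                hyps.append((int(sample_id[2:]), hyp))
    with open(sorted_output, 'w', encoding='utf-8') as fout:
        for _, hyp in sorted(hyps):                    # restore the original test-set order
            fout.write(hyp.replace(' ##', '') + '\n')  # merge BERT word pieces

sort_and_detokenize('fairseq_outputs.txt', 'sorted_outputs.txt')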
TIPS:
If you run into problems with fairseq-preprocess, fairseq-train, or other commands, or if you want to modify the workflow/inference pipeline, a good option is to download the fairseq git repo, check out v0.9.0, and merge our code into it.
Then modify their preprocess.py, train.py, or generate.py to run your new pipeline.
Repo Reference
This repo partially builds on Fairseq-v0.9.0 and MASS.
How to Cite
If you extend or use this work, please cite the paper where it was introduced:
@inproceedings{qi2020prophetnet,
title={Prophetnet: Predicting future n-gram for sequence-to-sequence pre-training},
author={Qi, Weizhen and Yan, Yu and Gong, Yeyun and Liu, Dayiheng and Duan, Nan and Chen, Jiusheng and Zhang, Ruofei and Zhou, Ming},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings},
pages={2401--2410},
year={2020}
}
@article{qi2021prophetnet,
title={ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation},
author={Qi, Weizhen and Gong, Yeyun and Yan, Yu and Xu, Can and Yao, Bolun and Zhou, Bartuer and Cheng, Biao and Jiang, Daxin and Chen, Jiusheng and Zhang, Ruofei and others},
journal={arXiv preprint arXiv:2104.08006},
year={2021}
}