BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

Overview

bert-gen

See https://sites.google.com/site/deepernn/home/blog/amistakeinwangchoberthasamouthanditmustspeakbertasamarkovrandomfieldlanguagemodel for a description of a mistake in the paper: BERT appears to be a non-equilibrium language model rather than an MRF language model.
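
Roughly (see the paper below for the exact definitions), the construction treats a length-T sequence X = (x_1, ..., x_T) as a fully connected MRF whose per-position log-potential is the masked-LM logit BERT assigns to x_t:

    \log \phi_t(X) = \mathrm{onehot}(x_t)^\top f_\theta(X_{\setminus t}), \qquad
    p_\theta(X) \propto \exp\Big( \sum_{t=1}^{T} \log \phi_t(X) \Big),

where f_\theta(X_{\setminus t}) is BERT's logit vector at position t with x_t masked out. The mistake described at the link above is that the conditionals of this joint are not BERT's masked conditionals softmax(f_\theta(X_{\setminus t})), because the other potentials \phi_s (s != t) also depend on x_t; sampling from BERT's conditionals is therefore not Gibbs sampling from this MRF.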

See https://arxiv.org/abs/1902.04094 for details.
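
For orientation, here is a minimal sketch of the sequential masked-resampling ("Gibbs-style") generation loop the paper experiments with. It uses the current HuggingFace transformers API rather than this repo's pytorch-pretrained-bert code, and the model name, length, and iteration count are illustrative, not the repo's defaults:

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    # Illustrative sketch only; not the repo's implementation.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

    max_len, n_iters, temperature = 10, 50, 1.0
    mask_id = tokenizer.mask_token_id

    # Start from [CLS] [MASK] ... [MASK] [SEP] and repeatedly re-sample one position.
    ids = torch.tensor([[tokenizer.cls_token_id] + [mask_id] * max_len + [tokenizer.sep_token_id]])

    with torch.no_grad():
        for step in range(n_iters):
            pos = 1 + step % max_len                          # cycle over the non-special positions
            ids[0, pos] = mask_id                             # mask the position being re-sampled
            logits = model(ids).logits[0, pos] / temperature  # BERT's conditional at that position
            probs = torch.softmax(logits, dim=-1)
            ids[0, pos] = torch.multinomial(probs, 1).item()  # draw a replacement token

    print(tokenizer.decode(ids[0, 1:-1]))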

@article{wang2019bert,
  title={BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model},
  author={Wang, Alex and Cho, Kyunghyun},
  journal={arXiv preprint arXiv:1902.04094},
  year={2019}
}

Comments
  • How about Perplexity? (in addition to BLEU)

    The Colab example using BLEU is helpful, thanks. I'm wondering whether there is any plan to measure quality by PPL (perplexity), or any pointer showing how to measure the performance of generative models by PPL in general? (One general approach is sketched after these comments.)

    opened by leejason 1
  • Strange output from unchanged generate.py

    When I run generate.py with no changes (except removing pdb), this is the output I see:

    The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
    Decoding strategy sequential, argmax at each step
    Iteration 0: this is a sentence .
    	BERT prediction: . is a . .
    Iteration 1: . is a sentence .
    	BERT prediction: . is a . .
    Iteration 2: . is a sentence .
    	BERT prediction: . is a . .
    Iteration 3: . is a sentence .
    	BERT prediction: . is a . .
    Iteration 4: . is a . .
    	BERT prediction: . . a . .
    Final: . is a . .
    

    That doesn't seem to be the desired result. Why so many periods? If this is expected, can you give me an example of an input and configuration that will find the correct answer?

    opened by summerstay 0
  • IndexError: list index out of range in detokenize

    I get an error after running:

    for temp in [1.0]:
        bert_sents = generate(n_samples, seed_text=seed_text, batch_size=batch_size, max_len=max_len,
                              sample=sample, top_k=top_k, temperature=temp, burnin=burnin, max_iter=max_iter,
                              cuda=True)
        out_file = "data/%s-len%d-burnin%d-topk%d-temp%.3f.txt" % (model_version, max_len, burnin, top_k, temp)
        write_sents(out_file, bert_sents, should_detokenize=True)
    

    Stacktrace:

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-23-776125cadf25> in <module>()
         18                           cuda=True)
         19     out_file = "data/%s-len%d-burnin%d-topk%d-temp%.3f.txt" % (model_version, max_len, burnin, top_k, temp)
    ---> 20     write_sents(out_file, bert_sents, should_detokenize=True)
    
    <ipython-input-19-027cb8b83cc4> in write_sents(out_file, sents, should_detokenize)
         15     with open(out_file, "w") as out_fh:
         16         for sent in sents:
    ---> 17             sent = detokenize(sent[1:-1]) if should_detokenize else sent
         18             out_fh.write("%s\n" % " ".join(sent))
    
    <ipython-input-16-beace4564740> in detokenize(sent)
         20     for i, tok in enumerate(sent):
         21         if tok.startswith("##"):
    ---> 22             new_sent[len(new_sent) - 1] = new_sent[len(new_sent) - 1] + tok[2:]
         23         else:
         24             new_sent.append(tok)
    
    IndexError: list index out of range
    

    The head of the saved file:

    $ head -n3 bert-base-uncased-len40-burnin250-topk100-temp1.000.txt 
    sammy harves [ " baby candy " / " dream of baby candy " ( gas station theme ) ) mary ford and baby candy . ( gas station theme ) concept album , featuring mary ford .
    3 . contemporary art review ( 2nd ed . october 2008 ) , review with essays on contemporary art , ( london : bateman & partners , february 2009 ) sculpture and the minimalist movement , part .
    the truth outside ( matthew greengrass ) psycho ( 1964 ? ) psycho ( orson welles ) monster show ( orson welles ) ( barnacles ) part 3 ( the snare drum ) - narration ;
    
    opened by loretoparisi 0
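
Regarding the perplexity question in the first comment above: a common, model-agnostic option is to score the generated sentences with a separate pretrained causal LM such as GPT-2 and report exp(mean token cross-entropy). A minimal sketch (purely illustrative; not part of this repo):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Illustrative sketch: score generated sentences with an external causal LM.
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(sentence):
        ids = tokenizer.encode(sentence, return_tensors="pt")
        with torch.no_grad():
            # With labels=input_ids the model returns the mean next-token cross-entropy.
            loss = model(ids, labels=ids).loss
        return torch.exp(loss).item()

    print(perplexity("this is a sentence ."))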
Owner
A collection of research software packages released by the Cho Lab at NYU CS and CDS.