Overview
This is the code for our paper NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction. We use the sentence-level pre-training task NSP (Next Sentence Prediction) to realize prompt-learning and perform various downstream tasks, such as single sentence classification, sentence pair classification, coreference resolution, cloze-style tasks, entity linking, and entity typing.
On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods (GPT-1-zero and PET-zero) on most of these tasks and comes close to the few-shot methods. We hope NSP-BERT can serve as an unsupervised tool that assists other language tasks or models.
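To make the idea concrete, here is a minimal sketch of NSP-based zero-shot classification with bert4keras. The checkpoint paths, the prompt templates, and the `nsp_score` helper are illustrative assumptions for this sketch, not the exact implementation used in the scripts of this repository.

```python
# Minimal sketch of NSP-based zero-shot classification (illustrative only;
# paths, templates and helper names are assumptions, not the repository's exact code).
import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

config_path = 'models/uer_mixed_corpus_bert_base/bert_config.json'
checkpoint_path = 'models/uer_mixed_corpus_bert_base/bert_model.ckpt'
dict_path = 'models/uer_mixed_corpus_bert_base/vocab.txt'

tokenizer = Tokenizer(dict_path, do_lower_case=True)
# with_nsp=True keeps the pre-trained NSP head; the model then outputs a
# 2-way softmax in which index 0 is the "is next sentence" probability.
model = build_transformer_model(config_path, checkpoint_path, with_nsp=True)

def nsp_score(sentence, prompt):
    """Probability that `prompt` is a plausible next sentence of `sentence`."""
    token_ids, segment_ids = tokenizer.encode(sentence, prompt, maxlen=256)
    probs = model.predict([np.array([token_ids]), np.array([segment_ids])])[0]
    return probs[0]

# Hypothetical sentiment prompts: the label whose prompt gets the highest NSP
# score is predicted as the class of the input sentence.
sentence = '这部电影太精彩了'  # "This movie is wonderful."
prompts = {'positive': '这是一条好评。', 'negative': '这是一条差评。'}
print(max(prompts, key=lambda label: nsp_score(sentence, prompts[label])))
```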
Guide
Section | Description |
---|---|
Environment | The required deployment environment |
Downloads | Download links for the models' checkpoints used by NSP-BERT |
Use examples | Learn to use NSP-BERT for different downstream tasks |
Baselines | Baseline results for several Chinese NLP datasets (partial) |
Model Comparison | Compare the models published in this repository |
Strategy Details | Some of the strategies used in the paper |
Discussion | Discussion and directions for future work |
Environment
The required environment is as follows:
- Python 3.6
- bert4keras 0.10.6
- tensorflow-gpu 1.15.0
Downloads
Models
We need to download the checkpoints of the different models. The vocab.txt and config.json files are already included in our repository.
Organization | Model Name | Model Parameters | Download Link | Tips |
---|---|---|---|---|
Google | BERT-Chinese | L=12 H=768 A=12 102M | Tensorflow | |
HFL | BERT-wwm | L=12 H=768 A=12 102M | Tensorflow | |
HFL | BERT-wwm-ext | L=12 H=768 A=12 102M | Tensorflow | |
UER | BERT-mixed-tiny | L=3 H=384 A=6 14M | Pytorch | * |
UER | BERT-mixed-Small | L=6 H=512 A=8 31M | Pytorch | * |
UER | BERT-mixed-Base | L=12 H=768 A=12 102M | Pytorch | * |
UER | BERT-mixed-Large | L=24 H=1024 A=16 327M | Pytorch | * |
* We need to use UER's conversion tool to convert the UER PyTorch checkpoints to the original TensorFlow format.
Datasets
We use FewCLUE datasets and DuEL2.0 (CCKS2020) in our experiments.
Datasets | Download Links |
---|---|
FewCLUE | https://github.com/CLUEbenchmark/FewCLUE/tree/main/datasets |
DuEL2.0 (CCKS2020) | https://aistudio.baidu.com/aistudio/competition/detail/83 |
Put the datasets into the NSP-BERT/datasets/ directory.
Use examples
We can run the individual Python files in the project directly to evaluate NSP-BERT. The project is organized as follows:
NSP-BERT
|- datasets
|- clue_datasets
|- ...
|- DuEL 2.0
|- dev.json
|- kb.json
|- models
|- uer_mixed_corpus_bert_base
|- bert_config.json
|- vocab.txt
|- bert_model.ckpt...
|- ...
|- nsp_bert_classification.py # Single Sentence Classification
|- nsp_bert_sentence_pair.py # Sentence Pair Classification
|- nsp_bert_cloze_style.py # Cloze-style Task
|- nsp_bert_coreference_resolution.py # Coreference Resolution
|- nsp_bert_entity_linking.py # Entity Linking and Entity Typing
|- utils.py
Python File | Task | Datasets |
---|---|---|
nsp_bert_classification.py | Single Sentence Classification | EPRSTMT, TNEWS, CSLDCP, IFLYTEK |
nsp_bert_sentence_pair.py | Sentence Pair Classification | OCNLI, BUSTM, CSL |
nsp_bert_cloze_style.py | Cloze-style Task | ChID |
nsp_bert_coreference_resolution.py | Coreference Resolution | CLUEWSC |
nsp_bert_entity_linking.py | Entity Linking and Entity Typing | DuEL2.0 |
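As a rough illustration of how NSP scoring carries over to the sentence pair setting, the sketch below treats the NSP probability of sentence B following sentence A as a matching score and thresholds it. This is an assumed simplification for illustration only; it is not necessarily the exact strategy implemented in nsp_bert_sentence_pair.py, and the paths and the 0.5 threshold are placeholders.

```python
# Rough sketch: use the NSP probability as a sentence-pair matching score.
# The 0.5 threshold and the paths are illustrative assumptions, not the
# exact strategy of nsp_bert_sentence_pair.py.
import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

config_path = 'models/uer_mixed_corpus_bert_base/bert_config.json'
checkpoint_path = 'models/uer_mixed_corpus_bert_base/bert_model.ckpt'
dict_path = 'models/uer_mixed_corpus_bert_base/vocab.txt'

tokenizer = Tokenizer(dict_path, do_lower_case=True)
model = build_transformer_model(config_path, checkpoint_path, with_nsp=True)

def is_match(sent_a, sent_b, threshold=0.5):
    """Treat "sent_b plausibly follows sent_a" (NSP class 0) as a match."""
    token_ids, segment_ids = tokenizer.encode(sent_a, sent_b, maxlen=256)
    p_next = model.predict([np.array([token_ids]), np.array([segment_ids])])[0][0]
    return p_next > threshold

print(is_match('明天会下雨吗', '明天的天气怎么样'))  # "Will it rain tomorrow?" / "How is the weather tomorrow?"
```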
Baselines
Following FewCLUE, we choose 3 training scenarios: fine-tuning, few-shot, and zero-shot. The baselines use Chinese-RoBERTa-Base and Chinese-GPT-1 as the backbone models.
Methods
Scenarios | Methods |
---|---|
Fine-tuning | BERT, RoBERTa |
Few-Shot | PET, ADAPET, P-tuning, LM-BFF, EFL |
Zero-Shot | GPT-zero, PET-zero |
Downloads
Organization | Model Name | Model Parameters | Download Link |
---|---|---|---|
huawei-noah | Chinese GPT | L=12 H=768 A=12 102M | Tensorflow |
HFL | RoBERTa-wwm-ext | L=12 H=768 A=12 102M | Tensorflow |
Model Comparison
Strategy Details
Discussion
- Since NSP-BERT is a sentence-level prompt-learning model, it is significantly superior to GPT-zero and PET-zero on Single Sentence Classification tasks (TNEWS, CSLDCP and IFLYTEK). At the same time, it can solve the Entity Linking task (DuEL2.0), and the model is not limited by non-fixed-length entity descriptions, which GPT-zero and PET-zero cannot handle.
- However, it does not work as well on token-level tasks, such as the Cloze-style task and Entity Typing.
- In future work, it is essential to extend NSP-BERT to the few-shot scenario.