Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Overview

This repository provides the code and pre-trained models for Prompt-BERT, which uses prompts to obtain better sentence embeddings from BERT and RoBERTa.

Results on STS Tasks

Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg.
----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----
unsup-prompt-bert-base (Download) | 71.98 | 84.66 | 77.13 | 84.52 | 81.10 | 82.03 | 70.64 | 78.87
unsup-prompt-roberta-base (Download) | 73.98 | 84.73 | 77.88 | 84.93 | 81.89 | 82.74 | 69.21 | 79.34
sup-prompt-bert-base (Download) | 75.48 | 85.59 | 80.57 | 85.99 | 81.08 | 84.56 | 80.52 | 81.97
sup-prompt-roberta-base (Download) | 76.75 | 85.93 | 82.28 | 86.69 | 82.80 | 86.14 | 80.04 | 82.95
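
As a rough reference for how these checkpoints produce sentence embeddings, the following is a minimal sketch (not the repository's eval code) using Hugging Face transformers: the sentence is placed into a manual template and the hidden state at the [MASK] position is taken as the embedding. The model name and the exact template wording below are assumptions for illustration.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # or the path of a downloaded Prompt-BERT checkpoint (assumption)
TEMPLATE = 'This sentence : "{}" means [MASK].'  # manual template; exact wording is an assumption

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in the template and take the [MASK]-position hidden state.
    inputs = tokenizer(TEMPLATE.format(sentence), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state            # (1, seq_len, hidden_size)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    return hidden[0, mask_pos]

emb = prompt_embedding("A man is playing a guitar.")
print(emb.shape)  # torch.Size([768]) for bert-base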

Download Data

cd SentEval/data/downstream/
bash download_dataset.sh
cd -
cd ./data
bash download_wiki.sh
bash download_nli.sh
cd -

Static token embeddings with embedding biases removed

roberta-base, bert-base-cased and bert-base-uncased

./run.sh roberta-base-embedding-only-remove-baises
./run.sh bert-base-cased-embedding-only-remove-baises
./run.sh bert-base-uncased-embedding-only-remove-baises
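
For context, the sketch below shows one way such a static-embedding baseline can be computed: average the model's static (input) token embeddings for a sentence while skipping tokens treated as biased. The exact bias-removal rule used by run.sh is not reproduced here, and the example set of "biased" tokens is purely illustrative.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
static_emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_size)

def avg_static_embedding(sentence: str, biased_ids=frozenset()) -> torch.Tensor:
    # Average the static input embeddings of the sentence's tokens,
    # skipping any token ids treated as "biased".
    ids = [i for i in tokenizer(sentence, add_special_tokens=False)["input_ids"]
           if i not in biased_ids]
    return static_emb[torch.tensor(ids)].mean(dim=0)

# Illustrative only: treating punctuation and very frequent words as biased tokens.
biased = {tokenizer.convert_tokens_to_ids(t) for t in [".", ",", "the", "a"]}
print(avg_static_embedding("A man is playing a guitar.", biased).shape)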

Non-fine-tuned BERT with Prompt

bert-base-uncased with prompt

./run.sh bert-prompt

bert-base-uncased with optiprompt

./run.sh bert-optiprompt
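
For intuition, here is a simplified sketch of the OptiPrompt-style idea behind bert-optiprompt: continuous template vectors are inserted around the sentence and trained while BERT itself stays frozen. The number of prompt vectors, their positions, and the training objective are assumptions rather than the repository's exact setup.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)
for p in bert.parameters():
    p.requires_grad = False          # BERT stays frozen; only the prompt vectors are trained

n_prompt = 5                         # number of continuous template tokens (assumption)
prompt_vectors = torch.nn.Parameter(torch.randn(n_prompt, bert.config.hidden_size) * 0.02)

def soft_prompt_embedding(sentence: str) -> torch.Tensor:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]          # [CLS] ... [SEP]
    tok_emb = bert.get_input_embeddings()(ids)                           # (1, L, H)
    mask_emb = bert.get_input_embeddings()(
        torch.tensor([[tokenizer.mask_token_id]]))                       # (1, 1, H)
    # [CLS] + learned prompt vectors + sentence tokens + [MASK] + [SEP]
    inputs_embeds = torch.cat(
        [tok_emb[:, :1], prompt_vectors.unsqueeze(0), tok_emb[:, 1:-1],
         mask_emb, tok_emb[:, -1:]], dim=1)
    out = bert(inputs_embeds=inputs_embeds).last_hidden_state
    mask_pos = 1 + n_prompt + (ids.size(1) - 2)                          # index of the [MASK] slot
    return out[0, mask_pos]

# prompt_vectors would then be optimized (e.g. torch.optim.Adam([prompt_vectors], lr=1e-3))
# with a contrastive objective while BERT remains frozen.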

Fine-tuned BERT with Prompt

Unsupervised

SEED=0
./run.sh unsup-roberta $SEED
SEED=0
./run.sh unsup-bert $SEED
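
For reference, the unsupervised objective follows the SimCSE recipe: embeddings of the same sentence obtained through two different prompt templates form a positive pair, and the other sentences in the batch serve as negatives. A minimal sketch of such an in-batch InfoNCE loss is shown below; the temperature value is an assumption, and the template-denoising step is omitted.

import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    # z1, z2: (batch, hidden) embeddings of the same sentences under two different templates.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # cosine similarity matrix, (batch, batch)
    labels = torch.arange(z1.size(0))         # the diagonal entries are the positive pairs
    return F.cross_entropy(logits, labels)

# usage sketch: loss = info_nce(encode(batch, template_a), encode(batch, template_b))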

Supervised

./run.sh sup-roberta 
./run.sh sup-bert
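
The supervised variant trains on NLI triples (see download_nli.sh), where the entailment hypothesis acts as a positive and the contradiction hypothesis as a hard negative. A hedged sketch of a SimCSE-style cross-entropy loss over in-batch similarities with hard negatives, which model.py appears to use instead of a triplet loss, might look like this; the temperature is again an assumption.

import torch
import torch.nn.functional as F

def supervised_nce(anchor, positive, hard_negative, temperature: float = 0.05):
    # anchor, positive, hard_negative: (batch, hidden) embeddings of the premise,
    # entailment hypothesis and contradiction hypothesis respectively.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(hard_negative, dim=-1)
    pos_sim = a @ p.t() / temperature               # (batch, batch)
    neg_sim = a @ n.t() / temperature               # (batch, batch)
    logits = torch.cat([pos_sim, neg_sim], dim=1)   # (batch, 2 * batch)
    labels = torch.arange(a.size(0))                # diagonal of pos_sim is the target class
    return F.cross_entropy(logits, labels)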

Our code is based on SimCSE.

Comments
  • How to represent sentence in Template Denoising step?

    Hi there,

    I have recently been rebuilding your work in fairseq. Your model is really impressive.

    I am able to reproduce your results in Table 8: with different templates, I get 78.41 on average (RoBERTa_base as the backbone model).

    However, when I try to reproduce your default method, i.e. different templates with denoising, the highest score I can get is 78.54 (RoBERTa_base as the backbone model).

    I tried using either 1) the [MASK] token's representation or 2) the [CLS] token's representation to represent the template in the template denoising step.

    Can you clarify which of these you use as the template bias?

    Many thanks!

    opened by CSerxy 16
  • How do I train from scratch?

    Regarding ./run.sh bert-optiprompt and ./run.sh sup-roberta: how do I train from scratch? The README only describes evaluation. After downloading the model via sh [unsup-bert|unsup-roberta|sup-berta], I can run ./run.sh sup-roberta. I'd appreciate it if you could give me a hint. I'm fascinated by your paper.

    opened by serotoninpm 6
  • Question about fine-tuning

    Hi, I ran into a problem when fine-tuning with the command ./run.sh unsup-bert 0: I get the error OSError: Can't load config for 'result/unsup-bert_s0'. Do I need to download the models listed under the "Results on STS Tasks" section? I am confused, because shouldn't these models be produced by running this command? What should I put in the result directory? Thanks!

    opened by wacharlin 4
  • Results

    Hello, I downloaded the "unsup-bert" model and evaluated it, but I only get 77.80 on average. Could you add an evaluation procedure to the README or provide a bash script?

    opened by cwszz 3
  • More details about the results of the non-fine-tuned version.

    Hello, first of all, congratulations on your great work.

    I want to ask about the performance of the non-fine-tuned Prompt-BERT with a manual prompt, because using "The sentence of "[X]" means [MASK]." I only get an average of 59 on STS.

    Did you also use template denoising for the non-fine-tuned version?

    opened by YWMditto 2
  • unsupervised contrastive learning

    I understand that contrastive learning pulls similar examples closer together and pushes dissimilar ones farther apart. In unsupervised contrastive learning, you use the template to bring the closest examples closer, but does the objective also include the notion of pushing others farther away?

    opened by serotoninpm 2
  • A little question about the training

    Hi, thank you for your excellent work. Please forgive me for asking a simple question. These days I have been training an unsup-bert model. I set the random seed to 0 and get a result with an average score of 78.54, the same as the paper reports (78.54±0.15), but both results are lower than your provided model (78.87). So I wonder whether you have updated the training strategy or could provide the best model for us.

    opened by ilingen 2
  • A question about the paper

    Hello, I have just read your excellent paper and have a question: is the "frequency" in Figure 1 calculated from the datasets mentioned in Section 5.1 (Dataset)?

    opened by Aureole-1210 1
  • Question of supervised train loss

    Hello, in model.py I found that the loss function for supervised training is CrossEntropyLoss. Since hard negatives are used, why not choose TripletMarginLoss? Looking forward to your reply, thanks!

    opened by Chris-WangQ 1
  • Why adding periods during evaluation?

    I noticed that you add periods during evaluation (in eval.py). I am not sure this operation is valid, because I didn't find the same operation in SimCSE's evaluation code. I also notice that results under certain settings drop if I remove it.

    opened by qfkkwgd 1
  • Standard deviation

    Hi,

    Thanks for your brilliant work!

    I have a question about the results. I noticed that only the unsupervised models are reported with a standard deviation. Why are the supervised models not reported with one?

    Also, would you mind sharing the random seeds that you used?

    Thanks a lot!

    opened by qfkkwgd 1
  • Relationships between anisotropy and removing biased tokens

    First of all, thanks for your inspiring paper.

    When I read the introduction of your paper (I haven't read it all yet), I wondered about one thing.

    In the paper, removing biased tokens shows improvements. However, does it really have any relationship with anisotropy?

    I guess that tokens appearing frequently in one sentence are also likely to appear in other sentences, so removing such tokens makes sentence embeddings more distinguishable from each other. Finally, I expect this gives the embeddings more varied directions and makes them more isotropic.

    My opinion may be wrong! I am just curious about your view.

    Thank you.

    opened by kimwongyuda 2
  • Any advice for the Chinese version?

    Hi, I'm trying to apply Prompt-BERT to a public Chinese dataset,

    using a prompt like the following: [CLS]下面这句话"sent_0"对应的语义是[MASK][SEP] (roughly, "The meaning of the following sentence "sent_0" is [MASK]").

    The Spearman score is much worse than plain SimCSE.

    Do you have any advice for Chinese prompts?

    opened by shuopwang 1
  • Questions about the paper.

    Hello,

    I was very fortunate to read your paper, and the experimental results are exciting. The paper mentions two observations: (1) the original BERT layers fail to improve the performance, and (2) embedding biases harm sentence-embedding performance. Based on your experimental results, these two phenomena do exist and can be mitigated. But I don't see the connection between these two observations and prompts.

    How does the prompt solve the bias problem?

    Looking forward to your reply, thanks!

    opened by chaochen99 6
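
A note on the template-denoising question in the first comment above: one reading of the paper's template denoising is that the sentence embedding is obtained by subtracting the representation of the template alone (the template bias) from the representation of the templated sentence. Whether the official code takes the [MASK] or the [CLS] vector for the bias term is exactly what the comment asks; the sketch below assumes the [MASK] vector, the template wording is again an assumption, and encode_mask_vector is a hypothetical helper that maps an input string to its [MASK]-position hidden state.

def denoised_embedding(encode_mask_vector, sentence,
                       template='This sentence : "{}" means [MASK].'):
    # encode_mask_vector: hypothetical callable mapping an input string to its
    # [MASK]-position hidden vector (e.g. the prompt_embedding function sketched earlier).
    h = encode_mask_vector(template.format(sentence))   # templated sentence
    h_bias = encode_mask_vector(template.format(""))    # template with an empty sentence slot
    return h - h_bias                                    # subtract the template bias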