Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Overview

This repository provides the code and pre-trained models for Prompt-BERT, which uses prompts to obtain better sentence embeddings from BERT and RoBERTa.

Results on STS Tasks

Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg.
----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----
unsup-prompt-bert-base (Download) | 71.98 | 84.66 | 77.13 | 84.52 | 81.10 | 82.03 | 70.64 | 78.87
unsup-prompt-roberta-base (Download) | 73.98 | 84.73 | 77.88 | 84.93 | 81.89 | 82.74 | 69.21 | 79.34
sup-prompt-bert-base (Download) | 75.48 | 85.59 | 80.57 | 85.99 | 81.08 | 84.56 | 80.52 | 81.97
sup-prompt-roberta-base (Download) | 76.75 | 85.93 | 82.28 | 86.69 | 82.80 | 86.14 | 80.04 | 82.95
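
As a rough reference for how these checkpoints produce sentence embeddings, the following is a minimal sketch (not the repository's eval code) using Hugging Face transformers: the sentence is placed into a manual template and the hidden state at the [MASK] position is taken as the embedding. The model name and the exact template wording below are assumptions for illustration.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # or the path of a downloaded Prompt-BERT checkpoint (assumption)
TEMPLATE = 'This sentence : "{}" means [MASK].'  # manual template; exact wording is an assumption

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in the template and take the [MASK]-position hidden state.
    inputs = tokenizer(TEMPLATE.format(sentence), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state            # (1, seq_len, hidden_size)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    return hidden[0, mask_pos]

emb = prompt_embedding("A man is playing a guitar.")
print(emb.shape)  # torch.Size([768]) for bert-base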

Download Data

cd SentEval/data/downstream/
bash download_dataset.sh
cd -
cd ./data
bash download_wiki.sh
bash download_nli.sh
cd -

Static token embeddings with embedding biases removed

roberta-base, bert-base-cased and bert-base-uncased

./run.sh roberta-base-embedding-only-remove-baises
./run.sh bert-base-cased-embedding-only-remove-baises
./run.sh bert-base-uncased-embedding-only-remove-baises
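
For context, the sketch below shows one way such a static-embedding baseline can be computed: average the model's static (input) token embeddings for a sentence while skipping tokens treated as biased. The exact bias-removal rule used by run.sh is not reproduced here, and the example set of "biased" tokens is purely illustrative.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
static_emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_size)

def avg_static_embedding(sentence: str, biased_ids=frozenset()) -> torch.Tensor:
    # Average the static input embeddings of the sentence's tokens,
    # skipping any token ids treated as "biased".
    ids = [i for i in tokenizer(sentence, add_special_tokens=False)["input_ids"]
           if i not in biased_ids]
    return static_emb[torch.tensor(ids)].mean(dim=0)

# Illustrative only: treating punctuation and very frequent words as biased tokens.
biased = {tokenizer.convert_tokens_to_ids(t) for t in [".", ",", "the", "a"]}
print(avg_static_embedding("A man is playing a guitar.", biased).shape)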

Non-fine-tuned BERT with Prompt

bert-base-uncased with prompt

./run.sh bert-prompt

bert-base-uncased with optiprompt

./run.sh bert-optiprompt
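
For intuition, here is a simplified sketch of the OptiPrompt-style idea behind bert-optiprompt: continuous template vectors are inserted around the sentence and trained while BERT itself stays frozen. The number of prompt vectors, their positions, and the training objective are assumptions rather than the repository's exact setup.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)
for p in bert.parameters():
    p.requires_grad = False          # BERT stays frozen; only the prompt vectors are trained

n_prompt = 5                         # number of continuous template tokens (assumption)
prompt_vectors = torch.nn.Parameter(torch.randn(n_prompt, bert.config.hidden_size) * 0.02)

def soft_prompt_embedding(sentence: str) -> torch.Tensor:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]          # [CLS] ... [SEP]
    tok_emb = bert.get_input_embeddings()(ids)                           # (1, L, H)
    mask_emb = bert.get_input_embeddings()(
        torch.tensor([[tokenizer.mask_token_id]]))                       # (1, 1, H)
    # [CLS] + learned prompt vectors + sentence tokens + [MASK] + [SEP]
    inputs_embeds = torch.cat(
        [tok_emb[:, :1], prompt_vectors.unsqueeze(0), tok_emb[:, 1:-1],
         mask_emb, tok_emb[:, -1:]], dim=1)
    out = bert(inputs_embeds=inputs_embeds).last_hidden_state
    mask_pos = 1 + n_prompt + (ids.size(1) - 2)                          # index of the [MASK] slot
    return out[0, mask_pos]

# prompt_vectors would then be optimized (e.g. torch.optim.Adam([prompt_vectors], lr=1e-3))
# with a contrastive objective while BERT remains frozen.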

Fine-tuned BERT with Prompt

Unsupervised

SEED=0
./run.sh unsup-roberta $SEED
SEED=0
./run.sh unsup-bert $SEED
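
For reference, the unsupervised objective follows the SimCSE recipe: embeddings of the same sentence obtained through two different prompt templates form a positive pair, and the other sentences in the batch serve as negatives. A minimal sketch of such an in-batch InfoNCE loss is shown below; the temperature value is an assumption, and the template-denoising step is omitted.

import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    # z1, z2: (batch, hidden) embeddings of the same sentences under two different templates.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # cosine similarity matrix, (batch, batch)
    labels = torch.arange(z1.size(0))         # the diagonal entries are the positive pairs
    return F.cross_entropy(logits, labels)

# usage sketch: loss = info_nce(encode(batch, template_a), encode(batch, template_b))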

Supervised

./run.sh sup-roberta 
./run.sh sup-bert
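
The supervised variant trains on NLI triples (see download_nli.sh), where the entailment hypothesis acts as a positive and the contradiction hypothesis as a hard negative. A hedged sketch of a SimCSE-style cross-entropy loss over in-batch similarities with hard negatives, which model.py appears to use instead of a triplet loss, might look like this; the temperature is again an assumption.

import torch
import torch.nn.functional as F

def supervised_nce(anchor, positive, hard_negative, temperature: float = 0.05):
    # anchor, positive, hard_negative: (batch, hidden) embeddings of the premise,
    # entailment hypothesis and contradiction hypothesis respectively.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(hard_negative, dim=-1)
    pos_sim = a @ p.t() / temperature               # (batch, batch)
    neg_sim = a @ n.t() / temperature               # (batch, batch)
    logits = torch.cat([pos_sim, neg_sim], dim=1)   # (batch, 2 * batch)
    labels = torch.arange(a.size(0))                # diagonal of pos_sim is the target class
    return F.cross_entropy(logits, labels)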

Our code is based on SimCSE.

Comments
  • How to represent sentence in Template Denoising step?

    Hi there,

    I have recently been rebuilding your work in fairseq. Your model is really impressive.

    I am able to reproduce your results in Table 8: with different templates, I get 78.41 on average (RoBERTa_base as the backbone model).

    However, when I try to reproduce your default method, i.e. different templates with denoising, the highest score I can get is 78.54 (RoBERTa_base as the backbone model).

    I tried using either 1) the [MASK] token's representation or 2) the [CLS] token's representation to represent the template in the template denoising step.

    Can you clarify which of these you use as the template bias?

    Many thanks!

    opened by CSerxy 16
  • How do I train from scratch?

    Regarding ./run.sh bert-optiprompt and ./run.sh sup-roberta: how do I train from scratch? The README only describes evaluation. After downloading the model via sh [unsup-bert|unsup-roberta|sup-berta], I can run ./run.sh sup-roberta. I'd appreciate it if you could give me a hint. I'm fascinated by your paper.

    opened by serotoninpm 6
  • Question about fine-tuning

    Hi, I ran into a problem when fine-tuning with the command ./run.sh unsup-bert 0: I get the error OSError: Can't load config for 'result/unsup-bert_s0'. Do I need to download the models listed under the "Results on STS Tasks" section? I am confused, because shouldn't these models be produced by running this command? What should I put in the result directory? Thanks!

    opened by wacharlin 4
  • Results

    Hello, I downloaded the "unsup-bert" model and evaluated it, but I only get 77.80 on average. Could you add an evaluation procedure to the README or provide a bash script?

    opened by cwszz 3
  • More details about the results of the non-fine-tuned version.

    Hello, first of all, congratulations on your great work.

    I want to ask about the performance of the non-fine-tuned Prompt-BERT with a manual prompt, because using "The sentence of "[X]" means [MASK]." I only get an average of 59 on STS.

    Did you also use template denoising for the non-fine-tuned version?

    opened by YWMditto 2
  • unsupervised contrastive learning

    I understand that contrastive learning pulls similar examples closer together and pushes dissimilar ones farther apart. In unsupervised contrastive learning, you use the template to bring the closest examples closer, but does the objective also include the notion of pushing others farther away?

    opened by serotoninpm 2
  • A little question about the training

    Hi, thank you for your excellent work. Please forgive me for asking a simple question. These days I have been training an unsup-bert model. I set the random seed to 0 and get a result with an average score of 78.54, the same as the paper reports (78.54±0.15), but both results are lower than your provided model (78.87). So I wonder whether you have updated the training strategy or could provide the best model for us.

    opened by ilingen 2
  • A question about the paper

    Hello, I have just read your excellent paper and have a question: is the "frequency" in Figure 1 calculated from the datasets mentioned in Section 5.1 (Dataset)?

    opened by Aureole-1210 1
  • Question of supervised train loss

    Hello, in model.py I found that the loss function for supervised training is CrossEntropyLoss. Since hard negatives are used, why not choose TripletMarginLoss? Looking forward to your reply, thanks!

    opened by Chris-WangQ 1
  • Why adding periods during evaluation?

    I noticed that you add periods during evaluation (in eval.py). I am not sure this operation is valid, because I didn't find the same operation in SimCSE's evaluation code. I also notice that results under certain settings drop if I remove it.

    opened by qfkkwgd 1
  • Standard deviation

    Hi,

    Thanks for your brilliant work!

    I have a question about the results. I noticed that only the unsupervised models are reported with a standard deviation. Why are the supervised models not reported with one?

    Also, would you mind sharing the random seeds that you used?

    Thanks a lot!

    opened by qfkkwgd 1
  • Relationships between anisotropy and removing biased tokens

    First of all, thanks for your inspiring paper.

    When I read the introduction of your paper (I haven't read it all yet), I wondered about one thing.

    In the paper, removing biased tokens shows improvements. However, does it really have any relationship with anisotropy?

    I guess that tokens appearing frequently in one sentence are also likely to appear in other sentences, so removing such tokens makes sentence embeddings more distinguishable from each other. Finally, I expect this gives the embeddings more varied directions and makes them more isotropic.

    My opinion may be wrong! I am just curious about your view.

    Thank you.

    opened by kimwongyuda 2
  • Any advice for the Chinese version?

    Hi, I'm trying to apply Prompt-BERT to a public Chinese dataset,

    using a prompt like the following: [CLS]下面这句话"sent_0"对应的语义是[MASK][SEP] (roughly, "The meaning of the following sentence "sent_0" is [MASK]").

    The Spearman score is much worse than plain SimCSE.

    Do you have any advice for Chinese prompts?

    opened by shuopwang 1
  • Questions about the paper.

    Hello,

    I was very fortunate to read your paper, and the experimental results are exciting. The paper mentions two observations: (1) the original BERT layers fail to improve the performance, and (2) embedding biases harm sentence-embedding performance. Based on your experimental results, these two phenomena do exist and can be mitigated. But I don't see the connection between these two observations and prompts.

    How does the prompt solve the bias problem?

    Looking forward to your reply, thanks!

    opened by chaochen99 6
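
A note on the template-denoising question in the first comment above: one reading of the paper's template denoising is that the sentence embedding is obtained by subtracting the representation of the template alone (the template bias) from the representation of the templated sentence. Whether the official code takes the [MASK] or the [CLS] vector for the bias term is exactly what the comment asks; the sketch below assumes the [MASK] vector, the template wording is again an assumption, and encode_mask_vector is a hypothetical helper that maps an input string to its [MASK]-position hidden state.

def denoised_embedding(encode_mask_vector, sentence,
                       template='This sentence : "{}" means [MASK].'):
    # encode_mask_vector: hypothetical callable mapping an input string to its
    # [MASK]-position hidden vector (e.g. the prompt_embedding function sketched earlier).
    h = encode_mask_vector(template.format(sentence))   # templated sentence
    h_bias = encode_mask_vector(template.format(""))    # template with an empty sentence slot
    return h - h_bias                                    # subtract the template bias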