NAACL 2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall

Overview

OptiPrompt

This is the PyTorch implementation of the paper Factual Probing Is [MASK]: Learning vs. Learning to Recall.

We propose OptiPrompt, a simple and effective approach for factual probing. OptiPrompt optimizes prompts directly in the input embedding space. It outperforms previous prompting methods on the LAMA benchmark. Furthermore, to better interpret probing results, we propose control experiments based on probing randomly initialized models. Please check our paper for details.
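
To make this concrete, here is a minimal, hedged sketch of the idea: a handful of dense vectors are trained in the input embedding space while the pre-trained masked language model stays frozen. It is an illustration under the assumptions noted in the comments (recent transformers/PyTorch APIs, BERT-style special tokens), not the repository's implementation; see code/run_optiprompt.py for the real thing.

import torch
from transformers import BertTokenizer, BertForMaskedLM

# Minimal sketch (not the repository's code): learn k dense prompt vectors
# in the input embedding space while the pre-trained model stays frozen.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()
for p in model.parameters():
    p.requires_grad = False

emb = model.get_input_embeddings()  # frozen token-embedding table
prompt = torch.nn.Parameter(0.02 * torch.randn(5, emb.weight.size(1)))  # [V1..V5]
optimizer = torch.optim.AdamW([prompt], lr=3e-3)

def prompt_loss(subject, obj_token):
    # Build "[CLS] <subject> [V1..V5] [MASK] [SEP]" directly in embedding space.
    one = lambda t: emb(torch.tensor([[t]]))
    sub_ids = tokenizer(subject, add_special_tokens=False, return_tensors="pt").input_ids
    inputs = torch.cat([one(tokenizer.cls_token_id), emb(sub_ids),
                        prompt.unsqueeze(0),
                        one(tokenizer.mask_token_id), one(tokenizer.sep_token_id)], dim=1)
    logits = model(inputs_embeds=inputs).logits
    mask_pos = inputs.size(1) - 2  # the [MASK] slot, right before [SEP]
    target = torch.tensor([tokenizer.convert_tokens_to_ids(obj_token)])
    return torch.nn.functional.cross_entropy(logits[:, mask_pos], target)

# One optimization step on a single fact, e.g. relation P17 (country of Paris):
loss = prompt_loss("Paris", "France")
loss.backward()
optimizer.step()

Judging from its training logs, the released code instead registers the [Vi] vectors as new vocabulary tokens and also supports initialization from manual templates (see the Run OptiPrompt section below).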

Quick links

  • Overview
  • Setup
  • Run OptiPrompt
  • Run Fine-tuning
  • Evaluate LAMA/LPAQA/AutoPrompt prompts
  • Questions?
  • Citation

Setup

Install dependencies

Our code is based on Python 3.7. All experiments are run on a single GPU.

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Download the data

We pack all datasets we used in our experiments here. Please download it and extract the files to ./data, or run the following command to automatically download and extract it.

bash scripts/download_data.sh

The datasets are structured as follows.

data
├── LAMA-TREx                         # The original LAMA-TREx test set (34,039 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx_UHN                     # The LAMA-TREx_UHN test set (27,102 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx-easy-hard               # The easy and hard partitions of the LAMA-TREx dataset (check the paper for details)
│   ├── Easy                          # The LAMA-easy partition (10,546 examples)
│   │   ├── P17.jsonl                 # Testing file for the relation `P17`
│   │   └── ...
│   └── Hard                          # The LAMA-hard partition (23,493 examples)
│       ├── P17.jsonl                 # Testing file for the relation `P17`
│       └── ...
├── autoprompt_data                   # Training data collected by AutoPrompt
│   ├── P17                           # Train/dev/test files for the relation `P17`
│   │   ├── train.jsonl               # Training examples
│   │   ├── dev.jsonl                 # Development examples
│   │   └── test.jsonl                # Test examples (the same as LAMA-TREx test set)
│   └── ...
└── cmp_lms_data                      # Training data we collected ourselves, usable with BERT, RoBERTa, and ALBERT (we only use this dataset in Table 6 of the paper)
    ├── P17                           # Train/dev/test files for the relation `P17`
    │   ├── train.jsonl               # Training examples
    │   ├── dev.jsonl                 # Development examples
    │   └── test.jsonl               # Test examples (a subset of the LAMA-TREx test set, filtered using the common vocab of three models)
    └── ...
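
Each *.jsonl file stores one fact per line as a JSON object. As a quick sanity check, a snippet like the following can be used to load a relation file (it assumes the standard LAMA fields sub_label and obj_label; adjust the field names if your copy differs):

import json

# Load all test facts for relation P17 (one JSON object per line).
with open("data/LAMA-TREx/P17.jsonl") as f:
    examples = [json.loads(line) for line in f]

# Assumed LAMA-style fields: each fact is a (subject, object) pair.
print(len(examples))
print(examples[0]["sub_label"], "->", examples[0]["obj_label"])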

Run OptiPrompt

Train/evaluate OptiPrompt

You can use code/run_optiprompt.py to train or evaluate the prompts on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_optiprompt.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions \
    [--init_manual_template] [--num_vectors 5 | 10]

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; it should be the intersection of the different models' vocabularies for a fair comparison.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to train the prompts on a training and development set.
  • do_eval: whether to test the trained prompts on a testing set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how the model is randomly initialized before training; there are three settings:
    • none: use the pre-trained model, no random initialization is used;
    • embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
    • all: the Rand M control setting, where we randomly initialize all the parameters of the model.
  • init_manual_template: whether to initialize the dense vectors in OptiPrompt using the manual prompts (see the sketch after this list).
  • num_vectors: how many dense vectors are added in OptiPrompt (this argument only takes effect when init_manual_template is not set).
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).
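
When --init_manual_template is set, the dense vectors are seeded from the manual prompt. The training log makes the mapping visible: for P1001, the manual template "[X] is a legal term in [Y] ." becomes "[X] [V1] [V2] [V3] [V4] [V5] [Y] [V6]", with each [Vi] initialized from the embedding of the corresponding template token. Below is a hedged sketch of that mapping; the function name is illustrative, not the repository's API:

def manual_to_optiprompt(template):
    """Map a manual template to [Vi] placeholders, e.g.
    '[X] is a legal term in [Y] .' -> '[X] [V1] [V2] [V3] [V4] [V5] [Y] [V6]'."""
    out, init_tokens = [], []
    for tok in template.split():        # simplistic whitespace tokenization
        if tok in ("[X]", "[Y]"):
            out.append(tok)             # subject/object slots stay as-is
        else:
            init_tokens.append(tok)     # this token's embedding seeds [Vi]
            out.append("[V%d]" % len(init_tokens))
    return " ".join(out), init_tokens

template, inits = manual_to_optiprompt("[X] is a legal term in [Y] .")
print(template)  # [X] [V1] [V2] [V3] [V4] [V5] [Y] [V6]
print(inits)     # ['is', 'a', 'legal', 'term', 'in', '.']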

Run experiments on all relations

We provide an example script (scripts/run_optiprompt.sh) to run OptiPrompt on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_optiprompt.sh

The default setting of this script runs OptiPrompt initialized with manual prompts on the pre-trained bert-base-cased model (no random initialization is used). The results will be stored in the outputs directory.

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_optiprompt.sh if you want to run experiments with other settings.

Run Fine-tuning

We release the fine-tuning code that we used in our experiments (see Section 4 of the paper).

Fine-tuning language models on factual probing

You can use code/run_finetune.py to fine-tune a language model on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_finetune.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions

Arguments:

  • relation_profile: the meta information for each relation, containing the manual templates.
  • relation: the relation type (e.g., P101) considered in this experiment.
  • common_vocab_filename: the vocabulary used to filter out facts; it should be the intersection of the different models' vocabularies for a fair comparison.
  • model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
  • do_train: whether to fine-tune the model on the training and development sets.
  • do_eval: whether to test the trained prompts on a testing set.
  • {train|dev|test}_data: the file path of training/development/testing dataset.
  • random_init: how the model is randomly initialized before training (see the sketch after this list); there are three settings:
    • none: use the pre-trained model, no random initialization is used;
    • embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
    • all: the Rand M control setting, where we randomly initialize all the parameters of the model.
  • output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).
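
For intuition, here is a hedged sketch of what the two control settings correspond to in transformers terms; it illustrates the idea rather than reproducing the repository's code:

from transformers import BertConfig, BertForMaskedLM

config = BertConfig.from_pretrained("bert-base-cased")

# Rand M ("all"): the same architecture, but every parameter freshly initialized.
rand_m = BertForMaskedLM(config)

# Rand E ("embedding"): pre-trained weights, except the input word embeddings,
# which are re-drawn from the model's usual initialization distribution.
rand_e = BertForMaskedLM.from_pretrained("bert-base-cased")
rand_e.base_model.embeddings.word_embeddings.weight.data.normal_(
    mean=0.0, std=config.initializer_range)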

Run experiments on all relations

We provide an example script (scripts/run_finetune.sh) to run fine-tuning on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_finetune.sh

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_finetune.sh if you want to run experiments with other settings.

Evaluate LAMA/LPAQA/AutoPrompt prompts

We provide a script to evaluate prompts released in previous work (based on code/run_finetune.py with only --do_eval). Please use the following command:

bash scripts/run_eval_prompts {lama | lpaqa | autoprompt}

Questions?

If you have any questions related to the code or the paper, feel free to email Zexuan Zhong ([email protected]) or Dan Friedman ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to describe the problem in detail so we can help you better and faster!

Citation

@inproceedings{zhong2021factual,
   title={Factual Probing Is [MASK]: Learning vs. Learning to Recall},
   author={Zhong, Zexuan and Friedman, Dan and Chen, Danqi},
   booktitle={North American Association for Computational Linguistics (NAACL)},
   year={2021}
}
Comments
  • error: "TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'"

    I tried to run the model with bash scripts/run_optiprompt.sh, but it did not train and printed the following:

    03/16/2022 15:53:28 - INFO - __main__ - Namespace(check_step=-1, common_vocab_filename='common_vocabs/common_vocab_cased.txt', dev_data='data/autoprompt_data/P1001/dev.jsonl', do_eval=True, do_shuffle=False, do_train=True, eval_batch_size=8, eval_per_epoch=3, init_manual_template=True, k=5, learning_rate=0.003, model_dir=None, model_name='bert-base-cased', num_epoch=10, num_vectors=5, output_dir='optiprompt-outputs/P1001', output_predictions=True, random_init='none', relation='P1001', relation_profile='relation_metainfo/LAMA_relations.jsonl', seed=6, test_data='data/LAMA-TREx/P1001.jsonl', train_batch_size=16, train_data='data/autoprompt_data/P1001/train.jsonl', warmup_proportion=0.1)
    03/16/2022 15:53:28 - INFO - __main__ - # GPUs: 8
    03/16/2022 15:53:28 - INFO - __main__ - Model: bert-base-cased
    Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
    - This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    03/16/2022 15:53:50 - INFO - models - Vocab size: 28996
    Using eos_token, but it is not set yet.
    03/16/2022 15:53:50 - INFO - __main__ - Original vocab size: 28996
    03/16/2022 15:53:50 - INFO - __main__ - # vocab after adding new tokens: 29006
    03/16/2022 15:53:50 - INFO - __main__ - Common vocab: common_vocabs/common_vocab_cased.txt, size: 21018
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V1], is)
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V2], a)
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V3], legal)
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V4], term)
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V5], in)
    03/16/2022 15:53:51 - INFO - __main__ - Tie embeddings of tokens: ([V6], .)
    03/16/2022 15:53:51 - INFO - __main__ - Template: [X] [V1] [V2] [V3] [V4] [V5] [Y] [V6]
    03/16/2022 15:53:51 - INFO - __main__ - Train batches: 7
    03/16/2022 15:53:51 - INFO - __main__ - Valid batches: 4
    03/16/2022 15:53:51 - INFO - models - Moving model to CUDA

    Traceback (most recent call last):
      File "code/run_optiprompt.py", line 174, in <module>
        best_result, result_rel = evaluate(model, valid_samples_batches, valid_sentences_batches, filter_indices, index_list)
      File "/home/XuYifan/optiprompt/OptiPrompt/code/utils.py", line 110, in evaluate
        log_probs, cor_b, tot_b, pred_b, topk_preds, loss, common_vocab_loss = model.run_batch(sentences_b, samples_b, training=False, filter_indices=filter_indices, index_list=index_list, vocab_to_common_vocab=vocab_to_common_vocab)
      File "/home/XuYifan/optiprompt/OptiPrompt/code/models.py", line 355, in run_batch
        loss, logits = self.mlm_model(
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/XuYifan/anaconda3/envs/XDAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
    TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'

    How can I deal with this problem? My environment: Linux 3.10.0-1160.25.1.el7.x86_64, Python 3.8, torch 1.7.0, torchaudio 0.8.0, transformers 3.0.0.
    opened by xuyifan-0731 · 3 comments
  • Question about parameter fixing

    I notice that you add the whole word embedding matrix to the optimizer instead of just the prompt vectors:

    # Add word embeddings to the optimizer
    optimizer = AdamW([{'params': model.base_model.embeddings.word_embeddings.parameters()}], lr=args.learning_rate, correct_bias=False)
    

    Would the embedding of other words also be updated?

    opened by c-box · 2 comments
  • Error when trying code/run_optiprompt.py

    I tried your command template as follows:

    rel=P101
    dir=outputs/${rel}
    mkdir -p ${dir}

    python code/run_optiprompt.py \
        --relation_profile relation_metainfo/LAMA_relations.jsonl \
        --relation ${rel} \
        --common_vocab_filename common_vocabs/common_vocab_cased.txt \
        --model_name bert-base-cased \
        --do_train \
        --train_data data/autoprompt_data/${rel}/train.jsonl \
        --dev_data data/autoprompt_data/${rel}/dev.jsonl \
        --do_eval \
        --test_data data/LAMA-TREx/${rel}.jsonl \
        --output_dir ${dir} \
        --random_init none \
        --output_predictions

    However, an error was given as follows:

    Traceback (most recent call last):
      File "code/run_optiprompt.py", line 174, in <module>
        best_result, result_rel = evaluate(model, valid_samples_batches, valid_sentences_batches, filter_indices, index_list)
      File "/home/prompt/OptiPrompt/code/utils.py", line 133, in evaluate
        eval_loss += loss.item() * tot_b
    ValueError: only one element tensors can be converted to Python scalars

    How can I solve it? Thanks!

    opened by Luciferder · 1 comment
Owner

Princeton Natural Language Processing