Measuring and Improving Consistency in Pretrained Language Models

This repository contains the code and data for the paper:

Measuring and Improving Consistency in Pretrained Language Models

as well as the resource: ParaRel 🤘

Since this work required running a lot of experiments, it is structured around scripts that automatically run many sub-experiments on parallel servers, tracked with wandb, an experiment-tracking website; the results are then aggregated using a Jupyter notebook. To run all the experiments I used task spooler, a queue-based tool that runs multiple commands in parallel (and stores the rest in a queue).

It is also possible to run individual experiments; the exact command for each can be found in the corresponding run script.

For any question or query regarding the code or paper, please reach out at [email protected]

ParaRel 🤘

If you're only interested in the data, you can find it under data. Each json file contains the paraphrase patterns for a specific relation.
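
For instance, a minimal sketch for loading one relation's patterns (assuming the jsonl layout under data/pattern_data/graphs_json/, with the per-line keys quoted in the Comments section below):

import json

# Load the paraphrase patterns of a single relation (a sketch; each line
# is a json object with keys such as "pattern", "lemma", "extended_lemma"
# and "tense").
with open("data/pattern_data/graphs_json/P106.jsonl") as f:
    patterns = [json.loads(line) for line in f]

for p in patterns:
    # [X] and [Y] mark the subject and object slots of the template
    print(p["pattern"])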

Create environment

conda create -n pararel python=3.7 anaconda
conda activate pararel

pip install -r requirements.txt

Add the project to your PYTHONPATH:

export PYTHONPATH=${PYTHONPATH}:/path-to-project

Setup

In case you just want to start with the filtered data we used (where objects consisting of more than a single word piece in the LMs we considered have been filtered out), you can find them here. Otherwise:

First, download the trex dataset from here (alternatively, check out the LAMA github repo). Extract it so that the folder data/trex/data/TREx exists, along with the relevant files.
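
As a quick sanity check, something like the following sketch should find one jsonl file per relation under the assumed layout:

from pathlib import Path

# Verify the expected TREx layout: one jsonl file per relation,
# e.g. data/trex/data/TREx/P106.jsonl.
trex_dir = Path("data/trex/data/TREx")
relation_files = sorted(trex_dir.glob("P*.jsonl"))
print(f"found {len(relation_files)} relation files in {trex_dir}")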

Next, in case you want to automatically rerun some or all of the experiments, you will need to update the paths in the run scripts to point to your folder and virtual environment.

Run Scripts

Filter the data from trex to include only triples whose objects appear in the vocabularies of the LMs inspected in this work: bert-base-cased, roberta-base, albert-base-v2 (as well as the larger versions, which share the same vocabulary).

python runs/pararel/filter.py

A single run looks like the following:

python lm_meaning/lm_entail/filter_data.py \
       --in_data data/trex/data/TREx/P106.jsonl \
       --model_names bert-base-cased,bert-large-cased,bert-large-cased-whole-word-masking,roberta-base,roberta-large,albert-base-v2,albert-xxlarge-v2 \
       --out_file data/trex_lms_vocab/P106.jsonl
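
For intuition, here is a minimal sketch of the filtering criterion (not the repo's exact implementation): a triple is kept only if its object is a single word piece in every LM considered, so each model can predict it in a single mask slot. It assumes the LAMA-style obj_label field in the trex jsonl files.

import json
from transformers import AutoTokenizer

MODEL_NAMES = ["bert-base-cased", "roberta-base", "albert-base-v2"]
tokenizers = [AutoTokenizer.from_pretrained(m) for m in MODEL_NAMES]

def single_token_everywhere(obj: str) -> bool:
    # N.B. BPE vocabularies (e.g. RoBERTa's) are sensitive to a leading
    # space; the actual script has to handle such tokenizer quirks.
    return all(len(t.tokenize(obj)) == 1 for t in tokenizers)

with open("data/trex/data/TREx/P106.jsonl") as f:
    triples = [json.loads(line) for line in f]

kept = [t for t in triples if single_token_everywhere(t["obj_label"])]
print(f"kept {len(kept)} of {len(triples)} triples")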

Evaluate consistency:

python runs/eval/run_lm_consistent.py

A single run looks like the following:

python pararel/consistency/encode_consistency_probe.py \
       --data_file data/trex_lms_vocab/P106.jsonl \
       --lm bert-base-cased \
       --graph data/pattern_data/graphs/P106.graph \
       --gpu 0 \
       --wandb \
       --use_targets
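
Conceptually, the probe checks whether different paraphrases of the same relation elicit the same prediction. A minimal sketch of that idea (not encode_consistency_probe.py itself; the two patterns below are illustrative):

from itertools import combinations
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

def top_prediction(pattern: str, subject: str) -> str:
    # Fill the subject into the template and predict the object slot.
    prompt = pattern.replace("[X]", subject).replace("[Y]", "[MASK]")
    return unmasker(prompt, top_k=1)[0]["token_str"].strip()

patterns = ["[X] works as a [Y].", "[X] is a [Y] by profession."]
preds = [top_prediction(p, "Marie Curie") for p in patterns]
pairs = list(combinations(preds, 2))
# Consistency: the fraction of paraphrase pairs that agree, regardless of
# whether the prediction is factually correct.
print(preds, sum(a == b for a, b in pairs) / len(pairs))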

Encode the patterns along with the subjects and save the representations:

python runs/pararel/encode_text.py

A single run looks like the following:

python lm_meaning/encode/encode_text.py \
       --patterns_file data/pattern_data/graphs_json/P106.jsonl \
       --data_file data/trex_lms_vocab/P106.jsonl \
       --lm bert-base-cased \
       --pred_file data/output/representations/P106_bert-base-cased.npy \
       --wandb
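
What gets saved is one vector per example. A sketch of the general idea (not the repo's exact code), here using the [CLS] vector of the last hidden layer and the output path from the example above (assuming the output directory exists):

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

sentences = ["Marie Curie works as a physicist."]  # filled-in patterns
with torch.no_grad():
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden = model(**enc).last_hidden_state   # (batch, seq, dim)
    reps = hidden[:, 0, :].numpy()            # one [CLS] vector per sentence

np.save("data/output/representations/P106_bert-base-cased.npy", reps)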

Improving Consistency with ParaRel

The code and README are available here.

FAQ

Q: Why do you report 31 N-1 relations, whereas in the LAMA paper there are only 25?

A: Explanation

Citation:

If you find this work relevant to yours, please cite us:

@article{Elazar2021MeasuringAI,
  title={Measuring and Improving Consistency in Pretrained Language Models},
  author={Yanai Elazar and Nora Kassner and Shauli Ravfogel and Abhilasha Ravichander and Ed Hovy and Hinrich Schutze and Yoav Goldberg},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.01017}
}
Comments
  • Could you please explain the data format of relation prompts?

    For example:

    {"pattern": "[X] originated from [Y].", "lemma": "originate", "extended_lemma": "originate-from", "tense": "past"}
    

    What is the meaning of each key?

    opened by c-box 2
  • Which three relations have you filtered, and how can I get the exact results presented in Table 3?

    Hello! Thanks for releasing the code and data. When I tried to reproduce the results, I found there are 39 relations here (https://github.com/yanaiela/pararel/tree/main/data/pattern_data/graphs_json), while in the article there should be 38. I wonder if I have missed a filtering step, and I am a little confused about which relation should be removed.

    Meanwhile, I tried to reproduce the results in Tables 2 and 3. After filtering, I ran run_lm_consistent.py with bert-base-cased as the language model. However, the results are slightly different.

    Here is a comparison of my results and the original ones reported in your paper.

    | metrics | results | original |
    | --- | --- | --- |
    | Consistency | 58.72402028656049 ± 23.441761447607665 | 58.5 ± 24.2 |
    | Accuracy | 43.5833928562685 ± 26.147602234172046 | 45.8 ± 26.1 |
    | Unk-Const | 48.17776229348506 ± 22.316262495803235 | 46.5 ± 21.7 |
    | known-Const | 62.54802072504408 ± 24.32609269135139 | 63.8 ± 24.5 |

    Do you have any idea what I might have missed in the procedure, and what I can do to get the same results? Thanks for your kind help.

    opened by Kaiwen-Tang 1
  • How to get the same results presented in Table 3?

    Thanks for releasing the code and data. This work is really amazing. I'm trying to get results identical to Table 3 in the paper (https://arxiv.org/pdf/2102.01017.pdf). I ran run_lm_consistent.py with --baseline or roberta-base models, then averaged the "lama_acc", "consistency", and "lama_group_acc" metrics over relations. The results are:

    Reproduced with roberta-base:

    | reproduce | average over 38 relations | average over 31 N-1 relations | in paper | roberta-base |
    | :--- | ---: | ---: | :--- | ---: |
    | lama_acc | 0.356008 | 0.387121 | accuracy | 39 |
    | lama_group_acc | 0.152222 | 0.164011 | accuracy&consistency | 16.4 |
    | consistency | 0.506783 | 0.520455 | consistency | 52.1 |

    Reproduced with --baseline:

    | reproduce | average over 38 relations | average over 31 N-1 relations | in paper | majority |
    | :--- | ---: | ---: | :--- | ---: |
    | lama_acc | 0.24693 | 0.227869 | accuracy | 23.1 |
    | lama_group_acc | 0.24693 | 0.227869 | accuracy&consistency | 23.1 |
    | consistency | 1 | 1 | consistency | 100 |

    I placed the reproduction results in the left-hand columns and the results from the paper in the right-hand columns.

    I get almost the same "consistency" as Table 3 by averaging over 31 relations, but the "accuracy" metrics are slightly different. How can I get the same probing results as the paper for the baseline models? Am I missing something in the procedure? Thanks for your help.

    opened by woqucc 1
Owner
Yanai Elazar
PhD student at Bar-Ilan University, Israel