Measuring and Improving Consistency in Pretrained Language Models

Yanai Elazar

Last update: Dec 2, 2022

Overview

ParaRel 🤘

This repository contains the code and data for the paper:

Measuring and Improving Consistency in Pretrained Language Models

as well as the resource: ParaRel 🤘

Since this work required running many experiments, the code is organized as scripts that automatically launch many sub-experiments on parallel servers and track them with an experiment-tracking service (wandb). The results are then aggregated in a Jupyter notebook. To run all the experiments, I used task spooler, a queue-based tool that runs multiple commands in parallel and keeps the rest in a queue.

It is also possible to run individual experiments; the command for each one can be found in the corresponding script.
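As an illustration of how such a script can fan out sub-experiments, here is a minimal sketch that builds one consistency-evaluation command per (relation, LM) pair, reusing the flags of the single-run example shown under "Run Scripts". The `PYTHON`, `RELATIONS`, and `LMS` values and the `consistency_commands` helper are hypothetical placeholders, not the repository's runs/ code. Queuing is left to task spooler, which is usually installed as `tsp` on Debian/Ubuntu and as `ts` elsewhere; the sketch only prints the commands.

```python
import itertools
import shlex

# Hypothetical placeholders: point these at your own interpreter/virtualenv,
# as the setup instructions ask you to do for the real run scripts.
PYTHON = "python"
RELATIONS = ["P106", "P19"]  # hypothetical subset of T-REx relation IDs
LMS = ["bert-base-cased", "roberta-base"]


def consistency_commands(relations, lms, gpu=0, use_wandb=True):
    """Build one encode_consistency_probe.py command line per (relation, LM) pair."""
    commands = []
    for relation, lm in itertools.product(relations, lms):
        args = [
            PYTHON, "pararel/consistency/encode_consistency_probe.py",
            "--data_file", f"data/trex_lms_vocab/{relation}.jsonl",
            "--lm", lm,
            "--graph", f"data/pattern_data/graphs/{relation}.graph",
            "--gpu", str(gpu),
        ]
        if use_wandb:
            args.append("--wandb")
        args.append("--use_targets")
        # Quote each argument so the line can be pasted into a shell
        # (shlex.quote also works on Python 3.7, unlike shlex.join).
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Print each command prefixed with task spooler so it can be queued;
    # nothing is executed here.
    for cmd in consistency_commands(RELATIONS, LMS):
        print("tsp " + cmd)
```

Generating plain command strings keeps the fan-out logic separate from the scheduler, so the same list could be fed to task spooler, a cluster scheduler, or run one at a time.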

For any questions regarding the code or the paper, please reach out by email.

ParaRel 🤘

If you're only interested in the data, you can find it under data. Each file contains the paraphrase patterns for a specific relation, in JSON format.
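A minimal sketch for reading a pattern file and filling its slots, assuming each file is in JSON Lines format (one JSON object per line, as the `.jsonl` extension of files such as data/pattern_data/graphs_json/P106.jsonl suggests) and that each object has a `pattern` field using `[X]` for the subject and `[Y]` for the object, e.g. `"[X] originated from [Y]."`. The helper names are hypothetical, and the `[MASK]` default is BERT's mask token; other models use other tokens (RoBERTa, for example, uses `<mask>`).

```python
import json


def load_patterns(path):
    """Read a JSON Lines file: one JSON object (one pattern entry) per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def instantiate(pattern, subject, mask_token="[MASK]"):
    """Put the subject in the [X] slot and a mask token in the [Y] (object) slot."""
    return pattern.replace("[X]", subject).replace("[Y]", mask_token)
```

For example, `[instantiate(p["pattern"], "Barack Obama") for p in load_patterns("data/pattern_data/graphs_json/P106.jsonl")]` would give one cloze query per paraphrase of the relation.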

Create environment

conda create -n pararel python=3.7 anaconda
conda activate pararel

pip install -r requirements.txt

Add the project to the Python path:

export PYTHONPATH=${PYTHONPATH}:/path-to-project

Setup

If you just want to start with the filtered data we used (objects that consist of more than a single word piece in any of the LMs we considered were filtered out), you can find it here. Otherwise:

First, download the T-REx dataset from here (alternatively, check out the LAMA GitHub repo). Place it so that the folder data/trex/data/TREx exists and contains the relevant files.

Next, if you want to automatically rerun some or all of the experiments, you will need to update the paths in the run scripts with your project folder and virtual environment.
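A small sanity check for this step, sketched under the assumption that T-REx stores one file per relation named by its Wikidata relation ID (as in data/trex/data/TREx/P106.jsonl, used in the filtering command below). `list_relations` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Location expected by the setup instructions.
TREX_DIR = Path("data/trex/data/TREx")


def list_relations(trex_dir=TREX_DIR):
    """Return the sorted relation IDs (file stems such as "P106") of the P*.jsonl files."""
    trex_dir = Path(trex_dir)
    if not trex_dir.is_dir():
        raise FileNotFoundError(f"{trex_dir} does not exist; download T-REx first")
    return sorted(p.stem for p in trex_dir.glob("P*.jsonl"))
```

Running `print(len(list_relations()), "relation files found")` from the project root fails early with a clear message if the data was placed in the wrong folder.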

Run Scripts

Filter the T-REx data to include only triplets whose objects are a single word piece in all the LMs inspected in this work: bert-base-cased, roberta-base, and albert-base-v2 (as well as their larger versions, which share the same vocabularies):

python runs/pararel/filter.py

A single run looks like the following:

python lm_meaning/lm_entail/filter_data.py \
       --in_data data/trex/data/TREx/P106.jsonl \
       --model_names bert-base-cased,bert-large-cased,bert-large-cased-whole-word-masking,roberta-base,roberta-large,albert-base-v2,albert-xxlarge-v2 \
       --out_file data/trex_lms_vocab/P106.jsonl
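The filtering criterion can be sketched as follows. This is an illustration, not the code of filter_data.py: it assumes each T-REx line is a JSON object with an `obj_label` field (as in the LAMA release), and it takes the tokenizers as plain callables returning a list of word pieces, so that it runs without downloading any model. The function names are hypothetical.

```python
import json


def is_single_piece(word, tokenize):
    """True if `tokenize` splits `word` into exactly one word piece."""
    return len(tokenize(word)) == 1


def filter_triplets(lines, tokenizers):
    """Keep triplets whose object label is a single piece for every tokenizer.

    `lines` are JSON strings with an "obj_label" field; `tokenizers` maps a
    model name to a callable that returns a list of word pieces.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        triplet = json.loads(line)
        if all(is_single_piece(triplet["obj_label"], tok) for tok in tokenizers.values()):
            kept.append(triplet)
    return kept
```

With Hugging Face transformers installed, `AutoTokenizer.from_pretrained(name).tokenize` for each name passed to --model_names could serve as such a callable. Keep in mind that BPE tokenizers such as RoBERTa's split a word differently with and without a leading space, so the actual script may handle that case in its own way.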

Evaluate consistency:

python runs/eval/run_lm_consistent.py

A single run looks like the following:

python pararel/consistency/encode_consistency_probe.py \
       --data_file data/trex_lms_vocab/P106.jsonl \
       --lm bert-base-cased \
       --graph data/pattern_data/graphs/P106.graph \
       --gpu 0 \
       --wandb \
       --use_targets
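To make the reported metrics more concrete, here is a minimal sketch of two of them, assuming the predictions have already been collected as one predicted object per paraphrase pattern for each subject: pairwise consistency (how often two patterns agree) and the fraction of subjects predicted correctly by every pattern. It compares all pattern pairs, whereas encode_consistency_probe.py reads the pattern graph passed with --graph, so treat it as an illustration of the idea rather than the script's exact computation. Function names are hypothetical.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Fraction of pattern pairs with the same prediction, averaged over subjects.

    `predictions` maps a subject to its predicted objects, one per pattern.
    Subjects with fewer than two patterns are skipped.
    """
    scores = []
    for preds in predictions.values():
        pairs = list(combinations(preds, 2))
        if pairs:
            scores.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


def consistent_accuracy(predictions, gold):
    """Fraction of subjects for which every pattern predicts the gold object."""
    hits = [all(p == gold[s] for p in preds) for s, preds in predictions.items()]
    return sum(hits) / len(hits) if hits else 0.0
```

For a subject whose three patterns predict x, x, and y, only one of the three pairs agrees, giving a per-subject consistency of 1/3.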

Encode the patterns together with the subjects, and save the representations:

python runs/pararel/encode_text.py

A single run looks like the following:

python lm_meaning/encode/encode_text.py \
       --patterns_file data/pattern_data/graphs_json/P106.jsonl \
       --data_file data/trex_lms_vocab/P106.jsonl \
       --lm bert-base-cased \
       --pred_file data/output/representations/P106_bert-base-cased.npy \
       --wandb
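Once saved, the representations can be compared directly. The sketch below assumes the .npy file written to --pred_file holds a 2-D array with one row per encoded (pattern, subject) input; check encode_text.py for the actual layout. It computes the cosine similarity between every pair of rows, e.g. to inspect whether paraphrases of the same fact are encoded close to each other. Both helpers are hypothetical.

```python
import numpy as np


def load_representations(path):
    """Load the array saved by the encoding step (an .npy file)."""
    return np.load(path)


def cosine_similarity_matrix(reps):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    reps = np.asarray(reps, dtype=float)
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = reps / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

For example, `cosine_similarity_matrix(load_representations("data/output/representations/P106_bert-base-cased.npy"))` would return a square matrix with one entry per pair of encoded inputs, under the layout assumption above.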

Improving Consistency with ParaRel

The code and README are available here

FAQ

Q: Why do you report 31 N-1 relations, whereas in the LAMA paper there are only 25?

A: Explanation

Citation:

If you find this work relevant to yours, please cite us:

@article{Elazar2021MeasuringAI,
  title={Measuring and Improving Consistency in Pretrained Language Models},
  author={Yanai Elazar and Nora Kassner and Shauli Ravfogel and Abhilasha Ravichander and Ed Hovy and Hinrich Schutze and Yoav Goldberg},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.01017}
}


Comments

Could you please explain the data format of relation prompts?
For example:

{"pattern": "[X] originated from [Y].", "lemma": "originate", "extended_lemma": "originate-from", "tense": "past"}

What's the meaning of each keyword?
opened by c-box
Which three relations have you filtered, and how can I get exactly the results presented in Table 3?

Hello! Thanks for releasing the code and data. When I tried to reproduce the results, I found 39 relations here (https://github.com/yanaiela/pararel/tree/main/data/pattern_data/graphs_json), while according to the article there should be 38. I wonder whether I missed a step when filtering the data, and I am a little confused about which one should be removed.

I also tried to reproduce the results in Tables 2 and 3. After filtering, I ran run_lm_consistent.py with bert-base-cased as the language model. However, the results are slightly different.

Here is a comparison of my results with the original ones reported in your paper:

metrics | my results | original (paper)
-- | -- | --
Consistency | 58.72402028656049+-23.441761447607665 | 58.5+-24.2
Accuracy | 43.5833928562685+-26.147602234172046 | 45.8+-26.1
Unk-Const | 48.17776229348506+-22.316262495803235 | 46.5+-21.7
known-Const | 62.54802072504408+-24.32609269135139 | 63.8+-24.5

Do you have any idea what I may have missed in the procedure, and what I can do to get the same results? Thanks for your kind help.

opened by Kaiwen-Tang
How to get the same results presented in Table 3?

Thanks for releasing the code and data. This work is really amazing. I'm trying to get results identical to Table 3 in the paper (https://arxiv.org/pdf/2102.01017.pdf). I ran run_lm_consistent.py with --baseline or with roberta-base, then averaged the "lama_acc", "consistency", and "lama_group_acc" metrics over relations. The results are:

Reproduction with roberta-base:

| reproduce | average over 38 relations | average over 31 N-1 relations | | in paper | roberta-base |
|:---------------|---------------:|-------------------:|-:|---------------------:|----------:|
| lama_acc | 0.356008 | 0.387121 | | accuracy | 39 |
| lama_group_acc | 0.152222 | 0.164011 | | accuracy&consistency | 16.4 |
| consistency | 0.506783 | 0.520455 | | consistency | 52.1 |

Reproduction with --baseline:

| reproduce | average over 38 relations | average over 31 N-1 relations | | in paper | majority |
|:---------------|---------------:|-------------------:|-:|---------------------:|----------:|
| lama_acc | 0.24693 | 0.227869 | | accuracy | 23.1 |
| lama_group_acc | 0.24693 | 0.227869 | | accuracy&consistency | 23.1 |
| consistency | 1 | 1 | | consistency | 100 |

I place the reproduction results on the left side and the result from the paper on the right side.

I got almost the same "consistency" as in Table 3 by averaging over the 31 relations, but the "accuracy" metrics are slightly different. How can I get the same probing results as the paper for the baseline models? Did I miss something in the procedure? Thanks for your help.

opened by woqucc

Owner

Yanai Elazar

PhD student at Bar-Ilan University, Israel
