Contrastive Fact Verification

Last update: Dec 19, 2022

Overview

VitaminC

This repository contains the dataset and models for the NAACL 2021 paper: Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence. The VitaminC dataset contains more than 450,000 claim-evidence pairs from over 100,000 revisions to popular Wikipedia pages, and additional "synthetic" revisions.

We're still updating this repo. More to come soon. Please reach out to us if you have any questions.

Below are instructions for the four main tasks described in the paper:

Revision Flagging
Fact Verification
Word-level Rationales
Factually Consistent Generation

Install

If you're only interested in the dataset (in jsonlines format), please find the per-task links below.

To install this package, which includes the code to process the dataset and run the transformer models and baselines, run:

python setup.py install

Note: python>=3.7 is needed for all the dependencies to work.

Revision Flagging

VitaminC revision flagging data (the script below will automatically download it): link

Example of evaluating an ALBERT-base model on the test set:

sh scripts/run_flagging.sh

The BOW and edit distance baselines from the paper are in scripts/factual_flagging_baselines.py.
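
As a rough illustration of what an edit-distance feature for revision flagging can look like, the sketch below computes a word-level Levenshtein distance between the sentence before and after a revision. It is not the code in scripts/factual_flagging_baselines.py; the function names `edit_distance` and `edit_features`, the whitespace tokenization, and the length normalization are all assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def edit_features(before, after):
    """Hypothetical feature extractor: raw and length-normalized distance."""
    tokens_before = before.split()  # assumption: whitespace tokenization
    tokens_after = after.split()
    dist = edit_distance(tokens_before, tokens_after)
    longest = max(len(tokens_before), len(tokens_after), 1)
    return {"distance": dist, "normalized": dist / longest}


features = edit_features(
    "The film grossed 10 million .",
    "The film grossed 12 million .",
)
```

Features like these could be fed to any simple classifier; which threshold or model separates factual from non-factual revisions is left open here.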

Fact Verification

VitaminC fact verification data (the script below will automatically download it): link

Example of evaluating an ALBERT-base model fine-tuned on the VitaminC and FEVER datasets, using the "real" and "synthetic" test sets of VitaminC:

sh scripts/run_fact_verification.sh

To evaluate the same model on another jsonlines file (containing claim, evidence, and label fields), use:

sh scripts/run_fact_verification.sh path_to_test_file
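
A custom test file for the command above needs one JSON object per line with claim, evidence, and label fields. The sketch below writes and re-reads such a file using only the standard library. The helper names, the two example records, and the label strings ("SUPPORTS", "REFUTES") are assumptions for illustration; check the downloaded VitaminC files for the exact label vocabulary the script expects.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("claim", "evidence", "label")


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a jsonlines file and check that each record has the required fields."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in rec]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            rows.append(rec)
    return rows


# Made-up records, for illustration only (not taken from the dataset).
examples = [
    {"claim": "The bridge is more than 2 km long.",
     "evidence": "The bridge has a total length of 2.7 km.",
     "label": "SUPPORTS"},
    {"claim": "The bridge is more than 2 km long.",
     "evidence": "The bridge has a total length of 1.4 km.",
     "label": "REFUTES"},
]

path = os.path.join(tempfile.mkdtemp(), "my_test.jsonl")
write_jsonl(examples, path)
loaded = read_jsonl(path)
```

The resulting path can then be passed as path_to_test_file.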

Other available pretrained models (including the ALBERT-xlarge model that performed the best):

tals/albert-base-vitaminc
tals/albert-base-vitaminc-mnli
tals/albert-base-vitaminc-fever
tals/albert-xlarge-vitaminc
tals/albert-xlarge-vitaminc-mnli
tals/albert-xlarge-vitaminc-fever
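
The model names above look like Hugging Face Hub identifiers, so they can presumably be loaded directly with the transformers library instead of going through the script. The sketch below is one way to do that, not the repository's own inference code. It assumes torch and transformers are installed, that the claim is passed as the first segment and the evidence as the second, and that the label names are stored in the model config; `pick_label` and `verify` are hypothetical helper names.

```python
def pick_label(logits, id2label):
    """Return the label name for the highest-scoring class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


def verify(claim, evidence, model_name="tals/albert-base-vitaminc"):
    """Classify a claim against a piece of evidence (downloads the model on first use)."""
    # Imported lazily so pick_label can be used without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Assumption: claim first, evidence second. Verify against the repo's
    # preprocessing before relying on the predictions.
    inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, model.config.id2label)
```

Calling `verify("some claim", "some evidence sentence")` returns whatever label name the chosen checkpoint assigns to the top class.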

Word-level Rationales

Will be added soon

Factually Consistent Generation

Will be added soon

Citation

If you find our code and/or data useful, please cite our paper:

@InProceedings{Schuster2019,
    title = "Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence",
    author="Tal Schuster and Adam Fisch and Regina Barzilay",
    booktitle = "NAACL 2021",
    year = "2021",
    url = "https://arxiv.org/abs/2103.08541",
}

Comments

Little clarification
So I am a little confused,

So it seems you're using everything from hugging face, and I am not familiar with all the arguments,

Are you training the model first and then evaluating it? Here it says to evaluate: https://github.com/TalSchuster/VitaminC#fact-verification

When I run it, the model spends 20-30 minutes on the GPU, so is it training the model and then evaluating?

I am confused about how training and evaluation are being done.
opened by nile649 6
Sentence boundaries
Hi,

Thank you for the great work!

I wanted to check whether it is possible to get a version of the dataset splits that has cleaner sentence boundaries. Right now there are records with different spacing before/after a punctuation mark, which makes it difficult to split into sentences with simple sentence tokenizers from spacy/nltk, e.g. no spacing after a dot at the end of a sentence or spacing after a final dot in an abbreviation:

The lease of another A330-200 was announced in December 2016.In January 2013 , Hawaiian signed a Memorandum of Understanding with Airbus for an order of 16 A321neo aircraft plus up to 9 options .

The movie D.A.R.Y.L . received negative reviews .

With the mysterious U.S . Senator Eddie Morra ( Bradley Cooper ) providing him with a second drug to counteract NZT 's d...

Vol . 3 : The Subliminal Verses is by Slipknot and Stone Sour .

Any help would be appreciated, including ideas of tokenizers/ways to quickly deal with this.

Thank you!
opened by apepa 2
rational.py

Dear Authors,

Thanks for sharing your code. We have read through your codebase, and it's very helpful. We are wondering if you can share scripts/rationale.py with us as well? Much appreciated!

opened by ankechiang 1
Code about factually consistent generation

Dear author, I recently read this paper and liked it very much. My own work focuses on factual generation, so I would like to follow up on the fourth task you proposed. However, I opened the repository and, after waiting for a while, still could not find the code for factually consistent generation. I would appreciate it if you could provide the code.

opened by lonelywolf1999 0
docs: demo, experiments and live inference API on Tiyaro

Hello Dr. Schuster (@TalSchuster)!

Thank you for your work on TalSchuster/VitaminC. This GitHub project is interesting, and we think that it would be a great addition to make this work instantly discoverable & available as an API for all your users, to quickly try and use it in their applications.

On Tiyaro, every model in TalSchuster/VitaminC will get its own:

Dedicated model card (see https://console.tiyaro.ai/explore/tals-albert-xlarge-vitaminc-mnli)
Model demo (see https://console.tiyaro.ai/explore/tals-albert-xlarge-vitaminc-mnli/demo)
Unique Inference API (https://api.tiyaro.ai/explore/huggingface/1//tals/albert-xlarge-vitaminc-mnli)
Sample code snippets and swagger spec for the API

Users will also be able to compare your model with other models of similar types on various parameters using Tiyaro Experiments (https://blog.tiyaro.ai/evaluate-openmmlabs-mmocr-models-using-tiyaro-experiments)

I am from Tiyaro.ai (https://tiyaro.ai/). We are working on enabling developers to instantly evaluate, use and customize the world’s best AI. We are constantly working on adding new features to Tiyaro EasyTrain, EasyServe & Experiments, to make the best use of your ML model, and making AI more accessible for anyone.

Sincerely, I-Jong Lin

Founding Engineer at Tiyaro https://www.linkedin.com/in/i-jong-lin-721842/

P.S. If you have any questions about the infrastructure or other models that you may wish to have a similar treatment, feel free to contact me at [email protected]

P.P.S. If we do have a conversation, I will be looking forward to using my favorite word "balagan."

opened by ijonglin 0
Factually Consistent Generation

Dear Authors,

Thank you for sharing your code. It is very helpful for digging into the VitaminC dataset. As we explored the four tasks mentioned in your paper, we noticed that the code for the "Factually Consistent Generation" task is currently missing from your codebase. We are wondering if you could kindly share it with us.

Thanks for your help!

opened by ankechiang 0

