Contrastive Fact Verification

Overview

VitaminC

This repository contains the dataset and models for the NAACL 2021 paper: Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence. The VitaminC dataset contains more than 450,000 claim-evidence pairs from over 100,000 revisions to popular Wikipedia pages, and additional "synthetic" revisions.

We're still updating this repo. More to come soon. Please reach out to us if you have any questions.

Below are instructions for the four main tasks described in the paper:

Install

If you're only interested in the dataset (in jsonlines format), please find the per-task links below.

To install this package, which includes the code to process the dataset and run the transformer models and baselines, run:

python setup.py install

Note: python>=3.7 is needed for all the dependencies to work.


Revision Flagging

VitaminC revision flagging data (the script below will automatically download it): link

Example of evaluating the ALBERT-base model on the test set:

sh scripts/run_flagging.sh

The BOW and edit distance baselines from the paper are in scripts/factual_flagging_baselines.py.
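To give a sense of what an edit distance baseline can look like, here is a minimal sketch. It is not the code in scripts/factual_flagging_baselines.py: the function names, the whitespace tokenization, and the threshold rule are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def flag_revision(before, after, threshold=1):
    """Hypothetical rule: flag a revision when the token-level distance
    between the two sentence versions reaches `threshold`."""
    return edit_distance(before.split(), after.split()) >= threshold
```

A rule like this only measures how much of the sentence changed, not whether the change is factual, which is why it serves as a baseline rather than a model.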


Fact Verification

VitaminC fact verification data (the script below will automatically download it): link

Example of evaluating the ALBERT-base model fine-tuned on the VitaminC and FEVER datasets on the "real" and "synthetic" test sets of VitaminC:

sh scripts/run_fact_verification.sh

To evaluate the same model on another jsonlines file (containing claim, evidence, and label fields), use:

sh scripts/run_fact_verification.sh path_to_test_file
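A custom test file can be produced with a few lines of Python. The sketch below writes one JSON object per line with the three fields named above. The helper names, the example records, and the label strings are assumptions for illustration, so check them against the label set your data actually uses.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("claim", "evidence", "label")


def write_jsonl(records, path):
    """Write one JSON object per line, checking the required fields."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"record is missing fields: {missing}")
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a jsonlines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the label strings are assumed, not taken from the README.
examples = [
    {"claim": "VitaminC contains more than 450,000 claim-evidence pairs.",
     "evidence": "The VitaminC dataset contains more than 450,000 claim-evidence pairs.",
     "label": "SUPPORTS"},
    {"claim": "VitaminC contains fewer than 1,000 claim-evidence pairs.",
     "evidence": "The VitaminC dataset contains more than 450,000 claim-evidence pairs.",
     "label": "REFUTES"},
]

path = os.path.join(tempfile.mkdtemp(), "my_test.jsonl")
write_jsonl(examples, path)
```

The resulting path can then be passed as path_to_test_file in the command above.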

Other available pretrained models (including the ALBERT-xlarge model that performed the best):

tals/albert-base-vitaminc
tals/albert-base-vitaminc-mnli
tals/albert-base-vitaminc-fever
tals/albert-xlarge-vitaminc
tals/albert-xlarge-vitaminc-mnli
tals/albert-xlarge-vitaminc-fever
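
These names look like Hugging Face model identifiers, so they can probably be loaded directly with the transformers library. The following is a sketch under that assumption. It requires torch, transformers, and sentencepiece. The helper names are hypothetical, and the order in which claim and evidence are passed to the tokenizer is an assumption that should be checked against the repository's processing code.

```python
def best_label(scores, id2label):
    """Return the (label, score) pair with the highest score."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return id2label[idx], scores[idx]


def predict(claim, evidence, model_name="tals/albert-base-vitaminc"):
    """Classify one claim-evidence pair with a pretrained checkpoint."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Assumed input order: claim first, evidence second.
    inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    probs = torch.softmax(logits, dim=-1).tolist()

    # Label names come from the checkpoint's own config, not from this sketch.
    return best_label(probs, model.config.id2label)
```

Calling predict(claim, evidence) downloads the checkpoint on first use and returns the predicted label with its probability.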

Word-level Rationales

Will be added soon


Factually Consistent Generation

Will be added soon


Citation

If you find our code and/or data useful, please cite our paper:

@InProceedings{Schuster2019,
    title = "Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence",
    author="Tal Schuster and Adam Fisch and Regina Barzilay",
    booktitle = "NAACL 2021",
    year = "2021",
    url = "https://arxiv.org/abs/2103.08541",
}
Comments
  • Little clarification

    I am a little confused.

    It seems you're using everything from Hugging Face, and I am not familiar with all the arguments.

    1. Are you training the model first and then evaluating it? Here, https://github.com/TalSchuster/VitaminC#fact-verification , it says to evaluate.

    When running it, the model spends 20-30 min on the GPU, so is it training the model and then evaluating?

    I am confused about how training and evaluation are being done.

    opened by nile649 6
  • Sentence boundaries

    Hi,

    Thank you for the great work!

    I wanted to check whether it is possible to get a version of the dataset splits that has cleaner sentence boundaries. Right now there are records with different spacing before/after a punctuation mark, which makes it difficult to split into sentences with simple sentence tokenizers from spacy/nltk, e.g. no spacing after a dot at the end of a sentence or spacing after a final dot in an abbreviation:

    • The lease of another A330-200 was announced in December 2016.In January 2013 , Hawaiian signed a Memorandum of Understanding with Airbus for an order of 16 A321neo aircraft plus up to 9 options .
    • The movie D.A.R.Y.L . received negative reviews .
    • With the mysterious U.S . Senator Eddie Morra ( Bradley Cooper ) providing him with a second drug to counteract NZT 's d...
    • Vol . 3 : The Subliminal Verses is by Slipknot and Stone Sour .

    Any help would be appreciated, including ideas of tokenizers/ways to quickly deal with this.

    Thank you!

    opened by apepa 2
  • rational.py

    Dear Authors,

    Thanks for sharing your code. We have read through your codebase, and it is very helpful. Could you share scripts/rationale.py with us as well? Much appreciated!

    opened by ankechiang 1
  • Code about factually consistent generation

    Dear author, I recently read this paper and found it very good. My work focuses on factual generation, so I want to follow up on the fourth task you proposed. However, I opened the repository and, after waiting for a while, still could not find the code for factually consistent generation. I would appreciate it if you could provide the code.

    opened by lonelywolf1999 0
  • docs: demo, experiments and live inference API on Tiyaro

    Hello Dr. Schuster (@TalSchuster)!

    Thank you for your work on TalSchuster/VitaminC. This GitHub project is interesting, and we think that it would be a great addition to make this work instantly discoverable & available as an API for all your users, to quickly try and use it in their applications.

    On Tiyaro, every model in TalSchuster/VitaminC will get its own:

    • Dedicated model card (see https://console.tiyaro.ai/explore/tals-albert-xlarge-vitaminc-mnli)
    • Model demo (see https://console.tiyaro.ai/explore/tals-albert-xlarge-vitaminc-mnli/demo)
    • Unique Inference API (https://api.tiyaro.ai/explore/huggingface/1//tals/albert-xlarge-vitaminc-mnli)
    • Sample code snippets and swagger spec for the API

    Users will also be able to compare your model with other models of similar types on various parameters using Tiyaro Experiments (https://blog.tiyaro.ai/evaluate-openmmlabs-mmocr-models-using-tiyaro-experiments)

    I am from Tiyaro.ai (https://tiyaro.ai/). We are working on enabling developers to instantly evaluate, use and customize the world’s best AI. We are constantly adding new features to Tiyaro EasyTrain, EasyServe & Experiments, to make the best use of your ML model and to make AI more accessible for anyone.

    Sincerely, I-Jong Lin

    Founding Engineer at Tiyaro https://www.linkedin.com/in/i-jong-lin-721842/

    P.S. If you have any questions about the infrastructure or other models that you may wish to have a similar treatment, feel free to contact me at [email protected]

    P.P.S. If we do have a conversation, I will be looking forward to using my favorite word "balagan."

    opened by ijonglin 0
  • Factually Consistent Generation

    Dear Authors,

    Thank you for sharing your code. It is very helpful for digging into the VitaminC dataset. As we explore the four tasks mentioned in your paper, we notice that the code for the "Factually Consistent Generation" task is currently missing from your codebase. Could you kindly share it with us?

    Thanks for your help!

    opened by ankechiang 0
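
For the spacing irregularities described in the "Sentence boundaries" comment above, one quick option is to normalize the text with a couple of regular expressions before running a sentence tokenizer. The sketch below is a heuristic, not something the authors provide: the function name is hypothetical, and the rules only cover the patterns quoted in that comment.

```python
import re


def normalize_spacing(text):
    """Heuristic cleanup of spacing around punctuation."""
    # Drop whitespace before punctuation: "options ." -> "options."
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    # Add the missing space when a sentence-final period is glued to the
    # next sentence: "2016.In" -> "2016. In". The lookbehind requires a
    # lowercase letter or digit, so "D.A.R.Y.L" and "U.S" are left alone.
    text = re.sub(r"(?<=[a-z0-9])\.(?=[A-Z][a-z])", ". ", text)
    return text
```

Abbreviations such as "U.S." followed by a capitalized word remain ambiguous for any sentence splitter, so some manual checking is likely still needed after this step.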