Open-Ended Commonsense Reasoning (NAACL 2021)

(Bill) Yuchen Lin

Last update: Oct 19, 2022

Overview

Open-Ended Commonsense Reasoning

Quick links: [Paper] | [Video] | [Slides] | [Documentation]

This repository accompanies the paper "Differentiable Open-Ended Commonsense Reasoning" by Bill Yuchen Lin, Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Xiang Ren, and William W. Cohen, in Proc. of NAACL 2021.

Abstract

Current commonsense reasoning research focuses on developing models that use commonsense knowledge to answer multiple-choice questions. However, systems designed to answer multiple-choice questions may not be useful in applications that do not provide a small list of candidate answers to choose from. As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR) — the task of answering a commonsense question without any pre-defined choices — using as a resource only a corpus of commonsense facts. OpenCSR is challenging due to a large decision space, and because many questions require implicit multi-hop reasoning. As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts. To evaluate OpenCSR methods, we adapt several popular commonsense reasoning benchmarks, and collect multiple new answers for each test question via crowd-sourcing. Experiments show that DrFact outperforms strong baseline methods by a large margin.

Content

Please check the documentation for running the code.

We provide instructions for running four retrieval approaches to the OpenCSR task: BM25 (off-the-shelf), DPR (EMNLP 2020), DrKIT (ICLR 2020), and DrFact (ours, NAACL 2021), as well as a concept re-ranker that boosts performance by learning with cross-attention. Note that these methods depend on one another:

training the DPR model needs the results from BM25 (to create training data);
DrFact needs to reuse DPR’s fact index and single-hop results (for creating distant supervision);
DrFact and DrKIT share many utility functions (sparse matrix operations and indexing scripts).

Detailed instructions are given on the individual pages.
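The dependencies listed above imply an order in which the methods have to be run. The following is only a minimal sketch of that ordering using Python's standard graphlib module; the DEPENDS_ON table and the run_order helper are illustrative names that do not come from the repository, and DrKIT is assumed to have no prerequisite because the text only says it shares utility code with DrFact.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each method maps to the methods whose outputs it needs first.
# Assumption: DrKIT has no prerequisite; it only shares utility code with DrFact.
DEPENDS_ON = {
    "BM25": set(),
    "DPR": {"BM25"},    # DPR training data is created from BM25 results
    "DrFact": {"DPR"},  # DrFact reuses DPR's fact index and single-hop results
    "DrKIT": set(),
}


def run_order(depends_on):
    """Return one valid order in which the methods can be run."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(run_order(DEPENDS_ON))
```

Any order returned this way places BM25 before DPR and DPR before DrFact; the position of DrKIT is not constrained.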

Outline and Documentation

drfact_data/
- datasets/ (download from here)
- knowledge_corpus/ (download from here)
baseline_methods/
- BM25/ --> https://open-csr.github.io/methods/bm25
- DPR/ --> https://open-csr.github.io/methods/dpr
- MCQA/ (i.e., Concept Re-ranker) --> https://open-csr.github.io/methods/reranker
language-master/language/labs/
- drkit/ (common modules for DrKIT and DrFact)
- drfact/ (for running DrFact)
scripts/
- run_drkit.sh --> https://open-csr.github.io/methods/drkit
- run_drfact.sh --> https://open-csr.github.io/methods/drfact
evaluation/ --> https://open-csr.github.io/evaluation

Citation

@inproceedings{lin-etal-2021-differentiable,
    title = "Differentiable Open-Ended Commonsense Reasoning",
    author = "Lin, Bill Yuchen and Sun, Haitian and Dhingra, Bhuwan and Zaheer, Manzil and Ren, Xiang and Cohen, William",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.366",
    pages = "4611--4625"
}

Contact

This repo is now under active development, and there may be issues caused by refactoring code. Please email [email protected] if you have any questions.

Comments

Error occurs when converting the OpenCSR dataset format

I was trying to preprocess the data for DPR by following the guide, and step 2 failed:

Traceback (most recent call last):
  File "baseline_methods/DPR/convert_qas_csv.py", line 8, in <module>
    nlp.pipeline = [('tagger', nlp.tagger)]
AttributeError: 'English' object has no attribute 'tagger'

I have searched for this error, but found few discussions of it. My spacy version is 3.2.1, and my en-core-web-sm version is 3.2.0. Could anyone suggest a solution?

opened by SoyMark 1

module 'torch.cuda' has no attribute 'amp'

This error occurs when training DPR. Some say that the attribute "cuda.amp" requires torch version >= 1.5.0. The guide recommends torch 1.4.0, so I wonder whether the problem can be solved without reinstalling other versions of CUDA and the related packages.

opened by SoyMark 0

OBQA/linked_train.jsonl does not have the attributes "init_facts" and "sup_facts"

I am attempting to train DrKIT by following the steps at https://open-csr.github.io//methods/drkit. The error seems to be that the two keys "init_facts" and "sup_facts" are missing from the data file.

tf_log.cont_eval.txt tf_log.train.txt

opened by 4hebailanc 0
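
To check which records in a JSONL file lack the two keys named in the report above, a small diagnostic along the following lines may help. This is a sketch only: the find_missing_keys helper and the sample records are made up for illustration, and it does not say which preprocessing step is expected to add the keys.

```python
import json

# The two keys named in the report above.
REQUIRED_KEYS = ("init_facts", "sup_facts")


def find_missing_keys(lines, required=REQUIRED_KEYS):
    """Map each 1-based line number to the required keys absent from its record."""
    missing = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        absent = [key for key in required if key not in record]
        if absent:
            missing[lineno] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the OpenCSR datasets.
    sample = [
        '{"question": "q1", "init_facts": [], "sup_facts": []}',
        '{"question": "q2"}',
    ]
    print(find_missing_keys(sample))  # {2: ['init_facts', 'sup_facts']}
```

For a real file, pass an open file handle instead of the sample list, for example find_missing_keys(open("OBQA/linked_train.jsonl", encoding="utf-8")).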
