ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

Overview

(Comet-) ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

[Figure: Example for ATOMIC 2020]

Paper

Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
"(Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs."
Appearing at the AAAI Conference on Artificial Intelligence, 2021

Data: ATOMIC 2020

The data for ATOMIC 2020 is available here. If you need the previous ATOMIC data (Sap et al. 2019), it is downloadable here.

Model: COMET-ATOMIC 2020

The trained COMET model can be downloaded here.
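
Once downloaded, generation follows the standard HuggingFace seq2seq API. Below is a minimal sketch; the local path is a placeholder for wherever you unpack the checkpoint, and the "{head} {relation} [GEN]" query format follows models/comet_atomic2020_bart/generation_example.py:

    # Minimal COMET-BART generation sketch; model_path is a placeholder
    # for the unpacked checkpoint directory.
    from transformers import BartForConditionalGeneration, BartTokenizer

    model_path = "./comet-atomic_2020_BART"  # placeholder path
    tokenizer = BartTokenizer.from_pretrained(model_path)
    model = BartForConditionalGeneration.from_pretrained(model_path)

    # Queries take the form "{head} {relation} [GEN]".
    inputs = tokenizer("PersonX buys an umbrella xNeed [GEN]", return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5, max_length=24)
    for decoded in tokenizer.batch_decode(outputs, skip_special_tokens=True):
        print(decoded.strip())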

Codebase

We include the code used in the experiments in COMET-ATOMIC2020 for reproducibility and ease of use. Our models are based on the HuggingFace Transformers codebase, with minor adjustments to adapt the model to our data. Details can be found in the AAAI paper.

Setup

Run pip install -r requirements.txt to install the requirements for your Python instance. We recommend Conda to manage Python installs. Our codebase runs on Python 3.

It's recommended that you test that your environment is set up correctly before running the modeling code. You can do this via python models/comet_atomic2020_gpt2/comet_gpt2.py --test_install.

The code for modeling is located in mosaic/infra/modeling. mosaic/datasets/KGDataset is used to convert the ATOMIC2020 CSV into a HuggingFace Datasets object.
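
For orientation, the conversion is conceptually similar to the sketch below. This is illustrative only: the actual KGDataset implementation may differ, and the file name and column layout are assumptions based on the head/relation/tail triple format of the released splits.

    # Illustrative sketch of loading ATOMIC2020 triples into a HuggingFace
    # Dataset; not the repo's KGDataset implementation.
    import pandas as pd
    from datasets import Dataset

    df = pd.read_csv("train.tsv", sep="\t", names=["head", "relation", "tail"])
    ds = Dataset.from_pandas(df)
    print(ds[0])  # e.g. {'head': ..., 'relation': ..., 'tail': ...}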

Directory Overview

beaker_exp: Contains files needed to run experiments using Beaker (https://beaker.org/) instead of on your local machine.

human_eval: Contains HTML files for human evaluation on Amazon MTurk, as described in the AAAI paper.

models: Contains additional modeling files to reproduce the GPT2 and BART experiments. models/comet_atomic2020_bart contains a README and code to run COMET-BART2020.

scripts: Contains additional scripts (e.g. utils.py) used during the experiments in the COMET-ATOMIC2020 paper.

split: Contains the code used to make the train, dev, and test splits of ATOMIC2020 with stratified random sampling.

system_eval: Contains code for automatic evaluation of generated entities.

Contributions

We welcome contributions to the COMET-ATOMIC2020 codebase. We encourage pull requests for code changes and suggest filing a GitHub issue for questions or suggestions.

License

COMET-ATOMIC 2020 (codebase) is licensed under the Apache License 2.0. The ATOMIC 2020 dataset is licensed under CC-BY.

Contact

Email: jenah[at]allenai[dot]org

Comments
  • Can't reproduce results for BART

    I downloaded the pre-trained COMET-BART model and executed run.sh without --do_train, giving the path of the downloaded model to --model_name_or_path. However, the test RougeL is not what you reported in the paper:

    I downloaded the test files from https://storage.googleapis.com/ai2-mosaic-public/projects/mosaic-kgs/data_atomic_2020_BART-format.tgz, as mentioned in another issue. This is the content of metrics.json:

                "test_avg_loss": 4.657896041870117,
                "test_avg_rouge1": 0.2141809276565792,
                "test_avg_rouge2": 0.03116943751406806,
                "test_avg_rougeL": 0.21284768699554912,
                "test_avg_gen_time": 0.003392871381747925,
                "test_avg_summ_len": 7.352367184676545,
                "avg_rouge1": 0.2141809276565792,
                "step_count": 1
    
    opened by puraminy 8
  • Reproducing the behavior of the AI2 demo

    Hi,

    Thanks for putting together the repo! I'm trying to make predictions with the model using the models/comet_atomic2020_bart/generation_example.py script, but the behavior is sometimes a little worse than that of the AI2 demo.

    Could you please specify the model architecture/hyperparameter settings that are used by the demo? Thanks in advance!

    opened by veronica320 6
  • How to get .source and .target file at comet_atomic2020_bart

    Hello sir.

    I tried to run your code that uses the BART model to generate knowledge triples.

    In your code, "models/comet_atomic2020_bart/finetune.py" requires a "train.source" file and a "train.target" file...

    However, I couldn't figure out how to get these files.

    How can I get these files?

    Thanks.

    opened by yongho94 6
  • Duplicate tuples and tuples with none tail node

    There are duplicate tuples in all three splits of the data: ~68,626 in train, ~7,410 in dev, and ~8,473 in test (please correct me if I'm wrong). I wonder why? And should we just ignore the duplicates when using the data? One example:

    ['PersonX answers the question', 'xAttr', 'knowledgeable']
    ['PersonX answers the question', 'xAttr', 'knowledgeable']
    

    Also, there are tuples with a 'none' tail node value (these 'none'-valued tuples are also part of the duplicates). For example,

    ['PersonX accidentally threw ___', 'xIntent', 'none']
    ['PersonX accidentally threw ___', 'xIntent', 'none']
    

    I wonder how these 'none' values should be interpreted. Should we just ignore them? Or does it mean that the subject or head has no relation of the given type? For instance, in the case of PersonX accidentally threw ___, PersonX has no xIntent? If that's the case, then how should we treat the following cases:

    ['PersonX accidently left', 'oReact', 'none']
    ['PersonX accidently left', 'oReact', 'sad']
    

    Here we have the same relation once with a 'none' tail node and once with a non-empty tail node.
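
    For anyone hitting the same question, here is a quick cleanup sketch (assuming tab-separated head/relation/tail splits; whether to keep 'none' tails is a modeling choice, since the model can legitimately be trained to generate 'none'):

        # Drop exact duplicates and, optionally, 'none' tails; a sketch
        # assuming tab-separated head/relation/tail triples.
        import pandas as pd

        df = pd.read_csv("train.tsv", sep="\t", names=["head", "relation", "tail"])
        df = df.drop_duplicates()                   # remove repeated triples
        df = df[df["tail"].str.strip() != "none"]   # optionally drop 'none' tails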

    Thanks.

    opened by phosseini 5
  • How to evaluate generated data

    When I tried to run:

    py eval.py --gen_file ../models/comet_atomic2020_gpt2/results/pred_generations.jsonl

    I got the following error:

    Traceback (most recent call last):
      File "eval.py", line 139, in <module>
        sources, references, predictions = preprocess(args.gen_file, keys)
      File "eval.py", line 109, in preprocess
        keys_list = keys if keys!=None else generations[0]["generations"].keys()
    AttributeError: 'list' object has no attribute 'keys'
    

    What are keys? The generated list has no keys.

    You also pass keys to the evaluation, but what are they?

    python anli_evaluation/eval.py --gen_file GENERATIONS_FILE --keys MODEL_KEYS[comma-separated list] --results_file RESULTS_FILE

    I also checked automatic_eval.py in system_eval; it accepts different types of input (type 1, 2, 3), none of which match the generated results. The generated results have source, target, and generations fields, while the function expects other fields (such as tails, head, etc.).

    opened by puraminy 3
  • lightning_base.py, line 59, in __init__: AttributeError: can't set attribute

    I got this error from bash ./run.sh: AttributeError: can't set attribute

    % ipython
    Python 3.8.13 (default, Mar 28 2022, 06:16:26) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.
    In [2]: import pytorch_lightning
    In [3]: print(pytorch_lightning.__version__)
    1.6.3
    
    In [4]: import torch
    In [5]: print(torch.__version__)
    1.10.1
    
    % bash ./run.sh           
    Traceback (most recent call last):
      File "finetune.py", line 441, in <module>
        main(args)
      File "finetune.py", line 330, in main
        model: SummarizationModule = SummarizationModule(args)
      File "finetune.py", line 68, in __init__
        super().__init__(hparams, num_labels=None, mode=self.mode, **kwargs)
      File "/Users/davidlaxer/comet-atomic-2020/models/comet_atomic2020_bart/lightning_base.py", line 59, in __init__
        self.hparams = hparams
      File "/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1225, in __setattr__
        object.__setattr__(self, name, value)
    AttributeError: can't set attribute
    
    

    https://github.com/PyTorchLightning/pytorch-lightning/issues/7443
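
    The linked issue points at the cause: recent pytorch-lightning releases made hparams a read-only property, so the plain assignment in lightning_base.py fails. A minimal sketch of the usual workaround (an assumption, not an official fix from this repo; pinning pytorch-lightning to the version in requirements.txt also avoids the error):

        # Newer pytorch-lightning makes `hparams` a read-only property, so
        # modules should call save_hyperparameters() instead of assigning.
        import pytorch_lightning as pl

        class Demo(pl.LightningModule):
            def __init__(self, hparams: dict):
                super().__init__()
                # Old, now-broken pattern: self.hparams = hparams
                self.save_hyperparameters(hparams)

        m = Demo({"learning_rate": 1e-3})
        print(m.hparams.learning_rate)  # 0.001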

    opened by dbl001 2
  • COMET results

    Could you please share or upload the COMET 2020 results files so that I can check the evaluation method you developed? The file automatic_eval.py in system_eval references the experiment files in question.

    opened by puraminy 2
  • How do you calculate BLEU for short targets?

    The target (tail) of many relations is short; for example, it could be an adjective like "happy" or "satisfied" for the xReact relation. How do you measure BLEU for these tails? As I read in the following implementation of BLEU:

    https://github.com/mjpost/sacrebleu/issues/150#issuecomment-808509500

    the sentences must contain at least 4 words for BLEU to be calculated. Moreover, because of the problems mentioned in previous issues, I couldn't use eval.py to evaluate the lists generated by your code.
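
    For what it's worth, one common workaround for short references is smoothed BLEU; a minimal NLTK sketch (not this repo's evaluation code):

        # Smoothed BLEU-2 on one-word tails; a sketch, not the paper's setup.
        from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

        reference = [["happy"]]    # gold tail, tokenized
        candidate = ["satisfied"]  # generated tail, tokenized
        score = sentence_bleu(reference, candidate,
                              weights=(0.5, 0.5),  # unigram + bigram
                              smoothing_function=SmoothingFunction().method1)
        print(score)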

    opened by puraminy 2
  • T5 Implementation

    Do you have an implementation of COMET with T5? I have seen the following article mention it, but I couldn't find any code.

    Understanding Few-Shot Commonsense Knowledge Models

    Could you guide me on how to change the current code to use T5? I've seen T5 declared in some of the code, but it isn't used.

    Thanks

    opened by puraminy 2
  • How to use ATOMIC to generate commonsense just like using the 'predict' button in your demo?

    Hi, I'm quite interested in the results of applying COMET to my own dataset. I've also checked your demo and found that I can do commonsense inferences about people and events after entering a sentence and clicking 'predict'. But now I want to run inference for many sentences automatically; what should I do to edit the corresponding py files to reach my goal? Looking forward to your replies!

    opened by Jingjie1218 1
  • FileNotFoundError: [Errno 2] No such file or directory: 'data/gpt2-zeroshot/atomic/test_sampled_prefixes.jsonl'

    When I tried to run gpt2_zeroshot.py, the following error occurred:

    
    FileNotFoundError: [Errno 2] No such file or directory: 'data/gpt2-zeroshot/atomic/test_sampled_prefixes.jsonl'
    

    Where is this file, or how can I create it?

    opened by puraminy 1
  • Update requirements to include transformers and wandb

    Hi! When I installed from the requirements.txt and ran the test command:

    python -m models.comet_atomic2020_gpt2.comet_gpt2 --test_install

    It showed that it required transformers and wandb.

    opened by rjbray915 0
  • Update PyTorch-Lightning Requirement to 1.6.4

    I'd like to use comet with the new PyTorch 'mps' backend. The PyTorch 'mps' implementation is still being shaken out, as is PyTorch-Lightning 1.6.4; however, some of the code works. Can you create a development branch of comet that can use the latest version of PyTorch-Lightning, so we can run on the 'mps' backend (GPU)?

    opened by dbl001 0
  • input format

    The current COMET model often requires "X wanted to" as a prefix. Without such a prefix, the model output is not good enough in some cases.

    We might want to emphasize the prefix, or retrain the model so that it works well without one.

    enhancement 
    opened by keisks 0
  • Using comet-bart for zero-shot entailment?

    Consider sent1 = "X drives too fast" and sent2 = "X is pulled over by a cop". We know that "sent1 happens before sent2" is true. Is there any zero-shot way of finding out whether this is true or not? Also, what if in ATOMIC we have sent1 -> r1 -> r2 -> ... -> rk -> sent2? Is there a way to find this out from COMET? I don't want to rely on ATOMIC directly, because sent1 and sent2 can be sentences outside of ATOMIC; that's where COMET would be useful.
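
    One hedged heuristic (single-hop only; chains sent1 -> r1 -> ... -> rk -> sent2 are not directly scoreable this way): ATOMIC 2020 includes temporal relations such as isBefore, so you can compare COMET-BART's conditional loss on candidate tails. A sketch with a placeholder checkpoint path:

        # Rank candidate tails by mean token NLL; lower loss means a more
        # plausible continuation according to COMET. The path is a placeholder.
        import torch
        from transformers import BartForConditionalGeneration, BartTokenizer

        path = "./comet-atomic_2020_BART"
        tok = BartTokenizer.from_pretrained(path)
        model = BartForConditionalGeneration.from_pretrained(path).eval()

        src = tok("X drives too fast isBefore [GEN]", return_tensors="pt")
        for tail in ["X is pulled over by a cop", "X wins a baking contest"]:
            labels = tok(tail, return_tensors="pt").input_ids
            with torch.no_grad():
                loss = model(**src, labels=labels).loss
            print(round(loss.item(), 3), tail)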

    opened by theartpiece 1