Commonsense Knowledge Base Completion with Structural and Semantic Context (AAAI 2020)

Overview

Commonsense Knowledge Base Completion with Structural and Semantic Context

Code for the paper Commonsense Knowledge Base Completion with Structural and Semantic Context.

Bibtex

@article{malaviya2020commonsense,
  title={Commonsense Knowledge Base Completion with Structural and Semantic Context},
  author={Malaviya, Chaitanya and Bhagavatula, Chandra and Bosselut, Antoine and Choi, Yejin},
  journal={Proceedings of the 34th AAAI Conference on Artificial Intelligence},
  year={2020}
}

Requirements

  • PyTorch
  • Run pip install -r requirements.txt to install the required packages.

Dataset

The ATOMIC dataset used in this paper is available here and the ConceptNet graph is available here. For convenience, the pre-processed versions of both ATOMIC and ConceptNet used in the experiments are provided at this link.

Note: The ATOMIC dataset was pre-processed to canonicalize person references and remove punctuation (see preprocess_atomic.py).
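To make the note above concrete, here is a minimal sketch of what canonicalizing person references and removing punctuation could look like. It is not taken from preprocess_atomic.py: the function name `canonicalize`, the regular expression, and the choice to keep underscores (so that blanks such as `___` survive) are all assumptions made for illustration.

```python
import re
import string

# Hypothetical pattern: matches "person x", "Person X", "personY", etc.
_PERSON = re.compile(r"\bperson\s*([xyz])\b", flags=re.IGNORECASE)

# Assumption: strip all ASCII punctuation except "_" so blanks like "___" are kept.
_PUNCT = str.maketrans("", "", string.punctuation.replace("_", ""))


def canonicalize(text: str) -> str:
    """Map person mentions to PersonX/PersonY/PersonZ, then drop punctuation."""
    text = _PERSON.sub(lambda m: "Person" + m.group(1).upper(), text)
    text = text.translate(_PUNCT)
    # Collapse any whitespace runs left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(canonicalize("Person X eats, then thanks person y!"))
```

The actual script may handle more cases; treat this only as a reading aid for the note.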

Note: The original evaluation sets provided in the ConceptNet dataset contain correct as well as incorrect tuples for evaluating binary classification accuracy. valid.txt in data/conceptnet is the concatenation of the correct tuples from the two development sets provided in the original dataset while test.txt is the set of correct tuples from the original test set.
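The construction of valid.txt described above can be sketched as follows. This is an illustration, not the script used for the released files: it assumes each line of the original evaluation files is tab-separated as `relation, head, tail, label` with label `1` marking a correct tuple, and it keeps each retained line unchanged. The function names are hypothetical.

```python
from typing import Iterable, List


def correct_tuples(lines: Iterable[str]) -> List[str]:
    """Keep only lines whose last field marks the tuple as correct.

    Assumed line format: relation<TAB>head<TAB>tail<TAB>label, label "1" = correct.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed or empty lines
        if fields[3].strip() == "1":
            kept.append("\t".join(fields))
    return kept


def build_valid(dev1: Iterable[str], dev2: Iterable[str]) -> List[str]:
    """Concatenate the correct tuples of the two development sets."""
    return correct_tuples(dev1) + correct_tuples(dev2)


if __name__ == "__main__":
    dev1 = ["IsA\thockey\tsport\t1\n", "IsA\thockey\tfruit\t0\n"]
    dev2 = ["HasProperty\thockey\tcold\t1\n"]
    for row in build_valid(dev1, dev2):
        print(row)
```

Applying `correct_tuples` to the original test file alone would give the analogue of test.txt under the same assumptions.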

Training

To train a model, run the following command:

python -u src/run_kbc_subgraph.py --dataset conceptnet --evaluate-every 10 --n-layers 2 --graph-batch-size 60000 --sim_relations --bert_concat

This trains the model and saves it under the saved_models directory.

Language Model Fine-tuning

In this work, we use representations from a BERT model fine-tuned to the language of the nodes in the knowledge graph.

The script for fine-tuning BERT as a language model on the two knowledge graphs is in the lm_finetuning/ directory. For example, the following command fine-tunes BERT as a language model on ConceptNet:

python lm_finetuning/simple_lm_finetuning.py --train_corpus {CONCEPTNET_TRAIN_CORPUS} --bert_model bert-large-uncased --output_dir {OUTPUT_DIR}

Pre-Trained Models

We provide the fine-tuned BERT models and pre-computed BERT embeddings for both ConceptNet and ATOMIC at this link. If you unzip the downloaded file in the root directory of the repository, the training script will load the embeddings.
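Before starting a long training run, it can help to check that the unzipped files are where the training script looks for them. The sketch below does this for ConceptNet. The two paths are copied from a training log posted in the comments further down, not from an official file list, so treat them as assumptions; the helper name `missing_files` is hypothetical, and the ATOMIC layout is not covered because it does not appear in this document.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed layout, taken from a user's log output; not an official list.
EXPECTED_CONCEPTNET_FILES = (
    "bert_model_embeddings/nodes-lm-conceptnet/conceptnet_bert_embeddings.pt",
    "bert_model_embeddings/nodes-lm-conceptnet/lm_pytorch_model.bin",
)


def missing_files(repo_root: str,
                  expected: Sequence[str] = EXPECTED_CONCEPTNET_FILES) -> List[str]:
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the root directory of the repository.
    for rel in missing_files("."):
        print("missing:", rel)
```

An empty result only means the files exist; it does not verify their contents.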

We also provide pre-trained KB completion models for both datasets for ease of use: ConceptNet model and ATOMIC model.

Evaluation

To evaluate a trained model and get predictions, pass the model path to the --load_model argument and add the --eval_only argument. For example, to evaluate the pre-trained ConceptNet model provided above, use the following command:

CUDA_VISIBLE_DEVICES={GPU_ID} python src/run_kbc_subgraph.py --dataset conceptnet --sim_relations --bert_concat --use_bias --load_model {PATH_TO_PRETRAINED_MODEL} --eval_only --write_results

This loads the pre-trained model and evaluates it on the validation and test sets. The predictions are saved to ./topk_results.json.

Similarly, to evaluate the trained model on ATOMIC, use the following command:

CUDA_VISIBLE_DEVICES={GPU_ID} python src/run_kbc_subgraph.py --dataset atomic --sim_relations --use_bias --load_model {PATH_TO_PRETRAINED_MODEL} --eval_only --write_results

Please email me at [email protected] for any questions or comments.

Comments
  • some question about simple_lm_finetuning


    Hello, could you share the details of the fine-tuning code? I'm trying to add some nodes to the ATOMIC node data, so I need to re-train the BERT embeddings of the nodes.

    But I ran into trouble with simple_lm_finetuning.py:

    RuntimeError: index out of range: Tried to access index -1 out of table with 511 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

    In the debugger, I can see that the 'lm_label_ids' variable contains "-1".

    Here is my 'train_corpus'; its structure is the same as your 'atomic_node_names.txt' file: [image]

    If I change the train_corpus to your 'atomic_node_names.txt' file, the error still appears.

    My transformers library is version 2.2.0. Is something wrong?

    opened by woocoder 8
  • there is a problem at runtime


    When I run run_kbc_subgraph.py, an error occurs. The failing line is in bert_feature_extractor.py.

    Traceback (most recent call last):
      File "src/run_kbc_subgraph.py", line 534, in <module>
        main(args)
      File "src/run_kbc_subgraph.py", line 115, in main
        args.sim_relations)
      File "src/run_kbc_subgraph.py", line 52, in load_data
        train_network.add_sim_edges_bert()
      File "D:\commonsense-kg-completion-master\src\reader.py", line 84, in add_sim_edges_bert
        bert_model = BertLayer(self.dataset)
      File "D:\commonsense-kg-completion-master\src\bert_feature_extractor.py", line 185, in __init__
        self.bert_model.to(self.device)
    AttributeError: 'collections.OrderedDict' object has no attribute 'to'

    How can I solve this problem? I look forward to your answer.

    opened by hvuehu 4
  • Question about knowledge graph embedding


    Hi. I want to use the ConceptNet knowledge graph embeddings that you provide at https://drive.google.com/file/d/1R4C2s8QWwdNE9CUwtfhsYevmM7V-01YT/view?usp=sharing

    However, the download only contains the embedding file, and I can't find a vocab file. I loaded conceptnet_bert_embeddings.pt and its size is 78334, while the node vocab size at the following link is 78249: https://drive.google.com/file/d/1dpSK-eV_USdQ9XvqBuj2rjvtgz_97P0E/view?usp=sharing

    Where can I find the node/vocab file?

    Thanks.

    opened by yichao96 3
  • Could you please tell me the version of DGL, PyTorch, and CUDA you were using?


    The code constantly yields errors about the devices. Each time I modified the code according to the information, new errors occurred. I think it might help if you can tell me the version of these packages you were using. Thanks.

    opened by Ber666 2
  • LM model not found error when training


    I downloaded the pre-processed data set and unzipped into the data directory.

    I ran: !python -u src/run_kbc_subgraph.py --dataset conceptnet --evaluate-every 10 --n-layers 2 --graph-batch-size 60000 --sim_relations --bert_concat

    And got this result:

    2020-08-27 15:20:31.912536: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    DGL backend not selected or invalid. Assuming PyTorch for now.
    Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
    Using backend: pytorch
    Namespace(bert_concat=True, bert_mlp=False, bert_sum=False, cpu_decoding=False, dataset='conceptnet', debug=False, decoder='ConvTransE', decoder_batch_size=128, dropout=0.2, embedding_dim=200, eval_batch_size=500, eval_only=False, evaluate_every=10, feature_map_dropout=0.2, gcn_type='WGCNAttentionLayer', gpu=-1, grad_norm=1.0, graph_batch_size=60000, init_embedding_dim=200, input_dropout=0.2, input_layer='lookup', label_smoothing_epsilon=0.1, layer_norm=False, load_model=None, lr=0.0001, n_bases=100, n_epochs=200, n_hidden=200, n_layers=2, negative_sample=0, no_cuda=False, output_dir='saved_models', regularization=0.1, seed=42, sim_relations=True, sim_sim=False, tying=False, use_bias=False, write_results=False)
    Number of edges: 99999

    Graph Summary
    Nodes: 78088
    Edges: 100000
    Relations: 34
    Density: 0.000016

    ******************* Sample Edges *******************
    ReceivesAction: hockey --> play on ice
    IsA: hockey --> team sport
    IsA: hockey --> violent sport
    IsA: hockey --> game
    IsA: hockey --> great sport
    HasProperty: hockey --> violent
    HasProperty: hockey --> cold
    IsA: hockey --> type of game
    IsA: hockey --> sport of skill and precision
    IsA: hockey --> sport game

    Average Degree: 1.254213195369327
    Adding sim edges..
    bert_model_embeddings/nodes-lm-conceptnet/conceptnet_bert_embeddings.pt
    Downloading: 100% 232k/232k [00:00<00:00, 423kB/s]
    Loading model from bert_model_embeddings/nodes-lm-conceptnet/
    Traceback (most recent call last):
      File "src/run_kbc_subgraph.py", line 532, in <module>
        main(args)
      File "src/run_kbc_subgraph.py", line 115, in main
        args.sim_relations)
      File "src/run_kbc_subgraph.py", line 52, in load_data
        train_network.add_sim_edges_bert()
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/reader.py", line 82, in add_sim_edges_bert
        bert_model = BertLayer(self.dataset)
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/bert_feature_extractor.py", line 175, in __init__
        self.bert_model = torch.load(output_model_file, map_location='cpu')
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 571, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 229, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 210, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'bert_model_embeddings/nodes-lm-conceptnet/lm_pytorch_model.bin'

    opened by ontocord 2
  • About the pre-trained BERT ConceptNet embedding


    Hello! Thanks for sharing your code and pre-trained embeddings.

    I wonder how we can find the corresponding word for each BERT embedding vector in the file "conceptnet_bert_embeddings.pt". Its shape appears to be 78334 x 1024, but "cn_node_names.txt" contains 78249 entities, so I'm not sure how to link each entity to its pre-trained BERT embedding. Could you give me some insight? Thanks!

    opened by zsun227 2
  • an Error


    Thank you for your great work. When I try to train the model using the command "python -u src/run_kbc_subgraph.py --dataset conceptnet --evaluate-every 10 --n-layers 2 --graph-batch-size 60000 --sim_relations --bert_concat", I get the following error:

    Traceback (most recent call last):
      File "src/run_kbc_subgraph.py", line 532, in <module>
        main(args)
      File "src/run_kbc_subgraph.py", line 349, in main
        loss = model.get_score(e1, rel, target, graph_embeddings)
      File "/home/s2020017/work/commonsenseKBC/src/model.py", line 155, in get_score
        reg_loss = self.regularization_loss(embedding)
      File "/home/s2020017/work/commonsenseKBC/src/model.py", line 134, in regularization_loss
        dec_weight = self.decoder.module.w_relation.weight.pow(2)
      File "/home/s2020017/Anaconda_3.7/envs/commonsenseKBC/lib/python3.6/site-packages/torch/nn/modules/module.py", line 518, in __getattr__
        type(self).__name__, name))
    AttributeError: 'ConvTransE' object has no attribute 'module'

    How can I fix it? Thank you so much!!

    opened by Diison 1
  • The script kills itself during the evaluation process.


    Hi, it's a great job. I'm trying to run the evaluation using the following command:

    CUDA_VISIBLE_DEVICES={GPU_ID} python src/run_kbc_subgraph.py --dataset atomic --sim_relations --use_bias --load_model {PATH_TO_PRETRAINED_MODEL} --eval_only --write_results
    

    I created a new virtual environment with conda using the requirements file in the repo. I have downloaded the ATOMIC model here and the BERT embeddings at this link. However, the script kills itself after finishing the loop, without any error or warning. I am running the script on Ubuntu 20.04 with 128 GB of RAM and an NVIDIA TITAN RTX, so it is probably not a resource limitation.

    Could you please help me find out why it kills itself? Thank you very much!

    opened by RomanShen 0
  • fine-tune bert


    Thanks for your great work. I am confused about how BERT is fine-tuned. As mentioned before, the model removes the relation, so is the input text {head entity token} concat {tail entity token}?

    opened by Jiang-X-Pro 4
  • Ablations for non-commonsense KGs and BERT?


    Hi, I really liked the paper and the approaches outlined. I had two questions:

    1. I know the paper is targeted towards commonsense KGs, but are there results for this technique on standard KG benchmark datasets like WN18/FB15k-237/the RR versions?
    2. Are there any ablations to show the effect of BERT pretraining? (fine tuning with MLM on the corpus used in the nodes vs using a standard off the shelf pretrained BERT)
    opened by nishanthcgit 0
  • Pretrained Full-Graph Model


    Thank you for your very interesting paper and for releasing the code.

    From a look at the code, it appears that the code for training the model on the full ConceptNet graph isn't complete (for instance, reader.ConceptNetFullReader isn't implemented).

    Would it be possible for you to provide this code, or (even better) the node embeddings for the full graph?

    Many thanks.

    opened by petervickers 1
  • Error in init_with_bert not passing network


    In trying to train from scratch, I found that there is a bug.

    !python -u src/run_kbc_subgraph.py --dataset conceptnet --evaluate-every 10 --n-layers 2 --graph-batch-size 60000 --sim_relations --bert_concat

    init_with_bert in model.py calls bert_model.forward_as_init(num_nodes). forward_as_init expects a second argument, network, but none is passed, so it stays None. node_list is computed from the network, as you can see:

    def forward_as_init(self, num_nodes, network=None):
    
        if self.exists:
            print("Loading BERT embeddings from disk..")
            return torch.load(self.filename)
    
        node_ids = np.arange(num_nodes)
        node_list = [network.graph.nodes[idx] for idx in node_ids]
    

    I think this is the bug, but I can't figure out how to pass in network.

    Error message:

    2020-08-27 17:52:38.578804: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    Using backend: pytorch
    Namespace(bert_concat=True, bert_mlp=False, bert_sum=False, cpu_decoding=False, dataset='conceptnet', debug=False, decoder='ConvTransE', decoder_batch_size=128, dropout=0.2, embedding_dim=200, eval_batch_size=500, eval_only=False, evaluate_every=10, feature_map_dropout=0.2, gcn_type='WGCNAttentionLayer', gpu=-1, grad_norm=1.0, graph_batch_size=60000, init_embedding_dim=200, input_dropout=0.2, input_layer='lookup', label_smoothing_epsilon=0.1, layer_norm=False, load_model=None, lr=0.0001, n_bases=100, n_epochs=200, n_hidden=200, n_layers=2, negative_sample=0, no_cuda=False, output_dir='saved_models', regularization=0.1, seed=42, sim_relations=True, sim_sim=False, tying=False, use_bias=False, write_results=False)
    Number of edges: 99999

    Graph Summary
    Nodes: 78088
    Edges: 100000
    Relations: 34
    Density: 0.000016

    ******************* Sample Edges *******************
    ReceivesAction: hockey --> play on ice
    IsA: hockey --> team sport
    IsA: hockey --> violent sport
    IsA: hockey --> game
    IsA: hockey --> great sport
    HasProperty: hockey --> violent
    HasProperty: hockey --> cold
    IsA: hockey --> type of game
    IsA: hockey --> sport of skill and precision
    IsA: hockey --> sport game

    Average Degree: 1.254213195369327
    Adding sim edges..
    bert_model_embeddings/nodes-lm-conceptnet/conceptnet_bert_embeddings.pt
    Loading model from bert_model_embeddings/nodes-lm-conceptnet/
    Computing BERT embeddings.. saving Computed embeddings.
    Added 4649160 sim edges
    tcmalloc: large alloc 1198546944 bytes == 0xdb340000 @ 0x7f87cf059b6b 0x7f87cf079379 0x7f87763ae92e 0x7f87763b0946 0x7f87ae2a19e5 0x7f87ae526af3 0x7f87ae517f97 0x7f87ae517c7d 0x7f87ae517f97 0x7f87ae622a1a 0x7f87be1540d5 0x7f87be155cd1 0x7f87bdf560ca 0x551755 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509277 0x594b01 0x54a17f 0x5517c1 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509202 0x594b01 0x54a17f 0x5517c1 0x5a9eec
    bert_model_embeddings/nodes-lm-conceptnet/conceptnet_bert_embeddings.pt
    Loading model from bert_model_embeddings/nodes-lm-conceptnet/
    Traceback (most recent call last):
      File "src/run_kbc_subgraph.py", line 532, in <module>
        main(args)
      File "src/run_kbc_subgraph.py", line 143, in main
        use_cuda=use_cuda)
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/model.py", line 54, in __init__
        self.bert_concat_layer = EmbeddingLayer(num_nodes, self.bert_dim, args.dataset, init_bert=True)
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/model.py", line 168, in __init__
        self.init_with_bert(num_nodes, dataset)
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/model.py", line 176, in init_with_bert
        bert_weights = bert_model.forward_as_init(num_nodes)
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/bert_feature_extractor.py", line 243, in forward_as_init
        node_list = [network.graph.nodes[idx] for idx in node_ids]
      File "/content/drive/My Drive/common_sense/commonsense-kg-completion/src/bert_feature_extractor.py", line 243, in <listcomp>
        node_list = [network.graph.nodes[idx] for idx in node_ids]
    AttributeError: 'NoneType' object has no attribute 'graph'

    opened by ontocord 0
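One of the reports above ends in `AttributeError: 'ConvTransE' object has no attribute 'module'`. In PyTorch, a `.module` attribute is present when a model is wrapped in `torch.nn.DataParallel` and absent on the bare model, so code that always reads `self.decoder.module` fails when no wrapper is used. The sketch below shows one common way to tolerate both cases. It is not the repository's fix: `unwrap` is a hypothetical helper, and `Decoder` and `Wrapper` are plain-Python stand-ins so the example runs without PyTorch.

```python
def unwrap(model):
    """Return model.module if the model is wrapped, otherwise the model itself."""
    return getattr(model, "module", model)


class Decoder:
    """Stand-in for a bare decoder such as ConvTransE (hypothetical)."""
    w_relation = "relation weights"


class Wrapper:
    """Stand-in for a wrapper such as torch.nn.DataParallel (hypothetical)."""
    def __init__(self, module):
        self.module = module


if __name__ == "__main__":
    bare = Decoder()
    wrapped = Wrapper(bare)
    # Both calls reach the same underlying decoder.
    print(unwrap(bare).w_relation)
    print(unwrap(wrapped).w_relation)
```

With such a helper, an access like `self.decoder.module.w_relation` would become `unwrap(self.decoder).w_relation`; whether that matches the authors' intended setup is not stated in this document.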
Owner
AI2