ConvE

Overview

Convolutional 2D Knowledge Graph Embeddings resources.

Paper: Convolutional 2D Knowledge Graph Embeddings (https://arxiv.org/abs/1707.01476)

FB15k and WN18 were used in the paper, but do not use these datasets for your research: both suffer from test set leakage through inverse relations. Please also note that the Kinship and Nations datasets have a high number of inverse relationships, which makes them unsuitable for research: Nations has more than 95% inverse relationships and Kinship about 48%.
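
A quick way to gauge this kind of leakage is to measure how often a triple's reversed subject/object pair also appears in the data. Below is a minimal sketch of such a check (not part of this repo), assuming tab-separated train.txt-style triple files; it is a rough proxy, not the exact measure used in the paper.

    def load_triples(path):
        with open(path) as f:
            return [tuple(line.strip().split('\t')) for line in f if line.strip()]

    def reversed_fraction(triples):
        # Fraction of triples (s, r, o) whose reversed pair (o, s) also
        # occurs in the data -- a rough proxy for the inverse-relationship
        # percentages quoted above.
        pairs = {(s, o) for s, r, o in triples}
        return sum((o, s) in pairs for s, r, o in triples) / len(triples)

    print(reversed_fraction(load_triples('data/nations/train.txt')))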

ConvE key facts

Predictive performance

Dataset     MR    MRR   Hits@10  Hits@3  Hits@1
FB15k         64  0.75     0.87    0.80    0.67
WN18         504  0.94     0.96    0.95    0.94
FB15k-237    246  0.32     0.49    0.35    0.24
WN18RR      4766  0.43     0.51    0.44    0.39
YAGO3-10    2792  0.52     0.66    0.56    0.45
Nations        2  0.82     1.00    0.88    0.72
UMLS           1  0.94     0.99    0.97    0.92
Kinship        2  0.83     0.98    0.91    0.73
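
For reference, all of these metrics derive from the rank of the correct entity among all candidates. Here is a small sketch of the standard definitions of MR, MRR, and Hits@k given a list of such ranks (illustrative only, not code from this repo):

    def link_prediction_metrics(ranks, ks=(1, 3, 10)):
        # `ranks` holds the (filtered) rank of the correct entity for each
        # test triple, where rank 1 is best.
        n = len(ranks)
        metrics = {'MR': sum(ranks) / n,
                   'MRR': sum(1.0 / r for r in ranks) / n}
        for k in ks:
            metrics['Hits@%d' % k] = sum(r <= k for r in ranks) / n
        return metrics

    print(link_prediction_metrics([1, 2, 5, 40]))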

Run time performance

With an embedding size of 200 and a batch size of 128, a single batch on a GTX Titan X (Maxwell) takes:

  • 64ms for 100,000 entities
  • 80ms for 1,000,000 entities

Parameter efficiency

Parameters   ConvE/DistMult MRR   ConvE/DistMult Hits@10   ConvE/DistMult Hits@1
~5.0M        0.32 / 0.24          0.49 / 0.42              0.24 / 0.16
1.89M        0.32 / 0.23          0.49 / 0.41              0.23 / 0.15
0.95M        0.30 / 0.22          0.46 / 0.39              0.22 / 0.14
0.24M        0.26 / 0.16          0.39 / 0.31              0.19 / 0.09

ConvE with 8x fewer parameters still outperforms DistMult. Relational Graph Convolutional Networks need roughly 32x more parameters to match ConvE's performance.

Installation

This repo supports Linux and Python installation via Anaconda.

  1. Install PyTorch using Anaconda.
  2. Install the requirements: pip install -r requirements.txt
  3. Download the default English model used by spaCy (installed in the previous step): python -m spacy download en
  4. Run the preprocessing script for WN18RR, FB15k-237, YAGO3-10, UMLS, Kinship, and Nations: sh preprocess.sh
  5. You can now run the model.

Running a model

Parameters are specified as white-space separated flag/value pairs, for example:

CUDA_VISIBLE_DEVICES=0 python main.py --model conve --data FB15k-237 \
                                      --input-drop 0.2 --hidden-drop 0.3 --feat-drop 0.2 \
                                      --lr 0.003 --preprocess

will run a ConvE model on FB15k-237.

To run a model, you first need to preprocess the data once. This can be done by specifying the --preprocess parameter:

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME --preprocess

After the dataset is preprocessed it will be saved to disk and this parameter can be omitted.

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME

The following values can be used for the --model parameter:

conve
distmult
complex

The following datasets can be used for the --data parameter:

FB15k-237
WN18RR
YAGO3-10
umls
kinship
nations

And here is the complete list of parameters:

Link prediction for knowledge graphs

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        input batch size for training (default: 128)
  --test-batch-size TEST_BATCH_SIZE
                        input batch size for testing/validation (default: 128)
  --epochs EPOCHS       number of epochs to train (default: 1000)
  --lr LR               learning rate (default: 0.003)
  --seed S              random seed (default: 17)
  --log-interval LOG_INTERVAL
                        how many batches to wait before logging training
                        status
  --data DATA           Dataset to use: {FB15k-237, YAGO3-10, WN18RR, umls,
                        nations, kinship}, default: FB15k-237
  --l2 L2               Weight decay value to use in the optimizer. Default:
                        0.0
  --model MODEL         Choose from: {conve, distmult, complex}
  --embedding-dim EMBEDDING_DIM
                        The embedding dimension (1D). Default: 200
  --embedding-shape1 EMBEDDING_SHAPE1
                        The first dimension of the reshaped 2D embedding. The
                        second dimension is inferred. Default: 20
  --hidden-drop HIDDEN_DROP
                        Dropout for the hidden layer. Default: 0.3.
  --input-drop INPUT_DROP
                        Dropout for the input embeddings. Default: 0.2.
  --feat-drop FEAT_DROP
                        Dropout for the convolutional features. Default: 0.2.
  --lr-decay LR_DECAY   Decay the learning rate by this factor every epoch.
                        Default: 0.995
  --loader-threads LOADER_THREADS
                        How many loader threads to use for the batch loaders.
                        Default: 4
  --preprocess          Preprocess the dataset. Needs to be executed only
                        once.
  --resume              Resume a model.
  --use-bias            Use a bias in the convolutional layer. Default: True
  --label-smoothing LABEL_SMOOTHING
                        Label smoothing value to use. Default: 0.1
  --hidden-size HIDDEN_SIZE
                        The size of the hidden layer. The required size
                        changes with the size of the embeddings. Default: 9728
                        (embedding size 200).

To reproduce most of the results in the ConvE paper, you can use the default parameters and execute the command below:

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME

For the reverse model, you can run the provided script with the dataset name and a threshold probability:

python inverse_model.py WN18RR 0.9
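
The heart of such a reverse (inverse) model is detecting relation pairs that are inverses of each other in the training data. The following is a minimal sketch of that detection step, under the assumption that a pair (r1, r2) counts as inverse when the co-occurrence rate of (s, r1, o) with (o, r2, s) exceeds the threshold; the actual inverse_model.py may differ in its details.

    from collections import defaultdict

    def find_inverse_pairs(triples, threshold=0.9):
        # Group the (subject, object) pairs by relation.
        by_rel = defaultdict(set)
        for s, r, o in triples:
            by_rel[r].add((s, o))
        # Flag (r1, r2) when most (s, r1, o) have a matching (o, r2, s).
        # Note that r1 == r2 is allowed: it flags symmetric relations.
        pairs = []
        for r1 in by_rel:
            for r2 in by_rel:
                matches = sum((o, s) in by_rel[r2] for s, o in by_rel[r1])
                if matches / len(by_rel[r1]) >= threshold:
                    pairs.append((r1, r2))
        return pairs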

Changing the embedding size for ConvE

If you want to change the embedding size you can do that via the `--embedding-dim` parameter. However, for ConvE, since the embedding is reshaped into a 2D embedding, you also need to pass the first dimension of the reshaped embedding (`--embedding-shape1`); the second dimension is inferred. When one changes the embedding size, the hidden layer size (`--hidden-size`) also needs to change, but it is difficult to determine before run time. The easiest way to determine it is to run the model, let it fail with a shape-mismatch error, and then set the hidden size according to the dimension in the error message.

Example: change the embedding size to 100 with 10x10 2D embeddings. We run python main.py --embedding-dim 100 --embedding-shape1 10 and get an error due to the wrong hidden dimension:

   ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [128 x 4608], m2: [9728 x 100] at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:273

Now we set the hidden dimension to 4608 accordingly: python main.py --embedding-dim 100 --embedding-shape1 10 --hidden-size 4608. The model now runs with an embedding size of 100 and 10x10 2D embeddings.
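
If you would rather compute the hidden size up front, both known data points (9728 for embedding size 200 with shape1 20, and 4608 for embedding size 100 with shape1 10) are consistent with a 3x3 convolution without padding over the stacked 2D embeddings and 32 feature maps. A back-of-the-envelope helper under those assumptions:

    def conve_hidden_size(embedding_dim, shape1, channels=32, kernel=3):
        # Assumes the subject and relation embeddings are stacked into a
        # (2*shape1 x shape2) image and convolved with a kernel x kernel
        # filter, stride 1, no padding, producing `channels` feature maps.
        shape2 = embedding_dim // shape1
        out_h = 2 * shape1 - (kernel - 1)
        out_w = shape2 - (kernel - 1)
        return channels * out_h * out_w

    print(conve_hidden_size(200, 20))  # 9728, the default --hidden-size
    print(conve_hidden_size(100, 10))  # 4608, the example above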

Adding new datasets

To run on a new dataset, copy your dataset folder into the data folder and make sure your dataset split files are named train.txt, valid.txt, and test.txt and contain tab-separated triples of a knowledge graph. Then execute python wrangle_KG.py FOLDER_NAME. Afterwards, you can use the folder name of your dataset as the --data parameter. A quick sanity check for the expected file format is sketched below.
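
This small convenience sketch (not part of this repo) checks a new dataset folder before running wrangle_KG.py; the folder name my_dataset is hypothetical.

    import os

    def check_split_files(folder):
        # The three split files must exist and every line must be a
        # tab-separated (subject, relation, object) triple.
        for split in ('train.txt', 'valid.txt', 'test.txt'):
            path = os.path.join('data', folder, split)
            assert os.path.exists(path), 'missing ' + path
            with open(path) as f:
                for i, line in enumerate(f, 1):
                    parts = line.rstrip('\n').split('\t')
                    assert len(parts) == 3, '%s:%d is not a triple' % (path, i)

    check_split_files('my_dataset')  # hypothetical folder name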

Adding your own model

You can easily write your own knowledge graph model by extending the barebone model MyModel found in the model.py file. A rough illustration of what such a model looks like is sketched below.
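
As an illustration only, here is a minimal DistMult-style scorer in the 1-N scoring setup ConvE uses; the constructor and forward signatures here are assumptions, so check MyModel in model.py for the interface this repo actually expects.

    import torch
    from torch import nn

    class MySimpleModel(nn.Module):
        def __init__(self, num_entities, num_relations, embedding_dim=200):
            super().__init__()
            self.emb_e = nn.Embedding(num_entities, embedding_dim)
            self.emb_rel = nn.Embedding(num_relations, embedding_dim)

        def forward(self, e1_idx, rel_idx):
            e1 = self.emb_e(e1_idx)      # (batch, dim)
            rel = self.emb_rel(rel_idx)  # (batch, dim)
            # Score the (e1, rel) pair against all entities at once
            # (1-N scoring) and map scores to probabilities for BCE loss.
            scores = torch.mm(e1 * rel, self.emb_e.weight.t())
            return torch.sigmoid(scores)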

Quirks

There are some quirks of this framework.

  1. The model currently ignores data that does not fit into the specified batch size. For example, if your batch size is 100 and your test data has 220 samples, then 20 samples will be ignored. This is by design, to improve performance on small datasets. To test on the full test data, you can save the model checkpoint, load the model (with the --resume flag) and then evaluate with a batch size that fits the test data (for 220 you could use a batch size of 110). Another solution is to use a fitting batch size from the start, that is, train with a batch size of 110; a helper for picking one is sketched below.
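
A tiny convenience sketch (not part of this repo) that picks the largest batch size below some cap that divides the test set evenly, so no samples are dropped:

    def fitting_batch_size(n_test, max_batch=128):
        # Largest batch size <= max_batch that divides n_test evenly.
        for b in range(max_batch, 0, -1):
            if n_test % b == 0:
                return b

    print(fitting_batch_size(220))  # 110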

Issues

It has been noted in issue #6 that WN18RR contains 212 entities in the test set that do not appear in the training set; about 6.7% of the test set is affected. This means that most models cannot make reasonable predictions for these entities, which makes WN18RR appear more difficult than it really is. It should not affect the usefulness of the dataset, though: as long as all researchers compare on the same dataset, the scores remain comparable.

Logs

Some log files from the original research are included in the repo (logs.tar.gz). These log files have mostly unstructured names and some were created from checkpoints, so they can be difficult to interpret. Nevertheless, they might help replicate the results or study the behavior of training under certain conditions, so I included them here.

Citation

If you found this codebase or our work useful, please cite us:

@inproceedings{dettmers2018conve,
  author    = {Dettmers, Tim and Minervini, Pasquale and Stenetorp, Pontus and Riedel, Sebastian},
  booktitle = {Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
  title     = {Convolutional 2D Knowledge Graph Embeddings},
  url       = {https://arxiv.org/abs/1707.01476},
  year      = {2018},
  month     = {February},
  pages     = {1811--1818}
}



Comments
  • Reopening issue #43 on data augmentation with reversed triples

    Thanks for the answer in https://github.com/TimDettmers/ConvE/issues/43#issuecomment-487323794, but I don't quite get the point. As pointed out in [1] adding inverse relations to the training set affects the performance of the model. To cite their paper:

    Third, we propose a different formulation of the objective, in which we model separately predicates and their inverse: for each predicate pred, we create an inverse predicate pred^-1 and create a triple (obj; pred^-1; sub) for each training triple (sub; pred; obj). At test time, queries of the form (?; pred; obj) are answered as (obj; pred^-1; ?). Similar formulations were previously used by Shen et al. (2016) and Joulin et al. (2017), but for different models for which there was no clear alternative, so the impact of this reformulation has never been evaluated.

    ... Learning and predicting with the inverse predicates, however, changes the picture entirely. First, with both CP and ComplEx, we obtain significant gains in performance on all the datasets. More precisely, we obtain state-of-the-art results with CP, matching those of ComplEx.

    So does ConvE add inverse relations as [1] did in their paper? Then, according to [1], one can conclude that ConvE has profited from this data augmentation, unless an ablation study shows there is no difference, right? I think this is an important point concerning a fair comparison against other existing models; this can decide acceptance/rejection of future knowledge graph embedding papers!

    [1] Lacroix, Timothee, Nicolas Usunier, and Guillaume Obozinski. “Canonical Tensor Decomposition for Knowledge Base Completion.” In International Conference on Machine Learning, 2863–72, 2018. http://proceedings.mlr.press/v80/lacroix18a.html.

    opened by dschaehi 14
  • Best Hyperparameter Settings

    Dear Tim,

    thank you for the great work.

    • I was wondering whether you could provide the best found hyperparameter settings of ConvE on WN18RR, FB15k-237, UMLS and Kinship.

    • 209 of the entities that occur in the test split of WN18RR cannot be found in the train split. Consequently, I was wondering whether you are aware of this. If so, what is the supporting argument for including entities that do not occur during training?

    Cheers

    opened by Demirrr 5
  • No module named "spodernet.utils"

    Hello

    I have installed spodernet as mentioned in this issue: https://github.com/TimDettmers/ConvE/issues/13. But while running the model, I still get the error "No module named spodernet.utils". I am using pytorch 0.4.1.post2. Can you please suggest how to fix this error?

    Thanks & Regards Aayushee

    opened by aayushee 5
  • Some differences between paper and code

    1. The paper says early stopping is used, but it seems the code just trains for 1000 epochs?
    2. The paper says the L2 norm is forced for DistMult and ComplEx, but I didn't see where that is done?
    3. I don't think we should see the test results during training, which is cheating. So why is the test run every 3 epochs in main.py?
    4. The paper says DistMult and ComplEx use a margin-based loss and ConvE uses a cross-entropy loss, but what I see is that they all use torch.nn.BCELoss for model.loss?

    These may be quite a few questions, but I am working on a new model for this task, which is hard, so I hope you could give some answers; it would be a great help and I would appreciate it a lot. Thank you! (Please forgive my poor English.)

    opened by shanry 5
  • Detailed parameter settings of ComplEx on WN18RR and FB15k-237?

    Hi, I have failed to reproduce the results of ComplEx on WN18RR and FB15k-237 reported in the ConvE paper. I use my own implementation, and it reproduces the results on FB15k and WN18 correctly. Could you please tell me the optimal parameter settings of ComplEx you used for these two datasets?

    opened by scissorsy 5
  • Clarification on Experiment Setup

    Two clarification questions:

    1. Do you retrain the embeddings with the valid set triples added for test set prediction? An alternative is to train the embeddings using only the train set triples and obtain the results for both dev and test using the same set of embeddings. Looking at the code I think you're doing the latter, just to make sure.

    2. For datasets such as KINSHIP, do you randomly split the triples into train, dev, and test according to a certain ratio, or is there an official data split used by all papers? I'm asking because another KBC paper also released the KINSHIP dataset and its data split is different from yours.

    Thanks!

    opened by todpole3 5
  • Changing embedding size fails (line numbers in Quirks section are not right)

    Sorry, but I could not follow this description:

    If you use a different embedding size, the ConvE concatenation size cannot be determined automatically and you have to set it yourself in line 106/107. Also the first dimension of the projection layer will change. You will need to comment out the print function (line 118) to get the needed dimension, and adjust the size of the fully connected layer in line 98

    I could not follow this explanation (lines 106/107, 98, 118 of which files?). If I want to change the embedding size to, say, 50, in how many places do I have to make the change?

    opened by unmeshvrije 4
  • Anybody have the problem of memory explosion?

    Hi Tim, when I run this code, the memory usage keeps growing and finally explodes. I was wondering if you have seen the same problem and what the solution is?

    Looking forward to your kind feedback. Thanks, tsingker88

    opened by tsingker88 4
  • Cannot reproduce results in the paper for the WN18RR dataset

    I have tried the generic command for reproducing the results in the paper on the WN18RR dataset, but it could not reproduce the MRR reported in the paper; I only managed to get 0.42.

    Which hyperparameters can reproduce the 0.46 MRR reported in the paper?

    opened by samehkamaleldin 4
  • No module named "spodernet"

    Thank you for your code! I am a freshman, so I have some questions. I ran the command

    CUDA_VISIBLE_DEVICES=0 python main.py model ConvE dataset FB15k-237
    input_drop 0.2 hidden_drop 0.3 feat_drop 0.2
    lr 0.003 process True

    After preprocessing there is an error saying "No module named spodernet". I found that in "src" there is a folder named spodernet, but I don't know how to install it. Can you help me? Thank you.

    opened by ShangYuming 4
  • Question about "Inverse Model" in the paper

    I have understood how to detect inverse relations in the "Inverse Model" in the paper, but I do not quite get the idea of "k matches" in testing: "At test time, we check if the test triple has inverse matches outside the test set: if k matches are found, we sample a permutation of the top k ranks for these matches; if no match is found, we select a random rank for the test triple." Does this mean that if we have (s, r_i, o), we find all (o, r_j, s) in the training set (r_j \in R), and k is the number of such (o, r_j, s)? If yes, why should we do so? It seems this test looks like "relation prediction" instead of "entity prediction". Maybe I do not really understand it.

    opened by leiloong 3
  • About spaCy

    Hello, thanks for your excellent work. I would like to ask about the use of spaCy: I tried to run the model and found that although the code imports spaCy, it doesn't seem to be used. Could you tell me where spaCy is used? Thank you.

    opened by jerry155333522 0
  • About the in-degree

    Hi, thanks for your excellent work. There is a question about the in-degree. Specifically, is the in-degree of a node the number of times it appears as a tail entity? And how do you calculate the MRR value for the degree range [0,100]? Does this MRR value mean the degree of the predicted entity is in [0,100]?

    question 
    opened by yhjiujiu 1
  • About the activation function

    Hello, Tim. Have you experimented with replacing the sigmoid with a softmax in the logits layer? I tried to run your code, but I got a lower MRR score than the result in your paper with sigmoid. When I changed it to softmax, I got a higher MRR score than yours. I want to cite your paper in our experiments; could you tell me how to address this problem so we can use your result as our baseline? Thank you, looking forward to your reply.

    opened by HammerWang98 2
  • Can I use more network layers?

    Hi, thanks for your work and code. I have a question about the number of network layers. I noticed you use only one conv2d and one FC layer; as everyone knows, deeper models can get better performance. I wonder why you don't use more convolutional and FC layers? Doesn't a deeper model help performance?

    Hoping for your reply, thanks again.

    opened by quqxui 0