Convolutional 2D Knowledge Graph Embeddings resources

Tim Dettmers

Last update: Dec 24, 2022

Overview

ConvE

Paper: Convolutional 2D Knowledge Graph Embeddings

FB15k and WN18 were used in the paper, but do not use these datasets for your research. Please also note that the Kinship and Nations datasets contain a high proportion of inverse relationships, which makes them unsuitable for research: Nations has more than 95% inverse relationships and Kinship about 48%.

ConvE key facts

Predictive performance

Dataset	MR	MRR	Hits@10	Hits@3	Hits@1
FB15k	64	0.75	0.87	0.80	0.67
WN18	504	0.94	0.96	0.95	0.94
FB15k-237	246	0.32	0.49	0.35	0.24
WN18RR	4766	0.43	0.51	0.44	0.39
YAGO3-10	2792	0.52	0.66	0.56	0.45
Nations	2	0.82	1.00	0.88	0.72
UMLS	1	0.94	0.99	0.97	0.92
Kinship	2	0.83	0.98	0.91	0.73
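
The columns above are standard ranking metrics for link prediction. As a minimal sketch of how they relate to each other (this is not the evaluation code of this repo; the function name and the example ranks are made up for illustration), assume each test query yields the 1-based rank of the correct entity:

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MR, MRR and Hits@k from 1-based ranks of the correct entities."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank, lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank, higher is better
    }
    for k in ks:
        # fraction of queries whose correct entity is ranked in the top k
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks for four test queries (illustration only).
example_ranks = [1, 2, 4, 20]
print(ranking_metrics(example_ranks))
```

A single badly ranked query moves MR far more than MRR, which is one reason a dataset can show a large MR together with a reasonable MRR.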

Run time performance

For an embedding size of 200 and a batch size of 128, a single batch takes the following time on a GTX Titan X (Maxwell):

64ms for 100,000 entities
80ms for 1,000,000 entities

Parameter efficiency

Parameters	ConvE/DistMult MRR	ConvE/DistMult Hits@10	ConvE/DistMult Hits@1
~5.0M	0.32 / 0.24	0.49 / 0.42	0.24 / 0.16
1.89M	0.32 / 0.23	0.49 / 0.41	0.23 / 0.15
0.95M	0.30 / 0.22	0.46 / 0.39	0.22 / 0.14
0.24M	0.26 / 0.16	0.39 / 0.31	0.19 / 0.09

ConvE with 8 times fewer parameters (0.24M versus 1.89M) is still more powerful than DistMult. Relational Graph Convolutional Networks use roughly 32x more parameters to achieve the same performance as ConvE.

Installation

This repo supports Linux and Python installation via Anaconda.

1. Install PyTorch using Anaconda.
2. Install the requirements: pip install -r requirements.txt
3. Download the default English model used by spaCy, which is installed in the previous step: python -m spacy download en
4. Run the preprocessing script for WN18RR, FB15k-237, YAGO3-10, UMLS, Kinship, and Nations: sh preprocess.sh
5. You can now run the model.

Running a model

Parameters are passed as whitespace-separated flag and value pairs, for example:

CUDA_VISIBLE_DEVICES=0 python main.py --model conve --data FB15k-237 \
                                      --input-drop 0.2 --hidden-drop 0.3 --feat-drop 0.2 \
                                      --lr 0.003 --preprocess

will run a ConvE model on FB15k-237.

To run a model, you first need to preprocess the data once. This can be done by specifying the --preprocess parameter:

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME --preprocess

After the dataset is preprocessed it will be saved to disk and this parameter can be omitted.

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME

The following values can be used for the --model parameter:

conve
distmult
complex

The following datasets can be used for the --data parameter:

FB15k-237
WN18RR
YAGO3-10
umls
kinship
nations

Here is a complete list of parameters:

Link prediction for knowledge graphs

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        input batch size for training (default: 128)
  --test-batch-size TEST_BATCH_SIZE
                        input batch size for testing/validation (default: 128)
  --epochs EPOCHS       number of epochs to train (default: 1000)
  --lr LR               learning rate (default: 0.003)
  --seed S              random seed (default: 17)
  --log-interval LOG_INTERVAL
                        how many batches to wait before logging training
                        status
  --data DATA           Dataset to use: {FB15k-237, YAGO3-10, WN18RR, umls,
                        nations, kinship}, default: FB15k-237
  --l2 L2               Weight decay value to use in the optimizer. Default:
                        0.0
  --model MODEL         Choose from: {conve, distmult, complex}
  --embedding-dim EMBEDDING_DIM
                        The embedding dimension (1D). Default: 200
  --embedding-shape1 EMBEDDING_SHAPE1
                        The first dimension of the reshaped 2D embedding. The
                        second dimension is infered. Default: 20
  --hidden-drop HIDDEN_DROP
                        Dropout for the hidden layer. Default: 0.3.
  --input-drop INPUT_DROP
                        Dropout for the input embeddings. Default: 0.2.
  --feat-drop FEAT_DROP
                        Dropout for the convolutional features. Default: 0.2.
  --lr-decay LR_DECAY   Decay the learning rate by this factor every epoch.
                        Default: 0.995
  --loader-threads LOADER_THREADS
                        How many loader threads to use for the batch loaders.
                        Default: 4
  --preprocess          Preprocess the dataset. Needs to be executed only
                        once. Default: 4
  --resume              Resume a model.
  --use-bias            Use a bias in the convolutional layer. Default: True
  --label-smoothing LABEL_SMOOTHING
                        Label smoothing value to use. Default: 0.1
  --hidden-size HIDDEN_SIZE
                        The side of the hidden layer. The required size
                        changes with the size of the embeddings. Default: 9728
                        (embedding size 200).

To reproduce most of the results in the ConvE paper, you can use the default parameters and execute the command below:

CUDA_VISIBLE_DEVICES=0 python main.py --data DATASET_NAME

For the inverse model, you can run the provided file with the name of the dataset and a threshold probability:

python inverse_model.py WN18RR 0.9
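
To give an idea of what a threshold like 0.9 can mean, here is a rough sketch of threshold-based inverse relation detection. It is not the code in inverse_model.py; the function name, the triple layout (subject, relation, object) and the toy data are assumptions made for illustration. A relation r2 is reported as an inverse of r1 when at least the given fraction of (s, r1, o) triples has a matching (o, r2, s) triple.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (subject, relation, object)


def find_inverse_relations(triples: Iterable[Triple], threshold: float) -> List[Tuple[str, str, float]]:
    """Return (r1, r2, fraction) for relation pairs where r2 looks like an inverse of r1."""
    pairs_by_relation: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for s, r, o in triples:
        pairs_by_relation[r].add((s, o))

    found = []
    for r1, pairs1 in pairs_by_relation.items():
        for r2, pairs2 in pairs_by_relation.items():
            # r1 == r2 is allowed on purpose: it flags symmetric relations.
            matches = sum(1 for s, o in pairs1 if (o, s) in pairs2)
            fraction = matches / len(pairs1)
            if fraction >= threshold:
                found.append((r1, r2, fraction))
    return found


# Toy training triples (illustration only).
train = [
    ("a", "hypernym", "b"), ("b", "hyponym", "a"),
    ("c", "hypernym", "d"), ("d", "hyponym", "c"),
    ("e", "hypernym", "f"),
]
print(find_inverse_relations(train, threshold=0.9))
```

In the toy data every hyponym triple has a reversed hypernym triple, but only two of the three hypernym triples have a reversed hyponym triple, so only one direction passes the 0.9 threshold.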

Changing the embedding size for ConvE

If you want to change the embedding size, you can do that via the --embedding-dim parameter. However, for ConvE, since the embedding is reshaped into a 2D embedding, one also needs to pass the first dimension of the reshaped embedding (--embedding-shape1), while the second dimension is inferred. When one changes the embedding size, the hidden layer size --hidden-size also needs to change, but it is difficult to determine before run time. The easiest way to determine the hidden size is to run the model, let it fail with an error caused by the wrong shape, and then set --hidden-size according to the dimension in the error message.

Example: change the embedding size to 100 with 10x10 2D embeddings. We run python main.py --embedding-dim 100 --embedding-shape1 10 and run into an error caused by the wrong hidden dimension:

   ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [128 x 4608], m2: [9728 x 100] at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:273

Now we change the hidden dimension to 4608 accordingly: python main.py --embedding-dim 100 --embedding-shape1 10 --hidden-size 4608. Now the model runs with an embedding size of 100 and 10x10 2D embeddings.
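
The two hidden sizes mentioned in this README (9728 and 4608) are consistent with a simple shape calculation, sketched below. The assumptions are mine and should be checked against model.py: the entity and relation embeddings are stacked along the first 2D dimension, and the convolution uses 32 filters of size 3x3 with stride 1 and no padding. The function name is hypothetical. If the model code differs from these assumptions, trust the dimension in the error message instead.

```python
def conve_hidden_size(embedding_dim: int, embedding_shape1: int,
                      channels: int = 32, kernel_size: int = 3) -> int:
    """Estimate --hidden-size under the assumptions stated in the text above."""
    if embedding_dim % embedding_shape1 != 0:
        raise ValueError("embedding_dim must be divisible by embedding_shape1")
    embedding_shape2 = embedding_dim // embedding_shape1  # inferred second dimension

    # Two stacked 2D embeddings form an input of (2 * shape1) x shape2.
    height = 2 * embedding_shape1 - kernel_size + 1  # unpadded convolution, stride 1
    width = embedding_shape2 - kernel_size + 1
    if height <= 0 or width <= 0:
        raise ValueError("embedding shape is too small for the kernel size")
    return channels * height * width


print(conve_hidden_size(200, 20))  # default setting described in the help text
print(conve_hidden_size(100, 10))  # the 10x10 example from this section
```

Under these assumptions the calculation gives 9728 for the default setting and 4608 for the 10x10 example, matching the values reported above.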

Adding new datasets

To run the model on a new dataset, copy your dataset folder into the data folder and make sure your dataset split files are named train.txt, valid.txt, and test.txt and contain tab-separated triples of a knowledge graph. Then execute python wrangle_KG.py FOLDER_NAME. Afterwards, you can use the folder name of your dataset as the value of the --data parameter.
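
Before running wrangle_KG.py it can help to sanity-check the folder. The following is a small sketch, not part of this repo: the function name is hypothetical, and it only checks what the paragraph above states, namely that the three split files exist and that every non-empty line has exactly three tab-separated fields.

```python
from pathlib import Path
from typing import Dict

SPLIT_FILES = ("train.txt", "valid.txt", "test.txt")


def check_dataset_folder(folder: str) -> Dict[str, int]:
    """Return the number of triples per split file, or raise on a malformed folder."""
    counts = {}
    for name in SPLIT_FILES:
        path = Path(folder) / name
        if not path.is_file():
            raise FileNotFoundError(f"missing split file: {path}")
        count = 0
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.rstrip("\r\n")
                if not line:
                    continue  # tolerate blank lines
                if len(line.split("\t")) != 3:
                    raise ValueError(f"{path}:{line_number}: expected 3 tab-separated fields")
                count += 1
        counts[name] = count
    return counts
```

Calling check_dataset_folder("data/FOLDER_NAME") returns a dictionary with the triple count per split, or raises an error that points at the first problematic file or line.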

Adding your own model

You can easily write your own knowledge graph model by extending the barebone model MyModel that can be found in the model.py file.

Quirks

There are some quirks of this framework.

The model currently ignores data that does not fit into the specified batch size. For example, if your batch size is 100 and your test data has 220 samples, then 20 samples will be ignored. It is designed this way to improve performance on small datasets. To test on the full test data, you can save the model checkpoint, load the model (with the --resume parameter) and then evaluate with a batch size that fits the test data (for 220 samples you could use a batch size of 110). Another solution is to use a fitting batch size from the start; that is, you could train with a batch size of 110.
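
The arithmetic behind this quirk can be sketched in a few lines. The helper names below are hypothetical and not part of this repo; they only compute how many samples are dropped for a given batch size and which batch sizes divide the data evenly.

```python
from typing import List


def dropped_samples(num_samples: int, batch_size: int) -> int:
    """Number of samples ignored when only full batches are processed."""
    return num_samples % batch_size


def fitting_batch_sizes(num_samples: int, max_batch_size: int) -> List[int]:
    """All batch sizes up to max_batch_size that leave no sample behind."""
    return [b for b in range(1, max_batch_size + 1) if num_samples % b == 0]


print(dropped_samples(220, 100))         # the example from the text: 20 samples ignored
print(fitting_batch_sizes(220, 128)[-1]) # largest fitting batch size up to 128: 110
```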

Issues

It has been noted (#6) that WN18RR contains 212 entities in the test set that do not appear in the training set. About 6.7% of the test set is affected. This means that most models will find it impossible to make any reasonable predictions for these entities. This makes WN18RR appear more difficult than it really is, but it should not affect the usefulness of the dataset: if all researchers evaluate on the same dataset, the scores will still be comparable.
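
A check of this kind can be sketched as follows. This is an illustrative helper, not code from this repo; it assumes triples are given as (subject, relation, object) tuples, so that entities sit in the first and third position, and the toy data is made up.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (subject, relation, object)


def unseen_test_entities(train: Iterable[Triple], test: Iterable[Triple]) -> Tuple[Set[str], float]:
    """Return the test entities missing from training and the fraction of affected test triples."""
    test = list(test)
    train_entities = {e for s, _, o in train for e in (s, o)}
    test_entities = {e for s, _, o in test for e in (s, o)}
    unseen = test_entities - train_entities
    if not test:
        return unseen, 0.0
    affected = sum(1 for s, _, o in test if s in unseen or o in unseen)
    return unseen, affected / len(test)


# Toy data (illustration only): entity "d" never appears in training.
toy_train = [("a", "r", "b"), ("b", "r", "c")]
toy_test = [("a", "r", "c"), ("d", "r", "a")]
print(unseen_test_entities(toy_train, toy_test))
```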

Logs

Some log files of the original research are included in the repo (logs.tar.gz). These log files mostly have unstructured names and may have been created from checkpoints, so they can be difficult to follow. Nevertheless, they might help to replicate the results or to study the behavior of training under certain conditions, and thus I included them here.

Citation

If you found this codebase or our work useful please cite us:

@inproceedings{dettmers2018conve,
	Author = {Dettmers, Tim and Minervini, Pasquale and Stenetorp, Pontus and Riedel, Sebastian},
	Booktitle = {Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
	Title = {Convolutional 2D Knowledge Graph Embeddings},
	Url = {https://arxiv.org/abs/1707.01476},
	Year = {2018},
	Pages = {1811--1818},
	Month = {February}
}

Comments

Reopening issue #43 on data augmentation with reversed triples

Thanks for the answer in https://github.com/TimDettmers/ConvE/issues/43#issuecomment-487323794, but I don't quite get the point. As pointed out in [1] adding inverse relations to the training set affects the performance of the model. To cite their paper:

Third, we propose a different formulation of the objective, in which we model separately predicates and their inverse: for each predicate pred, we create an inverse predicate pred^-1 and create a triple (obj; pred^-1; sub) for each training triple (sub; pred; obj). At test time, queries of the form (?; pred; obj) are answered as (obj; pred^-1; ?). Similar formulations were previously used by Shen et al. (2016) and Joulin et al. (2017), but for different models for which there was no clear alternative, so the impact of this reformulation has never been evaluated.

... Learning and predicting with the inverse predicates, however, changes the picture entirely. First, with both CP and ComplEx, we obtain significant gains in performance on all the datasets. More precisely, we obtain state-of-the-art results with CP, matching those of ComplEx.

So does ConvE add inverse relations as [1] did in their paper? Then according to [1] one can conclude that ConvE has profited from this data augmentation, unless an ablation study shows there is no difference, right? I think this is an important point concerning a fair comparison against other existing models; this can decide acceptance/rejection of future knowledge graph embedding papers!

[1] Lacroix, Timothee, Nicolas Usunier, and Guillaume Obozinski. “Canonical Tensor Decomposition for Knowledge Base Completion.” In International Conference on Machine Learning, 2863–72, 2018. http://proceedings.mlr.press/v80/lacroix18a.html.

opened by dschaehi 14
Best Hyperparameter Settings
Dear Tim,

thank you for the great work.

I was wondering whether you could provide the best found hyperparameter settings of ConvE on WN18RR, FB15k-237, UMLS and Kinship.

209 of the entities that occur in the test split of WN18RR cannot be found in the train split of WN18RR. I was wondering whether you are aware of this. If so, what is the supporting argument for including entities that do not occur during training?

Cheers
opened by Demirrr 5
No module named "spodernet.utils"

Hello

I have installed spodernet as mentioned in this issue: https://github.com/TimDettmers/ConvE/issues/13 But while running the model, I still get an error "No module named spodernet.utils". I am using pytorch 0.4.1.post2. Can you please suggest how to remove this error?

Thanks & Regards Aayushee

opened by aayushee 5
some difference between paper and code
The paper says early stopping is used, but it seems the code just trains for 1000 epochs?

The paper says an L2 norm is enforced for DistMult and ComplEx, but I didn't see where that happens.

I don't think we should see the test results during training, which is cheating. So why does main.py run a test every 3 epochs?

The paper says DistMult and ComplEx use a margin-based loss and ConvE uses a cross-entropy loss, but what I see is that they all use torch.nn.BCELoss for model.loss?

Maybe that is a lot of questions, but I am working on a new model for this task, which is hard, so I hope you can give some answers; that would be a great help and I would appreciate it a lot. Thank you! (Please forgive my poor English.)
opened by shanry 5
Detailed parameter settings of ComplEx on WN18RR and FB15k-237?

Hi, I have failed to reproduce results of ComplEx on WN18RR and FB15K237 reported in ConvE paper. I use an implementation of myself and it can reproduce results of FB15K and WN18 correctly. Could you please tell me the optimal parameter settings of ComplEx you implemented on these two datasets?

opened by scissorsy 5
Clarification on Experiment Setup
Two clarification questions:

Do you retrain the embeddings with valid set triples added for test set prediction? An alternative is to train the embeddings using only train set triples and obtain the results for both dev and test using the same set of embeddings. Looking at the code I think you're doing the latter, just to make sure.

For dataset such as KINSHIP, do you random split the triples into train, dev, test according to a certain ratio or is there an official data split used by all papers? I'm asking because another KBC paper also released the KINSHIP dataset and the data split is different from yours.

Thanks!
opened by todpole3 5
Changing embedding size fails (line numbers in Quirks section are not right)

Sorry but I could not follow this description

If you use a different embedding size, the ConvE concatenation size cannot be determined automatically and you have to set it yourself in line 106/107. Also the first dimension of the projection layer will change. You will need to comment out the print function (line 118) to get the needed dimension, and adjust the size of the fully connected layer in line 98

I could not follow this explanation (lines 106/107, 98, 118 of which files ?) If I want to change the embedding size to say 50, how many places I have to make the change ?

opened by unmeshvrije 4
Any body has the problem of Memory Explosion?

Hi Tim, When I run this code, the memory usage is getting bigger and finally got an explosion. I was wondering if you have the same problem and what's the solution?

Looking forward to your kind feedback. Thanks, tsingker88

opened by tsingker88 4
Can not reproduce results in the paper for WN18RR dataset

I have tried the generic command for reproducing results in the paper for WN18RR dataset, but it could not reproduce MRR reported in the paper, I managed only to get 0.42.

Which hyperparameters can reproduce the 0.46 MRR reported in the paper?

opened by samehkamaleldin 4
No module named "spodernet"

Thank you for your code! I am a freshman, so I have some questions. I ran the command

CUDA_VISIBLE_DEVICES=0 python main.py model ConvE dataset FB15k-237 input_drop 0.2 hidden_drop 0.3 feat_drop 0.2 lr 0.003 process True

After preprocessing there is an error saying: No module named "spodernet". I found a folder named spodernet in "src", but I don't know how to install it. Can you help me? Thank you.

opened by ShangYuming 4
Question about "Inverse Model" in the paper.

I understand how inverse relations are detected by the "Inverse Model" in the paper. But I do not quite get the idea of "k matches" in testing: "At test time, we check if the test triple has inverse matches outside the test set: if k matches are found, we sample a permutation of the top k ranks for these matches; if no match is found, we select a random rank for the test triple." Does this mean that, if we have (s, r_i, o), we find all (o, r_j, s) in the training set (r_j \in R), and k is the number of such (o, r_j, s)? If yes, why should we do so? This test looks like "relation prediction" instead of "entity prediction". Maybe I do not really understand it.

opened by leiloong 3
About spaCy

Hello, thanks for your excellent work. I would like to ask about the use of spaCy: I tried to run the model and found that although the code imports spaCy, it doesn't seem to be used. Could you tell me where spaCy is used? Thank you.

opened by jerry155333522 0
About the indegree

Hi, thanks for your excellent work. I have a question about the indegree. Specifically, is the in-degree of a node the number of times it appears as a tail entity? And how is the MRR value calculated for a degree range of [0,100]: does this MRR value mean that the degree of the predicted entity is in [0,100]?

opened by yhjiujiu 1
About the activation function

Hello , Tim. Have u experimented that replacing the sigmoid with softmax in the logits layer? I tried to run your code, but I found that I got a lower MRR score than the result your paper with sigmoid. When I changed it to softmax, I got a higher MRR score than u. I want to cite your paper in our experiments, could u tell me how to address this problem and use your result as our base. Thank u, looking forward to your reply.

opened by HammerWang98 2
Can I use more network layers？

Hi, thanks for your job and code, I have a question about that the number of network layer. I noticed you only using one conv2d and one FC, as everyone knows, deep model can get better performance. I wonder why you don't use more convolution and FC layers? Doesn't the deep model help performance?

Hope your reply, thanks again.

opened by quqxui 0
