TuckER: Tensor Factorization for Knowledge Graph Completion

This codebase contains a PyTorch implementation of the paper:

TuckER: Tensor Factorization for Knowledge Graph Completion. Ivana Balažević, Carl Allen, and Timothy M. Hospedales. Empirical Methods in Natural Language Processing (EMNLP), 2019. [Paper]

TuckER: Tensor Factorization for Knowledge Graph Completion. Ivana Balažević, Carl Allen, and Timothy M. Hospedales. ICML Adaptive & Multitask Learning Workshop, 2019. [Short Paper]
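
TuckER scores each triple (e_s, r, e_o) with a Tucker decomposition of the binary relation tensor: phi(e_s, r, e_o) = W x_1 e_s x_2 w_r x_3 e_o, where W is a learned core tensor and x_n denotes the tensor product along mode n. A minimal sketch of this scoring function (illustrative shapes only, not the repo's exact forward pass, which also applies dropout and batch normalization):

    import torch

    def tucker_score(W, e_s, w_r, E_all):
        """Score a batch of (subject, relation) pairs against all entities.

        W:     learned core tensor, shape (d_e, d_r, d_e)
        e_s:   subject embeddings, shape (batch, d_e)
        w_r:   relation embeddings, shape (batch, d_r)
        E_all: all entity embeddings, shape (n_entities, d_e)
        Returns scores of shape (batch, n_entities).
        """
        W_r = torch.einsum('br,erf->bef', w_r, W)  # contract relation into core
        x = torch.einsum('be,bef->bf', e_s, W_r)   # contract subject
        return x @ E_all.t()                       # 1-N scores over all objects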

Link Prediction Results

Dataset    MRR    Hits@10  Hits@3  Hits@1
FB15k      0.795  0.892    0.833   0.741
WN18       0.953  0.958    0.955   0.949
FB15k-237  0.358  0.544    0.394   0.266
WN18RR     0.470  0.526    0.482   0.443
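
For reference, MRR and Hits@k are computed from the (filtered) rank of the correct entity for each test triple; a minimal sketch of the metrics (illustrative, not the repo's exact evaluation code):

    import numpy as np

    def mrr_and_hits(ranks, k=10):
        # ranks: 1-based filtered ranks of the correct entity per test triple
        ranks = np.asarray(ranks, dtype=float)
        return (1.0 / ranks).mean(), (ranks <= k).mean()

    # e.g. mrr_and_hits([1, 2, 10]) -> (0.533..., 1.0)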

Running a model

To run the model, execute the following command:

 CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128 \
                                       --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 \
                                       --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1

Available datasets are:

FB15k-237
WN18RR
FB15k
WN18

To reproduce the results from the paper, use the following combinations of hyperparameters with batch_size=128:

dataset    lr      dr     edim  rdim  input_dropout  hidden_dropout1  hidden_dropout2  label_smoothing
FB15k      0.003   0.99   200   200   0.2            0.2              0.3              0.0
WN18       0.005   0.995  200   30    0.2            0.1              0.2              0.1
FB15k-237  0.0005  1.0    200   200   0.3            0.4              0.5              0.1
WN18RR     0.003   1.0    200   30    0.2            0.2              0.3              0.1
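
For example, to reproduce the WN18RR result with the values above:

 CUDA_VISIBLE_DEVICES=0 python main.py --dataset WN18RR --num_iterations 500 --batch_size 128 \
                                       --lr 0.003 --dr 1.0 --edim 200 --rdim 30 --input_dropout 0.2 \
                                       --hidden_dropout1 0.2 --hidden_dropout2 0.3 --label_smoothing 0.1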

Requirements

The codebase is implemented in Python 3.6.6. Required packages are:

numpy      1.15.1
pytorch    1.0.1

Citation

If you found this codebase useful, please cite:

@inproceedings{balazevic2019tucker,
  title={TuckER: Tensor Factorization for Knowledge Graph Completion},
  author={Bala\v{z}evi\'c, Ivana and Allen, Carl and Hospedales, Timothy M},
  booktitle={Empirical Methods in Natural Language Processing},
  year={2019}
}
Comments
  • Unable to reproduce results on WN18RR

    Hi Ivana

    I am trying to get entity embeddings for a downstream application. For the WN18RR dataset, I was unable to reproduce the reported results of TuckER. I used the hyperparameters given in the README of this repo. The command I used was:

     CUDA_VISIBLE_DEVICES=3 python main.py --dataset WN18RR --num_iterations 500 --batch_size 128 \
                                           --lr 0.01 --dr 1.0 --edim 200 --rdim 30 --input_dropout 0.2 \
                                           --hidden_dropout1 0.2 --hidden_dropout2 0.3 --label_smoothing 0.1
    
    

    And the results are:

    495
    12.792492151260376
    0.00035594542557143403
    Validation:
    Number of data points: 6068
    Hits @10: 0.5121951219512195
    Hits @3: 0.4728081740276862
    Hits @1: 0.43638760711931446
    Mean rank: 6254.662491760053
    Mean reciprocal rank: 0.4624483298017613
    Test:
    Number of data points: 6268
    Hits @10: 0.5140395660497766
    Hits @3: 0.4738353541799617
    Hits @1: 0.43123803446075304
    Mean rank: 6595.924856413529
    Mean reciprocal rank: 0.45961590280892123
    5.328977823257446
    

    Should I increase the number of epochs or am I missing something?

    Thanks

    opened by apoorvumang 9
  • Reopening evaluation issue

    Hi,

    I was just going through your code and found that the training data has been augmented by adding new relations for reversed triples from the training set (correct me if I am wrong). I am not sure whether this is harmless, as it might have a regularizing effect on the weights the model learns.

    Instead of adding new relations for reversing the triples, could you try the following and check whether this gives the same result?

    1. Create d.train_data_reversed, where for each triple from d.train_data you only switch e_s and e_o and keep the relation. (So you don't create any new relations in this dataset.)
    2. Add to class TuckER a method forward_reversed that is exactly the same as forward, but transposes the tensor W, so that the axes for e_s and e_o are switched.
    3. When training, use forward for d.train_data and use forward_reversed for d.train_data_reversed

    I think this way one can guarantee that the evaluation is fair. It would also be interesting to know how you evaluate the other models you compare with, for example, whether you use the BCE loss and augment the training data for them as well. This would make sure that it is not the BCE loss or the data augmentation that helps TuckER perform well.
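
    A minimal sketch of the forward_reversed suggested in step 2, which reuses the same core tensor with its subject/object modes transposed (hypothetical code, not part of the repo):

        import torch

        def tucker_score(W, e_s, w_r, E_all):
            W_r = torch.einsum('br,erf->bef', w_r, W)  # contract relation into core
            x = torch.einsum('be,bef->bf', e_s, W_r)   # contract subject
            return x @ E_all.t()                       # score all candidate objects

        def tucker_score_reversed(W, e_o, w_r, E_all):
            # Same computation with the e_s and e_o axes of W swapped,
            # so no new "_reverse" relations need to be created.
            return tucker_score(W.transpose(0, 2), e_o, w_r, E_all)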

    opened by dschaehi 8
  • Parameters for reproducing results from paper

    Can you provide the parameters for reproducing the results from the paper on FB15k and FB15k-237? I ran the command from the README:

     CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128 \
                                           --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 \
                                           --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1
    

    which gave a final performance of:

    Validation:
    Number of data points: 35070
    Hits @10: 0.4009124607927003
    Hits @3: 0.2555460507556316
    Hits @1: 0.1760193897918449
    Mean rank: 291.46401482748786
    Mean reciprocal rank: 0.24741750020439274
    Test:
    Number of data points: 40932
    Hits @10: 0.3974396560148539
    Hits @3: 0.2546662757744552
    Hits @1: 0.17094205022964917
    Mean rank: 304.61949086289457
    Mean reciprocal rank: 0.24344486414937788
    

    Any ideas?

    UPDATE: I noticed in the paper that you mention the best learning rate for FB15k-237 is 0.005 rather than 0.0005, and the best learning rate decay is 0.995 rather than 1.0 -- might that be the issue?

    opened by bkj 7
  • why do you have reverse triples in evaluation?

    In the code

    self.valid_data = self.load_data(data_dir, "valid", reverse=reverse)
    self.test_data = self.load_data(data_dir, "test", reverse=reverse)
    

    it should be

    self.valid_data = self.load_data(data_dir, "valid", reverse=False)
    self.test_data = self.load_data(data_dir, "test", reverse=False)
    

    I tested with this change, and the results it shows are much better than those reported in the paper. Please let me know if I am wrong.

    opened by apoorvumang 3
  • Could the one-way evaluation be a problem?

    Hi,

    I have a question on the evaluation in the code.

    When the test rank is evaluated, the scores only seem to be calculated for each head against all tails. I didn't see scores being calculated for each tail against all heads in the code. Don't people usually calculate both and average them as the final scores? Could this one-way evaluation be a problem, e.g. introduce some bias?

    Thank you!

    opened by Xiaobeing 2
  • Why set "padding_idx=0" in nn.Embedding

    Hi~ I have found that the code sets "padding_idx=0" in nn.Embedding, like:

    self.E = torch.nn.Embedding(len(d.entities), d1, padding_idx=0)
    self.R = torch.nn.Embedding(len(d.relations), d2, padding_idx=0)

    However, this makes the gradient of the first entity and the first relation become zero. This is very interesting and I want to know the reason for it. Thank you!
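
    A minimal demonstration of the behaviour being asked about (standard PyTorch semantics: the embedding row at padding_idx never receives a gradient):

        import torch

        emb = torch.nn.Embedding(5, 3, padding_idx=0)
        out = emb(torch.tensor([0, 1]))
        out.sum().backward()
        print(emb.weight.grad[0])  # tensor([0., 0., 0.]) -- row 0 stays frozen
        print(emb.weight.grad[1])  # tensor([1., 1., 1.])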

    opened by THUCSTHanxu13 2
  • Hyperparameters for Yago3-10

    Hi, thanks for developing this amazing model. I'd like to try and train it on the Yago3-10 dataset (I think you have used it in your other work titled "Hypernetwork Knowledge Graph Embeddings").

    Have you ever tried to train TuckER on that dataset? Can you suggest any hyperparameter settings before I start running a long grid search? :)

    Thanks for your help!

    Andrea

    opened by AndRossi 2
  • question about evaluation

    In the paper, you say

    for a given triple, we generate 2*n_e test triples by

    1. keeping the subject entity e_s and relation r fixed and replacing the object entity e_o with all possible entities E and by
    2. keeping the object entity e_o and relation r fixed and replacing the subject entity e_s with all entities E.

    In the evaluate function, it looks like you score all possible e_o's given an (e_s, r) pair, then compute the rank of the true e_o. So I see how you're doing 1) above, but are you actually doing 2)?
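
    For what it's worth, the data loader augments every split with reciprocal triples, so ranking tails on a reversed triple is exactly head prediction on the original one; a minimal illustration (hypothetical helper, not the repo's exact API):

        def add_reverse(triples):
            # For each (s, r, o), also create (o, r_reverse, s).
            return triples + [(o, r + "_reverse", s) for (s, r, o) in triples]

        test = [("tokyo", "capital_of", "japan")]
        print(add_reverse(test))
        # [('tokyo', 'capital_of', 'japan'), ('japan', 'capital_of_reverse', 'tokyo')]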

    Thanks! ~ Ben

    opened by bkj 2
  • Reverse flag implementation

    Hey, if I just change line 194 (d = Data(data_dir=data_dir, reverse=True)) in main.py to use reverse=False and run the code for FB15k-237 with the recommended settings, the MRR shoots up to 0.4067. Is this expected behaviour?

    To replicate:

    1. Just change reverse=False in main.py
    2. CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128 --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1

    MRR keeps increasing.

    Log for iteration 145:

    145
    21.162700176239014
    0.001321860825107114
    Test:
    Number of data points: 20466
    Hits @10: 0.6135053259063813
    Hits @3: 0.4763998827323366
    Hits @1: 0.3409557314570507
    Mean rank: 147.47825662073683
    Mean reciprocal rank: 0.4335430118109515

    opened by luffycodes 1
  • Data processing

    def load_data(self, data_dir, data_type="train", reverse=False):
        with open("%s%s.txt" % (data_dir, data_type), "r") as f:
            data = f.read().strip().split("\n")
            data = [i.split() for i in data]
            if reverse:
                data += [[i[2], i[1]+"_reverse", i[0]] for i in data]
        return data

    Are you sure the data should be data = [i.split() for i in data] and not data = [i.split("\t") for i in data]? Your data is separated by "\t", but you split on whitespace. If I use data = [i.split("\t") for i in data], I cannot get the FB15k-237 result you report in your paper. Can you explain this?
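
    For lines whose fields contain no internal spaces (as in FB15k-style entity and relation IDs), whitespace splitting and tab splitting agree; a minimal check:

        line = "/m/027rn\t/location/country/form_of_government\t/m/06cx9"
        print(line.split() == line.split("\t"))  # True: fields contain no spaces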

    opened by ToneLi 1
  • Do you only test tails in the evaluation?

    I am a bit confused about the evaluation protocol. In the evaluation, you only feed (head, rel) to the model and get predictions with n elements representing the scores of (head, rel, t_1) ... (head, rel, t_n). Why don't you repeat this process for the tail? Could you explain the reason? I think it should be done, right? Previous works all conduct the evaluation this way. Maybe I misunderstand your code. Looking forward to your reply.

    opened by liu-jc 1
  • Why not use a 1-x score function?

    Hi, thanks for your elegant code and work!
    I am wondering how to implement a 1-x score function (the x in 1-x is the number of entities used to form the loss; 1-N uses all entities). In my opinion, 1-x should perform much better than 1-N, because 1-N is hard to train in high dimensions. So why not use a 1-x score function?
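
    For context, a minimal sketch of the 1-N training setup in question, where each (e_s, r) pair is scored against every entity at once and trained with binary cross-entropy (illustrative shapes, not the repo's exact code):

        import torch

        batch, n_entities = 128, 14541        # FB15k-237-sized entity set
        scores = torch.randn(batch, n_entities, requires_grad=True)  # model logits
        targets = torch.zeros(batch, n_entities)                     # multi-label 0/1
        targets[torch.arange(batch), torch.randint(n_entities, (batch,))] = 1.0
        loss = torch.nn.functional.binary_cross_entropy_with_logits(scores, targets)
        loss.backward()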

    opened by quqxui 0
  • Unable to reproduce results on FB15k

    Hey, I ran the code with the suggested parameters; however, I was not able to reproduce the results on FB15k.

    On FB15k, I got the following MRR (the best MRR is 0.789 in 500 epochs):

    500
    30.736143827438354
    0.00022165691843464258
    Validation:
    Number of data points: 100000
    Hits @10: 0.88763
    Hits @3: 0.8288
    Hits @1: 0.73175
    Mean rank: 39.8066
    Mean reciprocal rank: 0.789614621151087
    Test:
    Number of data points: 118142
    Hits @10: 0.8898613532867228
    Hits @3: 0.8294086776929458
    Hits @1: 0.7293595842291479
    Mean rank: 38.221682382218006
    Mean reciprocal rank: 0.7889229105464421

    opened by luffycodes 0
Owner
Ivana Balazevic
PhD candidate in Machine Learning @ University of Edinburgh. Ex Research Scientist Intern @ Facebook AI Research (FAIR) and @ Samsung AI.