A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

Overview

KGEval

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

The framework and experimental results are described in Ben Rim et al. 2021 (Outstanding Paper Award, AKBC 2021).

Instructions

Create a virtual environment

virtualenv -p python3.6 eval_env
source eval_env/bin/activate
pip install -r requirements.txt

Download data

In the main folder, run:

source data/download.sh

Download model

If you want to test the framework immediately, you can download pre-trained Pykeen models by running:

source download_models.sh

Generate behavioral tests

Symmetry Tests

Can choose --dataset FB15K237, WN18RR, YAGO310

python tests/run.py --dataset FB15K237 --mode generate --capability symmetry

This should result into the following output, and the files for each test set will be added under behavioral_tests\dataset\symmetry:

2021-10-03 23:37:35,060 - [INFO] - Preparing test sets for the dataset FB15K237
2021-10-03 23:37:37,621 - [INFO] - ########################## <----TRAIN---> ############################
2021-10-03 23:37:37,621 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,621 - [INFO] - 272115 triples remaining in train set
2021-10-03 23:37:37,621 - [INFO] - 6778 symmetric triples found in train set
2021-10-03 23:37:37,786 - [INFO] - ########################## <----TEST---> ############################
2021-10-03 23:37:37,786 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,786 - [INFO] - 20466 triples remaining in test set
2021-10-03 23:37:37,786 - [INFO] - 113 symmetric triples found in test set
2021-10-03 23:37:37,806 - [INFO] - ########################## <----VALID---> ############################
2021-10-03 23:37:37,806 - [INFO] - 0 repetitions removed
2021-10-03 23:37:37,806 - [INFO] - 17535 triples remaining in valid set
2021-10-03 23:37:37,806 - [INFO] - 113 symmetric triples found in valid set
2021-10-03 23:37:39,106 - [INFO] - #################### <---TEST SET 1: MEMORIZATION ---> ##########################
2021-10-03 23:37:39,106 - [INFO] - There are 5470 entries in the memorization set (occur in both directions)
2021-10-03 23:37:39,106 - [INFO] - #################### <---TEST SET 2: ONE DIRECTION SEEN ---> ##########################
2021-10-03 23:37:39,106 - [INFO] - There are 1308 entries not shown in both directions (to be reversed for testing)
2021-10-03 23:37:39,836 - [INFO] - #################### <--- SYMMETRIC RELATIONS ---> ##########################
2021-10-03 23:37:39,836 - [INFO] - TRAIN SET contains 6778 symmetric entries
2021-10-03 23:37:39,836 - [INFO] - TEST SET contains  113 symmetric entries with 113 not in training
2021-10-03 23:37:39,836 - [INFO] - VALID SET contains 113 symmetric entries with 113 not in training
2021-10-03 23:37:39,839 - [INFO] - #################### <---TEST SET 3: UNSEEN INSTANCES ---> ##########################
2021-10-03 23:37:39,840 - [INFO] - There are 226 entries that are not seen in any direction in training
2021-10-03 23:37:40,267 - [INFO] - #################### <---TEST SET 4: ASYMMETRY ---> ##########################
2021-10-03 23:37:40,267 - [INFO] - There are 3000 asymmetric entries in test set added to test 4

Hierarchy Tests

Only available for FB15K237 dataset

python tests/run.py --dataset FB15K237 --mode generate --capability hierarchy

The output should be and will be available under behavioral_tests/dataset/hierarchy/, the naming of the files corresponds to triples where the tail belongs to a specified level. For example, 1.txt contains triples where the tail has a type of level 1 in the entity type hierarchy :

2021-10-04 01:38:13,517 - [INFO] - Results of Hierarchy Behavioral Tests for FB15K237
2021-10-04 01:38:20,367 - [INFO] - <--------------- Entity Hiararchy statistics ----------------->
2021-10-04 01:38:20,568 - [INFO] - Level 0 contains 1 types and 3415 triples
2021-10-04 01:38:20,887 - [INFO] - Level 1 contains 66 types and 2006 triples
2021-10-04 01:38:20,900 - [INFO] - Level 2 contains 136 types and 4273 triples
2021-10-04 01:38:20,913 - [INFO] - Level 3 contains 213 types and 3560 triples
2021-10-04 01:38:20,923 - [INFO] - Level 4 contains 262 types and 3369 triples

Run Tests (pykeen models)

Symmetry behavioral tests on distmult or rotate:

python tests/run.py --dataset FB15K237 --mode test --model_name rotate

The output will be printed as shown below, and will also be available in the results folder under dataset/symmetry:

2021-10-04 14:00:57,100 - [INFO] - Starting test1 with rotate model
2021-10-04 14:03:23,249 - [INFO] - On test1, MR: 1.2407678244972578, MRR: 0.9400152688974949, Hits@1: 0.9014624953269958, Hits@5: 0.988482654094696, Hits@10: 0.9965264797210693
2021-10-04 14:03:23,249 - [INFO] - Starting test2 with rotate model
2021-10-04 14:04:15,614 - [INFO] - On test2, MR: 23.446483180428135, MRR: 0.4409348919640765, Hits@1: 0.30351680517196655, Hits@5: 0.5894495248794556, Hits@10: 0.7025994062423706
2021-10-04 14:04:15,614 - [INFO] - Starting test3 with rotate model
2021-10-04 14:04:25,364 - [INFO] - On test3, MR: 1018.9469026548672, MRR: 0.04786047740344238, Hits@1: 0.008849557489156723, Hits@5: 0.06194690242409706, Hits@10: 0.12389380484819412
2021-10-04 14:04:25,365 - [INFO] - Starting test4 with rotate model
2021-10-04 14:05:38,900 - [INFO] - On test4, MR: 4901.459, MRR: 0.07606098649786266, Hits@1: 0.9496666789054871, Hits@5: 0.893666684627533, Hits@10: 0.8823333382606506

Hierarchy behavioral tests on distmult or rotate:

   python tests/run.py --dataset FB15K237 --mode test --capability hierarchy --model_name rotate

Run Tests on other models and other frameworks

(To be added)

You might also like...
Convolutional 2D Knowledge Graph Embeddings resources

ConvE Convolutional 2D Knowledge Graph Embeddings resources. Paper: Convolutional 2D Knowledge Graph Embeddings Used in the paper, but do not use thes

open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021

AAGCN-ACSA EMNLP 2021 Introduction This repository was used in our paper: Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment An

Switch spaces for knowledge graph embeddings

SwisE Switch spaces for knowledge graph embeddings. Requirements: python3 pytorch numpy tqdm Reproduce the results To reproduce the reported results,

Large-scale Knowledge Graph Construction with Prompting

Large-scale Knowledge Graph Construction with Prompting across tasks (predictive and generative), and modalities (language, image, vision + language, etc.)

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

CrossNER is a fully-labeled collected of named entity recognition (NER) data spanning over five diverse domains (Politics, Natural Science, Music, Literature, and Artificial Intelligence) with specialized entity categories for different domains.

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics. Jury offers a smooth and easy-to-use interface. It uses datasets for underlying metric computation, and hence adding custom metric is easy as adopting datasets.Metric.

A library for finding knowledge neurons in pretrained transformer models.
A library for finding knowledge neurons in pretrained transformer models.

knowledge-neurons An open source repository replicating the 2021 paper Knowledge Neurons in Pretrained Transformers by Dai et al., and extending the t

EMNLP'2021: Can Language Models be Biomedical Knowledge Bases?
EMNLP'2021: Can Language Models be Biomedical Knowledge Bases?

BioLAMA BioLAMA is biomedical factual knowledge triples for probing biomedical LMs. The triples are collected and pre-processed from three sources: CT

Owner
NEC Laboratories Europe
Research software developed at NEC Laboratories Europe
NEC Laboratories Europe
[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

Yongming Rao 89 Dec 18, 2022
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 9.7k Jan 9, 2023
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 7k Feb 18, 2021
Code for evaluating Japanese pretrained models provided by NTT Ltd.

japanese-dialog-transformers 日本語の説明文はこちら This repository provides the information necessary to evaluate the Japanese Transformer Encoder-decoder dialo

NTT Communication Science Laboratories 216 Dec 22, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing ?? ?? ?? We released the 2.0.0 version with TF2 Support. ?? ?? ?? If you

Eliyar Eziz 2.3k Dec 29, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing ?? ?? ?? We released the 2.0.0 version with TF2 Support. ?? ?? ?? If you

Eliyar Eziz 2k Feb 9, 2021
Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

?? Fingerprinting Fine-tuned Language Models in the wild This is the code and dataset for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned La

LCS2-IIITDelhi 5 Sep 13, 2022
Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning

Ekstra Bladet 141 Dec 30, 2022
Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

wangle 823 Dec 28, 2022