🤖 A Python library for learning and evaluating knowledge graph embeddings

PyKEEN

Last update: Jan 9, 2023


Overview


PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).


Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI with:

$ pip install pykeen

The latest development version of PyKEEN can be installed directly from the source on GitHub with:

$ pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.

Quickstart

This example shows how to train and evaluate a model on a built-in dataset.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
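The quickstart names two defaults: the TransE model and sLCWA training. The following is a minimal, library-independent sketch in plain Python of what the TransE interaction scores and how a basic sLCWA-style negative sampler corrupts a positive triple. It is not PyKEEN's implementation. The helpers `transe_score`, `corrupt`, and `score` and the toy embeddings are hypothetical, and real samplers operate on batched tensors.

```python
import random


def transe_score(h, r, t, p=2):
    """TransE plausibility: the negative L_p distance between h + r and t (higher is better)."""
    distance = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1 / p)
    return -distance


def corrupt(triple, num_entities, rng):
    """Basic sLCWA-style corruption: replace the head or the tail (50/50) with a random entity.

    This sketch does not filter out corruptions that happen to be true triples.
    """
    h, r, t = triple
    e = rng.randrange(num_entities)
    return (e, r, t) if rng.random() < 0.5 else (h, r, e)


# Hypothetical 2-dimensional toy embeddings for 3 entities and 1 relation.
entity_emb = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
relation_emb = [[1.0, 0.0]]


def score(triple):
    h, r, t = triple
    return transe_score(entity_emb[h], relation_emb[r], entity_emb[t])


positive = (0, 0, 1)  # h + r lands exactly on t, so the distance is zero
negative = corrupt(positive, num_entities=len(entity_emb), rng=random.Random(0))
print(score(positive), score(negative))
```

sLCWA training then pushes the scores of positives above those of their sampled corruptions, using one of the losses listed further below.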

PyKEEN is extensible such that:

Each model has the same API, so anything from pykeen.models can be dropped in
Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
Triples factories can be generated by the user with pykeen.triples.TriplesFactory

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in PyKEEN.

Datasets (27)

The following datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know here.

Name	Documentation	Citation	Entities	Relations	Triples
Clinical Knowledge Graph	`pykeen.datasets.CKG`	Santos et al., 2020	7617419	11	26691525
CN3l Family	`pykeen.datasets.CN3l`	Chen et al., 2017	3206	42	21777
CoDEx (large)	`pykeen.datasets.CoDExLarge`	Safavi et al., 2020	77951	69	612437
CoDEx (medium)	`pykeen.datasets.CoDExMedium`	Safavi et al., 2020	17050	51	206205
CoDEx (small)	`pykeen.datasets.CoDExSmall`	Safavi et al., 2020	2034	42	36543
ConceptNet	`pykeen.datasets.ConceptNet`	Speer et al., 2017	28370083	50	34074917
Countries	`pykeen.datasets.Countries`	Bouchard et al., 2015	271	2	1158
Commonsense Knowledge Graph	`pykeen.datasets.CSKG`	Ilievski et al., 2020	2087833	58	4598728
DB100K	`pykeen.datasets.DB100K`	Ding et al., 2018	99604	470	697479
DBpedia50	`pykeen.datasets.DBpedia50`	Shi et al., 2017	24624	351	34421
Drug Repositioning Knowledge Graph	`pykeen.datasets.DRKG`	`gnn4dr/DRKG`	97238	107	5874257
FB15k	`pykeen.datasets.FB15k`	Bordes et al., 2013	14951	1345	592213
FB15k-237	`pykeen.datasets.FB15k237`	Toutanova et al., 2015	14505	237	310079
Hetionet	`pykeen.datasets.Hetionet`	Himmelstein et al., 2017	45158	24	2250197
Kinships	`pykeen.datasets.Kinships`	Kemp et al., 2006	104	25	10686
Nations	`pykeen.datasets.Nations`	`ZhenfengLei/KGDatasets`	14	55	1992
OGB BioKG	`pykeen.datasets.OGBBioKG`	Hu et al., 2020	45085	51	5088433
OGB WikiKG	`pykeen.datasets.OGBWikiKG`	Hu et al., 2020	2500604	535	17137181
OpenBioLink	`pykeen.datasets.OpenBioLink`	Breit et al., 2020	180992	28	4563407
OpenBioLink (LQ)	`pykeen.datasets.OpenBioLinkLQ`	Breit et al., 2020	480876	32	27320889
Unified Medical Language System	`pykeen.datasets.UMLS`	`ZhenfengLei/KGDatasets`	135	46	6529
WD50K (triples)	`pykeen.datasets.WD50KT`	Galkin et al., 2020	40107	473	232344
WK3l-120k Family	`pykeen.datasets.WK3l120k`	Chen et al., 2017	119748	3109	1375406
WK3l-15k Family	`pykeen.datasets.WK3l15k`	Chen et al., 2017	15126	1841	209041
WordNet-18	`pykeen.datasets.WN18`	Bordes et al., 2014	40943	18	151442
WordNet-18 (RR)	`pykeen.datasets.WN18RR`	Toutanova et al., 2015	40559	11	92583
YAGO3-10	`pykeen.datasets.YAGO310`	Mahdisoltani et al., 2015	123143	37	1089000

Models (30)

Name	Reference	Citation
CompGCN	`pykeen.models.CompGCN`	Vashishth et al., 2020
ComplEx	`pykeen.models.ComplEx`	Trouillon et al., 2016
ComplEx Literal	`pykeen.models.ComplExLiteral`	Kristiadi et al., 2018
ConvE	`pykeen.models.ConvE`	Dettmers et al., 2018
ConvKB	`pykeen.models.ConvKB`	Nguyen et al., 2018
CrossE	`pykeen.models.CrossE`	Zhang et al., 2019
DistMA	`pykeen.models.DistMA`	Shi et al., 2019
DistMult	`pykeen.models.DistMult`	Yang et al., 2014
DistMult Literal	`pykeen.models.DistMultLiteral`	Kristiadi et al., 2018
ER-MLP	`pykeen.models.ERMLP`	Dong et al., 2014
ER-MLP (E)	`pykeen.models.ERMLPE`	Sharifzadeh et al., 2019
HolE	`pykeen.models.HolE`	Nickel et al., 2016
KG2E	`pykeen.models.KG2E`	He et al., 2015
MuRE	`pykeen.models.MuRE`	Balažević et al., 2019
NTN	`pykeen.models.NTN`	Socher et al., 2013
PairRE	`pykeen.models.PairRE`	Chao et al., 2020
ProjE	`pykeen.models.ProjE`	Shi et al., 2017
QuatE	`pykeen.models.QuatE`	Zhang et al., 2019
RESCAL	`pykeen.models.RESCAL`	Nickel et al., 2011
R-GCN	`pykeen.models.RGCN`	Schlichtkrull et al., 2018
RotatE	`pykeen.models.RotatE`	Sun et al., 2019
SimplE	`pykeen.models.SimplE`	Kazemi et al., 2018
Structured Embedding	`pykeen.models.StructuredEmbedding`	Bordes et al., 2011
TorusE	`pykeen.models.TorusE`	Ebisu et al., 2018
TransD	`pykeen.models.TransD`	Ji et al., 2015
TransE	`pykeen.models.TransE`	Bordes et al., 2013
TransH	`pykeen.models.TransH`	Wang et al., 2014
TransR	`pykeen.models.TransR`	Lin et al., 2015
TuckER	`pykeen.models.TuckER`	Balažević et al., 2019
Unstructured Model	`pykeen.models.UnstructuredModel`	Bordes et al., 2014

Losses (7)

Name	Reference	Description
Binary cross entropy (after sigmoid)	`pykeen.losses.BCEAfterSigmoidLoss`	A module for the numerically unstable version of explicit Sigmoid + BCE loss.
Binary cross entropy (with logits)	`pykeen.losses.BCEWithLogitsLoss`	A module for the binary cross entropy loss.
Cross entropy	`pykeen.losses.CrossEntropyLoss`	A module for the cross entropy loss that evaluates the cross entropy after softmax output.
Margin ranking	`pykeen.losses.MarginRankingLoss`	A module for the margin ranking loss.
Mean square error	`pykeen.losses.MSELoss`	A module for the mean square error loss.
Self-adversarial negative sampling	`pykeen.losses.NSSALoss`	An implementation of the self-adversarial negative sampling loss function proposed by Sun et al. (2019).
Softplus	`pykeen.losses.SoftplusLoss`	A module for the softplus loss.
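The sketch below implements three of these losses in plain Python for a single positive score and its negatives. Higher scores mean more plausible triples. The function names are hypothetical, and label handling and reductions are simplified relative to PyKEEN. The NSSA terms follow the copy of NSSALoss.forward shown in the BoxE test code further down, combined with a softmax weighting over negative scores as we understand the self-adversarial scheme.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_ranking_loss(pos, negs, margin=1.0):
    """Mean of max(0, margin - pos + neg) over the negatives."""
    return sum(max(0.0, margin - pos + n) for n in negs) / len(negs)


def softplus_loss(scores, labels):
    """Mean of softplus(-y * s) = -log(sigmoid(y * s)), assuming labels y in {+1, -1}."""
    return sum(-log_sigmoid(y * s) for s, y in zip(scores, labels)) / len(scores)


def nssa_loss(pos, negs, margin=9.0, temperature=1.0):
    """Self-adversarial negative sampling loss for one positive score and its negative scores.

    Negatives are weighted by softmax(temperature * score), so harder (higher-scoring)
    negatives contribute more; temperature=0 gives uniform weights.
    """
    shift = max(temperature * n for n in negs)  # subtract the max for a stable softmax
    exps = [math.exp(temperature * n - shift) for n in negs]
    total = sum(exps)
    weights = [e / total for e in exps]
    pos_term = -log_sigmoid(margin + pos)
    neg_term = -sum(w * log_sigmoid(-n - margin) for w, n in zip(weights, negs))
    return pos_term + neg_term


print(margin_ranking_loss(2.0, [0.0, 1.5]), nssa_loss(5.0, [4.0, -4.0]))
```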

Regularizers (5)

Name	Reference	Description
combined	`pykeen.regularizers.CombinedRegularizer`	A convex combination of regularizers.
lp	`pykeen.regularizers.LpRegularizer`	A simple L_p norm based regularizer.
no	`pykeen.regularizers.NoRegularizer`	A regularizer which does not perform any regularization.
powersum	`pykeen.regularizers.PowerSumRegularizer`	A simple x^p based regularizer.
transh	`pykeen.regularizers.TransHRegularizer`	A regularizer for the soft constraints in TransH.
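The lp and powersum regularizers differ mainly in whether the p-th root is taken. Here is a minimal sketch, assuming a flat list of parameter values and a fixed weight. The function names are hypothetical, and the real classes may also handle normalization and accumulation across batches.

```python
def lp_regularizer(x, p=2.0, weight=1.0):
    """weight * ||x||_p (the L_p norm, including the p-th root)."""
    return weight * sum(abs(v) ** p for v in x) ** (1.0 / p)


def power_sum_regularizer(x, p=2.0, weight=1.0):
    """weight * sum_i |x_i|^p (no root, unlike the L_p norm)."""
    return weight * sum(abs(v) ** p for v in x)


def combined_regularizer(x, regularizers, coefficients):
    """Convex combination: non-negative coefficients normalized to sum to 1."""
    total = sum(coefficients)
    return sum((c / total) * reg(x) for reg, c in zip(regularizers, coefficients))


params = [3.0, 4.0]
print(lp_regularizer(params), power_sum_regularizer(params))
```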

Optimizers (6)

Name	Reference	Description
adadelta	`torch.optim.Adadelta`	Implements Adadelta algorithm.
adagrad	`torch.optim.Adagrad`	Implements Adagrad algorithm.
adam	`torch.optim.Adam`	Implements Adam algorithm.
adamax	`torch.optim.Adamax`	Implements Adamax algorithm (a variant of Adam based on infinity norm).
adamw	`torch.optim.AdamW`	Implements AdamW algorithm.
sgd	`torch.optim.SGD`	Implements stochastic gradient descent (optionally with momentum).

Training Loops (2)

Name	Reference	Description
lcwa	`pykeen.training.LCWATrainingLoop`	A training loop that uses the local closed world assumption training approach.
slcwa	`pykeen.training.SLCWATrainingLoop`	A training loop that uses the stochastic local closed world assumption training approach.

Negative Samplers (3)

Name	Reference	Description
basic	`pykeen.sampling.BasicNegativeSampler`	A basic negative sampler.
bernoulli	`pykeen.sampling.BernoulliNegativeSampler`	An implementation of the Bernoulli negative sampling approach proposed by Wang et al. (2014).
pseudotyped	`pykeen.sampling.PseudoTypedNegativeSampler`	A sampler that accounts for which entities co-occur with a relation.
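The Bernoulli strategy decides, per relation, whether to corrupt the head or the tail. As we understand the approach of Wang et al. (2014), the head is replaced with probability tph / (tph + hpt). Here tph is the average number of tails per head and hpt is the average number of heads per tail. One-to-many relations therefore mostly get their heads corrupted, which is less likely to produce a false negative. A plain-Python sketch with hypothetical helper names:

```python
import random
from collections import defaultdict


def bernoulli_head_probabilities(triples):
    """Per relation r, the probability of corrupting the head: tph / (tph + hpt)."""
    tails = defaultdict(set)  # (r, h) -> distinct tails
    heads = defaultdict(set)  # (r, t) -> distinct heads
    for h, r, t in triples:
        tails[(r, h)].add(t)
        heads[(r, t)].add(h)
    tail_counts = defaultdict(list)
    head_counts = defaultdict(list)
    for (r, _), ts in tails.items():
        tail_counts[r].append(len(ts))
    for (r, _), hs in heads.items():
        head_counts[r].append(len(hs))
    probs = {}
    for r in tail_counts:
        tph = sum(tail_counts[r]) / len(tail_counts[r])  # average tails per head
        hpt = sum(head_counts[r]) / len(head_counts[r])  # average heads per tail
        probs[r] = tph / (tph + hpt)
    return probs


def bernoulli_corrupt(triple, probs, num_entities, rng):
    """Corrupt the head with probability probs[r], otherwise the tail."""
    h, r, t = triple
    e = rng.randrange(num_entities)
    return (e, r, t) if rng.random() < probs[r] else (h, r, e)


# Relation 0 is one-to-many, relation 1 is many-to-one.
triples = [(0, 0, 1), (0, 0, 2), (0, 0, 3), (4, 1, 5), (6, 1, 5)]
probs = bernoulli_head_probabilities(triples)
negative = bernoulli_corrupt((0, 0, 1), probs, num_entities=7, rng=random.Random(1))
```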

Stoppers (2)

Name	Reference	Description
early	`pykeen.stoppers.EarlyStopper`	A harness for early stopping.
nop	`pykeen.stoppers.NopStopper`	A stopper that does nothing.
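The sketch below shows the early-stopping logic under two assumptions. First, an improvement must beat the best value so far by a relative margin (`relative_delta`). Second, training stops after `patience` evaluations without such an improvement. It would be called once every `frequency` epochs, mirroring the `stopper_kwargs` used in the issue further down. The class is hypothetical and is not PyKEEN's EarlyStopper.

```python
class SimpleEarlyStopper:
    """Hypothetical stopper: stop after `patience` evaluations without a relative improvement."""

    def __init__(self, patience=2, relative_delta=0.01, larger_is_better=True):
        self.patience = patience
        self.relative_delta = relative_delta
        self.larger_is_better = larger_is_better
        self.best = None
        self.remaining = patience

    def _is_improvement(self, value):
        if self.best is None:
            return True
        if self.larger_is_better:
            return value > self.best * (1.0 + self.relative_delta)
        return value < self.best * (1.0 - self.relative_delta)

    def report(self, value):
        """Record one evaluation result (e.g. every `frequency` epochs); return True to stop."""
        if self._is_improvement(value):
            self.best = value
            self.remaining = self.patience
        else:
            self.remaining -= 1
        return self.remaining <= 0


stopper = SimpleEarlyStopper(patience=2, relative_delta=0.05)
stopped_at = None
for evaluation, value in enumerate([0.20, 0.30, 0.31, 0.305, 0.40]):
    if stopper.report(value):
        stopped_at = evaluation
        break
```

Here 0.31 and 0.305 do not beat 0.30 by 5%, so the loop stops at the fourth evaluation and never sees 0.40.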

Evaluators (2)

Name	Reference	Description
rankbased	`pykeen.evaluation.RankBasedEvaluator`	A rank-based evaluator for KGE models.
sklearn	`pykeen.evaluation.SklearnEvaluator`	An evaluator that uses a Scikit-learn metric.

Metrics (16)

Name	Description
AUC-ROC	The area under the ROC curve, on [0, 1]. Higher is better.
Adjusted Arithmetic Mean Rank (AAMR)	The mean over all chance-adjusted ranks, on (0, 2). Lower is better.
Adjusted Arithmetic Mean Rank Index (AAMRI)	The re-indexed adjusted mean rank (AAMR), on [-1, 1]. Higher is better.
Average Precision	The area under the precision-recall curve, on [0, 1]. Higher is better.
Geometric Mean Rank (GMR)	The geometric mean over all ranks, on [1, inf). Lower is better.
Harmonic Mean Rank (HMR)	The harmonic mean over all ranks, on [1, inf). Lower is better.
Hits @ K	The relative frequency of ranks not larger than a given k, on [0, 1]. Higher is better.
Inverse Arithmetic Mean Rank (IAMR)	The inverse of the arithmetic mean over all ranks, on (0, 1]. Higher is better.
Inverse Geometric Mean Rank (IGMR)	The inverse of the geometric mean over all ranks, on (0, 1]. Higher is better.
Inverse Median Rank	The inverse of the median over all ranks, on (0, 1]. Higher is better.
Mean Rank (MR)	The arithmetic mean over all ranks, on [1, inf). Lower is better.
Mean Reciprocal Rank (MRR)	The inverse of the harmonic mean over all ranks, on (0, 1]. Higher is better.
Median Rank	The median over all ranks, on [1, inf). Lower is better.
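Most of these metrics are simple aggregations of the ranks produced by the rank-based evaluator. The sketch below computes several of them in plain Python. It assumes ranks are already computed with ties resolved. For the adjusted metrics, it assumes the chance baseline is the expected rank (N + 1) / 2 under uniformly random scoring among N candidates. The function and key names are hypothetical.

```python
import math
import statistics


def rank_metrics(ranks, num_candidates, k=10):
    """Aggregate per-task ranks into several rank-based metrics.

    ranks[i] is the rank of the true entity in ranking task i, and num_candidates[i]
    is the number of entities it was ranked against (including itself).
    """
    n = len(ranks)
    mean_rank = sum(ranks) / n
    # chance baseline: expected rank (N + 1) / 2 under uniformly random scoring
    expected_mean_rank = sum((c + 1) / 2 for c in num_candidates) / n
    geometric_mean_rank = math.exp(sum(math.log(r) for r in ranks) / n)
    median_rank = statistics.median(ranks)
    return {
        "mean_rank": mean_rank,
        "inverse_arithmetic_mean_rank": 1 / mean_rank,
        "mean_reciprocal_rank": sum(1 / r for r in ranks) / n,
        "harmonic_mean_rank": n / sum(1 / r for r in ranks),
        "geometric_mean_rank": geometric_mean_rank,
        "inverse_geometric_mean_rank": 1 / geometric_mean_rank,
        "median_rank": median_rank,
        "inverse_median_rank": 1 / median_rank,
        f"hits_at_{k}": sum(r <= k for r in ranks) / n,
        "adjusted_arithmetic_mean_rank": mean_rank / expected_mean_rank,
        "adjusted_arithmetic_mean_rank_index": 1 - (mean_rank - 1) / (expected_mean_rank - 1),
    }


metrics = rank_metrics(ranks=[1, 2, 4], num_candidates=[9, 9, 9], k=2)
```

This also makes the table's relationships visible: MRR is exactly the inverse of the harmonic mean rank.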

Trackers (7)

Name	Reference	Description
console	`pykeen.trackers.ConsoleResultTracker`	A class that directly prints to console.
csv	`pykeen.trackers.CSVResultTracker`	Tracking results to a CSV file.
json	`pykeen.trackers.JSONResultTracker`	Tracking results to a JSON lines file.
mlflow	`pykeen.trackers.MLFlowResultTracker`	A tracker for MLflow.
neptune	`pykeen.trackers.NeptuneResultTracker`	A tracker for Neptune.ai.
tensorboard	`pykeen.trackers.TensorBoardResultTracker`	A tracker for TensorBoard.
wandb	`pykeen.trackers.WANDBResultTracker`	A tracker for Weights and Biases.

Hyper-parameter Optimization

Samplers (3)

Name	Reference	Description
grid	`optuna.samplers.GridSampler`	Sampler using grid search.
random	`optuna.samplers.RandomSampler`	Sampler using random sampling.
tpe	`optuna.samplers.TPESampler`	Sampler using TPE (Tree-structured Parzen Estimator) algorithm.

Any sampler class extending optuna.samplers.BaseSampler, such as Optuna's sampler implementing the CMA-ES algorithm, can also be used.

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

$ pykeen experiments reproduce tucker balazevic2019 fb15k

Here, the three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

$ pykeen experiments ablation ~/path/to/config.json

Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:

@article{ali2020benchmarking,
  title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2006.13365},
  year={2020}
}

We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Funding

The development of PyKEEN has been funded by the following grants:

Funding Body	Program	Grant
DARPA	Automating Scientific Knowledge Extraction (ASKE)	HR00111990009
German Federal Ministry of Education and Research (BMBF)	Maschinelles Lernen mit Wissensgraphen (MLWin)	01IS18050D
German Federal Ministry of Education and Research (BMBF)	Munich Center for Machine Learning (MCML)	01IS18036A
Innovation Fund Denmark (Innovationsfonden)	Danish Center for Big Data Analytics driven Innovation (DABAI)	Grand Solutions

Logo

The PyKEEN logo was designed by Carina Steinborn.

Citation

If you have found PyKEEN useful in your work, please consider citing our article:

@article{ali2021pykeen,
    author = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
    journal = {Journal of Machine Learning Research},
    number = {82},
    pages = {1--6},
    title = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
    url = {http://jmlr.org/papers/v22/20-825.html},
    volume = {22},
    year = {2021}
}

Comments

💃 📦 Add BoxE

Closes #613

Following up on the issue (https://github.com/pykeen/pykeen/issues/613), I have now written a preliminary version of binary BoxE (which I called BoxE-KG) and am sharing it as-is, as you suggested. It is based on an interaction (BoxEKGInteraction), and the file with my additions is under src/pykeen/models/unimodal. I instantiated boxes analogously to my official code and set initializations consistently as well. I have also implemented the BoxE distance function and other helpers.
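For reference, here is a plain-Python sketch of a BoxE-style point-to-box distance. It is written from our reading of the BoxE paper, not from this PR's code. Inside the box, the offset from the center is divided by a width factor w = u - l + 1. Outside the box, the offset is multiplied by w. A correction term kappa = 0.5 (w - 1)(w - 1/w) keeps the two branches continuous at the boundary. The function names are hypothetical, and lower <= upper is assumed.

```python
def boxe_point_distance(point, lower, upper):
    """Per-dimension distance of a point to the axis-aligned box [lower, upper]."""
    distances = []
    for e, lo, hi in zip(point, lower, upper):
        width = hi - lo + 1.0
        center = (lo + hi) / 2.0
        offset = abs(e - center)
        if lo <= e <= hi:
            # inside: offset is shrunk, so positions within the box matter little
            distances.append(offset / width)
        else:
            # outside: offset is amplified; kappa makes both branches meet at the boundary
            kappa = 0.5 * (width - 1.0) * (width - 1.0 / width)
            distances.append(offset * width - kappa)
    return distances


def boxe_distance(point, lower, upper, p=2):
    """Aggregate the per-dimension distances with an L_p norm (cf. norm_order=2 in the test code)."""
    return sum(d ** p for d in boxe_point_distance(point, lower, upper)) ** (1.0 / p)


print(boxe_distance([2.0], [0.0], [2.0]), boxe_distance([4.0], [0.0], [2.0]))
```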

~I have tried to train the model, and this runs. However, I have had some erratic behavior with NSSALoss: When I run this in the forward function (to double-check, this is currently commented-out), I get sensible values. But when I pass this loss to the pipeline, the loss values are minuscule, starting at around 0.001! So I would appreciate some help understanding this. Otherwise, we can proceed from there to verify the model.~ <- This is all solved!!

Dependencies:

[x] #623
[x] https://github.com/pykeen/pykeen/pull/624
[x] https://github.com/pykeen/pykeen/pull/626

Test Code


import numpy as np
import torch

from pykeen.losses import NSSALoss
from pykeen.datasets import WN18RR
from pykeen.models.unimodal.boxe_kg import BoxE
from pykeen.pipeline import pipeline

from torch.nn import functional

# TODO: Align optimizer settings: Constant LR

class NSSALossLogging(NSSALoss):
    def forward(
        self,
        pos_scores: torch.FloatTensor,
        neg_scores: torch.FloatTensor,
        neg_weights: torch.FloatTensor,
    ) -> torch.FloatTensor:
        # copy of NSSALoss.forward
        neg_loss = functional.logsigmoid(-neg_scores - self.margin)
        neg_loss = neg_weights * neg_loss
        neg_loss = self._reduction_method(neg_loss)
        print("-", -neg_loss.item())

        pos_loss = functional.logsigmoid(self.margin + pos_scores)
        pos_loss = self._reduction_method(pos_loss)
        print("+", -pos_loss.item())

        loss = -pos_loss - neg_loss

        if self._reduction_method is torch.mean:
            loss = loss / 2.0

        return loss



def main():
    embedding_dim = 500
    unif_init_bound = 2 * np.sqrt(embedding_dim)
    init_kw = dict(a=-1 / unif_init_bound, b=1 / unif_init_bound)
    size_init_kw = dict(a=-1, b=1)
    dataset = WN18RR()
    triples_factory = dataset.training
    model = BoxE(
        triples_factory=triples_factory,
        embedding_dim=embedding_dim,
        norm_order=2,
        tanh_map=True,
        entity_initializer=torch.nn.init.uniform_,
        entity_initializer_kwargs=init_kw,
        relation_initializer=torch.nn.init.uniform_,
        relation_initializer_kwargs=init_kw,
        relation_size_initializer=torch.nn.init.uniform_,
        relation_size_initializer_kwargs=size_init_kw,
    )

    results = pipeline(
        random_seed=1000000,
        dataset=dataset,
        model=model,
        training_kwargs=dict(num_epochs=300, batch_size=512, checkpoint_name="tria.pt", checkpoint_frequency=100),
        loss=NSSALossLogging(margin=5, adversarial_temperature=0.0, reduction="sum"),
        lr_scheduler=torch.optim.lr_scheduler.ConstantLR,
        lr_scheduler_kwargs=dict(total_iters=0),
        training_loop="sLCWA",
        negative_sampler="basic",
        negative_sampler_kwargs=dict(num_negs_per_pos=150),
        result_tracker="json",
        result_tracker_kwargs=dict(name="test.json"),
        evaluation_kwargs=dict(batch_size=16),
        optimizer=torch.optim.Adam,
        optimizer_kwargs=dict(lr=0.001)   # Cancel out the thing
    )

if __name__ == '__main__':
    main()

💃 Model 💎 New Component

opened by ralphabb 60

🦆 🐍 Inductive LP framework
Closes #720.

This PR brings the support of inductive link prediction in PyKEEN with new datasets and training loops.

There is quite a lot going on here, and I had to make some design decisions, so I'll list the important features and TODOs; please feel free to edit the design decisions.

The inductive setup means that at inference time (validation and test) we run predictions on a new graph composed of new entities. In the seminal work of Teru et al., there is a training graph (on which we train a model) and an inductive inference graph; the inductive inference graph is then split into three parts: the main graph, missing validation triples, and missing test triples. However, as we classified in the ISWC'21 paper, there are other inductive scenarios where nodes might get added to the training graph as well. The main assumption is that all relations must be seen in the transductive training graph, so that we can learn at least the relation embeddings.

A new InductiveDataset class now includes at least 4 factories:

transductive_training - on which we train and get the known relations

inductive_inference - the inference graph on which we are supposed to run a GNN model (or anything else)

inductive_validation - missing triples from inductive_inference to predict as validation triples

inductive_test - missing triples from inductive_inference to predict as test triples

I thought of further specializing this dataset into DisjointInductiveDataset, where InductiveInference is totally disjoint from TransductiveTraining, and MixedInductiveDataset, where InductiveInference might be an updated, bigger version of the TransductiveTraining graph. So far, I have kept only the base class.

Because of the four factories, I had to create some loading spaghetti (LazyInductiveDataset, DisjointInductivePathDataset, UnpackedRemoteDisjointInductiveDataset) in order to create the loaders for the 12 standard ILP datasets from Teru et al. The data splits exist right on GitHub, so this is the inductive version of the UnpackedRemoteDataset. Each of the three datasets (FB15k-237, WN18RR, NELL-995) comes in four versions that differ in the sizes of the training and inductive inference graphs. Each loader defaults to version v1. The downloading procedures create relevant subfolders in the PYKEEN_DATASETS home. I tried loading all of them, and it works.

A crucial part of the loading: all inductive splits (inference, validation, test) share the same relation index as the transductive training part, i.e., the relation2id mapping used when creating each TriplesFactory must come from the original training factory.
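The following plain-Python sketch illustrates that indexing constraint, using hypothetical helper names and toy labels. Relations are indexed once from the transductive training triples and reused for every inductive split. The inference graph gets its own entity index, which the validation and test triples (missing triples of the inference graph) reuse. A relation never seen in training would raise a KeyError here, matching the stated assumption that all relations must appear in the transductive training graph.

```python
def build_index(labels):
    """Map each distinct label to a contiguous integer ID (sorted for determinism)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def map_triples(triples, entity_to_id, relation_to_id):
    return [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]


transductive_training = [("a", "likes", "b"), ("b", "knows", "c")]
inductive_inference = [("x", "likes", "y"), ("y", "knows", "z")]
inductive_test = [("x", "knows", "z")]  # a missing triple of the inference graph

# Relations: indexed once, from the transductive training graph only.
relation_to_id = build_index(r for _, r, _ in transductive_training)

# Entities: the training graph and the inference graph each get their own index.
train_entity_to_id = build_index([e for h, _, t in transductive_training for e in (h, t)])
inference_entity_to_id = build_index([e for h, _, t in inductive_inference for e in (h, t)])

train_ids = map_triples(transductive_training, train_entity_to_id, relation_to_id)
inference_ids = map_triples(inductive_inference, inference_entity_to_id, relation_to_id)
test_ids = map_triples(inductive_test, inference_entity_to_id, relation_to_id)
```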

Next steps are:

Adding the Inductive Training Loop - where training instances are obtained from the transductive training factory

Adding the ILP Evaluator - where evaluation instances are obtained from the inductive factories

TODOs:

[x] InductiveDataset class

[x] Loading Helpers

[x] 12 ILP datasets from Teru et al

[x] Inductive Training Loop + InductiveSLCWA + Inductive LCWA

[x] Inductive Evaluator

[x] Inductive NodePiece

[x] Inductive LP MVP

[x] A restricted inductive evaluator that replicates the Teru et al. setting, evaluating the model against only 50 randomly selected entities from the inference graph

[x] More generic model interfaces

[x] Add easy integration of NodePiece (featurizer) + GNN (graph encoder) + any interaction function (as link prediction decoder). Experimentally, it works even better than plain NodePiece + interaction
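The restricted evaluation in the Teru et al. setting can be sketched as ranking the true entity against a random sample of 50 other entities instead of all of them. This hypothetical helper illustrates our understanding and is not the PR's evaluator. Ties count as half, and known true triples are not filtered.

```python
import random


def sampled_rank(score_fn, h, r, true_t, candidate_entities, num_samples=50, rng=None):
    """Rank the true tail against `num_samples` randomly drawn other entities."""
    rng = rng or random.Random()
    others = [e for e in candidate_entities if e != true_t]
    sample = rng.sample(others, min(num_samples, len(others)))
    true_score = score_fn(h, r, true_t)
    greater = sum(score_fn(h, r, e) > true_score for e in sample)
    equal = sum(score_fn(h, r, e) == true_score for e in sample)
    # "realistic" rank: ties contribute half a position
    return 1 + greater + equal / 2


def toy_score(h, r, t):
    """Hypothetical scorer: entity 3 is the best tail for every (h, r)."""
    return -abs(t - 3)


print(sampled_rank(toy_score, 0, 0, 3, range(100), rng=random.Random(0)))
```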

Spin-off PRs

https://github.com/pykeen/pykeen/pull/729

https://github.com/pykeen/pykeen/pull/733

https://github.com/pykeen/pykeen/pull/734

https://github.com/pykeen/pykeen/pull/736

https://github.com/pykeen/pykeen/pull/743

https://github.com/pykeen/pykeen/pull/769
opened by migalkin 35
💃 🥤 Extract interaction function from models
This is a replacement for #88, where the merge target is master.

@mali-git As discussed in today's call, I tried to draft an API for the interaction function. It is built around the most generic form of interaction function, which has one batch dimension and then allows broadcasting over multiple entities/relations. This covers use cases such as scoring all tail entities at once, but also, e.g., full CWA scores.

This can likely also help for the fast LCWA @lvermue once envisioned 😉

I did not make the methods static, to allow for parametric interaction functions such as ER-MLP, which has its own weights.

Overview

One shared implementation of score_hrt / score_h / score_r / score_t in the base class, done in InteractionFunction.

One shared implementation of _score for all models sharing the same set of embeddings (e.g. TransE/DistMult/ERMLP -> one vector for each entity/relation, TransH -> an additional vector for each relation, etc.)

A stateless functional form of the interaction function, where all necessary state is passed in from the outside. This is done in pykeen.nn.functional.

A stateful implementation of the interaction function that encapsulates all shared parameters (e.g. weight matrices for ERMLP / ConvE, etc.) but delegates the actual interaction to the stateless version. This is done in pykeen.nn.modules.
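The split into a stateless functional form, a stateful wrapper, and one shared scoring implementation can be illustrated in plain Python. Everything below (`transe_interaction`, `TransEInteractionSketch`, `SketchModel`) is hypothetical. It uses explicit loops where the real code would broadcast tensors, and only shows how score_hrt and score_h/score_t can share a single interaction.

```python
def transe_interaction(h, r, t, p=2):
    """Stateless functional form: all state (the embeddings) is passed in from outside."""
    return -sum(abs(a + b - c) ** p for a, b, c in zip(h, r, t)) ** (1.0 / p)


class TransEInteractionSketch:
    """Stateful wrapper: holds hyper-parameters (a parametric interaction such as
    ER-MLP would also hold its weights here) and delegates to the functional form."""

    def __init__(self, p=2):
        self.p = p

    def __call__(self, h, r, t):
        return transe_interaction(h, r, t, p=self.p)


class SketchModel:
    """One shared scoring implementation that works with any interaction."""

    def __init__(self, entity_emb, relation_emb, interaction):
        self.entity_emb = entity_emb
        self.relation_emb = relation_emb
        self.interaction = interaction

    def score_hrt(self, triples):
        E, R = self.entity_emb, self.relation_emb
        return [self.interaction(E[h], R[r], E[t]) for h, r, t in triples]

    def score_t(self, hr_batch):
        # for each (h, r) in the batch, apply the same interaction to every candidate tail
        E, R = self.entity_emb, self.relation_emb
        return [[self.interaction(E[h], R[r], t) for t in E] for h, r in hr_batch]

    def score_h(self, rt_batch):
        # for each (r, t) in the batch, apply the same interaction to every candidate head
        E, R = self.entity_emb, self.relation_emb
        return [[self.interaction(h, R[r], E[t]) for h in E] for r, t in rt_batch]


model = SketchModel([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], [[1.0, 0.0]], TransEInteractionSketch())
```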

Tasks:

[x] What to do with interaction models where we have more than one vector for an entity/relation, such as e.g. TransH?

[x] Re-introduce slicing (In best case, on the generic level)

[x] Re-introduce regularization on generic level

[x] Fix R-GCN (or keep it broken for #110 ?)

[x] Add regularizer/constrainer directly to modules, recursively search for modules having regularize and accumulate value.

[x] Update to reshape in the generic Interaction

~[ ] Update pipeline model composition~ bumped to https://github.com/pykeen/pykeen/pull/163

Dependencies:

[x] #137

enhancement
opened by mberr 23
How to upgrade PyKEEN<1.8.0 code that uses `EmbeddingSpecification`?

Describe the bug

It seems that EmbeddingSpecification is no longer under pykeen.nn.representation.

How to reproduce

from pykeen.nn.representation import EmbeddingSpecification

Environment

PyKEEN | 1.8.0

Additional information

No response
question

opened by thtang 19
🚧🔦 Update R-GCN configuration
This PR updates the RGCN implementation and experiment configuration.

In particular it

[x] fixes some errors in the old experiment configurations

[x] converts the JSON configurations to YAML (see also #612 ), and adds extensive comments to the fb15k version

[x] adds gradient clipping (see also #607), cf. here

[x] learns separate decompositions for forward and backward edges (aka "normal" and inverse relations), and does not include the self-loop in any of them, but rather learns one additional independent weight for it

[x] removes batch normalization (which is not part of the original model)

[x] adds FB15k237 configuration

[x] ~makes sure that the graph sampler is used for batch sampling rather than sampling individual triples~ solved in #614
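The following plain-Python sketch shows one R-GCN-style message-passing layer reflecting the structural changes listed above:

- a basis decomposition W_r = sum_b a_rb V_b;
- separate decompositions for forward (h -> t) and inverse (t -> h) edges;
- an independent self-loop weight outside both decompositions;
- 1/c normalization per receiving node and relation direction;
- no batch normalization.

It illustrates those assumptions with hypothetical names. It is not the PR's implementation and omits dropout and edge sampling.

```python
from collections import defaultdict


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, s):
    return [x * s for x in a]


def basis_weight(bases, coefficients):
    """W_r = sum_b a_rb * V_b for one relation (basis decomposition)."""
    dim_out, dim_in = len(bases[0]), len(bases[0][0])
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for a, V in zip(coefficients, bases):
        for i in range(dim_out):
            for j in range(dim_in):
                W[i][j] += a * V[i][j]
    return W


def rgcn_layer(x, triples, forward, backward, w_self):
    """One layer over node features x.

    forward / backward: (bases, coefficients_per_relation) pairs, i.e. two separate
    decompositions for "normal" edges h -> t and inverse edges t -> h.
    w_self: an independent self-loop weight, not part of either decomposition.
    """
    # group senders by (receiving node, relation, direction) for 1 / c_{i,r} normalization
    messages = defaultdict(list)
    for h, r, t in triples:
        messages[(t, r, "fwd")].append(h)  # normal edge: h sends to t
        messages[(h, r, "bwd")].append(t)  # inverse edge: t sends to h
    weights = {}
    out = [matvec(w_self, xi) for xi in x]  # self-loop term
    for (i, r, direction), senders in messages.items():
        bases, coeffs = forward if direction == "fwd" else backward
        if (r, direction) not in weights:
            weights[(r, direction)] = basis_weight(bases, coeffs[r])
        W = weights[(r, direction)]
        for j in senders:
            out[i] = add(out[i], scale(matvec(W, x[j]), 1.0 / len(senders)))
    return [[max(0.0, v) for v in row] for row in out]  # ReLU, no batch normalization


x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
out = rgcn_layer(x, [(0, 0, 1)], forward=([identity], [[1.0]]), backward=([identity], [[2.0]]), w_self=zero)
```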

While the changes improve the results obtained in the reproduction setting (at least for fb15k), they cannot achieve the reported performance.

Dependencies

[x] #607

[x] #612

[x] #614

Related:

#603

https://github.com/MichSchli/RelationPrediction/issues/6

https://github.com/MichSchli/RelationPrediction/issues/10

☠️ R-GCN ☠️
opened by mberr 18
Support for text on Literal models

I am currently working on a project on knowledge graph embeddings, and I wanted to test the efficiency of models that use literals to enrich the embeddings (i.e., LiteralE models). I have seen that your library already provides implementations of DistMult and ComplEx that use information from numerical literals, and I was wondering whether you have thought about also supporting textual literals, as in https://github.com/SmartDataAnalytics/LiteralE. The issue is that the repository mentioned above is no longer maintained and lacks utilities such as the hyper-parameter optimization that you already offer in your suite.
enhancement

opened by sntcristian 18
Improve vectorization of novelty computation
This PR improves the vectorization of novelty computation for predict_heads / predict_tails.

Fixes #49
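The core of novelty computation is checking each predicted triple against the known triples. As a sketch only (hypothetical names, plain Python), precomputing a (head, relation) -> tails index turns that check into one set lookup per candidate instead of a scan over all training triples. The PR itself targets a vectorized implementation.

```python
from collections import defaultdict


def build_tail_index(triples):
    """Index known triples as (head, relation) -> set of tails."""
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    return index


def predict_tails_with_novelty(scores, h, r, training_index, testing_index=None):
    """Rank all candidate tails for (h, r) by score and flag the already-known ones."""
    known_train = training_index.get((h, r), set())
    known_test = testing_index.get((h, r), set()) if testing_index is not None else set()
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return [
        {"tail_id": t, "score": scores[t], "in_training": t in known_train, "in_testing": t in known_test}
        for t in ranked
    ]


training_index = build_tail_index([(0, 0, 2)])
rows = predict_tails_with_novelty([0.1, 0.9, 0.5], h=0, r=0, training_index=training_index)
```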

Still to do:

[x] Provide fast implementation for scoring/sorting all possible triples (@mberr)

[x] Provide in documentation tutorial about making predictions (@cthoyt)
opened by mberr 18

ValueError: need at least one array to concatenate in rank_based_evaluator.py

Describe the bug

When running a pipeline to train a model, the pipeline calls into rank_based_evaluator.py. There, the line c_ranks = np.concatenate([ranks_flat[side, rank_type] for side in sides]) raises the following error:

 File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

How to reproduce

I haven't developed a script to make the error reproducible, but it happens when calling the pipeline function:

results = pipeline(
    training=training,
    testing=testing,
    validation=validation,
    model=model,
    model_kwargs=dict(
        embedding_dim=embedding
    ),
    loss=loss,
    training_loop='sLCWA',
    negative_sampler='basic',
    result_tracker='tensorboard',
    result_tracker_kwargs=dict(
        experiment_path=logdir,
    ),
    training_kwargs=dict(
        num_epochs=epochs,
        batch_size=batch_size,
        sampler=sampler),
    stopper='early',
    stopper_kwargs=dict(
        frequency=5,
        patience=10,
        relative_delta=0.05,
        metric='adjusted_mean_rank_index',
    ),
    random_seed=42,
    device=device
)

Environment

Unable to handle parameter in AutoSF: coefficients

Key	Value
OS	posix
Platform	Linux
Release	3.10.0-1160.36.2.el7.x86_64
Time	Mon Mar 7 17:27:26 2022
Python	3.8.12
PyKEEN	1.7.1-dev
PyKEEN Hash	UNHASHED
PyKEEN Branch	
PyTorch	1.10.2
CUDA Available?	false
CUDA Version	10.2
cuDNN Version	7605

Additional information

I suspect that it's related to some recent update, because I have another environment in which the code runs perfectly. The numpy version is the same in both environments. My current environment is the following:

 Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    conda-forge
_openmp_mutex             4.5                       1_gnu  
absl-py                   0.15.0             pyhd3eb1b0_0  
aiohttp                   3.8.1            py38h7f8727e_0  
aiosignal                 1.2.0              pyhd3eb1b0_0  
alembic                   1.7.6                    pypi_0    pypi
async-timeout             4.0.1              pyhd3eb1b0_0  
attrs                     21.4.0             pyhd3eb1b0_0  
autopage                  0.5.0                    pypi_0    pypi
blas                      1.0                         mkl    conda-forge
blinker                   1.4              py38h06a4308_0  
bottleneck                1.3.2            py38heb32a55_1  
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.18.1               h7f8727e_0  
ca-certificates           2021.10.8            ha878542_0    conda-forge
cachetools                4.2.2              pyhd3eb1b0_0  
certifi                   2021.10.8        py38h578d9bd_1    conda-forge
cffi                      1.15.0           py38hd667e15_1  
charset-normalizer        2.0.12             pyhd8ed1ab_0    conda-forge
class-resolver            0.3.4                    pypi_0    pypi
click                     8.0.4            py38h06a4308_0  
click-default-group       1.2.2                    pypi_0    pypi
cliff                     3.10.1                   pypi_0    pypi
cmaes                     0.8.2                    pypi_0    pypi
cmd2                      2.4.0                    pypi_0    pypi
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
colorlog                  6.6.0                    pypi_0    pypi
cryptography              35.0.0           py38ha5dfef3_0    conda-forge
cudatoolkit               10.2.89              hfd86e86_1  
dataclasses               0.8                pyh6d0b6a4_7  
dataclasses-json          0.5.6                    pypi_0    pypi
decorator                 4.4.2                      py_0    conda-forge
docdata                   0.0.3                    pypi_0    pypi
docrep                    0.3.2                    pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.11.0               h70c0345_0  
frozenlist                1.2.0            py38h7f8727e_0  
giflib                    5.2.1                h7b6447c_0  
gmp                       6.2.1                h2531618_2  
gnutls                    3.6.15               he1e5248_0  
google-auth               1.33.0             pyhd3eb1b0_0  
google-auth-oauthlib      0.4.1                      py_2  
googledrivedownloader     0.4                pyhd3deb0d_1    conda-forge
greenlet                  1.1.2                    pypi_0    pypi
grpcio                    1.42.0           py38hce63b2e_0  
html5lib                  1.1                pyh9f0ad1d_0    conda-forge
idna                      3.3                pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.2                   pypi_0    pypi
importlib-resources       5.4.0                    pypi_0    pypi
inflect                   5.4.0                    pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
isodate                   0.6.1              pyhd8ed1ab_0    conda-forge
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h7f8727e_0  
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.35.1               h7274673_9  
libblas                   3.9.0            12_linux64_mkl    conda-forge
libcblas                  3.9.0            12_linux64_mkl    conda-forge
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgfortran-ng            7.5.0               h14aa051_20    conda-forge
libgfortran4              7.5.0               h14aa051_20    conda-forge
libgomp                   9.3.0               h5101ec6_17  
libiconv                  1.15                 h63c8f33_5  
libidn2                   2.3.2                h7f8727e_0  
liblapack                 3.9.0            12_linux64_mkl    conda-forge
libpng                    1.6.37               hbc83047_0  
libprotobuf               3.19.1               h4ff587b_0  
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.2.0                h85742a9_0  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
lz4-c                     1.9.3                h295c915_1  
mako                      1.1.6                    pypi_0    pypi
markdown                  3.3.4            py38h06a4308_0  
markupsafe                2.1.0                    pypi_0    pypi
marshmallow               3.14.1                   pypi_0    pypi
marshmallow-enum          1.5.1                    pypi_0    pypi
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py38h7f8727e_0  
mkl_fft                   1.3.1            py38hd3c417c_0  
mkl_random                1.2.2            py38h51133e4_0  
more-click                0.0.6                    pypi_0    pypi
more-itertools            8.12.0                   pypi_0    pypi
multidict                 5.2.0            py38h7f8727e_2  
mypy-extensions           0.4.3                    pypi_0    pypi
ncurses                   6.3                  h7f8727e_2  
nettle                    3.7.3                hbbd107a_1  
networkx                  2.5.1              pyhd8ed1ab_0    conda-forge
numexpr                   2.8.1            py38h6abb31d_0  
numpy                     1.20.3                   pypi_0    pypi
oauthlib                  3.1.0                      py_0  
openh264                  2.1.1                h4ff587b_0  
openssl                   1.1.1m               h7f8727e_0  
optuna                    2.10.0                   pypi_0    pypi
packaging                 21.3               pyhd3eb1b0_0  
pandas                    1.3.5                    pypi_0    pypi
pbr                       5.8.1                    pypi_0    pypi
pillow                    9.0.1            py38h22f2fdc_0  
pip                       21.2.4           py38h06a4308_0  
prettytable               3.2.0                    pypi_0    pypi
protobuf                  3.19.1           py38h295c915_0  
pyasn1                    0.4.8              pyhd3eb1b0_0  
pyasn1-modules            0.2.8                      py_0  
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyg                       2.0.3           py38_torch_1.10.0_cu102    pyg
pyjwt                     1.7.1                    py38_0  
pykeen                    1.7.1.dev0               pypi_0    pypi
pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pyperclip                 1.8.2                    pypi_0    pypi
pysocks                   1.7.1            py38h578d9bd_4    conda-forge
pystow                    0.4.0                    pypi_0    pypi
python                    3.8.12               h12debd9_0  
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-louvain            0.15               pyhd8ed1ab_1    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.10.2          py3.8_cuda10.2_cudnn7.6.5_0    pytorch
pytorch-cluster           1.5.9           py38_torch_1.10.0_cu102    pyg
pytorch-mutex             1.0                        cuda    pytorch
pytorch-scatter           2.0.9           py38_torch_1.10.0_cu102    pyg
pytorch-sparse            0.6.12          py38_torch_1.10.0_cu102    pyg
pytorch-spline-conv       1.2.1           py38_torch_1.10.0_cu102    pyg
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0                      pypi_0    pypi
rdflib                    6.1.1              pyhd8ed1ab_0    conda-forge
readline                  8.1.2                h7f8727e_1  
requests                  2.27.1             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.0                      py_0  
rexmex                    0.1.0                    pypi_0    pypi
rsa                       4.7.2              pyhd3eb1b0_1  
scikit-learn              1.0.2            py38h51133e4_1  
scipy                     1.8.0                    pypi_0    pypi
setuptools                58.0.4           py38h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sklearn                   0.0                      pypi_0    pypi
sqlalchemy                1.4.32                   pypi_0    pypi
sqlite                    3.37.2               hc218d9a_0  
stevedore                 3.5.0                    pypi_0    pypi
tabulate                  0.8.9                    pypi_0    pypi
tensorboard               2.6.0                      py_1  
tensorboard-data-server   0.6.0            py38hca6d32c_0  
tensorboard-plugin-wit    1.6.0                      py_0  
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.11               h1ccaba5_0  
torchaudio                0.10.2               py38_cu102    pytorch
torchvision               0.11.3               py38_cu102    pytorch
tqdm                      4.63.0             pyhd8ed1ab_0    conda-forge
typing-extensions         3.10.0.2             hd3eb1b0_0  
typing-inspect            0.7.1                    pypi_0    pypi
typing_extensions         3.10.0.2           pyh06a4308_0  
urllib3                   1.26.8             pyhd8ed1ab_1    conda-forge
wcwidth                   0.2.5                    pypi_0    pypi
webencodings              0.5.1                      py_1    conda-forge
werkzeug                  2.0.3              pyhd3eb1b0_0  
wheel                     0.37.1             pyhd3eb1b0_0  
wrapt                     1.13.3                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0  
yacs                      0.1.8              pyhd8ed1ab_0    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yarl                      1.6.3            py38h27cfd23_0  
zipp                      3.7.0              pyhd3eb1b0_0  
zlib                      1.2.11               h7f8727e_4  
zstd                      1.4.9                haebb681_0

bug

opened by AlejandroTL

📡 📉 Adding Tensorboard Tracker
This PR closes #383

Adds TensorBoard as a results tracker.

What you need to test:

pip install tensorboard

Start the TensorBoard process with the default logging path:

tensorboard --logdir=.data/pykeen/logs/tensorboard/

Issues:

I am not happy with the current naming scheme for the experiment log dir.

The params are currently saved as text within the TensorBoard log. There is an add_hparams function, but I don't think it is quite what is needed here.

It would be great to distinguish between training and evaluation parameters when logging. It might also be worth considering whether all the metrics actually need to be logged to TensorBoard.

TensorBoard has great support for visualizing embeddings via the add_embedding function; it would be great if the final embeddings (or, more realistically, a subset of them) could be added.

Tasks:

[x] add documentation and examples

[x] update setup.cfg with tensorboard deps
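To make the open point about separating training and evaluation values concrete, here is a minimal sketch using a hypothetical in-memory tracker (not PyKEEN's TensorBoard tracker). It assumes a `log_params` / `log_metrics(..., step, prefix)` style interface and shows how a prefix such as "training" or "validation" can become part of the tag, and how nested parameter dicts can be flattened before being written as text or passed to add_hparams.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple


def flatten(params: Mapping[str, Any], prefix: Optional[str] = None, sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into joined keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=full_key, sep=sep))
        else:
            flat[full_key] = value
    return flat


class InMemoryTracker:
    """Hypothetical tracker that records what a TensorBoard writer would receive."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.scalars: List[Tuple[str, float, Optional[int]]] = []

    def log_params(self, params: Mapping[str, Any], prefix: Optional[str] = None) -> None:
        # Flattened parameters could be written as text or passed to add_hparams.
        self.params.update(flatten(params, prefix=prefix))

    def log_metrics(
        self,
        metrics: Mapping[str, float],
        step: Optional[int] = None,
        prefix: Optional[str] = None,
    ) -> None:
        # The prefix ("training", "validation", ...) becomes the first part of the
        # tag, so TensorBoard groups training and evaluation curves separately.
        for tag, value in flatten(metrics, prefix=prefix, sep="/").items():
            self.scalars.append((tag, float(value), step))


tracker = InMemoryTracker()
tracker.log_params({"model": "TransE", "training_kwargs": {"num_epochs": 5}})
tracker.log_metrics({"loss": 0.42}, step=1, prefix="training")
tracker.log_metrics({"hits_at_10": 0.8}, step=1, prefix="validation")
print(tracker.scalars)
```

With PyTorch installed, each recorded (tag, value, step) tuple corresponds to one call of `torch.utils.tensorboard.SummaryWriter.add_scalar(tag, value, step)`.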

🐺 Tracker 💎 New Component
opened by sbonner0
Results of HPO
What is your question

I am using HPO to optimize the number of epochs, batch size, embedding dimension, number of negatives per positive, and learning rate. I got the best model, i.e., the hyperparameters with the highest MRR. But when I train a model with those same hyperparameters, I get a higher MRR than the one reported by HPO itself. Is this normal? And why do some trials show a much higher MRR in HPO but still fail?

How can I find out the default margin of the MarginRankingLoss used by the models, and is it the same for all models?

Environment

Key | Value
-- | --
OS | posix
Platform | Linux
Release | 5.10.107+
Time | Mon Jun 13 09:04:32 2022
Python | 3.7.12
PyKEEN | 1.8.1
PyKEEN Hash | UNHASHED
PyKEEN Branch |
PyTorch | 1.11.0
CUDA Available? | true
CUDA Version | 11.0
cuDNN Version | 8005

Issue Template Checks

[X] This is not a bug report (use a different issue template if it is)

[X] This is not a feature request (use a different issue template if it is)

[X] I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
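For the question about the default margin, one generic way to read a constructor default is Python's `inspect.signature`. The sketch below uses a hypothetical stand-in class with an arbitrary default; it does not claim anything about PyKEEN's actual value.

```python
import inspect
from typing import Any


def get_default(cls: type, name: str) -> Any:
    """Return the default value of the constructor parameter `name` of `cls`."""
    parameter = inspect.signature(cls).parameters[name]  # KeyError if no such parameter
    if parameter.default is inspect.Parameter.empty:
        raise ValueError(f"{cls.__name__}.{name} has no default value")
    return parameter.default


class ToyMarginLoss:
    """Hypothetical stand-in for a margin-based loss; the defaults are arbitrary."""

    def __init__(self, margin: float = 0.5, reduction: str = "mean") -> None:
        self.margin = margin
        self.reduction = reduction


print(get_default(ToyMarginLoss, "margin"))  # 0.5
```

With PyKEEN installed, calling `get_default` on `MarginRankingLoss` from `pykeen.losses` should report the library's own default. Whether every model uses that value also depends on which loss each model uses by default (the v1.8.2 notes, for example, mention fixing PairRE's default loss), so it is worth checking the model class as well.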

question
opened by Ahmed-fub
Kaggle notebooks are having trouble loading entrypoints
I use the following code on Colab or Kaggle:

!pip install pykeen -q
from pykeen.datasets import OpenBioLink

An exception occurs on the import line, with this error message:

/usr/local/lib/python3.7/dist-packages/pykeen/datasets/__init__.py in <module>()
     75 }
     76 if not _DATASETS:
---> 77     raise RuntimeError('Datasets have been loaded with entrypoints since PyKEEN v1.0.5. Please reinstall.')
     78
     79 #: A mapping of datasets' names to their classes

RuntimeError: Datasets have been loaded with entrypoints since PyKEEN v1.0.5. Please reinstall.

It works if I restart the kernel so that the Python environment is reloaded at least once after the package installation. However, this makes a Kaggle commit (Save version => Save & Run all (commit)) impossible, because I have to restart manually and interactively, but the commit session is not interactive.

Because of this, I cannot produce a saved notebook with output files from PyKEEN. The only option left is to do everything in an interactive session, download the output file manually, and then upload it as a Kaggle dataset.

Solution

From https://github.com/pykeen/pykeen/issues/373#issuecomment-821060699:

I just found a workaround. It should work on both Colab and Kaggle:

from pkg_resources import require
require('pykeen')

https://colab.research.google.com/drive/1yFWDQ3OybultFHaNdi_gJCHpBJiB4Iem?usp=sharing
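The error above is raised when no dataset entry points are found. As a hedged diagnostic (not part of PyKEEN), the sketch below lists the entry points that the running interpreter can see for a group via `importlib.metadata`. The group name `pykeen.datasets` is an assumption about how PyKEEN registers its datasets. An empty list right after `pip install` in the same kernel would be consistent with the interpreter not having picked up the new package metadata, which the kernel restart or the `pkg_resources` workaround appears to fix.

```python
from importlib.metadata import entry_points
from typing import List


def entry_point_names(group: str) -> List[str]:
    """Return the sorted names of the entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict mapping group -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Assumed group name; prints an empty list if PyKEEN's metadata is not visible.
print(entry_point_names("pykeen.datasets"))
```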

question
opened by jerryIsHere
⚡🧪 Bring back lightning tests

As https://github.com/Lightning-AI/lightning/pull/14117 has been fixed, and https://github.com/Lightning-AI/ecosystem-ci/pull/50 is regaining momentum, this PR brings back automatic PyTorch Lightning tests.

opened by mberr
Saving Checkpoints to S3 bucket
Problem Statement

While training on AWS SageMaker, it is expensive to keep the checkpoints on the notebook. Being able to upload checkpoints directly to an S3 bucket would save time.

Describe the solution you'd like

Allow S3 bucket URLs as the checkpoint directory:

from pykeen.pipeline import pipeline

result = pipeline(
    model='transe',
    dataset='nations',
    training_kwargs=dict(
        num_epochs=10,
        checkpoint_name='test_checkpoint.pt',
        checkpoint_directory='s3://bucket/checkpoints/',
    ),
)

Describe alternatives you've considered

Uploading checkpoints at the end of training for backup.
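For the upload-based alternative, an upload step first needs to turn the proposed `checkpoint_directory` URL into a bucket and an object key. The helpers below are hypothetical (PyKEEN does not provide them) and only use the standard library.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def checkpoint_key(directory_url: str, checkpoint_name: str) -> Tuple[str, str]:
    """Return (bucket, object key) for a checkpoint file inside an S3 'directory'."""
    bucket, prefix = split_s3_url(directory_url)
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix + checkpoint_name


print(checkpoint_key("s3://bucket/checkpoints/", "test_checkpoint.pt"))
# ('bucket', 'checkpoints/test_checkpoint.pt')
```

After a checkpoint has been written locally, the resulting pair could be passed to boto3's `upload_file(Filename, Bucket, Key)`. Wiring this into the training loop is left open here.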

Additional information

No response

Issue Template Checks

[X] This is not a bug report (use a different issue template if it is)

[X] This is not a question (use the discussions forum instead)

enhancement
opened by mhmgad
Make tutorial for enabling different learning rates
Problem Statement

First of all, thanks for the great work with the library! It would be very useful to be able to specify different learning rates. Right now, when running a pipeline, an instance of the optimizer is created by passing all parameters in the model:

https://github.com/pykeen/pykeen/blob/313055e8c846a52a35901f2746a43e5efdae1e3e/src/pykeen/pipeline/api.py#L1035-L1039

However, in some cases we might also want to apply per-parameter options, for example:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Describe the solution you'd like

A possible solution could be an optional dictionary passed when creating the pipeline, e.g. optimizer_params. If it's not provided, then the pipeline would default to the above, otherwise the user could choose different learning rates for modules in a custom model:

optimizer_instance = optimizer_resolver.make(
    optimizer,
    optimizer_kwargs,
    params=optimizer_params if optimizer_params else model_instance.get_grad_params(),
)
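One way to produce such parameter groups without mapping Parameters back to modules after the optimizer exists is to build them from parameter names up front. A sketch, assuming a hypothetical `build_param_groups` helper that assigns each parameter the learning rate of its longest matching name prefix:

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr_by_prefix: Mapping[str, float],
) -> List[Dict[str, Any]]:
    """Group parameters by learning rate, using the longest matching name prefix.

    Parameters that match no prefix go into a group without an "lr" key, so the
    optimizer's global learning rate applies to them.
    """
    groups: Dict[Optional[float], List[Any]] = {}
    for name, param in named_params:
        matches = [prefix for prefix in lr_by_prefix if name.startswith(prefix)]
        lr = lr_by_prefix[max(matches, key=len)] if matches else None
        groups.setdefault(lr, []).append(param)
    return [
        {"params": params} if lr is None else {"params": params, "lr": lr}
        for lr, params in groups.items()
    ]


# Strings stand in for tensors; with PyTorch one would pass model.named_parameters().
named = [("base.weight", "p1"), ("base.bias", "p2"), ("classifier.weight", "p3")]
print(build_param_groups(named, {"classifier.": 1e-3}))
# [{'params': ['p1', 'p2']}, {'params': ['p3'], 'lr': 0.001}]
```

The result has the same shape as the per-parameter SGD example above, so it could be passed as `torch.optim.SGD(groups, lr=1e-2, momentum=0.9)`. Ending prefixes with a dot avoids accidental matches such as "base" against "baseline". The `optimizer_params` pipeline argument remains only a proposal from this issue.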

Describe alternatives you've considered

I tried getting access to the optimizer via a TrainingCallback, and I considered modifying the learning rate for different modules in the pre_step method:

class MultiLearningRateCallback(TrainingCallback):
    ....
    def pre_step(self, **kwargs):
        # Here we have access to the optimizer via self.optimizer

The problem is that at this point the optimizer has already been initialized and has been assigned Parameters, which are difficult to map to the original modules.

Additional information

No response

Issue Template Checks

[X] This is not a bug report (use a different issue template if it is)

[X] This is not a question (use the discussions forum instead)

documentation
opened by dfdazac
TypeError: missing required positional argument when using `CoreTriplesFactory.from_path_binary()`
Describe the bug

According to the documentation on saving a pipeline result to a directory:

training_triples contains the training triples factory, including label-to-id mappings, if used. It has been saved via pykeen.triples.CoreTriplesFactory.to_path_binary(), and can be re-loaded via pykeen.triples.CoreTriplesFactory.from_path_binary().

However, an error occurs when I try to load the triples factory from a previously saved pipeline result:

triplets_factory = CoreTriplesFactory.from_path_binary('RESCAL_FB15k237/training_triples')

Traceback (most recent call last):
  File "/home/tzuwei/developing/kge-loss/main.py", line 32, in <module>
    triplets_factory = CoreTriplesFactory.from_path_binary(f'{save_path}/training_triples')
  File "/home/tzuwei/.pyenv/versions/loss/lib/python3.9/site-packages/pykeen/triples/triples_factory.py", line 816, in from_path_binary
    return cls(**cls._from_path_binary(path=path))
TypeError: __init__() missing 2 required positional arguments: 'num_entities' and 'num_relations'

It seems that the data returned by cls._from_path_binary at line 816 does not contain num_entities and num_relations.

https://github.com/pykeen/pykeen/blob/313055e8c846a52a35901f2746a43e5efdae1e3e/src/pykeen/triples/triples_factory.py#L793-L824
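To illustrate the failure mode described above, here is a hypothetical stand-in class (not PyKEEN's `CoreTriplesFactory`, and not its actual fix). Its constructor requires explicit counts, so `cls(**data)` fails when the saved dict holds only the triples. One possible remedy is to fill in any missing counts before calling the constructor.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Triple = Tuple[int, int, int]


@dataclass
class ToyTriplesFactory:
    """Hypothetical stand-in whose constructor requires explicit entity/relation counts."""

    mapped_triples: List[Triple]
    num_entities: int
    num_relations: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyTriplesFactory":
        # cls(**data) alone fails with "missing 2 required positional arguments"
        # if the saved dict only holds the triples, so fill in any missing counts.
        data = dict(data)
        triples = data["mapped_triples"]
        data.setdefault("num_entities", 1 + max((max(h, t) for h, _, t in triples), default=-1))
        data.setdefault("num_relations", 1 + max((r for _, r, _ in triples), default=-1))
        return cls(**data)


try:
    ToyTriplesFactory(**{"mapped_triples": [(0, 0, 1)]})
except TypeError as error:
    print(error)  # the same kind of error as in the report

factory = ToyTriplesFactory.from_dict({"mapped_triples": [(0, 0, 1), (2, 1, 0)]})
print(factory.num_entities, factory.num_relations)  # 3 2
```

Inferring counts from the largest IDs undercounts entities or relations that never occur in the saved triples. Storing the counts explicitly alongside the triples avoids that.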

How to reproduce

The following code reproduces the problem on my machine:

from pykeen.pipeline import pipeline
from pykeen.models import RESCAL
from pykeen.datasets import FB15k237
from pykeen.triples import CoreTriplesFactory

result = pipeline(
    model=RESCAL,
    dataset=FB15k237,
)
result.save_to_directory('RESCAL_FB15k237')
triplets_factory = CoreTriplesFactory.from_path_binary('RESCAL_FB15k237/training_triples')

Environment

Unable to handle parameter in CooccurrenceFilteredModel: base

| Key             | Value                    |
|-----------------|--------------------------|
| OS              | posix                    |
| Platform        | Linux                    |
| Release         | 5.15.0-52-generic        |
| Time            | Wed Dec 7 18:02:29 2022  |
| Python          | 3.9.14                   |
| PyKEEN          | 1.9.0                    |
| PyKEEN Hash     | 1f526edb                 |
| PyKEEN Branch   | master                   |
| PyTorch         | 1.13.0+cu117             |
| CUDA Available? | true                     |
| CUDA Version    | 11.7                     |
| cuDNN Version   | 8500                     |

Additional information

Thank you for providing this open source project. Your work is very helpful to me. :smile:

Issue Template Checks

[X] This is not a feature request (use a different issue template if it is)

[X] This is not a question (use the discussions forum instead)

[X] I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed

bug
opened by uier

Validation Loss

This extracts parts of the training loop related to calculating the epoch loss into a function to re-use it for calculating validation losses.

Example:

from pykeen.datasets import get_dataset
from pykeen.pipeline import pipeline

dataset = get_dataset(dataset="nations")
pipeline(
    dataset=dataset,
    model="mure",
    training_kwargs=dict(
        callbacks="validation-loss",
        callback_kwargs=dict(triples_factory=dataset.validation),
    ),
    result_tracker="console",
)
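The idea of the PR, one epoch-loss function shared by training and validation, can be sketched framework-free. Assuming a hypothetical `epoch_loss` helper where each batch loss is a mean over its examples, re-weighting by batch size keeps a smaller final batch from counting as much as a full one.

```python
from typing import Any, Callable, Iterable, Sequence


def epoch_loss(
    batches: Iterable[Sequence[Any]],
    batch_loss: Callable[[Sequence[Any]], float],
) -> float:
    """Average per-example loss over one pass through `batches`.

    `batch_loss` is assumed to return the mean loss of a batch, so it is weighted
    by the batch size; a smaller last batch then counts proportionally less.
    """
    total, count = 0.0, 0
    for batch in batches:
        total += batch_loss(batch) * len(batch)
        count += len(batch)
    if count == 0:
        raise ValueError("no batch was available")
    return total / count


def mean(values: Sequence[float]) -> float:
    # Toy "model": each example is already its own loss value.
    return sum(values) / len(values)


train_loss = epoch_loss([[1.0, 1.0], [4.0]], mean)       # (2 * 1.0 + 1 * 4.0) / 3
validation_loss = epoch_loss([[0.5, 1.5, 1.0]], mean)  # same function, other data
print(train_loss, validation_loss)  # 2.0 1.0
```

In an actual PyTorch loop, the validation pass would typically run under `torch.no_grad()` and skip the optimizer step. Only the data and that context differ between the two uses.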

opened by mberr

Releases (v1.9.0)

v1.9.0 (Aug 4, 2022)
The theme of this release of PyKEEN is centered on new and exciting representations to bring more kinds of data (text, image, scalar data) into training in an elegant way. Several of these contribute to new functionality for NodePiece.

Training and Evaluation

🔬🔁 Evaluation loop by @mberr in https://github.com/pykeen/pykeen/pull/768

🐦🛑 Early stopping: Reload weights from best epoch by @mberr in https://github.com/pykeen/pykeen/pull/961

🔬🚪 Update evaluator's evaluate to pass through kwargs by @mberr in https://github.com/pykeen/pykeen/pull/938

🌪😿 Fix epoch loss by @mberr in https://github.com/pykeen/pykeen/pull/1021

Datasets

🥨🕸️ Add Global Biotic Interactions (GloBI) dataset by @cthoyt in https://github.com/pykeen/pykeen/pull/947

Fix dataset caching with inverse triples by @mberr in https://github.com/pykeen/pykeen/pull/1034

Models

New

🇨🇴🕸️ Add co-occurrence filtered meta model by @mberr in https://github.com/pykeen/pykeen/pull/943

Updates

📏🤝 Add LineaRE interaction by @mberr in https://github.com/pykeen/pykeen/pull/971

✨🤖 Update ERMLP to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/869

✨🤖 Update ERMLP-E to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/872

✨🤖 Update HolE to ER-Model by @mberr in https://github.com/pykeen/pykeen/pull/953

✨🤖 Update TransE to ER-Model by @mberr in https://github.com/pykeen/pykeen/pull/955

✨🤖 Update TransH to ER-Model by @mberr in https://github.com/pykeen/pykeen/pull/954

✨🤖 Update RESCAL to ER-Model by @mberr in https://github.com/pykeen/pykeen/pull/952

✨💀 Phase out old-style model by @mberr in https://github.com/pykeen/pykeen/pull/865

🫶 🧨 Update ConvKB & SE to use einsum by @mberr in https://github.com/pykeen/pykeen/pull/978

🏎️ 🏴‍☠️ Add efficient RGCN implementation by @mberr in https://github.com/pykeen/pykeen/pull/634

🔧➡️ Move Nguyen's TransE configurations into correct directory by @PhaelIshall in https://github.com/pykeen/pykeen/pull/957

Representations

👥📝 Wikidata Textual Representations by @mberr in https://github.com/pykeen/pykeen/pull/966

🔗🗿 Combined Representation by @mberr in https://github.com/pykeen/pykeen/pull/964

🔳🔲 Add PartitionRepresentation by @mberr in https://github.com/pykeen/pykeen/pull/980

📋❇️ Generalize Text Encoders & add a simple one by @mberr in https://github.com/pykeen/pykeen/pull/969

🚚🗽 Add transformed representation by @mberr in https://github.com/pykeen/pykeen/pull/984

👀📇 Simple visual representations by @mberr in https://github.com/pykeen/pykeen/pull/965

🏋️🚂 Tensor Train Representation by @mberr in https://github.com/pykeen/pykeen/pull/989

NodePiece

⚓🔍 NodePiece: GPU-enabled BFS searcher by @migalkin in https://github.com/pykeen/pykeen/pull/990

🏴‍☠️🌊 NodePiece x METIS by @mberr in https://github.com/pykeen/pykeen/pull/988

⚓ 📖 NodePiece documentation on MetisAnchorTokenizer by @migalkin in https://github.com/pykeen/pykeen/pull/1026

Documentation

😺🇪🇬 Explicitly set Sphinx language by @mberr in https://github.com/pykeen/pykeen/pull/951

💥 📒 Add troubleshooting for loading old models by @jas-ho in https://github.com/pykeen/pykeen/pull/963

📘 🚀 Update README by @cthoyt in https://github.com/pykeen/pykeen/pull/1039

📒 🤡 Fix documentation build by @cthoyt in https://github.com/pykeen/pykeen/pull/946

📕 🏴‍☠️ Update docs and deprecations by @mberr in https://github.com/pykeen/pykeen/pull/979

📗 🖊️ Update docs about normalizers and constrainers by @mberr in https://github.com/pykeen/pykeen/pull/1047

Loss

⚔️⚖️ Add adversarially weighted BCE loss by @mberr in https://github.com/pykeen/pykeen/pull/958

⚔️🤔 New procedure for computing AdversarialBCEWithLogits by @migalkin in https://github.com/pykeen/pykeen/pull/997

Predictions

🐉🐉 Score multiple tails at once by @mberr in https://github.com/pykeen/pykeen/pull/949

🔮〰️ Update Prediction Filtering by @mberr in https://github.com/pykeen/pykeen/pull/1048

🔮 🎉 Add inference_mode annotation to get_prediction_df() by @tatiana-iazykova in https://github.com/pykeen/pykeen/pull/1024

🔨🧪 Fix the device in _safe_evaluate() by @migalkin in https://github.com/pykeen/pykeen/pull/1041

Meta

🤖🖐️ Update GHA by @mberr in https://github.com/pykeen/pykeen/pull/959

🦈🧵 Darglint forever by @cthoyt in https://github.com/pykeen/pykeen/pull/985

Misc

🚨📊 Cast kwargs as strings in plot_er by @vsocrates in https://github.com/pykeen/pykeen/pull/945

⛏📲 Add utility to analyze degree distributions by @mberr in https://github.com/pykeen/pykeen/pull/857

🥯✔️ Add max_id/shape verification by @mberr in https://github.com/pykeen/pykeen/pull/983

Use torch_ppr by @mberr in https://github.com/pykeen/pykeen/pull/995

➕🍹 Add ExtraReprMixin by @mberr in https://github.com/pykeen/pykeen/pull/994

🛤️🛢️ Add prefix when tracking pipeline metrics by @mberr in https://github.com/pykeen/pykeen/pull/998

⭕🔺 Update PyG version for CI by @mberr in https://github.com/pykeen/pykeen/pull/1025

#️⃣🐍 Allow passing numpy.ndarray to CoreTriplesFactory by @mberr in https://github.com/pykeen/pykeen/pull/1029

New Contributors

@PhaelIshall made their first contribution in https://github.com/pykeen/pykeen/pull/957

@jas-ho made their first contribution in https://github.com/pykeen/pykeen/pull/963

@tatiana-iazykova made their first contribution in https://github.com/pykeen/pykeen/pull/1024

Full Changelog: https://github.com/pykeen/pykeen/compare/v1.8.2...v1.9.0
v1.8.2 (May 24, 2022)
Datasets

Add the PrimeKG dataset by @sbonner0 in https://github.com/pykeen/pykeen/pull/915

🌀🔗 Extend EA datasets to allow loading a unified graph by @mberr in https://github.com/pykeen/pykeen/pull/871

🎺🎷 Fix wk3l loading by @mberr in https://github.com/pykeen/pykeen/pull/907

Lightning

🔥⚡ PyTorch Lightning by @mberr in https://github.com/pykeen/pykeen/pull/905

🔥⚡ PyTorch Lightning - Part 2 by @mberr in https://github.com/pykeen/pykeen/pull/917

🚅⚡ Test Training with PyTorch Lightning by @mberr in https://github.com/pykeen/pykeen/pull/930

Losses

📉🧑‍🤝‍🧑 Fix default loss of PairRE by @mberr in https://github.com/pykeen/pykeen/pull/925

ℹ️🦭 Add InfoNCE loss by @mberr in https://github.com/pykeen/pykeen/pull/926

ℹ️🚀 Update InfoNCE LCWA implementation by @mberr in https://github.com/pykeen/pykeen/pull/928

Representations

🎲🚶 Random Walk Positional Encoding by @mberr in https://github.com/pykeen/pykeen/pull/918

🏛️👨 Weisfeiler-Lehman Features by @mberr in https://github.com/pykeen/pykeen/pull/920

Other great stuff that isn't the previous commit (it's after 5PM)

🧫🐍 Update scipy minimum version by @mberr in https://github.com/pykeen/pykeen/pull/891

♻️☎️ Re-use optimized batch-size in evaluation callback by @mberr in https://github.com/pykeen/pykeen/pull/886

🖥️🦎 Fix complex initialization by @mberr in https://github.com/pykeen/pykeen/pull/888

📦📚 Update BoxE reproducibility configurations by @mberr in https://github.com/pykeen/pykeen/pull/631

🫓🪁 Improve loading of triples with nan strings by @SenJia in https://github.com/pykeen/pykeen/pull/883

🪵 ✨ Update flake8 ignores by @cthoyt in https://github.com/pykeen/pykeen/pull/897

👯‍♂️👯‍♀️ Unique hashes in the NodePiece representation by @migalkin in https://github.com/pykeen/pykeen/pull/896

📐📨 PyTorch Geometric Message Passing Representations by @mberr in https://github.com/pykeen/pykeen/pull/894

🪛📁 Fix directory path normalization by @mberr in https://github.com/pykeen/pykeen/pull/890

🧛🇪🇺 Implement more graph pair unification approaches by @mberr in https://github.com/pykeen/pykeen/pull/893

🔙🌙 Backwards Compatibility for init phases by @mberr in https://github.com/pykeen/pykeen/pull/899

📔✅ Update Docstring Coverage check by @mberr in https://github.com/pykeen/pykeen/pull/892

🪄🖊️ Class resolver type annotations by @mberr in https://github.com/pykeen/pykeen/pull/904

📋➡️ Move listing experiments from epilog to own command by @mberr in https://github.com/pykeen/pykeen/pull/903

🔧📜 Update hpo tutorial about grid search by @mberr in https://github.com/pykeen/pykeen/pull/902

📖 🛠️ Fix typo in prediction docs by @mberr in https://github.com/pykeen/pykeen/pull/912

✂️🌰 Extract triple-independent information from CoreTriplesFactory by @mberr in https://github.com/pykeen/pykeen/pull/908

🐍👍 Increase Minimum Python Version to 3.8 by @mberr in https://github.com/pykeen/pykeen/pull/921

🧚💾 Extend save to directory doc by @mberr in https://github.com/pykeen/pykeen/pull/916

🧠🏷️ Maximize memory utilization for label based initialization by @mberr in https://github.com/pykeen/pykeen/pull/898

✏️🇮🇳 Rename inductive representation methods by @mberr in https://github.com/pykeen/pykeen/pull/929

👾 ⚽ Add missing device by @vsocrates in https://github.com/pykeen/pykeen/pull/936

New Contributors

@SenJia made their first contribution in https://github.com/pykeen/pykeen/pull/883

@vsocrates made their first contribution in https://github.com/pykeen/pykeen/pull/936

Full Changelog: https://github.com/pykeen/pykeen/compare/v1.8.1...v1.8.2
v1.8.1 (Apr 20, 2022)
PyKEEN 1.8.1 contains a few critical bug fixes along with some other cool updates.

Evaluation

⚖️🌡️ Weighted Rank-Based Metrics by @mberr in https://github.com/pykeen/pykeen/pull/837

🌌🧐 Macro evaluation by @mberr in https://github.com/pykeen/pykeen/pull/850

Inductive Models

⚓🧍 NodePiece Anchor Searching via PPR by @mberr in https://github.com/pykeen/pykeen/pull/870

Transductive Models

✨🤖 Update DistMult to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/874

✨🤖 Update ProjE to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/876

✨🤖 Update RotatE to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/877

✨🤖 Update ConvE to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/875

🚛®️ Update TuckER to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/866

✨🦜 Upgrade TransR to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/868

New Datasets

🌪️ 📖 Add ILPC datasets and inductive dataset resolver by @cthoyt in https://github.com/pykeen/pykeen/pull/848

👑🤑 Add aristo-v4 dataset by @mberr in https://github.com/pykeen/pykeen/pull/855

Documentation

📗✨ Update documentation to better reflect new-style models by @mberr in https://github.com/pykeen/pykeen/pull/879

👣 📚 Correct typos in "First Steps" tutorial by @andreasala98 in https://github.com/pykeen/pykeen/pull/846

Bug Fixes

🔧#️⃣ Fix arange dtype and clip variances by @mberr in https://github.com/pykeen/pykeen/pull/881

🪄⚖️ Fix pop_regularization_term by @mberr in https://github.com/pykeen/pykeen/pull/849

🧑‍🏭🔢 Fix numeric triples factory by @mberr in https://github.com/pykeen/pykeen/pull/862

🍔 🪓 Ensure reproducible splits for all datasets by @mberr in https://github.com/pykeen/pykeen/pull/856

🚫🏋 Raise explicit error if no training batch was available by @mberr in https://github.com/pykeen/pykeen/pull/860

🚚💻 Fix TransformerEncoder tokens' device by @mberr in https://github.com/pykeen/pykeen/pull/861

Misc

🔎🏋️ Resolve optimizer, LR-scheduler & tracker in training loop by @mberr in https://github.com/pykeen/pykeen/pull/852

🎯🪜 Update default batch size HPO range by @mberr in https://github.com/pykeen/pykeen/pull/864

♻️🔥 Use torch builtin broadcast by @mberr in https://github.com/pykeen/pykeen/pull/873

New Contributors

@andreasala98 made their first contribution in https://github.com/pykeen/pykeen/pull/846

Full Changelog: https://github.com/pykeen/pykeen/compare/v1.8.0...v1.8.1
v1.8.0 (Mar 22, 2022)
Among a ton of updates since the beginning of the year, PyKEEN v1.8.0 has three major themes:

The introduction of the inductive link prediction pipeline and the NodePiece model. We highly suggest checking out An Open Challenge for Inductive Link Prediction on Knowledge Graphs to go along with this new pipeline and models.

The introduction of new rank-based evaluation metrics to go along with A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Major internal refactoring of negative sampling to better use PyTorch's data loaders and support multi-CPU generation (special thanks to @Koenkalle for help testing this)
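Negative sampling for the sLCWA training approach is commonly done by uniform corruption: replace the head or the tail of a positive triple with a random entity. Below is a simplified plain-Python sketch under that assumption (PyKEEN's own samplers offer more options). Each negative depends only on its positive triple and a random generator, which is one reason this step fits naturally into data-loader workers.

```python
import random
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail) as integer IDs


def corrupt(positive: Triple, num_entities: int, rng: random.Random) -> Triple:
    """Replace either the head or the tail (50/50) with a different, uniformly drawn entity."""
    if num_entities < 2:
        raise ValueError("need at least two entities to corrupt a triple")
    head, relation, tail = positive
    corrupt_head = rng.random() < 0.5
    old = head if corrupt_head else tail
    # Draw from the other num_entities - 1 IDs, so the replacement never equals `old`.
    new = rng.randrange(num_entities - 1)
    if new >= old:
        new += 1
    return (new, relation, tail) if corrupt_head else (head, relation, new)


def sample_negatives(
    positives: List[Triple], num_entities: int, num_negs_per_pos: int, seed: int = 0
) -> List[List[Triple]]:
    """Return `num_negs_per_pos` corrupted triples for each positive triple.

    Known positives are not filtered out, so a negative may be a false negative.
    """
    rng = random.Random(seed)
    return [[corrupt(p, num_entities, rng) for _ in range(num_negs_per_pos)] for p in positives]


print(sample_negatives([(0, 0, 1), (2, 1, 3)], num_entities=5, num_negs_per_pos=2, seed=42))
```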

NodePiece and Inductive Link Prediction

🦆🐍 Inductive LP framework by @migalkin in https://github.com/pykeen/pykeen/pull/722

🌙🐺 Add mode parameter by @cthoyt in https://github.com/pykeen/pykeen/pull/769

🍸✌️ Mixed tokenization for NodePiece by @mberr in https://github.com/pykeen/pykeen/pull/770

☮️⚓ NodePiece with anchors by @mberr in https://github.com/pykeen/pykeen/pull/755

🥒🏴‍☠️ Precomputed Tokenization for NodePiece by @mberr in https://github.com/pykeen/pykeen/pull/822

🦜📖 Refactor NodePiece and improve documentation by @cthoyt in https://github.com/pykeen/pykeen/pull/833

⚓ 🔧 NodePiece MixtureAnchorSelection unique anchor IDs fix + PageRank fix by @migalkin in https://github.com/pykeen/pykeen/pull/776

🧩🧪 NodePiece experimental configs by @migalkin in https://github.com/pykeen/pykeen/pull/771

👀 🏋️ Attention edge weighting by @migalkin in https://github.com/pykeen/pykeen/pull/734

Models

New

⛟↔️ Add (multi-)linear Tucker interaction by @mberr in https://github.com/pykeen/pykeen/pull/751

🍦 🎸 Soft inverse triples baseline by @mberr in https://github.com/pykeen/pykeen/pull/543

Updated

🤖 ⛑️ Fix device for FixedModel by @mberr in https://github.com/pykeen/pykeen/pull/725

🧀 🏹 Unify usage of slice_size by @cthoyt in https://github.com/pykeen/pykeen/pull/729

📟 ✂️ Remove device from model by @cthoyt in https://github.com/pykeen/pykeen/pull/730

💃 🪥 Cleanup model argument passing by @cthoyt in https://github.com/pykeen/pykeen/pull/762

Training and Evaluation

✂️ ⏰ Split early stopping logic from evaluation by @mberr in https://github.com/pykeen/pykeen/pull/355

🎲🎚️ Sampled Rank-Based Evaluator by @mberr in https://github.com/pykeen/pykeen/pull/733

⛏©️ Fix Checkpointing by @mberr in https://github.com/pykeen/pykeen/pull/740

🌋 🗺️ Switch evaluator from dataclass to dict by @cthoyt in https://github.com/pykeen/pykeen/pull/780

🌀 ⚖️ Simplify evaluate by @mberr in https://github.com/pykeen/pykeen/pull/767

🛤️ 🔁 Store result tracker inside loop by @mberr in https://github.com/pykeen/pykeen/pull/793

Callbacks

📞🔙 Evaluation callback by @mberr in https://github.com/pykeen/pykeen/pull/765

🥊 ☎️ Early stopping via training callback by @mberr in https://github.com/pykeen/pykeen/pull/354

Data and Datasets

New

🪢🤔 Add OpenEA datasets by @dobraczka in https://github.com/pykeen/pykeen/pull/784

💉8️⃣ Add the PharmKG8k dataset by @sbonner0 in https://github.com/pykeen/pykeen/pull/797

🧪💉 Add PharmKG full dataset by @sbonner0 in https://github.com/pykeen/pykeen/pull/806

♻️ 2️⃣ Replace OGB's WikiKG by WikiKG2 by @mberr in https://github.com/pykeen/pykeen/pull/809

Updates

📌🧠 Use pinned memory for training data loader by @mberr in https://github.com/pykeen/pykeen/pull/747

🧮💃 Add property for number of parameters by @mberr in https://github.com/pykeen/pykeen/pull/804

💾 ♻️ Refactor dataset utility code by @cthoyt in https://github.com/pykeen/pykeen/pull/830

💾 🐕 Update dataset registration by @cthoyt in https://github.com/pykeen/pykeen/pull/832

💾 🚀 Update dataset statistics by @cthoyt in https://github.com/pykeen/pykeen/pull/834

😴🐲 Ignore create_inverse_triples for caching hash digest by @mberr in https://github.com/pykeen/pykeen/pull/813

🖇️ 📊 Use Figshare link for OpenEA dataset by @dobraczka in https://github.com/pykeen/pykeen/pull/838

📦 💾 Batch data loader by @mberr in https://github.com/pykeen/pykeen/pull/817

📥 🏭 Save Training Triples Factory by @mali-git in https://github.com/pykeen/pykeen/pull/655

🦞 💿 Negative sampling in data loader by @mberr in https://github.com/pykeen/pykeen/pull/417

💾💽 Change serialization format by @mberr in https://github.com/pykeen/pykeen/pull/785

🧰 📥 Cache dataset loading by @mberr in https://github.com/pykeen/pykeen/pull/569

Metrics

📏🪕 Compute Candidate Set Sizes by @mberr in https://github.com/pykeen/pykeen/pull/732

🏆 🍱 Update rank data structure by @cthoyt in https://github.com/pykeen/pykeen/pull/758

📐 🍱 Update metric key data structure by @cthoyt in https://github.com/pykeen/pykeen/pull/759

🐳 😃 Reorganize metrics and expectation functions by @cthoyt in https://github.com/pykeen/pykeen/pull/763

🏛️ 👽 Add improved indicator constructor by @cthoyt in https://github.com/pykeen/pykeen/pull/781

🏛️ 🥾 Improve metrics data structures by @cthoyt in https://github.com/pykeen/pykeen/pull/782

🎩 🎸 Class-Based Rank-Based Metrics by @mberr in https://github.com/pykeen/pykeen/pull/786

⚙️🌡️ Add more adjusted metrics by @cthoyt in https://github.com/pykeen/pykeen/pull/814

🪡🗜️ Refactor derived metrics by @mberr in https://github.com/pykeen/pykeen/pull/835

🔢 🐎 Update value range & docstring of adjusted metrics by @mberr in https://github.com/pykeen/pykeen/pull/823

➕🌍 Add option to add all default rank-based metrics by @mberr in https://github.com/pykeen/pykeen/pull/827

🪛💡 Fix RankBasedMetricResults.iter_rows by @mberr in https://github.com/pykeen/pykeen/pull/792

Prediction

🙃👓 Predict workflow with inverse relations by @mberr in https://github.com/pykeen/pykeen/pull/726
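Inverse triples let a model answer a head query (?, r, t) as a tail query (t, r_inv, ?). A prediction workflow with inverse relations has to account for this rewrite. Below is a minimal pure-Python sketch. It assumes that inverse relation IDs are the original IDs offset by the number of relations. This is a hypothetical layout, and PyKEEN's internal ID scheme may differ.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]


def add_inverse_triples(triples: List[Triple], num_relations: int) -> List[Triple]:
    """Append (t, r + num_relations, h) for every (h, r, t).

    Hypothetical ID scheme: inverse relation IDs are offset by num_relations.
    """
    return triples + [(t, r + num_relations, h) for h, r, t in triples]


def head_query_as_tail_query(relation: int, tail: int, num_relations: int) -> Tuple[int, int]:
    """Rewrite the head query (?, r, t) as the tail query (t, r_inv, ?).

    Returns (new_head, inverse_relation); the model then only needs to score tails.
    """
    return tail, relation + num_relations
```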

Representations

🪛🔗 Change interactions' shape by @mberr in https://github.com/pykeen/pykeen/pull/736

🍝 🛰️ Update constrainer, initializer, and normalizer resolution by @mberr in https://github.com/pykeen/pykeen/pull/742

🦄 🔢 Only get representations for unique indices by @mberr in https://github.com/pykeen/pykeen/pull/743

⛔💠 Remove get in canonical shape by @mberr in https://github.com/pykeen/pykeen/pull/745

⏩🛌 Fix dtype forwarding in Embedding by @mberr in https://github.com/pykeen/pykeen/pull/746

📏🕳️ Move normalization to base representation by @mberr in https://github.com/pykeen/pykeen/pull/818

✏️📏 Unify representation module nomenclature by @mberr in https://github.com/pykeen/pykeen/pull/811

✨💤 Resolve Representations by @mberr in https://github.com/pykeen/pykeen/pull/803

Trackers

Add loss kwargs to ResultTracker by @Rodrigo-A-Pereira in https://github.com/pykeen/pykeen/pull/741

🪡💻 Fix typo in ConsoleTracker.log_metrics by @mberr in https://github.com/pykeen/pykeen/pull/787

Fixes

🍌🍍 Fix ValueError during size probing on GPU machines by @mberr in https://github.com/pykeen/pykeen/pull/821

🪄➰ Fix device error in training loop by @mberr in https://github.com/pykeen/pykeen/pull/774

☕📱 Fix filterer's device by @mberr in https://github.com/pykeen/pykeen/pull/801

⛵💻 Make sure indices are moved to device by @mberr in https://github.com/pykeen/pykeen/pull/800

Documentation, Typing, and Packaging

🌊 👋 Goodbye to setup.py and Makefile for building the docs by @cthoyt in https://github.com/pykeen/pykeen/pull/761

🌌 🥛 Update Constants and Types by @mberr in https://github.com/pykeen/pykeen/pull/754

🔫 🐈‍⬛ Update black by @cthoyt in https://github.com/pykeen/pykeen/pull/764

🐍 💪 Add Python 3.10 support by @cthoyt in https://github.com/pykeen/pykeen/pull/831

🥰 📙 Update argument passing and documentation by @cthoyt in https://github.com/pykeen/pykeen/pull/842

🍊 ⌨️ Typing Updates by @cthoyt in https://github.com/pykeen/pykeen/pull/760

⚙️📚 Fix HPO doc by @mberr in https://github.com/pykeen/pykeen/pull/820

📖🔪 Extend documentation on subbatching and slicing by @mberr in https://github.com/pykeen/pykeen/pull/810

Misc

✉️ ♻️ Add list of available configurations to usage message of reproduction by @mberr in https://github.com/pykeen/pykeen/pull/753

🦎⚡ Update class-resolver by @cthoyt in https://github.com/pykeen/pykeen/pull/775

Full Changelog: https://github.com/pykeen/pykeen/compare/v1.7.0...v1.8.0
v1.7.0 (Jan 11, 2022)
New Models

Add BoxE by @ralphabb in https://github.com/pykeen/pykeen/pull/618

Add TripleRE by @mberr in https://github.com/pykeen/pykeen/pull/712

Add AutoSF by @mberr in https://github.com/pykeen/pykeen/pull/713

Add Transformer by @mberr in https://github.com/pykeen/pykeen/pull/714

Add Canonical Tensor Decomposition by @mberr in https://github.com/pykeen/pykeen/pull/663

Add (novel) Fixed Model by @cthoyt in https://github.com/pykeen/pykeen/pull/691

Add NodePiece model by @mberr in https://github.com/pykeen/pykeen/pull/621

Updated Models

Update R-GCN configuration by @mberr in https://github.com/pykeen/pykeen/pull/610

Update ConvKB to ERModel by @cthoyt in https://github.com/pykeen/pykeen/pull/425

Update ComplEx to ERModel by @mberr in https://github.com/pykeen/pykeen/pull/639

Rename TranslationalInteraction to NormBasedInteraction by @mberr in https://github.com/pykeen/pykeen/pull/651

Fix generic slicing dimension by @mberr in https://github.com/pykeen/pykeen/pull/683

Rename UnstructuredModel to UM and StructuredEmbedding to SE by @cthoyt in https://github.com/pykeen/pykeen/pull/721

Allow passing an unresolved loss to ERModel's __init__ by @mberr in https://github.com/pykeen/pykeen/pull/717

Representations and Initialization

Add low-rank embeddings by @mberr in https://github.com/pykeen/pykeen/pull/680

Add NodePiece representation by @mberr in https://github.com/pykeen/pykeen/pull/621

Add label-based initialization using a transformer (e.g., BERT) by @mberr in https://github.com/pykeen/pykeen/pull/638 and https://github.com/pykeen/pykeen/pull/652

Add label-based representation (e.g., to update language model using KGEM) by @mberr in https://github.com/pykeen/pykeen/pull/652

Remove literal representations (use label-based initialization instead) by @mberr in https://github.com/pykeen/pykeen/pull/679
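Low-rank embeddings store a small set of shared base vectors plus per-entity mixing weights, rather than a full embedding table. Below is a minimal pure-Python sketch of that factorization. It assumes each entity vector is the weighted sum of k shared bases (E = W B). LowRankEmbedding is a hypothetical class, not PyKEEN's representation module.

```python
import random
from typing import List


class LowRankEmbedding:
    """Hypothetical sketch: entity i's vector is sum_j weights[i][j] * bases[j]."""

    def __init__(self, num_entities: int, num_bases: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # W has shape (num_entities, num_bases); B has shape (num_bases, dim).
        self.weights = [[rng.gauss(0, 1) for _ in range(num_bases)] for _ in range(num_entities)]
        self.bases = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bases)]

    def __call__(self, index: int) -> List[float]:
        # Combine the shared bases using this entity's mixing weights.
        w = self.weights[index]
        dim = len(self.bases[0])
        return [sum(w[j] * self.bases[j][d] for j in range(len(w))) for d in range(dim)]

    def num_parameters(self) -> int:
        # Parameters stored: the mixing weights plus the shared bases.
        return sum(map(len, self.weights)) + sum(map(len, self.bases))
```

Take 10,000 entities, 16 bases, and dimension 200 as an example. This layout stores 10,000·16 + 16·200 = 163,200 parameters instead of 2,000,000.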

Training

Fix displaying previous epoch's loss by @mberr in https://github.com/pykeen/pykeen/pull/627

Fix kwargs transmission on MultiTrainingCallback by @Rodrigo-A-Pereira in https://github.com/pykeen/pykeen/pull/645

Extend Callbacks by @mberr in https://github.com/pykeen/pykeen/pull/609

Add gradient clipping by @mberr in https://github.com/pykeen/pykeen/pull/607

Fix negative score shape for sLCWA by @mberr in https://github.com/pykeen/pykeen/pull/624

Fix epoch loss for loss reduction != "mean" by @mberr in https://github.com/pykeen/pykeen/pull/623

Add sLCWA support for Cross Entropy Loss by @mberr in https://github.com/pykeen/pykeen/pull/704

Inference

Add uncertainty estimate functions via MC dropout by @mberr in https://github.com/pykeen/pykeen/pull/688

Fix predict top k by @mberr in https://github.com/pykeen/pykeen/pull/690

Fix indexing in predict_* methods when using inverse relations by @mberr in https://github.com/pykeen/pykeen/pull/699

Move tensors to device for predict_* methods by @mberr in https://github.com/pykeen/pykeen/pull/658

Trackers

Fix wandb logging by @mberr in https://github.com/pykeen/pykeen/pull/647

Add multi-result tracker by @mberr in https://github.com/pykeen/pykeen/pull/682

Add Python result tracker by @mberr in https://github.com/pykeen/pykeen/pull/681

Update file trackers by @cthoyt in https://github.com/pykeen/pykeen/pull/629

Evaluation

Store rank count by @mberr in https://github.com/pykeen/pykeen/pull/672

Extend evaluate() for easier relation filtering by @mberr in https://github.com/pykeen/pykeen/pull/391

Rename sklearn evaluator and refactor evaluator code by @cthoyt in https://github.com/pykeen/pykeen/pull/708

Add additional classification metrics via rexmex by @cthoyt in https://github.com/pykeen/pykeen/pull/668

Triples and Datasets

Add helper dataset with internal batching for Schlichtkrull sampling by @mberr in https://github.com/pykeen/pykeen/pull/616

Refactor splitting code and improve documentation by @mberr in https://github.com/pykeen/pykeen/pull/709

Switch np.loadtxt to pandas.read_csv by @mberr in https://github.com/pykeen/pykeen/pull/695

Add binary I/O to triples factories by @cthoyt in https://github.com/pykeen/pykeen/pull/665

Torch Usage

Use torch.finfo to determine suitable epsilon values by @mberr in https://github.com/pykeen/pykeen/pull/626

Use torch.isin instead of own implementation by @mberr in https://github.com/pykeen/pykeen/pull/635

Switch to using torch.inference_mode instead of torch.no_grad by @sbonner0 in https://github.com/pykeen/pykeen/pull/604

Miscellaneous

Add YAML experiment format by @mberr in https://github.com/pykeen/pykeen/pull/612

Add comparison with reproduction results during replication, if available by @mberr in https://github.com/pykeen/pykeen/pull/642

Adapt hello_world notebook to API changes by @dobraczka in https://github.com/pykeen/pykeen/pull/649

Add testing configuration for Jupyter notebooks by @mberr in https://github.com/pykeen/pykeen/pull/650

Add empty default loss_kwargs by @mali-git in https://github.com/pykeen/pykeen/pull/656

Optional extra config for reproduce by @mberr in https://github.com/pykeen/pykeen/pull/692

Store pipeline configuration in pipeline result by @mberr in https://github.com/pykeen/pykeen/pull/685

Fix upgrade to sequence by @mberr in https://github.com/pykeen/pykeen/pull/697

Fix pruner use in hpo_pipeline by @mberr in https://github.com/pykeen/pykeen/pull/724

Housekeeping

Automatically lint with black by @cthoyt in https://github.com/pykeen/pykeen/pull/605

Documentation and style guide cleanup by @cthoyt in https://github.com/pykeen/pykeen/pull/606

v1.6.0 (Oct 18, 2021)
This release is only compatible with PyTorch 1.9+. Because of some changes, it's now non-trivial to support both PyTorch 1.9+ and earlier versions. Moving forward, PyKEEN will support the latest version of PyTorch and will try its best to keep backwards compatibility.

New Models

DistMA (https://github.com/pykeen/pykeen/pull/507)

TorusE (https://github.com/pykeen/pykeen/pull/510)

Frequency Baselines (https://github.com/pykeen/pykeen/pull/514)

Gated DistMult Literal (https://github.com/pykeen/pykeen/pull/591, thanks @Rodrigo-A-Pereira)

New Datasets

WD50K (https://github.com/pykeen/pykeen/pull/511)

Wikidata5M (https://github.com/pykeen/pykeen/pull/528)

BioKG (https://github.com/pykeen/pykeen/pull/585, thanks @sbonner0)

New Losses

Double Margin Loss (https://github.com/pykeen/pykeen/pull/539)

Focal Loss (https://github.com/pykeen/pykeen/pull/542)

Pointwise Hinge Loss (https://github.com/pykeen/pykeen/pull/540)

Soft Pointwise Hinge Loss (https://github.com/pykeen/pykeen/pull/540)

Pairwise Logistic Loss (https://github.com/pykeen/pykeen/pull/540)
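The pointwise hinge, soft pointwise hinge, and pairwise logistic losses each reduce to a short formula over model scores. Below is a minimal pure-Python sketch of those formulas for single scores. It assumes binary labels in {0, 1} are mapped to {-1, +1} and uses an arbitrary margin of 1.0. The function names are hypothetical; PyKEEN's own loss classes operate on batched tensors with configurable reductions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_hinge(score: float, label: int, margin: float = 1.0) -> float:
    """max(0, margin - y * score) with y = 2 * label - 1."""
    y = 2 * label - 1
    return max(0.0, margin - y * score)


def soft_pointwise_hinge(score: float, label: int, margin: float = 1.0) -> float:
    """Smooth variant of the hinge: softplus(margin - y * score)."""
    y = 2 * label - 1
    return softplus(margin - y * score)


def pairwise_logistic(pos_score: float, neg_score: float) -> float:
    """softplus(neg_score - pos_score): small when the positive outscores the negative."""
    return softplus(neg_score - pos_score)
```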

Added

Tutorial in using checkpoints when bringing your own data (https://github.com/pykeen/pykeen/pull/498)

Learning rate scheduling (https://github.com/pykeen/pykeen/pull/492)

Checkpoints include entity/relation maps (https://github.com/pykeen/pykeen/pull/498)

QuatE reproducibility configurations (https://github.com/pykeen/pykeen/pull/486)

Changed

Reimplement SE (https://github.com/pykeen/pykeen/pull/521) and NTN (https://github.com/pykeen/pykeen/pull/522) with new-style models

Generalize pairwise loss and pointwise loss hierarchies (https://github.com/pykeen/pykeen/pull/540)

Update to use PyTorch 1.9 functionality (https://github.com/pykeen/pykeen/pull/489)

Generalize generator strategies in LCWA (https://github.com/pykeen/pykeen/pull/602)

Fixed

FileNotFoundError on Windows/Anaconda (https://github.com/pykeen/pykeen/pull/503, thanks @Hao-666)

Fixed docstring for ComplEx interaction (https://github.com/pykeen/pykeen/pull/504)

Make DistMult the default interaction function for R-GCN (https://github.com/pykeen/pykeen/pull/548)

Fix gradient error in CompGCN buffering (https://github.com/pykeen/pykeen/pull/573)

Fix splitting of numeric triples factories (https://github.com/pykeen/pykeen/pull/594, thanks @Rodrigo-A-Pereira)

Fix determinism in splitting of triples factory (https://github.com/pykeen/pykeen/pull/500)

Fix documentation and improve HPO suggestion (https://github.com/pykeen/pykeen/pull/524, thanks @kdutia)

v1.5.0 (Jun 13, 2021)
New Metrics

Adjusted Arithmetic Mean Rank Index (https://github.com/pykeen/pykeen/pull/378)

Add harmonic, geometric, and median rankings (https://github.com/pykeen/pykeen/pull/381)
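Both new kinds of metric are simple aggregates over the list of ranks produced by evaluation. Below is a minimal sketch using Python's statistics module. It assumes the adjusted index is defined as AMRI = 1 - (MR - 1) / (E[MR] - 1), where E[MR] averages (n + 1) / 2 over ranking tasks that each have n candidates. The function names are hypothetical, not PyKEEN's metric classes.

```python
import statistics
from typing import Dict, Sequence


def adjusted_arithmetic_mean_rank_index(ranks: Sequence[float], num_candidates: Sequence[int]) -> float:
    """AMRI = 1 - (MR - 1) / (E[MR] - 1).

    Assumes E[MR] is the mean of (n + 1) / 2 over ranking tasks, i.e. the expected
    rank of the true candidate when its n candidates are ordered uniformly at random.
    """
    mean_rank = statistics.fmean(ranks)
    expected_mean_rank = statistics.fmean((n + 1) / 2 for n in num_candidates)
    return 1 - (mean_rank - 1) / (expected_mean_rank - 1)


def rank_summary(ranks: Sequence[float]) -> Dict[str, float]:
    """Arithmetic, harmonic, geometric, and median aggregates of a list of ranks."""
    return {
        "arithmetic_mean_rank": statistics.fmean(ranks),
        "harmonic_mean_rank": statistics.harmonic_mean(ranks),  # equals 1 / MRR
        "geometric_mean_rank": statistics.geometric_mean(ranks),
        "median_rank": statistics.median(ranks),
    }
```

Under this definition, AMRI is 1 for a perfect ranking and 0 in expectation for random scoring. This keeps values comparable across datasets with different numbers of candidates.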

New Trackers

Console Tracker (https://github.com/pykeen/pykeen/pull/440)

Tensorboard Tracker (https://github.com/pykeen/pykeen/pull/416; thanks @sbonner0)

New Models

QuatE (https://github.com/pykeen/pykeen/pull/367)

CompGCN (https://github.com/pykeen/pykeen/pull/382)

CrossE (https://github.com/pykeen/pykeen/pull/467)

Reimplementation of LiteralE with arbitrary combination (g) function (https://github.com/pykeen/pykeen/pull/245)

New Negative Samplers

Pseudo-typed Negative Sampler (https://github.com/pykeen/pykeen/pull/412)
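The pseudo-typed negative sampler restricts corruptions to entities that plausibly fit the slot. Below is a minimal pure-Python sketch. It assumes a head (or tail) is replaced only by entities already observed as heads (or tails) of the same relation in the training triples. The helper names are hypothetical. Unlike a production sampler, this sketch does not check whether the corrupted triple happens to be a known positive.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Iterable, Optional, Set, Tuple

Triple = Tuple[int, int, int]


def build_type_index(
    triples: Iterable[Triple],
) -> Tuple[DefaultDict[int, Set[int]], DefaultDict[int, Set[int]]]:
    """Collect, per relation, the entities seen as heads and as tails."""
    heads: DefaultDict[int, Set[int]] = defaultdict(set)
    tails: DefaultDict[int, Set[int]] = defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    return heads, tails


def pseudo_typed_negative(
    triple: Triple,
    heads: DefaultDict[int, Set[int]],
    tails: DefaultDict[int, Set[int]],
    rng: random.Random,
) -> Optional[Triple]:
    """Corrupt the head or tail using only entities observed in that slot for this relation."""
    h, r, t = triple
    # sorted() makes the choice reproducible for a seeded rng.
    head_candidates = sorted(heads[r] - {h})
    tail_candidates = sorted(tails[r] - {t})
    sides = [name for name, cands in (("head", head_candidates), ("tail", tail_candidates)) if cands]
    if not sides:
        return None  # the relation offers no alternative entity in either slot
    if rng.choice(sides) == "head":
        return (rng.choice(head_candidates), r, t)
    return (h, r, rng.choice(tail_candidates))
```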

Datasets

Removed invalid datasets (OpenBioLink filtered sets; https://github.com/pykeen/pykeen/pull/439)

Added WK3l-15K (https://github.com/pykeen/pykeen/pull/403)

Added WK3l-120K (https://github.com/pykeen/pykeen/pull/403)

Added CN3l (https://github.com/pykeen/pykeen/pull/403)

Added

Documentation on using PyKEEN in Google Colab and Kaggle (https://github.com/pykeen/pykeen/pull/379, thanks @jerryIsHere)

Pass custom training loops to pipeline (https://github.com/pykeen/pykeen/pull/334)

Compatibility layer for the fft module (https://github.com/pykeen/pykeen/pull/288)

Official Python 3.9 support, now that PyTorch has it (https://github.com/pykeen/pykeen/pull/223)

Utilities for dataset analysis (https://github.com/pykeen/pykeen/pull/16, https://github.com/pykeen/pykeen/pull/392)

Filtering of negative sampling now uses a bloom filter by default (https://github.com/pykeen/pykeen/pull/401)

Optional embedding dropout (https://github.com/pykeen/pykeen/pull/422)

Added more HPO suggestion methods and docs (https://github.com/pykeen/pykeen/pull/446)

Training callbacks (https://github.com/pykeen/pykeen/pull/429)

Class resolver for datasets (https://github.com/pykeen/pykeen/pull/473)
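A Bloom filter answers "is this triple possibly known?" with no false negatives and a small, tunable false-positive rate. Using it to drop corrupted triples may occasionally discard a valid negative, but it never keeps a known positive. Below is a minimal pure-Python sketch that assumes blake2b-derived bit positions. The class name, function name, and sizes are hypothetical, and this is not PyKEEN's implementation.

```python
import hashlib
from typing import Iterator, List, Tuple

Triple = Tuple[int, int, int]


class TripleBloomFilter:
    """Hypothetical Bloom filter over (head, relation, tail) ID triples."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, triple: Triple) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the triple with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(repr((seed, triple)).encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, triple: Triple) -> None:
        for pos in self._positions(triple):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, triple: Triple) -> bool:
        # True means "possibly added"; False means "definitely not added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(triple))


def filter_negatives(negatives: List[Triple], known: TripleBloomFilter) -> List[Triple]:
    """Drop corrupted triples that the filter reports as (possibly) known positives."""
    return [triple for triple in negatives if triple not in known]
```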

Updated

R-GCN implementation now uses new-style models and is super idiomatic (https://github.com/pykeen/pykeen/pull/110)

Enable passing of interaction function by string in base model class (https://github.com/pykeen/pykeen/pull/384, https://github.com/pykeen/pykeen/pull/387)

Bump scipy requirement to 1.5.0+

Updated interfaces of models and negative samplers to enforce kwargs (https://github.com/pykeen/pykeen/pull/445)

Reorganize filtering, negative sampling, and remove triples factory from most objects (https://github.com/pykeen/pykeen/pull/400, https://github.com/pykeen/pykeen/pull/405, https://github.com/pykeen/pykeen/pull/406, https://github.com/pykeen/pykeen/pull/409, https://github.com/pykeen/pykeen/pull/420)

Update automatic memory optimization (https://github.com/pykeen/pykeen/pull/404)

Flexibly define positive triples for filtering (https://github.com/pykeen/pykeen/pull/398)

Completely reimplemented negative sampling interface in training loops (https://github.com/pykeen/pykeen/pull/427)

Completely reimplemented loss function in training loops (https://github.com/pykeen/pykeen/pull/448)

Forward-compatibility of embeddings in old-style models and updated docs on how to use embeddings (https://github.com/pykeen/pykeen/pull/474)

Fixed

Regularizer passing in the pipeline and HPO (https://github.com/pykeen/pykeen/pull/345)

Saving results when using multimodal models (https://github.com/pykeen/pykeen/pull/349)

Add missing diagonal constraint on MuRE Model (https://github.com/pykeen/pykeen/pull/353)

Fix early stopper handling (https://github.com/pykeen/pykeen/pull/419)

Fixed saving results from pipeline (https://github.com/pykeen/pykeen/pull/428, thanks @kantholtz)

Fix OOM issues with early stopper and AMO (https://github.com/pykeen/pykeen/pull/433)

Fix ER-MLP functional form (https://github.com/pykeen/pykeen/pull/444)

v1.4.0 (Mar 4, 2021)
New Datasets

Countries (https://github.com/pykeen/pykeen/pull/314)

DB100K (https://github.com/pykeen/pykeen/issues/316)

New Models

MuRE (https://github.com/pykeen/pykeen/pull/311)

PairRE (https://github.com/pykeen/pykeen/pull/309)

Monotonic affine transformer (https://github.com/pykeen/pykeen/pull/324)

New Algorithms

If you're interested in any of these, please get in touch with us regarding an upcoming publication.

Dataset Similarity (https://github.com/pykeen/pykeen/pull/294)

Dataset Deterioration (https://github.com/pykeen/pykeen/pull/295)

Dataset Remix (https://github.com/pykeen/pykeen/pull/296)

Added

New-style models (https://github.com/pykeen/pykeen/pull/260) for direct usage of interaction modules

Ability to train pipeline() using an Interaction module rather than a Model (https://github.com/pykeen/pykeen/pull/326, https://github.com/pykeen/pykeen/pull/330).

Changes

Lookup of assets is now mediated by the class_resolver package (https://github.com/pykeen/pykeen/pull/321, https://github.com/pykeen/pykeen/pull/327)

The docdata package is now used to parse structured information out of the model and dataset documentation in order to make a more informative README with links to citations (https://github.com/pykeen/pykeen/pull/303).

Fixed

Fixed ComplEx's implementation (https://github.com/pykeen/pykeen/pull/313)

Fixed OGB's reuse entity identifiers (https://github.com/pykeen/pykeen/pull/318, thanks @tgebhart)

v1.3.0 (Feb 15, 2021)
We skipped version 1.2.0 because we made an accidental release before this version was ready. We're only human, and are looking into improving our release workflow to live in CI/CD so something like this doesn't happen again. However, as an end user, this won't have an effect on you.

New Datasets

CSKG (https://github.com/pykeen/pykeen/pull/249)

DBpedia50 (https://github.com/pykeen/pykeen/issues/278)

New Trackers

General file-based Tracker (https://github.com/pykeen/pykeen/pull/254)

CSV Tracker (https://github.com/pykeen/pykeen/pull/254)

JSON Tracker (https://github.com/pykeen/pykeen/pull/254)

Added

pykeen version command for more easily reporting your environment in issues (https://github.com/pykeen/pykeen/issues/251)

Functional forms of all interaction models (e.g., TransE, RotatE) (https://github.com/pykeen/pykeen/issues/238, pykeen.nn.functional documentation). These can be generally reused, even outside of the typical PyKEEN workflows.

Modular forms of all interaction models (https://github.com/pykeen/pykeen/issues/242, pykeen.nn.modules documentation). These wrap the functional forms of interaction models and store hyper-parameters such as the p value for the L_p norm in TransE.

The initializer, normalizer, and constrainer for the entity and relation embeddings are now exposed through the __init__() function of each KGEM class and can be configured. A future update will enable HPO on these as well (https://github.com/pykeen/pykeen/issues/282).
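The functional/modular split can be illustrated with TransE, whose score is the negative L_p distance between h + r and t. Below is a minimal pure-Python sketch that assumes plain lists of floats instead of tensors. transe_score and TransEScorer are hypothetical stand-ins, not the functions in pykeen.nn.functional or the classes in pykeen.nn.modules.

```python
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], p: float = 2.0) -> float:
    """Functional form: -||h + r - t||_p, so higher scores mean more plausible triples."""
    distance = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)
    return -distance


class TransEScorer:
    """Modular form: stores the hyper-parameter p and delegates to the function."""

    def __init__(self, p: float = 2.0) -> None:
        self.p = p

    def __call__(self, h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
        return transe_score(h, r, t, p=self.p)
```

Keeping p on the object means it can be configured once and then reused for every batch. The stateless function, by contrast, stays usable outside any training workflow.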

Refactoring and Future Preparation

This release contains a few big refactors. Most won't affect end-users, but if you're writing your own PyKEEN models, these are important. Many of them are motivated to make it possible to introduce a new interface that makes it much easier for researchers (who shouldn't have to understand the inner workings of PyKEEN) to make new models.

The regularizer has been refactored (https://github.com/pykeen/pykeen/issues/266, https://github.com/pykeen/pykeen/issues/274). It no longer accepts a torch.device when instantiated.

The pykeen.nn.Embedding class has been improved in several ways:

Embedding Specification class makes it easier to write new classes (https://github.com/pykeen/pykeen/issues/277)

Refactor to make shape of embedding explicit (https://github.com/pykeen/pykeen/issues/287)

Specification of complex datatype (https://github.com/pykeen/pykeen/issues/292)

Refactoring of the loss model class to provide a meaningful class hierarchy (https://github.com/pykeen/pykeen/issues/256, https://github.com/pykeen/pykeen/issues/262)

Refactoring of the base model class to provide a consistent interface (https://github.com/pykeen/pykeen/issues/246, https://github.com/pykeen/pykeen/issues/248, https://github.com/pykeen/pykeen/issues/253, https://github.com/pykeen/pykeen/issues/257). This allowed for simplification of the loss computation based on the new hierarchy and also new implementation of regularizer class.

More automated testing of typing with MyPy (https://github.com/pykeen/pykeen/issues/255) and automated checking of documentation with doctests (https://github.com/pykeen/pykeen/issues/291)

Triples Loading

We've made some improvements to the pykeen.triples.TriplesFactory to facilitate loading even larger datasets (https://github.com/pykeen/pykeen/issues/216). However, this required an interface change. This will affect any code that loads custom triples. If you're loading triples from a path, you should now use:

from pykeen.triples import TriplesFactory

path = ...  # path to your triples file

# Old (doesn't work anymore):
# tf = TriplesFactory(path=path)

# New:
tf = TriplesFactory.from_path(path)

Predictions

While refactoring the base model class, we moved the prediction functionality into a new module, pykeen.models.predict (docs: https://pykeen.readthedocs.io/en/latest/reference/predict.html#functions). We also renamed some of the prediction functions inside the base model to make them more consistent, but we now recommend that you use the functions from pykeen.models.predict instead.

Model.predict_heads() -> Model.get_head_prediction_df()

Model.predict_relations() -> Model.get_relation_prediction_df()

Model.predict_tails() -> Model.get_tail_prediction_df()

Model.score_all_triples() -> Model.get_all_prediction_df()

Fixed

Do not create inverse triples for validation and testing factory (https://github.com/pykeen/pykeen/issues/270)

Treat nonzero applied to large tensor error as OOM for batch size search (https://github.com/pykeen/pykeen/issues/279)

Fix bug in loading ConceptNet (https://github.com/pykeen/pykeen/issues/290). If your experiments relied on this dataset, you should rerun them.

v1.1.0 (Jan 20, 2021)
New Datasets

CoDEx (https://github.com/pykeen/pykeen/pull/154)

DRKG (https://github.com/pykeen/pykeen/pull/156)

OGB (https://github.com/pykeen/pykeen/pull/159)

ConceptNet (https://github.com/pykeen/pykeen/pull/160)

Clinical Knowledge Graph (https://github.com/pykeen/pykeen/pull/209)

New Trackers

Neptune.ai (https://github.com/pykeen/pykeen/pull/183)

Added

Add MLFlow set tags function (https://github.com/pykeen/pykeen/pull/139; thanks @sunny1401)

Add score_t/h function for ComplEx (https://github.com/pykeen/pykeen/pull/150)

Add proper testing for literal datasets and literal models (https://github.com/pykeen/pykeen/pull/199)

Checkpoint functionality (https://github.com/pykeen/pykeen/pull/123)

Random triple generation (https://github.com/pykeen/pykeen/pull/201)

Make negative sampler corruption scheme configurable (https://github.com/pykeen/pykeen/pull/209)

Add predict with inverse triples pipeline (https://github.com/pykeen/pykeen/pull/208)

Add generalized p-norm to regularizer (https://github.com/pykeen/pykeen/pull/225)
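A configurable corruption scheme decides which slots of a positive triple a negative sampler may replace. Below is a minimal pure-Python sketch. It assumes one slot is chosen uniformly from the scheme and replaced by a uniformly drawn, different ID. corrupt and POSITIONS are hypothetical names, not PyKEEN's negative sampler API.

```python
import random
from typing import Optional, Sequence, Tuple

Triple = Tuple[int, int, int]
# Position of each corruptible slot within a (head, relation, tail) triple.
POSITIONS = {"head": 0, "relation": 1, "tail": 2}


def corrupt(
    triple: Triple,
    num_entities: int,
    num_relations: int,
    corruption_scheme: Sequence[str] = ("head", "tail"),
    rng: Optional[random.Random] = None,
) -> Triple:
    """Replace one slot, chosen uniformly from corruption_scheme, with a different ID."""
    rng = rng if rng is not None else random.Random()
    side = rng.choice(list(corruption_scheme))
    pos = POSITIONS[side]
    upper = num_relations if side == "relation" else num_entities
    # Draw from all IDs except the original: sample from [0, upper - 1) and skip over it.
    replacement = rng.randrange(upper - 1)
    if replacement >= triple[pos]:
        replacement += 1
    corrupted = list(triple)
    corrupted[pos] = replacement
    return (corrupted[0], corrupted[1], corrupted[2])
```

The sketch samples from upper - 1 values and shifts past the original ID. This avoids a rejection loop while keeping the draw uniform over the remaining IDs.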

Changed

New harness for resetting parameters (https://github.com/pykeen/pykeen/pull/131)

Modularize embeddings (https://github.com/pykeen/pykeen/pull/132)

Update first steps documentation (https://github.com/pykeen/pykeen/pull/152; thanks @TobiasUhmann)

Switched testing to GitHub Actions (https://github.com/pykeen/pykeen/pull/165 and https://github.com/pykeen/pykeen/pull/194)

No longer support Python 3.6

Move automatic memory optimization (AMO) option out of model and into training loop (https://github.com/pykeen/pykeen/pull/176)

Improve hyper-parameter defaults and HPO defaults (https://github.com/pykeen/pykeen/pull/181 and https://github.com/pykeen/pykeen/pull/179)

Switch internal usage to ID-based triples (https://github.com/pykeen/pykeen/pull/193 and https://github.com/pykeen/pykeen/pull/220)

Optimize triples splitting algorithm (https://github.com/pykeen/pykeen/pull/187)

Generalize metadata storage in triples factory (https://github.com/pykeen/pykeen/pull/211)

Add drop_last option to data loader in training loop (https://github.com/pykeen/pykeen/pull/217)

Fixed

Whitelist support in HPO pipeline (https://github.com/pykeen/pykeen/pull/124)

Improve evaluator instantiation (https://github.com/pykeen/pykeen/pull/125; thanks @kantholtz)

CPU fallback on AMO (https://github.com/pykeen/pykeen/pull/232)

Fix HPO save issues (https://github.com/pykeen/pykeen/pull/235)

Fix GPU issue in plotting (https://github.com/pykeen/pykeen/pull/207)

v1.0.5 (Oct 21, 2020)
Added

Added testing on Windows with AppVeyor and documentation for installation on Windows (https://github.com/pykeen/pykeen/pull/95)

Add ability to specify custom datasets in HPO and ablation studies (https://github.com/pykeen/pykeen/pull/54)

Add functions for plotting entities and relations (as well as an accompanying tutorial) (https://github.com/pykeen/pykeen/pull/99)

Changed

Replaced BCE loss with BCEWithLogits loss (https://github.com/pykeen/pykeen/pull/109)

Store default HPO ranges in loss classes (https://github.com/pykeen/pykeen/pull/111)

Use entrypoints for datasets (https://github.com/pykeen/pykeen/pull/115) to allow registering of custom datasets

Improved WANDB results tracker (https://github.com/pykeen/pykeen/pull/117, thanks @kantholtz)

Reorganized ablation study generation and execution (https://github.com/pykeen/pykeen/pull/54)
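Replacing BCE with BCEWithLogits fuses the sigmoid and the logarithm, which keeps the loss finite for large-magnitude scores. Below is a minimal pure-Python comparison for a single logit and a label in {0, 1}. Both function names are hypothetical; torch.nn.BCEWithLogitsLoss is the batched PyTorch equivalent.

```python
import math


def bce_naive(logit: float, label: int) -> float:
    """Sigmoid first, then log: breaks down when sigmoid(logit) rounds to 0 or 1."""
    p = 1 / (1 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def bce_with_logits(logit: float, label: int) -> float:
    """Fused form: softplus(logit) - logit * label, with a stable softplus."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
```

For example, bce_naive(40.0, 1) raises ValueError (math domain error) because 1 - sigmoid(40.0) rounds to 0.0 in double precision. In contrast, bce_with_logits(40.0, 1) returns a value close to 0.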

Fixed

Fixed bug in the initialization of ConvE (https://github.com/pykeen/pykeen/pull/100)

Fixed cross-platform issue with random integer generation (https://github.com/pykeen/pykeen/pull/98)

Fixed documentation build on ReadTheDocs (https://github.com/pykeen/pykeen/pull/104)

v1.0.4 (Aug 25, 2020)
Added

Enable restricted evaluation on a subset of entities/relations (https://github.com/pykeen/pykeen/pull/62, https://github.com/pykeen/pykeen/pull/83)

Changed

Use number of epochs as step instead of number of checks (https://github.com/pykeen/pykeen/pull/72)

Fixed

Fix bug in early stopping (https://github.com/pykeen/pykeen/pull/77)

v1.0.3 (Aug 13, 2020)
Added

Side-specific evaluation (https://github.com/pykeen/pykeen/pull/44)

Grid Sampler (https://github.com/pykeen/pykeen/pull/52)

Weights & Biases Tracker (https://github.com/pykeen/pykeen/pull/68), thanks @migalkin!

Changed

Update to Optuna 2.0 (https://github.com/pykeen/pykeen/pull/52)

Generalize specification of tracker (https://github.com/pykeen/pykeen/pull/39)

Fixed

Fix bug in triples factory splitter (https://github.com/pykeen/pykeen/pull/59)

Device mismatch bug (https://github.com/pykeen/pykeen/pull/50)

v1.0.2 (Jul 10, 2020)
Added

Add default values for margin and adversarial temperature in NSSA loss (https://github.com/pykeen/pykeen/pull/29)

Added FTP uploader (https://github.com/pykeen/pykeen/pull/35)

Add AWS S3 uploader (https://github.com/pykeen/pykeen/pull/39)
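The self-adversarial negative sampling (NSSA) loss, introduced with RotatE, weights each negative by a softmax over its temperature-scaled score, so harder negatives count more. Below is a minimal pure-Python sketch for one positive and its negatives. It assumes the convention that higher scores mean more plausible triples. nssa_loss and log_sigmoid are hypothetical names, and the margin and temperature values in any call are illustrative only, not the defaults added in this release.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Stable log(sigmoid(x)) = -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def nssa_loss(
    pos_score: float,
    neg_scores: Sequence[float],
    margin: float,
    adversarial_temperature: float,
) -> float:
    # Self-adversarial weights: softmax over temperature-scaled negative scores
    # (treated as constants, i.e. without gradient, in the original formulation).
    scaled = [adversarial_temperature * s for s in neg_scores]
    shift = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Push the positive above -margin and each negative below -margin.
    pos_term = -log_sigmoid(margin + pos_score)
    neg_term = -sum(w * log_sigmoid(-s - margin) for w, s in zip(weights, neg_scores))
    return pos_term + neg_term
```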

Changed

Improved MLflow support (https://github.com/pykeen/pykeen/pull/40)

Lots of improvements to documentation!

Fixed

Fix triples factory splitting bug (https://github.com/pykeen/pykeen/pull/21)

Fix problem with tensors' device during prediction (https://github.com/pykeen/pykeen/pull/41)

Fix RotatE relation embeddings re-initialization (https://github.com/pykeen/pykeen/pull/26)

v1.0.1 (Jul 2, 2020)
Added

Add fractional hits@k (https://github.com/pykeen/pykeen/pull/17)

Add link prediction pipeline (https://github.com/pykeen/pykeen/pull/10)

Changed

Update documentation (https://github.com/pykeen/pykeen/pull/10)

v1.0.0 (Jun 25, 2020)

v0.0.26 (Jun 22, 2020)

This is the last release before the PyKEEN 1.0 release; be prepared for major changes.

Note! If you've come this far looking for old releases of PyKEEN, we were unfortunately not able to retain them when we moved the code to this new organization. Please see PyPI for a more complete release history (https://pypi.org/project/pykeen/#history) or the Zenodo record associated with SmartDataAnalytics/PyKEEN.