This tutorial repository introduces the functionality of KGTK to first-time users.

Overview

Welcome to the KGTK notebook tutorial

The goal of this tutorial repository is to introduce the functionality of KGTK to first-time users. The Knowledge Graph Toolkit (KGTK) is a comprehensive framework for creating and exploiting large hyper-relational knowledge graphs (KGs), designed for ease of use, scalability, and speed. The tutorial consists of several notebooks that demonstrate how to perform network analysis, graph profiling, knowledge enrichment, and embedding computation over a portion of the Wikidata knowledge graph. The tutorial notebooks are in the tutorial folder, and their input data is stored in the datasets folder. All notebooks require minimal configuration and can be run locally or in Google Colab within a few minutes. A basic understanding of knowledge graphs is sufficient for this tutorial.

This repository has been created for the purpose of the KGTK tutorial presented at ISWC 2021. For more information on this tutorial, see our website.

Notebooks

  1. 01-kgtk-introduction.ipynb introduces KGTK and Kypher.
  2. 02-kg-profiling.ipynb profiles a Wikidata subgraph by computing detailed statistics of its classes, instances, and properties.
  3. 03-kg-graph-embeddings.ipynb computes graph embeddings of a Wikidata subgraph using KGTK, demonstrates how to use these embeddings for similarity estimation, and visualizes them.
  4. 04-kg-enrichment-with-csv.ipynb shows how structured data from IMDb can be integrated into a subset of Wikidata.
  5. 05-kg-enrichment-with-lod.ipynb shows how Linked Open Data (LOD) graphs such as the Getty Vocabularies can be used to enrich Wikidata with KGTK operations.
  6. 06-kg-network-analysis.ipynb analyzes the family network of Arnold Schwarzenegger (Q2685) in Wikidata using KGTK operations.
  7. 07-kg-constraint-validation.ipynb demonstrates how to perform constraint validation on a single Wikidata property.
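
Notebook 03 uses graph embeddings for similarity estimation. The following is not the notebook's code; it is a minimal sketch that assumes similarity is measured with cosine similarity, a common choice for comparing embedding vectors. The `cosine_similarity` helper and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real vectors would come from the
# embeddings computed in 03-kg-graph-embeddings.ipynb.
embeddings = {
    "entity_a": [1.0, 0.0, 1.0],
    "entity_b": [1.0, 0.0, 0.9],
    "entity_c": [0.0, 1.0, 0.0],
}
```

Here `cosine_similarity(embeddings["entity_a"], embeddings["entity_b"])` is close to 1, while entity_a and entity_c are orthogonal and score 0. Cosine similarity ignores vector length and compares only direction.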

Running the notebooks in Google Colab

Follow these steps to run the ISI Google Colab notebooks.

Make a copy of the notebooks to your Google Drive.

The following tutorial notebooks are available to run in Google Colab:

  1. 01-kgtk-introduction.ipynb
  2. 02-kg-profiling.ipynb
  3. 03-kg-graph-embeddings.ipynb
  4. 04-kg-enrichment-with-csv.ipynb
  5. 05-kg-enrichment-with-lod.ipynb
  6. 06-kg-network-analysis.ipynb
  7. 07-kg-constraint-validation.ipynb
  8. kgtk-browser.ipynb (experimental)

Clicking a link opens the corresponding notebook in Google Colab. These notebook links are read-only.

In the File menu, click Save a copy in Drive.

This will create a copy of the notebook in your Google Drive.

Install kgtk

Run the first cell to install kgtk.

If Colab shows a warning before running the cell, click Run anyway to continue.

After the installation finishes, you'll see an error with a Restart Runtime button. This is caused by a conflict in Google Colab's Python environment; click Restart Runtime to continue.

You do not have to install kgtk again.

Some notebooks contain additional installation cells. If you see the same error after running them, click Restart Runtime.
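
After restarting the runtime, you can confirm that the installed packages are still importable without reinstalling them. A minimal sketch using only the Python standard library; the `missing_modules` helper is hypothetical and not part of KGTK, and the module list is an assumption you can extend.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "kgtk" is the package installed by the first notebook cell; add any other
# modules a notebook imports (this list is an assumption).
missing = missing_modules(["kgtk"])
if missing:
    print("Not installed (rerun the installation cell):", ", ".join(missing))
else:
    print("All required modules are importable.")
```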

Run the cells in the notebook

Now, simply run all the cells. The notebook should run successfully.

Google Colab Caveats

  • The Colab VM and Python environment are ephemeral: the VM resets after a while, and all installed libraries and generated files are lost.
  • Google Colab File IO: downloading files from and uploading files to Google Colab.
  • You can connect Google Drive to a Colab notebook to read files from it and save files to it.
  • Multiple users can run the same Colab notebook by sharing its link. This can cause unwanted complications if several people run the same cell at the same time.
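
Because the Colab VM is ephemeral, one way to keep notebook outputs is to write them to a mounted Google Drive. A minimal sketch, assuming the `google.colab.drive.mount` helper and the `/content/drive/MyDrive` mount layout; the `output_dir` function and the `kgtk-tutorial` folder name are hypothetical.

```python
import os


def output_dir(drive_subdir="kgtk-tutorial", local_fallback="output"):
    """Return a directory for notebook outputs.

    In Google Colab, mount Google Drive so files survive a VM reset;
    elsewhere (e.g. a local Jupyter install), use a local directory.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        path = local_fallback
    else:
        drive.mount("/content/drive")  # asks the user to authorize access
        path = os.path.join("/content/drive/MyDrive", drive_subdir)
    os.makedirs(path, exist_ok=True)
    return path
```

Falling back to a local folder lets the same cell run unchanged when the notebooks are run locally.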

Contact

Comments
  • Unrealized dependency

    This issue might affect some versions of kgtk more than others (specifically, versions that do not declare these libraries as dependencies, so they are not installed beforehand).

    Attempting to import the following libraries raises an error even when kgtk is installed:

    • gensim
    • papermill

    Would it be a good idea to include a very minimal requirements.txt file in this repo?

    cc @filievski

    bug
    opened by aditya-malte
  • error when querying p279star file - 02-kg-profiling.ipynb

    When I execute the following command in the Colab environment, this error comes up:

    kgtk(""" query -i p279star --match '(class)-[:P279star]->(super_class)' --return 'count(distinct super_class) as count_classes' """)

    [Errno 2] No such file or directory: '/content/p279star'

    opened by versant2612
  • how to retrieve specific property constraints of a property

    Hi, I followed import-wikidata.ipynb to import Wikidata, and these are the files I end up with:

    aliases.en.sorted.tsv.gz
    aliases.sorted.tsv.gz
    claims.badvalue.sorted.tsv.gz
    claims.novalue.sorted.tsv.gz
    claims.somevalue.sorted.tsv.gz
    claims.sorted.tsv.gz
    descriptions.en.sorted.tsv.gz
    descriptions.sorted.tsv.gz
    labels.en.sorted.tsv.gz
    labels.sorted.tsv.gz
    metadata.node.sorted.tsv.gz
    metadata.property.datatypes.sorted.tsv.gz
    metadata.types.sorted.tsv.gz
    qualifiers.badvalue.sorted.tsv.gz
    qualifiers.badvalueClaims.sorted.tsv.gz
    qualifiers.novalue.sorted.tsv.gz
    qualifiers.novalueClaims.sorted.tsv.gz
    qualifiers.somevalue.sorted.tsv.gz
    qualifiers.somevalueClaims.sorted.tsv.gz
    qualifiers.sorted.tsv.gz
    sitelinks.en.qualifiers.sorted.tsv.gz
    sitelinks.en.sorted.tsv.gz
    sitelinks.qualifiers.sorted.tsv.gz
    sitelinks.sorted.tsv.gz
    

    How can I retrieve the property constraints of a property? For example, in claims.sorted.tsv.gz I have triples like this for the property P1303:

    P1303-P2302-Q52558054-16872d63-0	P1303	P2302	Q52558054	normal	wikibase-item
    

    and then in qualifiers.sorted.tsv.gz I have triples like

    P10-P2302-Q21510852-dde2f0ce-0-P2316-Q21502408-0	P10-P2302-Q21510852-dde2f0ce-0	P2316	Q21502408	wikibase-item
    

    How can I put the two together to get the constraints (and all their qualifiers) of, e.g., P1303? Ofc there is a way but I'm missing it :)

    Thanks a lot, Valentina

    opened by valecarriero
  • SPARQL queries

    Hi everybody,

    thanks again for this very cool project. Is there a way to perform SPARQL queries on your system? The Wikidata Query Service is very powerful, but it often times out. Here is a Wikidata query example that I would like to perform with kgtk.

    Thanks. Best,

    Giorgio

    opened by GiorgioBarnabo
  • error when populating the cache - 02-kg-profiling.ipynb

    When executing ck.load_files_into_cache() in 02-kg-profiling.ipynb on Colab, I received this error message:

    Incorrect number of bindings supplied. The current statement uses 4, and there are 3 supplied.

    opened by versant2612
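
On the question about retrieving the constraints of P1303: in the example rows, a qualifier's `node1` column holds the `id` of the claim it qualifies, so the constraints can be assembled by selecting the property's P2302 claims and then collecting the qualifiers whose `node1` matches those claim ids. A plain-Python sketch, not KGTK's own method; it assumes the files carry the standard KGTK header columns `id`, `node1`, `label`, and `node2`, and the helper names are hypothetical.

```python
import csv
import gzip


def read_edges(path):
    """Yield rows of a gzipped, tab-separated KGTK edge file as dicts."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: KGTK values contain literal quote characters.
        yield from csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def property_constraints(prop, claims, qualifiers, constraint_label="P2302"):
    """Map each constraint claim id of `prop` to (constraint, [(qualifier, value), ...])."""
    constraints = {}
    for row in claims:
        if row["node1"] == prop and row["label"] == constraint_label:
            constraints[row["id"]] = (row["node2"], [])
    # A qualifier's node1 is the id of the claim it qualifies.
    for row in qualifiers:
        claim = constraints.get(row["node1"])
        if claim is not None:
            claim[1].append((row["label"], row["node2"]))
    return constraints


# Example usage (streams both files once; slow on a full Wikidata dump):
# result = property_constraints(
#     "P1303",
#     read_edges("claims.sorted.tsv.gz"),
#     read_edges("qualifiers.sorted.tsv.gz"),
# )
```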

Owner: USC ISI I2