SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

SMORE is a versatile framework that scales multi-hop query embeddings over knowledge graphs (KGs). It can train query embeddings on the Freebase KG, with more than 86M nodes and 338M edges, on a single machine. For more details, please refer to our paper.
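
To make the term concrete: a multi-hop query chains several relation traversals starting from an anchor entity. The sketch below answers a two-hop query by explicit traversal on a tiny hand-made graph. The entities, relations, and the `build_index` / `answer_path_query` helpers are hypothetical and are not part of SMORE, which learns embeddings for queries and entities rather than enumerating edges like this.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples; all names are made up.
TRIPLES = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "berlin"),
    ("carol", "works_at", "globex"),
    ("globex", "located_in", "paris"),
]

def build_index(triples):
    """Map (head, relation) to the set of tails reachable in one hop."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index

def answer_path_query(index, anchor, relations):
    """Follow the relations in order from the anchor and return the final entity set."""
    frontier = {anchor}
    for relation in relations:
        frontier = {t for e in frontier for t in index.get((e, relation), ())}
    return frontier

index = build_index(TRIPLES)
# "In which city is Alice's employer located?" as a two-hop query.
print(answer_path_query(index, "alice", ["works_at", "located_in"]))  # {'berlin'}
```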

Overview

SMORE designs an optimized pipeline with the following features.

  • Multi-GPU Training
  • Bidirectional Online Query Sampling
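
As a rough illustration of what "online" query sampling means, the sketch below generates training queries on the fly from the graph instead of reading them from a precomputed file. It is a naive forward random walk on a toy graph, not SMORE's bidirectional sampler; the graph and the `build_out_edges`, `sample_path_query`, and `query_stream` names are assumptions made for this example.

```python
import random

# Toy knowledge graph as (head, relation, tail) triples; all names are made up.
TRIPLES = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "berlin"),
    ("carol", "works_at", "globex"),
    ("globex", "located_in", "paris"),
]

def build_out_edges(triples):
    """Map each head entity to its list of outgoing (relation, tail) edges."""
    out_edges = {}
    for head, relation, tail in triples:
        out_edges.setdefault(head, []).append((relation, tail))
    return out_edges

def sample_path_query(out_edges, num_hops, rng, max_tries=100):
    """Sample (anchor, relations, answer) by a forward random walk; retry on dead ends."""
    heads = sorted(out_edges)
    for _ in range(max_tries):
        anchor = rng.choice(heads)
        node, relations = anchor, []
        for _ in range(num_hops):
            edges = out_edges.get(node)
            if not edges:
                break  # dead end before reaching the requested length
            relation, node = rng.choice(edges)
            relations.append(relation)
        if len(relations) == num_hops:
            return anchor, relations, node
    raise RuntimeError("no path of the requested length found")

def query_stream(out_edges, num_hops, seed=0):
    """Yield an endless stream of freshly sampled queries."""
    rng = random.Random(seed)
    while True:
        yield sample_path_query(out_edges, num_hops, rng)

stream = query_stream(build_out_edges(TRIPLES), num_hops=2)
for _ in range(3):
    print(next(stream))
```

A training loop would consume such a stream batch by batch, so no query file has to be materialized in advance.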

Installation

First clone the repository and install the dependencies listed in requirements.txt.

Then navigate to the root folder of the project and run

git submodule update --init
pip install -e .

Models

SMORE supports six different single-hop / multi-hop reasoning methods on KGs.

Examples

Please see the example scripts for each method under the smore/training folder. We provide example scripts for the six query embeddings on six KGs.

Contributing

We welcome pull requests; please check CONTRIBUTING.md for more details.

Citations

If you use this repo, please cite the following paper.

@article{ren2020scaling,
 title={Scaling up Logical Query Embeddings on Knowledge Graphs},
 author={Ren, Hongyu and Dai, Hanjun and Dai, Bo and Chen, Xinyun and Zhou, Denny and Leskovec, Jure and Schuurmans, Dale},
 year={2021}
}

License

SMORE is licensed under the Apache License, Version 2.0.

This is not an officially supported Google product.

Contact [email protected] and [email protected] for questions about the repo.

Comments
  • Wikikgv2 Model Training is Hanging

    Hi,

    I'm having an issue where the model gets stuck while training. It typically happens early in the first epoch. Below is an example when running the smore/training/vec_scripts/train_shallow_wikikgv2.sh (unmodified sans the GPUs) on 4 NVIDIA RTX A6000 50GB GPUs.

    [Screenshot: model_stuck]

    It hangs forever unless I stop it with a keyboard interrupt. Doing so yields the following traceback (I only post a portion because it's very long and repetitive).

    [Screenshot: model_traceback]

    It seems like something is happening in the multiprocessing as it's hanging when sharing messages between processes.

    Any help would be appreciated! @hyren

    Thanks, Harry

    opened by HarryShomer 15
  • problem while installing

    Thank you for your contribution. I'm trying to install this repository but ran into the following problem:

    ERROR: Command errored out with exit status 1: src/lib/edge_sampler.cpp:16:24: fatal error: ThreadPool.h: no such file or directory

    How can I fix it?

    opened by KeepYang 6
  • Code for generating candidate entities during evaluation for WikiKG

    Hi, I noticed the evaluation data is loaded in this line https://github.com/google-research/smore/blob/5b1a8a00b0cbfa024f411fc080b3d46dc681edd8/smore/training/main_train.py#L212 for WikiKG.

    However, I cannot find the code for generating candidate tail entities during evaluation. Can you show more details?

    Thanks a lot!

    opened by chocolate9624 2
  • How to get the data including train_bidir.bin?

    Hi Contributors, congratulations on your exciting work.

    A small problem: I cannot find out how to get the data for training. After I download the complex query data with "wget http://data.neuralnoise.com/cqd-data.tgz" from "https://github.com/uclnlp/cqd", the "train_bidir.bin" file is still missing. How can I get these files?

    Best Regards, Lei

    opened by leiloong 1
  • Confusion of the --cpu_num flag

    Hi there,

    We are running train_concat_wikikgv2.sh in the training/complex_scripts folder. When we use the default setting (--cpu_num=8), it actually requires around 60 CPU cores, which causes the job to be killed easily. When I tried to reduce the number (even to zero), it still got killed.

    So I'm a little confused about how to set the number of CPU cores. Any help is appreciated!

    Thank you!

    opened by Juanhui28 0
  • Evaluation get stuck

    Hi,

    It seems there is still a chance for the evaluation to get stuck. When we run train_shallow_wikikgv2.sh, it gets stuck in the evaluation after 4799999 steps. When we stop it with a keyboard interrupt, we get the following message:

    [Screenshot 2022-10-13, 10:20:48 PM]

    And when we run train_concat_wikikgv2.sh, it gets stuck at the first evaluation. When we stop it with a keyboard interrupt, it shows error messages similar to those from train_shallow_wikikgv2.sh.

    [Screenshot 2022-10-13, 10:23:14 PM]

    Could you please help to check? Any help is appreciated!

    opened by Juanhui28 8
  • Training get stuck

    Hi,

    Thanks for developing a useful tool for training on large-scale KGs. However, when I use SMORE to train models like ComplEx or TransE on wikikgv2, there is about a 50% chance of getting stuck in the training step (i.e., after loading the data; this can happen before or after the checkpoint save steps). Have you encountered this issue?

    BTW, I can only find training scripts for TransE and ComplEx, but there are 4 other KGE models. I wonder why they are not trained on wikikgv2; is there anything I need to pay attention to when writing the training scripts?

    Many thanks and look forward to your reply.

    opened by AprLie 5
  • Problem when loading the valid.pt file

    I am working on WikiKG90Mv2. I downloaded the provided candidate file valid.pt using the code below:

        def download_candidate_set(save_dir):
            valid_url = "https://snap.stanford.edu/smore/valid.pt"
            test_url = "https://snap.stanford.edu/smore/test.pt"
            if not os.path.exists(os.path.join(save_dir, "valid.pt")):
                url.download_url(valid_url, save_dir)
            if not os.path.exists(os.path.join(save_dir, "test.pt")):
                url.download_url(test_url, save_dir)

    However, I am having trouble opening the valid.pt file with the provided code:

        all_data = torch.load(os.path.join(args.eval_path, "%s.pt" % phase))

    The error reported is as follows:

        RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

    My environment is python3.6+pytorch1.7.0+cuda11.0. Many thanks!

    opened by anyezhiya368 1
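
An error of this kind often means the file on disk is not a complete zip archive, which is what an interrupted download leaves behind. Below is a minimal sketch for checking a downloaded file before calling torch.load, using only the Python standard library. The helper name `check_checkpoint_file` and its return labels are hypothetical, and the check assumes the file was saved in PyTorch's zip-based format.

```python
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a zip archive

def check_checkpoint_file(path):
    """Classify a file as 'ok', 'truncated_zip', or 'not_zip' (hypothetical labels)."""
    with open(path, "rb") as f:
        starts_like_zip = f.read(4) == ZIP_MAGIC
    if not starts_like_zip:
        return "not_zip"
    # is_zipfile looks for the end-of-central-directory record near the end of
    # the file; that record is missing when the file was cut off early.
    return "ok" if zipfile.is_zipfile(path) else "truncated_zip"
```

If the result is 'truncated_zip', deleting the file and downloading it again is a reasonable first step.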
  • "pure virtual method called" when running vec/train_mpnet_wikikg90m.sh on GPU

    The program runs normally but reports an error at the end when running vec/train_mpnet_wikikg90m.sh on GPU. The log is as follows:

    2022-07-11 08:42:01,674 INFO ---------------------------------------------------------------------------------------------
    2022-07-11 08:42:01,675 INFO Model Parameter Configuration:
    2022-07-11 08:42:01,676 INFO Parameter relation_embedding.embedding: torch.Size([0, 200]), require_grad = True
    2022-07-11 08:42:01,676 INFO Parameter entity_embedding.embedding: torch.Size([0, 200]), require_grad = True
    2022-07-11 08:42:01,677 INFO Parameter center_net.layers: torch.Size([402, 200]), require_grad = True
    2022-07-11 08:42:01,677 INFO Parameter feature_mod.entity_proj.weight: torch.Size([200, 768]), require_grad = True
    2022-07-11 08:42:01,678 INFO Parameter feature_mod.entity_proj.bias: torch.Size([200]), require_grad = True
    2022-07-11 08:42:01,678 INFO Parameter feature_mod.relation_proj.weight: torch.Size([200, 768]), require_grad = True
    2022-07-11 08:42:01,678 INFO Parameter feature_mod.relation_proj.bias: torch.Size([200]), require_grad = True
    2022-07-11 08:42:01,678 INFO Parameter Number: 388000
    2022-07-11 08:42:01,679 INFO ---------------------------------------------------------------------------------------------
    2022-07-11 08:42:01,679 INFO Geo: VecFeatured
    2022-07-11 08:42:01,679 INFO Data Path: /gf3/home/zlz/data/knowledge_graphs/wikikg90m-v2
    2022-07-11 08:42:01,679 INFO #entity: 91230610
    2022-07-11 08:42:01,679 INFO #relation: 1387
    2022-07-11 08:42:01,679 INFO #max steps: 1001
    2022-07-11 08:42:01,680 INFO Evaluate unions using: DNF
    2022-07-11 08:42:11,911 INFO Randomly Initializing VecFeatured Model...
    2022-07-11 08:42:11,912 INFO tasks = 1p
    2022-07-11 08:42:11,912 INFO init_step = 0
    2022-07-11 08:42:11,912 INFO Training info:
    2022-07-11 08:42:11,912 INFO 1p.-1p: infinite
    2022-07-11 08:42:11,912 INFO Start Training...
    2022-07-11 08:42:11,912 INFO learning_rate = 0
    2022-07-11 08:42:11,912 INFO batch_size = 512
    2022-07-11 08:42:11,912 INFO hidden_dim = 200
    2022-07-11 08:42:11,912 INFO gamma = 10.000000
    2022-07-11 08:42:11,913 INFO loading static entity+relation features from /gf3/home/zlz/data/knowledge_graphs/wikikg90m-v2/processed
    2022-07-11 08:47:32,461 INFO [GPU 0] tasks: 1p.-1p
    overwritting args.save_path
    logging to ../logs/wikikg90m-v2/1p.-1p-1p/VecFeatured/g-10.0-mode-(feat-only-768,l2,)-adv-1.0-reg-1e-09-ngpu-0-os-(0,0,u,u,0,True,False)-dataset-(single,3000,e,True,before)-opt-(aggr,adagrad,cpu,False,5)-sharen-naive-lr_none/2022.07.11-08:42:01
    r(e) r(e)
    step: 1000, t_read: 0.00140, t_fwd: 0.00535, t_loss: 0.00207, t_opt: 0.00091: 100%|██████████| 1001/1001 [00:11<00:00, 90.85it/s]
    pure virtual method called
    terminate called without an active exception
    /dat/zlz/anaconda3/envs/ZLZ/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
      len(cache))
    2022-07-11 08:50:42,270 INFO Training finished!!

    How to fix this error? Thanks!

    opened by zhanglizhi15 0
  • dist_l2 forward error when running train_shallow_wikikgv2.sh for WikiKG90M-v2

    Hi,

    Thanks for your exciting contributions to the open-source KG framework!

    I have followed the steps in README.md and README_wikikgv2.md to install the package and download the LSC WikiKG90M-v2 data. When I am trying out the baseline models, it reports the following error:

    [Screenshot of the error, 2022-07-02 5:17:34 PM]

    I tried to identify the source of the error, and it seems to come from the dist_forward() function defined in extlib_cuda.cpp. I am using 4 TITAN RTX GPUs (24 GB each) with Driver Version: 440.100 and CUDA Version: 10.2.

    How to fix this error? Thanks!

    opened by chunlinli 1
Owner
Google Research
Codes for NAACL 2021 Paper "Unsupervised Multi-hop Question Answering by Question Generation"

Unsupervised-Multi-hop-QA This repository contains code and models for the paper: Unsupervised Multi-hop Question Answering by Question Generation (NA

Liangming Pan 70 Nov 27, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch >= 1.2.0 P

null 16 Dec 14, 2022
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

SGLKT-VisDial Pytorch Implementation for the paper: Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer Gi-Cheon Kang, Junseok P

Gi-Cheon Kang 9 Jul 5, 2022
Author: Wenhao Yu ([email protected]). ACL 2022. Commonsense Reasoning on Knowledge Graph for Text Generation

Diversifying Commonsense Reasoning Generation on Knowledge Graph Introduction -- This is the pytorch implementation of our ACL 2022 paper "Diversifyin

DM2 Lab @ ND 61 Dec 30, 2022
Using pretrained language models for biomedical knowledge graph completion.

LMs for biomedical KG completion This repository contains code to run the experiments described in: Scientific Language Models for Biomedical Knowledg

Rahul Nadkarni 41 Nov 30, 2022
TuckER: Tensor Factorization for Knowledge Graph Completion

TuckER: Tensor Factorization for Knowledge Graph Completion This codebase contains PyTorch implementation of the paper: TuckER: Tensor Factorization f

Ivana Balazevic 296 Dec 6, 2022
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

zshicode 1 Nov 18, 2021
Collective Multi-type Entity Alignment Between Knowledge Graphs (WWW'20)

CG-MuAlign A reference implementation for "Collective Multi-type Entity Alignment Between Knowledge Graphs", published in WWW 2020. If you find our pa

Bran Zhu 28 Dec 11, 2022
Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

null 171 Dec 29, 2022
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks ACL 2020 Microsoft Research [Paper] [Video] Releasing [XtremeDistilTransf

Microsoft 125 Jan 4, 2023
git《Commonsense Knowledge Base Completion with Structural and Semantic Context》(AAAI 2020) GitHub: [fig1]

Commonsense Knowledge Base Completion with Structural and Semantic Context Code for the paper Commonsense Knowledge Base Completion with Structural an

AI2 96 Nov 5, 2022
Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment.

(ACMMM 2021 Oral) SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment This repository shows two tasks: Face landmark detection and Fac

BoomStar 51 Dec 13, 2022
Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [CVPR 2021]

Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [CVPR 2021] Abstract Analyzing complex scenes with DNN is a challenging ta

Irene Yuan 24 Jun 27, 2022
The code for MM2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning"

The Code for MM2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning" Setting up and using the repo Get the dataset. Follow

null 4 Apr 20, 2022
MVP Benchmark for Multi-View Partial Point Cloud Completion and Registration

MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration [NEWS] 2021-07-12 [NEW ?? ] The submission on Codalab starts! 2021-07-1

PL 93 Dec 21, 2022
QA-GNN: Question Answering using Language Models and Knowledge Graphs

QA-GNN: Question Answering using Language Models and Knowledge Graphs This repo provides the source code & data of our paper: QA-GNN: Reasoning with L

Michihiro Yasunaga 434 Jan 4, 2023
ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

(Comet-) ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs Paper Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sa

AI2 152 Dec 27, 2022