Large-scale Knowledge Graph Construction with Prompting

Overview

PromptKGC

Large-scale Knowledge Graph Construction with Prompting across tasks (predictive and generative) and modalities (language, image, vision + language, etc.)

GenKGC: link prediction as sequence-to-sequence generation for fast inference (see the sketch below)

KG-Prompt: data-efficient prompt learning-based knowledge graph completion
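
Below is a minimal sketch of the sequence-to-sequence formulation referenced for GenKGC above: an incomplete triple (head, relation, ?) is verbalized as text, and a generative pre-trained language model proposes tail entities via beam search. It is an illustration only, not the repository's implementation; the checkpoint name, the triple verbalization, and the decoding settings are assumptions.

    # Illustrative sketch: link prediction as text-to-text generation.
    # Assumptions: a generic BART checkpoint and a simple "head | relation |" verbalization.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    # Verbalize an incomplete triple (head, relation, ?) as the source sequence.
    source = "Steve Jobs | founder of |"
    inputs = tokenizer(source, return_tensors="pt")

    # Beam search returns several candidate continuations; each one is a candidate tail entity.
    outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5, max_length=16)
    print([tokenizer.decode(o, skip_special_tokens=True) for o in outputs])

Constraining generation to valid entity names (as GenKGC's entity-aware hierarchical decoding does) is what turns such free-form generation into a usable link-prediction ranker.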

News

  • [Model Release] January, 2022: GenKGC - A sequence-to-sequence approach for knowledge graph completion.
  • [Model Release] January, 2022: KG-Prompt - A prompt learning-based approach for few-shot knowledge graph completion.

Release

***** January, 2022: GenKGC | KG-Prompt release *****

  • GenKGC (January 31, 2022): GenKGC casts knowledge graph completion as sequence-to-sequence generation with a pre-trained language model, using relation-guided demonstration and entity-aware hierarchical decoding. It obtains performance better than or comparable to baselines, and achieves faster inference than previous methods built on pre-trained language models. "From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer"
  • KG-Prompt (January 31, 2022): A prompt-tuning approach (knowledge collaborative fine-tuning) for low-resource knowledge graph completion. KG-Prompt leverages structured knowledge to construct the initial prompt template and learns the optimal templates, labels, and model parameters through a collaborative fine-tuning algorithm. It obtains state-of-the-art few-shot performance on FB15K-237, WN18RR, and UMLS (a minimal prompt-scoring sketch follows this list). "Knowledge Collaborative Fine-tuning for Low-resource Knowledge Graph Completion", Journal of Software, 2022
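
As referenced in the KG-Prompt item above, the sketch below shows the basic prompt-based completion idea: a cloze-style template is built from a triple with a missing tail, and a masked language model scores candidate entities at the [MASK] position. Everything here is assumed for illustration (the checkpoint, the template wording, and the candidate entities); the actual KG-Prompt method additionally learns the templates, labels, and model parameters through collaborative fine-tuning.

    # Illustrative sketch: cloze-style prompt scoring for knowledge graph completion.
    # Assumptions: bert-base-uncased as the backbone and single-token candidate entities.
    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    head, relation = "aspirin", "may treat"
    candidates = ["headache", "fever", "fracture"]  # hypothetical candidate tails

    # Initial template built from the structured triple: "<head> <relation> [MASK]."
    prompt = f"{head} {relation} {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]

    # Rank the candidates by the language model's score at the [MASK] position.
    scores = {c: logits[tokenizer.convert_tokens_to_ids(c)].item() for c in candidates}
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

Prompt tuning then replaces the hand-written template with learnable tokens and optimizes them together with the label words and model parameters, which is the part handled by KG-Prompt's collaborative fine-tuning.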

Contact Information

For help or issues using the models, please submit a GitHub issue.

For other communications, please contact Ningyu Zhang.

Comments
  • Question about reproducing RetrievalRE

    Question about reproducing RetrievalRE

    Hello, and thank you very much for providing the code and detailed documentation. However, I ran into some problems while reproducing RetrievalRE.

    When reproducing Standard RE on SemEval (Table 3 in the paper), since no corresponding script is provided, I simply reused the k-shot script as follows:

    CUDA_VISIBLE_DEVICES=0 python main.py --max_epochs=30  --num_workers=8 \
        --model_name_or_path roberta-large \
        --accumulate_grad_batches 1 \
        --batch_size 16 \
        --data_dir dataset/semeval/ \
        --check_val_every_n_epoch 1 \
        --data_class WIKI80 \
        --max_seq_length 256 \
        --model_class RobertaForPrompt \
        --t_lambda 0.001 \
        --litmodel_class BertLitModel \
        --task_name wiki80 \
        --lr 3e-5 \
        --use_template_words 0 \
        --init_type_words 0 \
        --output_dir output/semeval/all
    

    This gives 0.8895692105838618, and 0.8916030534351145 after knn_combine, which does not match the 90.4 reported in Table 3. The settings are probably wrong; could you provide the correct settings? Thank you very much.

    help wanted 
    opened by Facico 4
  • Optimal hyperparameters

    Optimal hyperparameters

    Hi,

    May I have the optimal hyperparameters used for the reported results in the original GenKGC paper? We ran scripts/wn18rr.sh, but this set of hyperparameters gives the following result: [results screenshot attached]. There is still a gap between this and the reported results. Could you please check whether the given hyperparameters are the ones used for the reported results?

    BTW, your work is quite interesting and we would like to reference your results in our current work. But since the MRR metric was not reported in the original GenKGC paper, we have to run this code to get the MRR result. If possible, please help us check it. It would be really appreciated.

    question 
    opened by ccchobits 3
  • [bug] UnboundLocalError: local variable 'hr' referenced before assignment

    [bug] UnboundLocalError: local variable 'hr' referenced before assignment

    File "/home/xxx/projects/PromptKG/toolkit/lit_models/transformer.py", line 236, in test_step head_ids.append(hr[0]) UnboundLocalError: local variable 'hr' referenced before assignment

    def test_step(self, batch, batch_idx):
            hr_vector = self.model(**batch)['hr_vector']
            scores = torch.mm(hr_vector, self.entity_embedding.t())
            bsz = len(batch['batch_data'])
            label = []
            head_ids = []
            for i in range(bsz):
                d = batch['batch_data'][i]
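                # NOTE (bug reported in this issue): hr is read on the next line before it is assigned below, causing the UnboundLocalError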
                head_ids.append(hr[0])
                inverse = d.inverse
                hr = tuple(d.hr)
                t = d.t
                label.append(t)
                idx = []
                if inverse:
                    for hh in self.trainer.datamodule.filter_tr_to_h.get(hr, []):
                        if hh == t:
                            continue
                        idx.append(hh)
                else:
                    for hh in self.trainer.datamodule.filter_tr_to_h.get(hr, []):
                        if hh == t:
                            continue
                        idx.append(hh)
    
                scores[i][idx] = -100
                # scores[i].index_fill_(0, idx, -1)
            rerank_by_graph(scores, head_ids)
            _, outputs = torch.sort(scores, dim=1, descending=True)
            _, outputs = torch.sort(outputs, dim=1)
            ranks = outputs[torch.arange(bsz), label].detach().cpu() + 1
    
            return dict(ranks=ranks)
    
    opened by BabelTower 1
  • Questions about MetaQA

    Questions about MetaQA

    Hello, this is excellent work for both the KBQA and KGC fields. I have a few questions about the MetaQA task, and I would be very grateful if you could answer them:
    1. Where is the prompt idea reflected in this KGQA task? (Is it the 'triples' field in the .json datasets?)
    2. How is the 'triples' field of each example in the 1-hop train.json and test.json constructed? (The original MetaQA data does not provide the subgraphs corresponding to each question.)
    3. When reproducing KGT5, did you use the same KGC pretraining strategy as the original paper? (The framework does not seem to contain the code for that stage.)
    Thanks again, and best wishes!

    opened by czh17 1
  • key error

    key error

    I ran into the following error when running the command "sh ./scripts/wn18rr.sh" in the GenKGC project:

    File "data/processor.py", line 417, in _create_examples
        relation_names.append(rel2text[t])
    KeyError: '/soccer/football_team/current_roster./soccer/football_roster_position/position'

    opened by YangLing0818 1
  • Exception has occurred: ImportError cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    Exception has occurred: ImportError cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    Error occurred when running main.py in the PromptKGC project: ImportError: cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    opened by pony-m 1
  • cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    Error occurs when running main.py: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    opened by pony-m 1
  • Bump pytorch-lightning from 1.3.1 to 1.6.0 in /research/RetrievalRE

    Bump pytorch-lightning from 1.3.1 to 1.6.0 in /research/RetrievalRE

    Bumps pytorch-lightning from 1.3.1 to 1.6.0.

    Release notes

    Sourced from pytorch-lightning's releases.

    PyTorch Lightning 1.6: Support Intel's Habana Accelerator, New efficient DDP strategy (Bagua), Manual Fault-tolerance, Stability and Reliability.

    The core team is excited to announce the PyTorch Lightning 1.6 release ⚡

    Highlights

    PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:

    Introducing Intel's Habana Accelerator

    Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries and a configurable Matrix Math engine.

    You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:

    trainer = pl.Trainer(accelerator="hpu")
    

    single Gaudi training

    trainer = pl.Trainer(accelerator="hpu", devices=1)

    distributed training with 8 Gaudi

    trainer = pl.Trainer(accelerator="hpu", devices=8)

    The Bagua Strategy

    The Bagua Strategy is a deep learning acceleration framework that supports multiple, advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:

    trainer = pl.Trainer(strategy="bagua")
    

    or to choose a custom algorithm

    trainer = pl.Trainer(strategy=BaguaStrategy(algorithm="gradient_allreduce") # default

    Towards stable Accelerator, Strategy, and Plugin APIs

    The Accelerator, Strategy, and Plugin APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.

    In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (TrainingTypePlugin) as well as certain Plugins. In particular, we want to highlight the following changes:

    • All TrainingTypePlugins have been renamed to Strategy (#11120). Strategy is a more appropriate name because it encompasses more than simply training communication. This change is now aligned with the changes we implemented in 1.5, which introduced the new strategy and devices flags to the Trainer.

    ... (truncated)

    Changelog

    Sourced from pytorch-lightning's changelog.

    [1.6.0] - 2022-03-29

    Added

    • Allow logging to an existing run ID in MLflow with MLFlowLogger (#12290)
    • Enable gradient accumulation using Horovod's backward_passes_per_step (#11911)
    • Add new DETAIL log level to provide useful logs for improving monitoring and debugging of batch jobs (#11008)
    • Added a flag SLURMEnvironment(auto_requeue=True|False) to control whether Lightning handles the requeuing (#10601)
    • Fault Tolerant Manual
      • Add _Stateful protocol to detect if classes are stateful (#10646)
      • Add _FaultTolerantMode enum used to track different supported fault tolerant modes (#10645)
      • Add a _rotate_worker_indices utility to reload the state according the latest worker (#10647)
      • Add stateful workers (#10674)
      • Add an utility to collect the states across processes (#10639)
      • Add logic to reload the states across data loading components (#10699)
      • Cleanup some fault tolerant utilities (#10703)
      • Enable Fault Tolerant Manual Training (#10707)
      • Broadcast the _terminate_gracefully to all processes and add support for DDP (#10638)
    • Added support for re-instantiation of custom (subclasses of) DataLoaders returned in the *_dataloader() methods, i.e., automatic replacement of samplers now works with custom types of DataLoader (#10680)
    • Added a function to validate if fault tolerant training is supported. (#10465)
    • Added a private callback to manage the creation and deletion of fault-tolerance checkpoints (#11862)
    • Show a better error message when a custom DataLoader implementation is not well implemented and we need to reconstruct it (#10719)
    • Show a better error message when frozen dataclass is used as a batch (#10927)
    • Save the Loop's state by default in the checkpoint (#10784)
    • Added Loop.replace to easily switch one loop for another (#10324)
    • Added support for --lr_scheduler=ReduceLROnPlateau to the LightningCLI (#10860)
    • Added LightningCLI.configure_optimizers to override the configure_optimizers return value (#10860)
    • Added LightningCLI(auto_registry) flag to register all subclasses of the registerable components automatically (#12108)
    • Added a warning that shows when max_epochs in the Trainer is not set (#10700)
    • Added support for returning a single Callback from LightningModule.configure_callbacks without wrapping it into a list (#11060)
    • Added console_kwargs for RichProgressBar to initialize inner Console (#10875)
    • Added support for shorthand notation to instantiate loggers with the LightningCLI (#11533)
    • Added a LOGGER_REGISTRY instance to register custom loggers to the LightningCLI (#11533)
    • Added info message when the Trainer arguments limit_*_batches, overfit_batches, or val_check_interval are set to 1 or 1.0 (#11950)
    • Added a PrecisionPlugin.teardown method (#10990)
    • Added LightningModule.lr_scheduler_step (#10249)
    • Added support for no pre-fetching to DataFetcher (#11606)
    • Added support for optimizer step progress tracking with manual optimization (#11848)
    • Return the output of the optimizer.step. This can be useful for LightningLite users, manual optimization users, or users overriding LightningModule.optimizer_step (#11711)
    • Teardown the active loop and strategy on exception (#11620)
    • Added a MisconfigurationException if user provided opt_idx in scheduler config doesn't match with actual optimizer index of its respective optimizer (#11247)
    • Added a loggers property to Trainer which returns a list of loggers provided by the user (#11683)
    • Added a loggers property to LightningModule which retrieves the loggers property from Trainer (#11683)
    • Added support for DDP when using a CombinedLoader for the training data (#11648)
    • Added a warning when using DistributedSampler during validation/testing (#11479)
    • Added support for Bagua training strategy (#11146)
    • Added support for manually returning a poptorch.DataLoader in a *_dataloader hook (#12116)
    • Added rank_zero module to centralize utilities (#11747)
    • Added a _Stateful support for LightningDataModule (#11637)
    • Added _Stateful support for PrecisionPlugin (#11638)

    ... (truncated)


    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Add LAMA Evaluation

    Add LAMA Evaluation

    Add LAMA evaluation to the KNNKGE model by an extra parameter '--lama_test' in scripts. Therefore users can evaluate their model on LAMA after training.

    opened by TimelordRi 0
Owner

ZJUNLP: NLP Group of Knowledge Engine Lab at Zhejiang University