Large-scale Knowledge Graph Construction with Prompting

Overview

PromptKGC

Large-scale Knowledge Graph Construction with Prompting across tasks (predictive and generative) and modalities (language, image, vision + language, etc.)

GenKGC: link prediction as sequence-to-sequence generation for fast inference (see the sketch below)

KG-Prompt: data-efficient prompt learning-based knowledge graph completion
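
Below is a minimal sketch of the sequence-to-sequence formulation referenced for GenKGC above: an incomplete triple (head, relation, ?) is verbalized as text, and a generative pre-trained language model proposes tail entities via beam search. It is an illustration only, not the repository's implementation; the checkpoint name, the triple verbalization, and the decoding settings are assumptions.

    # Illustrative sketch: link prediction as text-to-text generation.
    # Assumptions: a generic BART checkpoint and a simple "head | relation |" verbalization.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    # Verbalize an incomplete triple (head, relation, ?) as the source sequence.
    source = "Steve Jobs | founder of |"
    inputs = tokenizer(source, return_tensors="pt")

    # Beam search returns several candidate continuations; each one is a candidate tail entity.
    outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5, max_length=16)
    print([tokenizer.decode(o, skip_special_tokens=True) for o in outputs])

Constraining generation to valid entity names (as GenKGC's entity-aware hierarchical decoding does) is what turns such free-form generation into a usable link-prediction ranker.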

News

  • [Model Release] January, 2022: GenKGC - A sequence-to-sequence approach for knowledge graph completion.
  • [Model Release] January, 2022: KG-Prompt - A prompt learning-based approach for few-shot knowledge graph completion.

Release

***** January, 2022: GenKGC | KG-Prompt release *****

  • GenKGC (January 31, 2022): GenKGC casts knowledge graph completion as sequence-to-sequence generation with a pre-trained language model, using relation-guided demonstration and entity-aware hierarchical decoding. It obtains performance better than or comparable to baselines, and achieves faster inference than previous methods built on pre-trained language models. "From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer"
  • KG-Prompt (January 31, 2022): A prompt-tuning approach (knowledge collaborative fine-tuning) for low-resource knowledge graph completion. KG-Prompt leverages structured knowledge to construct the initial prompt template and learns the optimal templates, labels, and model parameters through a collaborative fine-tuning algorithm. It obtains state-of-the-art few-shot performance on FB15K-237, WN18RR, and UMLS (a minimal prompt-scoring sketch follows this list). "Knowledge Collaborative Fine-tuning for Low-resource Knowledge Graph Completion", Journal of Software, 2022
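
As referenced in the KG-Prompt item above, the sketch below shows the basic prompt-based completion idea: a cloze-style template is built from a triple with a missing tail, and a masked language model scores candidate entities at the [MASK] position. Everything here is assumed for illustration (the checkpoint, the template wording, and the candidate entities); the actual KG-Prompt method additionally learns the templates, labels, and model parameters through collaborative fine-tuning.

    # Illustrative sketch: cloze-style prompt scoring for knowledge graph completion.
    # Assumptions: bert-base-uncased as the backbone and single-token candidate entities.
    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    head, relation = "aspirin", "may treat"
    candidates = ["headache", "fever", "fracture"]  # hypothetical candidate tails

    # Initial template built from the structured triple: "<head> <relation> [MASK]."
    prompt = f"{head} {relation} {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]

    # Rank the candidates by the language model's score at the [MASK] position.
    scores = {c: logits[tokenizer.convert_tokens_to_ids(c)].item() for c in candidates}
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

Prompt tuning then replaces the hand-written template with learnable tokens and optimizes them together with the label words and model parameters, which is the part handled by KG-Prompt's collaborative fine-tuning.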

Contact Information

For help or issues using the models, please submit a GitHub issue.

For other communications, please contact Ningyu Zhang.

Comments
  • Question about reproducing RetrievalRE

    Question about reproducing RetrievalRE

    Hello, and thank you very much for providing the code and detailed documentation. However, I ran into some problems while reproducing RetrievalRE.

    When reproducing Standard RE on SemEval (Table 3 in the paper), since no corresponding script is provided, I simply reused the k-shot script as follows:

    CUDA_VISIBLE_DEVICES=0 python main.py --max_epochs=30  --num_workers=8 \
        --model_name_or_path roberta-large \
        --accumulate_grad_batches 1 \
        --batch_size 16 \
        --data_dir dataset/semeval/ \
        --check_val_every_n_epoch 1 \
        --data_class WIKI80 \
        --max_seq_length 256 \
        --model_class RobertaForPrompt \
        --t_lambda 0.001 \
        --litmodel_class BertLitModel \
        --task_name wiki80 \
        --lr 3e-5 \
        --use_template_words 0 \
        --init_type_words 0 \
        --output_dir output/semeval/all
    

    This gives 0.8895692105838618, and 0.8916030534351145 after knn_combine, which does not match the 90.4 reported in Table 3. The settings are probably wrong; could you provide the correct settings? Thank you very much.

    help wanted 
    opened by Facico 4
  • Optimal hyperparameters

    Optimal hyperparameters

    Hi,

    May I have the optimal hyperparameters used for the reported results in the original GenKGC paper? We ran scripts/wn18rr.sh, but this set of hyperparameters gives the following result: [results screenshot attached]. There is still a gap between this and the reported results. Could you please check whether the given hyperparameters are the ones used for the reported results?

    BTW, your work is quite interesting and we would like to reference your results in our current work. But since the MRR metric was not reported in the original GenKGC paper, we have to run this code to get the MRR result. If possible, please help us check it. It would be really appreciated.

    question 
    opened by ccchobits 3
  • [bug] UnboundLocalError: local variable 'hr' referenced before assignment

    [bug] UnboundLocalError: local variable 'hr' referenced before assignment

    File "/home/xxx/projects/PromptKG/toolkit/lit_models/transformer.py", line 236, in test_step head_ids.append(hr[0]) UnboundLocalError: local variable 'hr' referenced before assignment

    def test_step(self, batch, batch_idx):
            hr_vector = self.model(**batch)['hr_vector']
            scores = torch.mm(hr_vector, self.entity_embedding.t())
            bsz = len(batch['batch_data'])
            label = []
            head_ids = []
            for i in range(bsz):
                d = batch['batch_data'][i]
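                # NOTE (bug reported in this issue): hr is read on the next line before it is assigned below, causing the UnboundLocalError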
                head_ids.append(hr[0])
                inverse = d.inverse
                hr = tuple(d.hr)
                t = d.t
                label.append(t)
                idx = []
                if inverse:
                    for hh in self.trainer.datamodule.filter_tr_to_h.get(hr, []):
                        if hh == t:
                            continue
                        idx.append(hh)
                else:
                    for hh in self.trainer.datamodule.filter_tr_to_h.get(hr, []):
                        if hh == t:
                            continue
                        idx.append(hh)
    
                scores[i][idx] = -100
                # scores[i].index_fill_(0, idx, -1)
            rerank_by_graph(scores, head_ids)
            _, outputs = torch.sort(scores, dim=1, descending=True)
            _, outputs = torch.sort(outputs, dim=1)
            ranks = outputs[torch.arange(bsz), label].detach().cpu() + 1
    
            return dict(ranks=ranks)
    
    opened by BabelTower 1
  • Questions about MetaQA

    Questions about MetaQA

    Hello, this is excellent work for both the KBQA and KGC fields. I have a few questions about the MetaQA task, and I would be very grateful if you could answer them:
    1. Where is the prompt idea reflected in this KGQA task? (Is it the 'triples' field in the .json datasets?)
    2. How is the 'triples' field of each example in the 1-hop train.json and test.json constructed? (The original MetaQA data does not provide the subgraphs corresponding to each question.)
    3. When reproducing KGT5, did you use the same KGC pretraining strategy as the original paper? (The framework does not seem to contain the code for that stage.)
    Thanks again, and best wishes!

    opened by czh17 1
  • key error

    key error

    I ran into the following error when running the command "sh ./scripts/wn18rr.sh" in the GenKGC project:

    File "data/processor.py", line 417, in _create_examples
        relation_names.append(rel2text[t])
    KeyError: '/soccer/football_team/current_roster./soccer/football_roster_position/position'

    opened by YangLing0818 1
  • Exception has occurred: ImportError cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    Exception has occurred: ImportError cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    Error occurred when running main.py in the PromptKGC project: ImportError: cannot import name '_BaseLazyModule' from 'transformers.file_utils'

    opened by pony-m 1
  • cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    Error occurs when running main.py: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home/mz20/.local/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

    opened by pony-m 1
  • Bump pytorch-lightning from 1.3.1 to 1.6.0 in /research/RetrievalRE

    Bump pytorch-lightning from 1.3.1 to 1.6.0 in /research/RetrievalRE

    Bumps pytorch-lightning from 1.3.1 to 1.6.0.

    Release notes

    Sourced from pytorch-lightning's releases.

    PyTorch Lightning 1.6: Support Intel's Habana Accelerator, New efficient DDP strategy (Bagua), Manual Fault-tolerance, Stability and Reliability.

    The core team is excited to announce the PyTorch Lightning 1.6 release ⚡

    Highlights

    PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:

    Introducing Intel's Habana Accelerator

    Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries and a configurable Matrix Math engine.

    You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:

    trainer = pl.Trainer(accelerator="hpu")
    

    single Gaudi training

    trainer = pl.Trainer(accelerator="hpu", devices=1)

    distributed training with 8 Gaudi

    trainer = pl.Trainer(accelerator="hpu", devices=8)

    The Bagua Strategy

    The Bagua Strategy is a deep learning acceleration framework that supports multiple, advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:

    trainer = pl.Trainer(strategy="bagua")
    

    or to choose a custom algorithm

    trainer = pl.Trainer(strategy=BaguaStrategy(algorithm="gradient_allreduce") # default

    Towards stable Accelerator, Strategy, and Plugin APIs

    The Accelerator, Strategy, and Plugin APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.

    In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (TrainingTypePlugin) as well as certain Plugins. In particular, we want to highlight the following changes:

    • All TrainingTypePlugins have been renamed to Strategy (#11120). Strategy is a more appropriate name because it encompasses more than simply training communication. This change is now aligned with the changes we implemented in 1.5, which introduced the new strategy and devices flags to the Trainer.

    ... (truncated)

    Changelog

    Sourced from pytorch-lightning's changelog.

    [1.6.0] - 2022-03-29

    Added

    • Allow logging to an existing run ID in MLflow with MLFlowLogger (#12290)
    • Enable gradient accumulation using Horovod's backward_passes_per_step (#11911)
    • Add new DETAIL log level to provide useful logs for improving monitoring and debugging of batch jobs (#11008)
    • Added a flag SLURMEnvironment(auto_requeue=True|False) to control whether Lightning handles the requeuing (#10601)
    • Fault Tolerant Manual
      • Add _Stateful protocol to detect if classes are stateful (#10646)
      • Add _FaultTolerantMode enum used to track different supported fault tolerant modes (#10645)
      • Add a _rotate_worker_indices utility to reload the state according the latest worker (#10647)
      • Add stateful workers (#10674)
      • Add an utility to collect the states across processes (#10639)
      • Add logic to reload the states across data loading components (#10699)
      • Cleanup some fault tolerant utilities (#10703)
      • Enable Fault Tolerant Manual Training (#10707)
      • Broadcast the _terminate_gracefully to all processes and add support for DDP (#10638)
    • Added support for re-instantiation of custom (subclasses of) DataLoaders returned in the *_dataloader() methods, i.e., automatic replacement of samplers now works with custom types of DataLoader (#10680)
    • Added a function to validate if fault tolerant training is supported. (#10465)
    • Added a private callback to manage the creation and deletion of fault-tolerance checkpoints (#11862)
    • Show a better error message when a custom DataLoader implementation is not well implemented and we need to reconstruct it (#10719)
    • Show a better error message when frozen dataclass is used as a batch (#10927)
    • Save the Loop's state by default in the checkpoint (#10784)
    • Added Loop.replace to easily switch one loop for another (#10324)
    • Added support for --lr_scheduler=ReduceLROnPlateau to the LightningCLI (#10860)
    • Added LightningCLI.configure_optimizers to override the configure_optimizers return value (#10860)
    • Added LightningCLI(auto_registry) flag to register all subclasses of the registerable components automatically (#12108)
    • Added a warning that shows when max_epochs in the Trainer is not set (#10700)
    • Added support for returning a single Callback from LightningModule.configure_callbacks without wrapping it into a list (#11060)
    • Added console_kwargs for RichProgressBar to initialize inner Console (#10875)
    • Added support for shorthand notation to instantiate loggers with the LightningCLI (#11533)
    • Added a LOGGER_REGISTRY instance to register custom loggers to the LightningCLI (#11533)
    • Added info message when the Trainer arguments limit_*_batches, overfit_batches, or val_check_interval are set to 1 or 1.0 (#11950)
    • Added a PrecisionPlugin.teardown method (#10990)
    • Added LightningModule.lr_scheduler_step (#10249)
    • Added support for no pre-fetching to DataFetcher (#11606)
    • Added support for optimizer step progress tracking with manual optimization (#11848)
    • Return the output of the optimizer.step. This can be useful for LightningLite users, manual optimization users, or users overriding LightningModule.optimizer_step (#11711)
    • Teardown the active loop and strategy on exception (#11620)
    • Added a MisconfigurationException if user provided opt_idx in scheduler config doesn't match with actual optimizer index of its respective optimizer (#11247)
    • Added a loggers property to Trainer which returns a list of loggers provided by the user (#11683)
    • Added a loggers property to LightningModule which retrieves the loggers property from Trainer (#11683)
    • Added support for DDP when using a CombinedLoader for the training data (#11648)
    • Added a warning when using DistributedSampler during validation/testing (#11479)
    • Added support for Bagua training strategy (#11146)
    • Added support for manually returning a poptorch.DataLoader in a *_dataloader hook (#12116)
    • Added rank_zero module to centralize utilities (#11747)
    • Added a _Stateful support for LightningDataModule (#11637)
    • Added _Stateful support for PrecisionPlugin (#11638)

    ... (truncated)


    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Add LAMA Evaluation

    Add LAMA Evaluation

    Add LAMA evaluation to the KNNKGE model by an extra parameter '--lama_test' in scripts. Therefore users can evaluate their model on LAMA after training.

    opened by TimelordRi 0
Owner

ZJUNLP: NLP Group of Knowledge Engine Lab at Zhejiang University