HeCo

This repo contains the source code of the KDD 2021 paper "Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning".
Paper Link: https://arxiv.org/abs/2105.09111

Environment Settings

python==3.8.5
scipy==1.5.4
torch==1.7.0
numpy==1.19.2
scikit_learn==0.24.2

GPU: GeForce RTX 2080 Ti
CPU: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz

Usage

First, go into ./code, then use the following command to run our model:

python main.py acm --gpu=0

Here, "acm" can be replaced by "dblp", "aminer" or "freebase".

Some tips on parameters

  1. We suggest carefully selecting "pos_num" (in ./data/pos.py), which sets the threshold on the number of positives for every node. This is very important to the final results. Of course, more effective ways to select positives are welcome.
  2. In ./code/utils/params.py, besides "lr" and "patience", it is worth carefully tuning "dropout" and "tau".
  3. In our experiments, only the target node type is assigned original features; nodes of other types are assigned one-hot encodings, because most of the datasets used only provide features for target nodes in their original version. We believe that if high-quality features for the other node types were available, the overall results would improve substantially. The AMiner dataset is an example: it provides no original features, so every node type is assigned one-hot encodings. In other words, every node has features of the same quality, and in this case HeCo is far ahead of the other baselines. So if you have high-quality features for other node types, we strongly suggest trying them!
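The threshold-based selection described in tip 1 can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual pos.py; `mp_counts` and `select_positives` are hypothetical names:

```python
import numpy as np

def select_positives(mp_counts: np.ndarray, pos_num: int) -> np.ndarray:
    """Pick at most `pos_num` positives per node.

    `mp_counts[i, j]` is assumed to hold the number of meta-path
    instances connecting nodes i and j.
    """
    n = mp_counts.shape[0]
    pos = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        counts = mp_counts[i].astype(float).copy()
        counts[i] = 0.0                       # exclude self when ranking
        neighbors = np.nonzero(counts)[0]
        if len(neighbors) > pos_num:
            # keep the pos_num neighbors with the most meta-path links
            neighbors = neighbors[np.argsort(-counts[neighbors])[:pos_num]]
        pos[i, neighbors] = 1.0
        pos[i, i] = 1.0                       # each node is its own positive
    return pos
```

A larger `pos_num` admits more weakly connected neighbors, so it trades recall of true positives against label noise.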

Cite

Contact

If you have any questions, please feel free to contact me at [email protected]

Comments
  • Does the positive-example selection in pos.py differ from the paper's description?

    Taking the ACM dataset as an example: when pos.py selects positives, the loaded pap and psp do not record how many meta-path instances connect two papers, because the pap and psp generated by mp_gen.py only indicate whether two papers are connected by a meta-path at all, so connection strength is not considered. The paper, however, says papers are ranked by their total number of connections over multiple meta-paths, and the 5 nodes with the most meta-path instances are chosen as positives. The paper and the code seem inconsistent here, so how should positives actually be selected?

    opened by moonyeaL 6
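The distinction this issue raises can be seen in a toy example. The sketch below is illustrative, not the repo's code: for a binary paper-author matrix PA, the entry (PA @ PA.T)[i, j] counts PAP meta-path instances (shared authors) between papers i and j, while binarizing that product only records whether any meta-path exists:

```python
import numpy as np

# Binary paper-author incidence matrix (3 papers, 3 authors).
PA = np.array([[1, 1, 0],   # paper 0: authors 0, 1
               [1, 1, 1],   # paper 1: authors 0, 1, 2
               [0, 0, 1]])  # paper 2: author 2

pap_counts = PA @ PA.T                     # [0, 1] == 2: two shared authors
pap_binary = (pap_counts > 0).astype(int)  # keeps existence, drops strength
```

Ranking candidates by `pap_counts` (plus the counts of the other meta-paths) matches the paper's description; ranking by `pap_binary` cannot distinguish connection strength.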
  • Positive Sample

    Hello, the method of computing positive examples proposed in the paper is very novel! However, the choice of threshold is a concern: many selected positives are actually pseudo-positives, especially at high thresholds. For DBLP, for example, with a threshold of 1000 more than half of the positives have inconsistent labels. What impact does this have on the model?

    opened by DuanhaoranCC 4
  • Neighbor preprocessing question

    https://github.com/liun-online/HeCo/blob/81fbfad8c91147ab881c858e393cc69f2de491db/data/neibor.py#L17

    Can dict.values() guarantee that the neighbor lists are ordered by node index? I tested it and the order seems scrambled. What causes this?

    opened by KennyNH 3
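Regarding the ordering question above: in Python 3.7+ a dict preserves insertion order, not key order, so dict.values() matches the node-index order only if entries were inserted in that order. A minimal illustration with made-up data:

```python
neighbors = {2: [5, 6], 0: [1], 1: [3, 4]}  # inserted out of index order

by_insertion = list(neighbors.values())               # [[5, 6], [1], [3, 4]]
by_index = [neighbors[k] for k in sorted(neighbors)]  # [[1], [3, 4], [5, 6]]
```

Sorting the keys explicitly, as in `by_index`, avoids relying on insertion order.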
  • Several important but confusing questions

    Hello, after reading the paper I still have several questions and would greatly appreciate your answers.

    1. Does "target node" refer to the node type we want to classify and cluster (e.g., paper)? Further, does this mean the method cannot produce representations for non-target node types in the heterogeneous graph, and thus cannot handle tasks such as link prediction (e.g., between paper and subject)?
    2. The paper says z^mp is used for downstream tasks (node classification, clustering) because target-type nodes explicitly participate in generating z^mp. How should this be understood: does it mean that for a target node i, its meta-path based neighbors contain only nodes of the same type as the target node?
    3. Eq. (6) is the GCN-style aggregation of the Laplacian matrix with node features, but GCN also has a trainable parameter matrix W. Why is it absent here: is it omitted for brevity, or is there really no W, and if so, what is the reasoning? Also, I am still unsure about the meta-path based neighbors of node i in the paper's example: for a target node, are all nodes on one of its meta-paths (both target-type and non-target-type) treated as its first-order neighbors? Are the edges between the other nodes on the path kept before aggregation?
    4. Could you give a more detailed example of how the initial node feature x_i is set? Specifically, do all node types in the graph have original attributes, and what should be done for node types without original features/attributes (random initialization)?

    opened by HuizhaoWang 3
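On question 3 above: a propagation step with the symmetrically normalized adjacency but no trainable W can be sketched as below. This is a generic parameter-free GCN-style aggregation for illustration, not necessarily the paper's exact Eq. (6):

```python
import numpy as np

def propagate(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One parameter-free step: D^-1/2 (A + I) D^-1/2 X, with no weight W."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt @ x
```

Without W, the step only smooths features over neighbors; any trainable transformation has to live elsewhere in the model.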
  • Is the first term of Eq. (4) right?

    Hello, Eq. (4) computes the contribution of different neighbor types for node i, so why does its first term sum and then average over all nodes V in the graph? In other words, shouldn't part of the first term be removed?

    opened by HuizhaoWang 2
  • Unable to reproduce the results for ACM. Results highly depend on random seed.

    Hello!

    Using the default hyperparameters and the datasets provided in this repo, I am unable to reproduce the results for ACM. F1 macro, F1 micro and AUC are consistently around 1-2% worse than the results published in the paper.

    I have also discovered that results highly depend on the random seed. Sometimes, a good seed can give very good results, while a bad seed can be -2% or -5%. This is only observed in ACM dataset, and to a smaller extent in AMiner dataset. For DBLP and Freebase, the results don't vary as much with random seed. I wonder if you observe the same problem?

    For the reported results ("randomly run 10 times and report the average results"), did you run the experiments 10 times with different seeds? I saw that the seed for each dataset is fixed in params.py. Did you tune the seed for each dataset?

    Thank you

    opened by gau-nernst 1
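One common way to quantify the seed sensitivity reported in this issue is to run over several seeds and report mean and standard deviation. A minimal sketch, where `run_with_seeds` and `run_experiment` are hypothetical names and the latter stands in for one full train/evaluate cycle returning a metric such as Macro-F1:

```python
import random
import numpy as np

def run_with_seeds(run_experiment, seeds=(0, 1, 2, 3, 4)):
    """Run the experiment once per seed; return (mean, std) of the metric."""
    scores = []
    for seed in seeds:
        random.seed(seed)
        np.random.seed(seed)
        # torch.manual_seed(seed) would also be set in the real pipeline
        scores.append(run_experiment(seed))
    return float(np.mean(scores)), float(np.std(scores))
```

Reporting the standard deviation alongside the mean makes seed-dependent variance visible instead of hiding it behind one fixed seed.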
  • Model Framework Diagram

    Hello dear author!

    I just got started with heterogeneous graph neural networks. I would like to ask how you draw the model framework diagram. I have been using PowerPoint, but I don't know how to draw the heterogeneous graph.

    Thank you very much for your patient guidance.

    opened by DuanhaoranCC 1
Owner
Nian Liu