Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Overview

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs

Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. 2020. Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. Advances in Neural Information Processing Systems 33 (2020).

[Paper] [Poster] [Slides]

Requirements

Basic Requirements

  • Python >= 3.7 (tested on 3.8)

  • signac: this package utilizes signac to manage experiment data and jobs. signac can be installed with the following command:

    pip install signac==1.1 signac-flow==0.7.1 signac-dashboard

    Note that the latest version of signac may cause incompatibility issues.

  • numpy (tested on 1.18.5)

  • scipy (tested on 1.5.0)

  • networkx >= 2.4 (tested on 2.4)

  • scikit-learn (tested on 0.23.2)

For H2GCN

  • TensorFlow >= 2.0 (tested on 2.2)

Note that it is possible to use H2GCN without signac and scikit-learn on your own data and experimental framework.

For baselines

We also include the code for the baseline methods in this repository. This code is mostly the same as the reference implementations provided by the original authors, with our modifications to add JK connections, interoperability with our experimental pipeline, etc. For the requirements to run these baselines, please refer to the instructions provided by the original authors of the corresponding code, which can be found in each folder under /baselines.

As a general note, TensorFlow 1.15 can be used for all code requiring TensorFlow 1.x; for PyTorch, it is usually fine to use PyTorch 1.6; all code should be able to run under Python >= 3.7. In addition, the basic requirements must also be met.

Usage

Download Datasets

The datasets can be downloaded using the bash scripts provided in /experiments/h2gcn/scripts, which also prepare the datasets for use in our experimental framework based on signac.

We make use of signac to index and manage the datasets: the datasets and experiments are stored in hierarchically organized signac jobs, with the 1st level storing different graphs, the 2nd level storing different sets of features, and the 3rd level storing different training-validation-test splits. Each level has its own state points and job documents to differentiate it from other jobs.

Use signac schema to list all available properties in graph state points; use signac find to filter graphs using properties in the state points:

cd experiments/h2gcn/

# List available properties in graph state points
signac schema

# Find graphs in syn-products with homophily level h=0.1
signac find numNode 10000 h 0.1

# Find real benchmark "Cora"
signac find benchmark true datasetName\.\$regex "cora"

/experiments/h2gcn/utils/signac_tools.py provides helpful functions to iterate through the data space in Python; more usage examples of signac can be found in the signac documentation.
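
If you prefer to work from Python instead of the CLI, the same queries can be written against signac's public Python API. The sketch below is only an illustration of signac itself (it does not use the helpers in signac_tools.py) and assumes it is run from the repository root after the datasets have been downloaded:

# A minimal sketch using signac's Python API; keys and paths mirror the CLI
# examples above and are assumptions about your local checkout.
import signac

# Open the 1st-level (graph) data space rooted at experiments/h2gcn/
project = signac.get_project("experiments/h2gcn")

# Equivalent of: signac find numNode 10000 h 0.1
for job in project.find_jobs({"numNode": 10000, "h": 0.1}):
    print(job.id, job.sp)        # state point of each matching graph job

# Equivalent of: signac find benchmark true datasetName.$regex "cora"
for job in project.find_jobs({"benchmark": True, "datasetName": {"$regex": "cora"}}):
    print(job.workspace())       # directory holding this graph's data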

Replicate Experiments with signac

  • To replicate our experiments for each model on specific datasets, use the Python scripts in /experiments/h2gcn and the corresponding JSON config files in /experiments/h2gcn/configs. For example, to run H2GCN on our synthetic benchmark syn-cora:

    cd experiments/h2gcn/
    python run_hgcn_experiments.py -c configs/syn-cora/h2gcn.json [-i] run [-p PARALLEL_NUM]
    • Files and results generated in experiments are also stored with signac on top of the hierarchical order introduced above: the 4th level separates different models, and the 5th level stores files and results generated in different runs with different parameters of the same model.

    • By default, stdout and stderr of each model are stored in terminal_output.log in the 4th level; use -i if you want to see them through your terminal.

    • Use -p if you want to run experiments in parallel on multiple graphs (1st level).

    • Baseline models can be run through the following scripts:

      • GCN, GCN-Cheby, GCN+JK and GCN-Cheby+JK: run_gcn_experiments.py
      • GraphSAGE, GraphSAGE+JK: run_graphsage_experiments.py
      • MixHop: run_mixhop_experiments.py
      • GAT: run_gat_experiments.py
      • MLP: run_hgcn_experiments.py
  • To summarize experiment results of each model on specific datasets to a CSV file, use Python script /experiments/h2gcn/run_experiments_summarization.py with the corresponding model name and config file. For example, to summarize H2GCN results on our synthetic benchmark syn-cora:

    cd experiments/h2gcn/
    python run_experiments_summarization.py h2gcn -f configs/syn-cora/h2gcn.json
  • To list all paths of the 3rd-level dataset splits used in an experiment (in planetoid format) without running the experiments, use the following command:

    cd experiments/h2gcn/
    python run_hgcn_experiments.py -c configs/syn-cora/h2gcn.json --check_paths run

Standalone H2GCN Package

Our implementation of H2GCN is stored in the h2gcn folder, which can be used as a standalone package on your own data and experimental framework.

Example usages:

  • H2GCN-2

    cd h2gcn
    python run_experiments.py H2GCN planetoid \
      --dataset ind.citeseer \
      --dataset_path ../baselines/gcn/gcn/data/
  • H2GCN-1

    cd h2gcn
    python run_experiments.py H2GCN planetoid \
      --network_setup M64-R-T1-G-V-C1-D0.5-MO \
      --dataset ind.citeseer \
      --dataset_path ../baselines/gcn/gcn/data/
  • Use --help for more advanced usages:

    python run_experiments.py H2GCN planetoid --help

We only support datasets stored in the planetoid format. You can also add support for other data formats and for models beyond H2GCN by adding your own modules to /h2gcn/datasets and /h2gcn/models, respectively; check out our code for more details.
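
For reference, the planetoid format is the pickle-based layout used by the Planetoid and Kipf & Welling GCN reference code: ind.<name>.x/.tx/.allx for features, .y/.ty/.ally for one-hot labels, .graph for adjacency lists, and .test.index for test node indices. Below is a rough, hedged sketch of writing a toy graph in that layout; the file prefix and array shapes are hypothetical, so check the loaders under /h2gcn/datasets for the exact conventions H2GCN expects.

# A rough sketch (not taken from this repository) of saving a toy graph in a
# planetoid-style layout; "ind.mygraph" and all shapes below are hypothetical.
import pickle
from collections import defaultdict

import numpy as np
import scipy.sparse as sp

name = "ind.mygraph"                   # hypothetical dataset prefix
n_train, n_test, n_all = 140, 60, 200  # train, test, and all non-test nodes
n_feat, n_class = 50, 3

rng = np.random.default_rng(0)
parts = {
    "x":    sp.csr_matrix(rng.random((n_train, n_feat))),  # training features
    "tx":   sp.csr_matrix(rng.random((n_test, n_feat))),   # test features
    "allx": sp.csr_matrix(rng.random((n_all, n_feat))),    # all non-test features
    "y":    np.eye(n_class)[rng.integers(n_class, size=n_train)],  # one-hot labels
    "ty":   np.eye(n_class)[rng.integers(n_class, size=n_test)],
    "ally": np.eye(n_class)[rng.integers(n_class, size=n_all)],
    "graph": defaultdict(list, {0: [1, 2], 1: [0], 2: [0]}),  # adjacency lists
}
for suffix, obj in parts.items():
    with open(f"{name}.{suffix}", "wb") as f:
        pickle.dump(obj, f)

# Test node indices, one per line, following the original planetoid files.
with open(f"{name}.test.index", "w") as f:
    f.write("\n".join(str(i) for i in range(n_all, n_all + n_test)))

A dataset written this way could then, in principle, be passed to run_experiments.py via --dataset ind.mygraph and --dataset_path pointing at the folder holding these files, analogous to the citeseer examples above.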

Contact

Please contact Jiong Zhu ([email protected]) if you have any questions.

Citation

Please cite our paper if you make use of this code in your own work:

@article{zhu2020beyond,
  title={Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs},
  author={Zhu, Jiong and Yan, Yujun and Zhao, Lingxiao and Heimann, Mark and Akoglu, Leman and Koutra, Danai},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
Comments
  • Run H2GCN alone on a graph with heterophily

    Sorry to bother you with another question: how can I run H2GCN alone on a graph with heterophily? After downloading the datasets, how do I load them without signac?

    opened by ShunliRen 9
  • Cannot run H2GCN on the Actor, WebKB, and Wikipedia datasets

    Hello,

    Your repository provides the citation datasets (Cora, Citeseer, Pubmed). I want to reproduce the results of H2GCN on the Actor, WebKB (Cornell, Texas, Wisconsin), and Wikipedia (Chameleon, Squirrel) datasets.

    However, the citation datasets are transformed into the .x, .y, .allx, .ally, .graph, .test.index format, while the other datasets mentioned above are not available on the internet in this format.

    Can you please provide the utility code you used to load these datasets, so that H2GCN can be run on them?

    opened by AmitRoy7781 3
  • ImportError: cannot import name 'FlowProject' from 'flow'

    While trying to recreate the experiments from experiments/h2gcn/, I am encountering the error ImportError: cannot import name 'FlowProject' from 'flow'. I cannot resolve it; can you help?

    opened by ScottHoang 2
  • How to run baseline models?

    Hi, I want to run baseline models with

    cd  experiments/h2gcn  
    python run_graphsage_experiments.py -c configs/real-geomgcn/graphsage.json run --debug
    

    I got the following error:

    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] Traceback (most recent call last):
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] File "model.py", line 10, in <module>
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] import dataset
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:53:13 run_model@4c95f334af46cefe80977d9c96123445    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    

    And the terminal_output.log contains:

    [07-03 04:43:07     INFO] ===============
    >>>>Executing command ['/home/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=H2GCN --network_setup M64-T1-G-V-C1-MO --adj_nhood 1 2 --l2_regularize_weight 1e-5@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', 'H2GCN', '--network_setup', 'M64-T1-G-V-C1-MO', '--adj_nhood', '1', '2', '--l2_regularize_weight', '1e-5']
    ===============
    [07-03 04:43:08     INFO] Traceback (most recent call last):
    [07-03 04:43:08     INFO] File "model.py", line 10, in <module>
    [07-03 04:43:08     INFO] import dataset
    [07-03 04:43:08     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:43:08    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    [07-03 04:45:57     INFO] ===============
    >>>>Executing command ['/home/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=H2GCN --network_setup M64-T1-G-V-C1-MO --adj_nhood 1 2 --l2_regularize_weight 1e-5@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', 'H2GCN', '--network_setup', 'M64-T1-G-V-C1-MO', '--adj_nhood', '1', '2', '--l2_regularize_weight', '1e-5']
    ===============
    [07-03 04:45:58     INFO] Traceback (most recent call last):
    [07-03 04:45:58     INFO] File "model.py", line 10, in <module>
    [07-03 04:45:58     INFO] import dataset
    [07-03 04:45:58     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:45:58    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    [07-03 04:53:11     INFO] ===============
    >>>>Executing command ['/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=--hid_units 64 --epochs 500@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', '--hid_units', '64', '--epochs', '500']
    ===============
    [07-03 04:53:12     INFO] Traceback (most recent call last):
    [07-03 04:53:12     INFO] File "model.py", line 10, in <module>
    [07-03 04:53:12     INFO] import dataset
    [07-03 04:53:12     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:53:13    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    
    

    I don't know whether this is the right way to run the baseline models directly; could you help me solve this problem? Thanks!

    opened by JhuoW 2
  • How to get and use the node features of the syn-cora dataset?

    Recently I had the pleasure of reading your paper "Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs" published in NeurIPS, and I was impressed with the algorithm and experimental results. I am following the instructions on your GitHub to try to reproduce your experiment using the H2GCN algorithm alone (instead of signac). I downloaded the synthetic dataset using the "get-syn-cora.sh" script you provided, but during the experiment I noticed that the downloaded ".allx" file does not seem to contain the node features. (The data in ".allx" are sparse, binary-valued matrices, although the format is correct.) Therefore, I would like to check with you: does the downloaded dataset named "workspace" already include node features? If yes, how do I use them? Is it possible to load them directly using the same interface as Kipf's? If not, how should I regenerate the features to reproduce the results reported in Figure 2(a)? Could you please provide them in the same format as Kipf et al.? I believe it would be very beneficial for the community and would help promote your work. We sincerely look forward to hearing from you!

    opened by LirongWu 2
  • About downloading the dataset

    Thanks for your impressive work! When I download the dataset, it seems that I can't connect to https://umich.app.box.com. Could you give me some suggestions about it? Thanks!

    --2021-01-11 19:04:34-- (try: 4) https://umich.app.box.com/public/static/oerjpreqd1u1cn481mk1m788fd4ahcvv.gz
    Connecting to umich.app.box.com (umich.app.box.com)|2001::42dc:932f|:443... failed: Connection timed out. Retrying.

    opened by ShunliRen 2
  • About heterophilic datasets

    First, thanks for your impressive work and the source code. I downloaded the datasets using your script. For the heterophilic datasets, I found there are two files in the data_source folder (one for graph edges and one for nodes, features, and labels). I would like to know whether these two files are the same as those here (https://github.com/graphdml-uiuc-jlu/geom-gcn/tree/master/new_data)? Thanks!

    opened by liu-jc 2
  • About downloading syn-cora

    Hi, I tried to access https://umich.app.box.com/public/static/oerjpreqd1u1cn481mk1m788fd4ahcvv.gz, but it says "The shared file or folder has been removed." How can I obtain the syn-cora dataset?

    opened by GNN-zl 2
Owner
GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan
Code repository for work by the GEMS Lab: https://gemslab.github.io/research/