Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Overview

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs

Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. 2020. Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. Advances in Neural Information Processing Systems 33 (2020).

[Paper] [Poster] [Slides]

Requirements

Basic Requirements

  • Python >= 3.7 (tested on 3.8)

  • signac: this package utilizes signac to manage experiment data and jobs. signac can be installed with the following command:

    pip install signac==1.1 signac-flow==0.7.1 signac-dashboard

    Note that the latest version of signac may cause incompatibility issues.

  • numpy (tested on 1.18.5)

  • scipy (tested on 1.5.0)

  • networkx >= 2.4 (tested on 2.4)

  • scikit-learn (tested on 0.23.2)

For H2GCN

  • TensorFlow >= 2.0 (tested on 2.2)

Note that it is possible to use H2GCN without signac and scikit-learn on your own data and experimental framework.

For baselines

We also include the code for the baseline methods in this repository. This code is mostly the same as the reference implementations provided by the original authors, with our modifications to add JK connections, interoperability with our experimental pipeline, etc. For the requirements to run these baselines, please refer to the instructions provided by the original authors of the corresponding code, which can be found in each folder under /baselines.

As a general note, TensorFlow 1.15 can be used for all code requiring TensorFlow 1.x; for PyTorch, it is usually fine to use PyTorch 1.6; all code should be able to run under Python >= 3.7. In addition, the basic requirements must also be met.

Usage

Download Datasets

The datasets can be downloaded using the bash scripts provided in /experiments/h2gcn/scripts, which also prepare the datasets for use in our experimental framework based on signac.

We make use of signac to index and manage the datasets: the datasets and experiments are stored in hierarchically organized signac jobs, with the 1st level storing different graphs, the 2nd level storing different sets of features, and the 3rd level storing different training-validation-test splits. Each level has its own state points and job documents to differentiate it from other jobs.

Use signac schema to list all available properties in graph state points; use signac find to filter graphs using properties in the state points:

cd experiments/h2gcn/

# List available properties in graph state points
signac schema

# Find graphs in syn-products with homophily level h=0.1
signac find numNode 10000 h 0.1

# Find real benchmark "Cora"
signac find benchmark true datasetName\.\$regex "cora"

/experiments/h2gcn/utils/signac_tools.py provides helpful functions to iterate through the data space in Python; more usage examples of signac can be found in the signac documentation.
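
If you prefer to work from Python instead of the CLI, the same queries can be written against signac's public Python API. The sketch below is only an illustration of signac itself (it does not use the helpers in signac_tools.py) and assumes it is run from the repository root after the datasets have been downloaded:

# A minimal sketch using signac's Python API; keys and paths mirror the CLI
# examples above and are assumptions about your local checkout.
import signac

# Open the 1st-level (graph) data space rooted at experiments/h2gcn/
project = signac.get_project("experiments/h2gcn")

# Equivalent of: signac find numNode 10000 h 0.1
for job in project.find_jobs({"numNode": 10000, "h": 0.1}):
    print(job.id, job.sp)        # state point of each matching graph job

# Equivalent of: signac find benchmark true datasetName.$regex "cora"
for job in project.find_jobs({"benchmark": True, "datasetName": {"$regex": "cora"}}):
    print(job.workspace())       # directory holding this graph's data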

Replicate Experiments with signac

  • To replicate our experiments for each model on specific datasets, use the Python scripts in /experiments/h2gcn and the corresponding JSON config files in /experiments/h2gcn/configs. For example, to run H2GCN on our synthetic benchmark syn-cora:

    cd experiments/h2gcn/
    python run_hgcn_experiments.py -c configs/syn-cora/h2gcn.json [-i] run [-p PARALLEL_NUM]
    • Files and results generated in experiments are also stored with signac on top of the hierarchical order introduced above: the 4th level separates different models, and the 5th level stores files and results generated in different runs with different parameters of the same model.

    • By default, stdout and stderr of each model are stored in terminal_output.log in the 4th level; use -i if you want to see them through your terminal.

    • Use -p if you want to run experiments in parallel on multiple graphs (1st level).

    • Baseline models can be run through the following scripts:

      • GCN, GCN-Cheby, GCN+JK and GCN-Cheby+JK: run_gcn_experiments.py
      • GraphSAGE, GraphSAGE+JK: run_graphsage_experiments.py
      • MixHop: run_mixhop_experiments.py
      • GAT: run_gat_experiments.py
      • MLP: run_hgcn_experiments.py
  • To summarize experiment results of each model on specific datasets to a CSV file, use Python script /experiments/h2gcn/run_experiments_summarization.py with the corresponding model name and config file. For example, to summarize H2GCN results on our synthetic benchmark syn-cora:

    cd experiments/h2gcn/
    python run_experiments_summarization.py h2gcn -f configs/syn-cora/h2gcn.json
  • To list all paths of the 3rd-level dataset splits used in an experiment (in planetoid format) without running the experiments, use the following command:

    cd experiments/h2gcn/
    python run_hgcn_experiments.py -c configs/syn-cora/h2gcn.json --check_paths run

Standalone H2GCN Package

Our implementation of H2GCN is stored in the h2gcn folder, which can be used as a standalone package on your own data and experimental framework.

Example usages:

  • H2GCN-2

    cd h2gcn
    python run_experiments.py H2GCN planetoid \
      --dataset ind.citeseer \
      --dataset_path ../baselines/gcn/gcn/data/
  • H2GCN-1

    cd h2gcn
    python run_experiments.py H2GCN planetoid \
      --network_setup M64-R-T1-G-V-C1-D0.5-MO \
      --dataset ind.citeseer \
      --dataset_path ../baselines/gcn/gcn/data/
  • Use --help for more advanced usages:

    python run_experiments.py H2GCN planetoid --help

We only support datasets stored in the planetoid format. You can also add support for other data formats and for models beyond H2GCN by adding your own modules to /h2gcn/datasets and /h2gcn/models, respectively; check out our code for more details.
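
For reference, the planetoid format is the pickle-based layout used by the Planetoid and Kipf & Welling GCN reference code: ind.<name>.x/.tx/.allx for features, .y/.ty/.ally for one-hot labels, .graph for adjacency lists, and .test.index for test node indices. Below is a rough, hedged sketch of writing a toy graph in that layout; the file prefix and array shapes are hypothetical, so check the loaders under /h2gcn/datasets for the exact conventions H2GCN expects.

# A rough sketch (not taken from this repository) of saving a toy graph in a
# planetoid-style layout; "ind.mygraph" and all shapes below are hypothetical.
import pickle
from collections import defaultdict

import numpy as np
import scipy.sparse as sp

name = "ind.mygraph"                   # hypothetical dataset prefix
n_train, n_test, n_all = 140, 60, 200  # train, test, and all non-test nodes
n_feat, n_class = 50, 3

rng = np.random.default_rng(0)
parts = {
    "x":    sp.csr_matrix(rng.random((n_train, n_feat))),  # training features
    "tx":   sp.csr_matrix(rng.random((n_test, n_feat))),   # test features
    "allx": sp.csr_matrix(rng.random((n_all, n_feat))),    # all non-test features
    "y":    np.eye(n_class)[rng.integers(n_class, size=n_train)],  # one-hot labels
    "ty":   np.eye(n_class)[rng.integers(n_class, size=n_test)],
    "ally": np.eye(n_class)[rng.integers(n_class, size=n_all)],
    "graph": defaultdict(list, {0: [1, 2], 1: [0], 2: [0]}),  # adjacency lists
}
for suffix, obj in parts.items():
    with open(f"{name}.{suffix}", "wb") as f:
        pickle.dump(obj, f)

# Test node indices, one per line, following the original planetoid files.
with open(f"{name}.test.index", "w") as f:
    f.write("\n".join(str(i) for i in range(n_all, n_all + n_test)))

A dataset written this way could then, in principle, be passed to run_experiments.py via --dataset ind.mygraph and --dataset_path pointing at the folder holding these files, analogous to the citeseer examples above.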

Contact

Please contact Jiong Zhu ([email protected]) if you have any questions.

Citation

Please cite our paper if you make use of this code in your own work:

@article{zhu2020beyond,
  title={Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs},
  author={Zhu, Jiong and Yan, Yujun and Zhao, Lingxiao and Heimann, Mark and Akoglu, Leman and Koutra, Danai},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
Comments
  • Run H2GCN alone on a graph with heterophily

    Sorry to bother you with another question: how can I run H2GCN alone on a graph with heterophily? After downloading the datasets, how do I load them without signac?

    opened by ShunliRen 9
  • Cannot run H2GCN on the Actor, WebKB, and Wikipedia datasets

    Hello,

    Your repository provides the citation datasets (Cora, Citeseer, Pubmed). I want to reproduce the results of H2GCN on the Actor, WebKB (Cornell, Texas, Wisconsin), and Wikipedia (Chameleon, Squirrel) datasets.

    However, the citation datasets are transformed into the .x, .y, .allx, .ally, .graph, .test.index format, while the other datasets mentioned above are not available on the internet in this format.

    Can you please provide the utility code you used to load these datasets, so that H2GCN can be run on them?

    opened by AmitRoy7781 3
  • ImportError: cannot import name 'FlowProject' from 'flow'

    While trying to recreate the experiments from experiments/h2gcn/, I am encountering the error ImportError: cannot import name 'FlowProject' from 'flow'. I cannot resolve it; can you help?

    opened by ScottHoang 2
  • How to run baseline models?

    Hi, I want to run baseline models with

    cd  experiments/h2gcn  
    python run_graphsage_experiments.py -c configs/real-geomgcn/graphsage.json run --debug
    

    I got the following error:

    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] Traceback (most recent call last):
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] File "model.py", line 10, in <module>
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] import dataset
    [07-03 04:53:12 run_model@4c95f334af46cefe80977d9c96123445     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:53:13 run_model@4c95f334af46cefe80977d9c96123445    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    

    And the terminal_output.log contains:

    [07-03 04:43:07     INFO] ===============
    >>>>Executing command ['/home/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=H2GCN --network_setup M64-T1-G-V-C1-MO --adj_nhood 1 2 --l2_regularize_weight 1e-5@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', 'H2GCN', '--network_setup', 'M64-T1-G-V-C1-MO', '--adj_nhood', '1', '2', '--l2_regularize_weight', '1e-5']
    ===============
    [07-03 04:43:08     INFO] Traceback (most recent call last):
    [07-03 04:43:08     INFO] File "model.py", line 10, in <module>
    [07-03 04:43:08     INFO] import dataset
    [07-03 04:43:08     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:43:08    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    [07-03 04:45:57     INFO] ===============
    >>>>Executing command ['/home/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=H2GCN --network_setup M64-T1-G-V-C1-MO --adj_nhood 1 2 --l2_regularize_weight 1e-5@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', 'H2GCN', '--network_setup', 'M64-T1-G-V-C1-MO', '--adj_nhood', '1', '2', '--l2_regularize_weight', '1e-5']
    ===============
    [07-03 04:45:58     INFO] Traceback (most recent call last):
    [07-03 04:45:58     INFO] File "model.py", line 10, in <module>
    [07-03 04:45:58     INFO] import dataset
    [07-03 04:45:58     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:45:58    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    [07-03 04:53:11     INFO] ===============
    >>>>Executing command ['/anaconda3/bin/python', '-u', 'model.py', '--dataset_path', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb', '--dataset', 'ind.cora-unmodified-0.48p__0.2p', '--run_id=--hid_units 64 --epochs 500@ce7091201460a2a7d3384b919b781b63_9caf91da0e68a2cb7a0d7b4c648929c4_20c7d1b09c28a20e6fa374cc6c8544e8_8b2d686c68a5e9f7ff474419f73f3a6d_0c634e96f738e8800231132e2848e303_248ec654a2e8d5f30df7ac8ef12d3cf5_a4007892967f684b0be93efd41b1d2f3_83150345b828a1192e923bc2e926b66b', '--use_signac', '--signac_root', '/H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments', '--val_size', '1019', '--hid_units', '64', '--epochs', '500']
    ===============
    [07-03 04:53:12     INFO] Traceback (most recent call last):
    [07-03 04:53:12     INFO] File "model.py", line 10, in <module>
    [07-03 04:53:12     INFO] import dataset
    [07-03 04:53:12     INFO] ModuleNotFoundError: No module named 'dataset'
    [07-03 04:53:13    ERROR] Check log at /H2GCN/experiments/h2gcn/workspace/4c95f334af46cefe80977d9c96123445/features/e9cf54653de9231697763f2d1216eb5c/splits/eb8f45d8594346aa36c4773fc43f13fb/experiments/graphsage_experiments/terminal_output.log
    
    

    I don't know whether this is the right way to run the baseline models directly; could you help me solve this problem? Thanks!

    opened by JhuoW 2
  • How to get and use the node features of the syn-cora dataset?

    Recently I had the pleasure of reading your paper "Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs" published in NeurIPS, and I was impressed with the algorithm and experimental results. I am following the instructions on your GitHub to try to reproduce your experiment using the H2GCN algorithm alone (instead of signac). I downloaded the synthetic dataset using the "get-syn-cora.sh" script you provided, but during the experiment I noticed that the downloaded ".allx" file does not seem to contain the node features. (The data in ".allx" are sparse, binary-valued matrices, although the format is correct.) Therefore, I would like to check with you: does the downloaded dataset named "workspace" already include node features? If yes, how do I use them? Is it possible to load them directly using the same interface as Kipf's? If not, how should I regenerate the features to reproduce the results reported in Figure 2(a)? Could you please provide them in the same format as Kipf et al.? I believe it would be very beneficial for the community and would help promote your work. We sincerely look forward to hearing from you!

    opened by LirongWu 2
  • About downloading the dataset

    Thanks for your impressive work! When I download the dataset, it seems that I can't connect to https://umich.app.box.com. Could you give me some suggestions about it? Thanks!

    --2021-01-11 19:04:34-- (try: 4) https://umich.app.box.com/public/static/oerjpreqd1u1cn481mk1m788fd4ahcvv.gz
    Connecting to umich.app.box.com (umich.app.box.com)|2001::42dc:932f|:443... failed: Connection timed out. Retrying.

    opened by ShunliRen 2
  • About heterophilic datasets

    First, thanks for your impressive work and the source code. I downloaded the datasets using your script. For the heterophilic datasets, I found there are two files in the data_source folder (one for graph edges and one for nodes, features, and labels). I would like to know whether these two files are the same as those here (https://github.com/graphdml-uiuc-jlu/geom-gcn/tree/master/new_data)? Thanks!

    opened by liu-jc 2
  • About downloading syn-cora

    Hi, I tried to access https://umich.app.box.com/public/static/oerjpreqd1u1cn481mk1m788fd4ahcvv.gz, but it says "The shared file or folder has been removed." How can I obtain the syn-cora dataset?

    opened by GNN-zl 2
Owner
GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan
Code repository for work by the GEMS Lab: https://gemslab.github.io/research/