On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

Overview

This repository provides the code (in PyTorch) and datasets for our paper "On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks" (SOLT-GNN for short), published in WWW 2022.

1. Descriptions

The repository is organised as follows:

  • dataset/: the original data and sampled subgraphs of the five benchmark datasets.
  • main.py: the main entry of tail graph classification for SOLT-GIN.
  • gin.py: base GIN model.
  • PatternMemory.py: the module of pattern memory.
  • utils.py: utility functions for data loading and data split.
  • subgraph_sample.py: code for subgraph sampling.

2. Requirements

  • Python-3.8.5
  • PyTorch-1.8.1
  • NetworkX-2.4
  • NumPy-1.18.1

3. Running experiments

Experimental environment

Our experimental environment is Ubuntu 20.04.1 LTS (GNU/Linux 5.8.0-55-generic x86_64), and we train our model on an NVIDIA GeForce GTX 1080 GPU with CUDA 11.0.

How to run

(1) First run subgraph_sample.py to complete the subgraph sampling step before running main.py. Note that the sampled subgraph data may occupy some storage space.

  • python subgraph_sample.py

(2) Tail graph classification:

  • python main.py --dataset PTC --K 72 --alpha 0.3 --mu1 1.5 --mu2 1.5
  • python main.py --dataset PROTEINS --K 251 --alpha 0.15 --mu1 2 --mu2 2
  • python main.py --dataset DD --K 228 --alpha 0.1 --mu1 0.5 --mu2 0.5
  • python main.py --dataset FRANK --K 922 --alpha 0.1 --mu1 2 --mu2 0
  • python main.py --dataset IMDBBINARY --K 205 --alpha 0.15 --mu1 1 --mu2 1

Note

  • We repeat each experiment five times and report the averaged results (with standard deviation). For the five runs, we use seeds {0, 1, 2, 3, 4} for parameter initialization, respectively; a minimal seeding sketch follows this list.
  • Changes to the experimental environment (including the Requirements) may cause performance fluctuations for both the baselines and our SOLT-GNN. To reproduce the results in the paper, please match the experimental environment illustrated above as closely as possible. The parameter settings we used are given in the Python commands above. If SOLT-GNN performs slightly worse after an environment change, readers can further tune the hyper-parameters $\mu_1$, $\mu_2$, $\alpha$ and $d_m$; we recommend searching them in {0.1, 0.5, 1, 1.5, 2}, {0.1, 0.5, 1, 1.5, 2}, {0.05, 0.1, 0.15, 0.2, 0.25, 0.3} and {16, 32, 64, 128}, respectively (see the tuning sketch after this list). Since the performance of SOLT-GIN is closely tied to that of GIN, tuning the hyper-parameters of GIN is also encouraged; when tuning the hyper-parameters of SOLT-GNN, please first fix the configuration of GIN for efficiency.
  • To run the model on your own datasets, please refer to Section 4 (Input Data Format) below for the dataset format.
  • The implementation of SOLT-GNN is based on the official implementation of GIN (https://github.com/weihua916/powerful-gnns).
  • To tune the other hyper-parameters, please refer to main.py for more details.
    • In particular, the number of head graphs in each dataset (marked as K in the paper) decides the head/tail division; readers can tune K to explore the effect of different head/tail splits.
    • Parameters $n_n$ and $n_g$ are the numbers of node-level and subgraph-level triplets used in training, respectively. Appropriately increasing the number of training triplets might yield further performance improvements.
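
For reference, the following is a minimal seeding sketch matching the five-run protocol above (seeds {0, 1, 2, 3, 4}). The repository's own seeding is handled inside main.py, so this helper is illustrative only.

```python
# Illustrative only: a minimal seeding helper for the five-run protocol above.
# The repository's own seeding is handled inside main.py and may differ.
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # CPU (and default CUDA) RNG
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; no-op without a GPU

for seed in range(5):  # seeds {0, 1, 2, 3, 4}
    set_seed(seed)
    # ... train and evaluate one run, then average the five results ...
```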
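
The sketch below walks the recommended tuning grids by invoking main.py through the flags documented above (--dataset, --K, --alpha, --mu1, --mu2), using PTC (K=72) as an example. The command-line flag for $d_m$ is not shown in this README, so it is omitted here; check main.py for its exact name before adding it to the grid.

```python
# A sketch of the recommended grids for mu1, mu2 and alpha, using PTC as an
# example (K=72, as in the command above). The flag for d_m is omitted since
# its exact name is not documented here; see main.py.
import itertools
import subprocess

MU1_GRID = [0.1, 0.5, 1, 1.5, 2]
MU2_GRID = [0.1, 0.5, 1, 1.5, 2]
ALPHA_GRID = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]

for mu1, mu2, alpha in itertools.product(MU1_GRID, MU2_GRID, ALPHA_GRID):
    subprocess.run(
        ["python", "main.py",
         "--dataset", "PTC", "--K", "72",
         "--alpha", str(alpha), "--mu1", str(mu1), "--mu2", str(mu2)],
        check=True,  # stop on the first failed run
    )
```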

4. Input Data Format

To run SOLT-GNN on your own datasets, prepare the input data in the following format.

Each dataset XXX is stored in a single file named XXX.txt, which holds all the graphs of that dataset. In particular, for each XXX.txt:

  • The first line has a single column: the number of graphs (marked as N) contained in the dataset. The rest of the XXX.txt file holds the data of these N graphs, one after another.
  • In the data of each graph, the first line has two columns: the number of nodes (marked as n) in the graph and the label of the graph. The following n lines describe the nodes, with the i-th line corresponding to node i (index i starts from 0). In each of these n lines, the first column is the node label, the second column is the number of its neighbors (marked as m), and the following m columns are the indices (ids) of its neighbors.
    • Therefore, each graph occupies n+1 lines; a minimal loader sketch for this format follows.
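
As a reading aid, here is a minimal loader sketch built only from the conventions stated above. The repository's own loader lives in utils.py and may differ in details (e.g., label re-indexing or feature construction).

```python
# A minimal sketch of a loader for the XXX.txt format described above.
# The official loader is in utils.py; this version only follows the
# conventions stated in this section.
import networkx as nx

def load_dataset(path):
    """Parse XXX.txt into a list of (networkx.Graph, graph_label) pairs."""
    graphs = []
    with open(path) as f:
        num_graphs = int(f.readline())  # line 1: number of graphs N
        for _ in range(num_graphs):
            # Per-graph header: number of nodes n and the graph label.
            n_nodes, graph_label = map(int, f.readline().split())
            g = nx.Graph()
            for i in range(n_nodes):  # node indices start from 0
                row = list(map(int, f.readline().split()))
                node_label, num_neighbors = row[0], row[1]
                g.add_node(i, label=node_label)
                for j in row[2:2 + num_neighbors]:  # neighbor ids
                    g.add_edge(i, j)
            graphs.append((g, graph_label))
    return graphs
```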

5. Cite

@inproceedings{liu2022onsize,
  title={On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks},
  author={Liu, Zemin and Mao, Qiheng and Liu, Chenghao and Fang, Yuan and Sun, Jianling},
  booktitle={Proceedings of the ACM Web Conference 2022},
  year={2022}
}

6. Contact

If you have any questions on the code and data, please contact Qiheng Mao ([email protected]).
