Attention over nodes in Graph Neural Networks using PyTorch (NeurIPS 2019)

Boris Knyazev

Last update: Jan 6, 2023


Overview

Intro

This repository contains code to generate data and reproduce experiments from our NeurIPS 2019 paper:

Boris Knyazev, Graham W. Taylor, Mohamed R. Amer. Understanding Attention and Generalization in Graph Neural Networks.

See slides here.

An earlier short version of our paper was presented as a contributed talk at ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.

Update:

In the code for MNIST, the dist variable should have been squared to make it a Gaussian. All figures and results were generated without squaring it. I don't think it's very important in terms of results, but if you square it, sigma should be adjusted accordingly.
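A minimal sketch of the difference, using toy coordinates and NumPy only. The names A_released, A_gaussian, d_ref and sigma_sq are hypothetical, and matching the two kernels at one reference distance is just one possible way to adjust sigma, not necessarily what we would do in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
coord = rng.random((5, 2))  # toy node coordinates in [0, 1]
dist = np.linalg.norm(coord[:, None] - coord[None], axis=-1)  # pairwise distances

sigma = 0.1 * np.pi
A_released = np.exp(-dist / sigma ** 2)       # released code: dist is not squared
A_gaussian = np.exp(-dist ** 2 / sigma ** 2)  # Gaussian kernel: dist is squared

# One possible adjustment (an assumption): pick sigma_sq so that both kernels
# give the same weight at a chosen reference distance d_ref.
d_ref = 0.2
sigma_sq = sigma * np.sqrt(d_ref)
w_released = np.exp(-d_ref / sigma ** 2)
w_gaussian = np.exp(-d_ref ** 2 / sigma_sq ** 2)
```

With this choice the two kernels agree exactly at d_ref and differ elsewhere, so the results are not expected to be identical.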

[Figure: two panels, MNIST and TRIANGLES]

For MNIST from top to bottom rows:

input test images with additive Gaussian noise with standard deviation in the range from 0 to 1.4 with step 0.2
attention coefficients (alpha) predicted by the unsupervised model
attention coefficients (alpha) predicted by the supervised model
attention coefficients (alpha) predicted by our weakly-supervised model
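The noise levels in the first row can be sketched as follows, assuming images scaled to [0, 1]. The helper add_noise is hypothetical, and this sketch adds noise to pixels without clipping; how exactly noise is applied in the experiments is defined by the repository code, not by this snippet.

```python
import numpy as np

def add_noise(img, std, rng):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    return img + rng.normal(0.0, std, size=img.shape).astype(img.dtype)

rng = np.random.default_rng(0)
img = np.zeros((28, 28), np.float32)   # stand-in for a 28x28 MNIST image
noise_stds = np.arange(8) * 0.2        # 0, 0.2, ..., 1.4
noisy = [add_noise(img, s, rng) for s in noise_stds]
```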

For TRIANGLES from top to bottom rows:

on the left: input test graph (with 4-100 nodes) with ground truth attention coefficients, on the right: graph obtained by ground truth node pooling
on the left: input test graph (with 4-100 nodes) with unsupervised attention coefficients, on the right: graph obtained by unsupervised node pooling
on the left: input test graph (with 4-100 nodes) with supervised attention coefficients, on the right: graph obtained by supervised node pooling
on the left: input test graph (with 4-100 nodes) with weakly-supervised attention coefficients, on the right: graph obtained by weakly-supervised node pooling
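Node pooling driven by attention can be illustrated with a minimal sketch. It assumes threshold-based pooling, where nodes with a coefficient at or below a threshold are dropped; the function threshold_pool, the threshold value and the scaling of kept features by alpha are assumptions for illustration, not the implementation in this repository.

```python
import numpy as np

def threshold_pool(x, A, alpha, threshold):
    """Keep nodes with alpha > threshold; return pooled features, adjacency and kept indices."""
    keep = np.flatnonzero(alpha > threshold)
    x_pooled = x[keep] * alpha[keep, None]   # weight kept node features by attention
    A_pooled = A[np.ix_(keep, keep)]         # induced subgraph on the kept nodes
    return x_pooled, A_pooled, keep

x = np.eye(4, dtype=np.float32)                                  # 4 nodes with one-hot features
A = np.ones((4, 4), np.float32) - np.eye(4, dtype=np.float32)    # fully connected, no self-loops
alpha = np.array([0.5, 0.0, 0.4, 0.1], np.float32)               # toy attention coefficients
x_p, A_p, keep = threshold_pool(x, A, alpha, threshold=0.2)
```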

Note that during training, our MNIST models never saw noisy images and our TRIANGLES models never saw graphs with more than N=25 nodes.

Examples using PyTorch Geometric

COLORS and TRIANGLES datasets are now also available in the TU format, so that you can use a general TU datareader. See PyTorch Geometric examples for COLORS and TRIANGLES.
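The TU format stores a dataset as plain-text files sharing a prefix: an edge list (_A.txt), a graph index per node (_graph_indicator.txt) and one label per graph (_graph_labels.txt), all with 1-based indices. Below is a minimal reader sketch that handles only these three files. The function read_tu and the toy dataset TOY are hypothetical, and this is neither the datareader of this repository nor the one in PyTorch Geometric (where the TUDataset class is the usual entry point).

```python
import os
import tempfile
import numpy as np

def read_tu(data_dir, name):
    """Read graphs stored in the TU format into a list of (adjacency, label) pairs."""
    prefix = os.path.join(data_dir, name)
    edges = np.loadtxt(prefix + '_A.txt', delimiter=',', ndmin=2).astype(int) - 1       # 0-based node ids
    graph_id = np.loadtxt(prefix + '_graph_indicator.txt', ndmin=1).astype(int) - 1     # graph index per node
    labels = np.loadtxt(prefix + '_graph_labels.txt', ndmin=1).astype(int)
    graphs = []
    for g in range(len(labels)):
        nodes = np.flatnonzero(graph_id == g)          # global ids of the nodes of graph g
        local = {n: i for i, n in enumerate(nodes)}    # global id -> local id
        A = np.zeros((len(nodes), len(nodes)), np.float32)
        for u, v in edges[graph_id[edges[:, 0]] == g]:
            A[local[u], local[v]] = 1
        graphs.append((A, labels[g]))
    return graphs

# Toy dataset: a triangle (nodes 1-3) and a single edge (nodes 4-5)
with tempfile.TemporaryDirectory() as d:
    files = {
        'TOY_A.txt': '1, 2\n2, 1\n2, 3\n3, 2\n1, 3\n3, 1\n4, 5\n5, 4\n',
        'TOY_graph_indicator.txt': '1\n1\n1\n2\n2\n',
        'TOY_graph_labels.txt': '1\n0\n',
    }
    for fname, text in files.items():
        with open(os.path.join(d, fname), 'w') as f:
            f.write(text)
    graphs = read_tu(d, 'TOY')
```

Node attributes and node labels, which TU datasets may also provide in separate files, are ignored here to keep the sketch short.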

Example of evaluating a pretrained model on MNIST

For more examples, see MNIST_eval_models and TRIANGLES_eval_models.

# Download model checkpoint or 'git clone' this repo
import urllib.request
# Let's use the model with supervised attention (other models can be found in the Table below)
model_name = 'checkpoint_mnist-75sp_139255_epoch30_seed0000111.pth.tar'
model_url = 'https://github.com/bknyaz/graph_attention_pool/raw/master/checkpoints/%s' % model_name
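model_path = 'checkpoints/%s' % model_name
import os
os.makedirs('checkpoints', exist_ok=True)  # create the target directory if it does not exist
urllib.request.urlretrieve(model_url, model_path)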

# Load the model
import torch
from chebygin import ChebyGIN

state = torch.load(model_path)
args = state['args']
model = ChebyGIN(in_features=5, out_features=10, filters=args.filters, K=args.filter_scale,
                 n_hidden=args.n_hidden, aggregation=args.aggregation, dropout=args.dropout,
                 readout=args.readout, pool=args.pool, pool_arch=args.pool_arch)
model.load_state_dict(state['state_dict'])
model = model.eval()

# Load image using standard PyTorch Dataset
from torchvision import datasets
data = datasets.MNIST('./data', train=False, download=True)
import numpy as np
images = data.data.numpy() / 255.  # in older torchvision releases this attribute is data.test_data
img = images[0].astype(np.float32)  # 28x28 MNIST image

# Extract superpixels and create node features
import scipy.ndimage
from skimage.segmentation import slic
from scipy.spatial.distance import cdist

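# The number (n_segments) of superpixels returned by SLIC is usually smaller than requested, so we request more
# In recent scikit-image releases multichannel is deprecated: pass channel_axis=None instead of multichannel=False
superpixels = slic(img, n_segments=95, compactness=0.25, multichannel=False)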
sp_indices = np.unique(superpixels)
n_sp = len(sp_indices)  # should be 74 with these parameters of slic

sp_intensity = np.zeros((n_sp, 1), np.float32)
sp_coord = np.zeros((n_sp, 2), np.float32)  # row, col
for i, seg in enumerate(sp_indices):  # enumerate, because SLIC labels may start at 1 rather than 0
    mask = superpixels == seg
    sp_intensity[i] = np.mean(img[mask])
    sp_coord[i] = np.array(scipy.ndimage.center_of_mass(mask))

# The model is invariant to the order of nodes in a graph
# We can shuffle nodes and obtain exactly the same results
ind = np.random.permutation(n_sp)
sp_coord = sp_coord[ind]
sp_intensity = sp_intensity[ind]

# Create edges between nodes in the form of adjacency matrix
sp_coord = sp_coord / images.shape[1]
dist = cdist(sp_coord, sp_coord)  # distance between all pairs of nodes
sigma = 0.1 * np.pi  # width of a Gaussian
A = np.exp(- dist / sigma ** 2)  # transform distance to spatial closeness (dist is not squared, see Update above)
A[np.diag_indices_from(A)] = 0  # remove self-loops
A = torch.from_numpy(A).float().unsqueeze(0)

# Prepare an input to the model and process it
N_nodes = sp_intensity.shape[0]
mask = torch.ones(1, N_nodes, dtype=torch.uint8)

# mean and std computed for superpixel features in the training set
mn = torch.tensor([0.11225057, 0.11225057, 0.11225057, 0.44206527, 0.43950436]).view(1, 1, -1)
sd = torch.tensor([0.2721889,  0.2721889,  0.2721889,  0.2987583,  0.30080357]).view(1, 1, -1)

node_features = (torch.from_numpy(np.pad(np.concatenate((sp_intensity, sp_coord), axis=1),
                                         ((0, 0), (2, 0)), 'edge')).unsqueeze(0) - mn) / sd    

y, other_outputs = model([node_features, A, mask, None, {'N_nodes': torch.zeros(1, 1) + N_nodes}])
alpha = other_outputs['alpha'][0].data

y is a vector of 10 unnormalized class scores. To get the predicted label, use torch.argmax(y).
alpha is a vector of attention coefficients, one per node.
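A short sketch of post-processing these outputs. NumPy arrays with made-up values stand in for y and alpha (in the example above they are torch tensors), and the softmax and top-3 steps are illustrative additions rather than part of the evaluation code.

```python
import numpy as np

y = np.array([0.1, 2.5, -1.0, 0.3, 0.0, 1.2, -0.5, 0.7, 0.2, -2.0])  # made-up class scores
alpha = np.array([0.05, 0.40, 0.10, 0.30, 0.15])                      # made-up attention over 5 nodes

pred = int(np.argmax(y))             # predicted label
probs = np.exp(y - y.max())          # subtract the max for numerical stability
probs /= probs.sum()                 # softmax turns scores into probabilities
top_nodes = np.argsort(-alpha)[:3]   # indices of the 3 most attended nodes
```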

Tasks & Datasets

We design two synthetic graph tasks, COLORS and TRIANGLES, in which we predict the number of green nodes and the number of triangles respectively.
We also experiment with the MNIST image classification dataset, which we preprocess by extracting superpixels - a more natural way to feed images to a graph. We denote this dataset as MNIST-75sp.
We validate our weakly-supervised approach on three common graph classification benchmarks: COLLAB, PROTEINS and D&D.
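The targets of the two synthetic tasks can be computed on toy inputs as sketched below. The graph, the feature values and the (red, green, blue) channel layout are assumptions for illustration; the actual data are produced by the generation scripts described in the next section.

```python
import numpy as np

# TRIANGLES: undirected graph with one triangle (0-1-2) and a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
# Each triangle contributes 6 closed walks of length 3 to the trace of A^3
n_triangles = int(np.trace(A @ A @ A)) // 6

# COLORS: one-hot node colors, channel 1 assumed to be green
colors = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
n_green = int(colors[:, 1].sum())
```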

For COLORS, TRIANGLES and MNIST we know ground truth attention for nodes, which allows us to study graph neural networks with attention in depth.
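One way to write down such ground truth attention is sketched here. It assumes that attention is spread uniformly over green nodes for COLORS and is proportional to the number of triangles a node belongs to for TRIANGLES; refer to the paper for the exact definitions. The variable names are hypothetical.

```python
import numpy as np

# TRIANGLES: one triangle (0-1-2) and a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
tri_per_node = np.diag(A @ A @ A) // 2               # triangles each node belongs to
alpha_triangles = tri_per_node / tri_per_node.sum()  # assumes at least one triangle

# COLORS: nodes 0 and 2 are green
is_green = np.array([True, False, True, False])
alpha_colors = is_green / is_green.sum()             # assumes at least one green node
```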

Data generation

To generate all data using a single command: ./scripts/prepare_data.sh.

All generated/downloaded data will be stored in the local ./data directory. Preparing all data can take about 1 hour (see my log), and the data take about 2 GB of disk space.

Alternatively, you can generate data for each task as described below.

In case of any issues with running these scripts, data can be downloaded from here.

COLORS

To generate training, validation and test data for our COLORS dataset with different dimensionalities:

for dim in 3 8 16 32; do python generate_data.py --dim $dim; done

MNIST-75sp

To generate training and test data for our MNIST-75sp dataset using 4 CPU threads:

for split in train test; do python extract_superpixels.py -s $split -t 4; done

Data visualization

Once datasets are generated or downloaded, you can use the following IPython notebooks to load and visualize data:

COLORS and TRIANGLES, MNIST and COLLAB, PROTEINS and D&D.

Pretrained ChebyGIN models

Generalization results on the test sets for three tasks. Other results are available in the paper.

Click on the result to download a trained model in the PyTorch format.

Model	COLORS-Test-LargeC	TRIANGLES-Test-Large	MNIST-75sp-Test-Noisy
Script to train models	colors.sh	triangles.sh	mnist_75sp.sh
Global pooling	15 ± 7	30 ± 1	80 ± 12
Unsupervised attention	11 ± 6	26 ± 2	80 ± 23
Supervised attention	75 ± 17	48 ± 1	92.3 ± 0.4
Weakly-supervised attention	73 ± 14	30 ± 1	88.8 ± 4

The scripts to train the models must be run from the main directory, e.g.: ./scripts/mnist_75sp.sh

Examples of evaluating our trained models can be found in notebooks: MNIST_eval_models and TRIANGLES_eval_models.

Other examples of training models

To tune hyperparameters on the validation set for COLORS, TRIANGLES and MNIST, use the --validation flag.

For COLLAB, PROTEINS and D&D tuning of hyperparameters is included in the training script. Use the --ax flag.

Example of running 10 weakly-supervised experiments on PROTEINS with cross-validation of hyperparameters including initialization parameters (distribution and scale) of the attention model (the --tune_init flag):

for i in $(seq 1 1 10); do
  dataseed=$(( ( RANDOM % 10000 ) + 1 ))
  for j in $(seq 1 1 10); do
    seed=$(( ( RANDOM % 10000 ) + 1 ))
    python main.py --seed $seed -D TU --n_nodes 25 --epochs 50 --lr_decay_step 25,35,45 \
      --test_batch_size 100 -f 64,64,64 -K 1 --readout max --dropout 0.1 \
      --pool attn_sup_threshold_skip_skip_0 --pool_arch fc_prev --results None \
      --data_dir ./data/PROTEINS --seed_data $dataseed --cv --cv_folds 5 --cv_threads 5 \
      --ax --ax_trials 30 --scale None --tune_init \
      | tee logs/proteins_wsup_"$dataseed"_"$seed".log
  done
done

No initialization tuning on COLLAB:

for i in $(seq 1 1 10); do
  dataseed=$(( ( RANDOM % 10000 ) + 1 ))
  for j in $(seq 1 1 10); do
    seed=$(( ( RANDOM % 10000 ) + 1 ))
    python main.py --seed $seed -D TU --n_nodes 35 --epochs 50 --lr_decay_step 25,35,45 \
      --test_batch_size 32 -f 64,64,64 -K 3 --readout max --dropout 0.1 \
      --pool attn_sup_threshold_skip_skip_skip_0 --pool_arch fc_prev --results None \
      --data_dir ./data/COLLAB --seed_data $dataseed --cv --cv_folds 5 --cv_threads 5 \
      --ax --ax_trials 30 --scale None \
      | tee logs/collab_wsup_"$dataseed"_"$seed".log
  done
done

Note that results may be better with --pool_arch gnn_prev, but we did not focus on that.

Requirements

Python packages required (can be installed via pip or conda):

python >= 3.6.1
PyTorch >= 0.4.1
Ax for hyper-parameter tuning on COLLAB, PROTEINS and D&D
networkx
OpenCV
SciPy
scikit-image
scikit-learn

Reference

Please cite our paper if you use our data or code:

@inproceedings{knyazev2019understanding,
  title={Understanding attention and generalization in graph neural networks},
  author={Knyazev, Boris and Taylor, Graham W and Amer, Mohamed},
  booktitle={Advances in Neural Information Processing Systems},
  pages={4202--4212},
  year={2019},
  pdf={http://arxiv.org/abs/1905.02850}
}

Comments

How to prepare for my own data?

Hi, I think this is valuable work. I want to know how to create my own dataset in the format of mnist_75sp_train. For example, I write a PyTorch DataLoader as follows: data_loader=DataLoader(my_dataset,batch_size=batch_size,shuffle=True). It returns batch_images (shape: batch_size*H*W) and batch_labels (shape: batch_size*1) in each batch. How should I construct my own dataset and graphs for training? Could you provide code to help with this? Thank you very much!

opened by hzwfl2 5

Are you using sparse graphs for MNIST?

Hi @bknyaz, thanks for the well-documented codebase and new graph datasets!

I wanted to confirm: are the GNN models for the MNIST dataset operating on the fully-connected graph or a k-Nearest Neighbor graph (as in the MoNet and ChebNet papers, which use k = 8)?

To me, it seems the code is using dense graphs for now.

opened by chaitjo 2

Please add a license to this repo

Thank you for sharing this repo with us!

Could you please add an explicit LICENSE file to the repo so that it's clear under what terms the content is provided, and under what terms user contributions are licensed?

GitHub docs on licensing

However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license. The Open Source Guide provides additional guidance on choosing the correct license for your project.

Thanks!

opened by hust-nj 1

Threshold-based pooling

Hello!

Is the threshold-based pooling method in your paper pretty much the same as using the "min_score" argument in pytorch geometric's TopKPooling?

opened by dchang56 1

Error generating mnist75

I am trying to generate the mnist75 dataset by running: ./scripts/prepare_data.sh and I am getting the following stacktrace:

Fr Dez 16 17:57:00 CET 2022
start time: 2022-12-16 17:57:01.936994
dataset mnist
data_dir ./data
out_dir ./data
split train
threads 0
n_sp 75
compactness 0.25
seed 111
/home/mada/anaconda3/lib/python3.9/site-packages/skimage/_shared/utils.py:338: FutureWarning: multichannel is a deprecated argument name for slic. It will be removed in version 1.0. Please use channel_axis instead.
  warnings.warn(self.warning_msg.format(
Traceback (most recent call last):
  File "graph_attention_pool/extract_superpixels.py", line 128, in
    sp_data.append(process_image((images[i], i, n_images, args, True, True)))
  File "graph_attention_pool/extract_superpixels.py", line 55, in process_image
    assert n_sp_extracted == np.max(superpixels) + 1, ('superpixel indices', np.unique(superpixels)) # make sure superpixel indices are numbers from 0 to n-1
AssertionError: ('superpixel indices', array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]))

Do you know what the issue might be?

Thank you in advance!

opened by ciortanmadalina 0
