Graph Transformer Architecture

Overview

Source code for the paper "A Generalization of Transformer Networks to Graphs" by Vijay Prakash Dwivedi and Xavier Bresson, at AAAI'21 Workshop on Deep Learning on Graphs: Methods and Applications (DLG-AAAI'21).

We propose a generalization of the transformer neural network architecture for arbitrary graphs: the Graph Transformer.
Compared to the standard Transformer, the highlights of the presented architecture are:

  • The attention mechanism is a function of the neighborhood connectivity of each node in the graph.
  • The positional encoding is represented by Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP (see the sketch after this list).
  • The layer normalization is replaced by a batch normalization layer.
  • The architecture is extended to have edge representations, which can be critical for tasks with rich information on the edges or pairwise interactions (such as bond types in molecules, or relationship types in knowledge graphs, etc.).
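
A minimal sketch of the Laplacian eigenvector positional encoding described above, assuming a dense NumPy adjacency matrix (illustrative only, not the repository's exact code):

    import numpy as np

    def laplacian_pos_enc(adj: np.ndarray, pos_enc_dim: int) -> np.ndarray:
        # Return the pos_enc_dim smallest non-trivial Laplacian eigenvectors.
        n = adj.shape[0]
        deg = adj.sum(axis=1).astype(float)
        d_inv_sqrt = np.zeros(n)
        d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
        # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
        lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
        _, eigvec = np.linalg.eigh(lap)        # eigenvalues come back in ascending order
        # Drop the trivial (constant) eigenvector, keep the next pos_enc_dim columns
        return eigvec[:, 1:pos_enc_dim + 1]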

Figure: Block diagram of the Graph Transformer architecture

1. Repo installation

This project is based on the benchmarking-gnns repository.

Follow these instructions to install the benchmark and set up the environment.


2. Download datasets

Proceed as follows to download the datasets used to evaluate Graph Transformer.


3. Reproducibility

Use this page to run the code and reproduce the published results.


4. Reference

Paper on arXiv

@article{dwivedi2021generalization,
  title={A Generalization of Transformer Networks to Graphs},
  author={Dwivedi, Vijay Prakash and Bresson, Xavier},
  journal={AAAI Workshop on Deep Learning on Graphs: Methods and Applications},
  year={2021}
}




Comments
  • About Equations 11~12


    Hi,

    Great work!

    I want to confirm whether my understanding of equations 11~12 is correct.

    I understand equation 12 in this way: (Q h_i · K h_j) / sqrt(d_k) is a scalar, and E e_ij is a d_k-dimensional vector, so multiplying the scalar by the vector gives a d_k-dimensional vector. In equation 11, this d_k-dimensional vector is reduced to a scalar by computing w_1 + w_2 + ... + w_dk. Is that correct?
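
    To make this concrete, here is a tiny PyTorch sketch of my reading (just illustrating my interpretation, not necessarily the paper's actual computation):

        import torch

        d_k = 8
        q_i, k_j, e_ij = torch.randn(d_k), torch.randn(d_k), torch.randn(d_k)

        # Eq. 12 (my reading): scalar dot-product score times the projected edge vector
        w = (torch.dot(q_i, k_j) / d_k ** 0.5) * e_ij   # d_k-dimensional vector
        # Eq. 11 (my reading): reduce back to a scalar by summing the components
        score = w.sum()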

    opened by ZikangZhou 4
  • Eval sign flipping


    Hi Vijay,

    Thanks for your repo!

    Question: I see that you're doing random sign flipping of the eigenvector pos_enc during training, but it seems that you are not doing so at eval time. I understand that we want deterministic predictions, so there should be no random flipping when evaluating. Do you have further comments or justification for this?
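
    For reference, this is the kind of sign flipping I mean (my own sketch, not your exact code), assuming a [num_nodes, pos_enc_dim] tensor of eigenvector encodings:

        import torch

        pos_enc = torch.randn(10, 8)                       # [num_nodes, pos_enc_dim]
        # Randomly flip the sign of each eigenvector column (training only)
        sign = torch.randint(0, 2, (pos_enc.size(1),)).float() * 2 - 1
        pos_enc_train = pos_enc * sign.unsqueeze(0)        # eval would skip this step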

    Best, Kezhi

    opened by devnkong 3
  • Attention Matrix


    Hi! Congratulations on your paper, and thank you for making the implementation publicly available as well.

    Quick question about this function:

        def src_dot_dst(src_field, dst_field, out_field):   # enclosing factory restored for context
            def func(edges):
                return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
            return func
    

    Why do you do an element-wise multiplication of K and Q rather than a dot product? The dimensions of the scores are [num_edges, num_heads, hidden_dim/num_heads], but I expected a [num_edges, num_edges] matrix.
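
    To illustrate what I mean about the shapes (my own sketch, assuming the element-wise product is later reduced over the last dimension):

        import torch

        num_edges, num_heads, d_head = 5, 4, 8
        K_src = torch.randn(num_edges, num_heads, d_head)   # K h_j gathered on edge sources
        Q_dst = torch.randn(num_edges, num_heads, d_head)   # Q h_i gathered on edge destinations

        elementwise = K_src * Q_dst                 # [num_edges, num_heads, d_head]
        dot_per_edge = elementwise.sum(dim=-1)      # [num_edges, num_heads]: per-edge dot product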

    You can also reach me here: [email protected]. Hope to hear from you soon, Pietro Bonazzi

    opened by pbonazzi 3
  • Graph Classification


    Hello there. First of all, thank you for providing such amazing work. I'd like to know how I can leverage graphtransformer for a graph classification task on textual data: I first extract node and edge information from the text, and then, given the node features and edge information (only one edge type in my case), the model should generate binary targets from those features.
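
    Concretely, I imagine something like the following on top of the node embeddings produced by the transformer layers (just my sketch of the readout and classifier, not your code):

        import torch
        import dgl

        g = dgl.graph(([0, 1, 2], [1, 2, 0]))        # toy graph built from my extracted edges
        h = torch.randn(g.num_nodes(), 64)           # node embeddings from the transformer layers

        g.ndata['h'] = h
        hg = dgl.mean_nodes(g, 'h')                  # graph-level readout
        logit = torch.nn.Linear(64, 1)(hg)           # one binary target per graph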

    Kind regards, Michael

    opened by MichaelFu1998-create 2
  • pos_enc_dim value


    For graphs with a large difference in the number of vertices, how should the value of pos_enc_dim be determined? For example, one graph has 7 vertices and another has 3. Did you set pos_enc_dim to 8 because the graphs in your experimental dataset have more than 8 vertices?
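
    One workaround I can imagine (just my guess, not necessarily what your code does) is to zero-pad the encoding of small graphs up to pos_enc_dim:

        import numpy as np

        def pad_pos_enc(pos_enc: np.ndarray, pos_enc_dim: int) -> np.ndarray:
            # pos_enc: [num_nodes, k], where k < pos_enc_dim for very small graphs
            n, k = pos_enc.shape
            if k < pos_enc_dim:
                pos_enc = np.pad(pos_enc, ((0, 0), (0, pos_enc_dim - k)))
            return pos_enc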

    opened by aristo-panhu 2
  • Sparse graph and full graph


    Thanks for the innovative work! Could you please tell me how we can get a full graph? Does 'full graph' mean the full attention map? Does 'sparse graph' mean that we only retain the values of the immediate neighbor nodes from the full graph?
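
    For example, is a 'full graph' simply a fully connected DGL graph, as in the sketch below (my guess)?

        import torch
        import dgl

        n = 5
        src = torch.arange(n).repeat_interleave(n)
        dst = torch.arange(n).repeat(n)
        keep = src != dst                                        # drop self-loops
        full_g = dgl.graph((src[keep], dst[keep]), num_nodes=n)  # every node pair connected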

    opened by immortal13 2
  • Superpixel dataset


    Thanks for sharing the code for your interesting paper. I'm interested in applying this method to the MNIST superpixel dataset (from your benchmarking work), where the task is graph classification, graphs have both node and edge features, and the number of nodes/edges differs between graphs. What modifications should I make to the current code?

    opened by jamzad 2
  • about attention


    Hello, regarding the calculation of attention: in the formula, the attention of node i is computed over its adjacent nodes, but I find that the final computation covers the attention over all nodes and does not distinguish whether nodes are connected. Is there a problem with my understanding?

    opened by wxs-newboy 2
  • Detail on softmax


    Great work!

    I have a question concerning the implementation of the softmax in graph_transformer_edge_layer.py.

    When you define the softmax, you use the following function:

    def exp(field):
        def func(edges):
            # clamp for softmax numerical stability
            return {field: torch.exp((edges.data[field].sum(-1, keepdim=True)).clamp(-5, 5))}
        return func
    

    Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector on which you then call .sum(). The graph_transformer_layer.py layer does not have this .sum() call.

    def scaled_exp(field, scale_constant):
        def func(edges):
            # clamp for softmax numerical stability
            return {field: torch.exp((edges.data[field] / scale_constant).clamp(-5, 5))}
    
        return func
    

    Would appreciate any clarification on this :)
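
    For comparison, here is how I would write an equivalent per-node softmax over scalar edge scores with DGL's built-in edge_softmax (just my sketch, not your code):

        import torch
        import dgl
        from dgl.nn.functional import edge_softmax

        g = dgl.graph(([0, 1, 2, 2], [1, 2, 0, 1]))          # toy graph: 3 nodes, 4 edges
        num_heads = 8
        scores = torch.randn(g.num_edges(), num_heads, 1)    # one scalar score per edge and head

        attn = edge_softmax(g, scores)   # normalized over the incoming edges of each destination node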

    Best, Devin

    opened by DevinKreuzer 2
  • Technical question


    Hi, thanks for the great paper :)

    I was just curious as to what the 'z' variable is on line 59 of graph_transformer_layer.py. I cannot seem to find the equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?
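
    To check my understanding, a tiny sketch of what I think is happening (my interpretation only):

        import torch

        wV = torch.randn(3, 4, 8)        # [num_nodes, num_heads, d_head]: attention-weighted sum of values
        z = torch.rand(3, 4, 1) + 1e-6   # [num_nodes, num_heads, 1]: per-node sum of unnormalized attention weights
        head_out = wV / z                # completes the softmax normalization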

    Would appreciate a little point :)

    Thanks, Devin

    opened by DevinKreuzer 2
  • Is the installation instruction still valid?


    I'm trying to set up the environment on a Mac. I followed your instructions carefully; however, I got the error below when building the environment with conda: ResolvePackageNotFound:

    • h5py=2.9.0
    • tensorboard=1.14.0
    • requests==2.22.0
    • ipython=7.7.0
    • ipykernel=5.1.2
    • notebook=6.0.0
    • pip=19.2.3
    • scikit-image=0.15.0
    • scipy=1.3.0
    • torchvision==0.7.0
    • pytorch=1.6.0
    • matplotlib=3.1.0
    • pillow==6.1
    • mkl=2019.4
    • dgl=0.6.1
    • scikit-learn=0.21.2
    • python=3.7.4
    opened by raminass 1
  • node update


    g.send_and_recv(eids, fn.src_mul_edge('V_h', 'score', 'V_h'), fn.sum('V_h', 'wV')) only updates the target nodes, and head_out = g.ndata['wV'] / g.ndata['z'] therefore also only updates the target nodes. Are the source nodes not updated?

    opened by zhouxiaozhang 0
  • Why did you divide this term?


    Hi there,

    I was reading your graphtransformer code, and I'm curious about the operation shown below. Why did you divide wV by w (the so-called 'score' term)? I didn't see any such term in equation 4 or equation 9 of the paper. Could you clarify that? https://github.com/graphdeeplearning/graphtransformer/blob/c9cd49368eed4507f9ae92a137d90a7a9d7efc3a/layers/graph_transformer_edge_layer.py#L112

    Thanks

    opened by sperfu 1
  • Memory consumption


    Could you provide some additional information about the memory consumption using your Graph Transformer?

    You state that sparse attention favors both computation time and memory consumption, but you do not provide actual measurements of the latter in your evaluation, nor do you state clearly whether and how your implementation is able to take advantage of it. Some peak-memory measurements of your experiments, as an addendum to your evaluation of the computation times (e.g. Table 1), could be beneficial to others, too. In my case, the quadratic growth of memory consumption w.r.t. the sequence length prevents an efficient use of Transformers for some tasks where connectivity information is given and can simply be modeled by masking out (-Inf) the attention scores in the attention matrix.
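
    For concreteness, the dense masking I mean looks roughly like this (my own sketch):

        import torch

        n, d = 6, 16
        Q, K = torch.randn(n, d), torch.randn(n, d)
        adj = (torch.rand(n, n) < 0.5) | torch.eye(n, dtype=torch.bool)   # random connectivity + self-loops

        scores = (Q @ K.T) / d ** 0.5                     # n x n score matrix: the memory bottleneck
        scores = scores.masked_fill(~adj, float('-inf'))  # disallow non-edges
        attn = scores.softmax(dim=-1)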

    Also, some exemplary or artificial data could be interesting, e.g. a mean number of nodes n = {128, 1024, 2048, 4096} and a mean number of edges per node e = {4, 8, 16, 64, 128}, to get an impression of the resource consumption of your Graph Transformer with a sparse graph vs. an NLP Transformer (full graph with masking).

    (I could probably run the experiments myself, but I suppose your evaluation pipeline is already set up, and data provided by the original authors would be more precise and trustworthy to other researchers, too.)

    opened by GregorKobsik 1
Owner
NTU Graph Deep Learning Lab
We investigate fundamental techniques in Graph Deep Learning, a new framework that combines graph theory and deep neural networks.