# PyGCL: Graph Contrastive Learning for PyTorch
PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL components from published papers, standardized evaluation, and experiment management.
## Prerequisites

PyGCL needs the following packages to be installed beforehand:
- Python 3.8+
- PyTorch 1.7+
- PyTorch-Geometric 1.7
- DGL 0.5+
- Scikit-learn 0.24+
## Getting Started

Take a look at the various examples in the root directory. For example, try the following command to train a simple GCN for node classification on the WikiCS dataset using the local-local contrasting mode:

```
python train_node_l2l.py --dataset WikiCS --param_path params/GRACE/[email protected] --base_model GCNConv
```
For detailed parameter settings, please refer to [email protected]. These examples are mainly for reproducing experiments in our benchmarking study. You can find more details regarding general practices of graph contrastive learning in the paper.
## Usage

### Package Overview

Our PyGCL implements four main components of graph contrastive learning algorithms:
- graph augmentation: transforms input graphs into congruent graph views.
- contrasting modes: specifies positive and negative pairs.
- contrastive objectives: computes the likelihood score for positive and negative pairs.
- negative mining strategies: improves the negative sample set by considering the relative similarity (the hardness) of negative samples.
We also implement utilities for loading datasets, training models, and running experiments.
### Building Your Own GCL Algorithms

Besides trying the above examples for node and graph classification tasks, you can also build your own graph contrastive learning algorithms straightforwardly.
#### Graph Augmentation

In `GCL.augmentors`, PyGCL provides the `Augmentor` base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:
| Augmentation | Class name |
|---|---|
| Edge Adding (EA) | `EdgeAdding` |
| Edge Removing (ER) | `EdgeRemoving` |
| Feature Masking (FM) | `FeatureMasking` |
| Feature Dropout (FD) | `FeatureDropout` |
| Personalized PageRank (PPR) | `PPRDiffusion` |
| Markov Diffusion Kernel (MDK) | `MarkovDiffusion` |
| Node Dropping (ND) | `NodeDropping` |
| Subgraphs induced by Random Walks (RWS) | `RWSampling` |
| Ego-net Sampling (ES) | `Identity` |
Calling these augmentation functions with a graph in the tuple form of node features, edge index, and edge weights `(x, edge_index, edge_weights)` produces the corresponding augmented graph.
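For example, a minimal sketch of applying a single augmentor (the `pe` drop-probability argument and the exact call signature are assumptions that may vary across PyGCL versions):

```python
import torch
from GCL.augmentors import EdgeRemoving

aug = EdgeRemoving(pe=0.3)  # assumed: pe is the probability of removing each edge

x = torch.randn(100, 32)                      # node features [num_nodes, num_features]
edge_index = torch.randint(0, 100, (2, 400))  # edges in COO format [2, num_edges]

# Augmentors are assumed callable on the (x, edge_index, edge_weights) tuple;
# edge weights are omitted here.
x_aug, edge_index_aug, edge_weights_aug = aug(x, edge_index)
```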
PyGCL also supports composing an arbitrary number of augmentations together. To compose a list of augmentation instances `augmentors`, you only need to use the right-shift operator `>>`:

```python
aug = augmentors[0]
for a in augmentors[1:]:
    aug = aug >> a
```
You can also write your own augmentation functions by defining the `augment` function.
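As an illustration, a hedged sketch of a custom augmentor; that the base class routes calls to an `augment` hook which receives and returns a graph object is an assumption about the base-class contract, so consult the `Augmentor` source for the exact interface:

```python
from GCL.augmentors import Augmentor

class NoOpAugmentor(Augmentor):
    """Illustrative custom augmentor that returns its input unchanged.

    Assumption: the base class dispatches to `augment`, which receives and
    returns a graph object carrying node features, edge index, and edge
    weights.
    """
    def __init__(self):
        super().__init__()

    def augment(self, g):
        # A real augmentor would transform node features and/or edges here.
        return g
```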
#### Contrasting Modes

PyGCL implements three contrasting modes: (a) local-local, (b) global-local, and (c) global-global modes. You can refer to the `models` folder for details. Note that the bootstrapping latent loss involves some special model design (asymmetric online/offline encoders and momentum weight updates) and thus we implement contrasting modes involving this contrastive objective in a separate `BGRL` model.
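To make the three modes concrete, here is a comment-only sketch of which embeddings get paired, written in plain PyTorch with an assumed mean readout (this is for intuition only, not PyGCL API):

```python
import torch

# Node embeddings of the same graph under two augmented views
z1, z2 = torch.randn(100, 64), torch.randn(100, 64)
# Graph-level embeddings via a simple mean readout
g1, g2 = z1.mean(dim=0), z2.mean(dim=0)

# local-local:   pair z1[i] with z2[i] (positive); z2[j], j != i, are negatives
# global-local:  pair g1 with each z2[i] of the same graph (positives);
#                node embeddings from corrupted/other graphs are negatives
# global-global: pair g1 with g2 (positive); graph embeddings of other
#                graphs in the batch are negatives
```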
#### Contrastive Objectives

In `GCL.losses`, PyGCL implements the following contrastive objectives:
| Contrastive objectives | Class name |
|---|---|
| InfoNCE loss | `InfoNCELoss` |
| Jensen-Shannon Divergence (JSD) loss | `JSDLoss` |
| Triplet Margin (TM) loss | `TripletLoss` |
| Bootstrapping Latent (BL) loss | `BootstrapLoss` |
| Barlow Twins (BT) loss | `BTLoss` |
| VICReg loss | `VICRegLoss` |
All these objectives are for contrasting positive and negative pairs at the same scale (i.e., local-local and global-global modes). For global-local modes, we offer `G2L` variants, except for the Barlow Twins and VICReg losses. Moreover, for the InfoNCE, JSD, and Triplet losses, we further provide `G2LEN` variants, primarily for node-level tasks, which involve the explicit construction of negative samples. You can find examples of their use in the root folder.
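For intuition, here is what the InfoNCE objective computes in the local-local mode, written in plain PyTorch rather than through PyGCL's loss classes:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Plain-PyTorch sketch of local-local InfoNCE (not PyGCL's implementation).

    z1, z2: [N, d] embeddings of the same N nodes under two augmented views.
    For node i, (z1[i], z2[i]) is the positive pair; every z2[j] with j != i
    serves as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                          # [N, N] scaled cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)              # positives lie on the diagonal
```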
#### Negative Mining Strategies

In `GCL.losses`, PyGCL further implements four negative mining strategies that are built upon the InfoNCE contrastive objective:
| Hard negative mining strategies | Class name |
|---|---|
| Hard negative mixing | `HardMixingLoss` |
| Conditional negative sampling | `RingLoss` |
| Debiased contrastive objective | `InfoNCELoss` (`debiased_nt_xent_loss`) |
| Hardness-biased negative sampling | `InfoNCELoss` (`hardness_nt_xent_loss`) |
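To illustrate the idea behind hardness-biased negative sampling, here is a plain-PyTorch sketch (not PyGCL's `hardness_nt_xent_loss`) that upweights negatives in proportion to their similarity to the anchor:

```python
import torch
import torch.nn.functional as F

def hardness_info_nce(z1, z2, tau: float = 0.2, beta: float = 1.0):
    """Sketch of hardness-biased InfoNCE: negatives are reweighted by
    exp(beta * similarity), so harder (more similar) negatives count more.
    With beta = 0 the weights are uniform and standard InfoNCE is recovered.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    sim = z1 @ z2.t() / tau                          # [N, N]
    neg_mask = ~torch.eye(n, dtype=torch.bool, device=sim.device)

    pos = sim.diag().exp()
    # Hardness weights over negatives, normalized to sum to N - 1 per row.
    w = (beta * sim).exp() * neg_mask
    w = w / w.sum(dim=1, keepdim=True) * (n - 1)
    neg = (w * sim.exp() * neg_mask).sum(dim=1)

    return -torch.log(pos / (pos + neg)).mean()
```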
### Utilities
PyGCL provides various utilities for data loading, model training, and experiment execution.
In `GCL.util` you can use the following utilities:

- `split_dataset`: splits the dataset into train/test/validation sets according to public or random splits. Currently, four split modes are supported: [`rand`, `ogb`, `wikics`, `preload`].
- `seed_everything`: manually sets the seed for the numpy and PyTorch environments to ensure better reproducibility.
- `SimpleParam`: provides a simple parameter configuration class to manage parameters from microsoft-nni, JSON, and YAML files.
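A hypothetical usage sketch of these utilities (the `split_mode` keyword and the returned split structure are assumptions, so check the actual signatures in `GCL.util`):

```python
from torch_geometric.datasets import Planetoid
from GCL.util import seed_everything, split_dataset

seed_everything(39788)  # fix the numpy/PyTorch seeds before anything random runs

dataset = Planetoid(root='datasets', name='Cora')

# Hypothetical call: the split_mode keyword and the returned split structure
# are assumptions, not the documented API.
splits = split_dataset(dataset, split_mode='rand')
```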
We also implement two downstream classifiers, `LR_classification` and `SVM_classification`, in `GCL.eval`, based on PyTorch and Scikit-learn respectively.
Moreover, based on PyTorch Geometric, we provide functions for loading common node and graph datasets. You can use `load_node_dataset` and `load_graph_dataset` in `utils.py`.
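For example, a hypothetical loader call (the argument names here are illustrative, not the documented signature; see `utils.py` for the actual one):

```python
from utils import load_node_dataset

# Hypothetical arguments; consult load_node_dataset in utils.py for the
# actual signature.
dataset = load_node_dataset(path='datasets', name='WikiCS')
data = dataset[0]  # a torch_geometric.data.Data object for node-level tasks
```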