Adversarial Graph Augmentation to Improve Graph Contrastive Learning

susheel suresh

Last update: Nov 19, 2022

Overview

AD-GCL: Adversarial Graph Augmentation to Improve Graph Contrastive Learning

Introduction

This repo contains the PyTorch [1] implementation of the Adversarial Graph Contrastive Learning (AD-GCL) principle, instantiated with a learnable edge-dropping augmentation. The paper is available on arXiv.
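As a rough illustration of what a "learnable edge-dropping augmentation" can look like, the sketch below turns per-edge scores into soft keep-weights using logistic noise (a Gumbel-style relaxation). This is an assumption made for illustration, not code from this repo: the function names, the clamping constant, and the use of plain Python lists instead of tensors are all hypothetical. The average keep-weight is the kind of quantity an edge-perturbation regularizer might penalize, which is again an assumption rather than a statement about the training scripts.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relaxed_edge_keep_weights(edge_logits, temperature=1.0, rng=random):
    """Map per-edge logits to soft keep-weights in (0, 1).

    Hypothetical helper: adds logistic noise to each logit and squashes
    the result, so the weights stay differentiable in a tensor setting.
    """
    weights = []
    for logit in edge_logits:
        # Clamp the uniform sample away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-4), 1.0 - 1e-4)
        noise = math.log(u) - math.log(1.0 - u)
        weights.append(_sigmoid((logit + noise) / temperature))
    return weights


def edge_keep_ratio(weights):
    """Average soft keep-weight, i.e. the expected fraction of edges kept."""
    return sum(weights) / len(weights)


if __name__ == "__main__":
    random.seed(0)
    w = relaxed_edge_keep_weights([2.0, 0.0, -2.0])
    print("keep weights:", w)
    print("keep ratio:", edge_keep_ratio(w))
```

Edges with larger logits tend to be kept and edges with smaller logits tend to be dropped, while the noise keeps the sampling stochastic.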

Requirements and Environment Setup

The code was developed and tested with Python 3.8.8 and PyTorch 1.8. Please refer to their official websites for installation and setup.

The major requirements are listed below:

numpy~=1.20.1
networkx~=2.5.1
torch~=1.8.1
tqdm~=4.60.0
scikit-learn~=0.24.1
pandas~=1.2.4
gensim~=4.0.1
scipy~=1.6.2
ogb~=1.3.1
matplotlib~=3.4.2
torch-cluster~=1.5.9
torch-geometric~=1.7.0
torch-scatter~=2.0.6
torch-sparse~=0.6.9
torch-spline-conv~=1.2.1
rdkit~=2021.03.1
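A quick way to see whether an existing environment matches these pins is sketched below. It is a minimal check, assuming plain numeric versions; `parse_version`, `is_compatible`, `check_environment`, and the subset of packages in `PINS` are hypothetical helpers for this example, and the `~=` handling only approximates the compatible-release rule.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.8.1+cu111' into (1, 8, 1); a non-numeric part ends the parse."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_compatible(installed, pinned):
    """Approximate '~=X.Y.Z': at least X.Y.Z and the same X.Y prefix."""
    have, want = parse_version(installed), parse_version(pinned)
    prefix = len(want) - 1
    return have >= want and have[:prefix] == want[:prefix]


# A few pins copied from the list above; extend as needed.
PINS = {
    "numpy": "1.20.1",
    "torch": "1.8.1",
    "torch-geometric": "1.7.0",
    "ogb": "1.3.1",
}


def check_environment(pins=PINS):
    """Return a {package: status} report for the given pins."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if is_compatible(installed, pinned):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, want ~={pinned}"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```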

Datasets

The datasets package contains the modules required for downloading and loading the TU Benchmark datasets, ZINC, and the pre-training and fine-tuning datasets used for transfer learning.

Create a folder to store all datasets using mkdir original_datasets. All datasets except the transfer learning ones are downloaded and loaded automatically by the datasets package. For transfer learning, download the chem and bio datasets from here and place them inside a newly created folder called transfer within original_datasets.

The Open Graph Benchmark datasets are downloaded and loaded using the ogb library. Please refer here for more details and installation.
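The folder layout described above can also be prepared from Python. The sketch below only creates original_datasets and original_datasets/transfer and reports whether anything has been placed in the latter; the helper names are hypothetical, and the internal layout of the downloaded chem and bio archives is not assumed.

```python
from pathlib import Path


def prepare_dataset_dirs(root="."):
    """Create original_datasets/ and original_datasets/transfer/ under root."""
    base = Path(root) / "original_datasets"
    transfer = base / "transfer"
    # parents=True also creates original_datasets; exist_ok makes reruns safe.
    transfer.mkdir(parents=True, exist_ok=True)
    return base, transfer


def transfer_data_present(transfer_dir):
    """True once at least one file or folder sits inside the transfer folder."""
    return any(Path(transfer_dir).iterdir())


if __name__ == "__main__":
    base, transfer = prepare_dataset_dirs()
    print("dataset root:", base)
    if not transfer_data_present(transfer):
        print("transfer folder is empty: download the chem and bio datasets into", transfer)
```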

AD-GCL Training

To run AD-GCL on an Open Graph Benchmark dataset, use e.g. CUDA_VISIBLE_DEVICES=0 python test_minmax_ogbg.py --dataset ogbg-molesol --reg_lambda 0.4

usage: test_minmax_ogbg.py [-h] [--dataset DATASET] [--model_lr MODEL_LR] [--view_lr VIEW_LR] [--num_gc_layers NUM_GC_LAYERS] [--pooling_type POOLING_TYPE] [--emb_dim EMB_DIM] [--mlp_edge_model_dim MLP_EDGE_MODEL_DIM] [--batch_size BATCH_SIZE] [--drop_ratio DROP_RATIO]
                           [--epochs EPOCHS] [--reg_lambda REG_LAMBDA] [--seed SEED]

AD-GCL ogbg-mol*

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     Dataset
  --model_lr MODEL_LR   Model Learning rate.
  --view_lr VIEW_LR     View Learning rate.
  --num_gc_layers NUM_GC_LAYERS
                        Number of GNN layers before pooling
  --pooling_type POOLING_TYPE
                        GNN Pooling Type Standard/Layerwise
  --emb_dim EMB_DIM     embedding dimension
  --mlp_edge_model_dim MLP_EDGE_MODEL_DIM
                        embedding dimension
  --batch_size BATCH_SIZE
                        batch size
  --drop_ratio DROP_RATIO
                        Dropout Ratio / Probability
  --epochs EPOCHS       Train Epochs
  --reg_lambda REG_LAMBDA
                        View Learner Edge Perturb Regularization Strength
  --seed SEED

Similarly, AD-GCL can be run on the ZINC and TU datasets using e.g. CUDA_VISIBLE_DEVICES=0 python test_minmax_zinc.py and CUDA_VISIBLE_DEVICES=0 python test_minmax_tu.py --dataset REDDIT-BINARY respectively. Appending --help to either command prints more details.
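When trying several values of --reg_lambda, the commands above can be generated from a small driver script. The following is a minimal sketch: `build_command` and `run` are hypothetical helpers, the lambda values are illustrative rather than recommended settings, and the script defaults to a dry run that only prints each command.

```python
import os
import subprocess
import sys


def build_command(script, dataset=None, reg_lambda=None, seed=None):
    """Assemble the argument list for one of the test_minmax_*.py scripts."""
    cmd = [sys.executable, script]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    if reg_lambda is not None:
        cmd += ["--reg_lambda", str(reg_lambda)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def run(cmd, gpu="0", dry_run=True):
    """Run cmd with CUDA_VISIBLE_DEVICES set; with dry_run it is only printed."""
    if dry_run:
        print(f"CUDA_VISIBLE_DEVICES={gpu}", " ".join(cmd))
        return 0
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(cmd, env=env, check=True).returncode


if __name__ == "__main__":
    # Illustrative values only; see the paper's appendix for actual settings.
    for lam in (0.1, 0.4, 1.0):
        run(build_command("test_minmax_ogbg.py", "ogbg-molesol", lam, seed=0))
```

Pass dry_run=False to `run` to actually launch the training processes one after another.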

Pretraining for transfer learning

usage: test_minmax_transfer_pretrain_chem.py [-h] [--dataset DATASET] [--model_lr MODEL_LR] [--view_lr VIEW_LR] [--num_gc_layers NUM_GC_LAYERS] [--pooling_type POOLING_TYPE] [--emb_dim EMB_DIM] [--mlp_edge_model_dim MLP_EDGE_MODEL_DIM] [--batch_size BATCH_SIZE]
                                             [--drop_ratio DROP_RATIO] [--epochs EPOCHS] [--reg_lambda REG_LAMBDA] [--seed SEED]

Transfer Learning AD-GCL Pretrain on ZINC 2M

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     Dataset
  --model_lr MODEL_LR   Model Learning rate.
  --view_lr VIEW_LR     View Learning rate.
  --num_gc_layers NUM_GC_LAYERS
                        Number of GNN layers before pooling
  --pooling_type POOLING_TYPE
                        GNN Pooling Type Standard/Layerwise
  --emb_dim EMB_DIM     embedding dimension
  --mlp_edge_model_dim MLP_EDGE_MODEL_DIM
                        embedding dimension
  --batch_size BATCH_SIZE
                        batch size
  --drop_ratio DROP_RATIO
                        Dropout Ratio / Probability
  --epochs EPOCHS       Train Epochs
  --reg_lambda REG_LAMBDA
                        View Learner Edge Perturb Regularization Strength
  --seed SEED

usage: test_minmax_transfer_pretrain_bio.py [-h] [--dataset DATASET] [--model_lr MODEL_LR] [--view_lr VIEW_LR] [--num_gc_layers NUM_GC_LAYERS] [--pooling_type POOLING_TYPE] [--emb_dim EMB_DIM] [--mlp_edge_model_dim MLP_EDGE_MODEL_DIM] [--batch_size BATCH_SIZE]
                                            [--drop_ratio DROP_RATIO] [--epochs EPOCHS] [--reg_lambda REG_LAMBDA] [--seed SEED]

Transfer Learning AD-GCL Pretrain on PPI-306K

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     Dataset
  --model_lr MODEL_LR   Model Learning rate.
  --view_lr VIEW_LR     View Learning rate.
  --num_gc_layers NUM_GC_LAYERS
                        Number of GNN layers before pooling
  --pooling_type POOLING_TYPE
                        GNN Pooling Type Standard/Layerwise
  --emb_dim EMB_DIM     embedding dimension
  --mlp_edge_model_dim MLP_EDGE_MODEL_DIM
                        embedding dimension
  --batch_size BATCH_SIZE
                        batch size
  --drop_ratio DROP_RATIO
                        Dropout Ratio / Probability
  --epochs EPOCHS       Train Epochs
  --reg_lambda REG_LAMBDA
                        View Learner Edge Perturb Regularization Strength
  --seed SEED

Pre-trained models are saved automatically in a folder called models_minmax. Use them to initialize the GNN when fine-tuning. More details are given below.

Fine-tuning for evaluating transfer learning

To evaluate transfer learning, fine-tune a pre-trained model on the downstream datasets:

usage: test_transfer_finetune_chem.py [-h] [--device DEVICE] [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--lr LR] [--lr_scale LR_SCALE] [--decay DECAY] [--num_layer NUM_LAYER] [--emb_dim EMB_DIM] [--dropout_ratio DROPOUT_RATIO] [--graph_pooling GRAPH_POOLING] [--JK JK]
                                      [--gnn_type GNN_TYPE] [--dataset DATASET] [--input_model_file INPUT_MODEL_FILE] [--seed SEED] [--split SPLIT] [--eval_train EVAL_TRAIN] [--num_workers NUM_WORKERS]

Finetuning Chem after pre-training of graph neural networks

optional arguments:
  -h, --help            show this help message and exit
  --device DEVICE       which gpu to use if any (default: 0)
  --batch_size BATCH_SIZE
                        input batch size for training (default: 32)
  --epochs EPOCHS       number of epochs to train (default: 100)
  --lr LR               learning rate (default: 0.001)
  --lr_scale LR_SCALE   relative learning rate for the feature extraction layer (default: 1)
  --decay DECAY         weight decay (default: 0)
  --num_layer NUM_LAYER
                        number of GNN message passing layers (default: 5).
  --emb_dim EMB_DIM     embedding dimensions (default: 300)
  --dropout_ratio DROPOUT_RATIO
                        dropout ratio (default: 0.5)
  --graph_pooling GRAPH_POOLING
                        graph level pooling (sum, mean, max, set2set, attention)
  --JK JK               how the node features across layers are combined. last, sum, max or concat
  --gnn_type GNN_TYPE
  --dataset DATASET     dataset. For now, only classification.
  --input_model_file INPUT_MODEL_FILE
                        filename to read the pretrain model (if there is any)
  --seed SEED           Seed for minibatch selection, random initialization.
  --split SPLIT         random or scaffold or random_scaffold
  --eval_train EVAL_TRAIN
                        evaluating training or not
  --num_workers NUM_WORKERS
                        number of workers for dataset loading

Similarly, for the bio dataset use python test_transfer_finetune_bio.py --help for details.
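To connect pre-training with fine-tuning, one option is to list whatever files ended up in models_minmax and build a fine-tuning command for each seed. The sketch below makes several assumptions: the helper names are hypothetical, the checkpoint file name and the dataset name bbbp are placeholders, and only flags that appear in the test_transfer_finetune_chem.py help output above are used. It prints the commands instead of running them.

```python
import sys
from pathlib import Path


def list_checkpoints(models_dir="models_minmax"):
    """Return all files under models_dir, sorted; empty if the folder is missing."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.rglob("*") if p.is_file())


def finetune_command(dataset, model_file, seed=0, split="scaffold",
                     script="test_transfer_finetune_chem.py"):
    """Build the argument list for one fine-tuning run on a chem dataset."""
    return [
        sys.executable, script,
        "--dataset", dataset,
        "--input_model_file", str(model_file),
        "--seed", str(seed),
        "--split", split,
    ]


if __name__ == "__main__":
    checkpoints = list_checkpoints()
    if not checkpoints:
        print("no files found in models_minmax; run pre-training first")
    for ckpt in checkpoints:
        for seed in range(3):
            # "bbbp" is a placeholder dataset name, not confirmed by this README.
            print(" ".join(finetune_command("bbbp", ckpt, seed=seed)))
```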

Please refer to the appendix of our paper for more details regarding hyperparameter settings.

Acknowledgements

This reference implementation is inspired by and based on earlier works [2] and [3].

Please cite our paper if you use this code in your own work.

@article{suresh2021adversarial,
  title={Adversarial Graph Augmentation to Improve Graph Contrastive Learning},
  author={Suresh, Susheel and Li, Pan and Hao, Cong and Neville, Jennifer},
  journal={arXiv preprint arXiv:2106.05819},
  year={2021}
}

References

[1] A. Paszke et al., "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems, vol. 32, pp. 8026-8037, 2019.

[2] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, "Graph Contrastive Learning with Augmentations," Advances in Neural Information Processing Systems, vol. 33, 2020.

[3] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec, "Strategies for Pre-training Graph Neural Networks," ICLR, 2020.

Comments

Question about ablation study in unsupervised learning

Hi, thanks for your great work! I have a question about the details of NAD-GCL. Section 5.1 of the paper says:

NAD-GCL drops the edges of a graph uniformly at random. We consider NAD-GCL-FIX and NAD-GCL-OPT with different edge drop ratios. NAD-GCL-FIX adopts the edge drop ratio of AD-GCL-FIX at the saddle point of the optimization (Eq.8) while NAD-GCL-OPT optimally tunes the edge drop ratio over the validation datasets to match AD-GCL-OPT.

I didn't quite understand how the edge drop ratio is defined in NAD-GCL. What's the difference between NAD-GCL and GraphCL using EdgePert? Thank you!

opened by zwb29 7
Unsupervised learning on TU dataset.

Hi! I'm trying to reproduce your AD-GCL results on the TU datasets (unsupervised learning). I can achieve nearly the same results as reported in your paper. However, when I split the training process into two steps (AD-GCL for latent vector generation, then linear classification with a linear SVC on the latent vectors), the results are poor (training: 65%, val and test: ~50%). What is the difference between these two training strategies? Thanks a lot.

opened by jerryzhang1119 4
Can adgcl be used on single graph for node-level task?

Hi, thanks for your excellent work! I see that you evaluate adgcl mainly on graph-level tasks with multiple datasets. Can adgcl also be applied to a single-graph dataset like Cora or Citeseer for node classification?

opened by scottshufe 2
How to reproduce the results of baselines on OGBG?

Hi @susheels

Thanks for your great work. I have a minor request: could you please release the code for the baselines (e.g., GraphCL) on OGBG? I think it's a bit difficult to adapt test_minmax_ogbg.py directly. It would be really helpful if you could release them. Thanks a lot!

opened by ha-lins 2
Questions about the transfer learning

Hi @susheels

Thanks for the great work. I tried to reproduce the transfer learning results of AD-GCL. Concretely, I pretrained the model on ZINC-2M for 100 epochs and fine-tuned it on the downstream tasks. However, the reproduced results are lower than the ones in the paper. Could you please help me with this? Thanks!

opened by ha-lins 2
About pretraining models

Hi! Could you please share your pretrained model files for transfer learning (bio and chem datasets)? Then I could use them for fine-tuning and better approximate the transfer learning results in your paper.

opened by HeyMercer 0
