Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

Overview

SSSNET

SSSNET: Semi-Supervised Signed Network Clustering

For details, please read our paper.

Environment Setup

Overview

The project has been tested on the following environment specification:

  1. Ubuntu 18.04.5 LTS (Other x86_64 based Linux distributions should also be fine, such as Fedora 32)
  2. Nvidia Graphic Card (NVIDIA GeForce RTX 2080 with driver version 440.36, and NVIDIA RTX 8000) and CPU (Intel Core i7-10700 CPU @ 2.90GHz)
  3. Python 3.6.13 (and Python 3.6.12)
  4. CUDA 10.2 (and CUDA 9.2)
  5. Pytorch 1.8.0 (built against CUDA 10.2) and Python 1.6.0 (built against CUDA 9.2)
  6. Other libraries and python packages (See below)

You should handle (1),(2) yourself. For (3), (4), (5) and (6), see following methods.

Installation Method 1 (Using Installation Script)

We provide two examples of environmental setup, one with CUDA 10.2 and GPU, the other with CPU.

Following steps assume you've done with (1) and (2).

  1. Install conda. Both Miniconda and Anaconda are OK.

  2. Run the following bash script under SSSNET's root directory.

./create_conda_env.sh

Installation Method 2 (.yml files)

We provide two examples of envionmental setup, one with CUDA 10.2 and GPU, the other with CPU.

Following steps assume you've done with (1) and (2).

  1. Install conda. Both Miniconda and Anaconda are OK.

  2. Create an environment and install python packages (GPU):

conda env create -f environment_GPU.yml
  1. Create an environment and install python packages (CPU):
conda env create -f environment_CPU.yml

Installation Method 3 (Manually Install)

The codebase is implemented in Python 3.6.12. package versions used for development are just below.

networkx           2.5
tqdm               4.50.2
numpy              1.19.2
pandas             1.1.4
texttable          1.6.3
latextable         0.1.1
scipy              1.5.4
argparse           1.1.0
sklearn            0.23.2
torch              1.8.1
torch-scatter      2.0.5
torch-geometric    1.6.3 (follow https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
matplotlib         3.3.4 (for generating plots and results)
SigNet         (for comparison methods, can get from the command: pip install git+https://github.com/alan-turing-institute/SigNet.git)

Execution checks

When installation is done, you could check you enviroment via:

bash setup_test.sh

Folder structure

  • ./execution/ stores files that can be executed to generate outputs. For vast number of experiments, we use GNU parallel, can be downloaded in command line and make it executable via:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 ./parallel
  • ./joblog/ stores job logs from parallel. You might need to create it by
mkdir joblog
  • ./Output/ stores raw outputs (ignored by Git) from parallel. You might need to create it by
mkdir Output
  • ./data/ stores processed data sets for node clustering.

  • ./src/ stores files to train various models, utils and metrics.

  • ./result_arrays/ stores results for different data sets. Each data set has a separate subfolder.

  • ./result_anlysis/ stores notebooks for generating result plots or tables.

  • ./logs/ stores trained models and logs, as well as predicted clusters (optional). When you are in debug mode (see below), your logs will be stored in ./debug_logs/ folder.

Options

SSSNET provides the following command line arguments, which can be viewed in the ./src/param_parser.py and ./src/link_sign_param_parser.py.

Synthetic data options:

See file ./src/param_parser.py.

  --p                     FLOAT         Probability of the existence of a link.                 Default is 0.02. 
  --eta                   FLOAT         Probability of flipping the sign of each edge.          Default is 0.1.
  --N                     INT           (Expected) Number of nodes in an SSBM.                  Default is 1000.
  --K                     INT           Number of blocks in an SSBM.                            Default is 3.
  --total_n               INT           Total number of nodes in the polarized network.         Default is 1050.
  --num_com               INT           Number of polarized communities (SSBMs).                Default is 2.

Major model options:

See file ./src/param_parser.py.

  --epochs                INT         Number of SSSNET (maximum) training epochs.               Default is 300. 
  --early_stopping        INT         Number of SSSNET early stopping epochs.                   Default is 100. 
  --num_trials            INT         Number of trials to generate results.                     Default is 10.
  --seed_ratio            FLOAT       Ratio in the training set of each cluster 
                                                        to serve as seed nodes.                 Default is 0.1.
  --loss_ratio            FLOAT       Ratio of loss_pbnc to loss_pbrc. -1 means only loss_pbnc. Default is -1.0.
  --supervised_loss_ratio FLOAT       Ratio of factor of supervised loss part to
                                      self-supervised loss part.                                Default is 50.
  --triplet_loss_ratio    FLOAT       Ratio of triplet loss to cross entropy loss in 
                                      supervised loss part.                                     Default is 0.1.
  --tau                   FLOAT       Regularization parameter when adding self-loops to the positive 
                                      part of the adjacency matrix, i.e. A -> A + tau * I,
                                      where I is the identity matrix.                           Default is 0.5.
  --hop                   INT         Number of hops to consider for the random walk.           Default is 2.
  --samples               INT         Number of samples in triplet loss.                        Default is 10000.
  --train_ratio           FLOAT       Training ratio.                                           Default is 0.8.  
  --test_ratio            FLOAT       Test ratio.                                               Default is 0.1.
  --lr                    FLOAT       Initial learning rate.                                    Default is 0.01.  
  --weight_decay          FLOAT       Weight decay (L2 loss on parameters).                     Default is 5^-4. 
  --dropout               FLOAT       Dropout rate (1 - keep probability).                      Default is 0.5.
  --hidden                INT         Number of hidden units.                                   Default is 32. 
  --seed                  INT         Random seed.                                              Default is 31.
  --no-cuda               BOOL        Disables CUDA training.                                   Default is False.
  --debug, -D             BOOL        Debug with minimal training setting, not to get results.  Default is False.
  --directed              BOOL        Directed input graph.                                     Default is False.
  --no_validation         BOOL        Whether to disable validation and early stopping
                                      during traing.                                            Default is False.
  --regenerate_data       BOOL        Whether to force creation of data splits.                 Default is False.
  --load_only             BOOL        Whether not to store generated data.                      Default is False.
  --dense                 BOOL        Whether not to use torch sparse.                          Default is False.
  -AllTrain, -All         BOOL        Whether to use all data to do gradient descent.           Default is False.
  --SavePred, -SP         BOOL        Whether to save predicted labels.                         Default is False.
  --dataset               STR         Data set to consider.                                     Default is 'SSBM/'.
  --all_methods           LST         Methods to use to generate results.                       Default is ['spectral','SSSNET'].
  --feature_options       LST         Features to use for SSSNET. 
                                      Can choose from ['A_reg','L','given','None'].            Default is ['A_reg'].

Reproduce results

First, get into the ./execution/ folder:

cd execution

To reproduce SSBM results.

bash SSBM.sh

To reproduce results on polarized SSBMs.

bash polarized.sh

To reproduce results of node clustering on real data.

bash real.sh

Note that if you are operating on CPU, you may delete the commands ``CUDA_VISIBLE_DEVICES=xx". You can also set you own number of parallel jobs, not necessarily following the j numbers in the .sh files.

You can also use CPU for training if you add ``--no-duca", or GPU if you delete this.

Direct execution with training files

First, get into the ./src/ folder:

cd src

Then, below are various options to try:

Creating an SSSNET model for SSBM of the default setting.

python ./train.py

Creating an SSSNET model for polarized SSBMs with 5000 nodes, N=500.

python ./train.py --dataset polarized --total_n 5000 --N 500

Creating a model for S&P1500 data set with some custom learning rate and epoch number.

python ./train.py --dataset SP1500 --lr 0.001 --epochs 300

Creating a model for Wiki-Rfa data set (directed) with specific number of trials and use CPU.

python ./train.py --dataset wikirfa --directed --no-cuda --num_trials 5

Note

  • When no ground-truth exists, the labels loaded for training/testing are not really meaningful. They simply provides some relative Adjusted Rand Index (ARI) for the model's predicted clustering to some fitted/dummy clustering. The codebase loads these fitted/dummy labels and prints out ARIs for completeness instead of evaluation purpose.

  • Other versions of the code. ./src/PyG_models.py provides a version of SSSNET with pytorch geometric message passing implementation, and ./src/PyG_train.py runs this model for SSSNET. Note that this implementation gives almost the same results.


You might also like...
Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

MetaSDF: Meta-learning Signed Distance Functions Project Page | Paper | Data Vincent Sitzmann*, Eric Ryan Chan*, Richard Tucker, Noah Snavely Gordon W

Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Collections for the lasted paper about multi-view clustering methods (papers, codes)

Multi-View Clustering Papers Collections for the lasted paper about multi-view clustering methods (papers, codes). There also exists some repositories

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning
UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning This is the official PyTorch implementation for UniMoCo pape

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Wanli Li and Tieyun Qian: Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction, IJCNN 2021

MRefG Wanli Li and Tieyun Qian: "Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction", IJCNN 2021 1. Requirements To reproduc

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection
Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

Comments
  • Bump numpy from 1.19.2 to 1.22.0

    Bump numpy from 1.19.2 to 1.22.0

    Bumps numpy from 1.19.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Owner
Yixuan He
DPhil in Statistics @ University of Oxford
Yixuan He
Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.

Graph Mining Author: Jiayi Chen Time: April 2021 Implemented Algorithms: Network: Scrabing Data, Network Construbtion and Network Measurement (e.g., P

Jiayi Chen 3 Mar 3, 2022
Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Yaoming Cai 5 Jul 18, 2022
A PyTorch implementation of "Signed Graph Convolutional Network" (ICDM 2018).

SGCN ⠀ A PyTorch implementation of Signed Graph Convolutional Network (ICDM 2018). Abstract Due to the fact much of today's data can be represented as

Benedek Rozemberczki 251 Nov 30, 2022
Spectral Temporal Graph Neural Network (StemGNN in short) for Multivariate Time-series Forecasting

Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting This repository is the official implementation of Spectral Temporal Gr

Microsoft 306 Dec 29, 2022
A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

SEAL ⠀⠀⠀ A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019) Abstract Node classification an

Benedek Rozemberczki 202 Dec 27, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

null 12 Oct 28, 2022
Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Datset)

Graphlevel-SSL Overview Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Dataset). It is unified framework to co

JunSeok 8 Oct 15, 2021
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021) This repository contains the code

null 149 Dec 15, 2022
Implementation for Simple Spectral Graph Convolution in ICLR 2021

Simple Spectral Graph Convolutional Overview This repo contains an example implementation of the Simple Spectral Graph Convolutional (S^2GC) model. Th

allenhaozhu 64 Dec 31, 2022
PyTorch implementation of spectral graph ConvNets, NIPS’16

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 4, 2023