A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Overview

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
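To make the ego-network decomposition concrete, here is a minimal sketch. It is not the repository's code: `split_personas` is a hypothetical helper, and it uses connected components of each ego-network (with the ego removed) as the local clustering step, a simplification; the implementation may use a community detection method instead. Each node gets one persona per cluster of its ego-network, and every original edge (u, v) becomes an edge between the persona of u that contains v and the persona of v that contains u.

```python
import networkx as nx


def split_personas(graph):
    """Sketch of ego-net splitting, using connected components as the local clustering.

    Returns (persona_graph, persona_map), where persona_map maps each persona id
    to the original node it was split from.
    """
    persona_map = {}  # persona id -> original node
    membership = {}   # (node, neighbour) -> persona of `node` whose cluster holds `neighbour`
    next_id = 0
    for node in graph.nodes():
        # Ego-network of `node` with the ego itself removed.
        ego_minus_ego = graph.subgraph(list(graph.neighbors(node)))
        for component in nx.connected_components(ego_minus_ego):
            persona_map[next_id] = node
            for neighbour in component:
                membership[(node, neighbour)] = next_id
            next_id += 1
    persona_graph = nx.Graph()
    persona_graph.add_nodes_from(persona_map)
    # Each original edge (u, v) links u's persona that sees v to v's persona that sees u.
    for u, v in graph.edges():
        persona_graph.add_edge(membership[(u, v)], membership[(v, u)])
    return persona_graph, persona_map
```

For two triangles that share node 0, node 0 is split into two personas and the persona graph falls apart into two separate triangles.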

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original TensorFlow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
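A minimal sketch of reading a file in this format and checking the 0-based indexing, assuming pandas and networkx from the requirements; `read_edge_list` and `has_contiguous_ids` are hypothetical helpers, not functions from the repository.

```python
import networkx as nx
import pandas as pd


def read_edge_list(path):
    """Read a CSV edge list whose first row is a header into an undirected graph."""
    edges = pd.read_csv(path)
    # Use the first two columns by position, so the header names do not matter.
    return nx.from_edgelist(edges.iloc[:, :2].values.tolist())


def has_contiguous_ids(graph):
    """True if the node ids are exactly 0, 1, ..., n - 1."""
    return sorted(graph.nodes()) == list(range(graph.number_of_nodes()))
```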

Outputs

The embeddings are saved in the `output/` directory. Each embedding file has a header and a column with the node IDs, and its rows are sorted by the node ID column.
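A minimal sketch for loading both outputs, assuming the embedding CSV's first column holds the IDs and the persona JSON maps each persona ID to its original node ID (an assumption, not checked against the repository's writer code); `load_outputs` and `personas_by_node` are hypothetical helpers:

```python
import json

import pandas as pd


def load_outputs(embedding_path, persona_path):
    """Load the embedding CSV and the persona JSON (file layouts assumed)."""
    embedding = pd.read_csv(embedding_path)
    # Assumption: column 0 holds the IDs and the remaining columns are the dimensions.
    ids = embedding.iloc[:, 0].astype(int).tolist()
    vectors = embedding.iloc[:, 1:].values
    with open(persona_path) as handle:
        # JSON object keys are always strings; convert keys and values back to ints.
        persona_map = {int(key): int(value) for key, value in json.load(handle).items()}
    return ids, vectors, persona_map


def personas_by_node(persona_map):
    """Invert an assumed persona -> original node map into node -> list of personas."""
    groups = {}
    for persona, node in sorted(persona_map.items()):
        groups.setdefault(node, []).append(persona)
    return groups
```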

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number-of-walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025.
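argparse turns each flag into an attribute by replacing dashes with underscores, so `--number-of-walks` is read as `args.number_of_walks` and `--walk-length` as `args.walk_length`. A hypothetical parser mirroring the documented flags and defaults, for illustration only (the repository's own argument parser may differ in details such as help text):

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; not the repository's code."""
    parser = argparse.ArgumentParser(description="Run Splitter.")
    # Input and output options.
    parser.add_argument("--edge-path", default="input/chameleon_edges.csv")
    parser.add_argument("--embedding-output-path", default="output/chameleon_embedding.csv")
    parser.add_argument("--persona-output-path", default="output/chameleon_personas.json")
    # Model options.
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--number-of-walks", type=int, default=10)
    parser.add_argument("--window-size", type=int, default=5)
    parser.add_argument("--negative-samples", type=int, default=5)
    parser.add_argument("--walk-length", type=int, default=40)
    parser.add_argument("--lambd", type=float, default=0.1)
    parser.add_argument("--dimensions", type=int, default=128)
    parser.add_argument("--workers", type=int, default=4)
    parser.add_argument("--learning-rate", type=float, default=0.025)
    return parser
```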

Examples

The following commands learn an embedding and save it with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License


Comments
  • Index Error

    I'm getting this error while running your code:

    File "C:\Users\ANJALI\environments\splitter\src\splitter.py", line 41, in <listcomp>
    persona_embedding = np.array([base_node_embedding[original_node] for node, original_node in mapping.items()])
    IndexError: index 599 is out of bounds for axis 0 with size 599
    

    The number of nodes in my graph are 599. What is the cause for this error?

    opened by anjalibhavan 9
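The message (index 599, size 599, in a graph with 599 nodes) is consistent with node IDs that run from 1 to 599 instead of 0 to 598, which the Datasets section rules out. Assuming that is the cause, here is a minimal sketch that rewrites an edge list with contiguous 0-based IDs; `reindex_edges` is a hypothetical helper, and it assumes a two-column CSV with a header row.

```python
import pandas as pd


def reindex_edges(in_path, out_path):
    """Rewrite an edge list CSV so node ids become 0, 1, ..., n - 1.

    Returns the old-id -> new-id mapping so results can be translated back.
    """
    edges = pd.read_csv(in_path)
    source, target = edges.columns[0], edges.columns[1]
    # New ids follow the sorted order of the original ids.
    old_ids = sorted(set(edges[source]).union(edges[target]))
    mapping = {old: new for new, old in enumerate(old_ids)}
    edges[source] = edges[source].map(mapping)
    edges[target] = edges[target].map(mapping)
    edges.to_csv(out_path, index=False)
    return mapping
```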
  • AttributeError: module 'networkx' has no attribute 'selfloop_edges'

    Hello, I followed the instructions to install all the required Python packages, but when I ran 'python3 src/main.py' I got the following error. Could you please let me know how I can fix it? Thank you!

    Traceback (most recent call last):
      File "src/main.py", line 24, in <module>
        main()
      File "src/main.py", line 17, in main
        graph = graph_reader(args.edge_path)
      File "/Users/machunyu/KoslickiLab/Splitter/src/utils.py", line 26, in graph_reader
        graph.remove_edges_from(nx.selfloop_edges(graph))
    AttributeError: module 'networkx' has no attribute 'selfloop_edges'

    opened by chunyuma 2
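The traceback shows `src/utils.py` calling the module-level function `nx.selfloop_edges(graph)`, which newer networkx releases provide. networkx 1.11, the version pinned under Requirements, does not appear to provide it; there, self-loops are available through the `Graph.selfloop_edges()` method. A minimal compatibility sketch, assuming only those two spellings matter; `remove_self_loops` is a hypothetical helper:

```python
import networkx as nx


def remove_self_loops(graph):
    """Remove self-loops with whichever API the installed networkx offers."""
    if hasattr(nx, "selfloop_edges"):
        # Newer networkx: module-level function returning an iterator.
        loops = list(nx.selfloop_edges(graph))
    else:
        # networkx 1.x: Graph method.
        loops = list(graph.selfloop_edges())
    # list() materialises the loops before the graph is modified.
    graph.remove_edges_from(loops)
    return graph
```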
  • an error in walker maybe?

    I don't know much about the details of this work. However, when running it I get a "Sample larger than population" error. I think in src/walker.py it should be

    def small_walk(self, start_node):
        """
        Doing a truncated random walk.
        :param start_node: Start node for random walk.
        :return walk: Truncated random walk with fixed maximal length.
        """
        walk = [start_node]
        while len(walk) < self.args.walk_length:
            if len(nx.neighbors(self.graph,walk[-1])) ==0:
                break
            walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
        return walk
    

    Instead of

    def small_walk(self, start_node):
        """
        Doing a truncated random walk.
        :param start_node: Start node for random walk.
        :return walk: Truncated random walk with fixed maximal length.
        """
        walk = [start_node]
        while len(walk) < self.args.walk_length:
            walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
            if len(nx.neighbors(self.graph,walk[-1])) ==0:
                break
        return walk
    
    opened by Nitinsiwach 2
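The proposed reordering checks for a dead end before sampling, which avoids sampling from an empty neighbour list. Note also that `len(nx.neighbors(...))` only works while `neighbors` returns a list (networkx 1.x); in networkx 2.x it returns an iterator. A minimal version-independent sketch of the same truncated walk; `truncated_walk` is a hypothetical standalone function, not the repository's `small_walk` method, and it uses `random.choice` in place of `random.sample(..., 1)[0]`, which likewise picks one neighbour uniformly.

```python
import random

import networkx as nx


def truncated_walk(graph, start_node, walk_length, rng=random):
    """Truncated random walk that stops early at nodes without neighbours."""
    walk = [start_node]
    while len(walk) < walk_length:
        # list() works whether neighbors() returns a list (1.x) or an iterator (2.x).
        neighbours = list(graph.neighbors(walk[-1]))
        if not neighbours:
            break  # dead end: sampling from an empty population would fail
        walk.append(rng.choice(neighbours))
    return walk
```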
  • `community` module not found

    Hi, I'm trying to test the Splitter embedding and ran into an issue: the `community` module, which is imported in ego_splitting.py, cannot be found. I think I've installed all the required packages.

    https://github.com/benedekrozemberczki/Splitter/blob/b5b330f90cd585614aa221c608d6fbc4d9a2a7fe/src/ego_splitting.py#L3

    opened by RemyLau 1
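The `community` import is most likely the module shipped by the `python-louvain` package (`pip install python-louvain`), which is not in the package list under Requirements. A minimal fallback sketch, assuming the code only needs a node-to-community partition like the one returned by `community.best_partition`; `louvain_partition` is a hypothetical helper, and the fallback swaps in networkx's own Louvain implementation (`nx.community.louvain_communities`), which is available only in recent networkx releases.

```python
import networkx as nx


def louvain_partition(graph, seed=42):
    """Return a node -> community id dict, preferring python-louvain if installed."""
    try:
        import community  # module name of the python-louvain package

        # `seed` is only used by the networkx fallback below.
        return community.best_partition(graph)
    except (ImportError, AttributeError):
        # AttributeError covers an unrelated module that happens to be named `community`.
        communities = nx.community.louvain_communities(graph, seed=seed)
        return {node: cid for cid, members in enumerate(communities) for node in members}
```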
  • Abnormal results of embedding

    When I run this model on different datasets (e.g. BlogCatalog), some dimensions of the vectors are greater than 1 or less than -1, for example (1.0, -7.087762355804443, -26.554523468017578, 4.721840858459473, ...).

    blogcatalog.zip

    opened by Peterecoding 1
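Skip-gram style embeddings are not constrained to [-1, 1], so coordinates such as -26.55 are not by themselves a sign of a bug; the leading 1.0 may also be the node ID column described under Outputs. If bounded vectors are needed downstream, row-wise L2 normalisation is one common option (it preserves cosine similarity). A minimal sketch; `l2_normalize` is a hypothetical helper and should be applied to the dimension columns only, not the ID column:

```python
import numpy as np


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit Euclidean length, so every coordinate lies in [-1, 1]."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return vectors / np.maximum(norms, eps)
```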
  • an unexpected keyword argument 'iter' in walker?

    Hello! I am new to this work, but when I try to run the code, the error "an unexpected keyword argument 'iter'" occurs in the function def learn_base_embedding(self) in walker.py. It seems that the parameter 'iter' in model = Word2Vec(self.paths, size=self.args.dimensions, window=self.args.window_size, min_count=1, sg=1, workers=self.args.workers, iter=1) is wrong?

    opened by shizia 0
  • IndexError: index ... is out of bounds for axis 0 with size ...

    I'm getting the same error.

    Traceback (most recent call last):
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 24, in <module>
        main()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 19, in main
        trainer.fit()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 229, in fit
        self.setup_model()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 159, in setup_model
        self.egonet_splitter.personality_map)
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in initialize_weights
        persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in <listcomp>
        persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
    IndexError: index 8637 is out of bounds for axis 0 with size 8637
    

    The dataset is sorted, its IDs start from zero, and the file has no index column and no header. Also, the error only happens on the CA-HepTh dataset and not on others, which is strange.

    opened by alirezabayatmk 0
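One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: the traceback shows the base embedding sized by the number of nodes (8637) but indexed by original node ID, so any gap in the IDs from 0 to the maximum pushes the largest ID past the end of the array. Because the Datasets section says the first row is a header, a file without a header would also have its first edge read as column names, which could drop a node. The hypothetical `diagnose_edge_file` helper checks both:

```python
import pandas as pd


def diagnose_edge_file(path):
    """Report likely causes of the IndexError for a two-column edge list CSV."""
    header = list(pd.read_csv(path, nrows=0).columns[:2])
    # A numeric "header" suggests the file has no header row, so the first
    # edge would be swallowed as column names by a header-expecting reader.
    headerless = all(str(name).strip().lstrip("-").isdigit() for name in header)
    edges = pd.read_csv(path, header=None if headerless else 0)
    ids = set(edges.iloc[:, 0]).union(edges.iloc[:, 1])
    # Gaps in 0..max(id) make the largest id >= number of nodes.
    missing = sorted(set(range(max(ids) + 1)) - ids)
    return {"headerless": headerless, "num_nodes": len(ids), "missing_ids": missing}
```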
Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.