A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Benedek Rozemberczki

Last update: Nov 9, 2022

Related tags

Deep Learning machine-learning deep-learning clustering word2vec community-detection pytorch deepwalk gensim factorization network-embedding node2vec graph-embedding overlapping-community-detection deep-neural-network graph-representation-learning node-embedding implicit-factorization graph-neural-network ego-splitting word-vector

Overview

Splitter

A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original TensorFlow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
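As a minimal sketch of the input format described above, the snippet below parses such an edge list using only the standard library. The header names `id_1` and `id_2` and the helper `read_edge_list` are assumptions made for illustration; the repository's own reader may differ.

```python
import csv
import io


def read_edge_list(handle):
    """Parse a comma-separated edge list whose first row is a header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader if row]


# Hypothetical three-edge graph; the header names are illustrative only.
sample = io.StringIO("id_1,id_2\n0,1\n0,2\n1,2\n")
edges = read_edge_list(sample)
nodes = sorted({node for edge in edges for node in edge})
print(edges)
print(nodes)
```

Because the first row is always skipped as a header, a file without a header would silently lose its first edge.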

Outputs

The embeddings are saved in the `output/` directory by default (see the output path options below). Each embedding file has a header and a column with the node IDs. The rows are sorted by the node ID column.
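A minimal sketch of reading such a file back into memory, assuming the ID column comes first and the remaining columns hold the vector. The header names `id`, `x_0` and `x_1` and the helper `load_embedding` are hypothetical; check the header of the file the script actually writes.

```python
import csv
import io


def load_embedding(handle):
    """Map each ID in the first column to its embedding vector."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {int(row[0]): [float(x) for x in row[1:]] for row in reader if row}


# Hypothetical two-dimensional embedding file; real header names may differ.
sample = io.StringIO("id,x_0,x_1\n0,0.5,-1.25\n1,2.0,0.75\n")
embedding = load_embedding(sample)
print(embedding[1])
```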

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number-of-walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025.
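The sketch below shows how options like these are typically declared with `argparse`, and how hyphenated flags become underscore attributes. It mirrors the documented names and defaults but is an illustration only, not the parser shipped in the repository; `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run Splitter.")
    parser.add_argument("--edge-path", type=str, default="input/chameleon_edges.csv")
    parser.add_argument("--embedding-output-path", type=str,
                        default="output/chameleon_embedding.csv")
    parser.add_argument("--persona-output-path", type=str,
                        default="output/chameleon_personas.json")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--number-of-walks", type=int, default=10)
    parser.add_argument("--window-size", type=int, default=5)
    parser.add_argument("--negative-samples", type=int, default=5)
    parser.add_argument("--walk-length", type=int, default=40)
    parser.add_argument("--lambd", type=float, default=0.1)
    parser.add_argument("--dimensions", type=int, default=128)
    parser.add_argument("--workers", type=int, default=4)
    parser.add_argument("--learning-rate", type=float, default=0.025)
    return parser


# Passing an explicit list avoids reading the real command line.
args = build_parser().parse_args(["--dimensions", "32", "--walk-length", "80"])
print(args.dimensions, args.walk_length, args.number_of_walks)
```

Options that are not passed keep their defaults, so `args.number_of_walks` stays at 10 here.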

Examples

The following commands learn an embedding and save it with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License

GNU License

Comments

Index Error

I'm getting this error while running your code:

File "C:\Users\ANJALI\environments\splitter\src\splitter.py", line 41, in <listcomp>
persona_embedding = np.array([base_node_embedding[original_node] for node, original_node in mapping.items()])
IndexError: index 599 is out of bounds for axis 0 with size 599

The number of nodes in my graph is 599. What is the cause of this error?

opened by anjalibhavan 9
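An index equal to the array size means that the largest node ID is one past the last valid row. That can happen when IDs start at 1 or skip a value, whereas the Datasets section asks for IDs starting at 0. This is a plausible cause rather than a confirmed diagnosis. A minimal sketch of relabelling an edge list to consecutive zero-based IDs before training, where `relabel_edges` is a hypothetical helper:

```python
def relabel_edges(edges):
    """Map arbitrary node IDs to consecutive integers starting at 0."""
    nodes = sorted({node for edge in edges for node in edge})
    mapping = {node: index for index, node in enumerate(nodes)}
    relabelled = [(mapping[u], mapping[v]) for u, v in edges]
    return relabelled, mapping


# IDs start at 1 and skip 3, so the largest ID (4) is not a valid row index
# for a graph with only three nodes.
edges = [(1, 2), (2, 4), (4, 1)]
relabelled, mapping = relabel_edges(edges)
print(relabelled)
print(mapping)
```

Keeping `mapping` around makes it possible to translate the learned embedding rows back to the original IDs.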

AttributeError: module 'networkx' has no attribute 'selfloop_edges'

Hello, I followed the instruction to install all the required python packages but when I ran 'python3 src/main.py', I got the following error. Could you please let me know how I can fix it? Thank you!

Traceback (most recent call last):
  File "src/main.py", line 24, in <module>
    main()
  File "src/main.py", line 17, in main
    graph = graph_reader(args.edge_path)
  File "/Users/machunyu/KoslickiLab/Splitter/src/utils.py", line 26, in graph_reader
    graph.remove_edges_from(nx.selfloop_edges(graph))
AttributeError: module 'networkx' has no attribute 'selfloop_edges'

opened by chunyuma 2

an error in walker maybe?

I don't know this work in much detail. However, while running it I am hitting a "sample larger than population" error. I think `small_walk` in src/walker.py should be

def small_walk(self, start_node):
    """
    Doing a truncated random walk.
    :param start_node: Start node for random walk.
    :return walk: Truncated random walk with fixed maximal length.
    """
    walk = [start_node]
    while len(walk) < self.args.walk_length:
        # Materialize the neighbors: newer networkx versions return an iterator.
        neighbors = list(nx.neighbors(self.graph, walk[-1]))
        # Stop at a dead end before sampling, so an empty list is never sampled.
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

Instead of

def small_walk(self, start_node):
        """
        Doing a truncated random walk.
        :param start_node: Start node for random walk.
        :return walk: Truncated random walk with fixed maximal length.
        """
        walk = [start_node]
        while len(walk) < self.args.walk_length:
            walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
            if len(nx.neighbors(self.graph,walk[-1])) ==0:
                break
        return walk

opened by Nitinsiwach 2
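`random.sample(population, 1)` raises `ValueError: Sample larger than population` when the population is empty, so checking for neighbors only after sampling fails as soon as a walk starts at an isolated node. The standalone sketch below shows the check-before-sample order on a plain adjacency dictionary. It is an illustration, not the repository's walker; `truncated_walk` and the toy graph are hypothetical.

```python
import random


def truncated_walk(adjacency, start_node, walk_length, rng=random):
    """Random walk that stops early when the current node has no neighbors."""
    walk = [start_node]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:  # dead end: check before sampling
            break
        walk.append(rng.choice(neighbors))
    return walk


# Hypothetical toy graph: node 2 is isolated, nodes 0 and 1 are connected.
adjacency = {0: [1], 1: [0], 2: []}
print(truncated_walk(adjacency, 2, 5))  # isolated start node: the walk is just [2]
print(truncated_walk(adjacency, 0, 5))  # alternates between 0 and 1
```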

`community` module not found

Hi, I'm trying to test the Splitter embedding and ran into an issue: the `community` module, which is imported in ego_splitting.py, cannot be found, even though I think I've installed all the required packages.

https://github.com/benedekrozemberczki/Splitter/blob/b5b330f90cd585614aa221c608d6fbc4d9a2a7fe/src/ego_splitting.py#L3

opened by RemyLau 1

Abnormal results of embedding

When I run this model on different datasets (e.g., blogcatalog), some dimensions of the vectors are greater than 1 or less than -1, for example, (1.0,-7.087762355804443,-26.554523468017578,4.721840858459473, ...)

blogcatalog.zip

opened by gangwu001 1

an unexpected keyword argument 'iter' in walker?

Hello! I am new to this work. When I try to run the code, the error "an unexpected keyword argument 'iter'" occurs in the function learn_base_embedding(self) in walkers.py. It seems that the parameter 'iter' in model = Word2Vec(self.paths, size=self.args.dimensions, window=self.args.window_size, min_count=1, sg=1, workers=self.args.workers, iter=1) is wrong?

opened by shizia 0
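gensim 4.0 renamed two `Word2Vec` constructor arguments: `size` became `vector_size` and `iter` became `epochs`. The Requirements section lists gensim 3.6.0, so this error is consistent with running a newer gensim. The sketch below picks the keyword names by version. `word2vec_kwargs` is a hypothetical helper, and it takes the version string as an argument so that it runs without gensim installed.

```python
def word2vec_kwargs(gensim_version, dimensions, window_size, workers, epochs=1):
    """Return Word2Vec keyword arguments named for the given gensim version."""
    major = int(gensim_version.split(".")[0])
    kwargs = {"window": window_size, "min_count": 1, "sg": 1, "workers": workers}
    if major >= 4:
        # gensim 4.x names
        kwargs["vector_size"] = dimensions
        kwargs["epochs"] = epochs
    else:
        # gensim 3.x names, as used in the call quoted above
        kwargs["size"] = dimensions
        kwargs["iter"] = epochs
    return kwargs


print(word2vec_kwargs("3.6.0", 128, 5, 4))
print(word2vec_kwargs("4.3.0", 128, 5, 4))
# With gensim installed, the call would look like:
# Word2Vec(paths, **word2vec_kwargs(gensim.__version__, 128, 5, 4))
```

Pinning gensim to the version listed under Requirements is the other way to avoid the mismatch.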

IndexError: index ... is out of bounds for axis 0 with size ...

I'm getting the same error.

Traceback (most recent call last):
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 24, in <module>
    main()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 19, in main
    trainer.fit()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 229, in fit
    self.setup_model()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 159, in setup_model
    self.egonet_splitter.personality_map)
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in initialize_weights
    persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in <listcomp>
    persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
IndexError: index 8637 is out of bounds for axis 0 with size 8637

The dataset is sorted, and the IDs start from zero, with no index column and no header. It only happens on the CA-HepTh dataset and not on the others, which is strange.

opened by alirezabayatmk 0

Owner

Benedek Rozemberczki

Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.

A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019).

697 Dec 27, 2022

A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

202 Dec 27, 2022

A PyTorch implementation of "Capsule Graph Neural Network" (ICLR 2019).

1.2k Jan 2, 2023

A PyTorch implementation of "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019).

329 Dec 30, 2022

A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019).

534 Dec 25, 2022

A PyTorch implementation of "Graph Wavelet Neural Network" (ICLR 2019)

490 Dec 16, 2022

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

833 Dec 28, 2022

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

677 Dec 25, 2022

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)

761 Dec 26, 2022

A PyTorch implementation of SlowFast based on ICCV 2019 paper "SlowFast Networks for Video Recognition"

8 Dec 23, 2022

Unofficial PyTorch Implementation of AHDRNet (CVPR 2019)

4 Sep 8, 2022

This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures

42 Jan 7, 2023

An implementation of "MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing" (ICML 2019).

393 Dec 13, 2022

[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"

Big Data and Multi-modal Computing Group, CRIPAC

75 Dec 30, 2022

Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

35 Nov 16, 2022

Official repository for Jia, Raghunathan, Göksel, and Liang, "Certified Robustness to Adversarial Word Substitutions" (EMNLP 2019)

38 Oct 16, 2022
