A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Benedek Rozemberczki

Last update: Nov 9, 2022

Related tags

Text Data & NLP machine-learning deep-learning clustering word2vec community-detection pytorch deepwalk gensim factorization network-embedding node2vec graph-embedding overlapping-community-detection deep-neural-network graph-representation-learning node-embedding implicit-factorization graph-neural-network ego-splitting word-vector

Overview

Splitter

A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original TensorFlow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
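As a rough illustration of the input format described above, the sketch below parses an edge list with a header row and checks that the node IDs run from 0 to n-1 without gaps. The function names `read_edge_list` and `ids_are_contiguous` and the header names `id1,id2` are assumptions made for this example; they are not taken from the repository.

```python
import csv
import io


def read_edge_list(handle):
    """Read (source, target) integer pairs from a csv with one header row."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader if row]


def ids_are_contiguous(edges):
    """Return True if the node IDs are exactly 0, 1, ..., n-1."""
    nodes = {node for edge in edges for node in edge}
    return nodes == set(range(len(nodes)))


# Hypothetical file content; the header names are placeholders.
sample = io.StringIO("id1,id2\n0,1\n1,2\n2,0\n")
edges = read_edge_list(sample)
contiguous = ids_are_contiguous(edges)
```

Running a check like this before training is a cheap way to catch edge lists whose IDs start at 1 or skip values.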

Outputs

The embeddings are saved in the `output/` directory by default. Each embedding has a header and a column with the node IDs. Finally, the node embedding is sorted by the node ID column.
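To make the described layout concrete, here is a minimal sketch that writes an embedding as a csv with a header, a node ID column, and rows sorted by node ID. The column names (`id`, `x_0`, `x_1`, ...) and the helper `write_embedding` are assumptions for illustration; the actual column names produced by the repository may differ.

```python
import csv
import io


def write_embedding(handle, embedding):
    """Write {node_id: vector} as csv rows sorted by node ID, with a header."""
    dimensions = len(next(iter(embedding.values())))
    writer = csv.writer(handle, lineterminator="\n")
    # Hypothetical header: an ID column followed by one column per dimension.
    writer.writerow(["id"] + ["x_{}".format(d) for d in range(dimensions)])
    for node_id in sorted(embedding):
        writer.writerow([node_id] + list(embedding[node_id]))


buffer = io.StringIO()
write_embedding(buffer, {2: [0.5, 0.1], 0: [0.2, 0.3], 1: [0.0, 0.9]})
output = buffer.getvalue()
```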

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number-of-walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025.
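The model options above map naturally onto an `argparse` parser. The following is a sketch of such a parser using the listed flags and defaults, with the walk-count flag spelled `--number-of-walks` as in the example commands. It is not the repository's own parser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the model options and defaults from the table."""
    parser = argparse.ArgumentParser(description="Splitter options (sketch).")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--number-of-walks", type=int, default=10)
    parser.add_argument("--window-size", type=int, default=5)
    parser.add_argument("--negative-samples", type=int, default=5)
    parser.add_argument("--walk-length", type=int, default=40)
    parser.add_argument("--lambd", type=float, default=0.1)
    parser.add_argument("--dimensions", type=int, default=128)
    parser.add_argument("--workers", type=int, default=4)
    parser.add_argument("--learning-rate", type=float, default=0.025)
    return parser


# argparse turns dashes into underscores: --walk-length becomes args.walk_length.
args = build_parser().parse_args(["--dimensions", "32", "--walk-length", "80"])
```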

Examples

The following commands learn an embedding and save it with the persona map.

Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License

GNU License

Comments

Index Error

I'm getting this error while running your code:

File "C:\Users\ANJALI\environments\splitter\src\splitter.py", line 41, in <listcomp>
persona_embedding = np.array([base_node_embedding[original_node] for node, original_node in mapping.items()])
IndexError: index 599 is out of bounds for axis 0 with size 599

The number of nodes in my graph is 599. What is the cause of this error?

opened by anjalibhavan 9
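One plausible cause, offered as a guess rather than a confirmed diagnosis: an index equal to the number of nodes is what you would see if the node IDs are not exactly 0 to n-1 (for example, IDs that start at 1 or contain gaps). The sketch below relabels arbitrary node IDs to a contiguous 0-based range before training. `relabel_edges` is a hypothetical helper, not part of the repository.

```python
def relabel_edges(edges):
    """Map arbitrary node IDs to 0..n-1, keeping the sorted order of the IDs."""
    nodes = sorted({node for edge in edges for node in edge})
    mapping = {old: new for new, old in enumerate(nodes)}
    relabeled = [(mapping[u], mapping[v]) for u, v in edges]
    return relabeled, mapping


# IDs start at 1 and skip 3, so the largest ID (4) exceeds n - 1 = 2.
edges, mapping = relabel_edges([(1, 2), (2, 4), (4, 1)])
```

Keeping the returned `mapping` makes it possible to translate the learned embedding rows back to the original node IDs.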

AttributeError: module 'networkx' has no attribute 'selfloop_edges'

Hello, I followed the instructions to install all the required Python packages, but when I ran 'python3 src/main.py' I got the following error. Could you please let me know how I can fix it? Thank you!

Traceback (most recent call last):
  File "src/main.py", line 24, in <module>
    main()
  File "src/main.py", line 17, in main
    graph = graph_reader(args.edge_path)
  File "/Users/machunyu/KoslickiLab/Splitter/src/utils.py", line 26, in graph_reader
    graph.remove_edges_from(nx.selfloop_edges(graph))
AttributeError: module 'networkx' has no attribute 'selfloop_edges'

opened by chunyuma 2
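A likely explanation, stated as an assumption: the requirements list networkx 1.11, while the module-level function `nx.selfloop_edges` is only provided by newer networkx releases. Installing a newer networkx or avoiding the call should both work. The sketch below finds self-loops from a plain iterable of edges, so it does not depend on that function; `find_self_loops` is a hypothetical helper.

```python
def find_self_loops(edges):
    """Return the edges whose two endpoints are the same node."""
    return [(u, v) for u, v in edges if u == v]


loops = find_self_loops([(0, 1), (1, 1), (2, 0), (3, 3)])
```

With a networkx graph this could be used as `graph.remove_edges_from(find_self_loops(graph.edges()))`, since `edges()` and `remove_edges_from()` are both `Graph` methods.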

an error in walker maybe?

I don't know much about this work in detail. However, while running it I am hitting a "sample larger than population" error. I think that in src/walker.py it should be

def small_walk(self, start_node):
        """
        Doing a truncated random walk.
        :param start_node: Start node for random walk.
        :return walk: Truncated random walk with fixed maximal length.
        """
        walk = [start_node]
        while len(walk) < self.args.walk_length:
            if len(nx.neighbors(self.graph,walk[-1])) ==0:
                break
            walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
        return walk

Instead of

def small_walk(self, start_node):
        """
        Doing a truncated random walk.
        :param start_node: Start node for random walk.
        :return walk: Truncated random walk with fixed maximal length.
        """
        walk = [start_node]
        while len(walk) < self.args.walk_length:
            walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
            if len(nx.neighbors(self.graph,walk[-1])) ==0:
                break
        return walk

opened by Nitinsiwach 2
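The point of the proposed change is to test for neighbors before sampling, so that a walk reaching a node without neighbors ends instead of raising "sample larger than population". Below is a minimal standalone sketch of that ordering. It uses a plain adjacency dictionary in place of the networkx graph and `random.choice` in place of `random.sample`; both are assumptions made to keep the example self-contained.

```python
import random


def truncated_walk(adjacency, start_node, walk_length, rng=random):
    """Random walk that stops early at a node with no neighbors."""
    walk = [start_node]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:  # check before sampling, not after
            break
        walk.append(rng.choice(neighbors))
    return walk


adjacency = {0: [1], 1: [2], 2: []}  # node 2 is a dead end
walk = truncated_walk(adjacency, 0, walk_length=10)
isolated = truncated_walk({0: []}, 0, walk_length=5)
```

An isolated start node is the case the original ordering cannot handle, because it samples before it ever checks.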

`community` module not found

Hi, I'm trying to test the Splitter embedding and ran into an issue: the community module, which is imported in ego_splitting.py, cannot be found. I think I've installed all the required packages.

https://github.com/benedekrozemberczki/Splitter/blob/b5b330f90cd585614aa221c608d6fbc4d9a2a7fe/src/ego_splitting.py#L3

opened by RemyLau 1

Abnormal results of embedding

When I run this model on different datasets (e.g. blogcatalog), some dimensions of the embedding vectors are greater than 1 or less than -1, for example (1.0, -7.087762355804443, -26.554523468017578, 4.721840858459473, ...)

blogcatalog.zip

opened by gangwu001 1

an unexpected keyword argument 'iter' in walker?

Hello! I am new to this work, but when I try to run the code, the error "an unexpected keyword argument 'iter'" occurs in the function learn_base_embedding(self) in walkers.py. It seems that the parameter 'iter' in model = Word2Vec(self.paths, size=self.args.dimensions, window=self.args.window_size, min_count=1, sg=1, workers=self.args.workers, iter=1) is wrong?

opened by shizia 0
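For context, and assuming the error comes from a gensim release newer than the 3.6.0 listed in the requirements: gensim 4.0 renamed the `Word2Vec` arguments `size` to `vector_size` and `iter` to `epochs`. The sketch below picks the argument names from a version string. `word2vec_kwargs` is a hypothetical helper; pinning gensim to the listed version is the other option.

```python
def word2vec_kwargs(gensim_version, dimensions, window_size, workers, epochs=1):
    """Build Word2Vec keyword arguments matching the given gensim version."""
    major = int(gensim_version.split(".")[0])
    kwargs = {"window": window_size, "min_count": 1, "sg": 1, "workers": workers}
    if major >= 4:
        # gensim 4.x names
        kwargs.update({"vector_size": dimensions, "epochs": epochs})
    else:
        # gensim 3.x names, as used in the call quoted above
        kwargs.update({"size": dimensions, "iter": epochs})
    return kwargs


new_style = word2vec_kwargs("4.3.2", dimensions=128, window_size=5, workers=4)
old_style = word2vec_kwargs("3.6.0", dimensions=128, window_size=5, workers=4)
```

In practice the version string would come from `gensim.__version__`, and the result would be passed as `Word2Vec(paths, **kwargs)`.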

IndexError: index ... is out of bounds for axis 0 with size ...

I'm getting the same error.

Traceback (most recent call last):
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 24, in <module>
    main()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 19, in main
    trainer.fit()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 229, in fit
    self.setup_model()
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 159, in setup_model
    self.egonet_splitter.personality_map)
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in initialize_weights
    persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
  File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in <listcomp>
    persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
IndexError: index 8637 is out of bounds for axis 0 with size 8637

The dataset is sorted and also IDs start from zero with no index and header. Also, it just happens on the CA-HepTh dataset and not others which is strange.

opened by alirezabayatmk 0

Owner

Benedek Rozemberczki

Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.

This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe

Advent-of-cyber-2019-writeup This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe https://tryhackme.com/shivam007/badges/c

5 Jul 17, 2022

In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

Med-VQA In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset. Two of these are made on top of Facebook AI Reasearch's Multi-Mo

8 Apr 14, 2022

Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Using this script, the 2019-2021 monthly revenues of the streamers you choose, based on the 125 GB of data leaked from Twitch

4 Nov 11, 2021

Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

1 Jan 1, 2022

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

The PyTorch-Kaldi Speech Recognition Toolkit PyTorch-Kaldi is an open-source repository for developing state-of-the-art DNN/HMM speech recognition sys

2.3k Dec 27, 2022

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

SAINT PyTorch implementation

Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

A fast and easy implementation of Transformer with PyTorch.

A PyTorch Implementation of End-to-End Models for Speech-to-Text

Pytorch implementation of Tacotron

Google AI 2018 BERT pytorch implementation

Unofficial PyTorch implementation of Google AI's VoiceFilter system

Implementation of ProteinBERT in Pytorch

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer

A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".