A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Overview

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original TensorFlow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
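
For illustration, a toy edge list in the expected format could look like the following (the header names here are only placeholders):

    id_1,id_2
    0,1
    0,2
    1,2
    2,3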

Outputs

The embeddings are saved in the `output/` directory. Each embedding has a header and a column with the node IDs. Finally, the node embedding is sorted by the node ID column.
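
For example, the saved embedding can be read back with pandas (a minimal sketch; the only assumption is that the first column holds the node IDs, the remaining column names depend on the generated header):

    import pandas as pd

    # Load the persona embedding written by src/main.py.
    embedding = pd.read_csv("output/chameleon_embedding.csv")

    # The first column is assumed to hold the node IDs; the file is already
    # sorted by this column when it is written.
    node_ids = embedding.iloc[:, 0].values
    vectors = embedding.iloc[:, 1:].values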

Options

The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number-of-walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025.

Examples

The following commands learn an embedding and save it along with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80

License


Comments
  • Index Error

    I'm getting this error while running your code:

    File "C:\Users\ANJALI\environments\splitter\src\splitter.py", line 41, in <listcomp>
    persona_embedding = np.array([base_node_embedding[original_node] for node, original_node in mapping.items()])
    IndexError: index 599 is out of bounds for axis 0 with size 599
    

    The number of nodes in my graph is 599. What is the cause of this error?

    opened by anjalibhavan 9
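
    A likely cause, given the requirement in the Datasets section that nodes are indexed starting with 0, is that the node IDs in the edge list are not consecutive integers starting at 0. A minimal sketch for relabeling an edge list to a contiguous 0..n-1 range (the file names are hypothetical):

        import pandas as pd

        edges = pd.read_csv("input/my_edges.csv")  # hypothetical input file

        # Map every distinct node ID to a consecutive integer starting at 0.
        unique_ids = sorted(set(edges.iloc[:, 0]) | set(edges.iloc[:, 1]))
        relabel = {old: new for new, old in enumerate(unique_ids)}

        edges.iloc[:, 0] = edges.iloc[:, 0].map(relabel)
        edges.iloc[:, 1] = edges.iloc[:, 1].map(relabel)
        edges.to_csv("input/my_edges_relabeled.csv", index=False)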
  • AttributeError: module 'networkx' has no attribute 'selfloop_edges'

    Hello, I followed the instructions to install all the required Python packages, but when I ran 'python3 src/main.py', I got the following error. Could you please let me know how I can fix it? Thank you!

    Traceback (most recent call last):
      File "src/main.py", line 24, in <module>
        main()
      File "src/main.py", line 17, in main
        graph = graph_reader(args.edge_path)
      File "/Users/machunyu/KoslickiLab/Splitter/src/utils.py", line 26, in graph_reader
        graph.remove_edges_from(nx.selfloop_edges(graph))
    AttributeError: module 'networkx' has no attribute 'selfloop_edges'

    opened by chunyuma 2
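
    The traceback suggests a networkx version mismatch: the code calls the module-level nx.selfloop_edges(), which only exists in newer networkx releases, while older versions such as the pinned 1.11 only provide it as a graph method. A version-agnostic workaround could look like this sketch (a stand-in helper, not the repository's code):

        import networkx as nx

        def remove_self_loops(graph):
            # networkx >= 2.x exposes nx.selfloop_edges(); 1.x used graph.selfloop_edges().
            try:
                loops = list(nx.selfloop_edges(graph))
            except AttributeError:
                loops = list(graph.selfloop_edges())
            graph.remove_edges_from(loops)
            return graph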
  • an error in walker maybe?

    I don't know much about the details of this work; however, when running it I get a "sample larger than population" error. I think in src/walker.py it should be

    def small_walk(self, start_node):
            """
            Doing a truncated random walk.
            :param start_node: Start node for random walk.
            :return walk: Truncated random walk with fixed maximal length.
            """
            walk = [start_node]
            while len(walk) < self.args.walk_length:
                if len(nx.neighbors(self.graph,walk[-1])) ==0:
                    break
                walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
            return walk
    

    Instead of

    def small_walk(self, start_node):
            """
            Doing a truncated random walk.
            :param start_node: Start node for random walk.
            :return walk: Truncated random walk with fixed maximal length.
            """
            walk = [start_node]
            while len(walk) < self.args.walk_length:
                walk = walk + [random.sample(nx.neighbors(self.graph,walk[-1]),1)[0]]
                if len(nx.neighbors(self.graph,walk[-1])) ==0:
                    break
            return walk
    
    opened by Nitinsiwach 2
  • `community` module not found

    Hi, I'm trying to test the Splitter embedding and encountered this issue of not being able to find the `community` module, which is imported in ego_splitting.py. I think I've installed all the required packages.

    https://github.com/benedekrozemberczki/Splitter/blob/b5b330f90cd585614aa221c608d6fbc4d9a2a7fe/src/ego_splitting.py#L3

    opened by RemyLau 1
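
    For what it's worth, the `community` module used for Louvain partitioning is provided by the python-louvain package on PyPI rather than by a package of the same name. A minimal import check, assuming that is the module in question:

        # pip install python-louvain
        import community
        import networkx as nx

        graph = nx.karate_club_graph()
        partition = community.best_partition(graph)  # maps node -> community id
        print(max(partition.values()) + 1, "communities detected")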
  • Abnormal results of embedding

    When I run this model on different datasets (e.g. BlogCatalog), some dimensions of the vectors are greater than 1 or less than -1, for example, (1.0,-7.087762355804443,-26.554523468017578,4.721840858459473, ...)

    blogcatalog.zip

    opened by gangwu001 1
  • an unexpected keyword argument 'iter' in walker?

    Hello! I am new to this work, but when I try to run the code, the error "unexpected keyword argument 'iter'" occurs in learn_base_embedding(self) in walkers.py. It seems that the parameter 'iter' in model = Word2Vec(self.paths, size=self.args.dimensions, window=self.args.window_size, min_count=1, sg=1, workers=self.args.workers, iter=1) is wrong?

    opened by shizia 0
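
    If a gensim release newer than the pinned 3.6.0 is installed, this is expected: gensim 4.0 renamed several Word2Vec arguments, in particular size became vector_size and iter became epochs. A minimal sketch of an equivalent call under gensim >= 4.0, with a toy corpus standing in for self.paths:

        from gensim.models import Word2Vec

        walks = [["0", "1", "2"], ["1", "2", "0"]]  # toy walk corpus standing in for self.paths

        model = Word2Vec(
            walks,
            vector_size=128,  # called `size` in gensim 3.x
            window=5,
            min_count=1,
            sg=1,
            workers=4,
            epochs=1,         # called `iter` in gensim 3.x
        )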
  • IndexError: index ... is out of bounds for axis 0 with size ...

    I'm getting the same error.

    Traceback (most recent call last):
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 24, in <module>
        main()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/main.py", line 19, in main
        trainer.fit()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 229, in fit
        self.setup_model()
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 159, in setup_model
        self.egonet_splitter.personality_map)
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in initialize_weights
        persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
      File "/home/shady/Projects/GML/SPLITTER/Splitter/src/splitter.py", line 52, in <listcomp>
        persona_embedding = np.array([base_node_embedding[n] for _, n in mapping.items()])
    IndexError: index 8637 is out of bounds for axis 0 with size 8637
    

    The dataset is sorted and the IDs start from zero, with no index column or header. Also, it only happens on the CA-HepTh dataset and not the others, which is strange.

    opened by alirezabayatmk 0
Owner
Benedek Rozemberczki
Machine Learning Engineer at AstraZeneca | PhD from The University of Edinburgh.