A Broader Picture of Random-walk Based Graph Embedding

Zexi Huang

Last update: Dec 13, 2022

Related tags

Overview

Random-walk Embedding Framework

This repository is a reference implementation of the random-walk embedding framework as described in the paper:

A Broader Picture of Random-walk Based Graph Embedding.
Zexi Huang, Arlei Silva, Ambuj Singh.
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2021.

The framework decomposes random-walk based graph embedding into three major components: random-walk process, similarity function, and embedding algorithm. By tuning the components, it not only covers many existing approaches such as DeepWalk but naturally motivates novel ones that have shown superior performance on certain downstream tasks.

Usage

Example

To use the framework with default settings to embed the BlogCatalog network:
python src/embedding.py --graph graph/blogcatalog.edges --embeddings emb/blogcatalog.embeddings
where graph/blogcatalog.edges stores the input graph and emb/blogcatalog.embeddings is the target file for output embeddings.

Options

You can check out all the available options (framework components, Markov time parameters, graph types, etc.) with:
python src/embedding.py --help

Input Graph

The supported input graph format is a list of edges:

node1_id_int node2_id_int <weight_float, optional>

where node ids are should be consecutive integers starting from 1. The graph is by default undirected and unweighted, which can be changed by setting appropriate flags.

Output Embeddings

The output embedding file has n lines where n is the number of nodes in the graph. Each line stores the learned embedding of the node with its id equal to the line number:

emb_dim1 emb_dim2 ... emb_dimd

Evaluating

Here, we show by examples how to evaluate and compare different settings of our framework on node classification, link prediction, and community detection tasks. Full evaluation options are can be found with:
python src/evaluating.py --help

Note that the results shown below may not be identical to those in the paper due to different random seeds, but the conclusions are the same.

Node Classification

Once we generate the embedding with the script in previous section, we can call
python src/evaluating.py --task node-classification --embeddings emb/blogcatalog.embeddings --training-ratio 0.5
to compute the Micro-F1 and Macro-F1 scores of the node classification.

The results for comparing Pointwise Mutual Information (PMI) and Autocovariance (AC) similarity metrics with the best Markov times and varying training ratios are as follows:

	Training Ratio	10%	20%	30%	40%	50%	60%	70%	80%	90%
PMI	Micro-F1	0.3503	0.3814	0.3993	0.4106	0.4179	0.4227	0.4255	0.4222	0.4228
(time=4)	Macro-F1	0.2212	0.2451	0.2575	0.2669	0.2713	0.2772	0.2768	0.2689	0.2678
AC	Micro-F1	0.3547	0.3697	0.3785	0.3837	0.3872	0.3906	0.3912	0.3927	0.3930
(time=5)	Macro-F1	0.2137	0.2299	0.2371	0.2406	0.2405	0.2413	0.2385	0.2356	0.2352

Link Prediction

Prepare

To evaluate the embedding method on link prediction, we first have to remove a ratio of edges in the original graph:
python src/evaluating.py --task link-prediction --mode prepare --graph graph/blogcatalog.edges --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges

This takes the original graph graph/blogcatalog.edges as input and output the removed and remaining edges to graph/blogcatalog.removed-edges and graph/blogcatalog.remaining-edges.

Embed

Then, we embed based on the remaining edges of the network with the embedding script. For example:
python src/embedding.py --graph graph/blogcatalog.remaining-edges --embeddings emb/blogcatalog.residual-embeddings

Evaluate

Finally, we evaluate the performance of link prediction in terms of precision@k based on the embeddings of the residual graph and the removed edges:
python src/evaluating.py --task link-prediction --mode evaluate --embeddings emb/blogcatalog.residual-embeddings --remaining-edges graph/blogcatalog.remaining-edges --removed-edges graph/blogcatalog.removed-edges --k 1.0

The results for comparing PMI and autocovariance similarity metrics with the best Markov times and varying k are as follows:

k	10%	20%	30%	40%	50%	60%	70%	80%	90%	100%
PMI (time=1)	0.2958	0.2380	0.2068	0.1847	0.1678	0.1560	0.1464	0.1382	0.1315	0.1260
AC (time=3)	0.4213	0.3420	0.2982	0.2667	0.2434	0.2253	0.2112	0.2000	0.1893	0.1802

Community Detection

Assume the embeddings for the Airport network emb/airport.embeddings have been generated. The following computes the Normalized Mutual Information (NMI) between the ground-truth country communities and the k-means clustering of embeddings:
python src/evaluating.py --task community-detection --embeddings emb/airport.embeddings --communities graph/airport.country-labels

Citing

If you find our framework useful, please consider citing the following paper:

@inproceedings{random-walk-embedding,
author = {Huang, Zexi and Silva, Arlei and Singh, Ambuj},
 title = {A Broader Picture of Random-walk Based Graph Embedding},
 booktitle = {SIGKDD},
 year = {2021}
}

You might also like...

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

EmbedSeg Introduction This repository hosts the version of the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

88 Dec 25, 2022

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

19 Sep 29, 2022

Simple embedding based text classifier inspired by fastText, implemented in tensorflow

FastText in Tensorflow This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of

306 Dec 2, 2022

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

11 Oct 20, 2022

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

519 Jan 2, 2023

This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

12 Oct 28, 2022

A Broader Picture of Random-walk Based Graph Embedding

Related tags

Overview

Random-walk Embedding Framework

Usage

Example

Options

Input Graph

Output Embeddings

Evaluating

Node Classification

Link Prediction

Prepare

Embed

Evaluate

Community Detection

Citing

You might also like...

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Simple embedding based text classifier inspired by fastText, implemented in tensorflow

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving

UMEC: Unified Model and Embedding Compression for Efficient Recommendation Systems

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Owner

Zexi Huang

Random-Afg - Afghanistan Random Old Idz Cloner Tools

This is a virtual picture dragging application. Users may virtually slide photos across the screen. The distance between the index and middle fingers determines the movement. Smaller distances indicate click and motion, whereas bigger distances indicate only hand movement.

use tensorflow 2.0 to tell a dog and cat from a specified picture

A small tool to joint picture including gif

This Artificial Intelligence program can take a black and white/grayscale image and generate a realistic or plausible colorized version of the same picture.

Open CV - Convert a picture to look like a cartoon sketch in python

Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020

Paddle implementation for "Highly Efficient Knowledge Graph Embedding Learning with Closed-Form Orthogonal Procrustes Analysis" (NAACL 2021)

The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding"

MGFN: Multi-Graph Fusion Networks for Urban Region Embedding was accepted by IJCAI-2022.