MG-GCN: Scalable Multi-GPU GCN Training Framework

Related tags

Deep Learning MG-GCN
Overview

MG-GCN

MG-GCN: multi-GPU GCN training framework.

For more information, please read our paper.

After cloning our repository, run git submodule update --init to download the submodules.

Our software depends on a recent CUDA installation, tested on CUDA 11.4

For parallel preprocessing, our software makes use of the parallel standard library, GCC implementation of the standard library depends on TBB. A recent version of TBB is required, the following is the most recent TBB version that is compatible for our purpose: tbb release

One can use the following command to compile and install TBB:

python3 build/build.py --prefix="<PATH-TO-INSTALL>" --install-libs --install-devel

If tbb is not found, add environment variable: export TBB_ROOT=<PATH-TO-INSTALL>

A recent version of NCCL is also required. You can follow the instructions here to compile and install it: nccl github

GCC version 9 or above is required to compile our software.

When all prerequisites are installed, one can create a build directory and compile our software as follows:

mkdir build
cd build
cmake ..
make -j

To download and preprocess the datasets used in our experiments, first change directory into test/data. Then run prep.py as follows:

cd test/data
mkdir permuted
python3 prep.py -s=0
python3 prep.py -s=1

These commands will download the reddit dataset and output them into the test/data directory. If you want to download other datasets, uncomment the corresponding lines at the end of prep.py and run our script as above. Note that this script requires an installation of dgl, ogb and some other python packages.

Finally, to run our code on the reddit dataset, use the following line from the root directory of our repository:

build/src/mg_gcn -P 4 -R 1 train test/data/permuted/reddit/ 3 128 128 128

-P is for the number of GPUS, 3 128 128 128 denotes the number of hidden layers and their dimensions.

You might also like...
Deep Learning GPU Training System

DIGITS DIGITS (the Deep Learning GPU Training System) is a webapp for training deep learning models. The currently supported frameworks are: Caffe, To

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

PyTorchMemTracer - Depict GPU memory footprint during DNN training of PyTorch
PyTorchMemTracer - Depict GPU memory footprint during DNN training of PyTorch

A Memory Tracer For PyTorch OOM is a nightmare for PyTorch users. However, most

 WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit).

Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag

A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie_recs Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Coll

Scalable training for dense retrieval models.

Scalable implementation of dense retrieval. Training on cluster By default it trains locally: PYTHONPATH=.:$PYTHONPATH python dpr_scale/main.py traine

A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Collie do

Owner
Translational Data Analytics (TDA) Lab @GaTech
This is the Github page of the Translational Data Analytics (TDA) Lab of Dr. Çatalyürek in the School of Computational Science and Engineering at GaTech.
Translational Data Analytics (TDA) Lab @GaTech
GrabGpu_py: a scripts for grab gpu when gpu is free

GrabGpu_py a scripts for grab gpu when gpu is free. WaitCondition: gpu_memory >

tianyuluan 3 Jun 18, 2022
A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019).

ClusterGCN ⠀⠀ A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019). A

Benedek Rozemberczki 697 Dec 27, 2022
Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

InfoPro-Pytorch The Information Propagation algorithm for training deep networks with local supervision. (ICLR 2021) Revisiting Locally Supervised Lea

null 78 Dec 27, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 4, 2023
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 5.7k Feb 12, 2021
An example showing how to use jax to train resnet50 on multi-node multi-GPU

jax-multi-gpu-resnet50-example This repo shows how to use jax for multi-node multi-GPU training. The example is adapted from the resnet50 example in d

Yangzihao Wang 20 Jul 4, 2022
A tensorflow implementation of GCN-LPA

GCN-LPA This repository is the implementation of GCN-LPA (arXiv): Unifying Graph Convolutional Neural Networks and Label Propagation Hongwei Wang, Jur

Hongwei Wang 83 Nov 28, 2022
A new GCN model for Point Cloud Analyse

Pytorch Implementation of PointNet and PointNet++ This repo is implementation for VA-GCN in pytorch. Classification (ModelNet10/40) Data Preparation D

null 12 Feb 2, 2022
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

null 53 Nov 22, 2022