GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs

Overview

GraphLily is the first FPGA overlay for graph processing. GraphLily supports a rich set of graph algorithms by adopting the GraphBLAS programming interface, which formulates graph algorithms as sparse linear algebra kernels. GraphLily effectively utilizes the high bandwidth of HBM to accelerate SpMV and SpMSpV, the two widely-used kernels in GraphBLAS, by co-designing the data layout and the accelerator architecture. GraphLily further builds a middleware to provide runtime support, enabling users to easily port existing GraphBLAS programs from CPUs/GPUs to FPGAs.
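To illustrate the GraphBLAS-style formulation that GraphLily accelerates, here is a minimal software sketch of level-synchronous BFS expressed as repeated SpMV in SciPy. The toy graph and the `bfs_levels` helper are ours for illustration; they are not part of GraphLily's API.

```python
import numpy as np
import scipy.sparse as sp

# Toy directed graph with edges 0->1, 0->2, 1->3, 2->3, stored in CSR
A = sp.csr_matrix(
    (np.ones(4, dtype=np.float32), ([0, 0, 1, 2], [1, 2, 3, 3])),
    shape=(4, 4),
)

def bfs_levels(A, src):
    """Level-synchronous BFS as repeated SpMV: each iteration multiplies
    the transposed adjacency matrix by the current frontier vector."""
    n = A.shape[0]
    levels = np.full(n, -1, dtype=np.int64)   # -1 means unvisited
    frontier = np.zeros(n, dtype=np.float32)
    frontier[src] = 1.0
    levels[src] = 0
    level = 0
    while frontier.any():
        level += 1
        # SpMV: candidate next frontier = vertices reachable from the frontier
        nxt = (A.T @ frontier) > 0
        nxt &= levels < 0                     # mask out visited vertices
        levels[nxt] = level
        frontier = nxt.astype(np.float32)
    return levels

print(bfs_levels(A, 0))  # [0 1 1 2]
```

A hardware implementation replaces the masked multiply with a Boolean-like semiring SpMV, but the iteration structure is the same.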

For more information, refer to our ICCAD'21 paper.

@inproceedings{hu2021graphlily,
  title={GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs},
  author={Hu, Yuwei and Du, Yixiao and Ustun, Ecenur and Zhang, Zhiru},
  booktitle={International Conference on Computer-Aided Design (ICCAD)},
  year={2021}
}

Prerequisites

  • Platform: Xilinx Alveo U280
  • Tool: Xilinx Vitis 2019.2

Run Benchmarking

Clone the repo

git clone git@github.com:cornell-zhang/GraphLily.git
export GRAPHLILY_ROOT_PATH=/path/to/GraphLily

Get the bitstream

  • A pre-compiled bitstream (166 MHz) is provided here.
  • To generate a new bitstream:
cd GraphLily/generate_bitstream
make synthesize

Prepare datasets

The input is an adjacency matrix in CSR format stored as a SciPy `.npz` file. Please install cnpy, which is required for data loading.
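As an example, a small adjacency matrix can be built in CSR form and saved to `.npz` with SciPy as sketched below. The graph and file name are illustrative; check GraphLily's data-loading code for the exact array names it expects.

```python
import numpy as np
import scipy.sparse as sp

# Build a tiny adjacency matrix in CSR format (edges 0->1, 0->2, 1->3, 2->3)
row = np.array([0, 0, 1, 2])
col = np.array([1, 2, 3, 3])
data = np.ones(len(row), dtype=np.float32)
A = sp.csr_matrix((data, (row, col)), shape=(4, 4))

# Store as an npz file; scipy writes the CSR arrays (data, indices, indptr,
# shape) as named entries, which cnpy can read back on the C++ side
sp.save_npz("toy_graph.npz", A)
```

The same script pattern applies to real datasets: load the graph, convert to CSR, and dump with `save_npz`.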

Our ICCAD'21 paper evaluated six graph datasets; see the paper for the list.

Run

Go to the GraphLily/benchmark folder, modify the cnpy path in Makefile, modify the bitstream path and the datasets path in run_bfs.sh, then:

bash run_bfs.sh
Comments
  • Deadlock happens after scaling SpMSpV to 8 HBM channels

    After fusing the three interfaces of one matrix partition (partptr, indptr, packets) into a single interface (commit: 1409b0d4085d37d975e2a64ebb1afbdbadaeb682), we are now able to scale SpMSpV to up to 32 channels.

    Deadlock happens when I benchmark an 8-channel SpMSpV. On uniform_10K_10, it passed the first iteration and ran into deadlock after several iterations; on ogbl_ppa_576K_42M, it ran into deadlock at the first iteration. With hardware emulation, there is no deadlock. I tried increasing the FIFO depth, but deadlock still happens.

    @yxd97 Please take a look at this issue later when you have time. It might be related to the shuffler. Also check the data loader and stream merger.

    opened by Huyuwei 3
  • Reproduce 165 MHz

    To reproduce the 165 MHz design in our paper, this PR makes four changes:

    1. Use a 3-D output buffer for SpMSpV instead of 2-D
    2. Set the latency of both URAM and BRAM to 4
    3. Use interleaving (not clear why it helps with timing)
    4. Set the target frequency to 200 MHz
    opened by Huyuwei 0
  • remove results_.resize in SpMSpVModule::send_results_device_to_host

    An error occurs when we call BFS::pull_push multiple times on the same dataset with different source vertices. This is due to the results_.resize function call in SpMSpVModule::send_results_device_to_host. Specifically, the following piece of code causes wrong results:

    // first call: results_ is shrunk after the device-to-host copy
    this->command_queue_.enqueueMigrateMemObjects({this->results_buf}, CL_MIGRATE_MEM_OBJECT_HOST);
    this->command_queue_.finish();
    this->results_.resize(a_small_number);
    return this->results_;
    
    // second call on the same dataset with a different source vertex
    this->command_queue_.enqueueMigrateMemObjects({this->results_buf}, CL_MIGRATE_MEM_OBJECT_HOST);
    this->command_queue_.finish();
    this->results_.resize(a_large_number);
    return this->results_;  // wrong: results_ was resized to a small number by the first call
    

    This PR fixes the bug by removing results_.resize, which turns out to be unnecessary. I don't remember why we added it; perhaps it was meant to make clear that the results vector is sparse, with a size determined at run time.

    @zzczzc20 Zhongchun, please check if this PR solves the bug you encountered and then merge it.

    opened by Huyuwei 0
Owner
Cornell Zhang Research Group