GraPE is a Rust/Python library for high-performance Graph Processing and Embedding.

Overview

images/GRAPE.jpg

GraPE

GraPE (Graph Processing and Embedding) is a fast graph processing and embedding library, designed to scale with big graphs and to run on both off-the-shelf laptop and desktop computers and High Performance Computing clusters of workstations.

The library is written in Rust and Python, and has been developed by AnacletoLAB (Dept. of Computer Science of the University of Milan), in collaboration with the RobinsonLab (Jackson Laboratory for Genomic Medicine) and the BPOP (Lawrence Berkeley National Laboratory).

GraPE is composed of two main modules: Ensmallen (ENabler of SMALL runtimE and memory Needs) and Embiggen (EMBeddInG GENerator), that run synergistically using parallel computation and efficient data structures.

Ensmallen efficiently executes graph processing operations, including large-scale first- and second-order random walks, while Embiggen leverages the large number of random walks sampled by Ensmallen to compute effective node and edge embeddings. Besides being helpful for unsupervised exploratory analysis of graphs, the computed embeddings can be used to train any of the flexible neural models for edge and node label prediction provided by Embiggen itself.

The following figure shows the main relationships between Ensmallen and Embiggen modules:

images/link_prediction_model.png

Installation of GraPE

For most computers you can just download it using pip:

pip install grape

Since Ensmallen is written in Rust, on PyPI we distribute pre-compiled packages for Windows, Linux and macOS, for Python versions 3.6, 3.7, 3.8 and 3.9 on x86_64 CPUs.

For the Linux binaries we follow Python's manylinux2010 (PEP 571) standard, which requires libc version >= 2.12. This version was released in 2010, so any Linux system from the last ten years should be compatible. To check your current libc version you can run ldd --version.
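
If you prefer to do this check from Python, the following is a minimal sketch; it assumes a glibc-based Linux system and uses the standard library's platform.libc_ver, which inspects the libc linked by the running interpreter:

from platform import libc_ver

# Report the C library name and version seen by the current interpreter.
# The pre-compiled Linux wheels require glibc >= 2.12 (manylinux2010).
name, version = libc_ver()
print(name, version)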

We also assume that the CPU supports the following features: sse, sse2, ssse3, sse4_1, sse4_2, avx, avx2, bmi1, bmi2, popcnt. If these features are not present, you cannot use the PyPI pre-compiled binaries and you have to compile Ensmallen manually (Guide). On Linux you can check whether your CPU supports them by running cat /proc/cpuinfo and ensuring that all of these features appear under the flags section. While these features are not strictly required, they significantly speed up execution and should be supported by any x86_64 CPU newer than Intel's Haswell architecture (2013).
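
If you would rather script this check, here is a minimal sketch; it is Linux-only and simply parses the flags line of /proc/cpuinfo, so that file and its usual format are assumed:

# Required CPU features for the pre-compiled PyPI binaries.
REQUIRED_FLAGS = {
    "sse", "sse2", "ssse3", "sse4_1", "sse4_2",
    "avx", "avx2", "bmi1", "bmi2", "popcnt",
}

# Collect the CPU flags advertised in /proc/cpuinfo (Linux only).
with open("/proc/cpuinfo") as cpuinfo:
    flags = {
        flag
        for line in cpuinfo
        if line.startswith("flags")
        for flag in line.split(":", 1)[1].split()
    }

missing = REQUIRED_FLAGS - flags
if missing:
    print("Missing CPU features:", ", ".join(sorted(missing)))
else:
    print("All required CPU features are available.")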

If your CPU doesn't support these features, you will get, on import, a ValueError exception with the following message:

This library was compiled assuming that SIMD instruction commonly available in CPU hardware since 2013 are present
on the machine where this library is intended to run.
On the current machine, the flags <MISSING_FLAGS> are not available.
You could still compile Ensmallen on this machine and have a version of the library that can execute here, but the
library has been extensively designed to use SIMD instructions, so you would have a version slower than the one
provided on Pypi.

These requirements were chosen to provide a good trade-off between compatibility and performance. If your system is not compatible, you can manually compile Ensmallen for any OS, libc version, and CPU architecture (such as ARM, AArch64, RISC-V, MIPS) supported by Rust and LLVM. Manually compiling Ensmallen might take more than half an hour and around 10 GB of RAM; if you encounter any error during the installation and/or compilation, feel free to open an issue here on GitHub and we will help you troubleshoot it.

Main functionalities of the library

  • Robust graph loading and automatic graph retrieval:

    • More than 13000 graphs directly available from the library for benchmarking
    • Support for multiple graph formats (see the loading sketch after this list)
    • Automatic human-readable reports of format errors
    • Automatic human-readable reports of the main graph characteristics
  • Random walks:

    • Exact and approximated first and second order random walks
    • Massive generation of sampled random walks for graph embedding
    • Automatic dispatching of 8 optimized random walk algorithms depending on the parameters of the random walk and the type (weighted/unweighted) of the graph
  • Node embedding models:

    • SkipGram
    • CBOW
    • GloVe
  • Edge and node prediction models:

    • Perceptron
    • Multi-Layer Perceptron
    • Deep Neural Networks
  • Preprocessing for node embedding and edge prediction:

    • Lazy generation of skip-grams from random walks
    • Lazy generation of balanced batches for edge prediction
    • GloVe co-occurrence matrix computation
  • Graph processing operations:

    • Optimized filtering by node, edge and components characteristics
    • Optimized algebraic set operations on graphs
    • Automatic generation of reports summarizing graph features in natural language
  • Graph algorithms:

    • Breadth and Depth-first search
    • Dijkstra, Tarjan's strongly connected component
    • Efficient Diameter computation, spanning arborescence and connected components
    • Approximated vertex cover, triad counting, transitivity, clustering coefficient and triangle counting
    • Betweenness and stress centrality, Closeness and harmonic centrality
  • Graph visualization tools: visualization of node and edge properties
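
Besides the automatically retrievable graphs, you can load your own edge lists, as mentioned in the list above. The following is a minimal sketch, assuming a pipe-separated edge list with typed edges; the Graph.from_csv parameters mirror the call shown in the comments further down this page, so adjust the column numbers and separator to your own file:

from ensmallen import Graph

# Minimal sketch: load a directed, typed edge list from a local file.
# The file name, graph name and column layout here are placeholders.
graph = Graph.from_csv(
    directed=True,
    edge_path="my_edge_list.tsv",
    edge_list_separator="|",
    edge_list_header=True,
    sources_column_number=0,
    edge_list_edge_types_column_number=1,
    destinations_column_number=2,
    name="MyGraph",
    verbose=True,
)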

Tutorials

You can find tutorials covering various aspects of the GraPE library here. All tutorials are as self-contained as possible and can be immediately executed on COLAB.

If you want to get started quickly, after installing GraPE from PyPI as described above, you can try running the following example, which uses the SkipGram embedding model on the Cora graph:

from ensmallen.datasets.linqs import Cora
from ensmallen.datasets.linqs.parse_linqs import get_words_data
from embiggen.pipelines import compute_node_embedding
from embiggen.visualizations import GraphVisualization
import matplotlib.pyplot as plt

# Download and load the graph and its node features
graph, node_features = get_words_data(Cora())

# Compute a SkipGram node embedding, using a second-order random walk sampling
node_embedding, training_history = compute_node_embedding(
    graph,
    node_embedding_method_name="SkipGram",
    # Let's increase the probability of exploring the local neighbourhood
    return_weight=2.0,
    explore_weight=0.1
)

# Visualize the obtained node embeddings
visualizer = GraphVisualization(graph, node_embedding_method_name="SkipGram")
visualizer.fit_transform_nodes(node_embedding)

visualizer.plot_node_types()
plt.show()

You can see a tutorial detailing the above script here, and you can run it on COLAB from here.

Documentation

Online documentation

The online documentation of the library is available here. Since Ensmallen is written in Rust and PyO3 (the crate we use for the Python bindings) doesn't support typing, the documentation is generated from an empty skeleton package. This lets us provide proper documentation, but you won't be able to see the source code in it.

Using the automatic method suggestions utility

To aid working with the library, GraPE provides an integrated recommender system meant to help you find a method or, if a method has been renamed for any reason, find its new name.

As an example, after having loaded the STRING Homo Sapiens graph, the function for computing the connected components can be retrieved by simply typing components as follows:

from ensmallen.datasets.string import HomoSapiens

graph = HomoSapiens()
graph.components

The code above will raise the following error, and will suggest methods with a similar or related name:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-52fac30ac7f6> in <module>()
----> 2 graph.components

AttributeError: The method 'components' does not exists, did you mean one of the following?
* 'remove_components'
* 'connected_components'
* 'strongly_connected_components'
* 'get_connected_components_number'
* 'get_total_edge_weights'
* 'get_mininum_edge_weight'
* 'get_maximum_edge_weight'
* 'get_unchecked_maximum_node_degree'
* 'get_unchecked_minimum_node_degree'
* 'get_weighted_maximum_node_degree'

In our example the method we need for computing the graph components would be connected_components.

Now the easiest way to get the method documentation is to use Python's help as follows:

help(graph.connected_components)

And the above will return:

connected_components(verbose) method of builtins.Graph instance
Compute the connected components building in parallel a spanning tree using [bader's algorithm](https://www.sciencedirect.com/science/article/abs/pii/S0743731505000882).

**This works only for undirected graphs.**

The returned quadruple contains:
- Vector of the connected component for each node.
- Number of connected components.
- Minimum connected component size.
- Maximum connected component size.

Parameters
----------
verbose: Optional[bool]
    Whether to show a loading bar or not.


Raises
-------
ValueError
    If the given graph is directed.
ValueError
    If the system configuration does not allow for the creation of the thread pool.

You can try to run the code described above on COLAB.
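
As a final usage sketch, the call below unpacks the quadruple exactly as described by the docstring above; it assumes the HomoSapiens graph loaded earlier (connected_components only works on undirected graphs):

from ensmallen.datasets.string import HomoSapiens

# Load (and, on first use, download) the STRING Homo sapiens graph.
graph = HomoSapiens()

# connected_components(verbose) returns the documented quadruple.
components, n_components, min_size, max_size = graph.connected_components(verbose=False)

print(f"{n_components} connected components")
print(f"smallest component: {min_size} nodes, largest component: {max_size} nodes")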

Cite GraPE

Please cite the following paper if it was useful for your research:

@misc{cappelletti2021grape,
  title={GraPE: fast and scalable Graph Processing and Embedding},
  author={Luca Cappelletti and Tommaso Fontana and Elena Casiraghi and Vida Ravanmehr and Tiffany J. Callahan and Marcin P. Joachimiak and Christopher J. Mungall and Peter N. Robinson and Justin Reese and Giorgio Valentini},
  year={2021},
  eprint={2110.06196},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

If you believe an additional example would be helpful, feel free to open a GitHub issue describing what is missing from this tutorial.

Comments
  • TransE error: "ValueError: One of the provided node embedding computed with the TransE method contains NaN values."

    When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:

    ValueError                                Traceback (most recent call last)
    <ipython-input> in <module>
    ----> 1 embedding = model.fit_transform(kg)

    ~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
        595         if not cache_enabled:
        596             self.logger.info("The cache is disabled")
    --> 597         result = function(*args, **kwargs)
        598         self._check_return_type_compatability(result, self.cache_path)
        599         return result

    ~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
        164             graph=graph,
        165             return_dataframe=return_dataframe,
    --> 166             verbose=verbose
        167         )
        168

    ~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
        112             embedding_method_name=self.model_name(),
        113             node_embeddings=node_embedding,
    --> 114             edge_type_embeddings=edge_type_embedding,
        115         )
        116

    ~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in __init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
         76         if np.isnan(numpy_embedding).any():
         77             raise ValueError(
    ---> 78                 f"One of the provided {embedding_list_name} "
         79                 f"computed with the {embedding_method_name} method "
         80                 "contains NaN values."
    ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

    I am attaching a jupyter notebook to reproduce the problem. load_graph_and.ipynb.zip

    The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz

    opened by realmarcin 7
  • Need documentation on how to use a knowledge graph in grape

    Hello, I have another question on how to import my data in grape. I think it is more a clarification on my method to import my KG.

    kg = Graph.from_csv(
        directed=True,
        edge_path="sample_mabkg.tsv",
        edge_list_separator="|",
        edge_list_header=True,
        sources_column_number=0,
        edge_list_edge_types_column_number=1,
        destinations_column_number=2,
        name="mAbKG",
        verbose=True,
    )
    

    but I saw that node_path and other parameters similar to edge_path also exist, so I don't know whether my from_csv call is correct. Can you please give me some explanation, knowing that I have a KG (with typed edges and nodes)? Below is an example of my data.

    Thank you for your answer

    Gaoussou

     node source|edge|node destination
    _:B4dff5e7d17225b25b13ad12737e49779|imgt:isDecidedBy|imgt:EC
    pubmed:2843774|dc:title|Selective killing of HIV-infected cells by recombinant human CD4-Pseudomonas exotoxin hybrid protein.
    imgt:Product_8e9250cf-276a-3282-954f-3791316ac5a6|rdf:type|obo:NCIT_C51980
    imgt:Segment_212_1|obo:BFO_0000050|imgt:Construct_212
    imgt:IgG4-kappa_1001|rdfs:label|IgG4-kappa_1001
    imgt:V-D-GENE|owl:sameAs|obo:SO_0000510
    imgt:Segment_536_1|rdf:type|imgt:Segment
    imgt:LRR13|rdf:type|imgt:RepeatLabel
    imgt:StudyProduct_c2bc9b3a-a15e-376f-bda5-f87089b3f54b|imgt:application_type|Therapeutic
    imgt:StudyProduct_54a14ca8-f916-338b-af18-d079beb598a4|imgt:development_technology|  Dyax human antibody phage display library 
    

    sample_mabkg.txt

    opened by gsanou 6
  • embiggen package error under Windoze

    The joy of installation on Windoze...

    Collecting embiggen>=0.11.9
      Downloading embiggen-0.11.38.tar.gz (154 kB)
         ---------------------------------------- 154.2/154.2 kB ? eta 0:00:00
      Preparing metadata (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [10 lines of output]
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 54, in <module>
              long_description=readme(),
            File "C:\cygwin64\tmp\pip-install-37lyy1_b\embiggen_3ec9ca91df6044b1b2470bb84cb6184d\setup.py", line 12, in readme
              return f.read()
            File "C:\Users\richa\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
              return codecs.charmap_decode(input,self.errors,decoding_table)[0]
          UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 2: character maps to <undefined>
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed
    
    × Encountered error while generating package metadata.
    ╰─> See above for output.
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.
    
    
    opened by RichardBruskiewich 6
  • Bipartite Graph predict proba with undirected graph

    Hi. I noticed the performance metrics are not identical when using predict_proba_bipartite_graph_from_edge_node_types, when I swap the source and destination nodes. The graph used as input is an undirected graph, which I would expect would yield similar predictions for the same edge type regardless of which is source and destination nodes. Is this behavior intentional?

    Below are the version of the software I am running currently: grape==0.1.17 embiggen==0.11.27 ensmallen==0.8.14

    opened by arpelletier 6
  • ImportError: libgfortran-ed201abd.so.3.0.0: cannot open shared object file: No such file or directory

    In a fresh notebook, attempting to import grape yields an ImportError about a missing libgfortran-ed201abd.so.3.0.0.

    >>> !pip install grape -U
    >>> import grape
    /usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (3.0.4) doesn't match a supported version!
      warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    /home/harry/kg-bioportal/data/merged/KG-Bioportal analysis.ipynb Cell 2' in <cell line: 1>()
    ----> [1](vscode-notebook-cell://wsl%2Bubuntu-20.04/home/harry/kg-bioportal/data/merged/KG-Bioportal%20analysis.ipynb#ch0000001vscode-remote?line=0) import grape
    
    File ~/.local/lib/python3.8/site-packages/grape/__init__.py:9, in <module>
          1 """GraPE main module.
          2 
          3 For now, this is a simple wrapper of GraPE main two sub-modules that for
       (...)
          6 These packages are mimed here by the two sub-directories, ensmallen and embiggen.
          7 """
    ----> 9 from embiggen import *
         10 from ensmallen import Graph
         13 def import_all(module_locals):
    
    File ~/.local/lib/python3.8/site-packages/embiggen/__init__.py:2, in <module>
          1 """Module with models for graph machine learning and visualization."""
    ----> 2 from embiggen.visualizations import GraphVisualizer
          3 from embiggen.utils import (
          4     EmbeddingResult,
          5     get_models_dataframe,
       (...)
          9     get_available_models_for_node_embedding,
         10 )
    ...
        691     'spherical_kn',
        692 ]
        694 from scipy._lib._testutils import PytestTester
    
    ImportError: libgfortran-ed201abd.so.3.0.0: cannot open shared object file: No such file or directory
    

    I've seen that this may be related to libraries packaged with numpy, as seen in the following: https://github.com/ContinuumIO/anaconda-issues/issues/445 https://github.com/numpy/numpy/issues/14348

    This may be environment-specific, of course.

    opened by caufieldjh 6
  • `Illegal instruction (core dumped)` on importing grape

    In another issue that may have something to do with our aging build server: When we import grape in this environment (see info below), we get only Illegal instruction (core dumped).

    cpuinfo output:

    processor       : 23
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 44
    model name      : Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
    stepping        : 2
    microcode       : 0x1f
    cpu MHz         : 1599.987
    cache size      : 12288 KB
    physical id     : 1
    siblings        : 12
    core id         : 10
    cpu cores       : 6
    apicid          : 53
    initial apicid  : 53
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
    bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
    bogomips        : 6133.21
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual
    power management:
    
    opened by caufieldjh 4
  • Link to the two sub packages

    Hi, first of all, thanks for making such an amazing graph embedding resource!

    I'm wondering whether you can add some descriptions in the README clarifying that this repo is a thin wrapper of the two core packages embiggen and ensmallen and add links accordingly. I was a bit confused for a few minutes trying to find the source code and only came to realize it wraps the two libraries after looking at __init__.py.

    opened by RemyLau 4
  • pip install grape failure on support_luca>=1.0.2

    I am attempting to install grape using pip on Ubuntu 20.04.4 LTS with python 3.8.3.

    Most of the build/install appears to work just fine until I hit this error; I'm providing a little additional context below. I have also tried to install ensmallen directly with pip install ensmallen and I get the same error. Any advice you have would be appreciated.

    Requirement already satisfied: idna<3,>=2.5 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2.10)
    Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (1.25.9)
    Requirement already satisfied: certifi>=2017.4.17 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (2020.6.20)
    Requirement already satisfied: chardet<4,>=3.0.2 in /home/corey/anaconda3/lib/python3.8/site-packages (from requests->bioregistry>=0.5.65->ensmallen>=0.8.21->grape) (3.0.4)
    Collecting typing-extensions>=3.7.4.3
      Using cached typing_extensions-4.3.0-py3-none-any.whl (25 kB)
    ERROR: Could not find a version that satisfies the requirement support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape) (from versions: none)
    ERROR: No matching distribution found for support_luca>=1.0.2 (from dict_hash>=1.1.25->cache_decorator>=2.1.11->ensmallen>=0.8.21->grape)
    
    opened by amc-corey-cox 4
  • Graph visualization error

    Hello. I am trying the Using CBOW to embed Cora python notebook (linked) and after replacing "CBOWEnsmallen" with "DeepWalkCBOWEnsmallen", the first order embedding runs successfully but fails at the graph visualization. I get the following error:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    /tmp/ipykernel_3453/3695275499.py in <module>
    ----> 1 GraphVisualizer(
          2     graph,
          3     node_embedding_method_name="CBOW - First order"
          4 ).fit_and_plot_all(first_embedding)
    
    ~/anaconda3/lib/python3.9/site-packages/embiggen/visualizations/graph_visualizer.py in fit_and_plot_all(self, node_embedding, number_of_columns, show_letters, include_distribution_plots, **node_embedding_kwargs)
       4236         distribution_plot_methods_to_call = []
       4237 
    -> 4238         if not self._graph.has_constant_non_zero_node_degrees():
       4239             node_scatter_plot_methods_to_call.append(
       4240                 self.plot_node_degrees,
    
    AttributeError: The method 'has_constant_non_zero_node_degrees' does not exists, did you mean one of the following?
    * 'has_constant_edge_weights'
    * 'get_non_zero_subgraph_node_degrees'
    * 'has_nodes'
    * 'has_edges'
    * 'has_selfloops'
    * 'has_node_ontologies'
    * 'has_node_oddities'
    * 'get_node_degrees'
    * 'has_node_name'
    * 'has_node_types'
    

    Looks like the issue has to do with embiggen dependencies in the graph visualization. Below are the package versions I am using: embiggen==0.11.13 ensmallen==0.8.7 grape==0.1.9

    As well, I was not able to successfully run the second-order embeddings

    model = DeepWalkCBOWEnsmallen(
        return_weight=2.0,
        explore_weight=0.1
    )
    second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)
    

    The above code gives the below error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /tmp/ipykernel_3453/3112314827.py in <module>
    ----> 1 model = DeepWalkCBOWEnsmallen(
          2     return_weight=2.0,
          3     explore_weight=0.1
          4 )
          5 second_embedding = model.fit_transform(graph).get_node_embedding_from_index(0)
    
    TypeError: __init__() got an unexpected keyword argument 'return_weight'
    opened by arpelletier 4
  • Embedding model names not recognized; alternate suggestions are unexpected

    As of grape 0.1.9, node embedding model names have changed, such that a call to embiggen's AbstractModel.get_task_data(model_name, task_name) with one of the frequently used model names like CBOW or SkipGram throws a ValueError.

    I see from grape.get_available_models_for_node_embedding() that these now have more specific names like Node2Vec CBOW. No problem with being specific, but we'd still like to be able to specify CBOW, SkipGram, or GloVe in config definitions without having to verify the exact model names embiggen is expecting first. Could we use the short names as aliases to a default model, like CBOW will be understood as Node2Vec CBOW, etc?

    The naming convention also appears to confuse the alternative suggestions provided in the ValueError text, so we get suggestions like this:

    ValueError: The provided model name `CBOW` is not available. Did you mean BoxE?
    
    opened by caufieldjh 4
  • ValueError when trying to use external embedder like in pykeen and karateClub

    Hello, thank you for your amazing work. I'm a PhD student working on embeddings of biomedical data, particularly in immunogenetics, and I'm currently comparing tools to embed data. I found your work very interesting. I got some issues when I tried to use external models from PyKEEN and KarateClub; I got this message:
    ValueError: We have found an useless method in the class StubClass, implementing method HolE from library PyKEEN and task Node Embedding. It does not make sense to implement the `requires_positive_edge_weights` method when the `can_use_edge_weights` always returns False, as it is already handled in the root abstract model class.

    Also, for the visualization, when I did

    from grape import GraphVisualizer
    visualizer = GraphVisualizer(kg.remove_disconnected_nodes())
    visualizer.fit_and_plot_all(embedding)

    I got this warning and no visualization: FutureWarning: The parameter `square_distances` has not effect and will be removed in version 1.3.
    Thank you in advance for your answer
    Gaoussou
    opened by gsanou 3
  • Use case regarding Customer Analytics or Community detection?

    Thanks for that repo. It seems that you have integrated several tools / libraries / approaches under Grape's hood. Do you intend to create a tutorial for a customer analytics recommendation?

    Thanks in advance.

    opened by stkarlos 2
  • Parallelized Embedding

    Hey, I'm trying to process a directed graph with about 5 million nodes and 100 million edges. I've managed to load the graph from a csv file and I get a very nice Graph object (within 5 minutes). I'm now trying to embed the graph with grape.embedders.Node2VecSkipGramEnsmallen, but it doesn't seem to succeed; I've let it run for over 10 hours. In order to make it faster, I did enable the Graph's vector_source, vector_cumulative_node_degree and vector_reciprocal_sqrt_degrees. Reading your paper, it seems that the embedding process could be parallelized, but I can't find the way to do that. I'd appreciate it if you could describe which parts of the embedding process are parallelized, and how I can make it run in parallel. Thank you, Bruria.

    opened by bruriah1999 2
  • Getting figure to be inline

    matplotlib plots figures inline by default or if we write

    %matplotlib inline
    

    Some of the figures produced by GRAPE get put into "subwindows" in the Jupyter notebook, and one needs to scroll up and down to see the entire figure. GRAPE does not seem to be responsive to the inline magic command above either.

    For instance, in order for a certain figure to really appear inline, I need to make it much smaller:

    visualizer = GraphVisualizer(sli_graph, automatically_display_on_notebooks=False)
    fig, ax, cap = visualizer.plot_node_degree_distribution()
    fig.set_figheight(3)
    fig.set_figwidth(3)
    

    even though the notebook could comfortably show (5,5) or even (8,8)

    opened by pnrobinson 2
  • Saving classifier models

    Could support for saving classifier models please be added? This came up while meeting with @LucaCappelletti94 recently but it's become relevant again in the course of updating neat-ml to use grape classifiers.

    Training classifiers isn't a major time commitment, but on our neat runs we've separated the process of training+testing vs. applying classifiers, so being unable to save or at least pickle the classifier object means we need to redo training for each model.

    opened by caufieldjh 4
  • Methods for generating node embeddings from word embeddings

    While updating NEAT to use the most recent grape release, @justaddcoffee and @hrshdhgd and I took a look at what we're using to generate node embeddings based on pretrained word embeddings like BERT etc. : https://github.com/Knowledge-Graph-Hub/NEAT/blob/main/neat/graph_embedding/graph_embedding.py

    We know we can run something like get_okapi_tfidf_weighted_textual_embedding() on a graph, but is there a more "on demand" way to run this in grape now for an arbitrary graph?

    opened by caufieldjh 10
Releases
0.0.6.dev1

Owner
AnacletoLab (Computational Biology and Bioinformatics Lab, Dept. of Computer Science, UNIMI)