DeepGNN is a framework for training machine learning models on large scale graph data.

Overview

DeepGNN is a framework for training machine learning models on large-scale graph data. It contains all the necessary features, including:

  • Distributed GNN training and inference on both CPU and GPU.
  • Custom graph neural network design.
  • Online sampling: the Graph Engine (GE) loads all graph data, and each training worker calls GE to fetch node/edge/neighbor features and labels (a sketch follows the list).
  • Automatic graph partitioning.
  • High performance and scalability.
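
A minimal sketch of the online-sampling flow, assuming the snark in-memory client; the module path and method signatures (MemoryGraph, node_features) are assumptions and may differ between DeepGNN versions:

    import numpy as np
    from deepgnn.graph_engine.snark.client import MemoryGraph  # assumed API

    # Load a converted graph (e.g. the Cora data produced by the example scripts).
    graph = MemoryGraph("/tmp/cora", partitions=[0])
    nodes = np.array([0, 1, 2], dtype=np.int64)
    # Fetch feature id 0 with dimension 1433 (Cora-sized features) for each node.
    feats = graph.node_features(nodes, np.array([[0, 1433]], dtype=np.int32), np.float32)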

The project is in alpha; there may be breaking changes in the future, and they will be documented in the changelog.

Usage

Install the pip package:

python -m pip install deepgnn-torch

If you want to build the package from source, see the instructions in CONTRIBUTING.md.

Train and evaluate a GraphSAGE model with PyTorch on the Cora dataset:

cd examples/pytorch/graphsage
./run.sh

Training other models

The examples folder contains various models one can experiment with in DeepGNN. To train models with TensorFlow you need to install the deepgnn-tf package; the deepgnn-torch package contains everything needed to train the PyTorch examples. Each model folder contains a shell script run.sh that trains the corresponding model on a toy graph, and a README.md with a short description of the model, a reference to the original paper, and an explanation of the command line arguments.

Comments
  • make the logger customized with set logger function

    We have two options for supporting a customized logger:

    1. The client makes changes to LOGGING.
    2. Create a new set_logger function that allows the client to pass in a customized logger.

    In this PR, we implement the second option (a sketch follows).
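
    A hypothetical sketch of option 2 (the names _logger, set_logger, and get_logger are illustrative, not DeepGNN's actual API):

        import logging

        # Module-level logger that the rest of the package uses.
        _logger = logging.getLogger("deepgnn")

        def set_logger(logger: logging.Logger) -> None:
            """Replace the package logger with a client-provided one."""
            global _logger
            _logger = logger

        def get_logger() -> logging.Logger:
            """Return the currently configured logger."""
            return _logger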

    opened by DoWhatILove 14
  • Bump numpy from 1.21.0 to 1.22.0

    Bumps numpy from 1.21.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
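
    A quick illustration of the removal (the exact error message text may vary):

        import numpy as np

        np.dtype("Uint64")  # raises TypeError under NumPy >= 1.22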

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)
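
    A minimal sketch of the recommended replacements ("data.csv" is a hypothetical file):

        import numpy as np

        # Instead of the removed mafromtxt / ndfromtxt:
        masked = np.genfromtxt("data.csv", delimiter=",", usemask=True)
        plain = np.genfromtxt("data.csv", delimiter=",", usemask=False)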

    ... (truncated)

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 5
  • Fix disconnected computational graph

    Presently, neigh_nodes are sampled twice (once on Line 74 and a second time when query_func is invoked on Line 79); this results in a disconnected computational graph.
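
    A toy illustration of the root cause (not DeepGNN's code): two independent draws from a random sampler rarely agree, so tensors built from each draw cannot line up.

        import numpy as np

        rng = np.random.default_rng()
        neighbors = np.arange(100)
        first = rng.choice(neighbors, size=10)   # sample used to build the batch
        second = rng.choice(neighbors, size=10)  # accidental second sample in query_func
        print(np.array_equal(first, second))     # almost surely False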

    Before this change:

    >>> numpy.array_equal(context['encoder']['node_feats']['neighbor_feats'], context['encoder']['neighbor_feats']['node_feats'])
    False
    

    After this change:

    >>> numpy.array_equal(context['encoder']['node_feats']['neighbor_feats'], context['encoder']['neighbor_feats']['node_feats'])
    True
    
    opened by Swapnil-Gandhi 4
  • Add Linear format.

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Documentation is added or updated to reflect new code in the same format as the rest of the repo.
    • [x] PR is labeled using the label menu on the right side.

    New Behavior

    • Add a new converter input format called "Linear", with nodes and edges on separate lines.
    • Built-in datasets now use Linear for speed gains.
    • Added to unit tests: convert_tests, e2e_tests and sparse_features_test.

    Notes

    • There are no checks that the file is sorted properly.
    • Linear is similar to the TSV format, but different enough that the two can't really be merged (in TSV, neighbor edges share a line with their node, and the feature encoding differs).
    enhancement 
    opened by coledie 3
  • [1/2] Add example of migrate data.

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    New Behavior

    Add an example of what the PyTorch examples will look like after the migrate script.

    enhancement 
    opened by coledie 2
  • Bump setuptools from 59.6.0 to 65.5.1 in /docs

    Bumps setuptools from 59.6.0 to 65.5.1.

    Release notes

    Sourced from setuptools's releases.

    v65.5.1 through v63.4.2

    No release notes provided for these versions.

    ... (truncated)

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc

    • #3638: Drop a test dependency on the mock package, always use unittest.mock -- by hroncok
    • #3659: Fixed ReDoS vector in package_index.

    v65.5.0

    Changes

    • #3624: Fixed editable install for multi-module/no-package src-layout projects.
    • #3626: Minor refactorings to support distutils using stdlib logging module.

    Documentation changes

    • #3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

    Misc

    • #3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).
    • #3576: Updated version of validate_pyproject.

    v65.4.1

    Misc

    • #3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.
    • #3617: Merge with pypa/distutils@6852b20 including fix for pypa/distutils#181.

    v65.4.0

    Changes

    v65.3.0

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 2
  • AttributeError: Can't pickle local object 'CDLL.__init__.<locals>._FuncPtr'

    • [ ] Issue is labeled using the label menu on the right side.

    Environment

    • Python version: (python -V) 3.7.7
    • deepgnn-ge Version: (python -m pip show deepgnn-ge) 0.1.55.1
    • deepgnn-torch Version: (python -m pip show deepgnn-torch) 0.1.55.1
    • deepgnn-tf Version: (python -m pip show deepgnn-tf) not installed
    • OS: (Windows, Linux, ...) Windows 10 Enterprise

    Issue Details

    • What you did - code sample or commands run

    I installed deepgnn-torch via pip in a virtual environment. Then I cloned the deepgnn repository, changed into examples/pytorch/gat/, and ran bash run.sh.

    • Expected behavior

    I expected the training script to run without issues.

    • Actual behavior

    I see the error File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) AttributeError: Can't pickle local object 'CDLL.__init__.<locals>._FuncPtr'

    Full stack trace:

    $ bash run.sh
    + DEVICE=cpu
    ++ dirname run.sh
    + DIR_NAME=.
    + GRAPH=/tmp/cora
    + python -m deepgnn.graph_engine.data.citation --data_dir /tmp/cora
    c:\users\myid\appdata\local\programs\python\python37\lib\runpy.py:125: RuntimeWarning: 'deepgnn.graph_engine.data.citation' found in sys.modules after import of package 'deepgnn.graph_engine.data', but prior to execution of 'deepgnn.graph_engine.data.citation'; this may result in unpredictable behaviour
      warn(RuntimeWarning(msg))
    [2022-09-09 14:46:04,150] {convert.py:100} INFO - worker 0 try to generate partition: 0 - 1
    [2022-09-09 14:46:04,151] {_adl_reader.py:124} INFO - [1,0] Input files: ['C:/Users/myid/AppData/Local/Temp/cora\\graph.json']
    [2022-09-09 14:46:04,782] {dispatcher.py:143} INFO - record processed: 1000
    [2022-09-09 14:46:05,257] {dispatcher.py:143} INFO - record processed: 2000
    [2022-09-09 14:46:05,657] {local.py:44} INFO - Graph data path: C:/Users/myid/AppData/Local/Temp/cora. Partitions [0]. Storage type 0. Config path . Stream False.
    [2022-09-09 14:46:05,707] {local.py:52} INFO - Loaded snark graph. Node counts: [140, 500, 1000, 1068]. Edge counts: [10556]
    graph data: C:/Users/myid/AppData/Local/Temp/cora
    + MODEL_DIR=/tmp/model_fix
    + rm -rf /tmp/model_fix
    + [[ cpu == \g\p\u ]]
    + python ./main.py --data_dir /tmp/cora --mode train --seed 123 --backend snark --graph_type local --converter skip --batch_size 140 --learning_rate 0.005 --num_epochs 180 --sample_file /tmp/cora/train.nodes --node_type 0 --model_dir /tmp/model_fix --metric_dir /tmp/model_fix --save_path /tmp/model_fix --eval_file /tmp/cora/test.nodes --eval_during_train_by_steps 1 --feature_idx 0 --feature_dim 1433 --label_idx 1 --label_dim 1 --head_num 8,1 --num_classes 7 --neighbor_edge_types 0 --attn_drop 0.6 --ffd_drop 0.6 --log_by_steps 1 --use_per_step_metrics
    [2022-09-09 14:46:08,646] {factory.py:38} INFO - GE_OMP_NUM_THREADS=1
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - apex_opt_level=O2
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - attn_drop=0.6
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - backend=snark
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - batch_size=140
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - client_rank=None
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - clip_grad=False
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - config_path=
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - converter=skip
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - data_dir=C:/Users/myid/AppData/Local/Temp/cora
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - data_parallel_num=2
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - dim=256
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - disable_ib=False
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - enable_adl_uploader=False
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - enable_ssl=False
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - eval_during_train_by_steps=1
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - eval_file=C:/Users/myid/AppData/Local/Temp/cora/test.nodes
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - fanouts=[10, 10]
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - featenc_config=None
    [2022-09-09 14:46:08,647] {factory.py:38} INFO - feature_dim=1433
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - feature_idx=0
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - feature_type=float
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - ffd_drop=0.6
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - fp16=amp
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - ge_start_timeout=30
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - gpu=False
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - grad_max_norm=1.0
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - graph_type=local
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - head_num=[8, 1]
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - hidden_dim=8
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - job_id=aa812d6f
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - l2_coef=0.0005
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - label_dim=1
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - label_idx=1
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - learning_rate=0.005
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - local_rank=0
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - log_by_steps=1
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - max_id=None
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - max_samples=0
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - max_saved_ckpts=0
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - meta_dir=
    [2022-09-09 14:46:08,648] {factory.py:38} INFO - metric_dir=C:/Users/myid/AppData/Local/Temp/model_fix
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - mode=train
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - model_args=
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - model_dir=C:/Users/myid/AppData/Local/Temp/model_fix
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - neighbor_count=10
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - neighbor_edge_types=[0]
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - node_type=0
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - num_classes=7
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - num_epochs=180
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - num_ge=0
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - num_negs=5
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - num_parallel=2
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - partitions=[0]
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - prefetch_factor=2
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - prefetch_size=16
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - sample_file=C:/Users/myid/AppData/Local/Temp/cora/train.nodes
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - save_ckpt_by_epochs=1
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - save_ckpt_by_steps=0
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - save_path=C:/Users/myid/AppData/Local/Temp/model_fix
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - seed=123
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - server_idx=None
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - servers=
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - skip_ge_start=False
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - sort_ckpt_by_mtime=False
    [2022-09-09 14:46:08,649] {factory.py:38} INFO - ssl_cert=
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - storage_type=0
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - strategy=RandomWithoutReplacement
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - stream=False
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - sync_dir=
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - trainer=base
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - uploader_process_num=1
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - uploader_store_name=
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - uploader_threads_num=12
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - use_per_step_metrics=True
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - user_name=10.0.0.200
    [2022-09-09 14:46:08,650] {factory.py:38} INFO - warmup=0.0002
    [2022-09-09 14:46:08,654] {local.py:44} INFO - Graph data path: C:/Users/myid/AppData/Local/Temp/cora. Partitions [0]. Storage type 0. Config path . Stream False.
    [2022-09-09 14:46:08,666] {local.py:52} INFO - Loaded snark graph. Node counts: [140, 500, 1000, 1068]. Edge counts: [10556]
    [2022-09-09 14:46:08,666] {main.py:37} INFO - Creating GAT model with seed:123.
    [2022-09-09 14:46:08,668] {base_model.py:39} INFO - [BaseModel] feature_type: FeatureType.FLOAT, feature_idx:0, feature_dim:0.
    [2022-09-09 14:46:08,672] {trainer.py:472} INFO - [1,0] Max steps per epoch:-1
    [2022-09-09 14:46:08,672] {utils.py:107} INFO - 0, input_layer.att_head-0.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,672] {utils.py:107} INFO - 1, input_layer.att_head-0.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 2, input_layer.att_head-0.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 3, input_layer.att_head-0.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 4, input_layer.att_head-0.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 5, input_layer.att_head-0.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 6, input_layer.att_head-1.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 7, input_layer.att_head-1.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 8, input_layer.att_head-1.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 9, input_layer.att_head-1.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 10, input_layer.att_head-1.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 11, input_layer.att_head-1.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 12, input_layer.att_head-2.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 13, input_layer.att_head-2.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 14, input_layer.att_head-2.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 15, input_layer.att_head-2.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 16, input_layer.att_head-2.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 17, input_layer.att_head-2.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 18, input_layer.att_head-3.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 19, input_layer.att_head-3.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 20, input_layer.att_head-3.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 21, input_layer.att_head-3.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 22, input_layer.att_head-3.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 23, input_layer.att_head-3.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 24, input_layer.att_head-4.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 25, input_layer.att_head-4.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 26, input_layer.att_head-4.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 27, input_layer.att_head-4.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 28, input_layer.att_head-4.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,673] {utils.py:107} INFO - 29, input_layer.att_head-4.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 30, input_layer.att_head-5.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 31, input_layer.att_head-5.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 32, input_layer.att_head-5.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 33, input_layer.att_head-5.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 34, input_layer.att_head-5.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 35, input_layer.att_head-5.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 36, input_layer.att_head-6.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 37, input_layer.att_head-6.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 38, input_layer.att_head-6.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 39, input_layer.att_head-6.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 40, input_layer.att_head-6.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 41, input_layer.att_head-6.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 42, input_layer.att_head-7.bias: torch.Size([8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 43, input_layer.att_head-7.w.weight: torch.Size([8, 1433]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 44, input_layer.att_head-7.attn_l.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 45, input_layer.att_head-7.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 46, input_layer.att_head-7.attn_r.weight: torch.Size([1, 8]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 47, input_layer.att_head-7.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 48, out_layer.att_head-0.bias: torch.Size([7]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 49, out_layer.att_head-0.w.weight: torch.Size([7, 64]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 50, out_layer.att_head-0.attn_l.weight: torch.Size([1, 7]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 51, out_layer.att_head-0.attn_l.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 52, out_layer.att_head-0.attn_r.weight: torch.Size([1, 7]), cpu
    [2022-09-09 14:46:08,674] {utils.py:107} INFO - 53, out_layer.att_head-0.attn_r.bias: torch.Size([1]), cpu
    [2022-09-09 14:46:08,675] {utils.py:116} INFO - parameter count: 92391
    [2022-09-09 14:46:08,675] {logging_utils.py:84} INFO - Training worker started. Model: GAT.
    Traceback (most recent call last):
      File "./main.py", line 126, in <module>
        _main()
      File "./main.py", line 121, in _main
        init_args_fn=init_args,
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\deepgnn\pytorch\training\factory.py", line 134, in run_dist
        eval_dataloader_for_training,
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\deepgnn\pytorch\training\trainer.py", line 100, in run
        self._train(model)
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\deepgnn\pytorch\training\trainer.py", line 171, in _train
        self._train_one_epoch(model, epoch)
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\deepgnn\pytorch\training\trainer.py", line 174, in _train_one_epoch
        for i, data in enumerate(self.dataset):
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
        return self._get_iterator()
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\myid\Downloads\DeepGNN\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
        w.start()
      File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
        reduction.dump(process_obj, to_child)
      File "c:\users\myid\appdata\local\programs\python\python37\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'CDLL.__init__.<locals>._FuncPtr'
    
    bug 
    opened by nabihach 2
  • [BUG FIX]: Dropped Indexing for 2-hop neighbor_feats

    Bug Description: Previously, the indices idx (output of numpy.unique) of 1-hop neighbors (i.e. neigh_feats_unique computed on Line 83) were being used to reconstruct features of 2-hop neighbors (i.e. neigh_feats_unique["neighbor_feats"] computed on Line 90); this incorrectly slices the feature matrix for 2-hop neighbors.

    This change drops the indexing on Line 90 (a toy illustration of the pattern follows).
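
    A toy illustration of the unique/inverse-index pattern involved (illustrative shapes, not DeepGNN's code):

        import numpy as np

        ids = np.array([7, 3, 7, 5])                    # 1-hop neighbor ids (with repeats)
        unique_ids, idx = np.unique(ids, return_inverse=True)
        feats_unique = np.arange(12.0).reshape(3, 4)    # stand-in for fetched unique features
        one_hop = feats_unique[idx]                     # correct: one row per 1-hop neighbor

        two_hop = np.zeros((12, 4))                     # 2-hop features have their own length
        wrong = two_hop[idx]                            # bug: slices the 2-hop matrix to 4 rows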

    Steps to reproduce: (Dataset: Cora; Fanout: [10, 10]; Batch-Size: 140; Sampling: RandomWithoutReplacement)

    Before this code change:

    >>> context['encoder']['node_feats']['node_feats'].shape
    (140, 50)
    
    >>> context['encoder']['node_feats']['neighbor_feats'].shape
    (1400, 50)
    
    >>> context['encoder']['neighbor_feats']['node_feats'].shape
    (1400, 50)
    
    >>> context['encoder']['neighbor_feats']['neighbor_feats'].shape
    (1400, 50)
    

    After this code change:

    >>> context['encoder']['node_feats']['node_feats'].shape
    (140, 50)
    
    >>> context['encoder']['node_feats']['neighbor_feats'].shape
    (1400, 50)
    
    >>> context['encoder']['neighbor_feats']['node_feats'].shape
    (1400, 50)
    
    >>> context['encoder']['neighbor_feats']['neighbor_feats'].shape
    (4160, 50)
    
    • [ ] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [ ] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [ ] Documentation is added or updated to reflect new code in the same format as the rest of the repo.
    • [ ] PR is labeled using the label menu on the right side.

    Previous Behavior

    • Provide relevant issue number if applicable, eg #44.

    New Behavior

    opened by Swapnil-Gandhi 2
  • Neighbor Count Functionality for Memory/Distributed Graphs

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Documentation is added or updated to reflect new code in the same format as the rest of the repo.
    • [x] PR is labeled using the label menu on the right side.

    Previous Behavior

    • Provide relevant issue number if applicable, eg #44.

    New Behavior

    • Adds neighbor count (node degree) functionality for both memory and distributed graphs.
    • Adds tests for the neighbor count feature for memory/distributed graphs (base cases, single/multiple partitions, invalid nodes and types).
    • Adds Python handling of the C functions via the ctypes library (a sketch of the pattern follows).
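
    A sketch of the general ctypes pattern (the library name and symbol below are hypothetical, not DeepGNN's actual binding):

        import ctypes
        import numpy as np

        lib = ctypes.CDLL("libwrapper.so")                    # hypothetical shared library
        lib.GetNeighborCounts.argtypes = [
            ctypes.POINTER(ctypes.c_int64), ctypes.c_size_t,  # node ids and their count
            ctypes.POINTER(ctypes.c_uint64),                  # output buffer for degrees
        ]
        lib.GetNeighborCounts.restype = ctypes.c_int32

        def neighbor_counts(nodes: np.ndarray) -> np.ndarray:
            """Call the C function and return one degree per input node."""
            out = np.zeros(len(nodes), dtype=np.uint64)
            status = lib.GetNeighborCounts(
                nodes.ctypes.data_as(ctypes.POINTER(ctypes.c_int64)),
                len(nodes),
                out.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64)),
            )
            assert status == 0, "graph engine call failed"
            return out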
    enhancement 
    opened by adithyabhonsley 2
  • how do I get inference results with node id and node embedding?

    hi,

    I am trying the GAT example and I see that the model only returns loss, pred and labels; how can I get the node ids and node embeddings returned?

        def forward(self, inputs):
            """Evaluate model, calculate loss, predictions and extract labels."""
            # fmt: off
            nodes, feat, mask, labels, edges, edges_value, adj_shape = inputs
            nodes = torch.squeeze(nodes)                # [N], N: num of nodes in subgraph
            feat = torch.squeeze(feat)                  # [N, F]
            mask = torch.squeeze(mask)                  # [N]
            labels = torch.squeeze(labels)              # [N]
            edges = torch.squeeze(edges)                # [X, 2], X: num of edges in subgraph
            edges_value = torch.squeeze(edges_value)    # [X]
            adj_shape = torch.squeeze(adj_shape)        # [2]
            # fmt: on
    
            sp_adj = torch.sparse_coo_tensor(edges, edges_value, adj_shape.tolist())
            h_1 = self.input_layer(feat, sp_adj)
            scores = self.out_layer(h_1, sp_adj)
    
            labels = labels.type(torch.int64)
            labels = labels[mask]  # [batch_size]
            scores = scores[mask]  # [batch_size]
            pred = scores.argmax(dim=1)
            loss = self.xent(scores, labels)
            return loss, pred, labels
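
    One possible modification, sketched against the snippet above (illustrative, not an official DeepGNN change): also return the masked node ids and the hidden representations from input_layer.

        def forward(self, inputs):
            """Same as above, but also return node ids and embeddings."""
            nodes, feat, mask, labels, edges, edges_value, adj_shape = [
                torch.squeeze(t) for t in inputs
            ]
            sp_adj = torch.sparse_coo_tensor(edges, edges_value, adj_shape.tolist())
            h_1 = self.input_layer(feat, sp_adj)   # per-node embeddings, [N, hidden]
            scores = self.out_layer(h_1, sp_adj)

            labels = labels[mask].type(torch.int64)
            scores = scores[mask]
            pred = scores.argmax(dim=1)
            loss = self.xent(scores, labels)
            # Expose the batch's node ids and embeddings alongside the existing outputs.
            return loss, pred, labels, nodes[mask], h_1[mask]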
    
    opened by Chanrom 2
  • Converter accept linear input format

    Externalize graph format

    • Make it easier to convert from Parquet/Avro to our input format.
    • Support neighbor inversion (sorting by dst nodes).
    • JSON can't handle a high number of edges well.
    opened by coledie 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise (a sketch of the check follows). We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
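
    A sketch of the kind of check the patch adds (assumed shape; the actual PR may differ in details):

        import os
        import tarfile

        def is_within_directory(directory: str, target: str) -> bool:
            """True if target resolves to a path inside directory."""
            abs_directory = os.path.abspath(directory)
            abs_target = os.path.abspath(target)
            return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

        def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
            """Refuse extraction if any member would escape the target directory."""
            for member in tar.getmembers():
                member_path = os.path.join(path, member.name)
                if not is_within_directory(path, member_path):
                    raise Exception("Attempted path traversal in tar file")
            tar.extractall(path)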

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • Update bazel install instructions to v5.4.0.

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    Previous Behavior

    • We always use the most recent bazel release in the install instructions.

    New Behavior

    Pin the install instructions to bazel version 5.4.0 (the latest working version); tested on a new machine.

    documentation 
    opened by coledie 0
  • Add TF version of Ray usage.

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    New Behavior

    Add example documentation on how to use Ray with TensorFlow. This is a repeat of a previous pull request, meant to merge in with the torch ray_usage docs before the migration.

    documentation 
    opened by coledie 0
  • Add migrate script from custom trainers to Ray Train Torch.

    • [x] Forked repo is synced with upstream -> github shows no code delta outside of the desired. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    Previous Behavior

    • We implement custom trainers and use them in all examples.

    New Behavior

    • Add a migrate script from custom trainers to Ray Train Torch.
    • Remove all custom trainers.
    • The next PR will migrate all examples to Ray Train Torch.
    • Migrate all models to use Ray via the migrate script.
    • Unit tests leverage the same Ray training loop as run.sh.
    • Removed redundant or overly complex tests for some models.
    enhancement 
    opened by coledie 1
  • [Temporal graph][2/4] Feature extraction

    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    Previous Behavior

    1. Only static features supported.
    2. Feature extraction relied on using node types/edge flags to identify missing features.

    New Behavior

    1. Every node/edge has an optional timestamp parameter associated with it. If the parameter is present, we do a binary search to find the latest feature; otherwise we return the raw feature data (a sketch follows).
    2. Use the feature timestamps buffer to identify missing features, and fill them with zeros when missing.
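
    A minimal sketch of the timestamp lookup (illustrative, assuming parallel arrays of timestamps and values sorted by timestamp):

        import bisect

        def feature_at(timestamps: list, values: list, t: int):
            """Return the latest feature value with timestamp <= t, or None if missing."""
            i = bisect.bisect_right(timestamps, t)
            return values[i - 1] if i > 0 else None

        # Usage: feature_at([1, 5, 9], ["a", "b", "c"], 6) -> "b"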
    enhancement 
    opened by mortid0 0
  • [Temporal graph][1/4] Documentation

    • [x] Tests are passing? https://github.com/microsoft/DeepGNN/blob/main/CONTRIBUTING.md#run-tests
    • [x] Changelog and documentation updated.
    • [x] PR is labeled using the label menu on the right side.

    Previous Behavior

    Static graphs without any time dependencies across their components.

    New Behavior

    Add dynamic behavior to graphs in 4 parts:

    1. Documentation
    2. Feature extraction
    3. Neighbor sampling
    4. Converters and Python E2E tests

    documentation enhancement 
    opened by mortid0 0
Owner
Microsoft
Open source projects and samples from Microsoft
Galileo library for large scale graph training by JD

In recent years, graph computing has achieved significant results in scenarios such as search, recommendation, and risk control, but it still faces challenges such as training ultra-large-scale heterogeneous graphs and integrating with existing deep learning frameworks like TensorFlow and PyTorch. Galileo is a graph deep learning framework that is ultra-large-scale, easy to use, easy to extend, high-performance, and dual-backend, aiming to solve the difficulty of deploying ultra-large-scale graph algorithms in industrial scenarios...

JD Galileo Team 128 Nov 29, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

Aiden Nibali 36 Oct 30, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30 sports-related actions each, for a total of 510 action clips.

Aiden Nibali 25 Jun 20, 2021
This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

null 212 Dec 25, 2022
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques Installation PyPI pip install colossalai Install

HPC-AI Tech 7.1k Jan 3, 2023
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm

Appen Repos 86 Dec 7, 2022
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes". Re

Shuai Shen 87 Dec 28, 2022
Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

EleutherAI 432 Dec 16, 2022
An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

null 33 Jun 27, 2021
Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

MidiBERT-Piano Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen Introduction This is the official repository for the paper, MidiBERT-Piano: Large-

null 137 Dec 15, 2022
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.)

Microsoft 7.6k Jan 1, 2023
Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

LUPerson-NL Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL) The repository is for our CVPR2022 paper Large-Scale

null 43 Dec 26, 2022
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue. This

null 290 Dec 29, 2022
Large scale embeddings on a single machine.

Marius Marius is a system under active development for training embeddings for large-scale graphs on a single machine. Training on large scale graphs

Marius 107 Jan 3, 2023
Evaluation suite for large-scale language models.

This repo contains code for running the evaluations and reproducing the results from the Jurassic-1 Technical Paper (see blog post), with current support for running the tasks through both the AI21 Studio API and OpenAI's GPT3 API.

null 71 Dec 17, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks ACL 2020 Microsoft Research [Paper] [Video] Releasing [XtremeDistilTransf

Microsoft 125 Jan 4, 2023