Self-Supervised Graph Transformer on Large-Scale Molecular Data

This is a PyTorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data (NeurIPS 2020).

Requirements

  • Python 3.6.8
  • For the other packages, please refer to requirements.txt. To resolve the PackagesNotFoundError, please add the following channels before creating the environment.
   conda config --add channels pytorch
   conda config --add channels rdkit
   conda config --add channels conda-forge
   conda config --add channels rmg

You can execute the following command to create the conda environment.

conda create --name chem --file requirements.txt
  • We also provide a Dockerfile to build the environment; please refer to the Dockerfile for more details.

Pretrained Model Download

We provide the pretrained models used in the paper.

Usage

The whole framework supports pretraining, finetuning, prediction, fingerprint generation, and evaluation functions.

Pretraining

Pretrain the GTransformer model on unlabelled molecular data.

Data Preparation

We provide an input example of unlabelled molecular data at exampledata/pretrain/tryout.csv.

Semantic Motif Label Extraction

The semantic motif labels are extracted by scripts/save_features.py with the feature generator fgtasklabel.

python scripts/save_features.py --data_path exampledata/pretrain/tryout.csv  \
                                --save_path exampledata/pretrain/tryout.npz   \
                                --features_generator fgtasklabel \
                                --restart

Contributing guide: you are welcome to register your own feature generator to add more semantic motifs for the graph-level prediction task. For more details, please refer to grover/data/task_labels.py.
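
For intuition, each semantic motif label marks the presence (or count) of a functional group in the molecule, which can be detected with RDKit. The following is a minimal illustrative sketch of computing such graph-level labels; the descriptor selection is an arbitrary example, not the exact set used by fgtasklabel (see grover/data/task_labels.py for the real implementation).

# Illustrative sketch: compute functional-group ("semantic motif") labels for one
# molecule with RDKit. The descriptor set below is an example, not the exact one
# used by the fgtasklabel generator.
from rdkit import Chem
from rdkit.Chem import Fragments

def motif_labels(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # A few RDKit fragment counters; each counts occurrences of one functional group.
    counters = {
        "ester": Fragments.fr_ester,
        "carboxylic_acid": Fragments.fr_COO,
        "benzene_ring": Fragments.fr_benzene,
        "ether": Fragments.fr_ether,
    }
    # Binarize the counts into multi-label (present / absent) targets.
    return {name: int(fn(mol) > 0) for name, fn in counters.items()}

print(motif_labels("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin; e.g. {'ester': 1, 'carboxylic_acid': 1, ...}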

Atom/Bond Contextual Property (Vocabulary)

The atom/bond contextual properties (vocabulary) are extracted by scripts/build_vocab.py.

python scripts/build_vocab.py --data_path exampledata/pretrain/tryout.csv  \
                             --vocab_save_folder exampledata/pretrain  \
                             --dataset_name tryout

The outputs of this script are vocabulary dicts of atoms and bonds, tryout_atom_vocab.pkl and tryout_bond_vocab.pkl, respectively. For more options for contextual property extraction, please refer to scripts/build_vocab.py.
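
For background, an atom's contextual property is a statistic of its local neighbourhood, e.g. the centre atom's symbol combined with counts of its (neighbour atom, bond type) pairs; the vocabulary collects the properties observed across the corpus. The sketch below illustrates the idea with RDKit; the exact key format produced by scripts/build_vocab.py may differ.

# Rough sketch of an atom "contextual property": the centre atom's symbol plus the
# counts of (neighbour symbol, bond type) pairs around it. The string format used by
# scripts/build_vocab.py may differ.
from collections import Counter
from rdkit import Chem

def atom_context(mol, atom_idx):
    atom = mol.GetAtomWithIdx(atom_idx)
    neighbourhood = Counter()
    for bond in atom.GetBonds():
        nbr = bond.GetOtherAtom(atom)
        neighbourhood[f"{nbr.GetSymbol()}-{bond.GetBondType()}"] += 1
    context = "_".join(f"{k}{v}" for k, v in sorted(neighbourhood.items()))
    return f"{atom.GetSymbol()}_{context}"

mol = Chem.MolFromSmiles("CCO")  # ethanol
print([atom_context(mol, i) for i in range(mol.GetNumAtoms())])
# e.g. ['C_C-SINGLE1', 'C_C-SINGLE1_O-SINGLE1', 'O_C-SINGLE1']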

Data Splitting

To accelerate data loading and reduce the memory cost in the multi-GPU pretraining scenario, the unlabelled molecular data needs to be split into several parts using scripts/split_data.py.

Note: This step is required for the single-GPU pretraining scenario as well.

python scripts/split_data.py --data_path exampledata/pretrain/tryout.csv  \
                             --features_path exampledata/pretrain/tryout.npz  \
                             --sample_per_file 100  \
                             --output_path exampledata/pretrain/tryout

It is better to set a larger sample_per_file for large datasets.
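
As a rough guide, the number of generated part files is ceil(num_molecules / sample_per_file). A tiny sketch of that arithmetic with hypothetical numbers:

# Back-of-the-envelope check of how many part files a split produces. The actual
# chunking is done by scripts/split_data.py; this only mirrors the arithmetic.
import math

num_molecules = 1000    # hypothetical dataset size
sample_per_file = 100   # value passed via --sample_per_file
print(math.ceil(num_molecules / sample_per_file))  # -> 10 part files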

The output dataset folder will look like this:

tryout
  |- feature # the semantic motif labels
  |- graph # the smiles
  |- summary.txt

Running Pretraining on Single GPU

Note: There are more hyper-parameters that can be tuned during pretraining. Please refer to add_pretrain_args in util/parsing.py.

python main.py pretrain \
               --data_path exampledata/pretrain/tryout \
               --save_dir model/tryout \
               --atom_vocab_path exampledata/pretrain/tryout_atom_vocab.pkl \
               --bond_vocab_path exampledata/pretrain/tryout_bond_vocab.pkl \
               --batch_size 32 \
               --dropout 0.1 \
               --depth 5 \
               --num_attn_head 1 \
               --hidden_size 100 \
               --epochs 3 \
               --init_lr 0.0002 \
               --max_lr 0.0004 \
               --final_lr 0.0001 \
               --weight_decay 0.0000001 \
               --activation PReLU \
               --backbone gtrans \
               --embedding_output_type both

Running Pretraining on Multiple GPUs

We have implemented distributed pretraining on multiple GPUs using Horovod. To start the distributed pretraining, please refer to this link. To enable multi-GPU training of the pretraining model, the --enable_multi_gpu flag should be added to the above command line.

Training & Finetuning

The finetune dataset is organized as a .csv file. This file should contain a column named smiles.
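
A quick sanity check of such a file with pandas is sketched below; treating every non-smiles column as a prediction target is an assumption based on the usual chemprop-style convention.

# Quick sanity check of a finetune .csv: it must contain a "smiles" column; the
# remaining columns are assumed to be the prediction targets.
import pandas as pd

df = pd.read_csv("exampledata/finetune/bbbp.csv")
assert "smiles" in df.columns, "finetune data must contain a 'smiles' column"
target_columns = [c for c in df.columns if c != "smiles"]
print(f"{len(df)} molecules, target columns: {target_columns}")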

(Optional) Molecular Feature Extraction

Given a labelled molecular dataset, it is possible to extract additional molecular features and use them when finetuning the existing pretrained model. The feature matrix is stored as a .npz file.

python scripts/save_features.py --data_path exampledata/finetune/bbbp.csv \
                                --save_path exampledata/finetune/bbbp.npz \
                                --features_generator rdkit_2d_normalized \
                                --restart 
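
If you want to inspect the saved feature matrix, something like the following should work; the features key name is an assumption based on the chemprop-style save_features script that GROVER builds on.

# Inspect the saved molecular-feature matrix. The "features" key is an assumption
# based on the chemprop-style save_features script.
import numpy as np

archive = np.load("exampledata/finetune/bbbp.npz")
features = archive["features"]
print(features.shape)  # (num_molecules, feature_dim); rdkit_2d_normalized gives ~200 descriptors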

Finetuning with Existing Data

Given the labelled dataset and the molecular features, we can use the finetune function to finetune the pretrained model.

Note: There are more hyper-parameters that can be tuned during finetuning. Please refer to add_finetune_args in util/parsing.py.

python main.py finetune --data_path exampledata/finetune/bbbp.csv \
                        --features_path exampledata/finetune/bbbp.npz \
                        --save_dir model/finetune/bbbp/ \
                        --checkpoint_path model/tryout/model.ep3 \
                        --dataset_type classification \
                        --split_type scaffold_balanced \
                        --ensemble_size 1 \
                        --num_folds 3 \
                        --no_features_scaling \
                        --ffn_hidden_size 200 \
                        --batch_size 32 \
                        --epochs 10 \
                        --init_lr 0.00015

The final finetuned model is stored in model/finetune/bbbp/ and will be used in the subsequent prediction and evaluation tasks.

Prediction

Given the finetuned model, we can use it to make predictions for the target molecules. The final prediction is made by averaging the predictions of all sub-models (num_folds * ensemble_size).
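
In other words, with num_folds 3 and ensemble_size 1, the reported prediction is the mean over three sub-model predictions. A minimal sketch of that averaging with placeholder arrays:

# Minimal sketch of averaging the predictions of all sub-models
# (num_folds * ensemble_size of them). The prediction matrices are placeholders.
import numpy as np

num_folds, ensemble_size = 3, 1
# One (num_molecules, num_tasks) prediction matrix per sub-model (dummy values here).
sub_model_preds = [np.random.rand(5, 1) for _ in range(num_folds * ensemble_size)]
final_pred = np.mean(sub_model_preds, axis=0)  # shape (5, 1)
print(final_pred)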

(Optional) Molecular Feature Extraction

Note: If the finetuned model uses molecular features as input, we need to generate the molecular features for the target molecules as well.

python scripts/save_features.py --data_path exampledata/finetune/bbbp.csv \
                                --save_path exampledata/finetune/bbbp.npz \
                                --features_generator rdkit_2d_normalized \
                                --restart 

Prediction with Finetuned Model

python main.py predict --data_path exampledata/finetune/bbbp.csv \
               --features_path exampledata/finetune/bbbp.npz \
               --checkpoint_dir ./model \
               --no_features_scaling \
               --output data_pre.csv

Generating Fingerprints

The pretrained model can also be used to generate the molecular fingerprints.

Note: We provide three ways to generate the fingerprint.

  • atom: the mean pooling of the atom embeddings from the node-view GTransformer and the edge-view GTransformer.
  • bond: the mean pooling of the bond embeddings from the node-view GTransformer and the edge-view GTransformer.
  • both: the concatenation of the atom and bond fingerprints. Moreover, if additional molecular features are provided, they are appended to the GTransformer output as well (see the sketch after the command below).

python main.py fingerprint --data_path exampledata/finetune/bbbp.csv \
                           --features_path exampledata/finetune/bbbp.npz \
                           --checkpoint_path model/tryout/model.ep3 \
                           --fingerprint_source both \
                           --output model/fingerprint/fp.npz
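
Conceptually, the three fingerprint modes correspond to mean pooling and concatenation of the GTransformer embeddings. The sketch below uses random placeholder arrays to illustrate the shapes only; it is not the model's actual pooling code.

# Shape-only sketch of the three fingerprint modes, with random placeholders standing
# in for the per-atom / per-bond GTransformer embeddings of a single molecule.
import numpy as np

hidden = 100
atom_emb = np.random.rand(9, hidden)    # 9 atoms  -> per-atom embeddings
bond_emb = np.random.rand(20, hidden)   # 20 bonds -> per-bond embeddings
mol_features = np.random.rand(200)      # optional extra molecular features

atom_fp = atom_emb.mean(axis=0)               # "atom" mode
bond_fp = bond_emb.mean(axis=0)               # "bond" mode
both_fp = np.concatenate([atom_fp, bond_fp])  # "both" mode
both_fp_with_features = np.concatenate([both_fp, mol_features])  # with --features_path
print(atom_fp.shape, bond_fp.shape, both_fp_with_features.shape)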

The Results

  • The classification datasets (ROC-AUC).

Model         BBBP          SIDER         ClinTox       BACE          Tox21         ToxCast
GROVERbase    0.936(0.008)  0.656(0.006)  0.925(0.013)  0.878(0.016)  0.819(0.020)  0.723(0.010)
GROVERlarge   0.940(0.019)  0.658(0.023)  0.944(0.021)  0.894(0.028)  0.831(0.025)  0.737(0.010)
  • The regression datasets (RMSE for FreeSolv, ESOL, and Lipo; MAE for QM7 and QM8).

Model         FreeSolv      ESOL          Lipo          QM7        QM8
GROVERbase    1.592(0.072)  0.888(0.116)  0.563(0.030)  72.5(5.9)  0.0172(0.002)
GROVERlarge   1.544(0.397)  0.831(0.120)  0.560(0.035)  72.6(3.8)  0.0125(0.002)

The Reproducibility Issue

Due to the non-deterministic behavior of the function index_select_nd (see link), it is hard to exactly reproduce the training process of finetuning. Therefore, we provide the finetuned models for the eleven datasets to guarantee the reproducibility of the experiments.

We provide the eval function to reproduce the experiments. Suppose the finetuned model is placed in model/finetune/.

python main.py eval --data_path exampledata/finetune/bbbp.csv \
                    --features_path exampledata/finetune/bbbp.npz \
                    --checkpoint_dir model/finetune/bbbp \
                    --dataset_type classification \
                    --split_type scaffold_balanced \
                    --ensemble_size 1 \
                    --num_folds 3 \
                    --metric auc \
                    --no_features_scaling

Note: The default metric is rmse for regression tasks. For the QM7 and QM8 datasets, you need to set the metric to mae to reproduce the results. For classification tasks, you need to set the metric to auc.
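
For reference, these metrics follow the usual scikit-learn definitions; a minimal sketch with dummy labels and predictions:

# Minimal reference for the three metrics mentioned above, using scikit-learn.
# Labels and predictions are dummy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score

y_true_cls, y_score = np.array([0, 1, 1, 0]), np.array([0.2, 0.8, 0.6, 0.3])
y_true_reg, y_pred = np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])

print("auc :", roc_auc_score(y_true_cls, y_score))               # classification
print("rmse:", np.sqrt(mean_squared_error(y_true_reg, y_pred)))  # regression default
print("mae :", mean_absolute_error(y_true_reg, y_pred))          # QM7 / QM8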

Known Issues

  • Compared with the original implementation, we add a dense connection in the MessagePassing layer of GTransformer. If you do not want the dense connection in the MessagePassing layer, please modify L256 of model/layers.py.

Roadmap

  • Implementation of GTransformer in DGL / PyG.
  • The improvement of self-supervised tasks, e.g. more semantic motifs.

Reference

@article{rong2020self,
  title={Self-Supervised Graph Transformer on Large-Scale Molecular Data},
  author={Rong, Yu and Bian, Yatao and Xu, Tingyang and Xie, Weiyang and Wei, Ying and Huang, Wenbing and Huang, Junzhou},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Disclaimer

This is not an officially supported Tencent product.

Comments
  • Package not found error

    Sorry, I'm new to this, but can I install GROVER on my GPU-supported Windows laptop?

    I did add the conda channels in advance, but it still gives me a PackagesNotFoundError.

    conda create --name pretrain --file requirements.txt
    
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: failed
    
    PackagesNotFoundError: The following packages are not available from current channels:
    
      - torchvision==0.3.0=py36_cu9.0.176_1
      - python==3.6.8=h0371630_0
      - numpy-base==1.16.4=py36hde5b4d6_0
      - boost==1.68.0=py36h8619c78_1001
      - numpy==1.16.4=py36h7e9f1db_0
      - rdkit==2019.03.4.0=py36hc20afe1_1
      - pandas==0.25.0=py36hb3f55d8_0
      - pytorch==1.1.0=py3.6_cuda9.0.176_cudnn7.5.1_0
      - readline==7.0=h7b6447c_5
      - boost-cpp==1.68.0=h11c811c_1000
      - scikit-learn==0.21.2=py36hcdab131_1
      - scipy==1.3.0=py36h921218d_1
    
    Current channels:
    
      - https://conda.anaconda.org/rmg/win-64
      - https://conda.anaconda.org/rmg/noarch
      - https://conda.anaconda.org/conda-forge/win-64
      - https://conda.anaconda.org/conda-forge/noarch
      - https://conda.anaconda.org/rdkit/win-64
      - https://conda.anaconda.org/rdkit/noarch
      - https://conda.anaconda.org/pytorch/win-64
      - https://conda.anaconda.org/pytorch/noarch
      - http://conda.anaconda.org/gurobi/win-64
      - http://conda.anaconda.org/gurobi/noarch
      - https://repo.anaconda.com/pkgs/main/win-64
      - https://repo.anaconda.com/pkgs/main/noarch
      - https://repo.anaconda.com/pkgs/r/win-64
      - https://repo.anaconda.com/pkgs/r/noarch
      - https://repo.anaconda.com/pkgs/msys2/win-64
      - https://repo.anaconda.com/pkgs/msys2/noarch
    
    To search for alternate channels that may provide the conda package you're
    looking for, navigate to
    
        https://anaconda.org
    
    and use the search bar at the top of the page.
    opened by shangzhu-cmu 0
  • The incorrect implementation of multi-head attention!

    Assuming the number of attention heads is 4, I find that self-attention is computed between different heads rather than between different atoms. The attention scores have shape (num_atoms, 4, 4, 4), whereas they should have shape (batch_size, max_num_atoms, max_num_atoms). The flattened atom features (num_atoms, node_fdim) should be processed into a padded batch of shape (batch_size, max_num_atoms, node_fdim).

    For details, I extract only the main code here to illustrate why the self-attention is computed between different heads rather than between different atoms.

    For MTBlock class:

    # in the __init__ function
    self.attn = MultiHeadedAttention(h=num_attn_head,
                                     d_model=self.hidden_size,
                                     bias=bias,
                                     dropout=dropout)
    for _ in range(num_attn_head):
        self.heads.append(Head(args, hidden_size=hidden_size, atom_messages=atom_messages))
    
    # in the forward function
    for head in self.heads:
        q, k, v = head(f_atoms, f_bonds, a2b, a2a, b2a, b2revb)
        queries.append(q.unsqueeze(1))
        keys.append(k.unsqueeze(1))
        values.append(v.unsqueeze(1))

    queries = torch.cat(queries, dim=1)  # (num_atoms, 4, hidden_size)
    keys = torch.cat(keys, dim=1)        # (num_atoms, 4, hidden_size)
    values = torch.cat(values, dim=1)    # (num_atoms, 4, hidden_size)

    x_out = self.attn(queries, keys, values, past_key_value)  # multi-headed attention
    

    Now the queries, keys, and values are fed into the multi-head attention to get new results. For the MultiHeadedAttention class:

    # in the __init__ function
    self.attention = Attention()
    self.d_k = d_model // h  # equals hidden_size // num_attn_head
    self.linear_layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)])  # why 3: query, key, value

    # in the forward function
    # 1) Do all the linear projections in batch from d_model => h x d_k
    query, key, value = [l(x).view(batch_size, -1, self.h, self.d_k).transpose(1, 2)
                         for l, x in zip(self.linear_layers, (query, key, value))]  # q, k, v shapes will be (num_bonds, 4, 4, d_k)
    x, _ = self.attention(query, key, value, mask=mask, dropout=self.dropout)
    

    For the Attention class:

    class Attention(nn.Module):
        """
        Compute 'Scaled Dot Product SelfAttention'
        """
    
        def forward(self, query, key, value, mask=None, dropout=None):
            scores = torch.matmul(query, key.transpose(-2, -1)) \
                     / math.sqrt(query.size(-1)) # scores shape is (num_atoms, 4, 4, 4)
    
            if mask is not None:
                scores = scores.masked_fill(mask == 0, -1e9)
    
            p_attn = F.softmax(scores, dim=-1)
    
            return torch.matmul(p_attn, value), p_attn # the new output is (num_atoms, 4, 4, d_k) which will be processed into (num_atoms, 4, hidden_size)
    

    As you can see, the scores have shape (num_atoms, 4, 4, 4), meaning attention is computed between different heads rather than between different atoms. That is, each atom's representation becomes a mixture of the different heads' information, which is meaningless.
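
    For comparison, standard multi-head self-attention over a padded atom batch would produce attention scores of shape (batch_size, num_heads, max_num_atoms, max_num_atoms). A shape-only sketch with random placeholder tensors:

    # Shape-only sketch: standard multi-head self-attention over a padded atom batch.
    import torch

    batch_size, max_num_atoms, d_model, h = 2, 9, 100, 4
    d_k = d_model // h
    x = torch.rand(batch_size, max_num_atoms, d_model)

    # Reshape to (batch, heads, atoms, d_k) and attend between atoms within each head.
    q = k = v = x.view(batch_size, max_num_atoms, h, d_k).transpose(1, 2)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    print(scores.shape)  # torch.Size([2, 4, 9, 9]) -> attention between atoms, per head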

    opened by ZhuYun97 0
  • I couldn't install horovod in grover conda environment.

    I want to use GROVER in a multi-GPU environment, so I tried to install horovod in my grover conda env, but I got this error message.

    Import Error

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/eung0/miniconda3/envs/grover_1/lib/python3.6/site-packages/horovod/torch/__init__.py", line 44, in <module>
        from horovod.torch.mpi_ops import allreduce, allreduce_async, allreduce_, allreduce_async_
      File "/home/eung0/miniconda3/envs/grover_1/lib/python3.6/site-packages/horovod/torch/mpi_ops.py", line 31, in <module>
        from horovod.torch import mpi_lib_v2 as mpi_lib
    ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/eung0/miniconda3/envs/grover_1/lib/python3.6/site-packages/horovod/torch/mpi_lib_v2.cpython-36m-x86_64-linux-gnu.so)

    command

    HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir --force-reinstall horovod[pytorch]==0.19.5

    grover conda env

    _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_kmp_llvm conda-forge blas 1.0 mkl conda-forge boost 1.68.0 py36h8619c78_1001 conda-forge boost-cpp 1.68.0 h11c811c_1000 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge ca-certificates 2020.6.20 hecda079_0 rmg cairo 1.16.0 h18b612c_1001 conda-forge descriptastorus 2.2.0 py_0 rmg expat 2.4.8 h27087fc_0 conda-forge fontconfig 2.14.0 h8e229c2_0 conda-forge freetype 2.10.4 h0708190_1 conda-forge gettext 0.19.8.1 hf34092f_1004 conda-forge glib 2.66.3 h58526e2_0 conda-forge icu 58.2 hf484d3e_1000 conda-forge joblib 1.1.0 pyhd8ed1ab_0 conda-forge jpeg 9e h166bdaf_1 conda-forge lcms2 2.12 hddcbb42_0 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libblas 3.9.0 8_mkl conda-forge libboost 1.67.0 h46d08c1_4 libcblas 3.9.0 8_mkl conda-forge libdeflate 1.10 h7f98852_0 conda-forge libffi 3.2.1 he1b5a44_1007 conda-forge libgcc-ng 12.1.0 h8d9b700_16 conda-forge libgfortran-ng 7.5.0 h14aa051_20 conda-forge libgfortran4 7.5.0 h14aa051_20 conda-forge libglib 2.66.3 hbe7bbb4_0 conda-forge libiconv 1.16 h516909a_0 conda-forge liblapack 3.9.0 8_mkl conda-forge libpng 1.6.37 h21135ba_2 conda-forge libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge libtiff 4.3.0 h0fcbabc_4 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libwebp-base 1.2.2 h7f98852_1 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libzlib 1.2.12 h166bdaf_0 conda-forge llvm-openmp 14.0.4 he0ac6c6_0 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge mkl 2020.4 h726a3e6_304 conda-forge mkl_fft 1.0.10 py36_0 conda-forge mkl_random 1.1.1 py36h830a2c2_0 conda-forge ncurses 6.3 h27087fc_1 conda-forge numpy 1.16.4 py36h7e9f1db_0 numpy-base 1.16.4 py36hde5b4d6_0 olefile 0.46 pyh9f0ad1d_1 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openssl 1.1.1g h516909a_0 rmg pandas 0.25.0 py36hb3f55d8_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pillow 8.3.2 py36h676a545_0 conda-forge pip 21.3.1 pyhd8ed1ab_0 conda-forge pixman 0.38.0 h516909a_1003 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge py-boost 1.67.0 py36h04863e7_4 python 3.6.8 h0371630_0 python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python_abi 3.6 2_cp36m conda-forge pytz 2022.1 pyhd8ed1ab_0 conda-forge rdkit 2020.03.3.0 py36hc20afe1_1 rmg readline 7.0 h7b6447c_5 scikit-learn 0.21.2 py36hcdab131_1 conda-forge scipy 1.3.0 py36h921218d_1 conda-forge setuptools 58.0.4 py36h5fab9bb_2 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge sqlite 3.28.0 h8b20d00_0 conda-forge tk 8.6.12 h27826a3_0 conda-forge torch 1.7.1+cu101 pypi_0 pypi torchaudio 0.7.2 pypi_0 pypi torchvision 0.8.2+cu101 pypi_0 pypi tqdm 4.32.1 py_0 conda-forge typing 3.6.4 py36_0 conda-forge typing_extensions 3.10.0.2 pyha770c72_0 conda-forge wheel 0.37.1 pyhd8ed1ab_0 conda-forge xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xz 5.2.5 h516909a_1 conda-forge zlib 1.2.12 h166bdaf_0 conda-forge zstd 1.5.2 h8a70e8d_1 conda-forge

    which version should I install?

    opened by eungyeonglee-dev 0
  • File call issues

    When I use

    python scripts/build_vocab.py --data_path exampledata/pretrain/tryout.csv \
                                  --vocab_save_folder exampledata/pretrain \
                                  --dataset_name tryout

    Here an error is reported:

    Traceback (most recent call last):
      File "scripts/build_vocab.py", line 6, in <module>
        from grover.data.torchvocab import MolVocab
    ModuleNotFoundError: No module named 'grover'

    opened by uestc-lese 1
  • How to install env with python3.7, cuda 11.1

    install.sh

    # Installing env with python 3.7

    conda create --name envgrover37 python=3.7
    conda activate envgrover37

    pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

    (cuda version: 11.1)

    conda install -c conda-forge boost
    conda install -c conda-forge boost-cpp
    conda install -c rmg descriptastorus
    conda install -c acellera rdkit=2019.03.4.0
    conda install -c conda-forge tqdm
    conda install -c anaconda typing
    conda install -c anaconda scipy=1.3.0
    conda install -c anaconda scikit-learn=0.21.2

    How to fix the 'module grover not found' error:

    Add the following to build_vocab.py, save_features.py, and split_data.py:

    import sys
    sys.path.append('/user/grover/')

    opened by funihang 0