# Improving Compound Activity Classification via Deep Transfer and Representation Learning

This repository is the official implementation of *Improving Compound Activity Classification via Deep Transfer and Representation Learning*.
## Requirements

Operating system: Red Hat Enterprise Linux Server 7.9

To install requirements:

```bash
pip install -r requirements.txt
```
## Installation guide

Download the code and dataset with the command:

```bash
git clone https://github.com/ninglab/TransferAct.git
```
## Data Processing

### 1. Use the provided processed dataset

You can use our provided processed dataset in `./data/pairs/`: the dataset of pairs of processed balanced assays $\mathcal{P}$. Details of the bioassay selection and processing, and of the assay pair selection, are given in Section 5.1.1 and Section 5.1.2 of our paper, respectively. The dataset of pairs is provided as the compressed file `data/pairs.tar.gz`; use `tar` (e.g., `tar -xzf data/pairs.tar.gz`) to decompress it.

### 2. Use your own dataset

We provide the necessary scripts in `./data/scripts/`, with the processing steps described in `./data/scripts/README.md`.
## Training

### TAc

#### 1. Running

- To run `TAc-dmpn`:

  ```bash
  python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --alpha 1 --lamda 0 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
  ```

- To run `TAc-dmpna`, add these arguments to the above command:

  ```bash
  --attn_dim 100 --aggregation self-attention --model aada_attention
  ```
- `source_data_path` and `target_data_path` specify the paths to the source and target assay CSV files of the pair, respectively. The first line is the header `smiles,target`; each following line is comma-separated, with the SMILES string in the 1st column and the 0/1 label in the 2nd column (see the example CSV below).
- `dataset_type` specifies the type of task; always `classification` for this project.
- `extra_metrics` specifies the list of evaluation metrics.
- `hidden_size` specifies the dimension of the compound representation learned by the GNN-based feature generators.
- `depth` specifies the number of message-passing steps.
- `init_lr` specifies the initial learning rate.
- `batch_size` specifies the batch size.
- `ffn_hidden_size` and `ffn_num_layers` specify the number of hidden units and layers, respectively, in the fully connected network used as the classifier.
- `epochs` specifies the total number of training epochs.
- `split_type` specifies the type of data split.
- `crossval_index_file` specifies the path to the index file that contains the indices of the training, validation and test data points for each fold (see the index-file sketch below).
- `save_dir` specifies the directory where the model, evaluation scores and predictions will be saved.
- `class_balance` indicates whether to use class-balanced batches during training.
- `model` specifies which model to use.
- `aggregation` specifies which pooling mechanism to use to obtain the compound representation from the atom representations. The default is `mean`: the atom-level representations from the message passing network are averaged over all atoms of a compound to yield the compound representation.
- `attn_dim` specifies the dimension of the hidden layer in the 2-layer fully connected network used as the attention network (see the pooling sketch below).

Use `python code/train_aada.py -h` to check the meaning and default values of other parameters.
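As an illustration of the expected CSV format, a minimal source or target assay file could look like this (the SMILES strings are arbitrary examples):

```
smiles,target
CCO,0
c1ccccc1,1
CC(=O)Oc1ccccc1C(=O)O,0
```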
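The layout of `<index_file>` is not spelled out above. The snippet below is a minimal sketch of one way to create it, assuming this fork follows chemprop's `index_predetermined` convention of a pickled list containing one `[train, validation, test]` triple of row indices per fold; the file name, fold count, and split fractions here are arbitrary:

```python
import pickle
import numpy as np

n_compounds = 100                      # hypothetical number of data rows in the assay CSV
rng = np.random.default_rng(0)

folds = []
for _ in range(5):                     # e.g. 5 cross-validation folds
    idx = rng.permutation(n_compounds)
    # 80/10/10 split into train, validation and test indices
    train, val, test = np.split(idx, [int(0.8 * n_compounds), int(0.9 * n_compounds)])
    folds.append([train.tolist(), val.tolist(), test.tolist()])

with open("crossval_index_file.pkl", "wb") as f:
    pickle.dump(folds, f)              # pass this path as --crossval_index_file
```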
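For intuition about the `aggregation` and `attn_dim` options, here is a minimal PyTorch sketch of mean pooling versus self-attention pooling over atom representations. It is illustrative only, not the code used in this repository, and the class and variable names are made up:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """2-layer fully connected attention network; attn_dim is its hidden-layer size."""
    def __init__(self, hidden_size: int, attn_dim: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(hidden_size, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )

    def forward(self, atom_vecs: torch.Tensor) -> torch.Tensor:
        # atom_vecs: (num_atoms, hidden_size) for one compound
        weights = torch.softmax(self.attn(atom_vecs), dim=0)   # one weight per atom
        return (weights * atom_vecs).sum(dim=0)                # weighted sum -> compound vector

atom_vecs = torch.randn(17, 25)                  # e.g. 17 atoms, --hidden_size 25
mean_vec = atom_vecs.mean(dim=0)                 # --aggregation mean
attn_vec = AttentionPooling(25, 100)(atom_vecs)  # --aggregation self-attention --attn_dim 100
```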
### TAc-fc variants and ablations

#### 2. Running

- To run `TAc-fc`:

  ```bash
  python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --local_discriminator_hidden_size 100 --local_discriminator_num_layers 2 --global_discriminator_hidden_size 100 --global_discriminator_num_layers 2 --epochs 40 --alpha 1 --lamda 1 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
  ```

- To run `TAc-fc-dmpna`, add these arguments to the above command:

  ```bash
  --attn_dim 100 --aggregation self-attention --model aada_attention
  ```
#### Ablations

- To run `TAc-f`, add `--exclude_global` to the above command.
- To run `TAc-c`, add `--exclude_local` to the above command.
- Adding both `--exclude_local` and `--exclude_global` is equivalent to running `TAc`.
### 3. Running Baselines

**DANN**

```bash
python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --global_discriminator_hidden_size 100 --global_discriminator_num_layers 2 --epochs 40 --alpha 1 --lamda 1 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
```

- To run `DANN-dmpn`, add `--model dann` to the above command.
- To run `DANN-dmpna`, add `--model dann_attention --attn_dim 100 --aggregation self-attention` to the above command.
Run the following baselines from `chemprop` as follows:

**FCN-morgan**

```bash
python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --features_generator morgan --features_only --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance
```

**FCN-morganc**

```bash
python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --features_generator morgan_count --features_only --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance
```

**FCN-dmpn**

```bash
python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance
```

**FCN-dmpna**

Add the following to the above command:

```bash
--model mpnn_attention --attn_dim 100 --aggregation self-attention
```

For the above baselines, `data_path` specifies the path to the target assay CSV file.
**FCN-dmpn(DT)**

```bash
python chemprop/train.py --data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance
```

**FCN-dmpna(DT)**

Add the following to the above command:

```bash
--model mpnn_attention --attn_dim 100 --aggregation self-attention
```

For `FCN-dmpn(DT)` and `FCN-dmpna(DT)`, `data_path` and `target_data_path` specify the paths to the source and target assay CSV files, respectively.

Use `python chemprop/train.py -h` to check the meaning of other parameters.
## Testing

- To predict the labels of the compounds in the test set for the `TAc*` and `DANN` methods:

  ```bash
  python code/predict.py --test_path <test_csv_file> --checkpoint_dir <chkpt_dir> --preds_path <pred_file>
  ```

  - `test_path` specifies the path to a CSV file containing a list of SMILES and ground-truth labels, in the same format as the training CSV files: the first line is the header `smiles,target`, and each following line is comma-separated with the SMILES string in the 1st column and the 0/1 label in the 2nd column.
  - `checkpoint_dir` specifies the path to the checkpoint directory where the model checkpoint (`.pt`) files are saved (i.e., `save_dir` during training).
  - `preds_path` specifies the path to a CSV file where the predictions will be saved.

- To predict the labels of the compounds in the test set for the other methods:

  ```bash
  python chemprop/predict.py --test_path <test_csv_file> --checkpoint_dir <chkpt_dir> --preds_path <pred_file>
  ```
## Compound Prioritization using dmpna

Please refer to the `README.md` in the `comprank` directory.