Benchmarking Pipeline for Prediction of Protein-Protein Interactions

Loïc Lannelongue

Last update: Jun 27, 2022

Overview

B4PPI

Benchmarking Pipeline for the Prediction of Protein-Protein Interactions

How this benchmarking pipeline was built, and how to use it, is detailed in our preprint (please cite it if you find this work useful!).

A minimal example and the list of requirements are provided in the repository.

How to use the gold standard

All the data files are in the data directory. Most of them are available both as CSV (sep='|') and as pickled pandas DataFrames; the CSV version is sometimes missing because of file size constraints on GitHub.

The gold standard, without pre-processed features, can be loaded using:

import os
import pandas as pd

goldStandard = pd.read_csv(
    os.path.join('data', 'benchmarkingGS_v1-0.csv'),
    sep='|'
)

Or with the pre-processed features:

goldStandard_with_featuresSeq = pd.read_pickle(
    os.path.join('data', 'benchmarkingGS_v1-0_similarityMeasure_sequence_v3-1.pkl')
)

UniProt IDs are used for both proteins A and B.
isInteraction is the ground truth from the IntAct database (1 = interacting proteins, 0 = non-interacting proteins).
trainTest is the split between training set (train), first testing set T1 (test1) and second testing set T2 (test2).
Pre-processed features are explained in the manuscript.
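
As a minimal sketch of how these columns might be used, the snippet below splits a table on trainTest and reports the share of interacting pairs in each subset. The toy DataFrame, its placeholder IDs and the protein ID column names (uniprotID_A, uniprotID_B) are assumptions made for illustration; only isInteraction and trainTest come from the description above.

```python
import pandas as pd

# Toy stand-in for the gold standard. The ID column names and values are
# hypothetical; isInteraction and trainTest follow the description above.
toy = pd.DataFrame({
    'uniprotID_A': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
    'uniprotID_B': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'isInteraction': [1, 0, 0, 1, 0, 1],
    'trainTest': ['train', 'train', 'train', 'test1', 'test1', 'test2'],
})


def split_gold_standard(df):
    """Return one DataFrame per value of trainTest (train, test1, test2)."""
    return {
        name: df[df['trainTest'] == name].reset_index(drop=True)
        for name in ('train', 'test1', 'test2')
    }


splits = split_gold_standard(toy)
for name, subset in splits.items():
    # The mean of a 0/1 column is the fraction of interacting pairs.
    print(name, len(subset), subset['isInteraction'].mean())
```

The same function should work on the real table once it is loaded, provided the trainTest values match those listed above.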

Training and evaluation can then be carried out as usual. The code used for the preprint is in the Training section.
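
The following is a rough sketch of what such a workflow could look like with scikit-learn; it is not the code from the preprint. A logistic regression is fitted on the train rows and scored separately on test1 and test2. The synthetic data and the feature column names (feat_0, feat_1) are invented for the example; the real pre-processed features are described in the manuscript.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in: two made-up numeric features and a label loosely tied
# to the first one. None of this is real protein data.
features = rng.normal(size=(n, 2))
labels = (features[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

data = pd.DataFrame(features, columns=['feat_0', 'feat_1'])
data['isInteraction'] = labels
data['trainTest'] = np.repeat(['train', 'test1', 'test2'], [200, 50, 50])

feature_cols = ['feat_0', 'feat_1']  # hypothetical feature names

# Fit on the training rows only.
train = data[data['trainTest'] == 'train']
model = LogisticRegression()
model.fit(train[feature_cols], train['isInteraction'])

# Score each testing set separately, using the predicted probability of class 1.
scores = {}
for name in ('test1', 'test2'):
    subset = data[data['trainTest'] == name]
    proba = model.predict_proba(subset[feature_cols])[:, 1]
    scores[name] = {
        'auroc': roc_auc_score(subset['isInteraction'], proba),
        'average_precision': average_precision_score(subset['isInteraction'], proba),
    }
    print(name, scores[name])
```

Keeping test1 and test2 separate, rather than pooling them, preserves the distinction between the two testing sets made by the trainTest column.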

How to cite this work

Lannelongue L., Inouye M., Construction of in silico protein-protein interaction networks across different topologies using machine learning, 2022, bioRxiv

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

Credits

The code was written in Python 3.7.
Many libraries were used, in particular pandas, NumPy, scikit-learn and PyTorch Lightning (full list in the code and in the requirements file).
Plots were drawn using Matplotlib, Seaborn and the MetBrewer colour palettes.
Logs were saved using Weights & Biases.
