pae_to_domains

Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Overview

Given a predicted aligned error (PAE) matrix for an AlphaFold2 model (e.g. as downloaded from https://alphafold.ebi.ac.uk/), this script returns a series of lists of residue indices, where each list corresponds to a set of residues that cluster together into a pseudo-rigid domain.
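As a rough illustration of the approach, the sketch below builds a weighted residue graph from a PAE matrix and clusters it with NetworkX's greedy_modularity_communities. It is a minimal sketch and not the script's actual implementation: the function name cluster_pae, the symmetrisation by taking the smaller of the two directional errors, the 0.1 floor on the error and the toy matrix are all assumptions made for illustration. The 1/pae**pae_power weighting and the pae_cutoff filter follow the command-line help.

```python
import networkx as nx
import numpy as np
from networkx.algorithms import community


def cluster_pae(pae, pae_cutoff, pae_power=1.0, resolution=1.0):
    """Return clusters of 0-based residue indices from a square PAE matrix."""
    pae = np.asarray(pae, dtype=float)
    n = pae.shape[0]
    graph = nx.Graph()
    graph.add_nodes_from(range(n))  # keep residues that end up with no edges
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: use the smaller of the two directional errors.
            err = min(pae[i, j], pae[j, i])
            if err < pae_cutoff:
                # Clamp the error so the weight stays finite (floor is arbitrary).
                weight = 1.0 / max(float(err), 0.1) ** pae_power
                graph.add_edge(i, j, weight=weight)
    found = community.greedy_modularity_communities(
        graph, weight="weight", resolution=resolution
    )
    return [sorted(c) for c in found]


# Toy 6-residue matrix: low error within two blocks, high error between them.
toy = np.full((6, 6), 30.0)
toy[:3, :3] = 1.0
toy[3:, 3:] = 1.0
np.fill_diagonal(toy, 0.0)
clusters = cluster_pae(toy, pae_cutoff=5.0)
print(clusters)
```

On a real model the matrix would come from the PAE JSON file rather than being built by hand.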

Requirements

  • Python >= 3.7
  • NetworkX >= 2.6.2

Known Issues

Due to an internal implementation issue in NetworkX (Issue #4992), some combinations of PAE matrix and resolution can lead to a KeyError. Solutions are being explored, and this will hopefully be fixed in the next NetworkX release.

Usage

While primarily intended as a code snippet to be incorporated into larger projects, this can also be called from the command line. At its simplest:

python pae_to_domains.py pae_file.json

... will yield a .csv file with each line providing the indices for one residue cluster. Full help for the command-line version:

positional arguments:
  pae_file              Name of the PAE JSON file.

optional arguments:
  -h, --help            show this help message and exit
  --output_file OUTPUT_FILE
                        Name of output file (comma-delimited text format).
                        Default: clusters.csv
  --pae_power PAE_POWER
                        Graph edges will be weighted as 1/pae**pae_power.
                        Default: 1.0
  --pae_cutoff PAE_CUTOFF
                        Graph edges will only be created for residue pairs
                        with pae
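The comma-delimited output can be read back into Python with the standard csv module. The sketch below is an assumption-laden example rather than part of the script: the helper names parse_cluster_rows and read_clusters are hypothetical, and it assumes that short rows may be padded with empty cells, which are skipped. Whether the indices are 0-based or 1-based is not stated here, so the values are returned unchanged.

```python
import csv


def parse_cluster_rows(lines):
    """Turn CSV lines into lists of residue indices, one list per cluster."""
    clusters = []
    for row in csv.reader(lines):
        # Skip empty cells, which may pad shorter clusters (assumption).
        indices = [int(cell) for cell in row if cell.strip()]
        if indices:
            clusters.append(indices)
    return clusters


def read_clusters(csv_path="clusters.csv"):
    """Read a cluster file written by the command-line tool."""
    with open(csv_path, newline="") as handle:
        return parse_cluster_rows(handle)


# In-memory example: two clusters, the second padded with an empty cell.
example = parse_cluster_rows(["1,2,3", "4,5,"])
print(example)
```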

Example

Using https://alphafold.ebi.ac.uk/entry/Q9HBA0 as an example case...

resolution=0.5: [Figure: cartoon coloured by domain assignment]

resolution=1.0: [Figure: cartoon coloured by domain assignment]

resolution=2.0: [Figure: cartoon coloured by domain assignment]

Comments
  • Add support for the new PAE JSON format

    • Adds support for the new PAE JSON format while also keeping backwards compatibility with the legacy format.
    • The new format makes the PAE JSON about 4x smaller when stored compressed, even smaller when stored uncompressed (we don't store redundant residue indices and the predicted aligned error is rounded to integers).
    • The new format parses about 3x faster.
    • I kept the dtype of the matrix as np.float64 so as not to break any existing code, but since PAE values are now integers, np.int32, np.float32 or even np.float16 could be used if RAM usage is too high.
    opened by Augustin-Zidek 1
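A parser that accepts both layouts might look like the sketch below. The key names are assumptions about the two formats and are not taken from this page: the legacy layout is assumed to hold parallel residue1, residue2 and distance lists with 1-based indices, and the new layout is assumed to hold the matrix directly under predicted_aligned_error, in both cases wrapped in a one-element list. The function name pae_matrix_from_json is hypothetical.

```python
import json


def pae_matrix_from_json(text):
    """Return the PAE matrix as a list of rows of floats from either layout."""
    data = json.loads(text)
    # Assumption: the payload is wrapped in a one-element list.
    if isinstance(data, list):
        data = data[0]
    if "predicted_aligned_error" in data:
        # New layout (assumed key): the square matrix is stored directly.
        return [[float(v) for v in row] for row in data["predicted_aligned_error"]]
    if "residue1" in data and "residue2" in data and "distance" in data:
        # Legacy layout (assumed keys): one entry per residue pair.
        size = max(data["residue1"])
        matrix = [[0.0] * size for _ in range(size)]
        for i, j, err in zip(data["residue1"], data["residue2"], data["distance"]):
            matrix[i - 1][j - 1] = float(err)  # indices assumed to be 1-based
        return matrix
    raise ValueError("unrecognised PAE JSON layout")


new_style = json.dumps([{"predicted_aligned_error": [[0, 4], [5, 0]]}])
legacy = json.dumps([{
    "residue1": [1, 1, 2, 2],
    "residue2": [1, 2, 1, 2],
    "distance": [0.0, 4.0, 5.0, 0.0],
}])
print(pae_matrix_from_json(new_style) == pae_matrix_from_json(legacy))
```

Returning floats mirrors the choice described above of keeping a floating-point matrix even though the new values are integers.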
  • Errors with PAE matrix both with igraph and networkx

    Hi,

    Thanks for these scripts, they seem very useful. However, I have a problem running them. I set up a Colab notebook by copy-pasting and editing the code so that it can be run without command-line arguments. I included the same example that you used in the README.

    https://colab.research.google.com/drive/1xWsXMMolZqAL5bIAvl2pAqNuOV_Q0V0l?usp=sharing

    If I use networkx, I get the following error:

    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-8-4af6c93a04f6> in <module>
    ----> 1 clusters = f(pae, pae_power=pae_power, pae_cutoff=pae_cutoff, graph_resolution=resolution)
          2 max_len = max([len(c) for c in clusters])
          3 clusters = [list(c) + ['']*(max_len-len(c)) for c in clusters]
          4 output_file = output_file
    
    1 frames
    
    /usr/local/lib/python3.7/dist-packages/networkx/algorithms/community/modularity_max.py in greedy_modularity_communities(G, weight, resolution, n_communities)
        111     for u, nbrdict in dq_dict.items():
        112         for v, wt in nbrdict.items():
    --> 113             dq_dict[u][v] = q0 * wt - resolution * (a[u] * b[v] + b[u] * a[v])
        114 
        115     # Use -dq to get a max_heap instead of a min_heap
    
    TypeError: can't multiply sequence by non-int of type 'numpy.float64'
    

    When I try igraph, I get this:

    /usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:80: RuntimeWarning: divide by zero encountered in true_divide
    
    ---------------------------------------------------------------------------
    
    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-11-4af6c93a04f6> in <module>
    ----> 1 clusters = f(pae, pae_power=pae_power, pae_cutoff=pae_cutoff, graph_resolution=resolution)
          2 max_len = max([len(c) for c in clusters])
          3 clusters = [list(c) + ['']*(max_len-len(c)) for c in clusters]
          4 output_file = output_file
    
    <ipython-input-2-c5063efac596> in domains_from_pae_matrix_igraph(pae_matrix, pae_power, pae_cutoff, graph_resolution)
         88     g.es['weight']=sel_weights
         89 
    ---> 90     vc = g.community_leiden(weights='weight', resolution_parameter=graph_resolution/100, n_iterations=-1)
         91     membership = numpy.array(vc.membership)
         92     from collections import defaultdict
    
    TypeError: unsupported operand type(s) for /: 'tuple' and 'int'
    

    If I use igraph and set the diagonal to 1, I still get the same error, except that this warning no longer appears: /usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:80: RuntimeWarning: divide by zero encountered in true_divide

    Do you have any idea what might cause the problem? Thank you for your help!

    opened by gezmi 3
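Going by the tracebacks above, the failing expression graph_resolution/100 reports a 'tuple' operand, so the resolution was a tuple by the time it reached the clustering call. One possible cause, which is only a guess, is a trailing comma in a notebook assignment such as `resolution = 1.0,`. The divide-by-zero warning is consistent with computing 1/pae**pae_power over entries equal to zero, such as the diagonal. The sketch below shows two defensive helpers. Both names are hypothetical, and skipping zero-valued off-diagonal entries is a choice made for this example, not the script's behaviour.

```python
import numbers

import numpy as np


def as_scalar_resolution(value):
    """Return value as a float, rejecting tuples, lists and other non-numbers."""
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError(
            f"resolution must be a single number, got {type(value).__name__}"
        )
    return float(value)


def edge_weights(pae, pae_cutoff, pae_power=1.0):
    """Return (i, j, weight) for off-diagonal pairs with 0 < pae < pae_cutoff."""
    pae = np.asarray(pae, dtype=float)
    # Select the pairs first so that no zero entry is ever divided by.
    mask = (pae < pae_cutoff) & (pae > 0)
    np.fill_diagonal(mask, False)
    rows, cols = np.nonzero(mask)
    weights = 1.0 / pae[rows, cols] ** pae_power
    return list(zip(rows.tolist(), cols.tolist(), weights.tolist()))


trial = np.array([[0.0, 2.0], [4.0, 0.0]])
print(edge_weights(trial, pae_cutoff=5.0))
print(as_scalar_resolution(1))
```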
  • Setting seed - reproducible number of cluster

    Hi,

    I was playing around with the different threshold parameters and realised that the same run can yield a different number of clusters.

    python pae_to_domains.py *_predicted_aligned_error_v1.json --pae_cutoff 10 --pae_power 1 --resolution 1
    Wrote 13 clusters to clusters.csv. Biggest cluster contains 54 residues. Run time was 0.86 seconds.
    
    python pae_to_domains.py *_predicted_aligned_error_v1.json --pae_cutoff 10 --pae_power 1 --resolution 1
    Wrote 14 clusters to clusters.csv. Biggest cluster contains 54 residues. Run time was 0.73 seconds.
    

    Is this something you also observed on the same input pae json file? I suspect this comes from the igraph community_leiden step, but I might be wrong.

    opened by Ni-Ar 4
Owner
Tristan Croll
One-time chemical engineer, now full-time structural biology methods developer.