Python-based Informatics Kit for Analysing Chemical Units

Overview

Python-based Informatics Kit for the Analysis of Chemical Units

INSTALLATION

Step 1: Make a conda environment:

conda create -n pikachu python=3.9
conda activate pikachu

Step 2: Install pip:

conda install pip

Step 3: Install PIKAChU:

pip install pikachu-chem
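
To confirm that the installation succeeded before moving on, one option is to check that the package can be located from the active environment. The sketch below uses only the standard library. The helper name is_importable is illustrative, and it assumes that the package installed by pikachu-chem is imported as pikachu, as in the import statements elsewhere in this README.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True only when run inside the environment where pikachu-chem was installed.
    print("pikachu importable:", is_importable("pikachu"))
```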

GETTING STARTED

Step 1: Open Python or create an empty .py file.

Step 2: Import required modules to visualise your first structure:

from pikachu.general import draw_smiles

Step 3: Load your SMILES string of interest and draw it!

smiles = draw_smiles("CCCCCCCCCC(=O)N[C@@H](CC1=CNC2=CC=CC=C21)C(=O)N[C@@H](CC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@H]3[C@H](OC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](NC(=O)CNC(=O)[C@@H](NC(=O)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)CNC3=O)CCCN)CC(=O)O)C)CC(=O)O)CO)[C@H](C)CC(=O)O)CC(=O)C4=CC=CC=C4N)C")

Step 4: Play around with the other functions in pikachu.general. For guidance, refer to the documentation in the wiki and to the function descriptions.

Comments
  • Allow depiction of Markush structures

    I tried to read structures with R groups from SMILES in order to depict them with PIKAChU, and failed. Hence, I made some changes to enable it:

    atomproperties.py:

    • Added a constructor that adds "R", "X" and "Z", as well as combinations of these letters with indices (0-99), to the atom property dictionaries
    • Discussion point: It is not obvious how these "atoms" should be handled, as the R group variables can represent many different things. Right now, they are treated like carbon atoms, which can lead to "Rogue electron" warnings when constructing a molecule from a SMILES string. My first thought was to treat them like the existing placeholder variable "*", but then the R groups could only be terminal: something like "CC[X]CC" is not possible, as "*" is treated very similarly to a hydrogen atom. I don't have an idea for an elegant solution yet. When the R group variables are treated like carbon atoms, there are warnings, but everything is depicted correctly.
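
The registration step described for atomproperties.py could look roughly like the following. This is a sketch only: ELEMENT_TO_VALENCES is a hypothetical stand-in for the real property dictionaries, which are not reproduced here, and the function names are illustrative. It shows the idea of generating the labels "R", "X", "Z" plus indexed variants and giving each one a copy of carbon's entry.

```python
# Hypothetical stand-in for an atom property dictionary; not PIKAChU's actual table.
ELEMENT_TO_VALENCES = {"C": [4], "H": [1]}


def placeholder_labels(letters="RXZ", max_index=99):
    """Return each letter on its own plus the letter followed by 0..max_index."""
    labels = []
    for letter in letters:
        labels.append(letter)
        labels.extend(f"{letter}{index}" for index in range(max_index + 1))
    return labels


def register_placeholders(table, template="C"):
    """Give every placeholder label its own copy of the template element's entry."""
    for label in placeholder_labels():
        table.setdefault(label, list(table[template]))
    return table


if __name__ == "__main__":
    table = register_placeholders(dict(ELEMENT_TO_VALENCES))
    print(len(placeholder_labels()), table["R12"])
```

Copying carbon's entry mirrors the "treated like carbon atoms" behaviour discussed above, including its drawback: the placeholders inherit carbon's valence rules, which is where the warnings come from.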

    drawing.py:

    • Added a function that recognises digits that follow "R", "X" and "Z" and returns the same string with subscript digits (so that the depiction contains "R₁" instead of "R1").
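
A minimal sketch of such a subscript conversion is shown here. It is not the function from the pull request; the name subscript_placeholder_digits and the regular expression are assumptions made for illustration.

```python
import re

# Translation table from ASCII digits to Unicode subscript digits.
_SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def subscript_placeholder_digits(label: str) -> str:
    """Render digits that directly follow R, X or Z as subscripts."""
    return re.sub(
        r"(?<=[RXZ])\d+",
        lambda match: match.group(0).translate(_SUBSCRIPT_DIGITS),
        label,
    )


if __name__ == "__main__":
    print(subscript_placeholder_digits("R12"))
```

Labels without a digit suffix, such as "X" or "C", pass through unchanged.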

    test_drawing.py:

    • Unit test for added function in drawing.py

    smiles.py:

    • Treat digits that follow "R", "X" or "Z" as a part of the "element"
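
To illustrate what treating the digits as part of the "element" means, the sketch below splits a bracket atom such as "[R12]" into its letter and index. The function name and the regular expression are hypothetical and only cover the placeholder case; the actual parser changes in smiles.py are not shown here.

```python
import re

# R, X or Z, optionally followed by a one- or two-digit index (0-99), in brackets.
_PLACEHOLDER_ATOM = re.compile(r"\[([RXZ])(\d{1,2})?\]")


def split_placeholder(bracket_atom: str):
    """Return (letter, index) for a placeholder bracket atom, or None otherwise.

    The index is None when the placeholder carries no digits, as in "[X]".
    """
    match = _PLACEHOLDER_ATOM.fullmatch(bracket_atom)
    if match is None:
        return None
    letter, digits = match.groups()
    return letter, (int(digits) if digits is not None else None)


if __name__ == "__main__":
    for token in ("[R12]", "[X]", "[C@@H]"):
        print(token, split_placeholder(token))
```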

    Result:

    from pikachu.smiles.smiles import read_smiles
    from pikachu.general import draw_structure
    
    structure = read_smiles("[R12]C1C([X2])[X]([R])[Z]([Z1])=C([Z2])C1[X1]")
    draw_structure(structure)
    

    [Image: depiction of the structure above]

    opened by OBrink 6
  • Save structure to file

    Hello!

    I ran into a problem while trying to save a structure with pikachu.general.smiles_to_molfile(). I would be grateful if you could help.

    To check whether the issue lies with the molecule, I also tried drawing it with the NCBI online tool (https://pubchem.ncbi.nlm.nih.gov//edit3/index.html). The SMILES string, structure and error message follow.

    CC1=C(/C=C/C(C)=C/C=C/C(C)=C/C=O)C(C)(C)CCC1.NC(=O)c1ccc[n+]([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)c1.O

    [Images: error message; structure]

    Thank you!

    opened by Lokious 2
  • Add option to instantiate MolFileReader with mol str

    Added a convenience function:

    • Added the option to instantiate MolFileReader with the string content of a molfile instead of the path to the molfile
    • Everything else should behave the same way: the constructor of MolFileReader now also accepts and processes the argument molfile_str if the usual "molfile" argument is not given
    • Added unit tests for the things I have touched to make sure it behaves the way it is supposed to behave
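
The constructor pattern described above can be sketched as follows. MolFileSource is a hypothetical stand-in, not the real MolFileReader; only the argument names molfile and molfile_str come from the description. The path argument takes precedence, and the string is used only when no path is given.

```python
from pathlib import Path
from typing import Optional


class MolFileSource:
    """Hypothetical stand-in showing a path-or-string constructor."""

    def __init__(self, molfile: Optional[str] = None, molfile_str: Optional[str] = None):
        if molfile is not None:
            # Usual case: read the contents from a file path.
            self.contents = Path(molfile).read_text()
        elif molfile_str is not None:
            # Convenience case: the caller already holds the file contents.
            self.contents = molfile_str
        else:
            raise ValueError("Provide either molfile (a path) or molfile_str (file contents).")
        self.lines = self.contents.splitlines()


if __name__ == "__main__":
    source = MolFileSource(molfile_str="line one\nline two")
    print(source.lines)
```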
    opened by OBrink 1
  • Default Bond thickness

    Hey,

    Really nice Python package; I'm excited to see further developments. I tried the drawing functionality and found the default bond thickness to be higher than in most other toolkits. Setting a lower default thickness would probably benefit most users. SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C

    [Image: PIKAChU depiction]

    [Image: CDK Depict depiction]

    Do let me know once you have updated the documentation.

    Kind regards, Kohulan

    opened by Kohulan 1
  • Depiction of large structures makes Python crash on Windows

    Hey there!

    We have run into a problem with the depiction functionalities in PIKAChU. When depicting large molecules, Python simply crashes. We don't get an exception that can be caught or something like that. If I execute the code below in my Python shell, the Python shell simply closes.

    Minimal example for the reproduction of the problem:

    from pikachu.general import read_smiles
    from pikachu.drawing import drawing
    
    smiles = 'CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)O[C@@](O[C@H](CO)[C@@H](O)[C@@H]2O[C@@](O[C@H](CO)[C@@H](O)[C@@H]3O[C@@](O[C@H](CO)[C@@H](O)[C@@H]4O[C@@](O[C@H](CO)[C@@H](O)[C@@H]5O[C@@](O[C@H](CO)[C@@H](O)[C@@H]6O[C@@](O[C@H](CO)[C@@H](O)[C@@H]7O[C@@](O[C@H](CO)[C@@H](O)[C@@H]8O[C@@](O[C@H](CO)[C@@H](O)[C@@H]9O[C@@](O[C@H](CO)[C@@H](O)[C@@H]%10O[C@@](O[C@H](CO)[C@@H](O)[C@@H]%11O[C@@](O[C@H](CO)[C@@H](O)[C@@H]%12O[C@@](O[C@H](CO)[C@@H](O)[C@@H]%13O[C@@](O[C@H](CO)[C@@H](O)[C@@H]%14O[C@@](O)(C(=O)O)C[C@H](O)[C@H]%14NC(C)=O)(C(=O)O)C[C@H](O)[C@H]%13NC(C)=O)(C(=O)O)C[C@H](O)[C@H]%12NC(C)=O)(C(=O)O)C[C@H](O)[C@H]%11NC(C)=O)(C(=O)O)C[C@H](O)[C@H]%10NC(C)=O)(C(=O)O)C[C@H](O)[C@H]9NC(C)=O)(C(=O)O)C[C@H](O)[C@H]8NC(C)=O)(C(=O)O)C[C@H](O)[C@H]7NC(C)=O)(C(=O)O)C[C@H](O)[C@H]6NC(C)=O)(C(=O)O)C[C@H](O)[C@H]5NC(C)=O)(C(=O)O)C[C@H](O)[C@H]4NC(C)=O)(C(=O)O)C[C@H](O)[C@H]3NC(C)=O)(C(=O)O)C[C@H](O)[C@H]2NC(C)=O)(C(=O)O)C[C@@H]1O'
    
    mol = read_smiles(smiles)
    drawer = drawing.Drawer(mol)
    

    Expected behaviour:

    • Instantiation of a Drawer object.

    Observed behaviour:

    • Python crashes.

    We will build a workaround and simply not use PIKAChU for large structures in our application, but I think it may make sense to look into this.
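
One possible shape for such a workaround, sketched under assumptions, is to run the risky step in a child interpreter so that a hard crash cannot take down the main process. The helper name runs_without_crashing is illustrative, and the sketch assumes that the crash shows up as a non-zero exit code of the child process, which has not been verified for the case reported here.

```python
import subprocess
import sys


def runs_without_crashing(code: str, timeout: float = 60.0) -> bool:
    """Run a Python snippet in a child interpreter and report whether it exited cleanly."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # A harmless snippet; replace it with the depiction code to be probed.
    print(runs_without_crashing("print('ok')"))
```

The snippet passed in would contain the imports and the Drawer instantiation from the minimal example; the application can then skip depiction when the probe returns False.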

    Have a nice day! Otto

    opened by OBrink 4
  • SMILES parser fails

    Hey there!

    When reading SMILES strings using PIKAChU, @Mar-Gol ran into the following problem:

    Minimal example for reproduction:

    
    >>> smiles = 'CCOC1=CC2=C(C=C1)S(=O)C(=C2C3=CC=CC=C3)C(=O)C4=CC=CC=C4'
    >>> from pikachu.general import read_smiles
    >>> mol = read_smiles(smiles)
    

    Expected behavior:

    • The structure is converted into PIKAChU's molecular representation.

    Observed behavior:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\general.py", line 45, in read_smiles
        structure = smiles.smiles_to_structure()
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\smiles\smiles.py", line 496, in smiles_to_structure
        structure.refine_structure()
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\chem\structure.py", line 916, in refine_structure
        self.aromatic_cycles = self.find_aromatic_cycles()
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\chem\structure.py", line 781, in find_aromatic_cycles
        self.promote_lone_pairs_in_aromatic_cycles(aromatic_cycles)
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\chem\structure.py", line 731, in promote_lone_pairs_in_aromatic_cycles
        atom.promote_lone_pair_to_p_orbital()
      File "C:\Users\Otto Brinkhaus\anaconda3\envs\RanDepict\lib\site-packages\pikachu\chem\atom.py", line 512, in promote_lone_pair_to_p_orbital
        p_orbital = p_orbitals[-1]
    IndexError: list index out of range
    
    • We used the latest version on PyPI, which we installed via pip install pikachu-chem.

    I will build a work-around for our application for now, but I would appreciate your help a lot! Thank you in advance! Otto
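
A work-around of this kind might simply catch the IndexError shown in the traceback and set the affected SMILES strings aside. The sketch below is an assumption about what that could look like, not the code used in the reporting application. The parser is passed in as an argument, so read_smiles could be supplied by the caller.

```python
def parse_all(parser, smiles_list):
    """Apply parser to each SMILES string; collect the ones that raise IndexError."""
    parsed, failed = [], []
    for smiles in smiles_list:
        try:
            parsed.append(parser(smiles))
        except IndexError:
            # Same exception type as in the traceback above; other errors propagate.
            failed.append(smiles)
    return parsed, failed


if __name__ == "__main__":
    # Stand-in parser for demonstration only; it is not a real SMILES parser.
    def toy_parser(smiles):
        if "S(=O)" in smiles:
            raise IndexError("list index out of range")
        return smiles

    print(parse_all(toy_parser, ["CCO", "CS(=O)C"]))
```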

    opened by OBrink 2
  • Is it possible to output ACS 1996 standard structure image with pikachu?

    Hi developers, I am very happy that you have developed a new molecular image generation tool, which has been very useful to me. I hope it will support more drawing standards, such as the ACS-1996 Document Settings, so that it can be used instead of the commercial software ChemDraw.

    If there is a solution, please let me know, thank you.

    opened by stud2008 0
  • Depiction options without function

    After playing around with the attributes of drawing.Options(), I have noticed that some of them don't do anything:

    • ‘font_size_large’
    • ‘font_size_small’
    • ‘height’
    • ‘width’
    • All ‘kk_*’ attributes (although I am not sure what they are responsible for)
    • ‘background_color’
    • I am not sure what ‘draw_hydrogens’ does, but it does not draw hydrogen atoms
    • SMILES: [R12]C1RXZ=C([Z2])C1NC@(F)C(=O)O
      • ‘draw_hydrogens’ = False: image
      • ‘draw_hydrogens’ = True: image
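
A check like the one described above can be automated by rendering once with default options and once per changed attribute, then comparing the outputs. The sketch below is generic: find_inert_options, render and trial_values are hypothetical names, and render stands for any function that turns an options object into comparable output such as an SVG string.

```python
import copy


def find_inert_options(render, base_options, trial_values):
    """Return names of options whose trial value leaves the rendered output unchanged."""
    baseline = render(base_options)
    inert = []
    for name, value in trial_values.items():
        options = copy.copy(base_options)
        setattr(options, name, value)
        if render(options) == baseline:
            inert.append(name)
    return inert


if __name__ == "__main__":
    class ToyOptions:
        width = 100
        height = 100

    # Toy renderer that ignores height, so height is reported as inert.
    print(find_inert_options(lambda o: f"w={o.width}", ToyOptions(), {"width": 50, "height": 50}))
```

Each trial value has to differ from the default, otherwise an option that does work would be reported as inert.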
    opened by OBrink 0