Twin-deep neural network for semi-supervised learning of materials properties

Related tags

Deep Learning tsdnn
Overview

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction

Citation:

Semi-supervised teacher-student deep neural network for materials discovery” by Daniel Gleaves, Edirisuriya M. Dilanga Siriwardane,Yong Zhao, and JianjunHu.

Machine learning and evolution laboratory

Department of Computer Science and Engineering

University of South Carolina


This software package implements the Meta Pseudo Labels (MPL) semi-supervised learning method with Crystal Graph Convolutional Neural Networks (CGCNN) with that takes an arbitary crystal structure to predict material synthesizability and whether it has positive or negative formation energy

The package provides two major functions:

  • Train a semi-supervised TSDNN classification model with a customized dataset.
  • Predict material synthesizability and formation energy of new crystals with a pre-trained TSDNN model.

The following paper describes the details of the CGCNN architecture, a graph neural network model for materials property prediction: CGCNN paper

The following paper describes the details of the semi-supervised learning framework that we used in our model: Meta Pseudo Labels

Table of Contents

Prerequisites

This package requires:

If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

conda upgrade conda
conda create -n tsdnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge

*Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.

This creates a conda environment for running TSDNN. Before using TSDNN, activate the environment by:

conda activate tsdnn

Usage

Define a customized dataset

To input crystal structures to TSDNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target label for each crystal (not needed for predicting, but you need to put some random numbers in data_test.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. data_labeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column recodes the known value of the target label.

  2. data_unlabeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed).

  3. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  4. ID.cif: a CIF file that recodes the crystal structure, where ID is the unique ID for the crystal.

(4.) data_predict: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed). This is the file that will be used if you want to classify materials with predict.py.

The structure of the root_dir should be:

root_dir
├── data_labeled.csv
├── data_unlabeled.csv
├── data_test.csv
├── data_positive.csv (optional- for positive and unlabeled dataset generation)
├── data_unlabeled_full.csv (optional- for positive and unlabeled dataset generation, data_unlabeled.csv will be overwritten)
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There is an example of customized a dataset in: data/example.

Train a TSDNN model

Before training a new TSDNN model, you will need to:

Then, in directory synth-tsdnn, you can train a TSDNN model for your customized dataset by:

python main.py root_dir

If you want to use the PU learning dataset generation, you can train a model using the --uds flag with the number of PU iterations to perform.

python main.py --uds 5 root_dir

You can set the number of training, validation, and test data with labels --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, --test-ratio instead. Note that the ratio flags cannot be used with the size flags simultaneously. For instance, data/example has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/example

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/example

After training, you will get 5 files in synth-tsdnn directory.

  • checkpoints/teacher_best.pth.tar: stores the TSDNN teacher model with the best validation accuracy.
  • checkpoints/student_best.pth.tar" stores the TSDNN student model with the best validation accuracy.
  • checkpoints/t_checkpoint.pth.tar: stores the TSDNN teacher model at the last epoch.
  • checkpoints/s_checkpoint.pth.tar: stores the TSDNN student model at the last epoch.
  • results/validation/test_results.csv: stores the ID and predicted value for each crystal in training set.

Predict material properties with a pre-trained TSDNN model

Before predicting the material properties, you will need to:

  • Define a customized dataset at root_dir for all the crystal structures that you want to predict.
  • Obtain a pre-trained TSDNN model (example found in checkpoints/pre-trained/pre-train.pth.tar).

Then, in directory synth-tsdnn, you can predict the properties of the crystals in root_dir:

python predict.py checkpoints/pre-trained/pre-trained.pth.tar data/root_dir

After predicting, you will get one file in synth-tsdnn directory:

  • predictions.csv: stores the ID and predicted value for each crystal in test set.

Data

To reproduce our paper, you can download the corresponding datasets following the instruction. Each dataset discussed can be found in data/datasets/

Authors

This software was primarily written by Daniel Gleaves who was advised by Prof. Jianjun Hu. This software builds upon work by Tian Xie, Hieu Pham, and Jungdae Kim.

Acknowledgements

Research reported in this work was supported in part by NSF under grants 1940099 and 1905775. The views, perspective,and content do not necessarily represent the official views of NSF. This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program.

License

TSDNN is released under the MIT License.

You might also like...
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Deep learning (neural network) based remote photoplethysmography: how to extract pulse signal from video using deep learning tools

Deep-rPPG: Camera-based pulse estimation using deep learning tools Deep learning (neural network) based remote photoplethysmography: how to extract pu

Official repository for "Intriguing Properties of Vision Transformers" (2021)

Intriguing Properties of Vision Transformers Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, & Ming-Hsuan Yang P

Explainer for black box models that predict molecule properties

Explaining why that molecule exmol is a package to explain black-box predictions of molecules. The package uses model agnostic explanations to help us

An end-to-end regression problem of predicting the price of properties in Bangalore.
An end-to-end regression problem of predicting the price of properties in Bangalore.

Bangalore-House-Price-Prediction An end-to-end regression problem of predicting the price of properties in Bangalore. Deployed in Heroku using Flask.

TUPÃ was developed to analyze electric field properties in molecular simulations
TUPÃ was developed to analyze electric field properties in molecular simulations

TUPÃ: Electric field analyses for molecular simulations What is TUPÃ? TUPÃ (pronounced as tu-pan) is a python algorithm that employs MDAnalysis engine

PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

Comments
  • Fix slow data loading past first epoch

    Fix slow data loading past first epoch

    Previously, the program would need to load the data into cache new with each iteration. This allows it to only have to load the data into cache with the first iteration, saving time when the positive and unknown learning framework is used.

    opened by dagleaves 0
Owner
MLEG
MLEG
Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach

Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach This is the implementation of traffic prediction code in DTMP based on PyTo

chenxin 1 Dec 19, 2021
Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

SSL_OSC Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

zaixizhang 2 May 14, 2022
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning This is the official PyTorch implementation for UniMoCo pape

dddzg 49 Jan 2, 2023
Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

null 5 Dec 10, 2022
The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem

Guolz 36 Oct 19, 2022
Unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing, Pattern Recognition

USDAN The implementation of Unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing, which is accepte

null 11 Nov 3, 2022
All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

Daniel Bourke 3.4k Jan 7, 2023
Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).

Adaptive Segmentation Mask Attack This repository contains the implementation of the Adaptive Segmentation Mask Attack (ASMA), a targeted adversarial

Utku Ozbulak 53 Jul 4, 2022
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

null 7 Jun 22, 2022