Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Arun Singh Babal

Last update: Jul 1, 2022

Related tags

Miscellaneous machine-learning artificial-intelligence imbalanced-data retraining-pipeline-flow automated-pipeline faulty-wafer-substrate wafer-dataset

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

You might also like...

Modeval (or Modular Eval) is a modular and secure string evaluation library that can be used to create custom parsers or interpreters.

modeval Modeval (or Modular Eval) is a modular and secure string evaluation library that can be used to create custom parsers or interpreters. Basic U

2 Jan 1, 2022

Trackthis - This library can be used to track USPS and UPS shipments.

Trackthis - This library can be used to track USPS and UPS shipments. It has the option of returning the raw API response, or optionally, it can be used to standardize the USPS and UPS responses so they are easier to work with.

0 Mar 29, 2022

Wrapper for the undocumented CodinGame API. Can be used both synchronously and asynchronlously.

codingame API wrapper Pythonic wrapper for the undocumented CodinGame API. Installation Python 3.6 or higher is required. Install codingame with pip:

19 Jun 20, 2022

tidevice can be used to communicate with iPhone device

h 该工具能够用于与iOS设备进行通信, 提供以下功能截图获取手机信息 ipa包的安装和卸载根据bundleID 启动和停止应用列出安装应用信息模拟Xcode运行XCTest，常用的如启动WebDriverAgent测试

1.8k Dec 30, 2022

Information about a signed UEFI Shell that can be used when Secure Boot is enabled.

SignedUEFIShell During our research of the BootHole vulnerability last year, we tried to find as many signed bootloaders as we could. We searched all

61 Jan 3, 2023

vFuzzer is a tool developed for fuzzing buffer overflows, For now, It can be used for fuzzing plain vanilla stack based buffer overflows

vFuzzer vFuzzer is a tool developed for fuzzing buffer overflows, For now, It can be used for fuzzing plain vanilla stack based buffer overflows, The

5 Nov 12, 2022

This repo will have a small amount of Chrome tools that can be used for DFIR, Hacking, Deception, whatever your heart desires.

Chrome-Tools Overview Welcome to the repo. This repo will have a small amount of Chrome tools that can be used for DFIR, Hacking, Deception, whatever

5 Jun 8, 2022

Frappe app for authentication, can be used with FrappeVue-AdminLTE

Frappeauth App Frappe app for authentication, can be used with FrappeVue-AdminLTE

9 Apr 13, 2022

A collection of UIKit components that can be used as a Wagtail StreamField block.

Wagtail UIKit Blocks A collection of UIKit components that can be used as a Wagtail StreamField block. Available UIKit components Container Grid Headi

13 Dec 15, 2022

Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Related tags

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

Data Description:

Directory creation:

Data Validation:

Data Insertion in Database:

Data Pre-processing and Model Training:

Prediction:

Retraining:

Deployment:

You might also like...

Modeval (or Modular Eval) is a modular and secure string evaluation library that can be used to create custom parsers or interpreters.

Trackthis - This library can be used to track USPS and UPS shipments.

Wrapper for the undocumented CodinGame API. Can be used both synchronously and asynchronlously.

tidevice can be used to communicate with iPhone device

Information about a signed UEFI Shell that can be used when Secure Boot is enabled.

vFuzzer is a tool developed for fuzzing buffer overflows, For now, It can be used for fuzzing plain vanilla stack based buffer overflows

This repo will have a small amount of Chrome tools that can be used for DFIR, Hacking, Deception, whatever your heart desires.

Frappe app for authentication, can be used with FrappeVue-AdminLTE

A collection of UIKit components that can be used as a Wagtail StreamField block.

Owner

Arun Singh Babal

To effectively detect the faulty wafers

Cirq is a Python library for writing, manipulating, and optimizing quantum circuits and running them against quantum computers and simulators

nbsafety adds a layer of protection to computational notebooks by solving the stale dependency problem when executing cells out-of-order

Skull shaped MOSFET cells for the Efabless's 130nm process

This is a Python program I wrote to simulate the solar system with 79 lines of code.

Python data loader for Solar Orbiter's (SolO) Energetic Particle Detector (EPD).

Stopmagic gives you the power of creating amazing Stop Motion animations faster and easier than ever before.

An integrated library for checking email if it is registered on social media

Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars.

Ikaros is a free financial library built in pure python that can be used to get information for single stocks, generate signals and build prortfolios