To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

You might also like...
Allows you to canibalize methods from classes effectively implementing trait-oriented programming

About This package enables code reuse in non-inheritance way from existing classes, effectively implementing traits-oriented programming pattern. Stor

Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Senexia - A powerful telegram bot to manage your groups as effectively as possible

⚡ Kenechi bot ⚡ A Powerful, Smart And Simple Group Manager ... Written with AioG

szrose is an all in one group management bot made for managing your group effectively with some advance security tools & Suit For All Your Needs ❤️
szrose is an all in one group management bot made for managing your group effectively with some advance security tools & Suit For All Your Needs ❤️

szrose is an all in one group management bot made for managing your group effectively with some advance security tools & Suit For All Your Needs ❤️

Official python API for Phish.AI public and private API to detect zero-day phishing websites

phish-ai-api Summary Official python API for Phish.AI public and private API to detect zero-day phishing websites How it Works (TLDR) Essentially we h

1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Instead, two models for appearance modeling are included, together with the open-source BAGS model and the full set of code for inference. With this code, you can achieve around mAP@23 with TAO test set (based on our estimation).

Code and yara rules to detect and analyze Cobalt Strike

Cobalt Strike Resources This repository contains: analyze.py: a script to analyze a Cobalt Strike beacon (python analyze.py BEACON) extract.py; extrac

Python scripts to detect faces in Python with the BlazeFace Tensorflow Lite models
Python scripts to detect faces in Python with the BlazeFace Tensorflow Lite models

Python scripts to detect faces using Python with the BlazeFace Tensorflow Lite models. Tested on Windows 10, Tensorflow 2.4.0 (Python 3.8).

Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021)
Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021)

Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021) Track to Detect and Segment: An Online Multi-Object Tracker Jialian Wu, Jiale

Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Detect and fix skew in images containing text
Detect and fix skew in images containing text

Alyn Skew detection and correction in images containing text Image with skew Image after deskew Install and use via pip! Recommended way(using virtual

Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

Detect handwritten words in a text-line (classic image processing method).
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr

This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box. text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

 AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST, and the significant improvement was also made, which make long text predictions more accurate.https://github.com/huoyijie/raspberrypi-car Tool which allow you to detect and translate text.
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Code and yara rules to detect and analyze Cobalt Strike

Cobalt Strike Resources This repository contains: analyze.py: a script to analyze a Cobalt Strike beacon (python analyze.py BEACON) extract.py; extrac

Tek 224 Jan 4, 2023
Broken Link Finder is a Burp Extension to detect broken links for a passive scanning domains and links.

Broken Link Finder Broken Link Finder is a Burp Extension to detect broken links for a passive scanning domains and links. Inspired by InitRoot's link

Red Section 10 Sep 11, 2021
Regular Expressions - Use regular expressions to detect date format

A list of all the resources used https://regex101.com/ - To test regex https://w

Ravika Nagpal 1 Jan 4, 2022
A Python script to parse Fortinet products serial numbers, and detect the associated model and version.

ParseFortinetSerialNumber A Python script to parse Fortinet products serial numbers, and detect the associated model and version. Example $ ./ParseFor

Podalirius 10 Oct 28, 2022
This is a Python script to detect rapid upwards price changes (pumps) in a cryptocurrency pairing

A python script to detect a rapid upwards price brekout (pump) in a cryptocurrency pairing, through pandas and Binance API.

null 3 May 25, 2022
Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The project retrains itself after every prediction, making it more robust and generalized over time.

Arun Singh Babal 2 Jul 1, 2022
A Serverless Application Model stack that persists the $XRP price to the XRPL every minute as a TrustLine. There are no servers, it is effectively a "smart contract" in Python for the XRPL.

xrpl-price-persist-oracle-sam This is a XRPL Oracle that publishes external data into the XRPL. This Oracle was inspired by XRPL-Labs/XRPL-Persist-Pri

Joseph Chiocchi 11 Dec 17, 2022
Seaborn-image is a Python image visualization library based on matplotlib and provides a high-level API to draw attractive and informative images quickly and effectively.

seaborn-image: image data visualization Description Seaborn-image is a Python image visualization library based on matplotlib and provides a high-leve

null 48 Jan 5, 2023
Python package to easily work with selenium and manage tabs effectively.

Simple Selenium The aim of this package is to quickly get started with working with selenium for simple browser automation tasks. Installation Install

Vishal Kumar Mishra 1 Oct 27, 2021
SecurAID securely connects aid organizations directly with individuals in dangerous situations to allow them to discreetly and effectively get the assistance they need.

SecurAID securely connects aid organizations directly with individuals in dangerous situations to allow them to discreetly and effec

Ty K 2 Mar 23, 2022