A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

Pinar Oner

Last update: Dec 18, 2021

Related tags

Machine Learning feature_engineering

Overview

FEATURE ENGINEERING

Business Problem: A data preprocessing and feature engineering script needs to be prepared for a machine learning pipeline. The script should take the raw dataset as input and return it ready for modelling.

Story of the Dataset:
The dataset describes the passengers who were aboard the Titanic when it sank. It consists of 891 observations and 12 variables. The target variable is "Survived":

0: the passenger did not survive.

1: the passenger survived.
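Before any feature engineering, it can help to check how the two classes of the target are distributed. The following is a minimal sketch, not the original script: the helper name `target_summary` and the five-row sample frame are hypothetical, and the real data would be loaded from a file whose path is an assumption here.

```python
import pandas as pd

# Hypothetical five-row sample standing in for the real file; with the actual
# data this would be something like pd.read_csv("titanic.csv") (assumed path).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
})


def target_summary(frame: pd.DataFrame, target: str = "Survived") -> pd.DataFrame:
    """Return the count and share of each class of the target column."""
    counts = frame[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "ratio": counts / len(frame)})


summary = target_summary(df)
print(summary)
```

A strongly unbalanced ratio would be a reason to prefer metrics other than plain accuracy when the prepared data is later modelled.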

ATTRIBUTES:

PassengerId: ID of the passenger

Survived: Survival status (0: not survived, 1: survived)

Pclass: Ticket class (1: 1st class (upper), 2: 2nd class (middle), 3: 3rd class (lower))

Name: Name of the passenger

Sex: Gender of the passenger (male, female)

Age: Age in years

SibSp: Number of siblings/spouses aboard the Titanic
Sibling = Brother, sister, stepbrother, stepsister
Spouse = Husband, wife (mistresses and fiancés were ignored)

Parch: Number of parents/children aboard the Titanic
Parent = Mother, father
Child = Daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore Parch = 0 for them.
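The two family-related counts are often combined into a single feature. The sketch below is one possible approach, not necessarily what the original script does: the function name and the `NEW_FAMILY_SIZE` and `NEW_IS_ALONE` column names are hypothetical, and the sibling/spouse column is assumed to be spelled `SibSp`, as in the commonly distributed Titanic CSV.

```python
import pandas as pd


def add_family_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Add family size and an is-alone flag derived from SibSp and Parch."""
    out = frame.copy()
    # Siblings/spouses + parents/children + the passenger themselves.
    out["NEW_FAMILY_SIZE"] = out["SibSp"] + out["Parch"] + 1
    # 1 if the passenger has no relatives aboard, else 0.
    out["NEW_IS_ALONE"] = (out["NEW_FAMILY_SIZE"] == 1).astype(int)
    return out


# Hypothetical three-row sample for illustration.
sample = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})
family = add_family_features(sample)
print(family)
```

Note that a child travelling only with a nanny would be flagged as alone by this rule, since Parch = 0 in that case.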

Ticket: Ticket number

Fare: Passenger fare

Cabin: Cabin number

Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
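Given the attributes above, a preprocessing step that leaves the dataset ready for modelling could look roughly like the sketch below. This illustrates one reasonable set of choices and is not the original script: the function name `preprocess`, the `NEW_*` column names, the median and mode imputation, and the decision to drop the identifier-like columns are all assumptions, and the three sample rows are only for demonstration.

```python
import pandas as pd


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric, missing-value-free copy of a Titanic-style frame."""
    out = frame.copy()
    # Assumption: Cabin is often missing, so keep only a has-cabin flag.
    out["NEW_CABIN_BOOL"] = out["Cabin"].notna().astype(int)
    # Title is the text between the comma and the first period in Name.
    out["NEW_TITLE"] = (
        out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    )
    # Simple imputation choices (assumed): median for Age, mode for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Drop identifier-like and free-text columns once features are derived.
    out = out.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
    # One-hot encode the remaining categorical columns.
    return pd.get_dummies(
        out, columns=["Sex", "Embarked", "NEW_TITLE"], drop_first=True, dtype=int
    )


# Illustrative rows in the shape of the dataset described above.
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
    "Fare": [7.25, 71.2833, 7.925],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", None],
})
ready = preprocess(raw)
print(ready.columns.tolist())
```

Scaling, outlier handling and rare-category grouping are left out here; whether they are needed depends on the model that consumes the output.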

REFERENCE: Data Science and ML Boot Camp, 2021, Veri Bilimi Okulu (https://www.veribilimiokulu.com/)
