# End-to-end Data Science project
This repo contains the notebooks, code, and additional material used in ITI's workshop. The goal of the sessions was to illustrate the end-to-end process of a real data science project.
## Additional material
In addition to the notebooks and code, the following material is also available:
- Video recordings of the sessions are uploaded to YouTube
- Slide decks are also added to this repo here
## Problem statement
Our (fictional) client is an IT educational institute. They have reached out to us with the following: “IT jobs and technologies keep evolving quickly. This makes our field one of the most interesting out there. But on the other hand, such fast development confuses our students. They do not know which skills they need to learn for which job. ‘Do I need to learn C++ to be a Data Scientist?’ ‘Do DevOps and System admins use the same technologies?’ ‘I really like JavaScript; can I use it in Data Analytics?’ Those are some of the questions that our students ask. Could you please develop a data-driven solution for our students to answer such questions? They mostly want to understand the relationships between the jobs and the technologies.”
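Before any modeling, a simple co-occurrence table already goes a long way toward answering the client's question. Here is a minimal pandas sketch, assuming a hypothetical `survey.csv` with one row per respondent, a `job_title` column, and a semicolon-separated `technologies` column (the file and column names are illustrative assumptions, not the workshop's actual data):

```python
import pandas as pd

# Hypothetical survey export: one row per respondent; 'technologies'
# holds a semicolon-separated list of tools the respondent uses.
df = pd.read_csv("survey.csv")

# Explode the multi-valued column so each (job, technology) pair is one row.
pairs = (
    df.assign(technology=df["technologies"].str.split(";"))
      .explode("technology")
)

# Co-occurrence counts: which technologies show up under which job titles?
cooc = pd.crosstab(pairs["job_title"], pairs["technology"])

# Example question: what do Data Scientists report most often?
print(cooc.loc["Data Scientist"].sort_values(ascending=False).head(10))
```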
## Level guide
| | Basic | Intermediate | Advanced |
| --- | --- | --- | --- |
| Business case | Decide on the KPIs that you will positively influence | Calculate the expected financial returns | |
| Data collection | Decide on and collect a suitable data source for your business case | Decide on, collect, and connect multiple data sources for better performance | |
| Legal review | Get basic information about the local data privacy law | Study the local data privacy law | |
| Cookiecutter | Create the standard directory structure | | |
| Git | Use Git's GUI to track the master branch | Use Git's CLI to track a dev branch and merge back to master | Decide on a branching strategy and resolve merge conflicts |
| Environments | Install Python packages using conda | Create a dedicated conda environment | Share your environment and install it on a different machine |
| Data cleaning | Use basic statistics to filter out nonsensical entries (sketched below) | Use advanced statistics and unsupervised learning to filter out nonsensical entries | Calculate a 'sanity probability value' for each data point and use it later as a sample weight |
| Descriptive analytics | Calculate summary statistics to provide data insights | Produce visualizations to provide deeper understanding | Apply unsupervised learning to provide even deeper understanding |
| Predictive analytics | Create a single baseline model (sketched below) | Create multiple hyperparameter-tuned models and benchmark their performance | Combine the chosen models via an ensemble and provide prediction confidence |
| Prescriptive analytics | Recommend the action that the user should take | | |
| Software engineering | Refactor your notebooks into simple Python scripts | Create a production OOP class for predictions (sketched below) | Expose your model through an API |
| MLOps | Export and load models from pickle files (sketched below) | Track your models using MLflow | Create and run a Docker image for your project |
| Product | Create a web app / GUI to expose the prediction functionality (sketched below) | Add the relevant historical insights, predictions, and optimization results | Collect users' feedback and retrain your model accordingly |
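To make some of these rows concrete, a few hedged Python sketches follow. First, data cleaning at the Basic level: filtering nonsensical entries with simple statistics. This assumes a hypothetical `years_experience` column; the 0-60 bounds and the 1.5 × IQR rule are illustrative choices, not workshop requirements:

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical file name, as above

# Rule-based sanity check: negative experience or implausibly long careers.
df = df[df["years_experience"].between(0, 60)]

# Statistical filter: drop points outside 1.5 * IQR of the column.
q1, q3 = df["years_experience"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["years_experience"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```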
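For predictive analytics, the progression is from a single baseline model to hyperparameter-tuned models benchmarked against it. A scikit-learn sketch; the synthetic data stands in for the real job/technology features, and the model and grid choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in for the real features (e.g. technologies used) and labels (job titles).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Basic: a baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))

# Intermediate: a hyperparameter-tuned model, benchmarked on the same split.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
grid.fit(X_train, y_train)
print("tuned accuracy:", grid.score(X_test, y_test))
```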
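The Software engineering row moves from scripts to a production class and finally an API. A sketch of the Intermediate step, an OOP wrapper around a trained model; the class and method names are illustrative, not taken from the workshop code:

```python
import pickle

class JobTechPredictor:
    """Wraps a trained model behind a stable, notebook-free interface."""

    def __init__(self, model_path: str):
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, features):
        """Return the predicted job title(s) for a batch of feature rows."""
        return self.model.predict(features)

# Usage (the Advanced level would expose this through an API,
# e.g. with Flask or FastAPI):
# predictor = JobTechPredictor("model.pkl")
# predictor.predict(new_rows)
```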
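The MLOps row starts with plain pickle files. A minimal sketch of exporting and reloading a fitted model; note that pickles should only ever be loaded from trusted sources:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export the fitted model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back in another process or script.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

assert (restored.predict(X) == model.predict(X)).all()
```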
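Finally, for the Product row, a tiny GUI can sit directly on top of the predictor. A sketch assuming Streamlit (an assumption; the workshop may use any web framework) and the hypothetical predictor and co-occurrence table from the earlier sketches:

```python
import streamlit as st

st.title("Which technologies fit which job?")

# Hypothetical job list; in practice this would come from the survey data.
job = st.selectbox("Pick a job title", ["Data Scientist", "DevOps", "Sysadmin"])

# In a real app, look up the technologies associated with `job` from the
# trained model or the co-occurrence table and display them here.
st.write(f"Top technologies for {job} would be shown here.")
```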