ml-design-patterns - Source code accompanying the O'Reilly book: Machine Learning Design Patterns

Overview

This is not an official Google product

ml-design-patterns

Source code accompanying O'Reilly book:
Title: Machine Learning Design Patterns
Authors: Valliappa (Lak) Lakshmanan, Sara Robinson, Michael Munn

https://www.oreilly.com/library/view/machine-learning-design/9781098115777/

We will update this repo with source code as we write each chapter. Stay tuned!

Chapters

  • Preface
  • The Need for ML Design Patterns
  • Data representation design patterns
    • #1 Hashed Feature
    • #2 Embedding
    • #3 Feature Cross
    • #4 Multimodal Input
  • Problem representation design patterns
    • #5 Reframing
    • #6 Multilabel
    • #7 Ensemble
    • #8 Cascade
    • #9 Neutral Class
    • #10 Rebalancing
  • Patterns that modify model training
    • #11 Useful Overfitting
    • #12 Checkpoints
    • #13 Transfer Learning
    • #14 Distribution Strategy
    • #15 Hyperparameter Tuning
  • Resilience patterns
    • #16 Stateless Serving Function
    • #17 Batch Serving
    • #18 Continuous Model Evaluation
    • #19 Two Phase Predictions
    • #20 Keyed Predictions
  • Reproducibility patterns
    • #21 Transform
    • #22 Repeatable Sampling
    • #23 Bridged Schema
    • #24 Windowed Inference
    • #25 Workflow Pipeline
    • #26 Feature Store
    • #27 Model Versioning
  • Responsible AI
    • #28 Heuristic Benchmark
    • #29 Explainable Predictions
    • #30 Fairness Lens
  • Summary
Comments
  • Include a link to open the project

    Hey, I'm reading through the book and thoroughly enjoying it, thanks @lakshmanok!

    I have some trouble viewing the notebooks here on GitHub though (it keeps saying "Sorry, something went wrong", likely hitting some size limits).

    I work at Deepnote, and we try to build better data science notebooks. The proposed button opens the repo as a project in Deepnote, and you can view and execute all of the files – it might be helpful for other readers too :)

    opened by robertlacok 2
  • Chapter 3 : Cascade evaluate ValueError: The pyarrow library is not installed

    Dear authors, the evaluate component of the pipeline fails because the pyarrow module is missing.

    Solved by changing the packages requested for that component in the pipeline definition:

    import kfp.components as comp
    import kfp.dsl as dsl

    # PROJECT_ID, run_bigquery_ddl, evaluate and the train_*/create_* helpers
    # are assumed to be defined earlier in the notebook.
    @dsl.pipeline(
        name='Cascade pipeline on SF bikeshare',
        description='Cascade pipeline on SF bikeshare'
    )
    def cascade_pipeline(
        project_id=PROJECT_ID
    ):
        ddlop = comp.func_to_container_op(run_bigquery_ddl, packages_to_install=['google-cloud-bigquery'])

        # Stage 1: classify trips as 'Typical' or 'Long'.
        c1 = train_classification_model(ddlop, PROJECT_ID)
        c1_model_name = c1.outputs['created_table']

        # Stage 2: build one training set per predicted class.
        c2a_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Typical')
        c2b_input = create_training_data(ddlop, PROJECT_ID, c1_model_name, 'Long')

        # Stage 3: train one distance model per class.
        c3a_model = train_distance_model(ddlop, PROJECT_ID, c2a_input.outputs['created_table'], 'Typical')
        c3b_model = train_distance_model(ddlop, PROJECT_ID, c2b_input.outputs['created_table'], 'Long')

        # Changed line: request the bqstorage and pandas extras (the fix reported in this issue).
        evalop = comp.func_to_container_op(evaluate, packages_to_install=['google-cloud-bigquery[bqstorage,pandas]', 'pandas'])
        error = evalop(PROJECT_ID, c1_model_name, c3a_model.outputs['created_table'], c3b_model.outputs['created_table'])
        print(error.output)
    

    Best Regards

    Jerome

    opened by jeromemassot 1
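A failure like the one reported above can be surfaced earlier by checking which modules are importable before the evaluation step runs. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name and is not part of the repository or of Kubeflow Pipelines.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules the evaluate step appears to need, based on the error in this issue.
    needed = ["pyarrow", "pandas"]
    missing = missing_modules(needed)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Calling such a check at the top of a component function would turn a late failure into an explicit message naming the missing package.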
  • chapter 7: Explainability shap.DeepExplainer fails

    The shap.DeepExplainer call fails with the error "Can't convert non-rectangular Python sequence to Tensor". Versions: tensorflow 2.1.1, shap 0.37.0.

    Seems similar to: https://github.com/slundberg/shap/issues/850

    Thanks, jim

    opened by jimwill3 1
  • Chapter 5: Continued Evaluation - Class label consistency

    The order of the publications is inconsistent between the original CLASSES definition and the source name function:

    CLASSES = { 'github': 0, 'nytimes': 1, 'techcrunch': 2 }

    labels = tf.constant(['github', 'techcrunch', 'nytimes'], dtype=tf.string)

    Suggest:

    labels = tf.constant(['github', 'nytimes', 'techcrunch'], dtype=tf.string)

    opened by mshearer0 1
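One way to avoid this kind of mismatch is to derive the label list from the `CLASSES` mapping rather than typing it a second time. The sketch below is plain Python; `labels_in_index_order` is a hypothetical helper name, not something defined in the notebook.

```python
CLASSES = {"github": 0, "nytimes": 1, "techcrunch": 2}


def labels_in_index_order(classes):
    """Return the class names ordered by their integer index."""
    ordered = sorted(classes.items(), key=lambda item: item[1])
    indices = [index for _, index in ordered]
    # Reject mappings whose indices are not exactly 0..N-1.
    if indices != list(range(len(ordered))):
        raise ValueError("class indices must be 0..N-1 with no gaps or duplicates")
    return [name for name, _ in ordered]


labels = labels_in_index_order(CLASSES)
print(labels)  # ['github', 'nytimes', 'techcrunch']
```

The resulting list could then be passed to `tf.constant(labels, dtype=tf.string)` as in the suggestion above, so the two definitions cannot drift apart.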
  • Chapter 5: Continued Evaluation: Dataset Access, EarlyStopping, Evaluation

    1. The munn-sandbox is not publicly available, so txtcls is not available.

    I created using code from: https://datalab.office.datisan.com.au/notebooks/training-data-analyst/blogs/textclassification/txtcls.ipynb

    as:

    query = """
    SELECT source, REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
    FROM (
      SELECT
        ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,
        title
      FROM bigquery-public-data.hacker_news.stories
      WHERE REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.://(.[^/]+)/'), '.com$')
        AND LENGTH(title) > 10
    )
    WHERE (source = 'github' OR source = 'nytimes' OR source = 'techcrunch')
    """

    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query(query).to_dataframe()
    df.to_csv('titles_full.csv', header=False, index=False, encoding='utf-8', sep=',')

    I had to swap the column order: COLUMNS = ['source', 'title']

    2. With EarlyStopping enabled, training finished after just 2 epochs:

    callbacks=[EarlyStopping(), TensorBoard(model_dir)],

    Without it, the loss was minimised after 20 epochs.

    3. The evaluation job section is still a 'to-do':

    "some stuff here about setting up Eval jobs"

    opened by mshearer0 1
  • Chapter 5: Evaluation job not available?

    From the continuous_eval.ipynb notebook:

    [Screenshot from the notebook]

    However, I see that the "Evaluation" option is not available (disabled):

    [Screenshot, 2021-07-20]

    What's the recommended course of action here?

    opened by sayakpaul 0
  • Chapter 3 : multilabel : unable to get the dataset

    Hi everyone,

    I have no access to the bucket to get the dataset. AccessDeniedException: 403 [email protected] does not have storage.objects.list access to the Google Cloud Storage bucket.

    Best Regards

    Jerome

    opened by jeromemassot 0
  • Errors in two_phase_predictions.ipynb

    • "audio" path should be "audio_spectros".
    • "image_batch" variable not found.
    • fit_generator() method deprecated, prefer fit().
    • The spectrometer png files get moved, so need to redownload or copy them.
    opened by matthewdrees 1
  • Chapter 6: How should I determine a number of bridging?

    Hello, I am studying this book. Thank you for writing such a well-structured textbook.

    I have a question about the Bridged Schema pattern in section 23. How should I determine the amount of old data to add to the training data?

    In this repository, it is stated that adding 60,000 old examples is best. However, in the line graph of the number of examples against the R2 value, 60,000 is near the bottom of the R2 range. The higher the R2 value, the better the prediction, so it looks as though prediction accuracy decreases as older data is added. From this graph, I would conclude that prediction accuracy is higher when training only on new data.

    I would be glad if anyone could tell me why you decided that adding 60,000 old examples was best.

    opened by TaskeHAMANO 0
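The selection rule implied by the question (a higher R2 on held-out data is better) can be written down as a small sketch. The numbers below are illustrative placeholders only and do not reproduce the notebook's results; `best_bridged_count` is a hypothetical helper name, and this is not necessarily how the authors arrived at their figure.

```python
def best_bridged_count(r2_by_count):
    """Pick the number of bridged (old) examples with the highest R2.

    `r2_by_count` maps a candidate count to the R2 measured on held-out data.
    Ties are broken in favour of the smaller count.
    """
    return max(r2_by_count.items(), key=lambda kv: (kv[1], -kv[0]))


# Placeholder values for illustration, NOT results from the repository.
example_results = {0: 0.41, 20000: 0.44, 40000: 0.43, 60000: 0.40}

best = best_bridged_count(example_results)
print(best)  # (20000, 0.44)
```

With real measurements in place of the placeholders, the same comparison would show directly whether adding old examples helps or hurts relative to training on new data alone (the entry for a count of 0).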
  • Chapter 5: Up the patience to at least 5?

    @munnm, in the 05_resilience/continuous_eval.ipynb notebook, please set some value for the patience (preferably > 1) argument in the EarlyStopping() callback. Otherwise, it'd only run for a single epoch. Users not familiar with the EarlyStopping callback API of Keras might not understand what's wrong at the very first glance.

    opened by sayakpaul 0
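The effect of `patience` can be illustrated without Keras. The function below is a simplified model of patience-based early stopping, assuming training stops once the monitored loss has failed to improve for `patience` consecutive epochs (and at the first non-improving epoch when `patience` is 0). It is not Keras's implementation, whose exact rule differs between versions, and the loss values are made up for illustration.

```python
def epochs_run(val_losses, patience=0):
    """Return how many epochs run before a simplified early-stopping rule fires."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Illustrative validation losses with a brief bump at epoch 2.
losses = [0.90, 0.95, 0.80, 0.70, 0.72, 0.65, 0.66, 0.60]

print(epochs_run(losses, patience=0))  # 2
print(epochs_run(losses, patience=5))  # 8
```

Under this model a single noisy epoch ends training when `patience` is 0, while a larger value lets training continue through short plateaus, which is consistent with the behaviour described in this issue.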
  • Chapter 5 : continuous evaluation

    Dear authors,

    it seems to me that Section 2 of the continuous evaluation notebook needs to be updated. The continuous evaluation mode is no longer straightforward and requires more setup information to be used flawlessly.

    Thanks Best Regards

    Jerome

    opened by jeromemassot 0
  • Chapter 4 : checkpointing : callback not used in the fit() method

    Hi authors team,

    just to inform you that the checkpointing callback is not called in the fit() method. I guess that it is a typo as the callback is used in the code snippet in the book.

    Best regards

    Jerome

    opened by jeromemassot 0
Owner
Google Cloud Platform