A simple guide to MLOps through ZenML and its various integrations.

ZenML

Last update: Dec 27, 2022

ZenBytes

Join our Slack Community and become part of the ZenML family.

Give the main ZenML repo a GitHub star to show your love.

ZenBytes is a series of practical lessons about MLOps through ZenML and its various integrations. It is intended for people looking to learn about MLOps generally, and also practitioners specifically looking to learn more about ZenML.

🙏 About ZenML

ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces and abstractions tailored to ML workflows. The ZenML repository and docs have more details.

ZenML is a good tool for learning MLOps for two reasons:

🔹 ZenML focuses on being un-opinionated about underlying tooling and infrastructure across the MLOps stack.
🔹 ZenML presents itself as a pipeline tool, making all development in ZenML data-centric rather than model-centric.
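
To make the "pipeline tool" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the ZenML API: the function names (load_data, train_threshold, evaluate, run_pipeline) and the toy dataset are made up for illustration. It only shows the shape of a pipeline, where each step consumes the data produced by the previous one.

```python
from typing import Dict, List, Tuple

# NOTE: plain-Python illustration only. None of these names come from ZenML;
# they are hypothetical and exist just to show the data-centric idea.

Dataset = List[Tuple[float, int]]  # (feature, label) pairs


def load_data() -> Dataset:
    """Step 1: produce a tiny hard-coded dataset."""
    return [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]


def train_threshold(data: Dataset) -> float:
    """Step 2: 'train' a trivial model, the midpoint between the class means."""
    zeros = [x for x, y in data if y == 0]
    ones = [x for x, y in data if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def evaluate(data: Dataset, threshold: float) -> float:
    """Step 3: fraction of samples classified correctly by the threshold."""
    correct = sum(int(x > threshold) == y for x, y in data)
    return correct / len(data)


def run_pipeline() -> Dict[str, float]:
    """Chain the steps: every step's output is the next step's input."""
    data = load_data()
    threshold = train_threshold(data)
    accuracy = evaluate(data, threshold)
    return {"threshold": threshold, "accuracy": accuracy}


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the structure is that the data flowing between steps, not the model object, is what ties the workflow together; swapping one step for another leaves the rest untouched.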

🧱 Structure of Lessons

The lessons are structured in Chapters. Each chapter is a notebook that walks through and explains various concepts:

Chapter 0: Basics
Chapter 1: Building a ML(Ops) pipeline
Chapter 2: Transitioning across stacks
Coming soon: More chapters

💻 System Requirements

To run these lessons, you need some packages installed on your machine. Not every lesson needs all of them, and for some parts of the codebase Python and pip install -r requirements.txt may be enough, but we recommend installing everything listed below:

Currently, this will only run on UNIX systems.

Package	macOS installation	Linux installation
docker	Docker Desktop for Mac	Docker Engine for Linux
kubectl	kubectl for macOS	kubectl for Linux
k3d	Brew installation of k3d	k3d installation for Linux

You might also need to install Anaconda to get the MLflow deployment to work.
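
Before starting, it can help to check which of these tools are already on your PATH. The snippet below is a small sketch using only the Python standard library; the helper names find_tools and missing_tools are hypothetical, and the check only confirms that an executable exists, not that its version is suitable.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Tools named in the table above.
REQUIRED_TOOLS = ("docker", "kubectl", "k3d")


def find_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools whose path could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```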

🐍 Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:

git clone https://github.com/zenml-io/zenbytes
cd zenbytes
pip install -r requirements.txt

If you are running the run.py script, you will also need to install some integrations using zenml:

zenml integration install sklearn -f
zenml integration install dash -f
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
zenml integration install seldon -f
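
If you prefer to script these installs, the sketch below builds the same commands as argument lists. It assumes, based on a pull request note further down this page, that the confirmation flag changed from -f to -y in ZenML 0.8.0; the helper names are hypothetical and the version parsing only handles plain numeric versions such as "0.7.3". The commands are only built here, not executed.

```python
from typing import List, Sequence, Tuple

# Integrations listed above.
INTEGRATIONS = ("sklearn", "dash", "evidently", "mlflow", "kubeflow", "seldon")


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'major.minor.patch' string (no pre-release suffixes)."""
    return tuple(int(part) for part in version.split(".")[:3])


def confirm_flag(zenml_version: str) -> str:
    """Assumption: '-y' from ZenML 0.8.0 onwards, '-f' before that."""
    return "-y" if parse_version(zenml_version) >= (0, 8, 0) else "-f"


def build_install_commands(
    zenml_version: str, integrations: Sequence[str] = INTEGRATIONS
) -> List[List[str]]:
    """Return one argument list per integration, ready for subprocess.run."""
    flag = confirm_flag(zenml_version)
    return [["zenml", "integration", "install", name, flag] for name in integrations]


if __name__ == "__main__":
    for command in build_install_commands("0.7.3"):
        print(" ".join(command))
```

Each returned list can be passed to subprocess.run(command, check=True) once you have confirmed the flag matches your installed ZenML version.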

📓 Diving into the code

We're ready to go now. Start Jupyter and work through the notebooks step by step:

jupyter notebook

🏁 Cleaning up when you're done

Once you are done running all notebooks, you might want to stop all running processes. To do so, run the following commands. (This will tear down your k3d cluster and the local Docker registry.)

zenml stack set aws_kubeflow_stack
zenml stack down -f
zenml stack set local_kubeflow_stack
zenml stack down -f

❓ FAQ

[macOS] When starting the container registry for Kubeflow, I get an error about port 5000 not being available: OSError: [Errno 48] Address already in use

Solution: For Kubeflow to run, the Docker container registry currently needs to be at port 5000. macOS, however, uses port 5000 for the AirPlay receiver. See the guide "Freeing up port 5000" for how to fix this.
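
To check whether port 5000 is already taken before starting the registry, a quick standard-library probe like the one below can help. This is a sketch: the helper name is_port_free is hypothetical, and it only tests whether a TCP socket can be bound on the loopback address, which may not cover every networking setup.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": something else holds the port.
            return False
    return True


if __name__ == "__main__":
    port = 5000  # port the container registry needs, per the FAQ above
    state = "free" if is_port_free(port) else "already in use"
    print(f"Port {port} is {state}")
```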

Comments

ModuleNotFoundError: No module named 'zenml.integrations.sklearn.helpers'

I get the following error when running this import: from zenml.integrations.sklearn.helpers.digits import get_digits

Error: ModuleNotFoundError: No module named 'zenml.integrations.sklearn.helpers'

opened by Coder-Vishali 6
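
When an import like the one in this issue fails, it can be useful to confirm whether the module path exists in the installed package at all. The sketch below uses only the standard library; the helper name module_available is hypothetical, and the suggestion that a version mismatch between the notebooks and the installed ZenML could be the cause is an assumption, not something the issue confirms.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the dotted module path can be located, without using it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is itself missing.
        return False


if __name__ == "__main__":
    # Module path taken from the issue above.
    name = "zenml.integrations.sklearn.helpers"
    print(f"{name}: {'found' if module_available(name) else 'not found'}")
```

Note that find_spec imports the parent packages of a dotted name in order to locate the submodule, so this check is not completely free of side effects.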
Restructure ZenBytes
I restructured/revamped ZenBytes:

Instead of 3 large notebooks, we now have 7 smaller lessons (more coming soon) that are centred around specific MLOps topics.

The README has been redesigned from ground up to better communicate what exactly ZenBytes is and how it relates to ZenML.

The old notebook "00 - Basics" was completely removed as the content was not related to ML (and too trivial IMO).

The content of the 7 new ZenByte lessons is largely taken from the old ZenBytes 01 and 02, but has been majorly rearranged, commented, and generally polished.

Newly added content: experiment tracking with W&B

Added 'Open in Colab' buttons (notebooks not tested in Colab yet)

Next Steps:

add overview of entire MLOps stack being formed

add more visualizations

chapter 3: add drift detection between training and serving in 3.1

chapter 3: add lesson on drift detection

chapter 3: add lesson on exploratory data analysis

chapter 3: add lesson on data validation

chapter 4: split 4.1 into several smaller lessons to set up infrastructure, software, and MLOps stack one by one
opened by fa9r 5
Rework ZenBytes to include ZenML Dashboard

I reworked ZenBytes to include the ZenML dashboard wherever it makes sense.

I also deleted lesson 4.1 and made several minor adjustments to all other lessons to ensure everything can run both locally and on Colab.

opened by fa9r 1
Adjust ZenBytes to zenml version 0.8.0.
DO NOT MERGE BEFORE 0.8.0 IS LIVE

changed zenml integration install ... -f to zenml integration install ... -y

changed zenml ... register ... --type=... to zenml ... register ... --flavor=...

reordered imports
opened by fa9r 1
Replace `-f` with `-y` in ZenBytes

I forgot that even though the code is merged, it won't be available for users until we make a release, so I reverted the original PR and will wait until after the 0.8.0 release until I merge it. Sorry for the mixup!
internal

opened by strickvl 1
Setup linting and formatting dev tools
I set up dev tools for linting and formatting to improve the code quality of our .py files.

Changes:

Added pyproject.toml that defines all dev tools and their configs. This also enables poetry install to set them up.

Added scripts for linting, formatting, and spell-checking in scripts/.

Defined pre-commit hooks in .pre-commit-config.yaml that call linting and spell-checking.

Moved all the .py code files into src/ and fixed all linting/formatting issues.
opened by fa9r 1

Owner

ZenML

Building production MLOps tooling.

GitHub

⏳ Tempo: The MLOps Software Development Kit

Tempo provides a unified interface to multiple MLOps projects that enable data scientists to deploy and productionise machine learning systems.

36 Jun 20, 2021

This is a public repo where code samples are stored for the book Practical MLOps.

[Book-2021] Practical MLOps O'Reilly Book

421 Dec 31, 2022

End to End toy example of MLOps

churn_model MLOps Toy Example End to End You might find below links useful Connect VSCode to Git MLFlow Port Heroku App Project Organization ├── LICEN

6 Feb 6, 2022

MLOps pipeline project using Amazon SageMaker Pipelines

This project shows steps to build an end to end MLOps architecture that covers data prep, model training, realtime and batch inference, build model registry, track lineage of artifacts and model drift detection. It utilizes SageMaker Pipelines that offers machine learning (ML) to orchestrate SageMaker jobs and author reproducible ML pipelines.

3 Sep 16, 2022

Azure MLOps (v2) solution accelerators.

Azure MLOps (v2) solution accelerator Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting poi

233 Jan 1, 2023

A complete guide to start and improve in machine learning (ML)

A complete guide to start and improve in machine learning (ML), artificial intelligence (AI) in 2021 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art techniques!

3.3k Jan 4, 2023

A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Demand-Forecasting Business Problem A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

3 Mar 6, 2022

Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Simple but maybe too simple config management through python data classes. We use it for machine learning.

67 Nov 29, 2022

CrayLabs and user contributed examples of using SmartSim for various simulation and machine learning applications.

SmartSim Example Zoo This repository contains CrayLabs and user contributed examples of using SmartSim for various simulation and machine learning appl

14 Mar 30, 2022

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

363 Dec 14, 2022

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

17 Aug 14, 2022

Cohort Intelligence used to solve various mathematical functions

Cohort-Intelligence-for-Mathematical-Functions About Cohort Intelligence : Cohort Intelligence ( CI ) is an optimization technique. It attempts to mod

2 Oct 25, 2021

Self Organising Map (SOM) for clustering of atomistic samples through unsupervised learning.

Self Organising Map for Clustering of Atomistic Samples - V2 Description Self Organising Map (also known as Kohonen Network) implemented in Python for

0 Nov 16, 2021

Machine-care - A simple python script to take care of simple maintenance tasks

Machine care A simple python script to take care of simple maintenance tasks fo

2 Jul 10, 2022

Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing

Parallelized symbolic regression built on Julia, and interfaced by Python. Uses regularized evolution, simulated annealing, and gradient-free optimization.

924 Jan 3, 2023

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022

A simple example of ML classification, cross validation, and visualization of feature importances

Simple-Classifier This is a basic example of how to use several different libraries for classification and ensembling, mostly with sklearn. Example as

2 Aug 25, 2022

Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

SDK: Overview of the Kubeflow pipelines service Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on

3.1k Jan 6, 2023

In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

SKlearn_to_MLFLow In this Repo a simple Sklearn Model will be trained and pushed to MLFlow Install This Repo is based on poetry python3 -m venv .venv

1 Dec 13, 2021
