A simple guide to MLOps through ZenML and its various integrations.

Overview

ZenBytes

Join our Slack Community and become part of the ZenML family
Give the main ZenML repo a GitHub star to show your love

ZenBytes is a series of practical lessons about MLOps through ZenML and its various integrations. It is intended for people looking to learn about MLOps generally, and also practitioners specifically looking to learn more about ZenML.

🙏 About ZenML

ZenML is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions tailored to ML workflows. The ZenML repository and Docs have more details.

ZenML is a good tool for learning MLOps for two reasons:

🔹 ZenML focuses on being un-opinionated about underlying tooling and infrastructure across the MLOps stack.
🔹 ZenML presents itself as a pipeline tool, making all development in ZenML data-centric rather than model-centric.

🧱 Structure of Lessons

The lessons are structured in Chapters. Each chapter is a notebook that walks through and explains various concepts:

  • Chapter 0: Basics
  • Chapter 1: Building an ML(Ops) pipeline
  • Chapter 2: Transitioning across stacks
  • Coming soon: More chapters

💻 System Requirements

To run these lessons, you need some packages installed on your machine. You only need them for some parts, and you might get away with only Python and pip install -r requirements.txt for some parts of the codebase, but we recommend installing all of them:

Currently, this will only run on UNIX systems.

package | MacOS installation | Linux installation
docker | Docker Desktop for Mac | Docker Engine for Linux
kubectl | kubectl for mac | kubectl for linux
k3d | Brew Installation of k3d | k3d installation linux

You might also need to install Anaconda to get the MLflow deployment to work.
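
If you want to check quickly which of these tools are already installed, a small script along these lines may help. It is only a sketch: the helper name find_missing_tools is made up for illustration, and it only checks that each executable is on your PATH, not that the installed version works with the lessons.

```python
import shutil

# Command-line tools named in the system requirements above.
REQUIRED_TOOLS = ["docker", "kubectl", "k3d"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Anything reported as missing can be installed by following the matching entry in the list above.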

🐍 Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need. Within the Python environment of your choice, run:

git clone https://github.com/zenml-io/zenbytes
cd zenbytes
pip install -r requirements.txt

If you are running the run.py script, you will also need to install some integrations using zenml:

zenml integration install sklearn -f
zenml integration install dash -f
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
zenml integration install seldon -f
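
If you would rather not type these one by one, the same commands can be generated and run from Python. The sketch below is an illustration, not part of ZenBytes: the function names are hypothetical, and it defaults to a dry run that only prints the commands. The confirm_flag parameter is there because a pull request listed further down notes that ZenML 0.8.0 replaces -f with -y.

```python
import subprocess

# Integrations listed in the install commands above.
INTEGRATIONS = ["sklearn", "dash", "evidently", "mlflow", "kubeflow", "seldon"]


def build_install_commands(integrations=INTEGRATIONS, confirm_flag="-f"):
    """Build one `zenml integration install` command per integration."""
    return [
        ["zenml", "integration", "install", name, confirm_flag]
        for name in integrations
    ]


def install_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing install.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    install_all(build_install_commands(), dry_run=True)
```

Pass dry_run=False once the printed commands look right for your ZenML version.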

📓 Diving into the code

We're ready to go now. You can work through the notebooks step by step:

jupyter notebook

🏁 Cleaning up when you're done

Once you are done running all notebooks, you might want to stop all running processes. To do so, run the following commands. (This will tear down your k3d cluster and the local Docker registry.)

zenml stack set aws_kubeflow_stack
zenml stack down -f
zenml stack set local_kubeflow_stack
zenml stack down -f

FAQ

  1. [MacOS] When starting the container registry for Kubeflow, I get an error about port 5000 not being available: OSError: [Errno 48] Address already in use

Solution: For Kubeflow to run, the Docker container registry currently needs to be at port 5000. MacOS, however, uses port 5000 for the AirPlay Receiver. The guide "Freeing up port 5000" explains how to fix this.
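
To see whether something is already listening on port 5000 before you start the registry, you can try a quick check like the one below. This is a minimal sketch using only the Python standard library; the function name is_port_in_use is made up for illustration, and it assumes the conflicting service accepts connections on 127.0.0.1.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means a listener already occupies the port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if is_port_in_use(5000):
        print("Port 5000 is already in use.")
    else:
        print("Port 5000 looks free.")
```

If the port is reported as in use on MacOS, the AirPlay Receiver mentioned above is a likely cause.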

Comments
  • ModuleNotFoundError: No module named 'zenml.integrations.sklearn.helpers'

    I am facing this issue when running the import below: from zenml.integrations.sklearn.helpers.digits import get_digits

    Error: ModuleNotFoundError: No module named 'zenml.integrations.sklearn.helpers'

    opened by Coder-Vishali 6
  • Restructure ZenBytes

    I restructured/revamped ZenBytes:

    • Instead of 3 large notebooks, we now have 7 smaller lessons (more coming soon) that are centred around specific MLOps topics.
    • The README has been redesigned from ground up to better communicate what exactly ZenBytes is and how it relates to ZenML.
    • The old notebook "00 - Basics" was completely removed as the content was not related to ML (and too trivial IMO).
    • The content of the 7 new ZenByte lessons is largely taken from the old ZenBytes 01 and 02, but has been majorly rearranged, commented, and generally polished.
    • Newly added content: experiment tracking with W&B
    • Added 'Open in Colab' buttons (notebooks not tested in Colab yet)

    Next Steps:

    • add overview of entire MLOps stack being formed
    • add more visualizations
    • chapter 3: add drift detection between training and serving in 3.1
    • chapter 3: add lesson on drift detection
    • chapter 3: add lesson on exploratory data analysis
    • chapter 3: add lesson on data validation
    • chapter 4: split 4.1 into several smaller lessons to set up infrastructure, software, and MLOps stack one by one
    opened by fa9r 5
  • Rework ZenBytes to include ZenML Dashboard

    I reworked ZenBytes to include the ZenML dashboard wherever it makes sense.

    I also deleted lesson 4.1 and made several minor adjustments to all other lessons to ensure everything can run both locally and on Colab.

    opened by fa9r 1
  • Adjust ZenBytes to zenml version 0.8.0.

    DO NOT MERGE BEFORE 0.8.0 IS LIVE

    • changed zenml integration install ... -f to zenml integration install ... -y
    • changed zenml ... register ... --type=... to zenml ... register ... --flavor=...
    • reordered imports
    opened by fa9r 1
  • Replace `-f` with `-y` in ZenBytes

    I forgot that even though the code is merged, it won't be available for users until we make a release, so I reverted the original PR and will wait until after the 0.8.0 release before I merge it. Sorry for the mixup!

    opened by strickvl 1
  • Setup linting and formatting dev tools

    I set up dev tools for linting and formatting to improve the code quality of our .py files.

    Changes:

    • Added pyproject.toml that defines all dev tools and their configs. This also enables poetry install to set them up.
    • Added scripts for linting, formatting, and spell-checking in scripts/.
    • Defined pre-commit hooks in .pre-commit-config.yaml that call linting and spell-checking.
    • Moved all the .py code files into src/ and fixed all linting/formatting issues.
    opened by fa9r 1
Owner
ZenML
Building production MLOps tooling.