# Using DVC with PyCaret & FastAPI (Demo)
This repo contains all the resources for my demo explaining how to use DVC along with other interesting tools & frameworks like PyCaret & FastAPI for data & model versioning, experimentation with ML models, & finally deploying these models quickly for inferencing.
This demo was presented at the DVC Office Hours on 20th Jan 2022.
Note: We will use Azure Blob Storage as our remote storage for this demo. To follow along, it is advised to either create an Azure account or use a different remote for storage.
## Steps Followed for the Demo
### 0. Preliminaries
Create a virtual environment named `dvc-demo` & install the required packages:

```bash
python3 -m venv dvc-demo
source dvc-demo/bin/activate
pip install "dvc[azure]" pycaret fastapi uvicorn python-multipart
```
Initialize the repo with DVC tracking & create a `data/` folder:

```bash
mkdir dvc-pycaret-fastapi-demo
cd dvc-pycaret-fastapi-demo
git init
dvc init
git remote add origin https://github.com/tezansahu/dvc-pycaret-fastapi-demo.git
mkdir data
```
### 1. Tracking Data with DVC
We use the Heart Failure Prediction Dataset (from Kaggle) for this demo.
First, we download the `heart.csv` file & retain ~800 rows from it in the `data/` folder. (We will use the file with all the rows later; this is to simulate the change/increase in data that an ML workflow sees during its lifetime.)
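For instance, the subset can be created with a couple of lines of pandas (a minimal sketch; it assumes the full CSV has been downloaded to the repo root & that keeping the first 800 rows is an acceptable way to simulate the initial data):

```python
import pandas as pd

# Load the full downloaded dataset & keep only the first ~800 rows
# to simulate the "phase 1" snapshot of the data
df = pd.read_csv("heart.csv")
df.head(800).to_csv("data/heart.csv", index=False)
```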
Track this `data/heart.csv` file using DVC:
```bash
dvc add data/heart.csv
git add data/heart.csv.dvc
git commit -m "add data - phase 1"
```
### 2. Setup the Remote for Storing Tracked Data & Models
- Go to the Azure Portal & create a Storage Account (here, we name it `dvcdemo`)
- Within the storage account, create a Container (here, we name it `demo20jan2022`)
- Obtain the Connection String for the storage account (in the Azure Portal, it is listed under the storage account's **Access keys** section)
- Install the Azure CLI & log into Azure from within the terminal using `az login`
Now, we store the tracked data in Azure:

```bash
dvc remote add -d storage azure://demo20jan2022/dvcstore
dvc remote modify --local storage connection_string <connection-string>
dvc push
git push origin main
```
### 3. ML Experimentation with PyCaret
Create the `notebooks/` folder using `mkdir notebooks` & download the `notebooks/experimentation_with_pycaret.ipynb` notebook from this repo into this `notebooks/` folder.
Track this notebook with Git:

```bash
git add notebooks/
git commit -m "add ml training notebook"
```
Run all the cells mentioned under Phase 1 in the notebook. This involves the basics of PyCaret:
- Setting up a vanilla experiment with `setup()`
- Comparing various classification models with `compare_models()`
- Evaluating the performance of a model with `evaluate_model()`
- Making predictions on the held-out eval data using `predict_model()`
- Finalizing the model by training on the full training + eval data using `finalize_model()`
- Saving the model pipeline using `save_model()`
This will create a `model.pkl` file in the `models/` folder.
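For reference, the Phase 1 workflow boils down to roughly the following (a sketch only; the target column name, `session_id` & save path are assumptions based on this dataset & repo layout, & the notebook remains the source of truth):

```python
import pandas as pd
from pycaret.classification import (
    setup, compare_models, evaluate_model,
    predict_model, finalize_model, save_model,
)

df = pd.read_csv("data/heart.csv")

# Set up a vanilla experiment (target column assumed to be HeartDisease)
setup(data=df, target="HeartDisease", session_id=42)

# Train several classifiers & keep the best-performing one
best = compare_models()

# Inspect evaluation plots & score the held-out eval split
evaluate_model(best)
predict_model(best)

# Retrain on the full training + eval data & save the whole pipeline
# (assumes the models/ folder already exists)
final_model = finalize_model(best)
save_model(final_model, "models/model")  # writes models/model.pkl
```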
### 4. Tracking Models with DVC
Now, we track the ML model using DVC & store it in our remote storage:

```bash
dvc add models/model.pkl
git add models/model.pkl.dvc
git commit -m "add model - phase 1"
dvc push
git push origin main
```
### 5. Deploy the Model with FastAPI
First, delete the `.dvc/cache/` folder & the `models/model.pkl` file (to simulate a production environment). Then, pull the changes from the DVC remote storage:

```bash
dvc pull
```

Check that the `model.pkl` file is now present in the `models/` folder.
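Optionally, you can also sanity-check that the pulled model actually loads (run from the repo root; `load_model()` resolves the `.pkl` extension itself):

```python
from pycaret.classification import load_model

# load_model("models/model") looks for models/model.pkl
model = load_model("models/model")
print(type(model))
```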
Now, create a `server/` folder & place the `main.py` file in it after downloading the `server/main.py` file from this repo. This RESTful API server has 2 POST endpoints (sketched below):
- Inferencing on an individual record
- Batch inferencing on a CSV file
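While the repo's `server/main.py` is the source of truth, the gist of such a server looks roughly like this (a hedged sketch: the endpoint paths, response shapes & the `HeartRecord` schema are assumptions, not necessarily what the actual file uses):

```python
import pandas as pd
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel
from pycaret.classification import load_model, predict_model

app = FastAPI()
model = load_model("../models/model")  # server runs from server/, so go one level up

# Schema mirroring the dataset's feature columns
class HeartRecord(BaseModel):
    Age: int
    Sex: str
    ChestPainType: str
    RestingBP: int
    Cholesterol: int
    FastingBS: int
    RestingECG: str
    MaxHR: int
    ExerciseAngina: str
    Oldpeak: float
    ST_Slope: str

@app.post("/predict")  # hypothetical path: inference on an individual record
def predict(record: HeartRecord):
    df = pd.DataFrame([record.dict()])
    preds = predict_model(model, data=df)
    # "Label" is the prediction column in PyCaret 2.x output
    return {"prediction": int(preds["Label"][0])}

@app.post("/predict_batch")  # hypothetical path: batch inference on a CSV file
def predict_batch(file: UploadFile = File(...)):
    df = pd.read_csv(file.file)
    preds = predict_model(model, data=df)
    return {"predictions": preds["Label"].tolist()}
```

(`python-multipart`, installed in step 0, is what lets FastAPI parse the uploaded CSV file.)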
We commit this to our repo:

```bash
git add server/
git commit -m "create basic fastapi server"
```
Now, we can run our local server on port 8000:

```bash
cd server
uvicorn main:app --port=8000
```
Go to http://localhost:8000/docs & play with the endpoints present in the interactive documentation. For the individual inference, you could use the following data:
```json
{
    "Age": 61,
    "Sex": "M",
    "ChestPainType": "ASY",
    "RestingBP": 148,
    "Cholesterol": 203,
    "FastingBS": 0,
    "RestingECG": "Normal",
    "MaxHR": 161,
    "ExerciseAngina": "N",
    "Oldpeak": 0,
    "ST_Slope": "Up"
}
```
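Equivalently, you can hit the endpoint from Python (the `/predict` path below is an assumption; use whatever path the interactive docs actually show):

```python
import requests

payload = {
    "Age": 61, "Sex": "M", "ChestPainType": "ASY", "RestingBP": 148,
    "Cholesterol": 203, "FastingBS": 0, "RestingECG": "Normal",
    "MaxHR": 161, "ExerciseAngina": "N", "Oldpeak": 0, "ST_Slope": "Up",
}

# POST the record to the individual-inference endpoint & print the result
response = requests.post("http://localhost:8000/predict", json=payload)
print(response.json())
```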
### 6. Simulating the Arrival of New Data
Now, we use the full `heart.csv` file to simulate the arrival of new data over time. We place it within the `data/` folder & upload it to the DVC remote:
```bash
dvc add data/heart.csv
git add data/heart.csv.dvc
git commit -m "add data - phase 2"
dvc push
git push origin main
```
### 7. More Experimentation with PyCaret
Now, we run the experiment in Phase 2 of the `notebooks/experimentation_with_pycaret.ipynb` notebook. This involves (see the sketch after this list):
- Feature engineering while setting up the experiment
- Fine-tuning of models with `tune_model()`
- Creating an ensemble of models with `blend_models()`
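In rough strokes, the Phase 2 flow looks something like this (a sketch under assumptions: the feature-engineering flags & the models being tuned/blended here are illustrative, & the notebook remains the source of truth):

```python
import pandas as pd
from pycaret.classification import (
    setup, create_model, tune_model, blend_models,
    finalize_model, save_model,
)

df = pd.read_csv("data/heart.csv")

# Set up the experiment with some feature engineering enabled
# (normalize & feature_interaction are illustrative setup() options)
setup(data=df, target="HeartDisease", session_id=42,
      normalize=True, feature_interaction=True)

# Fine-tune a couple of candidate models
tuned_lr = tune_model(create_model("lr"))
tuned_rf = tune_model(create_model("rf"))

# Blend the tuned models into a voting ensemble
blended = blend_models([tuned_lr, tuned_rf])

# Finalize & overwrite models/model.pkl with the blended pipeline
save_model(finalize_model(blended), "models/model")
```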
The blended model is saved as `models/model.pkl`. We upload it to our DVC remote:
```bash
dvc add models/model.pkl
git add models/model.pkl.dvc
git commit -m "add model - phase 2"
dvc push
git push origin main
```
### 8. Redeploying the New Model using FastAPI
Now, we again start the server (no code changes are required, because the new model file has the same name) & perform inference:

```bash
cd server
uvicorn main:app --port=8000
```
With this, we demonstrate how DVC can be used in conjunction with PyCaret & FastAPI for iterating & experimenting efficiently with ML models & deploying them with minimal effort.
## Additional Resources
- Fundamentals of MLOps: A 4-blog series
- [DVC Documentation](https://dvc.org/doc)
- [PyCaret Documentation](https://pycaret.org)
- [FastAPI Documentation](https://fastapi.tiangolo.com)