MLOps with Vertex AI

This example implements the end-to-end MLOps process using the Vertex AI platform and Smart Analytics technology capabilities. The example uses Keras to implement the ML model, TFX to implement the training pipeline, and the Model Builder SDK to interact with Vertex AI.

MLOps lifecycle

Getting started

  1. Set up your MLOps environment on Google Cloud.

  2. Start your AI Notebook instance.

  3. Open JupyterLab, then open a new terminal.

  4. Clone the repository to your AI Notebook instance:

    git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
    cd mlops-with-vertex-ai
    
  5. Install the required Python packages:

    pip install tfx==1.2.0 --user
    pip install -r requirements.txt
    

    NOTE: You can ignore the pip dependency issues; these will be fixed in subsequent TFX versions.


  6. Upgrade the gcloud components:

    sudo apt-get install google-cloud-sdk
    gcloud components update
    

Dataset Management

The Chicago Taxi Trips dataset is one of the public datasets hosted on BigQuery, which includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip > 20%.

The 01-dataset-management notebook covers:

  1. Performing exploratory data analysis on the data in BigQuery.
  2. Creating a Vertex AI Dataset resource using the Python SDK.
  3. Generating the schema for the raw data using TensorFlow Data Validation.
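
As a rough illustration of steps 2 and 3, here is a minimal sketch using the Vertex AI Python SDK and TensorFlow Data Validation; the project, region, display name, and sample DataFrame are placeholders, not the notebook's exact code:

    from google.cloud import aiplatform as vertex_ai
    import tensorflow_data_validation as tfdv

    vertex_ai.init(project="my-project", location="us-central1")  # hypothetical project/region

    # Create a Vertex AI Dataset resource that points at the BigQuery table.
    dataset = vertex_ai.TabularDataset.create(
        display_name="chicago-taxi-tips",
        bq_source="bq://bigquery-public-data.chicago_taxi_trips.taxi_trips",
    )

    # Generate statistics from a sample of the raw data and infer a schema.
    stats = tfdv.generate_statistics_from_dataframe(sample_df)  # sample_df: a pandas sample of the table
    schema = tfdv.infer_schema(statistics=stats)
    tfdv.write_schema_text(schema, "raw_schema/schema.pbtxt")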

ML Development

We experiment with creating a Custom Model using the 02-experimentation notebook, which covers:

  1. Preparing the data using Dataflow.
  2. Implementing a Keras classification model.
  3. Training the Keras model with Vertex AI using a pre-built container.
  4. Uploading the exported model from Cloud Storage to Vertex AI.
  5. Extracting and visualizing experiment parameters from Vertex AI Metadata.
  6. Using Vertex AI for hyperparameter tuning.

We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.
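
As a rough sketch of how an experiment run lands in Vertex ML Metadata via the SDK (the experiment and run names, and the logged values, are illustrative placeholders):

    from google.cloud import aiplatform as vertex_ai

    vertex_ai.init(
        project="my-project",
        location="us-central1",
        experiment="chicago-taxi-experiment",  # hypothetical experiment name
    )

    vertex_ai.start_run("run-001")
    vertex_ai.log_params({"learning_rate": 0.001, "hidden_units": "64,64"})
    # ... train and evaluate the Keras model here ...
    vertex_ai.log_metrics({"val_accuracy": 0.91, "val_loss": 0.23})  # placeholder values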

In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.
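
Interactive execution follows the standard TFX pattern; a minimal sketch, assuming an upstream example_gen component already exists:

    from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
    from tfx.components import StatisticsGen

    context = InteractiveContext()  # backed by a temporary local ML Metadata store

    # Run one component at a time and inspect its output artifacts inline.
    statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
    context.run(statistics_gen)
    context.show(statistics_gen.outputs["statistics"])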

Training Operationalization

The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:

  1. Clone the repository to the build environment.
  2. Run unit tests.
  3. Run a local e2e test of the TFX pipeline.
  4. Build the ML container image for pipeline steps.
  5. Compile the pipeline.
  6. Upload the pipeline to Cloud Storage.
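
For step 5, TFX compiles the pipeline into a job spec for Vertex Pipelines; a hedged sketch using TFX 1.2's KubeflowV2DagRunner, where the image name is a placeholder for the one built in step 4:

    from tfx.orchestration.kubeflow.v2 import kubeflow_v2_dag_runner

    runner = kubeflow_v2_dag_runner.KubeflowV2DagRunner(
        config=kubeflow_v2_dag_runner.KubeflowV2DagRunnerConfig(
            default_image="gcr.io/my-project/chicago-taxi-tfx",  # hypothetical image name
        ),
        output_filename="chicago-taxi-pipeline.json",
    )
    runner.run(pipeline)  # pipeline: the TFX pipeline object defined in src/pipelines

The compiled JSON file is what step 6 uploads to Cloud Storage.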

Continuous Training

After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed in response to a trigger. We use Cloud Functions and Cloud Pub/Sub as the triggering mechanism. The Cloud Function listens to the Pub/Sub topic and runs the training pipeline when a message is sent to the topic. The Cloud Function is implemented in src/pipeline_triggering, and is sketched after the list below.

The 05-continuous-training notebook covers:

  1. Creating a Cloud Pub/Sub topic.
  2. Deploying a Cloud Function.
  3. Triggering the pipeline.
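
A minimal sketch of such a trigger function, assuming the Pub/Sub message carries the pipeline parameter values as a JSON payload (the project, bucket, and display names are placeholders):

    import base64
    import json

    from google.cloud import aiplatform

    def trigger_pipeline(event, context):
        """Background Cloud Function: submits the training pipeline for each Pub/Sub message."""
        parameter_values = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

        aiplatform.init(project="my-project", location="us-central1")
        job = aiplatform.PipelineJob(
            display_name="chicago-taxi-training",
            template_path="gs://my-bucket/pipelines/chicago-taxi-pipeline.json",  # compiled spec from CI/CD
            parameter_values=parameter_values,
        )
        job.submit()  # fire-and-forget; job.run() would block until the pipeline finishes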

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

  1. Receive hyperparameters using the hyperparam_gen custom Python component.
  2. Extract data from BigQuery using the BigQueryExampleGen component.
  3. Validate the raw data using the StatisticsGen and ExampleValidator components.
  4. Process the data on Dataflow using the Transform component.
  5. Train a custom model with Vertex AI using the Trainer component.
  6. Evaluate and validate the custom model using the ModelEvaluator component.
  7. Save the blessed model to the model registry location in Cloud Storage using the Pusher component.
  8. Upload the model to Vertex AI using the vertex_model_pusher custom Python component.
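
As a structural sketch (not the repository's exact wiring), the first few steps chain together roughly like this; the query, schema location, and module path are placeholders:

    from tfx import v1 as tfx

    # Step 2: extract examples straight from BigQuery.
    example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(
        query=sql_query  # sql_query: source query built from the received hyperparameters
    )

    # Step 3: validate the raw data against the curated schema.
    statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
    schema_importer = tfx.dsl.Importer(
        source_uri="raw_schema/",  # hypothetical schema location
        artifact_type=tfx.types.standard_artifacts.Schema,
    ).with_id("schema_importer")
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_importer.outputs["result"],
    )

    # Step 4: transform the data on Dataflow.
    transform = tfx.components.Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_importer.outputs["result"],
        module_file="src/preprocessing/transformations.py",  # hypothetical module path
    )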

Model Deployment

The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in the build/model-deployment.yaml file, and consists of the following steps:

  1. Test model interface.
  2. Create an endpoint in Vertex AI.
  3. Deploy the model to the endpoint.
  4. Test the Vertex AI endpoint.
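
A hedged sketch of steps 2 and 3 with the Vertex AI SDK (the display names and machine type are placeholders):

    from google.cloud import aiplatform as vertex_ai

    vertex_ai.init(project="my-project", location="us-central1")

    # Create an endpoint and deploy the previously uploaded model to it.
    endpoint = vertex_ai.Endpoint.create(display_name="chicago-taxi-endpoint")
    model = vertex_ai.Model.list(filter='display_name="chicago-taxi-classifier"')[0]
    endpoint.deploy(
        model=model,
        machine_type="n1-standard-4",
        traffic_percentage=100,
    )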

Prediction Serving

We serve the deployed model for prediction. The 07-prediction-serving notebook covers:

  1. Using the Vertex AI endpoint for online prediction.
  2. Using the uploaded Vertex AI model for batch prediction.
  3. Running batch prediction using Vertex Pipelines.
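
Continuing from the deployment sketch above, the online and batch calls look roughly like this; the feature payload and BigQuery URIs are placeholders:

    # Online prediction against the deployed endpoint.
    prediction = endpoint.predict(instances=[{"trip_miles": 2.5, "trip_seconds": 900}])  # placeholder instance
    print(prediction.predictions)

    # Batch prediction from the uploaded model resource.
    batch_job = model.batch_predict(
        job_display_name="chicago-taxi-batch-predict",
        bigquery_source="bq://my-project.my_dataset.prediction_input",  # hypothetical input table
        bigquery_destination_prefix="bq://my-project.my_dataset",       # hypothetical output location
        machine_type="n1-standard-4",
    )
    batch_job.wait()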

Model Monitoring

After a model is deployed for prediction serving, continuous monitoring is set up to ensure that the model continues to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection:

  1. Setting skew and drift thresholds.
  2. Creating a monitoring job for all the models under an endpoint.
  3. Listing the monitoring jobs.
  4. Listing the artifacts produced by the monitoring job.
  5. Pausing and deleting the monitoring job.
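
For orientation, a rough sketch of creating such a job with the SDK's model_monitoring helpers, which ship with newer google-cloud-aiplatform releases; the training table, target field, threshold, and e-mail address are placeholders:

    from google.cloud import aiplatform
    from google.cloud.aiplatform import model_monitoring

    # Compare serving traffic against the training data to detect skew.
    skew_config = model_monitoring.SkewDetectionConfig(
        data_source="bq://my-project.my_dataset.training_data",  # hypothetical training table
        target_field="tip_bin",                                  # hypothetical label column
        skew_thresholds={"trip_miles": 0.05},                    # placeholder threshold
    )
    objective_config = model_monitoring.ObjectiveConfig(skew_detection_config=skew_config)

    monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name="chicago-taxi-monitoring",
        endpoint=endpoint,  # covers all models deployed under this endpoint
        objective_configs=objective_config,
        logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
        schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours between runs
        alert_config=model_monitoring.EmailAlertConfig(user_emails=["mlops-team@example.com"]),
    )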

Metadata Tracking

You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines, in the Cloud Console.
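
The same information is reachable from code; for instance, experiment runs can be pulled into a DataFrame with the SDK (the experiment name is a placeholder):

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    # One row per run, with logged parameters and metrics as columns.
    runs_df = aiplatform.get_experiment_df("chicago-taxi-experiment")
    print(runs_df.head())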

Disclaimer

This is not an official Google product but sample code provided for educational purposes.


Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • 02-Experimentation: Create classifier fails w/layer requires matching shapes

    ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

    Input Features:
    dropoff_grid_xf <dtype: 'int64'>: [0, 0, 0]
    euclidean_xf <dtype: 'float32'>: [0.669279932975769, -0.8318284749984741, -0.8318284749984741]
    loc_cross_xf <dtype: 'int64'>: [0, 0, 0]
    payment_type_xf <dtype: 'int64'>: [2, 0, 0]
    pickup_grid_xf <dtype: 'int64'>: [0, 0, 0]
    trip_day_of_week_xf <dtype: 'int64'>: [0, 6, 2]
    trip_day_xf <dtype: 'int64'>: [8, 30, 1]
    trip_hour_xf <dtype: 'int64'>: [1, 13, 6]
    trip_miles_xf <dtype: 'float32'>: [2.3255326747894287, -0.22459185123443604, -0.4029441475868225]
    trip_month_xf <dtype: 'int64'>: [3, 1, 0]
    trip_seconds_xf <dtype: 'float32'>: [0.9550504088401794, -0.2630620300769806, -0.24356801807880402]
    target: [0, 0, 0]

    opened by jth1911 3
  • DataTransformer ModuleNotFoundError: No module named 'user_module_0' error

    ModuleNotFoundError: No module named 'user_module_0'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
        work_executor.execute()
      File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
        op.start()
      File "apache_beam/runners/worker/operations.py", line 710, in apache_beam.runners.worker.operations.DoOperation.start
      File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.start
      File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.start
      File "apache_beam/runners/worker/operations.py", line 311, in apache_beam.runners.worker.operations.Operation.start
      File "apache_beam/runners/worker/operations.py", line 317, in apache_beam.runners.worker.operations.Operation.start
      File "apache_beam/runners/worker/operations.py", line 659, in apache_beam.runners.worker.operations.DoOperation.setup
      File "apache_beam/runners/worker/operations.py", line 660, in apache_beam.runners.worker.operations.DoOperation.setup
      File "apache_beam/runners/worker/operations.py", line 292, in apache_beam.runners.worker.operations.Operation.setup
      File "apache_beam/runners/worker/operations.py", line 306, in apache_beam.runners.worker.operations.Operation.setup
      File "apache_beam/runners/worker/operations.py", line 799, in apache_beam.runners.worker.operations.DoOperation._get_runtime_performance_hints
      File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 294, in loads
        return dill.loads(s)
      File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 275, in loads
        return load(file, ignore, **kwds)
      File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
        return Unpickler(file, ignore=ignore, **kwds).load()
      File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
        obj = StockUnpickler.load(self)
      File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 826, in _import_module
        return __import__(import_name)
    ModuleNotFoundError: No module named 'user_module_0'

    Running the transform component of this repository in Vertex AI hits the following error (in the 04-pipeline-deployment.ipynb notebook). Does anyone have a quick fix for this? Have tried specifying setup.py and "save_main_session": True so far with no luck.

    opened by baymears 2
  • Error: 02 - ML Experimentation with Custom Model

    Hi I'm running the tutorial with TFX 1.4.0 and TF 2.7.0.

    When I run the cell:

        classifier = model.create_binary_classifier(tft_output, hyperparams)
        classifier.summary()

    I get the error:


    ValueError                                Traceback (most recent call last)
    in <module>
    ----> 1 classifier = model.create_binary_classifier(tft_output, hyperparams)
          2 classifier.summary()

    ~/mlops-with-vertex-ai/src/model_training/model.py in create_binary_classifier(tft_output, hyperparams)
         83     )
         84
    ---> 85     return _create_binary_classifier(feature_vocab_sizes, hyperparams)

    ~/mlops-with-vertex-ai/src/model_training/model.py in _create_binary_classifier(feature_vocab_sizes, hyperparams)
         62         pass
         63
    ---> 64     joined = keras.layers.Concatenate(name="combines_inputs")(layers)
         65     feedforward_output = keras.Sequential(
         66         [

    /opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
         65     except Exception as e:  # pylint: disable=broad-except
         66       filtered_tb = _process_traceback_frames(e.__traceback__)
    ---> 67       raise e.with_traceback(filtered_tb) from None
         68     finally:
         69       del filtered_tb

    /opt/conda/lib/python3.7/site-packages/keras/layers/merge.py in build(self, input_shape)
        514     ranks = set(len(shape) for shape in shape_set)
        515     if len(ranks) != 1:
    --> 516       raise ValueError(err_msg)
        517     # Get the only rank for the set.
        518     (rank,) = ranks

    ValueError: A Concatenate layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

    opened by jarmstrongcorus 1
  • Error during compilation with Cloud Build

    opened by sayakpaul 1
  • A doubt regarding the TFX pipeline associated with Continuous Training

    Hi @ksalama.

    Thank you very much for this amazing resource. It's a mini-book in itself.

    I am referring to the statement written just below the continuous training section:

    The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

    Are these pipelines demonstrated in any of the notebooks?

    opened by sayakpaul 1
  • Version an already trained custom model on Model Registry

    Hi, everyone!

    I trained a model using scikit-learn, and saved it as a pickle file, stored in a GCS bucket. I was wondering... how can I version this model using model registry since it is already trained?

    In my case, I have the option of using Cloud Run for this (a cloud run container that runs a retraining task every week), but I want to start doing it on Vertex AI.

    After reading some articles about Vertex AI Model Registry, I concluded that the model must first be trained using a custom training job, and only after that can we begin versioning it in the Model Registry. Is this correct?

    opened by LiviaPimentelCVER 0
  • Error: googleapi: Error 409: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.

    I literally created a bucket named with a random BTC wallet address to make it unique, as the GCP guide told me to do, and I am still hitting the same error while executing gcs-bucket.tf:

    google_storage_bucket.bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh: Creating...
    ╷
    │ Error: googleapi: Error 409: The requested bucket name is not available. The bucket
    │ namespace is shared by all users of the system. Please select a different name and
    │ try again., conflict
    │
    │   with google_storage_bucket.bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh,
    │   on gcs-bucket.tf line 17, in resource "google_storage_bucket" "bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh":
    │   17: resource "google_storage_bucket" "bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh" {
    │

    opened by Sharaykaka 2
  • In Vertex AI creating Tensorboard instance will drain out all the credits

    I've noticed that the TensorBoard instance created in Notebook 2 uses a lot of credits. Before January 22 this was free; once this instance is enabled, disabling the Vertex AI API won't stop the billing. See the response from GCP: https://www.googlecloudcommunity.com/gc/AI-ML/How-can-I-avoid-being-charged-for-Tensorboard/td-p/180658

    opened by sthubeML2020 0
  • Python markupsafe dependency error in 03-training-formalization.ipynb

    The latest release of the markupsafe Python 3 package is not compatible with from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext:

    import ml_metadata as mlmd
    from ml_metadata.proto import metadata_store_pb2
    from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
    

    Error:

    ImportError                               Traceback (most recent call last)
    /tmp/ipykernel_1/3645963042.py in <module>
          1 import ml_metadata as mlmd
          2 from ml_metadata.proto import metadata_store_pb2
    ----> 3 from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
          4
          5 connection_config = metadata_store_pb2.ConnectionConfig()

    ~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in <module>
         35
         36 import absl
    ---> 37 import jinja2
         38 import nbformat
         39 from tfx import types

    ~/.local/lib/python3.7/site-packages/jinja2/__init__.py in <module>
         10 from .bccache import FileSystemBytecodeCache
         11 from .bccache import MemcachedBytecodeCache
    ---> 12 from .environment import Environment
         13 from .environment import Template
         14 from .exceptions import TemplateAssertionError

    ~/.local/lib/python3.7/site-packages/jinja2/environment.py in <module>
         23 from .compiler import CodeGenerator
         24 from .compiler import generate
    ---> 25 from .defaults import BLOCK_END_STRING
         26 from .defaults import BLOCK_START_STRING
         27 from .defaults import COMMENT_END_STRING

    ~/.local/lib/python3.7/site-packages/jinja2/defaults.py in <module>
          1 # -*- coding: utf-8 -*-
          2 from ._compat import range_type
    ----> 3 from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
          4 from .tests import TESTS as DEFAULT_TESTS  # noqa: F401
          5 from .utils import Cycler

    ~/.local/lib/python3.7/site-packages/jinja2/filters.py in <module>
         11 from markupsafe import escape
         12 from markupsafe import Markup
    ---> 13 from markupsafe import soft_unicode
         14
         15 from ._compat import abc

    ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/jupyter/.local/lib/python3.7/site-packages/markupsafe/__init__.py)

    The workaround is to install an older version with pip install markupsafe==2.0.1 and restart the kernel.

    opened by hilliao 0
  • Python IndexError execution error in 03-training-formalization.ipynb

    Python error encountered executing the following lines at [Extract train and eval splits]:

    sql_query = datasource_utils.get_training_source_query(
        PROJECT, REGION, DATASET_DISPLAY_NAME, ml_use='UNASSIGNED', limit=5000)
    

    Observed error:

    IndexError                                Traceback (most recent call last)
    /tmp/ipykernel_1/1584844956.py in <module>
          1 print(DATASET_DISPLAY_NAME)
          2 sql_query = datasource_utils.get_training_source_query(
    ----> 3     PROJECT, REGION, DATASET_DISPLAY_NAME, ml_use='UNASSIGNED', limit=5000)
          4
          5 output_config = example_gen_pb2.Output(

    ~/mlops-with-vertex-ai/src/common/datasource_utils.py in get_training_source_query(project, region, dataset_display_name, ml_use, limit)
         55     dataset = vertex_ai.TabularDataset.list(
         56         filter=f"display_name={dataset_display_name}", order_by="update_time"
    ---> 57     )[-1]
         58     bq_source_uri = dataset.gca_resource.metadata["inputConfig"]["bigquerySource"][
         59         "uri"

    IndexError: list index out of range

    I can't find the .list method for google.cloud.aiplatform's TabularDataset that datasource_utils.py calls.

    opened by hilliao 1