A demo of Prometheus+Grafana for monitoring an ML model served with FastAPI.

Jeremy Jordan

Last update: Jan 1, 2023

Overview

ml-monitoring

This repository provides an example setup for monitoring an ML system deployed on Kubernetes.

Blog post: https://www.jeremyjordan.me/ml-monitoring/

Components:

ML model served via FastAPI
Export server metrics via prometheus-fastapi-instrumentator
Simulate production traffic via locust
Monitor and store metrics via Prometheus
Visualize metrics via Grafana

Setup

Ensure you can connect to a Kubernetes cluster and have kubectl and helm installed.
- You can easily spin up a Kubernetes cluster on your local machine using minikube.

minikube start --driver=docker --memory 4g --nodes 2

Deploy Prometheus and Grafana onto the cluster using the community Helm chart.

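kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring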

Verify the resources were deployed successfully.

kubectl get all -n monitoring

Connect to the Grafana dashboard.

kubectl port-forward svc/prometheus-stack-grafana 8000:80 -n monitoring

Go to http://127.0.0.1:8000/
Log in with the credentials:
- Username: admin
- Password: prom-operator
- (This password can be configured in the Helm chart values.yaml file)

Import the model dashboard.
- On the left sidebar, click the "+" and select "Import".
- Copy the JSON defined in dashboards/model.json and paste it into the text area.

Deploy a model

This repository includes an example REST service which exposes an ML model trained on the UCI Wine Quality dataset.

You can launch the service on Kubernetes by running:

kubectl apply -f kubernetes/models/

You can also build and run the Docker container locally.

docker build -t wine-quality-model -f model/Dockerfile .
docker run -d -p 3000:80 -e ENABLE_METRICS=true wine-quality-model
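
Once the container is running locally, one quick check is to read the metrics page that Prometheus would scrape. The sketch below is illustrative only. It assumes the service exposes metrics at /metrics on the mapped port 3000 (the default path for prometheus-fastapi-instrumentator, not confirmed for this image), and the helper names parse_metrics and fetch_metrics are made up for this example. It handles only sample lines without a trailing timestamp.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition output into {sample_name_with_labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last field.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url="http://127.0.0.1:3000/metrics"):
    """Download and parse the metrics page (the URL is an assumption, see above)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Example usage, with the container from the previous step running:
#   for name, value in sorted(fetch_metrics().items()):
#       print(name, value)
```

If the call fails with a connection error, check that the container is up and that ENABLE_METRICS=true was passed.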

Note: For Prometheus to scrape metrics from this service, a ServiceMonitor resource must be defined for it. The ServiceMonitor must carry the label release: prometheus-stack to be discovered, because the Prometheus resource selects ServiceMonitors through the serviceMonitorSelector attribute in its spec.

You can verify the label required by running:

kubectl get prometheuses.monitoring.coreos.com prometheus-stack-kube-prom-prometheus -n monitoring -o yaml
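
To make the label requirement concrete, here is a rough sketch that builds a ServiceMonitor manifest as JSON (which kubectl accepts as well as YAML) and checks it against a matchLabels selector. The service label app: wine-quality-model, the port name http, the /metrics path, and the 15s interval are all assumptions for illustration; the manifests in kubernetes/models/ are the source of truth. The matches_selector helper is hypothetical and covers matchLabels only, not matchExpressions.

```python
import json


def build_service_monitor(name, app_label, port_name, release="prometheus-stack"):
    """Return a ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        # This label is what the Prometheus serviceMonitorSelector looks for.
        "metadata": {"name": name, "labels": {"release": release}},
        "spec": {
            # Selects the Service(s) to scrape; the label key and value are assumed.
            "selector": {"matchLabels": {"app": app_label}},
            "endpoints": [{"port": port_name, "path": "/metrics", "interval": "15s"}],
        },
    }


def matches_selector(labels, selector):
    """True if every key/value pair in selector['matchLabels'] appears in labels."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


monitor = build_service_monitor("wine-quality-model", "wine-quality-model", "http")
print(json.dumps(monitor, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f. Passing the selector reported by the command above to matches_selector, together with monitor["metadata"]["labels"], shows whether the monitor would be picked up.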

Simulate production traffic

We can simulate production traffic using a Python load testing tool called locust. This will make HTTP requests to our model server and provide us with data to view in the monitoring dashboard.

You can begin the load test by running:

kubectl apply -f kubernetes/load_tests/

By default, production traffic is simulated for 5 minutes. You can change this by updating the image arguments in the kubernetes/load_tests/locust_master.yaml manifest.

You can also modify the community Helm chart instead of using the manifests defined in this repo.
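
For readers unfamiliar with locust, a load test is defined as a Python class whose tasks issue HTTP requests. The following is a minimal sketch, not the locustfile shipped in this repository. The /predict path, the JSON field names (UCI Wine Quality feature names with underscores), and the value ranges are assumptions chosen for illustration and are not taken from the service's schema or from dataset statistics. The locust import is guarded so the payload helper can be tried without locust installed.

```python
import random

# Assumed request fields: UCI Wine Quality feature names, with rough illustrative ranges.
FEATURE_RANGES = {
    "fixed_acidity": (4.0, 16.0),
    "volatile_acidity": (0.1, 1.6),
    "citric_acid": (0.0, 1.0),
    "residual_sugar": (0.5, 16.0),
    "chlorides": (0.01, 0.6),
    "free_sulfur_dioxide": (1.0, 72.0),
    "total_sulfur_dioxide": (6.0, 290.0),
    "density": (0.99, 1.004),
    "pH": (2.7, 4.0),
    "sulphates": (0.3, 2.0),
    "alcohol": (8.0, 15.0),
}


def random_wine_payload(rng=random):
    """Build one request body with a random value for each assumed feature."""
    return {
        name: round(rng.uniform(low, high), 3)
        for name, (low, high) in FEATURE_RANGES.items()
    }


try:
    from locust import HttpUser, between, task
except ImportError:  # locust is not installed; only the payload helper is usable
    HttpUser = None

if HttpUser is not None:

    class WineModelUser(HttpUser):
        # Each simulated user pauses 0.5 to 2 seconds between requests.
        wait_time = between(0.5, 2.0)

        @task
        def predict(self):
            # "/predict" is a hypothetical endpoint path.
            self.client.post("/predict", json=random_wine_payload())
```

Saved as locustfile.py, a file like this could be run with, for example, locust -f locustfile.py --headless --host http://127.0.0.1:3000 --run-time 5m.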

Uploading new images

This process could eventually be automated with a GitHub Action, but it remains manual for now.

Obtain a personal access token to connect with the GitHub container registry.

mkdir -p ~/.github
echo "INSERT_TOKEN_HERE" > ~/.github/cr_token

Authenticate with the GitHub container registry.

cat ~/.github/cr_token | docker login ghcr.io -u jeremyjordan --password-stdin

Build and tag new Docker images.

docker build -t wine-quality-model:0.3 -f model/Dockerfile .
docker tag wine-quality-model:0.3 ghcr.io/jeremyjordan/wine-quality-model:0.3

docker build -t locust-load-test:0.2 -f load_test/Dockerfile .
docker tag locust-load-test:0.2 ghcr.io/jeremyjordan/locust-load-test:0.2

Push the Docker images to the container registry.

docker push ghcr.io/jeremyjordan/wine-quality-model:0.3
docker push ghcr.io/jeremyjordan/locust-load-test:0.2

Update Kubernetes manifests to use the new image tag.
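
The manifest update is a manual edit, but it could be scripted. The sketch below shows one possible approach with a regular expression. The function name bump_image_tag is hypothetical, and it assumes the manifests reference images on unquoted image: lines of the form image: name:tag.

```python
import re


def bump_image_tag(manifest_text, image, new_tag):
    """Replace the tag on every `image: <image>:<tag>` line and return the new text."""
    pattern = re.compile(r"(image:\s*" + re.escape(image) + r"):[\w.\-]+")
    # A function is used as the replacement so new_tag is inserted literally.
    return pattern.sub(lambda match: match.group(1) + ":" + new_tag, manifest_text)


example = "containers:\n- name: model\n  image: ghcr.io/jeremyjordan/wine-quality-model:0.2\n"
print(bump_image_tag(example, "ghcr.io/jeremyjordan/wine-quality-model", "0.3"))
```

Lines that reference other images are left untouched, so the same call can be applied to every file under kubernetes/.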

Teardown instructions

To stop the model REST server, run:

kubectl delete -f kubernetes/models/

To stop the load tests, run:

kubectl delete -f kubernetes/load_tests/

To remove the Prometheus stack, run:

helm uninstall prometheus-stack -n monitoring

Comments

How to Monitor NLP Models?

Hi Jeremy,

I'm following your template for a POC, and it's been very helpful. I'm creating a REST API for an NLP model (Multinomial Naive Bayes) and I'm not sure how to monitor this particular model when the predictions are classes instead of float values like the wine quality prediction model. How would the prometheus instrumentation be used to capture metrics for classification models?

Thanks,

Riley

opened by rileyhun 4

Adding prometheus instrumentation package is resulting in some requests taking a long amount of time

Hello again @jeremyjordan,

We are trying to decrease the latency of our BERT model prediction service, which is deployed using FastAPI. Predictions are requested through the /predict endpoint. We looked into the tracing and found that one of the bottlenecks is prometheus-fastapi-instrumentator. About 1% of the requests time out because they exceed 10s.

We also discovered that some metrics are not reported at 4 requests/second. Some requests took 30-50 seconds, with starlette/fastapi accounting for much of that time. It seems that under high usage the /metrics endpoint doesn't get enough resources, so all /metrics requests wait for some time and eventually fail. Having a separate container for metrics could help, or, if possible, delaying or pausing metrics under high load. Any insight/guidance would be much appreciated.

opened by rileyhun 3

Latency & Counter Metrics Not Detected By Prometheus

Hello @jeremyjordan,

I've been following your fastapi ml-monitoring repository as a template for my own project and it's been super helpful! Thanks so much for setting this up. Unfortunately, I'm experiencing a lot of trouble getting prometheus to scrape my Counter metric and latency as well. Interestingly, when I run your wine-quality application and add a Counter metric though, it seems to be working fine, but mine which pretty much follows your same approach (only difference being that I set up my application using application factory design pattern) doesn't seem to be working. It seems like histogram and summary are going through though.

Do you have any insight as to what the issue could be? Would really appreciate your guidance as I've been trying to figure this out for 3 days.

Here is my monitoring.py file: https://github.com/rileyhun/fastapi-ml-example/blob/main/app/core/monitoring.py

Reproducible example:

git clone https://github.com/rileyhun/fastapi-ml-example.git
docker build -t ${IMAGE_NAME}:${IMAGE_TAG} -f Dockerfile .
docker tag ${IMAGE_NAME}:${IMAGE_TAG} rhun/${IMAGE_NAME}:${IMAGE_TAG}
docker push rhun/${IMAGE_NAME}:${IMAGE_TAG}
minikube start --driver=docker --memory 4g --nodes 2
kubectl create namespace monitoring
helm install prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring
kubectl apply -f deployment/wine-model-local.yaml
kubectl port-forward svc/wine-model-service 8080:80
python api_call.py

opened by rileyhun 2

Bump pydantic from 1.6.1 to 1.6.2 in /model

Bumps pydantic from 1.6.1 to 1.6.2.

Release notes

Sourced from pydantic's releases.

v1.6.2 (2021-05-11)

Security fix: Fix date and datetime parsing so passing either 'infinity' or float('inf') (or their negative values) does not cause an infinite loop, see security advisory CVE-2021-29510.

Changelog

Sourced from pydantic's changelog.

v1.6.2 (2021-05-11)

Security fix: Fix date and datetime parsing so passing either 'infinity' or float('inf') (or their negative values) does not cause an infinite loop, See security advisory CVE-2021-29510

Commits

acf7783 tweak history

829528c comment out broken tests

cf9a417 hack tests into passing

b37a922 fix formatting

ac360c5 prepare for release

bdde15b Merge pull request from GHSA-5jqp-qgf6-3pvh

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

Bump fastapi from 0.63.0 to 0.65.2 in /model

Bumps fastapi from 0.63.0 to 0.65.2.

Release notes

Sourced from fastapi's releases.

0.65.2

Security fixes

🔒 Check Content-Type request header before assuming JSON. Initial PR #2118 by @patrickkwang.

This change fixes a CSRF security vulnerability when using cookies for authentication in path operations with JSON payloads sent by browsers.

In versions lower than 0.65.2, FastAPI would try to read the request payload as JSON even if the content-type header sent was not set to application/json or a compatible JSON media type (e.g. application/geo+json).

So, a request with a content type of text/plain containing JSON data would be accepted and the JSON data would be extracted.

But requests with content type text/plain are exempt from CORS preflights, for being considered Simple requests. So, the browser would execute them right away including cookies, and the text content could be a JSON string that would be parsed and accepted by the FastAPI application.

See CVE-2021-32677 for more details.

Thanks to Dima Boger for the security report! 🙇🔒

Internal

🔧 Update sponsors badge, course bundle. PR #3340 by @tiangolo.

🔧 Add new gold sponsor Jina 🎉. PR #3291 by @tiangolo.

🔧 Add new banner sponsor badge for FastAPI courses bundle. PR #3288 by @tiangolo.

👷 Upgrade Issue Manager GitHub Action. PR #3236 by @tiangolo.

0.65.1

Security fixes

📌 Upgrade pydantic pin, to handle security vulnerability CVE-2021-29510. PR #3213 by @tiangolo.

0.65.0

Breaking Changes - Upgrade

⬆️ Upgrade Starlette to 0.14.2, including internal UJSONResponse migrated from Starlette. This includes several bug fixes and features from Starlette. PR #2335 by @hanneskuettner.

Translations

🌐 Initialize new language Polish for translations. PR #3170 by @neternefer.

Internal

👷 Add GitHub Action cache to speed up CI installs. PR #3204 by @tiangolo.

⬆️ Upgrade setup-python GitHub Action to v2. PR #3203 by @tiangolo.

🐛 Fix docs script to generate a new translation language with overrides boilerplate. PR #3202 by @tiangolo.

✨ Add new Deta banner badge with new sponsorship tier 🙇. PR #3194 by @tiangolo.

👥 Update FastAPI People. PR #3189 by @github-actions[bot].

🔊 Update FastAPI People to allow better debugging. PR #3188 by @tiangolo.

0.64.0

Features

... (truncated)

Commits

4d91f97 🔖 Release version 0.65.2

aabe2c7 📝 Update release notes

377234a 🔒 Create Security Policy

38b7858 📝 Update release notes

fa7e3c9 🐛 Check Content-Type request header before assuming JSON (#2118)

90120dd 📝 Update release notes

3677254 🔧 Update sponsors badge, course bundle (#3340)

40bb0c5 📝 Update release notes

60918d2 🔧 Add new gold sponsor Jina 🎉 (#3291)

3afce2c 📝 Update release notes

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Owner

Jeremy Jordan

Machine learning engineer. Broadly curious. Twitter: @jeremyjordan

GitHub https://www.jeremyjordan.me/ml-monitoring/

HTTP(s) "monitoring" webpage via FastAPI+Jinja2. Inspired by https://github.com/RaymiiOrg/bash-http-monitoring

python-http-monitoring HTTP(s) "monitoring" powered by FastAPI+Jinja2+aiohttp. Inspired by bash-http-monitoring. Installation can be done with pipenv

39 Aug 26, 2022

Monitoring plugin to check disk io with Icinga, Nagios and other compatible monitoring solutions

check_disk_io - Monitor disk io This is a monitoring plugin for Icinga, Nagios and other compatible monitoring solutions to check the disk io. It uses

3 Nov 15, 2022

Soda SQL Data testing, monitoring and profiling for SQL accessible data.

Soda SQL Data testing, monitoring and profiling for SQL accessible data. What does Soda SQL do? Soda SQL allows you to Stop your pipeline when bad dat

51 Jan 1, 2023

changedetection.io - The best and simplest self-hosted website change detection monitoring service

changedetection.io - The best and simplest self-hosted website change detection monitoring service. An alternative to Visualping, Watchtower etc. Designed for simplicity - the main goal is to simply monitor which websites had a text change. Open source web page change detection.

7.3k Jan 1, 2023

Ransomware leak site monitoring

RansomWatch RansomWatch is a ransomware leak site monitoring tool. It will scrape all of the entries on various ransomware leak sites, store the data

278 Dec 31, 2022

Scout: an open-source version of the monitoring tool

Badger Scout Scout is an open-source version of the monitoring tool used by Badg

2 Jan 13, 2022

dash-manufacture-spc-dashboard is a dashboard for monitoring read-time process quality along manufacture production line

In our solution based on plotly, dash and influxdb, the user will firstly generate the specifications for different robots, and then a wide range of interactive visualizations for different machines for machine power, machine cost, and total cost based on the energy time and total energy getting dynamically from sensors. If a threshold is met, the alert email is generated for further operation.

1 Feb 13, 2022

A System Metrics Monitoring Tool Built using Python3 , rabbitmq,Grafana and InfluxDB. Setup using docker compose. Use to monitor system performance with graphical interface of grafana , storage of influxdb and message queuing of rabbitmq

SystemMonitoringRabbitMQGrafanaInflux This repository has code to setup a system monitoring tool The tools used are the follows Python3.6 Docker Rabbi

7 Sep 6, 2022

Restful Api developed with Flask using Prometheus and Grafana for monitoring and containerization with Docker :rocket:

Hephaestus ?? In Greek mythology, Hephaestus was either the son of Zeus and Hera or he was Hera's parthenogenous child. ... As a smithing god, Hephaes

16 Oct 7, 2022

A Prometheus exporter for monitoring & analyzing Grafana Labs' technical documentation

grafana-docs-exporter A Prometheus exporter for monitoring & analyzing Grafana Labs' technical documentation Here is the public endpoint.

5 May 2, 2022

Rundeck / Grafana / Prometheus / Rundeck Exporter integration demo

Rundeck / Prometheus / Grafana integration demo via Rundeck Exporter This is a demo environment that shows how to monitor a Rundeck instance using Run

4 Oct 14, 2022

Fully Automated YouTube Channel ▶️with Added Extra Features.

Fully Automated Youtube Channel

249 Jan 2, 2023

CURSO PROMETHEUS E GRAFANA: Observability in a real world

Curso de monitoração com o Prometheus Esse curso ensina como usar o Prometheus como uma ferramenta integrada de monitoração, entender seus conceitos,

318 Dec 23, 2022

Run with one command grafana, prometheus, and a python script to collect and display cryptocurrency prices and track your wallet balance.

CryptoWatch Track your favorite crypto coin price and your wallet balance. Install Create .env: ADMIN_USER=admin ADMIN_PASSWORD=admin Configure you

13 Dec 13, 2022

Home solar infrastructure (with Peimar Inverter) monitoring based on Raspberry Pi 3 B+ using Grafana, InfluxDB, Custom Python Collector and Shelly EM.

raspberry-solar-mon Home solar infrastructure (with Peimar Inverter) monitoring based on Raspberry Pi 3 B+ using Grafana, InfluxDB, Custom Python Coll

10 Dec 23, 2022

WebApp served by OAK PoE device to visualize various streams, metadata and AI results

DepthAI PoE WebApp | Bootstrap 4 & Vue.js SPA Dashboard Based on dashmin (https:

6 Apr 9, 2022

A python-image-classification web application project, written in Python and served through the Flask Microframework

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 convolutional neural network, through Keras and Tensorflow wrappers, to make predictions on uploaded images.

19 Dec 12, 2022

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 covolutional neural network, through Keras and Tensorflow wrappers, to make predictions on uploaded images.

Image Classification in Python Implementing image classification in Flask using Keras. The VGG16 is a convolution neural network model architecture th

19 Dec 12, 2022

Prometheus exporter for Starlette and FastAPI

starlette_exporter Prometheus exporter for Starlette and FastAPI. The middleware collects basic metrics: Counter: starlette_requests_total Histogram:

225 Jan 5, 2023
