CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Overview

Declaration

This repository accompanies the paper: Optimizing Machine Learning Inference Queries with Correlative Proxy Models.

Setup ENV

Quick Start

  1. We provide a ready-to-use Docker image that works out of the box.
  2. Alternatively, you can follow the steps below to build your own testing environment.

The Provided Docker Environment

Steps to run the Docker Environment

  • Get the Docker image from this link.
  • Load the Docker image: docker load -i corrproxies-image.tar
  • Run the Docker image in a container: docker run --name=CorrProxies -i -t -d corrproxies-image
    • This returns the Docker container ID, for example d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6.
  • Use the CLI to control the container with the ID generated above: docker exec -it d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6 /bin/zsh
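If you prefer not to copy the container ID by hand, a minimal sketch (assuming the image was loaded as corrproxies-image, as above) is to capture the ID that docker run prints:

    # Load the image, start a detached container, and keep its ID in a shell variable.
    docker load -i corrproxies-image.tar
    CID=$(docker run --name=CorrProxies -i -t -d corrproxies-image)
    # Open a zsh shell inside the running container.
    docker exec -it "$CID" /bin/zsh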

ENV Spec

File structure:

  • The home directory for CorrProxies is located at /home/CorrProxies.
  • The Python executable is located at /home/anaconda3/envs/condaenv/bin/python3.
  • The models are located at /home/CorrProxies/model.
  • The datasets are located at /home/CorrProxies/data.
  • The starting scripts are located at /home/CorrProxies/scripts.
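A quick way to confirm this layout inside the container is to list the paths above and check the interpreter version (a sketch; the contents of model/ and data/ depend on what you have downloaded):

    ls /home/CorrProxies/model /home/CorrProxies/data /home/CorrProxies/scripts
    /home/anaconda3/envs/condaenv/bin/python3 --version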

Build Your Own Environment

These instructions are based on a clean distribution of [email protected].

  1. Install pre-requisites.

    apt-get update && apt-get install -y build-essential

  2. Install Anaconda.

    • wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh && bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p /home/anaconda3
    • export PATH="/home/anaconda3/bin/:$PATH"
  3. Install [email protected] with Anaconda3.

    conda create -n condaenv python=3.6.6

  4. Activate the newly installed Python ENV.

    conda activate condaenv

  5. Install dependencies with pip.

    pip3 install -r requirements.txt

  6. Install Java (openjdk-8) (for stanford-nlp usage).

    apt-get install -y openjdk-8-jdk
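For convenience, the steps above can be combined into one non-interactive sketch. It assumes Anaconda is installed to /home/anaconda3 (matching the Docker layout above) and that requirements.txt sits in the repository root:

    # Steps 1 & 6: system packages and Java 8 (needed by stanford-nlp).
    apt-get update && apt-get install -y build-essential openjdk-8-jdk wget
    # Step 2: install Anaconda 3 silently into /home/anaconda3 and put it on PATH.
    wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh
    bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p /home/anaconda3
    export PATH="/home/anaconda3/bin:$PATH"
    # Steps 3 & 4: create and activate the Python 3.6.6 environment.
    conda create -y -n condaenv python=3.6.6
    source activate condaenv
    # Step 5: install the Python dependencies.
    pip3 install -r requirements.txt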

Queries & Datasets

  • We use the Twitter text dataset, the COCO image dataset, and the UCF101 video dataset as our benchmark datasets. Please see this page for detailed examples of the queries and datasets we use in our experiments.

  • After you set up the environment, either manually or with the Docker image we provide, the next step is to download the datasets.

    • To get the COCO dataset: cd /home/CorrProxies/data/image/coco && ./get_coco_dataset.sh
    • To get the UCF101 dataset: cd /home/CorrProxies/data/video/ucf101 && wget -c https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar
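After the downloads finish, a quick sanity check is to look at what landed in the data directories (a sketch; exact file names and counts depend on the dataset versions):

    # COCO: images should appear under the coco directory.
    ls /home/CorrProxies/data/image/coco | head
    # UCF101: the extracted archive contains per-class folders of .avi clips.
    find /home/CorrProxies/data/video/ucf101 -name '*.avi' | wc -l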

Execution

Please pull the latest code before running the experiments: cd /home/CorrProxies && git pull

Run Operators Individually

To run each operator we use in our experiments individually, execute python3 <path_to_operator_script>. For example: python3 operators/ml_operators/image_video_operators/video_activity_recognition.py.
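If the conda environment is not on your PATH (for example, inside the provided container), you can call the interpreter from the ENV spec directly; a sketch using the operator above:

    cd /home/CorrProxies
    /home/anaconda3/envs/condaenv/bin/python3 operators/ml_operators/image_video_operators/video_activity_recognition.py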

Run Experiments

We use scripts/run.sh to start experiments. The script takes the command-line arguments described below.

  • Text(Twitter)

    • Since we do not provide the text dataset, we skip this experiment.
  • Image(COCO)

    Example: ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Video(UCF101)

    Example: ./scripts/run.sh -w 2 -t 2 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Argument details (see the annotated example after this list):

    • w int: experiment type in [1, 2, 3, 4] referring to /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py;
    • t int: query type in [0, 1, 2]. Int 0, 1, 2 means queries on the Twitter, COCO, and UCF101 datasets, respectively;
    • i int: query index in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    • a float: query accuracy;
    • s int: scheme in [0, 1, 2, 3, 4, 5, 6]. Int 0, 1, 2, 3, 4, 5, 6 means 'ORIG', 'NS', 'PP', 'CORE', 'COREa', 'COREh' and 'REORDER' schemes, respectively;
    • o int: number of threads used in optimization phase;
    • e int: number of threads used in execution phase after generating an optimized plan.
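Putting the flags together, the COCO example above reads as follows (a commented sketch; the flag meanings are taken from the list above):

    # -w 2  : experiment type 2 (one of /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py)
    # -t 1  : query type 1, i.e. queries on the COCO dataset
    # -i 1  : query index 1
    # -a 0.9: target query accuracy of 0.9
    # -s 3  : scheme 3, 'CORE'
    # -o 2  : two threads in the optimization phase
    # -e 1  : one thread in the execution phase
    cd /home/CorrProxies && ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1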