Inferoxy

What is it?

Inferoxy is a service for quickly deploying and using dockerized Computer Vision models. It's the core of EORA's Computer Vision platform, Vision Hub, which runs on top of AWS EKS.

Why use it?

You should use it if:

  • You want to simplify deploying Computer Vision models with an appropriate Data Science stack to production: all you need to do is build a Docker image with your model, including any pre- and post-processing steps, and push it to an accessible registry
  • You have only one machine or cluster for inference (CPU/GPU)
  • You want automatic batching for multi-GPU/multi-node setup
  • You want model versioning

Architecture

[Diagram: overall architecture]

Inferoxy is built around the message broker pattern.

  • Roughly speaking, it accepts user requests through different interfaces which we call "bridges". Multiple bridges can run simultaneously. Currently supported bridges are REST API, gRPC, and ZeroMQ (see the sketch after this list)
  • The requests are carefully split into batches and processed on a single multi-GPU machine or a multi-node cluster
  • The models to be deployed are managed through the Model Manager, which communicates with Redis to store and retrieve model information such as the Docker image URL, the maximum batch size, etc.
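
For instance, a call through the REST bridge might look like the minimal sketch below. The port matches the local-run example later in this README, but the endpoint path and payload fields are illustrative assumptions, not the documented schema:

# Hypothetical sketch of a request through the REST bridge.
# The port (8000) matches the local-run example below; the path and
# payload fields are assumptions, not Inferoxy's documented schema.
import base64
import requests

with open("image.jpg", "rb") as f:
    payload = {
        "model": "stub",  # model name as registered in models.yaml
        "image": base64.b64encode(f.read()).decode(),
    }

response = requests.post("http://localhost:8000/infer", json=payload)
print(response.json())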

Batching

[Diagram: batching]

One of Inferoxy's core features is its batching mechanism.

  • Batch processing takes into account that different models can use different batch sizes, and that some models process a series of batches coming from a specific user, e.g. in video processing tasks. The latter are called "stateful" models, while models that don't depend on user state are called "stateless"
  • Multiple copies of the same model can run on different machines, while only one copy can run on a given GPU device. So, to increase model efficiency, it's recommended to set the batch size as high as possible
  • A user of a stateful model reserves the whole copy of the model and releases it when their task is finished
  • Users of stateless models can share the same copy of the model simultaneously
  • NumPy tensors of RGB images with metadata are sent to the models through ZeroMQ, and the results are read back from a ZeroMQ socket (see the sketch after this list)
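
Here is a minimal sketch of what shipping a NumPy RGB tensor with metadata over ZeroMQ can look like. The endpoint, message framing, and field names are illustrative assumptions, not Inferoxy's actual wire protocol:

# Hypothetical sketch: sending an RGB tensor plus metadata over ZeroMQ.
# The endpoint, framing, and field names are assumptions for illustration.
import json
import numpy as np
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:7787")  # assumed ZeroMQ bridge address

image = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy RGB frame
meta = {"shape": image.shape, "dtype": str(image.dtype), "source_id": "demo"}

# Metadata goes as a JSON frame, the tensor as a raw-bytes frame
socket.send_multipart([json.dumps(meta).encode(), image.tobytes()])
result = socket.recv()  # reply format depends on the model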

Cluster management

[Diagram: cluster management]

Cluster management consists of keeping track of the running copies of the models, load analysis, health checking, and alerting. A sketch of the kind of idle-instance check the load analyzer performs is shown below.
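
An issue quoted later in this README shows the stateful checker comparing an instance's last-batch timestamp against a keep-alive timeout. Here is a self-contained sketch of that idea; the class, field names, and timeout value are illustrative assumptions, not Inferoxy's actual code:

# Hypothetical sketch of the load analyzer's "down" trigger check.
# Names and the timeout value are assumptions for illustration.
import time
from dataclasses import dataclass

KEEP_MODEL_SECONDS = 60.0  # assumed idle timeout before releasing an instance

@dataclass
class ModelInstance:
    name: str
    time_of_last_sent_batch: float  # unix timestamp of the last sent batch

def make_down_triggers(instances):
    """Collect instances that have been idle longer than the timeout."""
    now = time.time()
    return [
        ("decrease", instance)
        for instance in instances
        if now - instance.time_of_last_sent_batch > KEEP_MODEL_SECONDS
    ]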

Requirements

You can run Inferoxy locally on a single machine or on a k8s cluster. To run Inferoxy, you need a minimum of 4 GB of RAM and a CPU or GPU device, depending on your speed/cost trade-off.

Basic commands

Local run

To run Inferoxy locally, use the Inferoxy Docker image. You can find the latest version here.

docker pull public.registry.visionhub.ru/inferoxy:v1.0.4

After the image is pulled, we need to create a basic configuration using a .env file:

# .env
# Run model containers through the local Docker daemon
CLOUD_CLIENT=docker
TASK_MANAGER_DOCKER_CONFIG_NETWORK=inferoxy
# Registry credentials can stay empty when pulling public images
TASK_MANAGER_DOCKER_CONFIG_REGISTRY=
TASK_MANAGER_DOCKER_CONFIG_LOGIN=
TASK_MANAGER_DOCKER_CONFIG_PASSWORD=
# Redis connection; "redis" is the container name created below
MODEL_STORAGE_DATABASE_HOST=redis
MODEL_STORAGE_DATABASE_PORT=6379
MODEL_STORAGE_DATABASE_NUMBER=0
LOGGING_LEVEL=INFO

The next step is to create the inferoxy Docker network:

docker network create inferoxy

Now we should run Redis in this network (detached, so it keeps running in the background). Redis is needed to store information about your models.

docker run -d --network inferoxy --name redis redis:latest

Create a models.yaml file with a simple set of models. You can read more about models.yaml in the documentation:

stub:
  # Docker image URL of the model
  address: public.registry.visionhub.ru/models/stub:v5
  # Maximum batch size for this model
  batch_size: 256
  run_on_gpu: False
  stateless: True

Now we can start Inferoxy:

docker run --env-file .env \
	-v /var/run/docker.sock:/var/run/docker.sock \
	-p 7787:7787 -p 7788:7788 -p 8000:8000 -p 8698:8698 \
	--name inferoxy --rm \
	--network inferoxy \
	-v $(pwd)/models.yaml:/etc/inferoxy/models.yaml \
	public.registry.visionhub.ru/inferoxy:${INFEROXY_VERSION}
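
The ${INFEROXY_VERSION} variable should match the image tag you pulled earlier, e.g.:

export INFEROXY_VERSION=v1.0.4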

Documentation

You can find the full documentation here.

Discord

Join our community on the Discord server to discuss topics related to Inferoxy usage and development.

Comments
  • Speed of request processing.

    I have a problem with very slow request processing.

    Ten parallel (or sequential) requests, each with a single image as input, take about 8 minutes to process, while a single (first) request takes about 10 seconds to complete.

    In the logs it looks like Inferoxy waits for something before processing the image.

    There is a log: log.log

    The image: test

    The model can be downloaded from Docker Hub: docker pull smthngslv/clip_vit-b32_no_proj:latest

    Code, image, log. Archive.zip

    Also, is there a setting that allows keeping the container alive between requests (i.e., running sequential requests on the same container)?

    performance 
    opened by smthngslv 6
  • Re-usage of a stateful model.

    In the documentation:

    Down triggers (↓): time of last use for source_id > T_max - in this case either model release or instance stopping happens depending on whether there are incoming requests to this model

    But in the current code:

    if (
        time.time() - model_instance.sender.get_time_of_last_sent_batch()
        > self.config.load_analyzer.stateful_checker.keep_model
    ):
        triggers += [self.make_decrease_trigger(model_instance=model_instance)]

    There is no check for incoming requests, just the deletion procedure.

    bug 
    opened by wselfjes 0
  • Multiple images at input

    There's a need to send multiple images per request item, for example, when running a model for an image retrieval task that accepts images of a product from different angles of view. The number of images per product is arbitrary. So, it looks like we can make a request object with a list of tensors of different sizes

    opened by VladVin 3
  • Support of multiple models on a single GPU

    Let's imagine a case where we have a stateful model consuming 2 GB of GPU memory while processing an RTSP stream. If we have a GPU with 16 GB of memory, then theoretically we can run up to 8 copies of such a model. This is currently possible with stateless models, but RTSP streams are usually processed by people trackers or similar models, which are stateful. The issue was first mentioned by @tz3, who is designing a system where 8 RTSP streams are processed in parallel.

    opened by VladVin 0
  • Model should have ability to run infinitely

    There are currently up and down triggers for models. But if Inferoxy is used in the scenario of hosting a single model, there is no need to release this model - it should run without interruptions in order to get rid of the "cold start" effect. Otherwise, we wait 10 seconds for the model to start even if a very simple model is used.

    enhancement 
    opened by VladVin 0
  • Profile REST API latency

    Even a simple REST API request to a CPU model is processed slowly (10 seconds). The reason is probably latency, since the computation overhead is minimal. We need to profile the network during simple requests

    performance 
    opened by VladVin 0
  • Drop frames in video processing

    Expected behavior

    Pass a video to Inferoxy for processing. It is processed using some model and returned in the same order it was sent.

    Example input: https://api.dev.visionhub.ru/public_media/66218c4f-7b14-45b0-8669-370dee03ffa6
    Example output: https://api.dev.visionhub.ru/public_media/1eb3f365-00d7-4b66-a134-0d3b952eb89c

    Current behavior

    There is frame dropping.

    We cannot guarantee order because of retriable errors. For example, suppose three batches are passed to Inferoxy in order:

    3 -> 2 -> 1 -> inferoxy
    

    The first batch is processed. When the second batch starts processing, the model fails because the processing instance has disappeared. We put the second batch at the end of the queue:

    2 -> 3 -> 1 -> inferoxy
    

    Possible solution

    Again, we don't guarantee order, but users can send an index of the input in the parameters and sort the output on their side.

    Steps to reproduce

    1. Read video in frames
    2. Send frames in order to Inferoxy
    3. Receive results from Inferoxy
    4. Build a video
    5. Compare the input and output videos and observe frame dropping and reordering
    bug 
    opened by wselfjes 2