Model serving at scale

Overview


Run inference at scale

Cortex is an open source platform for large-scale machine learning inference workloads.


Workloads

Realtime APIs - respond to prediction requests in real-time

  • Deploy TensorFlow, PyTorch, and other models.
  • Scale to handle production workloads with server-side batching and request-based autoscaling.
  • Configure rolling updates and live model reloading to update APIs without downtime.
  • Serve many models efficiently with multi-model caching.
  • Perform A/B tests with configurable traffic splitting.
  • Stream performance metrics and structured logs to any monitoring tool.

Batch APIs - run distributed inference on large datasets

  • Deploy TensorFlow, PyTorch, and other models.
  • Configure the number of workers and the compute resources for each worker.
  • Recover from failures with automatic retries and dead letter queues.
  • Stream performance metrics and structured logs to any monitoring tool.

How it works

Implement a Predictor

# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
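The Predictor contract above (the class is constructed once per replica at startup, then `predict` is called per request) can be exercised locally with a stand-in model. A sketch, where `DummyModel` is a hypothetical substitute for the transformers pipeline so the example runs without downloading weights:

```python
# Minimal local harness for the Predictor interface: Cortex instantiates
# the class once per replica, then calls predict() for each request.
class DummyModel:
    """Hypothetical stand-in for transformers.pipeline("text-generation")."""
    def __call__(self, text):
        return [{"generated_text": text + " ..."}]

class PythonPredictor:
    def __init__(self, config):
        # In the real predictor: pipeline(task="text-generation")
        self.model = DummyModel()

    def predict(self, payload):
        # Same shape as the real predictor: return the first generation
        return self.model(payload["text"])[0]

predictor = PythonPredictor(config={})
print(predictor.predict({"text": "hello world"}))
```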

Configure a realtime API

# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  compute:
    gpu: 1
    mem: 8Gi
  autoscaling:
    min_replicas: 1
    max_replicas: 10

Deploy

$ cortex deploy text_generator.yaml

# creating http://example.com/text-generator

Serve prediction requests

$ curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
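The same request can be made from Python using only the standard library. A sketch, where the endpoint URL is the placeholder printed by `cortex deploy`:

```python
import json
import urllib.request

# Placeholder endpoint from the deploy step above.
API_URL = "http://example.com/text-generator"

def build_request(text):
    """Build the same POST request the curl command sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a reachable endpoint:
# with urllib.request.urlopen(build_request("hello world")) as resp:
#     print(json.load(resp))
```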

Get started

Comments
  • Display realtime output

    Display realtime output

    I have a text-generator language model that is compressed, is in .bin format, and can be accessed from the command line. It generates about one word per second and prints every word in real time when run from the terminal. I would like to deploy my model using Cortex, but I'm struggling to get the output in real time, word by word. Right now my code prints only one line at a time.

    import subprocess
    def run_command(text):
        command = ['./mycommand', text]
        process = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True, bufsize=-1)
        while True:
            output = process.stdout.readline()
            if output == '' and process.poll() is not None:
                break
            if output:
                print(output.strip())
    
    run_command('TEXT')
    

    One line may include about 20 words, so one line is displayed roughly every 20 seconds (since my model outputs roughly 1 word/second). I would really like the output to be more dynamic and appear one word at a time (as it does in the terminal) instead of one line at a time. Is there a way this can be achieved?
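    One way to approach the buffering side of this (a sketch, not an official Cortex answer): read the child's stdout one byte at a time and emit a word whenever whitespace is seen, instead of using readline(). Note this only helps if the model process actually flushes its output as it generates; a block-buffered child may still need stdbuf or a pseudo-terminal.

```python
import subprocess

def stream_words(command):
    """Yield the command's stdout one whitespace-separated word at a
    time, as soon as each word is complete, instead of line by line."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=0)
    word = bytearray()
    while True:
        ch = process.stdout.read(1)  # one byte, unbuffered on our side
        if not ch:  # EOF: the process closed stdout
            break
        if ch.isspace():
            if word:
                yield word.decode()
                word.clear()
        else:
            word += ch
    if word:  # trailing word with no whitespace after it
        yield word.decode()
    process.wait()

# for w in stream_words(['./mycommand', 'TEXT']):
#     print(w, flush=True)
```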

    question 
    opened by AbbeKamalov 61
  • Persistent private instances

    Persistent private instances

    I would like to use Cortex to create an application where each user can request and communicate with an AWS instance for a period of time. In this scenario, each user's data would be processed and stored on one whole AWS instance. From the documentation, I understand that each API call will use whichever instance is not busy at the moment. It wouldn't be ideal if, by making an API call, a user received sensitive data stored by another user on the same instance. Would it be possible to somehow mark the instance to which an API call is made? That way the data of individual users wouldn't be made accessible to everyone, but only to the users who request/use that instance.

    question 
    opened by da-source 41
  • Resource exhausted error

    Resource exhausted error

    I'm trying to send audio files, which are fairly large, to the server and am getting a resource exhausted error. Is there any way to configure the server in order to increase the maximum allowed message size?

    Here's the stack trace:

    2020-12-24 23:30:14.941839:cortex:pid-2247:INFO:500 Internal Server Error POST /
    2020-12-24 23:30:14.942071:cortex:pid-2247:ERROR:Exception in ASGI application
    Traceback (most recent call last):
      File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
        result = await app(self.scope, self.receive, self.send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
        return await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
        await super().__call__(scope, receive, send)  # pragma: no cover
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
        await self.middleware_stack(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
        raise exc from None
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
        await self.app(scope, receive, _send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
        response = await self.dispatch_func(request, self.call_next)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
        return await call_next(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
        task.result()
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
        response = await self.dispatch_func(request, self.call_next)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
        response = await call_next(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
        task.result()
     File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
        raise exc from None
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
        await self.app(scope, receive, sender)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
        await route.handle(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
        response = await func(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
        dependant=dependant, values=values, is_coroutine=is_coroutine
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
        return await run_in_threadpool(dependant.call, **values)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
        return await loop.run_in_executor(None, func, *args)
      File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
        prediction = predictor_impl.predict(**kwargs)
      File "/mnt/project/serving/cortex_server.py", line 10, in predict
        return self.client.predict({"waveform": np.array(payload["audio"]).astype("float32")})
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line 114, in predict
        return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line 164, in _run_inference
        return self._client.predict(model_input, model_name, model_version)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in predict
        response_proto = self._pred.Predict(prediction_request, timeout=timeout)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
        raise _InactiveRpcError(state)
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
            status = StatusCode.RESOURCE_EXHAUSTED
            details = "Received message larger than max (102484524 vs. 4194304)"
            debug_error_string = "{"created":"@1608852614.937822193","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
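    For context, the two numbers in the error are the payload size and gRPC's default 4 MiB message cap (4 * 1024 * 1024 = 4194304 bytes). gRPC clients can raise the cap through the standard grpc.max_send_message_length / grpc.max_receive_message_length channel options; whether Cortex exposes this for its internal TensorFlow Serving client is a separate question. A sketch (the address is a placeholder, and opening the channel needs the grpcio package, so that line is left commented):

```python
# Channel options that raise gRPC's default 4 MiB message-size cap,
# which the ~98 MiB audio payload in the traceback exceeds.
MAX_MESSAGE_BYTES = 128 * 1024 * 1024  # comfortably above the payload

GRPC_OPTIONS = [
    ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
    ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
]

# import grpc
# channel = grpc.insecure_channel("localhost:9000", options=GRPC_OPTIONS)
```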
    
    bug 
    opened by lminer 24
  • Per Process GPU Ram

    Per Process GPU Ram

    As mentioned in the gpus.md docs about limiting GPU RAM, I know exactly which code snippet I have to use, but I don't know where to put that snippet in the Cortex source code.

    mem_limit_mb = 1024
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=mem_limit_mb)])

    question 
    opened by akash-harijan 23
  • Support aws_session_token for CLI auth

    Support aws_session_token for CLI auth

    Description

    In order to authenticate with the Cortex operator, the Cortex CLI should be able to use aws_session_token (currently only static credentials are supported).

    Also, consider enabling auth via IAM role (e.g. inherited from Lambda, EC2)

    enhancement research 
    opened by deliahu 22
  • Package Cortex library into .ZIP

    Package Cortex library into .ZIP

    I'm trying to create a microservice to manage my cluster via Cortex and Lambda. AWS Lambda requires Python dependencies to be packaged and uploaded as a .zip file. How can I package the Cortex library into a .zip?

    question 
    opened by imagine3D-ai 18
  • How to make Cortex XmlHttpRequest on HTTPS page?

    How to make Cortex XmlHttpRequest on HTTPS page?

    I have a website that runs on https://, and I can't make Cortex API XmlHttpRequest requests from it.

    When running on localhost using http://, everything works fine:

    async function postData(url = '', data = {}) {
      // Default options are marked with *
      const response = await fetch(url, {
        method: 'POST', // *GET, POST, PUT, DELETE, etc.
        mode: 'cors', // no-cors, *cors, same-origin
        cache: 'no-cache', // *default, no-cache, reload, force-cache, only-if-cached
        credentials: 'same-origin', // include, *same-origin, omit
        headers: {
          'Content-Type': 'application/json'
          // 'Content-Type': 'application/x-www-form-urlencoded',
        },
        redirect: 'follow', // manual, *follow, error
        referrerPolicy: 'no-referrer', // no-referrer, *no-referrer-when-downgrade, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, unsafe-url
        body: JSON.stringify(data) // body data type must match "Content-Type" header
      });
      return response.json(); // parses JSON response into native JavaScript objects
    }
    

    But making the same request from an https:// page gives the following:

    Mixed Content: The page at 'https://www.@' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://a6cc8d4dee22a448e81bb29862332bf0-93580d7c9d7d2256.elb.us-east-2.amazonaws.com/newtest-user'. This request has been blocked; the content must be served over HTTPS.

    How can I access Cortex API over HTTPS?

    question 
    opened by imagine3D-ai 17
  • upstream connect error or disconnect/reset before headers. reset reason: connection failure

    upstream connect error or disconnect/reset before headers. reset reason: connection failure

    Version

    cli version: 0.18.1

    Description

    Intermittent 503 errors on AWS cluster.

    Configuration

    cortex.yaml

    # cortex.yaml
    
    - name: offer-features
      predictor:
        type: python
        path: predictor.py
        config:
          bucket: XXXXXXXXXXXXXXXXXXXX
      compute:
        cpu: 1  # CPU request per replica, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
        gpu: 0  # GPU request per replica (default: 0)
        inf: 0 # Inferentia ASIC request per replica (default: 0)
        mem: 1Gi
      autoscaling:
        min_replicas: 2
        max_replicas: 3
        init_replicas: 2
        max_replica_concurrency: 13
        target_replica_concurrency: 5
        window: 1m0s
        downscale_stabilization_period: 5m0s
        upscale_stabilization_period: 1m0s
        max_downscale_factor: 0.75
        max_upscale_factor: 1.5
        downscale_tolerance: 0.05
        upscale_tolerance: 0.05
    
    # cluster.yaml
    
    # AWS credentials (if not specified, ~/.aws/credentials will be checked) (can be overridden by $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY)
    aws_access_key_id: XXXXXXXXXXXXXX
    aws_secret_access_key: XXXXXXXXXXXXXXXXX
    
    # optional AWS credentials for the operator which may be used to restrict its AWS access (defaults to the AWS credentials set above)
    cortex_aws_access_key_id: XXXXXXXXXXXXXXXX
    cortex_aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXX
    
    # EKS cluster name for cortex (default: cortex)
    cluster_name: cortex
    
    # AWS region
    region: us-east-1
    
    # S3 bucket (default: <cluster_name>-<RANDOM_ID>)
    # note: your cortex cluster uses this bucket for metadata storage, and it should not be accessed directly (a separate bucket should be used for your models)
    bucket: # cortex-<RANDOM_ID>
    
    # list of availability zones for your region (default: 3 random availability zones from the specified region)
    availability_zones: # e.g. [us-east-1a, us-east-1b, us-east-1c]
    
    # instance type
    instance_type: t3.medium
    
    # minimum number of instances (must be >= 0)
    min_instances: 1
    
    # maximum number of instances (must be >= 1)
    max_instances: 5
    
    # disk storage size per instance (GB) (default: 50)
    instance_volume_size: 50
    
    # instance volume type [gp2, io1, st1, sc1] (default: gp2)
    instance_volume_type: gp2
    
    # instance volume iops (only applicable to io1 storage type) (default: 3000)
    # instance_volume_iops: 3000
    
    # whether the subnets used for EC2 instances should be public or private (default: "public")
    # if "public", instances will be assigned public IP addresses; if "private", instances won't have public IPs and a NAT gateway will be created to allow outgoing network requests
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    subnet_visibility: public  # must be "public" or "private"
    
    # whether to include a NAT gateway with the cluster (a NAT gateway is necessary when using private subnets)
    # default value is "none" if subnet_visibility is set to "public"; "single" if subnet_visibility is "private"
    nat_gateway: none  # must be "none", "single", or "highly_available" (highly_available means one NAT gateway per availability zone)
    
    # whether the API load balancer should be internet-facing or internal (default: "internet-facing")
    # note: if using "internal", APIs will still be accessible via the public API Gateway endpoint unless you also disable API Gateway in your API's configuration (if you do that, you must configure VPC Peering to connect to your APIs)
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    api_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"
    
    # whether the operator load balancer should be internet-facing or internal (default: "internet-facing")
    # note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator (https://docs.cortex.dev/v/0.18/guides/vpc-peering)
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    operator_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"
    
    # CloudWatch log group for cortex (default: <cluster_name>)
    log_group: cortex
    
    # additional tags to assign to aws resources for labelling and cost allocation (by default, all resources will be tagged with cortex.dev/cluster-name=<cluster_name>)
    tags:  # <string>: <string> map of key/value pairs
    
    # whether to use spot instances in the cluster (default: false)
    # see https://docs.cortex.dev/v/0.18/cluster-management/spot-instances for additional details on spot configuration
    spot: false
    
    # see https://docs.cortex.dev/v/0.18/guides/custom-domain for instructions on how to set up a custom domain
    ssl_certificate_arn: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    
    

    Steps to reproduce

    • Spin up instances on AWS.
    • Wait a couple of days / hours (varies).
    • Notice sudden 503 errors

    Expected behavior

    The API should keep responding to requests without 503 errors.

    Actual behavior

    503 errors with the message

    upstream connect error or disconnect/reset before headers. reset reason: connection failure
    

    Screenshots

    NOTE: The endpoint stopped responding around 15:30 in the graphs below.

    Monitoring number of bytes in: (screenshot omitted)

    Number of requests: (screenshot omitted)

    Stack traces

    Nothing useful, just:

    2020-08-16 05:38:34.697979:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:37.643022:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:40.577522:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:42.008412:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:43.513294:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:45.425255:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:48.327276:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:51.316962:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:54.009212:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:55.852878:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:57.525264:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:39:00.795236:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:39:04.437013:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:05.981920:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:09.314293:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:12.343143:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:15.821708:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:19.083554:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:22.048843:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:24.943968:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:26.613330:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:29.702703:cortex:pid-448:INFO:200 OK POST /predict
    
    

    Additional context

    • The prediction takes about ~150 ms on my Dell with an Intel© Core™ i7-8750H CPU @ 2.20GHz × 6 and 32 GB RAM.
    • All the load balancer targets are marked as "unhealthy", even though they work (i.e. I can send requests and receive 2XX responses)
    • The load balancer healthcheck endpoint returns the following
    /healthz
    {
            "service": {
                    "namespace": "istio-system",
                    "name": "ingressgateway-operator"
            },
            "localEndpoints": 0
    }
    


    bug 
    opened by cristianmtr 14
  • Add possibility to export environment variables with .env file

    Add possibility to export environment variables with .env file

    Description

    Add support for exporting environment variables from an .env file placed in the root directory of a Cortex project.

    Motivation

    The user may not want to export environment variables using the predictor:env field in cortex.yaml; one reason could be to keep the cortex.yaml deployment clean.
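    The requested behavior amounts to something like the following hypothetical sketch (parse_env_file is illustrative, not Cortex code):

```python
def parse_env_file(path):
    """Parse KEY=VALUE pairs from a .env file, skipping blank lines
    and # comments."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# The resulting dict would be merged into the predictor's environment,
# alongside any predictor:env values from cortex.yaml.
```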

    enhancement 
    opened by RobertLucian 14
  • Is there a way to speed-up API deployment

    Is there a way to speed-up API deployment

    When deploying an API and observing logs, it seems that the most time-consuming part of deployment is:

    2021-01-25 18:37:27.401057:cortex:pid-1:INFO:downloading the project code
    2021-01-25 18:37:27.483562:cortex:pid-1:INFO:downloading the python serving image
    

    Is there a way to somehow make deploying an API quicker?

    question 
    opened by imagine3D-ai 13
  • Why is min_replicas 0 not possible?

    Why is min_replicas 0 not possible?

    We are trying to deploy a text generation API on AWS. We do not expect the API to receive a lot of traffic initially, so we would like to save some costs. My idea was to set min_replicas to 0, which would avoid keeping an instance idle when the API receives no traffic. As soon as a new request came in, Cortex would spawn a new instance and shut it down once traffic went back to 0.

    However, I noticed that setting min_replicas to 0 is invalid. Isn't the above a valid use case for it? Also, is this a recent change? I vaguely (very vaguely) remember that this was possible in version 0.20 (please correct me if I'm wrong), but it seems it is not in 0.26.

    cc @deliahu I opened a new thread here because: 1) it's a different issue than the other thread, and 2) other users might benefit from the conversation here.

    question 
    opened by dakshvar22 13
  • Fix Grafana dashboard for AsyncAPIs

    Fix Grafana dashboard for AsyncAPIs

    Changes

    • Fix typo: async_queue_length -> async_queued so that the list of api_names is populated (currently empty)
    • Use =~ with api_name where missing to enable displaying multiple AsyncAPIs on a panel
    • For the "In-Flight Requests" panel include the api_name in the legend

    Testing

    I have made the corresponding updates manually through the Grafana UI for our deployed Cortex cluster. AsyncAPIs now list in the "Cortex / AsyncAPI" dashboard and the dashboard works when multiple AsyncAPIs are selected.


    checklist:

    • [ ] run make test and make lint
    • [ ] test manually (i.e. build/push all images, restart operator, and re-deploy APIs)
    • [ ] update examples
    • [ ] update docs and add any new files to summary.md (view in gitbook after merging)
    • [ ] cherry-pick into release branches if applicable
    • [ ] alert the dev team if the dev environment changed
    opened by jackmpcollins 0
  • Use of root url

    Use of root url

    I don't really know how to word this correctly. Long story short, I need to use "http://$URL/" instead of "http://$URL/$API_NAME" for one of the multiple APIs inside the cluster. I haven't found a way to do it in the documentation, but surely it is implemented.

    question 
    opened by Lunatik00 0
  • Bump sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9

    Bump sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9

    Bumps sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9.

    Release notes

    Sourced from sigs.k8s.io/aws-iam-authenticator's releases.

    v0.5.9

    Changelog

    • 1209cfe2 Bump version in Makefile
    • 029d1dcf Add query parameter validation for multiple parameters

    v0.5.7

    What's Changed

    New Contributors

    Full Changelog: https://github.com/kubernetes-sigs/aws-iam-authenticator/compare/v0.5.6...v0.5.7

    v0.5.6

    Changelog

    Docker Images

    Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

    $(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)
    
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-arm64
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-amd64

    v0.5.5

    Changelog

    Docker Images

    Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

    $(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)
    

    ... (truncated)

    Commits
    • 1209cfe Bump version in Makefile
    • 029d1dc Add query parameter validation for multiple parameters
    • 0a72c12 Merge pull request #455 from jyotimahapatra/rev2
    • 596a043 revert use of upstream yaml parsing
    • 2a9ee95 Merge pull request #448 from jngo2/master
    • fc4e6cb Remove unused imports
    • f0fe605 Remove duplicate InitMetrics
    • 99f04d6 Merge pull request #447 from nckturner/release-0.5.6
    • 9dcb6d1 Faster multiarch docker builds
    • a9cc81b Bump timeout for image build job
    • Additional commits viewable in compare view


    dependencies go 
    opened by dependabot[bot] 1
  • Consider using the CDK SDK for `cortex cluster up / down` commands

    Consider using the CDK SDK for `cortex cluster up / down` commands

    Description

    Replace cloud provider specific code in cortex cluster commands by using the CDK API.

    Motivation

    Make cluster management commands more independent of each cloud provider, and make it easier to define the infrastructure (i.e. Cortex) as code.

    enhancement research 
    opened by RobertLucian 0
  • Restrict minimum EC2/EKS IAM policies by resource

    Restrict minimum EC2/EKS IAM policies by resource

    Description

    As described in https://docs.cortex.dev/clusters/management/auth#minimum-iam-policy, the current minimum IAM policy grants the cortex CLI (and, by extension, eksctl) full control over the EC2/EKS services.

    Motivation

    These should be restricted to a resource-based policy that would limit what an IAM role/user can do. This is especially helpful in bigger corporations where there are more than a handful of developers and the company's policy on what access its devs have is more stringent.

    Additional context

    This seems to be blocked on what eksctl requires: https://eksctl.io/usage/minimum-iam-policies/. Talk to the eksctl team to see if there's a way to further reduce the IAM policy requirements.

    enhancement provisioning 
    opened by RobertLucian 0
Releases(v0.42.1)
  • v0.42.1(Sep 23, 2022)

    v0.42.1

    New features

    • Add support for a new set of EC2 instances, including the c6 and g5 families https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)

    Bug fixes

    • Cosmetic fix: the VPC CNI logging functionality was triggering warn logs when running the cortex CLI https://github.com/cortexlabs/cortex/pull/2443 (RobertLucian)

    Misc

    • Update Cortex dependency versions (eksctl, EKS to 1.22, AWS IAM, Python, etc.) https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian, deliahu)
    Source code(tar.gz)
    Source code(zip)
  • v0.42.0(Jan 10, 2022)

    v0.42.0

    New features

    • Add support for the Classic Load Balancer for APIs; the Network Load Balancer remains the default (docs) https://github.com/cortexlabs/cortex/pull/2413 https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)

    Bug fixes

    • Fix Async API http/tcp probes when probing the empty root path (/) https://github.com/cortexlabs/cortex/pull/2407 (RobertLucian)
    • Fix nil pointer exception in the cortex cluster export command https://github.com/cortexlabs/cortex/pull/2415 https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)
    • Ensure that user-specified environment variables are ordered deterministically in the Kubernetes deployment spec https://github.com/cortexlabs/cortex/pull/2411 (deliahu)

    Misc

    • Ensure that the batch on-job-complete request contains a valid JSON body https://github.com/cortexlabs/cortex/pull/2409 (RobertLucian)
    Source code(tar.gz)
    Source code(zip)
  • v0.41.0(Dec 8, 2021)

    v0.41.0

    New features

    • Support configurable pre_stop command for containers https://github.com/cortexlabs/cortex/pull/2403 (docs) (deliahu)

    Misc

    • Support m6i instance types https://github.com/cortexlabs/cortex/pull/2398 (deliahu)
    • Update to Kubernetes v1.21 https://github.com/cortexlabs/cortex/pull/2398 (deliahu)

    Bug fixes

    • Wait for in-flight requests to reach zero before terminating the proxy container https://github.com/cortexlabs/cortex/pull/2402 (deliahu)
    • Fix cortex get --env command https://github.com/cortexlabs/cortex/pull/2404 (deliahu)
    • Fix cluster price estimate during cortex cluster up for spot node groups with on-demand base capacity https://github.com/cortexlabs/cortex/pull/2406 (RobertLucian)

    Nucleus Model Server

    We have released v0.1.0 of the Nucleus model server!

    Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, and any other container-based deployment platforms. Nucleus can also be run locally via Docker compose.

    Some of Nucleus's features include:

    • Generic Python models (PyTorch, ONNX, Sklearn, MLFlow, Numpy, Pandas, etc)
    • TensorFlow models
    • CPU and GPU support
    • Serve models directly from S3 paths
    • Configurable multiprocessing and multithreading
    • Multi-model endpoints
    • Dynamic server-side request batching
    • Automatic model reloading when new model versions are uploaded to S3
    • Model caching based on LRU policy (on disk and memory)
    • HTTP and gRPC support
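    One of the features listed above, dynamic server-side request batching, generally works by collecting incoming requests for a short window (or until a batch fills) and running them through the model in a single call. The following is a generic sketch of the idea, not Nucleus's actual implementation; the class and parameter names are hypothetical:

```python
import threading

class DynamicBatcher:
    """Collect requests until max_batch items arrive or max_wait seconds
    elapse, then run them through the model in one batched call.
    Generic illustration only -- not Nucleus's implementation."""

    def __init__(self, model_fn, max_batch=4, max_wait=0.01):
        self.model_fn = model_fn      # takes a list of payloads, returns a list of results
        self.max_batch = max_batch
        self.max_wait = max_wait
        self.lock = threading.Lock()
        self.pending = []             # list of (payload, event, result_slot)

    def predict(self, payload):
        event = threading.Event()
        slot = {}
        with self.lock:
            self.pending.append((payload, event, slot))
            if len(self.pending) >= self.max_batch:
                self._flush()
        # wait for another thread to fill the batch; on timeout, flush ourselves
        if not event.wait(self.max_wait):
            with self.lock:
                self._flush()
            event.wait()
        return slot["result"]

    def _flush(self):
        # must be called with self.lock held
        batch, self.pending = self.pending, []
        if not batch:
            return
        results = self.model_fn([p for p, _, _ in batch])
        for (_, ev, slot), res in zip(batch, results):
            slot["result"] = res
            ev.set()
```

With `max_batch=1` every request is served immediately; with a larger batch size, a lone request is served after `max_wait` elapses, trading a small latency increase for higher throughput under load.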
  • v0.40.0(Aug 5, 2021)

    v0.40.0

    New features

    • Support concurrency for Async APIs (via the max_concurrency field) https://github.com/cortexlabs/cortex/pull/2376 https://github.com/cortexlabs/cortex/issues/2200 (miguelvr)
    • Add graphs for cluster-wide and per-API cost breakdowns to the cluster metrics dashboard https://github.com/cortexlabs/cortex/pull/2382 https://github.com/cortexlabs/cortex/issues/1962 (RobertLucian)
    • Allow worker nodes containing Async APIs to scale to zero (now a shared async gateway is used, which runs on the operator node group) https://github.com/cortexlabs/cortex/pull/2380 https://github.com/cortexlabs/cortex/issues/2279 (vishalbollu)
    • Add cortex describe API_NAME command for Realtime and Async APIs https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 (RobertLucian)
    • Support updating the priority of an existing node group https://github.com/cortexlabs/cortex/pull/2369 https://github.com/cortexlabs/cortex/issues/2254 (vishalbollu)

    Misc

    • Improve the reporting of API statuses https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 (RobertLucian)
    • Remove the default readiness probe on the target port if a custom readiness probe is specified in the API spec https://github.com/cortexlabs/cortex/pull/2379 (RobertLucian)
  • v0.39.1(Jul 21, 2021)

    v0.39.1

    Bug fixes

    • Remove an unnecessary cluster validation which limited the IP ranges that could be used in api_load_balancer_cidr_white_list and operator_load_balancer_cidr_white_list https://github.com/cortexlabs/cortex/pull/2363 (RobertLucian)
  • v0.39.0(Jul 20, 2021)

    v0.39.0

    New features

    • Add cortex cluster health command to show the health of the cluster's components https://github.com/cortexlabs/cortex/pull/2313 https://github.com/cortexlabs/cortex/issues/2029 (miguelvr)
    • Forward request headers to AsyncAPIs https://github.com/cortexlabs/cortex/pull/2329 https://github.com/cortexlabs/cortex/issues/2296 (miguelvr)
    • Add metrics dashboard for Task APIs https://github.com/cortexlabs/cortex/pull/2311 https://github.com/cortexlabs/cortex/pull/2322 (RobertLucian)

    Reliability

    • Enable larger cluster sizes (up to 1000 nodes with 10000 pods) by enabling IPVS https://github.com/cortexlabs/cortex/pull/2357 https://github.com/cortexlabs/cortex/issues/1834 (RobertLucian)
    • Automatically limit the rate at which nodes are added to avoid overloading the Kubernetes API server https://github.com/cortexlabs/cortex/pull/2331 https://github.com/cortexlabs/cortex/pull/2338 https://github.com/cortexlabs/cortex/issues/2314 (RobertLucian)
    • Ensure cluster autoscaler availability https://github.com/cortexlabs/cortex/pull/2347 https://github.com/cortexlabs/cortex/issues/2346 (RobertLucian)
    • Improve istiod availability at large scale https://github.com/cortexlabs/cortex/pull/2342 https://github.com/cortexlabs/cortex/issues/2332 (RobertLucian)
    • Reduce metrics shown in cortex get to improve scalability and reliability of the command https://github.com/cortexlabs/cortex/pull/2333 https://github.com/cortexlabs/cortex/issues/2319 (vishalbollu)
    • Show aggregated node statistics in the cluster dashboard https://github.com/cortexlabs/cortex/pull/2336 https://github.com/cortexlabs/cortex/issues/2318 (RobertLucian)

    Bug fixes

    • Ensure that the Content-Type header is properly set to application/json for responses to Async API submissions https://github.com/cortexlabs/cortex/pull/2323 (vishalbollu)
    • Fix pod autoscaler scale-to-zero edge cases https://github.com/cortexlabs/cortex/pull/2350 (miguelvr)
    • Allow autoscaling configuration to be updated on a running API https://github.com/cortexlabs/cortex/pull/2355 (RobertLucian)
    • Fix node group priority calculation for the cluster autoscaler https://github.com/cortexlabs/cortex/pull/2358 https://github.com/cortexlabs/cortex/pull/2343 (RobertLucian, deliahu)
    • Allow the node_groups selector to be updated in a running API https://github.com/cortexlabs/cortex/pull/2354 (RobertLucian)
    • Fix the active replicas graph on the Async API dashboard https://github.com/cortexlabs/cortex/pull/2328 (RobertLucian)

    Misc

    • Add a graph of the number of active and queued requests to the Async API dashboard https://github.com/cortexlabs/cortex/pull/2326 https://github.com/cortexlabs/cortex/issues/1960 (deliahu)
    • Add a graph of the number of instances to the cluster dashboard https://github.com/cortexlabs/cortex/pull/2336 https://github.com/cortexlabs/cortex/issues/2318 (RobertLucian)
    • Ensure that cortex cluster info --print-config displays YAML that is consumable by cortex cluster configure https://github.com/cortexlabs/cortex/pull/2324 (vishalbollu)
  • v0.38.0(Jul 6, 2021)

    v0.38.0

    New features

    • Support autoscaling down to zero replicas for Realtime APIs https://github.com/cortexlabs/cortex/pull/2298 https://github.com/cortexlabs/cortex/issues/445 (miguelvr)
    • Allow ssl_certificate_arn, api_load_balancer_cidr_white_list, and operator_load_balancer_cidr_white_list to be updated on an existing cluster (via the cortex cluster configure command) https://github.com/cortexlabs/cortex/pull/2305 https://github.com/cortexlabs/cortex/issues/2107 (vishalbollu)
    • Allow Prometheus's instance type to be configured (docs) https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/issues/2285 (RobertLucian)
    • Allow multiple Inferentia chips to be assigned to a single container https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/1123 (deliahu)

    Bug fixes

    • Fix cluster autoscaler's nodegroup priority calculation https://github.com/cortexlabs/cortex/pull/2309 (RobertLucian)

    Misc

    • Various scalability improvements https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/2297 https://github.com/cortexlabs/cortex/issues/2278 https://github.com/cortexlabs/cortex/issues/2285
    • Allow setting a nodegroup's max_instances to 0 https://github.com/cortexlabs/cortex/pull/2310 (RobertLucian)
  • v0.37.0(Jun 24, 2021)

    v0.37.0

    New features

    • Support ARM instance types https://github.com/cortexlabs/cortex/pull/2268 https://github.com/cortexlabs/cortex/issues/1528 (RobertLucian)
    • Add cortex cluster configure command to add, remove, or scale nodegroups on a running cluster https://github.com/cortexlabs/cortex/pull/2246 https://github.com/cortexlabs/cortex/issues/2096 (RobertLucian)
    • Add cortex cluster info --print-config command to print the current configuration of a running cluster https://github.com/cortexlabs/cortex/pull/2246 (RobertLucian)
    • Add metrics dashboard for Async APIs https://github.com/cortexlabs/cortex/pull/2242 https://github.com/cortexlabs/cortex/issues/1958 (miguelvr)
    • Support cortex refresh command for Async APIs https://github.com/cortexlabs/cortex/pull/2265 https://github.com/cortexlabs/cortex/issues/2237 (deliahu)

    Breaking changes

    • The cortex cluster scale command has been replaced by the cortex cluster configure command.

    Bug fixes

    • Fix Async API metrics reporting for non-200 response status codes https://github.com/cortexlabs/cortex/pull/2266 (miguelvr)
    • Make batch job metrics persistence resilient to instance termination https://github.com/cortexlabs/cortex/pull/2247 https://github.com/cortexlabs/cortex/issues/2041 (vishalbollu)
    • Make network validations during cortex cluster up more permissive (to avoid unnecessarily failing checks on GovCloud) https://github.com/cortexlabs/cortex/pull/2248 (vishalbollu)
    • Fix Inferentia resource requests https://github.com/cortexlabs/cortex/pull/2250 (RobertLucian)

    Misc

    • Improve output of cortex cluster info for running batch jobs https://github.com/cortexlabs/cortex/pull/2270 (deliahu)
    • Persist Batch job metrics regardless of job status https://github.com/cortexlabs/cortex/pull/2244 (miguelvr)
    • Support creating clusters with no node groups https://github.com/cortexlabs/cortex/pull/2269 (deliahu)
    • Improve handling of container startup errors in batch jobs with multiple containers https://github.com/cortexlabs/cortex/pull/2260 https://github.com/cortexlabs/cortex/issues/2217 (vishalbollu)
    • Add CPU and memory resource requests to the proxy and dequeuer containers https://github.com/cortexlabs/cortex/pull/2252 (deliahu)
  • v0.36.0(Jun 8, 2021)

    v0.36.0

    New features

    • Support running arbitrary Docker containers in all workload types (Realtime, Async, Batch, Task) https://github.com/cortexlabs/cortex/pull/2173 (RobertLucian, miguelvr, vishalbollu, deliahu, ospillinger)
    • Support autoscaling Async APIs to zero replicas https://github.com/cortexlabs/cortex/pull/2224 https://github.com/cortexlabs/cortex/issues/2199 (RobertLucian)

    Breaking changes

    • With this release, we have generalized Cortex to exclusively support running arbitrary Docker containers for all workload types (Realtime, Async, Batch, and Task). This enables the use of any model server, programming language, etc. As a result, the API configuration has been updated: the predictor section has been removed, the pod section has been added, and the autoscaling parameters have been modified slightly (depending on the workload type). See updated docs for Realtime, Async, Batch, and Task. If you'd like to see examples of Dockerizing Python applications, see our test/apis folder.
    • The cortex prepare-debug command has been removed; Cortex now exclusively runs Docker containers, which can be run locally via docker run.
    • The cortex patch command has been removed; its behavior is now identical to cortex deploy.
    • The cortex logs command now prints a CloudWatch Insights URL with a pre-populated query which can be executed to show logs from your workloads, since this is the recommended approach in production. If you wish to stream logs from a pod at random, you can use cortex logs --random-pod (keep in mind that these logs will not include some system logs related to your workload).
    • gRPC support has been temporarily removed; we are working on adding it back in v0.37.
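    As an illustration of the new structure, a Realtime API spec using the pod section might look roughly like the following. The field names below are an assumption based on the description above, not the exact schema; consult the linked Realtime docs for the authoritative reference:

```yaml
# Illustrative sketch only -- exact field names may differ from the real schema
- name: text-generator
  kind: RealtimeAPI
  pod:
    containers:
      - name: api
        image: <your-dockerhub-or-ecr-image>  # any containerized model server
        compute:
          cpu: 1
          mem: 2Gi
```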

    Bug fixes

    • Handle exception when initializing the Python client when the default environment is not set https://github.com/cortexlabs/cortex/pull/2225 https://github.com/cortexlabs/cortex/issues/2223 (deliahu)

    Docs

    • Document how to configure SMTP in Grafana (e.g to enable email alerts) https://github.com/cortexlabs/cortex/pull/2219 (RobertLucian)

    Misc

    • Show CloudWatch Insights URL with a pre-populated query in the output of cortex logs https://github.com/cortexlabs/cortex/issues/2085 (vishalbollu)
    • Improve efficiency of batch job submission validations https://github.com/cortexlabs/cortex/pull/2179 https://github.com/cortexlabs/cortex/issues/2178 (deliahu)
  • v0.35.0(May 11, 2021)

    v0.35.0

    New features

    • Avoid processing HTTP requests that have been cancelled by the client https://github.com/cortexlabs/cortex/pull/2135 https://github.com/cortexlabs/cortex/issues/1453 (vishalbollu)
    • Support GP3 volumes (and make GP3 the default volume type) https://github.com/cortexlabs/cortex/pull/2130 https://github.com/cortexlabs/cortex/issues/1843 (RobertLucian)
    • Allow setting the shared memory (shm) size for Task APIs https://github.com/cortexlabs/cortex/pull/2132 https://github.com/cortexlabs/cortex/issues/2115 (RobertLucian)
    • Implement automatic 7-day expiration for Async API responses https://github.com/cortexlabs/cortex/pull/2151 (RobertLucian)
    • Add cortex env rename command https://github.com/cortexlabs/cortex/pull/2165 https://github.com/cortexlabs/cortex/issues/1773 (deliahu)

    Breaking changes

    • The Python client methods which deploy Python classes have been separated from the deploy() method. Now, deploy() is used only to deploy project folders, and deploy_realtime_api(), deploy_async_api(), deploy_batch_api(), and deploy_task_api() are for deploying Python classes. (docs)
    • The name of the bucket that Cortex uses for internal purposes is no longer configurable. During cluster creation, Cortex will auto-generate the bucket name (and create the bucket if it doesn't exist). During cluster deletion, the bucket will be emptied (unless the --keep-aws-resources flag is provided to cortex cluster down). Users' files should not be stored in the Cortex internal bucket.

    Bug fixes

    • Fix the number of Async API replicas shown in cortex cluster info https://github.com/cortexlabs/cortex/pull/2140 https://github.com/cortexlabs/cortex/issues/2129 (RobertLucian)

    Misc

    • Delete all cortex-created AWS resources when deleting a cluster, and support the --keep-aws-resources flag with cortex cluster down to preserve AWS resources https://github.com/cortexlabs/cortex/pull/2161 https://github.com/cortexlabs/cortex/issues/1612 (RobertLucian)
    • Validate the user's AWS service quota for number of security groups and in/out rules during cluster creation https://github.com/cortexlabs/cortex/pull/2127 https://github.com/cortexlabs/cortex/issues/2087 (RobertLucian)
    • Allow specifying only one of --min-instances or --max-instances with cortex cluster scale https://github.com/cortexlabs/cortex/pull/2149 (RobertLucian)
    • Use 405 status code for un-implemented Realtime API methods https://github.com/cortexlabs/cortex/pull/2158 (RobertLucian)
    • Decrease file size and project size limits https://github.com/cortexlabs/cortex/pull/2152 (deliahu)
    • Set the default environment name to the cluster name when creating a cluster https://github.com/cortexlabs/cortex/pull/2164 https://github.com/cortexlabs/cortex/issues/1546 (deliahu)
  • v0.34.0(Apr 27, 2021)

    v0.34.0

    New features

    • Support handling GET, PUT, PATCH, and DELETE HTTP requests in Realtime APIs (docs) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 (RobertLucian)
    • Support running realtime API containers locally for debugging / development purposes (docs) https://github.com/cortexlabs/cortex/pull/2112 https://github.com/cortexlabs/cortex/issues/2077 (vishalbollu)
    • Support multiple gRPC services / methods (which can be named arbitrarily) in a single Realtime API (docs) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 (RobertLucian)
    • Support specifying a list of node groups on which a workload is allowed to run (see configuration docs for Realtime, Async, Batch, or Task APIs) https://github.com/cortexlabs/cortex/pull/2098 https://github.com/cortexlabs/cortex/issues/2034 (RobertLucian)
    • Support AWS GovCloud regions https://github.com/cortexlabs/cortex/pull/2118 https://github.com/cortexlabs/cortex/issues/2103 (vishalbollu)

    Breaking changes

    • "predictor" has been renamed to "handler" throughout the product (API configuration and Python APIs). In addition, as a result of supporting additional HTTP method verbs, predict() has been renamed to handle_post() in Realtime APIs (handle_get(), handle_put(), handle_patch(), and handle_delete() are now also supported). For consistency, predict() has been renamed to handle_async() for Async APIs, and handle_batch() for Batch APIs. See the examples for Realtime, Async, and Batch APIs. Task APIs have not been changed.

    Bug fixes

    • Fix invalid Async workload status during processing https://github.com/cortexlabs/cortex/pull/2106 https://github.com/cortexlabs/cortex/issues/2104 (RobertLucian)

    Misc

    • Support json output for the cortex cluster info command https://github.com/cortexlabs/cortex/pull/2089 https://github.com/cortexlabs/cortex/issues/2062 (RobertLucian)
    • Allow nodegroups to be scaled down to max_instances == 0 https://github.com/cortexlabs/cortex/pull/2095 (deliahu)
  • v0.33.0(Apr 13, 2021)

    v0.33.0

    New features

    • Allow specifying a CIDR range whitelist for APIs and the operator (docs) https://github.com/cortexlabs/cortex/pull/2071 https://github.com/cortexlabs/cortex/issues/2003 (vishalbollu)
    • Enable CORS for async, batch, and task APIs https://github.com/cortexlabs/cortex/pull/2082 https://github.com/cortexlabs/cortex/issues/2073 (deliahu)

    Breaking changes

    • The onnx predictor type has been removed in favor of the python predictor type (all onnx models are fully supported by the python predictor type)

    Bug fixes

    • Fix bug affecting async api consistency during heavy traffic https://github.com/cortexlabs/cortex/pull/2072 (RobertLucian)
    • Fix bug affecting async api updates https://github.com/cortexlabs/cortex/pull/2067 (vishalbollu)

    Misc

    • Rename cortex cluster configure command to cortex cluster scale https://github.com/cortexlabs/cortex/pull/2040 https://github.com/cortexlabs/cortex/issues/1972 (RobertLucian)
    • Disable AZRebalance autoscaling group process https://github.com/cortexlabs/cortex/pull/2042 https://github.com/cortexlabs/cortex/issues/1349 (RobertLucian)
    • Add horizontal pod autoscaler to async API gateway https://github.com/cortexlabs/cortex/pull/2079 https://github.com/cortexlabs/cortex/issues/2078 (RobertLucian)
    • Rename async modules to async_api to avoid name collision with the reserved keyword in Python 3.7+ https://github.com/cortexlabs/cortex/pull/2066 https://github.com/cortexlabs/cortex/issues/2052 (vishalbollu)
    • Backup images to dockerhub https://github.com/cortexlabs/cortex/pull/2081 (vishalbollu)
    • Add additional debugging info for cluster up failures https://github.com/cortexlabs/cortex/pull/2080 https://github.com/cortexlabs/cortex/issues/2027 (vishalbollu)
  • v0.32.0(Mar 30, 2021)

    v0.32.0

    New features

    • Add gRPC support to realtime APIs (docs) https://github.com/cortexlabs/cortex/pull/1997 https://github.com/cortexlabs/cortex/issues/1056 (RobertLucian)
    • Add support for ONNX and TensorFlow predictor types in async APIs (docs) https://github.com/cortexlabs/cortex/pull/1996 https://github.com/cortexlabs/cortex/issues/1980 (miguelvr)
    • Support using ECR images from other AWS accounts and regions https://github.com/cortexlabs/cortex/pull/2011 https://github.com/cortexlabs/cortex/issues/1988 (vishalbollu)

    Breaking changes

    • GCP support has been removed so that we can focus our efforts on improving the scalability, reliability, and security for Cortex on AWS. Cortex on GCP will still be available in v0.31. If you are currently using Cortex on GCP, our team will be happy to help you migrate to AWS or work with you to find alternative solutions. Please feel free to reach out to us on slack or email us at [email protected] if you're interested.

    Bug fixes

    • Fix memory plots on Grafana dashboards for realtime and batch APIs https://github.com/cortexlabs/cortex/pull/2024 https://github.com/cortexlabs/cortex/pull/2014 https://github.com/cortexlabs/cortex/issues/1970 (RobertLucian)

    Docs

    • Misc docs improvements https://github.com/cortexlabs/cortex/pull/1994 (ospillinger)

    Misc

    • Increase kubelet's registryPullQPS limit from 5 to 10 https://github.com/cortexlabs/cortex/pull/2023 https://github.com/cortexlabs/cortex/issues/1989 (miguelvr)
    • Pin the AMI version https://github.com/cortexlabs/cortex/pull/2010 https://github.com/cortexlabs/cortex/issues/1975 https://github.com/cortexlabs/cortex/issues/1615 (vishalbollu)
  • v0.31.1(Mar 23, 2021)

    v0.31.1

    Bug fixes

    • Preemptible node pools on GCP aren't autoscaling https://github.com/cortexlabs/cortex/pull/1981 (vishalbollu)
    • Replica autoscaler targets incorrect deployments on operator restart https://github.com/cortexlabs/cortex/pull/1982 (miguelvr)
    • Replica autoscaler is not reinitialized for running APIs on operator restart on GCP https://github.com/cortexlabs/cortex/pull/1984 (vishalbollu)
  • v0.31.0(Mar 17, 2021)

    v0.31.0

    New features

    • Add support for AsyncAPI (experimental) (docs) https://github.com/cortexlabs/cortex/pull/1935 https://github.com/cortexlabs/cortex/issues/1610 (miguelvr)
    • Add support for multi-instance-type clusters to AWS/GCP providers (experimental) (aws/gcp docs) https://github.com/cortexlabs/cortex/pull/1951 (RobertLucian)
    • Allow users to duplicate/mirror traffic using shadow pipelines https://github.com/cortexlabs/cortex/pull/1948 https://github.com/cortexlabs/cortex/issues/1889 (docs) (vishalbollu)

    Breaking changes

    • on_demand_backup in cluster configuration has been removed in favour of using a cluster with a mixture of spot and on-demand nodegroups. See multi-instance documentation for aws and gcp for more details.

    Bug fixes

    • Fix Python client not respecting CORTEX_CLI_CONFIG_DIR environment variable for client-id.txt https://github.com/cortexlabs/cortex/pull/1953 (jackmpcollins)
    • Prevent threads from being stuck in DynamicBatcher https://github.com/cortexlabs/cortex/pull/1915 (cbensimon)
    • Fix unexpected cortex logs termination by increasing buffer size https://github.com/cortexlabs/cortex/pull/1939 (vishalbollu)
    • Decouple cluster deletion from EBS volume deletion for cortex cluster down https://github.com/cortexlabs/cortex/pull/1954 (deliahu)
    • Fix spot/on-demand GPU instances not joining the cluster by upgrading to eksctl 0.40.0 https://github.com/cortexlabs/cortex/pull/1955 (vishalbollu)
    • Prevent premature queue-not-found errors by preserving the SQS queue until minutes after the job has completed https://github.com/cortexlabs/cortex/pull/1952 (vishalbollu)

    Docs

    • Update docs https://github.com/cortexlabs/cortex/pull/1949 (ospillinger)

    Misc

    • Configure a default cortex client to manage APIs from within cortex workloads https://github.com/cortexlabs/cortex/pull/1942 https://github.com/cortexlabs/cortex/issues/1644 (RobertLucian)
    • Save batch metrics to cloud to preserve job metrics history https://github.com/cortexlabs/cortex/pull/1940 (vishalbollu)
  • v0.30.0(Mar 3, 2021)

    v0.30.0

    New features

    • Record custom metrics from predictors and view them in Grafana (docs) https://github.com/cortexlabs/cortex/pull/1910 https://github.com/cortexlabs/cortex/issues/1897 (miguelvr)
    • Add granular pod metrics to the Grafana dashboards https://github.com/cortexlabs/cortex/pull/1905 (RobertLucian)
    • Add node metrics to Grafana dashboards https://github.com/cortexlabs/cortex/pull/1900 (miguelvr)

    Breaking changes

    • Remove support for installing Cortex on your own Kubernetes Cluster https://github.com/cortexlabs/cortex/pull/1921 (RobertLucian)

    Bug fixes

    • Fix bug where successfully completed jobs were marked as completed with errors https://github.com/cortexlabs/cortex/pull/1913 (vishalbollu)
    • Fix bug where batch jobs were being terminated unnecessarily https://github.com/cortexlabs/cortex/pull/1917 (vishalbollu)
    • Prevent cluster autoscaler from reallocating job pods https://github.com/cortexlabs/cortex/pull/1919 (vishalbollu)
    • Address AWS cluster up quota issues such as not enough NAT Gateways or EIPs https://github.com/cortexlabs/cortex/pull/1912 (RobertLucian)
    • Delete unused prometheus volume on cluster down https://github.com/cortexlabs/cortex/pull/1863 (miguelvr)
    • Create .cortex dir if not present https://github.com/cortexlabs/cortex/pull/1909 (RobertLucian)

    Docs

    • Add docs for accessing dashboard through private load balancer (docs) https://github.com/cortexlabs/cortex/pull/1907 (deliahu)

    Misc

    • Allow specifying paths for requirements.txt, conda-packages.txt & dependencies.sh (docs) https://github.com/cortexlabs/cortex/pull/1896 https://github.com/cortexlabs/cortex/pull/1927 https://github.com/cortexlabs/cortex/issues/1777 (miguelvr)
    • Log relevant kubernetes events to API specific log streams https://github.com/cortexlabs/cortex/pull/1906 https://github.com/cortexlabs/cortex/issues/833 (miguelvr)
    • Support credentials using AWS_SESSION_TOKEN with the CLI/Client (docs) https://github.com/cortexlabs/cortex/pull/1908 https://github.com/cortexlabs/cortex/pull/1920 https://github.com/cortexlabs/cortex/issues/1134 https://github.com/cortexlabs/cortex/issues/1865 (vishalbollu)
    • Provide auth to Operator and APIs by attaching IAM policies to the cluster (docs) https://github.com/cortexlabs/cortex/pull/1908 https://github.com/cortexlabs/cortex/issues/1858 (vishalbollu)
  • v0.29.0(Feb 17, 2021)

    v0.29.0

    New features

    • Add Grafana dashboard for APIs (docs) https://github.com/cortexlabs/cortex/pull/1867 https://github.com/cortexlabs/cortex/pull/1885 https://github.com/cortexlabs/cortex/pull/1890 https://github.com/cortexlabs/cortex/pull/1887 (miguelvr)
    • Support API autoscaling in GCP clusters (docs) https://github.com/cortexlabs/cortex/pull/1814 https://github.com/cortexlabs/cortex/pull/1879 https://github.com/cortexlabs/cortex/issues/1601 (miguelvr)
    • Support traffic splitting in GCP clusters (docs) https://github.com/cortexlabs/cortex/pull/1892 https://github.com/cortexlabs/cortex/issues/1660 (miguelvr)

    Breaking changes

    • The default Docker images for APIs have been slimmed down to not include packages other than what Cortex requires to function. Therefore, when deploying APIs, it is now necessary to include the dependencies that your predictor needs in requirements.txt (docs) and/or dependencies.sh (docs).

    Bug fixes

    • Disable dynamic batcher for TensorFlow predictor type https://github.com/cortexlabs/cortex/pull/1888 (miguelvr)
    • Support empty directory objects for models saved in S3/GCS https://github.com/cortexlabs/cortex/pull/1830 https://github.com/cortexlabs/cortex/issues/1829 (RobertLucian)
    • Fix bug which prevented Task APIs on GCP from being cleaned up after completion https://github.com/cortexlabs/cortex/pull/1871 (RobertLucian)

    Docs

    • Add documentation for using a version of Python other than the default via dependencies.sh (docs) or custom images (docs) https://github.com/cortexlabs/cortex/pull/1862 https://github.com/cortexlabs/cortex/issues/1779 (RobertLucian)

    Misc

    • Support deploying predictor Python classes from more environments (e.g. from separate Python files, AWS Lambda) https://github.com/cortexlabs/cortex/pull/1883 https://github.com/cortexlabs/cortex/commit/3a1b777d06e660a49b6223badda4c5e8b1fe4ec1 https://github.com/cortexlabs/cortex/issues/1824 https://github.com/cortexlabs/cortex/issues/1826 (vishalbollu)
    • Improve error logging for Batch and Task APIs https://github.com/cortexlabs/cortex/pull/1866 https://github.com/cortexlabs/cortex/issues/1833 (RobertLucian)
  • v0.28.0(Feb 3, 2021)

    v0.28.0

    New features

    • Support installing Cortex on an existing Kubernetes cluster (on AWS or GCP) (docs) https://github.com/cortexlabs/cortex/pull/1837 https://github.com/cortexlabs/cortex/issues/1808 (vishalbollu)

    Breaking changes

    • The CloudWatch dashboard has been removed as a result of our switch to Prometheus for metrics aggregation. The dashboard will be replaced with an alternative in an upcoming release.

    Bug fixes

    • Fix bug which can cause requests to APIs from a Python client to timeout during cluster autoscaling https://github.com/cortexlabs/cortex/pull/1841 https://github.com/cortexlabs/cortex/issues/1840 (RobertLucian)
    • Fix bug which can cause downscale_stabilization_period to be disregarded during downscaling https://github.com/cortexlabs/cortex/pull/1847 https://github.com/cortexlabs/cortex/issues/1846 (RobertLucian)

    Misc

    • AWS credentials are no longer required to connect the CLI to the cluster operator. If you need to restrict access to your cluster operator, configure the operator's load balancer to be private by setting operator_load_balancer_scheme: internal in your cluster configuration file, and set up VPC Peering. We plan to support a new auth strategy in an upcoming release.
    • Improve S6 error code/signal handling https://github.com/cortexlabs/cortex/pull/1825 https://github.com/cortexlabs/cortex/issues/1703 (RobertLucian)
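    The private-operator setup mentioned above is a one-line addition to the cluster configuration file (the field name and value are taken from the note above; other fields omitted):

```yaml
# cluster.yaml (fragment): restrict operator access to within the VPC
operator_load_balancer_scheme: internal
```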
  • v0.27.0(Jan 21, 2021)

    v0.27.0

    New features

    • Add new API type TaskAPI for running arbitrary Python jobs (docs) https://github.com/cortexlabs/cortex/pull/1717 https://github.com/cortexlabs/cortex/issues/253 (miguelvr, RobertLucian)
    • Write Cortex's logs as structured logs, and allow use of Cortex's structured logger in predictors (supports adding extra fields) (aws docs, gcp docs) https://github.com/cortexlabs/cortex/pull/1778 https://github.com/cortexlabs/cortex/pull/1803 https://github.com/cortexlabs/cortex/pull/1804 https://github.com/cortexlabs/cortex/issues/1732 https://github.com/cortexlabs/cortex/issues/1563 (vishalbollu)
    • Support preemptible instances on GCP (docs) https://github.com/cortexlabs/cortex/pull/1791 https://github.com/cortexlabs/cortex/issues/1631 (RobertLucian)
    • Support private load balancers on GCP (docs) https://github.com/cortexlabs/cortex/pull/1786 https://github.com/cortexlabs/cortex/issues/1621 (deliahu)
    • Support GCP instances with multiple GPUs (docs) https://github.com/cortexlabs/cortex/pull/1789 https://github.com/cortexlabs/cortex/issues/1784 (deliahu)

    Breaking changes

    • cortex logs now streams logs from a single replica at random when there are multiple replicas for an API. The recommended way to analyze production logs is via a dedicated logging tool (by default, logs are sent to CloudWatch on AWS and StackDriver on GCP)

    Bug fixes

    • Misc Python client fixes https://github.com/cortexlabs/cortex/pull/1798 https://github.com/cortexlabs/cortex/pull/1782 https://github.com/cortexlabs/cortex/pull/1772 (vishalbollu, RobertLucian)

    Docs

    • Document the shared /mnt directory for TensorFlow predictors https://github.com/cortexlabs/cortex/pull/1802 https://github.com/cortexlabs/cortex/issues/1792 (deliahu)
    • Misc GCP docs improvements https://github.com/cortexlabs/cortex/pull/1799 (deliahu)

    Misc

    • Improve out-of-memory status reporting (RobertLucian)
    • Improve batch job cleanup process https://github.com/cortexlabs/cortex/pull/1797 https://github.com/cortexlabs/cortex/pull/1796 (vishalbollu)
    • Remove grpc msg send/receive limit https://github.com/cortexlabs/cortex/pull/1769 https://github.com/cortexlabs/cortex/issues/1740 (RobertLucian)
    Source code(tar.gz)
    Source code(zip)
  • v0.26.0(Jan 6, 2021)

    v0.26.0

    New features

    • Support configuring the log level for APIs (docs) https://github.com/cortexlabs/cortex/pull/1741 https://github.com/cortexlabs/cortex/issues/1484 (RobertLucian)
    • Support creating a cluster in an existing AWS VPC (docs) https://github.com/cortexlabs/cortex/pull/1759 https://github.com/cortexlabs/cortex/issues/1142 (deliahu)
    • Support specifying the GCP network and subnet for the Cortex cluster (docs) https://github.com/cortexlabs/cortex/pull/1752 https://github.com/cortexlabs/cortex/issues/1738 (deliahu)
    • Support configuring shared memory size (shm) for inter-process communication (docs) https://github.com/cortexlabs/cortex/pull/1756 https://github.com/cortexlabs/cortex/issues/1638 (vishalbollu)

    Breaking changes

    • The local provider has been removed. The best way to test your predictor implementation locally is to import it in a separate Python file and call your __init__() and predict() functions directly. The best way to test your API is to deploy it to a dev/test cluster.
    • Built-in support for API Gateway has been removed. If you need to create an https endpoint with valid certs, some options are to set up a custom domain or to manually create an API Gateway.
    • Prediction monitoring has been removed. We are exploring how to build a more powerful and customizable solution for this.
    • The predict CLI command has been deleted. curl, requests, etc. are the best tools for testing APIs.
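As a minimal sketch of the recommended local-testing workflow described above: instantiate your predictor class directly and call its methods. The `PythonPredictor` below is a hypothetical stand-in (the echo logic is for illustration only, not a real model).

```python
# test_predictor_locally.py
# Sketch: test a predictor without a cluster by importing the class
# and calling __init__() and predict() directly.

class PythonPredictor:
    def __init__(self, config):
        # a real predictor would load its model here
        self.prefix = config.get("prefix", "")

    def predict(self, payload):
        return self.prefix + payload["text"]

if __name__ == "__main__":
    predictor = PythonPredictor(config={"prefix": "echo: "})
    print(predictor.predict({"text": "hello world"}))
```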

    Bug fixes

    • For multi-model APIs, allow model names to share a prefix https://github.com/cortexlabs/cortex/pull/1745 https://github.com/cortexlabs/cortex/issues/1699 (RobertLucian)

    Docs

    Source code(tar.gz)
    Source code(zip)
  • v0.25.0(Dec 23, 2020)

    v0.25.0

    New features

    • Support server-side micro batching for the Python predictor (docs) https://github.com/cortexlabs/cortex/pull/1653 https://github.com/cortexlabs/cortex/issues/1382 (miguelvr)
    • Add timeout configuration for batch jobs (docs) https://github.com/cortexlabs/cortex/pull/1712 https://github.com/cortexlabs/cortex/issues/1324 (vishalbollu)
    • Support batch retries (docs) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1540 (lapaniku, vishalbollu)
    • Support sending failed batches to a dead-letter queue (docs) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1541 (lapaniku, vishalbollu)
    • Support installing the cortex Python client in predictors https://github.com/cortexlabs/cortex/pull/1709 https://github.com/cortexlabs/cortex/issues/1670 https://github.com/cortexlabs/cortex/issues/1206 (RobertLucian)

    Breaking changes

    • The predictor.model_path field of the realtime api configuration has been moved to predictor.models.path. In addition, for the Python predictor type, predictor.models has been renamed to predictor.multi_model_reloading. Here is the entire API configuration schema.
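For illustration, a TensorFlow realtime API configuration might change as follows under the new schema (the API name and S3 path are hypothetical; see the linked schema for the authoritative layout):

```yaml
# before (v0.24 and earlier)
- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: predictor.py
    model_path: s3://my-bucket/models/text-generator

# after (v0.25)
- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: predictor.py
    models:
      path: s3://my-bucket/models/text-generator
```

For the Python predictor type, the same `models` section is instead named `multi_model_reloading`.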

    Bug fixes

    • Misc batch reliability improvements https://github.com/cortexlabs/cortex/pull/1705 https://github.com/cortexlabs/cortex/pull/1718 https://github.com/cortexlabs/cortex/pull/1729 (vishalbollu)

    Docs

    • Reorganize the docs structure https://github.com/cortexlabs/cortex/pull/1696 https://github.com/cortexlabs/cortex/pull/1701 https://github.com/cortexlabs/cortex/pull/1704 https://github.com/cortexlabs/cortex/pull/1719 https://github.com/cortexlabs/cortex/issues/1675 (ospillinger)
    • Add GCP to the contributing guide https://github.com/cortexlabs/cortex/pull/1720 https://github.com/cortexlabs/cortex/issues/1654 (deliahu)
    • Add docs for setting up kubectl on GCP https://github.com/cortexlabs/cortex/commit/759b4b144c25cc623e1b385b036f83825d122db7 (deliahu)

    Misc

    • Parse the request body as a string when content type text/plain is specified https://github.com/cortexlabs/cortex/pull/1714 (deliahu)
    • Support paths to single ONNX files in API configuration https://github.com/cortexlabs/cortex/pull/1711 https://github.com/cortexlabs/cortex/issues/1686 (RobertLucian)
    • Support deploying public S3 models on GCP, and public GCS models on AWS https://github.com/cortexlabs/cortex/pull/1694 https://github.com/cortexlabs/cortex/issues/1684 (RobertLucian)
    • Pre-download docker images when creating GCP clusters https://github.com/cortexlabs/cortex/pull/1721 https://github.com/cortexlabs/cortex/issues/1658 (deliahu)
    • Speed up the validation processes for multi-model APIs https://github.com/cortexlabs/cortex/pull/1690 https://github.com/cortexlabs/cortex/issues/1663 (RobertLucian)
    Source code(tar.gz)
    Source code(zip)
  • v0.24.1(Dec 13, 2020)

    v0.24.1

    Bug fixes

    • Propagate the exit code from the predictor's initialization so that the API status is set to "error" when initialization fails https://github.com/cortexlabs/cortex/issues/1680 https://github.com/cortexlabs/cortex/pull/1691 (RobertLucian)
    Source code(tar.gz)
    Source code(zip)
  • v0.24.0(Dec 9, 2020)

    v0.24.0

    New features

    • Add GCP support: our initial release supports all three predictor types (Python, TensorFlow, ONNX), on CPU or GPU, with live reloading, multi-model caching, and cluster autoscaling https://github.com/cortexlabs/cortex/pull/1655 https://github.com/cortexlabs/cortex/pull/1672 https://github.com/cortexlabs/cortex/pull/1667 https://github.com/cortexlabs/cortex/issues/1661 https://github.com/cortexlabs/cortex/issues/114 https://github.com/cortexlabs/cortex/issues/1600 https://github.com/cortexlabs/cortex/issues/1602 https://github.com/cortexlabs/cortex/issues/1616 https://github.com/cortexlabs/cortex/issues/1624 (RobertLucian, deliahu, vishalbollu)
    • Add the patch command to the CLI and Python client, which can be used to update an API using only the API configuration (without needing to provide the predictor's Python implementation) https://github.com/cortexlabs/cortex/pull/1651 https://github.com/cortexlabs/cortex/pull/1666 https://github.com/cortexlabs/cortex/issues/1329 (vishalbollu)
    • Support deploying predictor Python classes from the Python client https://github.com/cortexlabs/cortex/pull/1587 https://github.com/cortexlabs/cortex/issues/1617 (see the tutorial for an example) (vishalbollu)

    Breaking changes

    • The Python client's deploy() function has been renamed to create_api(), and some of the argument names have changed (docs)
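A rough sketch of the rename (running it requires the `cortex` Python package and a cluster, so the client calls are shown as comments; the environment name and API spec are assumptions):

```python
# Hypothetical API spec passed to the Python client.
api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {"type": "python", "path": "predictor.py"},
}

# v0.23 and earlier:
#   import cortex
#   client = cortex.client("aws")
#   client.deploy(api_spec=api_spec)
#
# v0.24 and later:
#   import cortex
#   client = cortex.client("aws")
#   client.create_api(api_spec=api_spec)
```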

    Bug fixes

    • Enable CORS for APIs accessed via API Gateway or load balancer https://github.com/cortexlabs/cortex/pull/1649 https://github.com/cortexlabs/cortex/issues/1234 (RobertLucian, deliahu)
    • Fix local TensorFlow models when live reloading is enabled https://github.com/cortexlabs/cortex/pull/1668 https://github.com/cortexlabs/cortex/issues/1554 (RobertLucian)
    • Prevent TensorFlow multi-model caching from attempting to download local models from S3 https://github.com/cortexlabs/cortex/pull/1669 https://github.com/cortexlabs/cortex/issues/1598 (RobertLucian)

    Docs

    Misc

    • Improve Python client cross Python version compatibility https://github.com/cortexlabs/cortex/pull/1640 (vishalbollu)
    • Reinstall TensorFlow and ONNX dependencies when the Python version is overridden https://github.com/cortexlabs/cortex/pull/1652 (vishalbollu)
    • Terminate container when bootloader script fails https://github.com/cortexlabs/cortex/pull/1639 (vishalbollu)
    Source code(tar.gz)
    Source code(zip)
  • v0.23.0(Nov 25, 2020)

    v0.23.0

    New features

    • Update Python client deploy() to accept a Python dictionary for API configuration (previously, only a file path was supported) (docs) https://github.com/cortexlabs/cortex/pull/1587 (vishalbollu)
    • Show API deployment history in cortex get API_NAME command https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1496 (deliahu)
    • Add cortex export API_NAME and cortex export API_NAME API_ID commands to export specific and historical API deployments https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1497 (deliahu)
    • Build and push python-predictor-gpu-slim image with different combinations of cuda and cudnn (cuda10.0-cudnn7, cuda10.1-cudnn7, cuda10.1-cudnn8, cuda10.2-cudnn7, cuda10.2-cudnn8, cuda11.0-cudnn8, cuda11.1-cudnn8) (docs) https://github.com/cortexlabs/cortex/pull/1575 https://github.com/cortexlabs/cortex/issues/1574 (deliahu)

    Bug fixes

    • Allow local deployments of public S3 models without requiring AWS credentials https://github.com/cortexlabs/cortex/pull/1589 https://github.com/cortexlabs/cortex/issues/1588 (RobertLucian)

    Docs

    Misc

    • Remove API request maximum payload size limit https://github.com/cortexlabs/cortex/pull/1583 (deliahu)
    • Switch to Quay docker container registry https://github.com/cortexlabs/cortex/pull/1578 (deliahu, RobertLucian)
    Source code(tar.gz)
    Source code(zip)
  • v0.22.1(Nov 19, 2020)

    v0.22.1

    Bug fixes

    • Set the predictor's working directory to the root Cortex project directory https://github.com/cortexlabs/cortex/pull/1573 https://github.com/cortexlabs/cortex/issues/1572 (deliahu)
    • Allow max_instances to be updated via cortex cluster configure https://github.com/cortexlabs/cortex/pull/1568 https://github.com/cortexlabs/cortex/issues/1567 (deliahu)
    • Gracefully stop the serving container when a multi-processed cron throws exception https://github.com/cortexlabs/cortex/pull/1560 https://github.com/cortexlabs/cortex/issues/1552 (RobertLucian)

    Docs

    • Demonstrate how to make API requests with various payload types (binary, form fields, etc), and show how to access them in predict() https://github.com/cortexlabs/cortex/pull/1566 (docs)
    • Misc docs improvements https://github.com/cortexlabs/cortex/pull/1551 https://github.com/cortexlabs/cortex/pull/1556 c3dab4045a61703cb1db1d5f95776614252f96c0 https://github.com/cortexlabs/cortex/pull/1557 (deliahu, RobertLucian)
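As a sketch of the payload-handling idea (the exact per-Content-Type parsing behavior is described in the linked docs; the branching below is an assumption for illustration): a predictor can inspect the Python type of the payload it receives.

```python
class PythonPredictor:
    def __init__(self, config):
        pass

    def predict(self, payload):
        # Cortex parses the request body before calling predict():
        # JSON bodies arrive as a dict/list, text/plain as a str,
        # and other content types as raw bytes (hedged; see the docs).
        if isinstance(payload, (dict, list)):
            return {"type": "json", "value": payload}
        if isinstance(payload, str):
            return {"type": "text", "value": payload}
        return {"type": "bytes", "size": len(payload)}
```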

    Misc

    • Build and upload the Python package/CLI to a public S3 bucket https://github.com/cortexlabs/cortex/pull/1562 (vishalbollu)
    Source code(tar.gz)
    Source code(zip)
  • v0.22.0(Nov 11, 2020)

    v0.22.0

    New features

    • Multi-model caching: serve a collection of models that is collectively bigger than what will fit in memory (via LRU cache eviction) (docs) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/619 (RobertLucian)
    • Live reloading: support updating models in running APIs by adding new versions to the model's S3 directory (docs) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/1252 (RobertLucian)
    • Inter-process fairness: distribute requests within an API replica evenly across all processes https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/839 https://github.com/cortexlabs/cortex/issues/1298 (RobertLucian)
    • Support requests between APIs within the same cluster (docs) https://github.com/cortexlabs/cortex/pull/1503 https://github.com/cortexlabs/cortex/issues/1241 (deliahu)
    • Allow overriding of CLI install path and config directory (via $CORTEX_INSTALL_PATH and $CORTEX_CLI_CONFIG_DIR) (docs) https://github.com/cortexlabs/cortex/pull/1521 https://github.com/cortexlabs/cortex/issues/1222 (deliahu)

    Breaking changes

    • ONNX model paths in API configuration files must now point to a directory containing a single ONNX file, rather than the onnx file itself. For example model_path: s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx becomes model_path: s3://cortex-examples/onnx/yolov5-youtube.
    • The --env/-e flag in all cortex cluster commands has been renamed to --configure-env/-e, and if not provided, the environment named aws will no longer be configured in the cortex cluster info command

    Bug fixes

    • Fix intermittent failed requests during rolling updates https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/814 (RobertLucian)
    • Prevent CLI environments from getting overwritten when multiple cortex cluster commands are run concurrently https://github.com/cortexlabs/cortex/pull/1520 https://github.com/cortexlabs/cortex/issues/1410 (deliahu)

    Docs

    Misc

    • Stagger Predictor __init__() calls to reduce peak memory consumption https://github.com/cortexlabs/cortex/pull/1543 https://github.com/cortexlabs/cortex/issues/1450 (RobertLucian)
    • Add --name/-n and --region/-r flags to cortex cluster info, cortex cluster export, and cortex cluster down commands https://github.com/cortexlabs/cortex/pull/1492 https://github.com/cortexlabs/cortex/issues/1363 (RobertLucian)
    • Rename --env/-e flag to --configure-env/-e in cortex cluster commands and update its behavior https://github.com/cortexlabs/cortex/pull/1533 https://github.com/cortexlabs/cortex/issues/1412 (deliahu)
    • Disallow ARM-based instances, which are not currently supported https://github.com/cortexlabs/cortex/pull/1536 (deliahu)
    • Validate AWS vCPU quota is sufficient for up to max_instances instances when running cortex cluster up and cortex cluster configure https://github.com/cortexlabs/cortex/pull/1537 https://github.com/cortexlabs/cortex/issues/1461 (deliahu)
    Source code(tar.gz)
    Source code(zip)
  • v0.21.0(Oct 27, 2020)

    New features

    • Add Python client: pypi.org/project/cortex https://github.com/cortexlabs/cortex/pull/1449 https://github.com/cortexlabs/cortex/issues/684 (vishalbollu)
    • Add support for private docker image registries (docs) https://github.com/cortexlabs/cortex/pull/1460 https://github.com/cortexlabs/cortex/issues/1113 (deliahu)

    Bug fixes

    • Fix minor BatchAPI bugs https://github.com/cortexlabs/cortex/pull/1471 https://github.com/cortexlabs/cortex/pull/1468 https://github.com/cortexlabs/cortex/pull/1480 https://github.com/cortexlabs/cortex/issues/1473 (vishalbollu, RobertLucian)
    • Bypass instance limit check if AWS's API doesn't provide quota information (this was blocking cluster creation in eu-north-1) https://github.com/cortexlabs/cortex/pull/1439 https://github.com/cortexlabs/cortex/issues/1438 (deliahu)

    Docs

    Misc

    • Change default local port from 8888 to 8890 to avoid port conflicts with Jupyter https://github.com/cortexlabs/cortex/pull/1456 (vishalbollu)
    • Disallow instance types that aren't supported by NLB https://github.com/cortexlabs/cortex/pull/1436 https://github.com/cortexlabs/cortex/issues/1433 (deliahu)
    • Add --cluster-aws-key and --cluster-aws-secret flags to cortex cluster configure command https://github.com/cortexlabs/cortex/pull/1404 (deliahu)
    • Add --output flag to cortex env list command https://github.com/cortexlabs/cortex/pull/1444 (vishalbollu)
    Source code(tar.gz)
    Source code(zip)
  • v0.20.0(Sep 29, 2020)

    v0.20.0

    New features

    • Add cortex cluster export command to export all APIs running in a cluster (docs) https://github.com/cortexlabs/cortex/pull/1368 https://github.com/cortexlabs/cortex/issues/1255 (vishalbollu)
    • Enable users to specify CIDR ranges for the cluster's VPC (docs) https://github.com/cortexlabs/cortex/pull/1388 (vishalbollu)
    • Support json output for CLI commands (via -o/--output json) https://github.com/cortexlabs/cortex/pull/1365 https://github.com/cortexlabs/cortex/issues/1161 (vishalbollu)
    • Support the nvidia device driver (nvidia-container-toolkit) when running locally https://github.com/cortexlabs/cortex/pull/1366 https://github.com/cortexlabs/cortex/issues/1223 (vishalbollu)

    Breaking changes

    • The valid values for api_gateway in the cluster configuration file have been changed from enabled/disabled to public/none (to match the values for networking.api_gateway in the API configuration file).
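The cluster configuration file changes as follows (a minimal fragment for illustration):

```yaml
# cluster.yaml -- v0.19 and earlier
api_gateway: enabled   # or: disabled

# cluster.yaml -- v0.20 and later
api_gateway: public    # or: none
```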

    Bug fixes

    • Support AWS tags with spaces and valid special characters https://github.com/cortexlabs/cortex/pull/1374 https://github.com/cortexlabs/cortex/pull/1355 https://github.com/cortexlabs/cortex/pull/1380 https://github.com/cortexlabs/cortex/pull/1385 https://github.com/cortexlabs/cortex/issues/1373 (deliahu)
    • Fix tensor shape validation for the TensorFlow predictor https://github.com/cortexlabs/cortex/pull/1311 https://github.com/cortexlabs/cortex/issues/1310 (RobertLucian)
    • Allow cortex cluster * commands to be run from within a docker container https://github.com/cortexlabs/cortex/pull/1370 https://github.com/cortexlabs/cortex/issues/1361 https://github.com/cortexlabs/cortex/issues/1325 (deliahu)

    New examples

    • pytorch/question-generator to generate questions given text and the correct answer (uses transformers and spacy) https://github.com/cortexlabs/cortex/pull/1308 (ismaelc)

    Docs

    • Add documentation for how to install a specific version of the CLI https://github.com/cortexlabs/cortex/pull/1386 https://github.com/cortexlabs/cortex/issues/1244 (vishalbollu)
    • Add sections for overprovisioning and responsiveness to autoscaling docs https://github.com/cortexlabs/cortex/pull/1397 (deliahu)
    • Add documentation for how to allow IAM users who did not create the cortex cluster to run cortex cluster * commands https://github.com/cortexlabs/cortex/pull/1392 https://github.com/cortexlabs/cortex/issues/1391 (deliahu)
    • Add guide for setting up kubectl to access the cluster https://github.com/cortexlabs/cortex/pull/1344 https://github.com/cortexlabs/cortex/issues/1343 (RobertLucian)

    Misc

    • Update sources of AWS credentials for cortex cluster * commands, and improve transparency (docs) https://github.com/cortexlabs/cortex/pull/1378 https://github.com/cortexlabs/cortex/issues/1229 (vishalbollu)
    • Rename cluster api_gateway config values to match API config https://github.com/cortexlabs/cortex/pull/1335 https://github.com/cortexlabs/cortex/issues/1334 (deliahu)
    • Set the default value for networking.api_gateway in the API configuration to none if api gateway is disabled cluster-wide https://github.com/cortexlabs/cortex/pull/1337 https://github.com/cortexlabs/cortex/issues/1336 (deliahu)
    • Support c6g and r6g instances https://github.com/cortexlabs/cortex/pull/1332 https://github.com/cortexlabs/cortex/issues/809 (deliahu)
    • Display autoscaling group activity history when cortex cluster up fails https://github.com/cortexlabs/cortex/pull/1342 https://github.com/cortexlabs/cortex/issues/1340 (deliahu)
    • Print debug info if cortex cluster up times out https://github.com/cortexlabs/cortex/pull/1396 (deliahu)
    • Add Inferentia compute statistics to cortex cluster info command https://github.com/cortexlabs/cortex/pull/1354 https://github.com/cortexlabs/cortex/issues/1304 (RobertLucian)
    • Disable prompts in get-cli.sh if not running interactively https://github.com/cortexlabs/cortex/pull/1372 https://github.com/cortexlabs/cortex/issues/1371 (deliahu)
    • Update cortex help output https://github.com/cortexlabs/cortex/pull/1398 (deliahu)
    Source code(tar.gz)
    Source code(zip)
  • v0.19.0(Aug 25, 2020)

    New features

    • Support batch APIs docs https://github.com/cortexlabs/cortex/pull/1203 https://github.com/cortexlabs/cortex/issues/523 (vishalbollu)
    • Support traffic splitting (enables A/B testing, multi-armed bandit, etc) docs https://github.com/cortexlabs/cortex/pull/1213 https://github.com/cortexlabs/cortex/pull/1270 https://github.com/cortexlabs/cortex/issues/1132 https://github.com/cortexlabs/cortex/issues/275 https://github.com/cortexlabs/cortex/issues/1089 (tthebst)
    • Support server-side request batching for the TensorFlow Predictor docs https://github.com/cortexlabs/cortex/pull/1193 https://github.com/cortexlabs/cortex/issues/1060 (RobertLucian)
    • Add post_predict() method to Predictor interface (runs after the response has been sent) docs https://github.com/cortexlabs/cortex/pull/1237 https://github.com/cortexlabs/cortex/issues/954 (RobertLucian)
    • Support disabling API Gateway cluster-wide docs https://github.com/cortexlabs/cortex/pull/1259 https://github.com/cortexlabs/cortex/issues/1198 (deliahu)
    • Support different CUDA versions for the slim Python Predictor image docs https://github.com/cortexlabs/cortex/pull/1263 https://github.com/cortexlabs/cortex/issues/923 https://github.com/cortexlabs/cortex/issues/1254 (RobertLucian)
    • Add additional widgets to the CloudWatch Dashboard (avg in-flight requests per replica, active replicas) docs https://github.com/cortexlabs/cortex/pull/1181 (RobertLucian)

    Breaking changes

    • kind is now a required top-level field for all API configurations. Existing APIs should add kind: RealtimeAPI. This release adds support for kind: BatchAPI and kind: TrafficSplitter.
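For example, an existing realtime API configuration would be updated like this (the API name is hypothetical):

```yaml
# before (v0.18 and earlier)
- name: text-generator
  predictor:
    type: python
    path: predictor.py

# after (v0.19): kind is required
- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
```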

    Bug fixes

    • Fix python_path config field https://github.com/cortexlabs/cortex/pull/1202 (deliahu)
    • Fix local TensorFlow deploy from parent directory https://github.com/cortexlabs/cortex/pull/1274 (deliahu)
    • Improve error response for invalid payloads https://github.com/cortexlabs/cortex/pull/1212 https://github.com/cortexlabs/cortex/issues/1208 (RobertLucian)

    New examples

    • onnx/yolov5-youtube https://github.com/cortexlabs/cortex/pull/1201 (dsuess)
    • Update PyTorch text generator example to use Hugging Face transformers GPT-2 model https://github.com/cortexlabs/cortex/pull/1177 (ospillinger)

    Docs

    • Update tutorial to use the pytorch text-generator example https://github.com/cortexlabs/cortex/pull/1278 https://github.com/cortexlabs/cortex/issues/1256 (deliahu)
    • Improve instructions for updating cluster without downtime https://github.com/cortexlabs/cortex/pull/1261 (deliahu)
    • Mention API Gateway timeout in 404/503 API responses guide https://github.com/cortexlabs/cortex/pull/1264 https://github.com/cortexlabs/cortex/issues/1225 (deliahu)

    Misc

    • Set tags on log groups https://github.com/cortexlabs/cortex/pull/1164 https://github.com/cortexlabs/cortex/issues/1078 (tthebst)
    • Display API metrics in the CLI by API ID (rather than by API name) https://github.com/cortexlabs/cortex/pull/1216 (vishalbollu)
    • Fix recursive error message for deploy/delete CLI commands https://github.com/cortexlabs/cortex/pull/1247 https://github.com/cortexlabs/cortex/issues/1218 (RobertLucian)
    • Add shell completion to .zshrc file during CLI installation https://github.com/cortexlabs/cortex/pull/1265 https://github.com/cortexlabs/cortex/issues/1221 (deliahu)
    • Handle OOM error when project files are too large https://github.com/cortexlabs/cortex/pull/1217 (RobertLucian)
    • Display image pull errors https://github.com/cortexlabs/cortex/pull/1167 https://github.com/cortexlabs/cortex/issues/955 (deliahu)
    • Display local Docker image pull error when out of space https://github.com/cortexlabs/cortex/pull/1238 https://github.com/cortexlabs/cortex/issues/1236 (zouyee)
    Source code(tar.gz)
    Source code(zip)
  • v0.18.1(Jun 30, 2020)

    Bug fixes

    • Fix dynamic axes for ONNX models https://github.com/cortexlabs/cortex/pull/1187 https://github.com/cortexlabs/cortex/issues/1186 (RobertLucian)
    • Fix memory node capacity calculation for multi-api configuration files https://github.com/cortexlabs/cortex/pull/1185 (deliahu)
    • Check cluster-name tag when choosing load balancer for VPC Link integration https://github.com/cortexlabs/cortex/pull/1173 (deliahu)

    New guides

    Misc

    • Delete API Gateway if cluster up fails https://github.com/cortexlabs/cortex/pull/1172 (deliahu)
    • Move image version verification from serve.py to run.sh https://github.com/cortexlabs/cortex/pull/1180 https://github.com/cortexlabs/cortex/pull/1183 (vishalbollu)
    • Add retries for resource tagging during cluster up https://github.com/cortexlabs/cortex/pull/1188 (deliahu)
    • Use info log level when TensorFlow model is being loaded https://github.com/cortexlabs/cortex/pull/1171 (RobertLucian)
    • Increase max number of processes per API replica to 100 https://github.com/cortexlabs/cortex/pull/1166 (RobertLucian)
    • Allow empty cluster config https://github.com/cortexlabs/cortex/pull/1179 (deliahu)
    Source code(tar.gz)
    Source code(zip)
Owner: Cortex Labs