Machine Learning automation and tracking

Overview

The Open-Source MLOps Orchestration Framework

MLRun is an open-source MLOps framework that offers an integrative approach to managing your machine-learning pipelines, from early development through model training and testing to full pipeline deployment in production. MLRun provides a convenient abstraction layer over a wide variety of technology stacks while empowering data engineers and data scientists to define features and models.

The MLRun Architecture

MLRun is composed of the following layers:

  • Feature and Artifact Store — handles the ingestion, processing, metadata, and storage of data and features across multiple repositories and technologies.
  • Elastic Serverless Runtimes — converts simple code to scalable and managed microservices with workload-specific runtime engines (such as Kubernetes jobs, Nuclio, Dask, Spark, and Horovod).
  • ML Pipeline Automation — automates data preparation, model training and testing, deployment of real-time production pipelines, and end-to-end monitoring.
  • Central Management — provides a unified portal for managing the entire MLOps workflow. The portal includes a UI, a CLI, and an SDK, which are accessible from anywhere.

Key Benefits

MLRun provides the following key benefits:

  • Rapid deployment of code to production pipelines
  • Elastic scaling of batch and real-time workloads
  • Feature management — ingestion, preparation, and monitoring
  • Works anywhere — your local IDE, multi-cloud, or on-prem

For more information, see the MLRun Python package documentation.

General Concept and Motivation

The Challenge

As an ML developer or data scientist, you typically want to write code in your preferred local development environment (IDE) or web notebook, and then run the same code on a larger cluster using scale-out containers or functions. When you determine that the code is ready, you or someone else needs to transfer it to an automated ML workflow (for example, using Kubeflow Pipelines). This pipeline should be secure and include capabilities such as logging and monitoring, as well as allow adjustments to relevant components and easy redeployment.

However, the implementation is challenging: various environments ("runtimes") use different configurations, parameters, and data sources. In addition, multiple frameworks and platforms are used to focus on different stages of the development life cycle. This leads to constant development and DevOps/MLOps work.

Furthermore, as your project scales, you need greater computation power or GPUs, and you need to access large-scale data sets; this is not feasible on a laptop. You need a way to seamlessly run your code on a remote cluster and automatically scale it out.

Why MLRun?

When running ML experiments, you should ideally be able to record and version your code, configuration, outputs, and associated inputs (lineage), so you can easily reproduce and explain your results. The fact that you probably need to use different types of storage (such as files and AWS S3 buckets) and various databases further complicates the implementation.

Wouldn't it be great if you could write the code once, using your preferred development environment and simple "local" semantics, and then run it as-is on different platforms? Imagine a layer that automates the build process, execution, data movement, scaling, versioning, parameterization, outputs tracking, and more. A world of easily developed, published, or consumed data or ML "functions" that can be used to form complex and large-scale ML pipelines.

In addition, imagine a marketplace of ML functions that includes both open-source templates and your internally developed functions, to support code reuse across projects and companies and thus further accelerate your work.

This is the goal of MLRun.

Note: The code is in early development stages and is provided as a reference. The hope is to foster wide industry collaboration and make all the resources pluggable, so that developers can code to a single API and use various open-source projects or commercial products.

Back to top

Installation

Run the following command from your Python development environment (such as Jupyter Notebook) to install the MLRun package (mlrun), which includes a Python API library and the mlrun command-line interface (CLI):

pip install mlrun

MLRun requires separate containers for the API and the dashboard (UI). You can also choose to use the pre-baked JupyterLab image.

To install and run MLRun locally using Docker or Kubernetes, see the instructions in the MLRun documentation.

Installation on the Iguazio Data Science Platform

MLRun runs as a service on the Iguazio Data Science Platform (version 2.8 and above).

To access the MLRun UI, select it from the services screen. Consult Iguazio support for further details.

Back to top

Examples and Tutorial Notebooks

MLRun has many code examples and tutorial Jupyter notebooks with embedded documentation, ranging from examples of basic tasks to full end-to-end use-case applications. Note that some of the examples are found in other mlrun GitHub repositories.

Additional Examples

Back to top

Quick-Start Tutorial — Architecture and Usage Guidelines

Basic Components

MLRun has the following main components:

  • Project — a container for organizing all of your work on a particular activity. Projects consist of metadata, source code, workflows, data and artifacts, models, triggers, and member management for user collaboration.

  • Function — a software package with one or more methods and runtime-specific attributes (such as image, command, arguments, and environment).

  • Run — an object that contains information about an executed function. The run object is created as a result of running a function, and contains the function attributes (such as arguments, inputs, and outputs), as well as the execution status and results (including links to output artifacts).

  • Artifact — versioned data artifacts (such as data sets, files and models) that are produced or consumed by functions, runs, and workflows.

  • Workflow — defines a functions pipeline or a directed acyclic graph (DAG) to execute using Kubeflow Pipelines.

  • UI — a graphical user interface (dashboard) for displaying and managing projects and their contained experiments, artifacts, and code.
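
The following sketch (not part of the original examples) shows how these components typically fit together: create a project, register a function in it, and save the project definition. The project name, file name, and function name are illustrative placeholders.

from mlrun import new_project, code_to_function

# Create (or load) a project that groups functions, artifacts, runs, and workflows;
# the name and context directory are placeholders
project = new_project('my-project', context='./')

# Create a function object from local code and register it in the project
fn = code_to_function(name='trainer', filename='training.py', kind='job')
project.set_function(fn, 'trainer')

# Persist the project definition so that it can be versioned and reloaded later
project.save()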

Managed and Portable Execution

MLRun supports various types of "runtimes" — computation frameworks such as local, Kubernetes job, Dask, Nuclio, Spark, or MPI job (Horovod). Runtimes may support parallelism and clustering to distribute the work among multiple workers (processes/containers).

The following code example creates a task that defines a run specification — including the run parameters, inputs, and secrets. You run the task on a "job" function, and print the result output (in this case, the "model" artifact) or watch the run's progress. For more information and examples, see the examples/mlrun_basics.ipynb notebook.

from mlrun import NewTask, new_function

# Create a task and set its attributes
task = NewTask(handler=handler, name='demo', params={'p1': 5})
task.with_secrets('file', 'secrets.txt').set_label('type', 'demo')

run = new_function(command='myfile.py', kind='job').run(task)
run.logs(watch=True)
run.show()
print(run.artifact('model'))

You can run the same task on different functions — enabling code portability, re-use, and AutoML. You can also use the same function to run different tasks or parameter combinations with minimal coding effort.

Moving from local notebook execution to remote execution — such as running a container job, a scaled-out framework, or an automated workflow engine like Kubeflow Pipelines — is seamless: just swap the runtime function or wire functions in a graph. Continuous integration and deployment (CI/CD) steps, including container builds, can also be configured as part of the workflow, using the deploy_step function method.
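
As a rough illustration (with placeholder file, image, and parameter names), the same task object can be run locally and then handed to a Kubernetes-job runtime without changing the task definition:

from mlrun import NewTask, new_function, run_local

# Define the task once; the parameters and code file are placeholders
task = NewTask(name='demo', params={'p1': 5})

# Run locally, in the notebook or IDE process
local_run = run_local(task, command='myfile.py')

# Run the same task as a Kubernetes job by swapping the runtime function
job_fn = new_function(command='myfile.py', kind='job', image='mlrun/mlrun')
remote_run = job_fn.run(task)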

Functions (function objects) can be created by using any of the following methods:

  • new_function — creates a function "from scratch" or from another function.
  • code_to_function — creates a function from local or remote source code or from a web notebook.
  • import_function — imports a function from a local or remote YAML function-configuration file or from a function object in the MLRun database (using a DB address of the format db://<project>/<name>[:<tag>]).
  • function_to_module — imports an MLRun function or code as a local Python module (can also be used inside another parent function).

You can use the save function method to save a function object in the MLRun database, or the export method to save a YAML function-configuration file to your preferred local or remote location, as illustrated in the sketch below. For function-method details and examples, see the embedded documentation/help text.
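
The following sketch illustrates these methods side by side. The function and file names are placeholders, and the db:// address assumes a function that was previously saved to the MLRun database:

from mlrun import new_function, code_to_function, import_function

# Create a function object "from scratch"
fn1 = new_function(name='basic', command='myfile.py', kind='job')

# Create a function from local source code
fn2 = code_to_function(name='trainer', filename='training.py', kind='job')

# Import a function from the MLRun database (db://<project>/<name>[:<tag>])
fn3 = import_function('db://default/trainer')

# Save the function object to the MLRun database, or export it to a YAML file
fn2.save()
fn2.export('trainer.yaml')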

Back to top / Back to quick-start TOC

Automated Parameterization, Artifact Tracking, and Logging

After running a job, you need to be able to track it, including viewing the run parameters, inputs, and outputs. To support this, MLRun introduces a concept of a runtime "context": the code can be set up to get parameters and inputs from the context, as well as log run outputs, artifacts, tags, and time-series metrics in the context.

Example

The following code example from the train-xgboost.ipynb notebook of the MLRun XGBoost demo (demo-xgboost) defines two functions: the iris_generator function loads the Iris data set and logs it to the function's context object; the xgb_train function uses XGBoost to train an ML model on a data set and logs the results in the function's context:

import xgboost as xgb
import os
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.metrics import accuracy_score
from mlrun.artifacts import PlotArtifact
import pandas as pd


def iris_generator(context):
    iris = load_iris()
    iris_dataset = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    iris_labels = pd.DataFrame(data=iris.target, columns=['label'])
    iris_dataset = pd.concat([iris_dataset, iris_labels], axis=1)
    context.logger.info('Saving Iris data set to "{}"'.format(context.out_path))
    context.log_dataset('iris_dataset', df=iris_dataset)


def xgb_train(context,
              dataset='',
              model_name='model.bst',
              max_depth=6,
              num_class=10,
              eta=0.2,
              gamma=0.1,
              steps=20):

    df = pd.read_csv(dataset)
    X = df.drop(['label'], axis=1)
    y = df['label']

    X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)
    dtrain = xgb.DMatrix(X_train, label=Y_train)
    dtest = xgb.DMatrix(X_test, label=Y_test)

    # Get parameters from event
    param = {"max_depth": max_depth,
             "eta": eta, "nthread": 4,
             "num_class": num_class,
             "gamma": gamma,
             "objective": "multi:softprob"}

    xgb_model = xgb.train(param, dtrain, steps)

    preds = xgb_model.predict(dtest)
    best_preds = np.asarray([np.argmax(line) for line in preds])

    context.log_result('accuracy', float(accuracy_score(Y_test, best_preds)))
    context.log_model('model', body=bytes(xgb_model.save_raw()), 
                      model_file='model.txt', 
                      metrics=context.results, parameters={'xx':'abc'},
                      labels={'framework': 'xgboost'},
                      artifact_path=context.artifact_subpath('models'))

The example training function can be executed locally with parameters, and the run results and artifacts can be logged automatically into a database by using a single command, as demonstrated in the following example; the example sets the function's eta parameter:

train_run = run_local(handler=xgb_train, params={'eta': 0.3})

Alternatively, you can replace the function with a serverless runtime to run the same code on a remote cluster, which could result in a ~10x performance boost. You can find examples for different runtimes — such as a Kubernetes job, Nuclio, Dask, Spark, or an MPI job — in the MLRun examples directory.
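
For illustration only, the following sketch shows one way the same handler could be packaged as a Kubernetes-job runtime and executed on a cluster; the file name, image, and parameters are placeholders, and the exact conversion for your code may differ (see the examples directory):

from mlrun import code_to_function, NewTask

# Package the training code as a Kubernetes job (file and image are placeholders)
train_fn = code_to_function(name='xgb-train', filename='train_xgboost.py',
                            kind='job', image='mlrun/mlrun')

# Run the identical task on the remote cluster instead of locally
task = NewTask(handler='xgb_train', params={'eta': 0.3})
train_run = train_fn.run(task)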

If you run your code from the main function, you can get the runtime context by calling the get_or_create_ctx method, as demonstrated in the following code from the MLRun training.py example application. The code also demonstrates how you can use the context object to read and write execution metadata, parameters, secrets, inputs, and outputs:

from mlrun import get_or_create_ctx
from mlrun.artifacts import ChartArtifact
import pandas as pd


def my_job(context, p1=1, p2='x'):
    # load MLRUN runtime context (will be set by the runtime framework e.g. KubeFlow)

    # get parameters from the runtime context (or use defaults)

    # access input metadata, values, files, and secrets (passwords)
    print(f'Run: {context.name} (uid={context.uid})')
    print(f'Params: p1={p1}, p2={p2}')
    print('accesskey = {}'.format(context.get_secret('ACCESS_KEY')))
    print('file\n{}\n'.format(context.get_input('infile.txt', 'infile.txt').get()))
    
    # Run some useful code e.g. ML training, data prep, etc.

    # log scalar result values (job result metrics)
    context.log_result('accuracy', p1 * 2)
    context.log_result('loss', p1 * 3)
    context.set_label('framework', 'sklearn')

    # log various types of artifacts (file, web page, table), will be versioned and visible in the UI
    context.log_artifact('model', body=b'abc is 123', local_path='model.txt', labels={'framework': 'xgboost'})
    context.log_artifact('html_result', body=b'<b> Some HTML <b>', local_path='result.html')

    # create a chart output (will show in the pipelines UI)
    chart = ChartArtifact('chart')
    chart.labels = {'type': 'roc'}
    chart.header = ['Epoch', 'Accuracy', 'Loss']
    for i in range(1, 8):
        chart.add_row([i, i/20+0.75, 0.30-i/20])
    context.log_artifact(chart)

    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                'age': [42, 52, 36, 24, 73],
                'testScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(raw_data, columns=[
        'first_name', 'last_name', 'age', 'testScore'])
    context.log_dataset('mydf', df=df, stats=True)


if __name__ == "__main__":
    context = get_or_create_ctx('train')
    p1 = context.get_param('p1', 1)
    p2 = context.get_param('p2', 'a-string')
    my_job(context, p1, p2)

The example training.py application can be invoked as a local task, as demonstrated in the following code from the MLRun mlrun_basics.ipynb example notebook:

run = run_local(task, command='training.py')

Alternatively, you can invoke the application by using the mlrun CLI; edit the parameters, inputs, and/or secret information, as needed, and ensure that training.py is found in the execution path or edit the file path in the command:

mlrun run --name train -p p2=5 -i infile.txt=s3://my-bucket/infile.txt -s file=secrets.txt training.py

Back to top / Back to quick-start TOC

Using Hyperparameters for Job Scaling

Data science involves long computation times and data-intensive tasks. To ensure efficiency and scalability, you need to implement parallelism whenever possible. MLRun supports this by using two mechanisms:

  1. Clustering — run the code on a distributed processing engine (such as Dask, Spark, or Horovod).
  2. Load-balancing/partitioning — split (partition) the work across multiple workers.

MLRun functions and tasks can accept hyperparameters or parameter lists, deploy many parallel workers, and partition the work among the deployed workers. The parallelism implementation is left to the runtime, and each runtime may have its own method of concurrent task execution. For example, the Nuclio serverless engine manages many micro threads in the same process, which can run multiple tasks in parallel. In a containerized system like Kubernetes, you can launch multiple containers, each processing a different task.

For example, the following code demonstrates how to use hyperparameters to run the XGBoost model-training task from the previous section (xgb_train) with different parameter combinations:

    parameters = {
         "eta":       [0.05, 0.10, 0.20, 0.30],
         "max_depth": [3, 4, 5, 6, 8, 10],
         "gamma":     [0.0, 0.1, 0.2, 0.3],
         }

    task = NewTask(handler=xgb_train, out_path='/User/mlrun/data').with_hyper_params(parameters, 'max.accuracy')
    run = run_local(task)

This code demonstrates how to instruct MLRun to run the same task while choosing the parameters from multiple lists (grid search). MLRun records all of the runs and marks the run that best matches the selection criterion (in this example, maximal accuracy) as the selected result. To actually run the parameter combinations in parallel, use a distributed runtime such as Dask or Nuclio, or Kubernetes jobs.

Alternatively, you can run a similar task (with hyperparameters) by using the MLRun CLI (mlrun); ensure that training.py is found in the execution path or edit the file path in the command:

mlrun run --name train_hyper -x p1="[3,7,5]" -x p2="[5,2,9]" --out-path '/User/mlrun/data' training.py

You can also use a parameters file if you want to control the parameter combinations or if the parameters are more complex. The following code from the example mlrun_basics.ipynb notebook demonstrates how to run a task that uses a CSV parameters file (params.csv in the current directory):

    task = NewTask(handler=xgb_train).with_param_file('params.csv', 'max.accuracy')
    run = run_local(task)

Note: Parameter lists can be used in various ways. For example, you can pass multiple parameter files and use multiple workers to process the files simultaneously instead of one at a time.

Back to top / Back to quick-start TOC

Automated Code Deployment and Containerization

MLRun adopts Nuclio serverless technologies for automatically packaging code and building containers. This enables you to provide code with some package requirements and let MLRun build and deploy your software.

To build or deploy a function, all you need is to call the function's deploy method, which initiates a build or deployment job. Deployment jobs can be incorporated in pipelines just like regular jobs (using the deploy_step method of the function or Kubernetes-job runtime), thus enabling full automation and CI/CD.
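
As a minimal sketch (assuming a job-type function created from a local file; the file name, package, and base image are placeholders), a build and deployment can be triggered directly from the function object:

from mlrun import code_to_function

# Create a job function and add build requirements to its spec
fn = code_to_function(name='trainer', filename='training.py', kind='job')
fn.spec.build.base_image = 'mlrun/mlrun'
fn.spec.build.commands = ['pip install xgboost']

# Trigger the build/deployment job; inside a pipeline, fn.deploy_step() would be used instead
fn.deploy()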

A function can be built from source code, a function specification, a web notebook, a Git repo, or a TAR archive.

A function can also be built by using the mlrun CLI and providing it with the path to a YAML function-configuration file. You can generate such a file by using the to_yaml or export function method. For example, the following CLI code builds a function from a function.yaml file in the current directory:

mlrun build function.yaml

Following is an example function.yaml configuration file:

kind: job
metadata:
  name: remote-git-test
  project: default
  tag: latest
spec:
  command: 'myfunc.py'
  args: []
  image_pull_policy: Always
  build:
    commands: ['pip install pandas']
    base_image: mlrun/mlrun:dev
    source: git://github.com/mlrun/ci-demo.git

For more examples of building and running functions remotely using the MLRun CLI, see the remote example.

You can also convert your web notebook to a containerized job, as demonstrated in the following sample code; for a similar example with more details, see the mlrun_jobs.ipynb example:

from mlrun import code_to_function, mount_v3io

# Create an ML function from the notebook code and annotations, and attach a
# v3io Iguazio Data Science Platform data volume to the function
fn = code_to_function(kind='job').apply(mount_v3io())

# Prepare an image from the dependencies to allow updating the code and
# parameters per run without the need to build a new image
fn.build(image='mlrun/nuctest:latest')

Back to top

Running an ML Workflow with Kubeflow Pipelines

ML pipeline execution with MLRun is similar to CLI execution. A pipeline is created by running an MLRun workflow. MLRun automatically saves outputs and artifacts in a way that is visible to Kubeflow Pipelines, and allows interconnecting steps.

For an example of a full ML pipeline that's implemented in a web notebook, see the Sklearn MLRun demo (demos/scikit-learn). The sklearn-project.ipynb demo notebook includes the following code for implementing an ML-training pipeline:

from kfp import dsl
from mlrun import mount_v3io

funcs = {}
DATASET = 'iris_dataset'
LABELS  = "label"

def init_functions(functions: dict, project=None, secrets=None):
    for f in functions.values():
        f.apply(mount_v3io())
        f.spec.image_pull_policy = 'Always'

@dsl.pipeline(
    name="My XGBoost training pipeline",
    description="Shows how to use mlrun."
)
def kfpipeline():
    
    # build our ingestion function (container image)
    builder = funcs['gen-iris'].deploy_step(skip_deployed=True)
    
    # run the ingestion function with the new image and params
    ingest = funcs['gen-iris'].as_step(
        name="get-data",
        handler='iris_generator',
        image=builder.outputs['image'],
        params={'format': 'pq'},
        outputs=[DATASET])

    # analyze our dataset
    describe = funcs["describe"].as_step(
        name="summary",
        params={"label_column": LABELS},
        inputs={"table": ingest.outputs[DATASET]})
    
    # train with hyperparameters
    train = funcs["train"].as_step(
        name="train-skrf",
        params={"model_pkg_class" : "sklearn.ensemble.RandomForestClassifier",
                "sample"          : -1, 
                "label_column"    : LABELS,
                "test_size"       : 0.10},
        hyperparams={'CLASS_n_estimators': [100, 300, 500]},
        selector='max.accuracy',
        inputs={"dataset"         : ingest.outputs[DATASET]},
        outputs=['model', 'test_set'])

    # test and visualize our model
    test = funcs["test"].as_step(
        name="test",
        params={"label_column": LABELS},
        inputs={"models_path" : train.outputs['model'],
                "test_set"    : train.outputs['test_set']})

    # deploy our model as a serverless function
    deploy = funcs["serving"].deploy_step(models={f"{DATASET}_v1": train.outputs['model']})

Back to top / Back to quick-start TOC

Viewing Run Data and Performing Database Operations

When you configure an MLRun database, the results, parameters, and input and output artifacts of each run are recorded in the database. You can view the results and perform operations on the database by using either the MLRun dashboard (UI) or the MLRun DB methods, both described in the following sections.

Back to top / Back to quick-start TOC

The MLRun Dashboard

The MLRun dashboard is a graphical user interface (GUI) for working with MLRun and viewing run data.



Back to top / Back to quick-start TOC

MLRun Database Methods

You can use the get_run_db DB method to get an MLRun DB object for a configured MLRun database or API service. Then, use the DB object's connect method to connect to the database or API service, and use additional methods to perform different operations, such as listing run artifacts or deleting completed runs. For more information and examples, see the mlrun_db.ipynb example notebook, which includes the following sample DB method calls:

from mlrun import get_run_db

# Get an MLRun DB object and connect to an MLRun database/API service.
# Specify the DB path (for example, './' for the current directory) or
# the API URL ('http://mlrun-api:8080' for the default configuration).
db = get_run_db('./')

# List all runs
db.list_runs('').show()

# List all artifacts for version 'latest' (default)
db.list_artifacts('', tag='').show()

# Check different artifact versions
db.list_artifacts('ch', tag='*').show()

# Delete completed runs
db.del_runs(state='completed')

Back to top / Back to quick-start TOC

Additional Information and Examples

Replacing Runtime Context Parameters from the CLI

You can use the MLRun CLI (mlrun) to run MLRun functions or code and change the parameter values.

For example, the following CLI command runs the example XGBoost training code from the previous tutorial examples:

python -m mlrun run -p p1=5 -s file=secrets.txt -i infile.txt=s3://mybucket/infile.txt training.py

When running this sample command, the CLI executes the code in the training.py application using the provided run information:

  • The value of parameter p1 is set to 5, overwriting the current parameter value in the run context.
  • The file infile.txt is downloaded from a remote "mybucket" AWS S3 bucket.
  • The credentials for the S3 download are retrieved from a secrets.txt file in the current directory.

Remote Execution

You can also run the same MLRun code that you ran locally as a remote HTTP endpoint.

Nuclio Example

For example, you can wrap the XGBoost training code from the previous tutorial examples within a serverless Nuclio handler function, and execute the code remotely using a similar CLI command to the one that you used locally.

You can run the following code from a Jupyter Notebook to create a Nuclio function from the notebook code and annotations, and deploy the function to a remote cluster.

Note:

  • Before running the code, install the nuclio-jupyter package for using Nuclio from Jupyter Notebook.
  • The example uses apply(mount_v3io()) to attach a v3io Iguazio Data Science Platform data-store volume to the function. By default, the v3io mount mounts the home directory of the platform's running user into the /User function path.

from mlrun import code_to_function, mount_v3io

# Create an `xgb_train` Nuclio function from the notebook code and annotations;
# add a v3io data volume and a multi-worker HTTP trigger for parallel execution
fn = code_to_function('xgb_train', runtime='nuclio:mlrun')
fn.apply(mount_v3io()).with_http(workers=32)

# Deploy the function and run the task on it
run = fn.run(task, handler='xgb_train')

To execute the code remotely, run the same CLI command as in the previous tutorial examples and just substitute the code file name at the end with your function's URL. For example, run the following command and replace <function endpoint> with your remote function endpoint:

mlrun run -p p1=5 -s file=secrets.txt -i infile.txt=s3://mybucket/infile.txt http://<function-endpoint>

Back to top / Back to quick-start TOC

Running an MLRun Service

An MLRun service is a web service that manages an MLRun database for tracking and logging MLRun run information, and exposes an HTTP API for working with the database and performing MLRun operations.

You can create and run an MLRun service by using either of the following methods:

Note: For both methods, you can optionally configure the service port and/or directory path by setting the MLRUN_httpdb__port and MLRUN_httpdb__dirpath environment variables instead of the respective run parameters or CLI options.

Using the MLRun CLI to Run an MLRun Service

Use the db command of the MLRun CLI (mlrun) to create and run an instance of the MLRun service from the command line:

mlrun db [OPTIONS]

To see the supported options, run mlrun db --help:

Options:
  -p, --port INTEGER  HTTP port for serving the API
  -d, --dirpath TEXT  Path to the MLRun service directory
Comments
  • Getting error while running workflow on kubernetes

    Hi,

    I'm using minikube to run kubernetes in local system and trying to run workflow defined in demos/sklearn-pipe/sklearn-project.ipynb but getting the below error message.

    Jupyter Cell:

    artifact_path = path.abspath('./pipe/{{workflow.uid}}')
    
    run_id = skproj.run(
        'main',
        arguments={}, 
        artifact_path=artifact_path, 
        dirty=True)
    

    Error message: MaxRetryError: HTTPConnectionPool(host='ml-pipeline.default.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fea36705a90>: Failed to establish a new connection: [Errno -2] Name or service not known'))

    I have followed the instructions mentioned in below readme file https://github.com/mlrun/mlrun/blob/master/hack/local/README.md

    Can anyone help me in resolving the error?

    opened by narendra36 26
  • permission error when trying to run pipeline on kubeflow

    Hi, I am trying to run the demo notebook sklearn-project on a local kubernetes. I have installed kubeflow.

    I get this error when trying to send the pipeline to the api server: 400 Client Error: Bad Request for url: http://mlrun-api:8080/api/projects/sk-project/pipelines?namespace=mlrun&experiment=sk-project-main: details: {'reason': 'MLRunBadRequestError("Failed creating pipeline: HTTPConnectionPool(host='ml-pipeline.mlrun.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9f94278ed0>: Failed to establish a new connection: [Errno -2] Name or service not known'))")'}

    I noticed it tried to call the ml-pipeline on the wrong namespace (it uses the one mlrun is installed on). I changed the namespace to "kubeflow" and now I get this error: 400 Client Error: Bad Request for url: http://mlrun-api:8080/api/projects/sk-project/pipelines?namespace=kubeflow&experiment=sk-project-main: details: {'reason': 'MLRunBadRequestError('Failed creating pipeline: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'content-type\': \'application/json\', \'date\': \'Mon, 08 Nov 2021 13:40:04 GMT\', \'content-length\': \'708\', \'x-envoy-upstream-service-time\': \'1\', \'server\': \'istio-envoy\', \'x-envoy-decorator-operation\': \'ml-pipeline.kubeflow.svc.cluster.local:8888/*\'})\nHTTP response body: {"error":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","code":3,"message":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","details":[{"@type":"type.googleapis.com/api.Error","error_message":"Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","error_details":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []"}]}\n')'}

    also keep in mind that the ml-pipeline service is installed on kubeflow, but I probably need to add the experiment on kubeflow-user-example-com namespace (the default example user namespace created when installing kubeflow).

    In any case - what am I doing wrong?

    opened by ran-haim 20
  • [Bug]: Max retries exceeded with url: /v2/models/cancer-classifier/infer

    MLRun Version checks

    • [X] I have checked that this issue has not already been reported.

    • [X] I have confirmed this bug exists on the latest version of the MLRun Kit.

    Reproducible Example

    original 01-mlrun-basics.ipynb, issues see attached jupyter notebook, you can see this error in case of call serving_fn.invoke("/v2/models/cancer-classifier/infer", body=my_data)
    

    Issue Description

    in case of call serving_fn.invoke("/v2/models/cancer-classifier/infer", body=my_data) I got

    OSError: error: cannot run function at url http://127.0.0.1:54652/v2/models/cancer-classifier/infer, HTTPConnectionPool(host='127.0.0.1', port=54652): Max retries exceeded with url: /v2/models/cancer-classifier/infer (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f175c75b070>: Failed to establish a new connection: [Errno 111] Connection refused'))

    see jupyter Uploading 01-mlrun-basics.ipynb.txt…

    Expected Behavior

    Invoke without this issue, it can have relation to https://github.com/mlrun/mlrun/issues/2102

    Python Version

    3.8.8

    MLRun Version

    1.2.0

    Additional Information

    No response

    opened by j0terry 10
  • [Installation]: Docker-compose with Jupyter - SQLite database error

    Installation check

    Installation OS

    Linux

    Installation Method

    Docker

    Kubernetes Cluster Type

    N/A - Docker

    MLRun Kit Helm Chart Version

    Issue Description

    SQLite error inside jupyter container.

    Installation Logs

    > 2022-08-16 15:35:54,802 [info] Initializing DB data
    jupyter_1   | > 2022-08-16 15:35:54,802 [debug] Waiting for database liveness
    jupyter_1   | > 2022-08-16 15:35:54,802 [debug] SQLite DB is used, liveness check not needed
    jupyter_1   | > 2022-08-16 15:35:54,849 [info] No projects in DB, assuming latest data version: {'exc': OperationalError('(sqlite3.OperationalError) unable to open database file'), 'latest_data_version': 2}
    jupyter_1   | > 2022-08-16 15:35:54,878 [info] No projects in DB, assuming latest data version: {'exc': OperationalError('(sqlite3.OperationalError) unable to open database file'), 'latest_data_version': 2}
    jupyter_1   | > 2022-08-16 15:35:54,878 [info] Checking if migration is needed: {'is_migration_from_scratch': True, 'is_schema_migration_needed': True, 'is_data_migration_needed': False, 'is_database_migration_needed': False, 'is_backup_needed': False, 'is_migration_needed': False}
    jupyter_1   | > 2022-08-16 15:35:54,879 [info] Creating initial data
    jupyter_1   | > 2022-08-16 15:35:54,879 [info] Performing schema migration
    jupyter_1   | > 2022-08-16 15:35:54,879 [debug] Performing alembic schema migrations
    jupyter_1   | > 2022-08-16 15:35:54,890 [warning] Migrations failed, changing API state: {'state': 'migrations_failed'}
    jupyter_1   | Traceback (most recent call last):
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3208, in _wrap_pool_connect
    jupyter_1   |     return fn()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 301, in connect
    jupyter_1   |     return _ConnectionFairy._checkout(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 761, in _checkout
    jupyter_1   |     fairy = _ConnectionRecord.checkout(pool)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 419, in checkout
    jupyter_1   |     rec = pool._do_get()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 259, in _do_get
    jupyter_1   |     return self._create_connection()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 247, in _create_connection
    jupyter_1   |     return _ConnectionRecord(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 362, in __init__
    jupyter_1   |     self.__connect()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    jupyter_1   |     pool.logger.debug("Error on connect(): %s", e)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    jupyter_1   |     compat.raise_(
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    jupyter_1   |     raise exception
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 599, in __connect
    jupyter_1   |     connection = pool._invoke_creator(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    jupyter_1   |     return dialect.connect(*cargs, **cparams)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 584, in connect
    jupyter_1   |     return self.dbapi.connect(*cargs, **cparams)
    jupyter_1   | sqlite3.OperationalError: unable to open database file
    jupyter_1   | 
    jupyter_1   | The above exception was the direct cause of the following exception:
    jupyter_1   | 
    jupyter_1   | Traceback (most recent call last):
    jupyter_1   |   File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    jupyter_1   |     return _run_code(code, main_globals, None,
    jupyter_1   |   File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    jupyter_1   |     exec(code, run_globals)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/main.py", line 256, in <module>
    jupyter_1   |     main()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/main.py", line 240, in main
    jupyter_1   |     init_data()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/initial_data.py", line 65, in init_data
    jupyter_1   |     _perform_schema_migrations(alembic_util)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/initial_data.py", line 160, in _perform_schema_migrations
    jupyter_1   |     alembic_util.init_alembic()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/utils/db/alembic.py", line 24, in init_alembic
    jupyter_1   |     alembic.command.upgrade(self._alembic_config, "head")
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/alembic/command.py", line 294, in upgrade
    jupyter_1   |     script.run_env()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
    jupyter_1   |     util.load_python_file(self.dir, "env.py")
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    jupyter_1   |     module = load_module_py(module_id, path)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py
    jupyter_1   |     spec.loader.exec_module(module)
    jupyter_1   |   File "<frozen importlib._bootstrap_external>", line 783, in exec_module
    jupyter_1   |   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/migrations_sqlite/env.py", line 82, in <module>
    jupyter_1   |     run_migrations_online()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/mlrun/api/migrations_sqlite/env.py", line 72, in run_migrations_online
    jupyter_1   |     with connectable.connect() as connection:
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3162, in connect
    jupyter_1   |     return self._connection_cls(self, close_with_result=close_with_result)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 92, in __init__
    jupyter_1   |     else engine.raw_connection()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3241, in raw_connection
    jupyter_1   |     return self._wrap_pool_connect(self.pool.connect, _connection)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3211, in _wrap_pool_connect
    jupyter_1   |     Connection._handle_dbapi_exception_noconnection(
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2061, in _handle_dbapi_exception_noconnection
    jupyter_1   |     util.raise_(
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    jupyter_1   |     raise exception
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3208, in _wrap_pool_connect
    jupyter_1   |     return fn()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 301, in connect
    jupyter_1   |     return _ConnectionFairy._checkout(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 761, in _checkout
    jupyter_1   |     fairy = _ConnectionRecord.checkout(pool)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 419, in checkout
    jupyter_1   |     rec = pool._do_get()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 259, in _do_get
    jupyter_1   |     return self._create_connection()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 247, in _create_connection
    jupyter_1   |     return _ConnectionRecord(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 362, in __init__
    jupyter_1   |     self.__connect()
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    jupyter_1   |     pool.logger.debug("Error on connect(): %s", e)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    jupyter_1   |     compat.raise_(
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    jupyter_1   |     raise exception
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 599, in __connect
    jupyter_1   |     connection = pool._invoke_creator(self)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    jupyter_1   |     return dialect.connect(*cargs, **cparams)
    jupyter_1   |   File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 584, in connect
    jupyter_1   |     return self.dbapi.connect(*cargs, **cparams)
    jupyter_1   | sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
    jupyter_1   | (Background on this error at: http://sqlalche.me/e/14/e3q8)
    

    Additional Information

    No response

    opened by lbonini94 9
  • [Azure DataStore] Handle upload strings vs bytes and filepath formation when using adlfs

    • When uploading strings as model artifact attributes to abfs using the put method and adlfs, operations were failing. Added the ability to alter the write method based on the incoming data
    • get, listdir, stat filepath handling
    • Validated performance with private integration testing against adlfs
    opened by hayesgb 8
  • [Datastore] Extend Azure blob to support other auth methods

    Currently, the only supported authentication method against AzureBlobStore is AZURE_STORAGE_CONNECTION_STRING (and possibly AZURE_STORAGE_KEY). This enables other authentication methods, including the use of ServicePrincipals and SAS tokens.

    opened by hayesgb 8
  • Issue in installing MLRun locally on windows

    Hi,

    I am following instructions to install MLRun locally on Windows 10 as described in page /install/local-docker.html with jupyter image.

    set HOST_IP=localhost
    set SHARED_DIR=D:\MLRun
    mkdir %SHARED_DIR%
    docker-compose -f compose.with-jupyter.yaml up

    This results in the below error:

    invalid interpolation format for services.jupyter.volumes.[]. You may need to escape any $ with another $. required variable SHARED_DIR is missing a value: err

    opened by ganesh3 6
  • [Docs]: Add documentation for mlrun.feature_store.feature_set.FeatureSetSpec

    MLRun Kit version checks

    • [X] I have checked that the issue still exists on the latest versions of the docs here

    Location of the documentation

    https://docs.mlrun.org/en/latest/api/mlrun.feature_store.html#mlrun.feature_store.FeatureSet.spec

    Documentation problem

    Documentation is missing for this class (it is not included in the generated documentation, although some parts of the source code contain relevant docstrings).

    Suggested fix for documentation

    Please, generate documentation also for this class

    opened by george0st 6
  • [Feature Store] Add `MinMaxLenValidator` and `RegexValidator`

    [Feature Store] Add `MinMaxLenValidator` and `RegexValidator`

    Add MinMaxLenValidator for Feature (including a system test) and add RegexValidator (validation based on regular expressions). They are very similar to the existing MinMaxValidator for Feature.

    opened by george0st 6
  • Extend filter vector ability (focus on off-line featurestore)

    It would be very useful to support rich filtering in the vector, e.g.:

    High priority

    • support logical conditions 'fn2 > 500 and (fn3<=500 or fn4==500)'

    Medium priority

    • support like operator 'fn5 like %sdsd%'
    • support between and in

    Low priority

    • fuzzy match for string

    BTW: get_offline_features currently supports only exact match; see this part of the code:

    data = pd.DataFrame({"fn0": [39560793709,35392257080], "fn1": [27203050525,13749105613]})
    resp = fstore.get_offline_features(vector, entity_rows=data)
    
    opened by george0st 6
  • [Azure DataStore] Handle storage options as secrets

    This adds the ability to pass standard dictionary keys from fsspec's storage_options parameter into mlrun.run.get_dataitem() as secrets.

    This makes it easier for users doing exploratory analysis to leverage the MLRun API to fetch data items from Azure, as follows:

    storage_options={'account_name': "<NAME>", 'credential': <CREDENTIAL>}
    df = mlrun.run.get_dataitem("az://CONTAINER/myfile.parquet", secrets=storage_options).as_df()
    
    opened by hayesgb 6
  • [Runtimes] Add container image to serving function status

    Add the built and pushed container image name, that is used to run the nuclio function container, to the function status.

    A followup to https://github.com/nuclio/nuclio/pull/2769 released in https://github.com/nuclio/nuclio/releases/tag/1.11.7 .

    Fixes https://jira.iguazeng.com/browse/IG-21462

    opened by TomerShor 0
  • [Data Store] My Sql - target, source and driver

    This PR focuses on the implementation of a source and target for SqlDB in mlrun. The SqlDB target can create a new SQL collection or read from an existing one. The collection is fixed and its schema can't be changed in the flow.

    https://jira.iguazeng.com/browse/ML-2610

    opened by davesh0812 0
  • Mlrun Jupyter- Image pull back off error

    MLRun Version checks

    • [X] I have checked that this issue has not already been reported.

    • [X] I have confirmed this bug exists on the latest version of MLRun CE.

    Reproducible Example

    kubectl create namespace mlrun
    helm repo add mlrun-ce https://mlrun.github.io/ce
    helm repo update
    
    kubectl --namespace mlrun create secret docker-registry registry-credentials 
    --docker-server="https://index.docker.io/v1/" 
    --docker-username="xyz" 
    --docker-password="xyz" 
    --docker-email="xyz" 
    
    helm --namespace mlrun 
    install mlrun-ce 
    --wait --timeout 960s 
    --set global.registry.url="index.docker.io/v1/xyz" 
    --set global.registry.secretName=registry-credentials 
    --set global.externalHostAddress=http://192.168.49.2 
    mlrun-ce/mlrun-ce
    

    Issue Description

    I'm following the MLRun Kubernetes installation kit in the way suggested, but I'm still getting an MLRun Jupyter "Image pull back off" error, so please guide me through this.

    Expected Behavior

    I'm using Kubectl, Minikube, helm, docker etc.

    Installation OS

    Windows

    Installation Method

    Kubernetes

    Python Version

    3.8

    MLRun Version

    1.2.0

    Additional Information

    No response

    opened by harishgawade1999 5
  • [Feature Request]: Add ability to delete data from Project

    Feature Type

    • [X] Adding new functionality to MLRun

    • [X] Changing existing functionality in MLRun

    • [x] Removing existing functionality in MLRun

    Problem Description

    When I delete the project from GUI, everything is deleted (own project, jobs, artefacts, etc.) except the data.

    And the information in dialog is not fully readable (it is not about delete of all resources under the project), see the information from GUI:

    You try to delete project "jist-from-local". 'The project is not empty. Deleting it will also delete all of its resources, such as jobs, 'artifacts, and features.

    BTW: Currently the data (e.g. parquet files) has to be deleted manually, e.g. via Linux commands in the specific directories; if this step is missed, it can leave garbage in the file system.

    Feature Description

    It would be useful to also be able to delete data stored directly in the project directory (parquet files, kv files, etc.).

    Alternative Solutions

    Add a note to the delete dialog stating that project data (parquets, ...) is not included in the delete procedure and has to be deleted manually.

    Additional Context

    No response

    opened by george0st 2
Releases(v1.2.1-rc12)
  • v1.2.1-rc12(Jan 4, 2023)

  • v1.2.1-rc11(Jan 4, 2023)

    Features / Enhancements

    • Requirements: Freeze the version from ~=1.0 to ~=1.0.0 [1.2.x], #2872, @guy1992l
    • API: Include the whole CE section in frontend-spec and client-spec [1.2.x], #2847, @quaark
    • UI: Features & enhancement

    Bug fixes

    • Scheduler: Update next run time after skipping run [1.2.x], #2862, @AlonMaor14
    • Serving: Revert to db.create_or_patch_model_endpoint [1.2.x], #2850, @davesh0812
    • Unknown: Revert "[Project] Mask credentials when project.set_function [1.2.x]", #2859, @tankilevitch
    • FileDB: Add warning message when initializing FileRunDB [1.2.x], #2856, @tankilevitch
    • Project: Fix sync_functions to sync the function names from the project.spec._function_definitions map [1.2.x], #2857, @tankilevitch
    • UI: Bug fixes

    Pull requests:

    93226538 [Requirements] Freeze the version from ~=1.0 to ~=1.0.0 [1.2.x] (#2872)
    ae8e7d7f [Scheduler] Update next run time after skipping run [1.2.x] (#2862)
    b6cfe864 [Serving] Revert to db.create_or_patch_model_endpoint [1.2.x] (#2850)
    65dbff5d Revert "[Project] Mask credentials when project.set_function [1.2.x]" (#2859)
    4cc137a6 [FileDB] Add warning message when initializing FileRunDB [1.2.x] (#2856)
    e21b4d79 [Project] Fix sync_functions to sync the function names from the project.spec._function_definitions map [1.2.x] (#2857)
    f87a4e96 [API] Include the whole CE section in frontend-spec and client-spec [1.2.x] (#2847)

  • v1.3.0-rc4(Jan 3, 2023)

  • v1.2.1-rc10(Dec 31, 2022)

    Features / Enhancements

    • Project: Mask credentials when project.set_function [1.2.x], #2844, @tankilevitch
    • CLI: [Projects] Add overwrite scheduled workflow - fixed [1.2.x], #2841, @yonishelach
    • UI: Features & enhancement

    Bug fixes

    • Runtimes: Fix submitting job with hyper-params config param to use correct credentials [1.2.x], #2835, @theSaarco
    • UI: Bug fixes

    Pull requests:

    de93127d [Project] Mask credentials when project.set_function [1.2.x] (#2844)
    8b8fad88 [CLI][Projects] Add overwrite scheduled workflow - fixed [1.2.x] (#2841)
    1c61d8a5 [Runtimes] Fix submitting job with hyper-params config param to use correct credentials [1.2.x] (#2835)

  • v1.2.1-rc9(Dec 28, 2022)

    Features / Enhancements

    • Makefile: Isort ignore venvs [1.2.x], #2832, @AlonMaor14
    • API: Add Fields to Client Spec for Use in UI [1.2.x], #2828, @quaark
    • Serving: GET with router & inexplicit GET in test mode [1.2.x], #2815, @davesh0812
    • UI: Features & enhancement

    Bug fixes

    • Project: Fix load_project log message (#2825) [1.2.x], #2827, @AlonMaor14
    • SDK: Fix submit_job tries to update run state when run wasn't created [1.2.x], #2822, @quaark
    • Frameworks: Enable scikit-learn v1.2.0 to work with mlrun.frameworks [1.2.x], #2814, @guy1992l
    • UI: Bug fixes

    Pull requests:

    1ff96293 [Makefile] Isort ignore venvs [1.2.x] (#2832)
    f9a0f5e3 [API] Add Fields to Client Spec for Use in UI [1.2.x] (#2828)
    14b59f56 [Project] Fix load_project log message (#2825) [1.2.x] (#2827)
    22cb4f8f [SDK] Fix submit_job tries to update run state when run wasn't created [1.2.x] (#2822)
    65c7b7d6 [Frameworks] Enable scikit-learn v1.2.0 to work with mlrun.frameworks [1.2.x] (#2814)
    585b0398 [Serving] GET with router & inexplicit GET in test mode [1.2.x] (#2815)

  • v1.1.3(Dec 28, 2022)

  • v1.1.3-rc4(Dec 27, 2022)

  • v1.3.0-rc3(Dec 26, 2022)

  • v1.2.1-rc8(Dec 25, 2022)

    Features / Enhancements

    • Projects: Slack notify remote workflow [1.2.x], #2805, @yonishelach

    • API: verify cookie session is iguazio-like sessions [1.2.x], #2797, @liranbg

    • Feature Store: Fix: set index before write to target in local merger [1.2.x], #2790, @gtopper

    • Run: Ignore bokeh installation with warning on import error [1.2.x], #2792, @AlonMaor14

    • UI: Features & enhancement

    Bug fixes

    Pull requests:

    f09586f5 [Projects] Slack notify remote workflow [1.2.x] (#2805)
    721ed35a [API] verify cookie session is iguazio-like sessions [1.2.x] (#2797)
    5171dd9f [Feature Store] Fix: set index before write to target in local merger [1.2.x] (#2790)
    5d9ada27 [Run] Ignore bokeh installation with warning on import error [1.2.x] (#2792)

  • v1.1.3-rc3(Dec 22, 2022)

  • v1.3.0-rc2(Dec 20, 2022)

  • v1.2.1-rc7(Dec 21, 2022)

    Features / Enhancements

    • Datastore: Backport fix handling of DataItem path in windows [1.2.x], #2785, @yaronha

    • Pipelines: Support list_pipelines with pagination & predicates [1.2.x], #2786, @theSaarco

    • CLI: [Runtimes] URL placeholder for using run args without URL [1.2.x], #2782, @AlonMaor14

    • FeatureStore: Fixing graph plot with multiple targets of same kind + adding storage kind to plot (#2766) [1.2.x], #2779, @theSaarco

    • Artifacts: Serialize DirArtifact to dictionary using new format (#2778) [1.2.x], #2780, @theSaarco

    • UI: Features & enhancement

    Bug fixes

    Pull requests:

    a8d20686 [Datastore] Backport fix handling of DataItem path in windows [1.2.x] (#2785)
    9bfadd02 [Pipelines] Support list_pipelines with pagination & predicates [1.2.x] (#2786)
    04136edc [CLI][Runtimes] URL placeholder for using run args without URL [1.2.x] (#2782)
    a4587012 [FeatureStore] Fixing graph plot with multiple targets of same kind + adding storage kind to plot (#2766) [1.2.x] (#2779)
    38230699 [Artifacts] Serialize DirArtifact to dictionary using new format (#2778) [1.2.x] (#2780)

  • v1.3.0-rc1(Dec 20, 2022)

    Features / Enhancements

    • API: Add project-scope files/filestat API that work with project secrets, #2714, @theSaarco

    • API: Fix run parameters larger than int64 corrupting projects, #2671, @quaark

    • API: Remove print from api, #2734, @liranbg

    • API: verify cookie session is iguazio-like sessions, #2773, @liranbg

    • Artifacts: Don't resolve artifact target_path if explicitly request upload=False, #2732, @tankilevitch

    • Artifacts: Serialize DirArtifact to dictionary using new format, #2778, @theSaarco

    • Artifacts: Set dataset stats according to stats flag in log_dataset, #2710, @TomerShor

    • CI: Bump prefix version for build images, #2645, @tankilevitch

    • CI: Fix Open Source System Tests Fail Deploy, #2772, @quaark

    • CI: Run Open Source System Tests Against MLRun CE, #2667, @quaark

    • CI: Updated installation and bug report issue templates, #2193, @nschenone

    • CLI: Do not ignore unknown options, #2678, @AlonMaor14

    • CLI: Fix cli "get runtime" command, #2676, @yaronha

    • CLI: Fix watch when running function through CLI, #2730, @tankilevitch

    • CLI: Support default .env file location + CLI "config set" command, #2690, @yaronha

    • CLI: Validate base arguments, #2745, @AlonMaor14

    • CLI: Waiting for pod status with timeout fix when running project, #2635, @AlonMaor14

    • CLI: [Runtimes] URL placeholder for using run args without URL, #2765, @AlonMaor14

    • Config: Skip failures in first init of config (mlrun import), #2742, @yaronha

    • DataStore: Allow passing secrets to create datastore and don't cache datastores when running on API, #2633, @tankilevitch

    • DataStore: Fix how we resolve if running as API, #2680, @tankilevitch

    • DataStore: Fix makedirs not threadsafe, #2723, @liranbg

    • Datastore: Fix _write_dataframe to pass storage_options to pandas write operations, #2709, @gtopper

    • Datastore: Fix handling of DataItem path in windows + support path mappings, #2774, @yaronha

    • Docs: Add ecosystem, bug fixes, #2682, @jillnogold

    • Docs: Add some docs to artifact client code, #2775, @tankilevitch

    • Docs: Added MLRun Cheat Sheet, #2647, @nschenone

    • Docs: Better docstring for mount_s3, added warnings, #2737, @theSaarco

    • Docs: CLI project schedule, ML-3007, #2721, @jillnogold

    • Docs: Edit wording to make AWS Documentation more clear, #2697, @yevgenykhazan

    • Docs: Fix docstring errors, ML-2916, ML-2909, #2662, @jillnogold

    • Docs: Improve log_dataset docstring wrt local_path flag, #2753, @TomerShor

    • Docs: Typo mistakes, #2731, @george0st

    • Docs: Update MLRun CE Kubernetes Installation Docs for new "only full" Deployment, #2673, @quaark

    • Docs: Update CONTRIBUTING.md, #2754, @moranbental

    • Feature Store: Get time from time-column instead of event metadata, #2660, @gtopper

    • FeatureStore: Fix serving to support AVRO encoded kafka, #2658, @assaf758

    • FeatureStore: Fixing impute failures when using get_online_feature_service with a feature-vector uri, #2666, @theSaarco

    • FeatureStore: Fixing graph plot with multiple targets of same kind + adding storage kind to plot, #2766, @theSaarco

    • Frameworks: Adjusted the model servers to support step_to_dict, #2653, @guy1992l

    • Frameworks: Remove the handling of regression models, #2722, @guy1992l

    • MPI: Fix local variable resp referenced before assignment, #2639, @tankilevitch

    • Model Monitoring: Add abstraction for model endpoint store target, #2378, @Eyal-Danieli

    • Model Monitoring: Add model_monitoring package to setup.py, #2674, @Eyal-Danieli

    • Model Monitoring: Ensure auth info for model monitoring batch job function, #2688, @Eyal-Danieli

    • Project: Set default sync=True in project.get_function(), #2720, @yaronha

    • Projects: Raise error for workflow scheduling with non-remote project, #2711, @yonishelach

    • Projects: Set default context path to working directory, #2740, @liranbg

    • Run: Fix not updating the run state when running local fails on pre-loading of the function, #2762, @tankilevitch

    • Run: Fix outputs wait for completion, #2663, @tankilevitch

    • Run: Normalize function name in new_function, #2696, @TomerShor

    • Runtime: Fix resolving completion time, #2738, @liranbg

    • Runtime: Move some k8s logic to k8s helpers, #2733, @liranbg

    • Schedules: Fix label handling when reloading schedules (ML-3014), #2719, @theSaarco

    • Schedules: Scheduled tasks access-key usage refactor, #2695, @theSaarco

    • SecretStore: Fix overwriting SecretStore credentials on API, #2661, @tankilevitch

    • Secrets: Add a global get_secret_or_env function to retrieve secret values, #2659, @theSaarco

    • Secrets: Fix get_secret_or_env, #2675, @theSaarco

    • Serving: Fixing - OneHotEncoder with pandas engine - fails when the values has spaces or hyphens in them, #2713, @davesh0812

    • Serving: Improve mock server handling, #2726, @yaronha

    • Spark: Shut local spark context down when ingest completes, #2692, @gtopper

    • Test: fix mkdir race condition - round 2, #2750, @liranbg

    • Tests: Fix failing feature store system tests, #2643, @gtopper

    • Unknown: Raise error on ingest with KafkaSource, #2654, @gtopper

    • Unknown: Revert "[CLI] Do not ignore unknown options", #2703, @AlonMaor14

    • Unknown: Revert "[Spark] Shut spark context down when ingest completes", #2752, @gtopper

    • Utils: - Remove spammy logs, #2724, @liranbg

    • UI: Features & enhancement

    Bug fixes

    Pull requests:

    27e8a957 [Datastore] Fix handling of DataItem path in windows + support path mappings (#2774) c08f2d10 [CLI][Runtimes] URL placeholder for using run args without URL (#2765) b1d35de9 [Feature Store] Get time from time-column instead of event metadata (#2660) fb2f4a70 [Artifacts] Serialize DirArtifact to dictionary using new format (#2778) 598015a9 [Docs] Add some docs to artifact client code (#2775) 183506b2 [FeatureStore] Fixing graph plot with multiple targets of same kind + adding storage kind to plot (#2766) 83e52077 [API] verify cookie session is iguazio-like sessions (#2773) 1613b0d8 [CI] Fix Open Source System Tests Fail Deploy (#2772) 0124e17b [CI] Updated installation and bug report issue templates (#2193) d7a7aa6b [Docs] Update MLRun CE Kubernetes Installation Docs for new "only full" Deployment (#2673) eea96ddf [Run] Fix not updating the run state when running local fails on pre-loading of the function (#2762) aa7e2d03 [API] Fix run paramaters larger than int64 corrupting projects (#2671) 11f31303 [CLI] Validate base arguments (#2745) 70b09e54 [Docs] Improve log_dataset docstring wrt local_path flag (#2753) 915bcf13 [Config] Skip failures in first init of config (mlrun import) (#2742) 49187555 [Test] fix mkdir race condition - round 2 (#2750) dbeaadde [Serving] Improve mock server handling (#2726) e8902c4e [Model Monitoring] Ensure auth info for model monitoring batch job function (#2688) 7fdaaa75 [Docs] Update CONTRIBUTING.md (#2754) cb89a7b3 Revert "[Spark] Shut spark context down when ingest completes" (#2752) bdaf21fd [Docs] Better docstring for mount_s3, added warnings (#2737) b71feaad [Projects] Set default context path to working directory (#2740) e0719288 [Runtime] Move some k8s logic to k8s helpers (#2733) e103e8ce [Runtime] Fix resolving completion time (#2738) 1e627882 [CLI] Fix watch when running function through CLI (#2730) 7f176494 [Docs] CLI project schedule, ML-3007 (#2721) 4b4aba12 [API] Remove print from api (#2734) b67e88c5 [Docs] Typo mistakes (#2731) 50820728 [Artifacts] Don't resolve artifact target_path if explicitly request upload=False (#2732) c9b6a91c [Frameworks] Remove the handling of regression models (#2722) 23bbdf98 [Docs] Add ecosystem, bug fixes (#2682) 59a9616b [Project] Set default sync=True in project.get_function() (#2720) e77d7088 [DataStore] Fix makedirs not threadsafe (#2723) ab2be0f4 [Utils] - Remove spammy logs (#2724) 10f72b60 [API] Add project-scope files/filestat API that work with project secrets (#2714) c0f866f4 [Schedules] Fix label handling when reloading schedules (ML-3014) (#2719) 9ab0f43d [Docs] Edit wording to make AWS Documentation more clear (#2697) a2e75a0a [Projects] Raise error for workflow scheduling with non-remote project (#2711) 0918ae03 [CLI] Support default .env file location + CLI "config set" command (#2690) daee4ccc [Serving] Fixing - OneHotEncoder with pandas engine - fails when the values has spaces or hyphens in them (#2713) d85e7e2f [Artifacts] Set dataset stats according to stats flag in log_dataset (#2710) 6edc7ba9 [Datastore] Fix _write_dataframe to pass storage_options to pandas write operations (#2709) 2589e441 [Schedules] Scheduled tasks access-key usage refactor (#2695) 5497c842 Revert "[CLI] Do not ignore unknown options" (#2703) 47de2bba [Run] Normalize function name in new_function (#2696) 3fc9b84d [CLI] Fix cli "get runtime" command (#2676) 2a9316b7 [Spark] Shut local spark context down when ingest completes (#2692) f93d8fee [CLI] Do not ignore unknown options (#2678) 671247a7 [DataStore] Fix 
how we resolve if running as API (#2680) e07e4c21 [Secrets] Fix get_secret_or_env (#2675) 7e3f9888 [Frameworks] Adjusted the model servers to support step_to_dict (#2653) c359b618 [Model Monitoring] Add model_monitoring package to setup.py (#2674) d560118d [CI] Run Open Source System Tests Against MLRun CE (#2667) 08dc2e9a [FeatureStore] Fix serving to support AVRO encoded kafka (#2658) 3c12c5c4 [Docs] Fix docstring errors, ML-2916, ML-2909 (#2662) 09c3e9ae [Run] Fix outputs wait for completion (#2663) 47a0a621 [FeatureStore] Fixing impute failures when using get_online_feature_service with a feature-vector uri (#2666) 773a623e [MPI] Fix local variable resp referenced before assignment (#2639) 14cba17b [Docs] Added MLRun Cheat Sheet (#2647) 1f07fcd0 [Secrets] Add a global get_secret_or_env function to retrieve secret values (#2659) 2676f739 [SecretStore] Fix overwriting SecretStore credentials on API (#2661) 264da255 [Model Monitoring] Add abstraction for model endpoint store target (#2378) 79557146 [DataStore] Allow passing secrets to create datastore and don't cache datastores when running on API (#2633) ca635b88 Raise error on ingest with KafkaSource (#2654) c693606d [Tests] Fix failing feature store system tests (#2643) b0be0c94 [CI] Bump prefix version for build images (#2645) d57ea8d3 [CLI] Waiting for pod status with timeout fix when running project (#2635)

  • v1.2.1-rc6(Dec 19, 2022)

    Features / Enhancements

    • Spark: Use FeatureSet's timestamp_key as fallback for source time_field [1.2.x], #2771, @gtopper

    • Frameworks: Remove regression special handling [1.2.x], #2770, @guy1992l

    • UI: Features & enhancement

    Bug fixes

    Pull requests:

    c6ced7a4 [Spark] Use FeatureSet's timestamp_key as fallback for source time_field [1.2.x] (#2771) 8690cb62 [Frameworks] Remove regression special handling [1.2.x] (#2770)

  • v1.2.1-rc5(Dec 16, 2022)

    Features / Enhancements

    • Backports: Cherry pick latest updates to [1.2.x], #2768, @yaronha
    • Model Monitoring: Ensure auth info for model monitoring batch job function [1.2.x], #2756, @Eyal-Danieli
    • CLI: Validate base arguments (#2745) [1.2.x], #2761, @AlonMaor14
    • Requirements: Bump storey to 1.2.5 [1.2.x], #2758, @gtopper
    • Docs: Better docstring for mount_s3, added warnings (#2737) [1.2.x], #2746, @theSaarco
    • Projects: Set default context path to working directory [1.2.x], #2744, @liranbg
    • Docs: Improve log_dataset docstring wrt local_path flag [1.2.x], #2743, @TomerShor
    • UI: Features & enhancement

    Bug fixes

    • API: Fix run paramaters larger than int64 corrupting projects [1.2.x], #2763, @quaark
    • Run: Fix not updating the run state when running local fails on pre-loading of the function [1.2.x], #2747, @tankilevitch
    • Unknown: Revert "[Spark] Shut local spark context down when ingest completes [1.2.x], #2751, @gtopper
    • UI: Bug fixes

    Pull requests:

    90a86df6 [Backports] Cherry pick latest updates to [1.2.x] (#2768) 18e9c7d5 [Model Monitoring] Ensure auth info for model monitoring batch job function [1.2.x] (#2756) e047ea34 [CLI] Validate base arguments (#2745) [1.2.x] (#2761) 87920ef6 [API] Fix run paramaters larger than int64 corrupting projects [1.2.x] (#2763) 0e4c79c0 [Requirements] Bump storey to 1.2.5 [1.2.x] (#2758) ac41d7d2 [Run] Fix not updating the run state when running local fails on pre-loading of the function [1.2.x] (#2747) 7989cc99 Revert "[Spark] Shut local spark context down when ingest completes [1.2.x] (#2751) 53584bf1 [Docs] Better docstring for mount_s3, added warnings (#2737) [1.2.x] (#2746) 92e59aea [Projects] Set default context path to working directory [1.2.x] (#2744) 1286b087 [Docs] Improve log_dataset docstring wrt local_path flag [1.2.x] (#2743)

  • v1.1.3-rc2(Dec 16, 2022)

    Features / Enhancements

    • API: Add timeouts for requests which are getting rerouted to chief [1.1.x], #2764, @tankilevitch
    • API: Configure Uvicorn Keep Alive Timeout [1.1.x], #2760, @quaark
    • UI: Features & enhancement

    Bug fixes

    • Unknown: Revert "[Projects] Raise error for workflow scheduling with non-remote project [1.1.x]", #2767, @tankilevitch
    • Projects: Raise error for workflow scheduling with non-remote project [1.1.x], #2708, @yonishelach
    • UI: Bug fixes

    Pull requests:

    b4e78454 [API] Add timeouts for requests which are getting rerouted to chief [1.1.x] (#2764) 2c602f77 Revert "[Projects] Raise error for workflow scheduling with non-remote project [1.1.x]" (#2767) 6de4131f [API] Configure Uvicorn Keep Alive Timeout [1.1.x] (#2760) 13880f4e [Projects] Raise error for workflow scheduling with non-remote project [1.1.x] (#2708)

  • v1.2.1-rc4(Dec 13, 2022)

    Features / Enhancements

    • API: Add project-scope files/filestat API that work with project secrets [1.2.x], #2728, @theSaarco
    • UI: Features & enhancement

    Bug fixes

    • Runtime: Fix resolving completion time [1.2.x], #2741, @liranbg
    • CLI: Backport - Fix watch when running function through CLI [1.2.x], #2739, @tankilevitch
    • Unknown: Revert "[Projects] Raise error for workflow scheduling with non-remote project [1.2.x]", #2736, @tankilevitch
    • API: Remove print from api [1.2.x], #2717, @liranbg
    • Artifacts: Don't resolve artifact target_path if explicitly request upload=False [1.2.x], #2704, @tankilevitch
    • Schedules: Fix label handling when reloading schedules [1.2.x], #2725, @theSaarco
    • Utils: - Remove spammy logs [1.2.x], #2729, @liranbg
    • CLI: Merge fixes for get runtime command and config set environment [1.2.x], #2715, @yaronha
    • Datastore: Fix _write_dataframe to pass storage_options to pandas write operations [1.2.x], #2707, @gtopper
    • UI: Bug fixes

    Pull requests:

    38bffe6d [Runtime] Fix resolving completion time [1.2.x] (#2741) 9b909a3f [CLI] Backport - Fix watch when running function through CLI [1.2.x] (#2739) dbc826a0 Revert "[Projects] Raise error for workflow scheduling with non-remote project [1.2.x]" (#2736) 947fb955 [API] Remove print from api [1.2.x] (#2717) 2dc67dbd [Artifacts] Don't resolve artifact target_path if explicitly request upload=False [1.2.x] (#2704) 7890fb51 [Schedules] Fix label handling when reloading schedules [1.2.x] (#2725) f8eb7b5b [API] Add project-scope files/filestat API that work with project secrets [1.2.x] (#2728) a03f2848 [Utils] - Remove spammy logs [1.2.x] (#2729) 28d39d6d [CLI] Merge fixes for get runtime command and config set environment [1.2.x] (#2715) 90dc095c [Datastore] Fix _write_dataframe to pass storage_options to pandas write operations [1.2.x] (#2707)

  • v1.2.1-rc3(Dec 9, 2022)

    Features / Enhancements

    • Artifacts: Set dataset stats according to stats flag in log_dataset [1.2.x], #2698, @TomerShor
    • Schedules: Scheduled tasks access-key usage refactor (#2695) [1.2.x], #2705, @theSaarco
    • Run: Normalize function name in new_function [1.2.x], #2701, @TomerShor
    • Spark: Shut local spark context down when ingest completes [1.2.x], #2694, @gtopper
    • Projects: Add missing overwrite field to WorkflowSpec [1.2.x], #2683, @yonishelach
    • UI: Features & enhancement

    Bug fixes

    Pull requests:

    cece4e9e [Artifacts] Set dataset stats according to stats flag in log_dataset [1.2.x] (#2698) e5983c41 [Schedules] Scheduled tasks access-key usage refactor (#2695) [1.2.x] (#2705) 4bd22b40 Revert "[CLI] Do not ignore unknown options [1.2.x]" (#2702) c04bcff3 [Run] Normalize function name in new_function [1.2.x] (#2701) 88e9a519 [Projects] Raise error for workflow scheduling with non-remote project [1.2.x] (#2689) 05eb0daf [Spark] Shut local spark context down when ingest completes [1.2.x] (#2694) 603ebeea [Projects] Add missing overwrite field to WorkflowSpec [1.2.x] (#2683)

  • v1.2.1-rc2(Dec 6, 2022)

    Features / Enhancements

    • CLI: Do not ignore unknown options [1.2.x], #2665, @AlonMaor14
    • Project: Add option to overwrite workflow schedule [1.2.x], #2657, @yonishelach
    • UI: Features & enhancement

    Bug fixes

    • DataStore: Fix how we resolve if running as API [1.2.x], #2681, @tankilevitch
    • Secrets: Fix get_secret_or_env [1.2.x], #2677, @theSaarco
    • CLI: Waiting for pod status with timeout fix when running project [1.2.x], #2652, @AlonMaor14
    • FeatureStore: Fix serving to support AVRO encoded kafka (#2658), #2672, @assaf758
    • Run: Fix outputs wait for completion [1.2.x], #2670, @tankilevitch
    • MPI: Fix local variable resp referenced before assignment [1.2.x], #2668, @tankilevitch
    • FeatureStore: Fixing impute failures when using get_online_feature_service with a feature-vector uri (#2666) [1.2.x], #2669, @theSaarco
    • UI: Bug fixes

    Pull requests:

    4a5a417a [DataStore] Fix how we resolve if running as API [1.2.x] (#2681) e5313dde [CLI] Do not ignore unknown options [1.2.x] (#2665) 80903494 [Secrets] Fix get_secret_or_env [1.2.x] (#2677) 753948cd [CLI] Waiting for pod status with timeout fix when running project [1.2.x] (#2652) 2e1eb3f2 [FeatureStore] Fix serving to support AVRO encoded kafka (#2658) (#2672) b991d310 [Run] Fix outputs wait for completion [1.2.x] (#2670) 1f083781 [MPI] Fix local variable resp referenced before assignment [1.2.x] (#2668) edfd00f7 [FeatureStore] Fixing impute failures when using get_online_feature_service with a feature-vector uri (#2666) [1.2.x] (#2669) c1a12fb2 [Project] Add option to overwrite workflow schedule [1.2.x] (#2657)

  • v1.2.1-rc1(Dec 5, 2022)

  • v1.1.3-rc1(Dec 5, 2022)

    Features / Enhancements

    Bug fixes

    • Project: Add option to overwrite workflow schedule [1.1.x], #2651, @yonishelach
    • CLI: Waiting for pod status with timeout fix when running project [1.1.x], #2638, @AlonMaor14
    • UI: Bug fixes

    Pull requests:

    a7928716 [CI] Bump prefix version to 1.1.3 (#2656) 0a4b1881 [Project] Add option to overwrite workflow schedule [1.1.x] (#2651) 071571f6 [CLI] Waiting for pod status with timeout fix when running project [1.1.x] (#2638)

  • v1.2.0(Dec 1, 2022)

    Artifacts

    • Support for artifact tagging. SDK: add tag_artifacts and delete_artifacts_tags, which can be used to modify existing artifact tags and keep more than one tagged version of an artifact. API: introduce new endpoints under /projects/<project>/tags.

    Auth

    • Support S3 profile and assume-role when using fsspec.
    • Support GitHub fine grained tokens.

    Functions

    • Add function.with_annotations({"framework":"tensorflow"}) to user-created functions.
    • Add overwrite_build_params to project.build_function() so the user can choose whether or not to keep the build params that were used in previous function builds (see the sketch after this list).
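
    A minimal sketch of how these two additions might be used together. The project name, code file, handler, and image below are placeholders, and the exact keyword arguments should be checked against the MLRun 1.2.0 reference.

        import mlrun

        # Placeholder project and function; adjust names, code, and image to your environment.
        project = mlrun.get_or_create_project("my-project", context="./")
        trainer = project.set_function(
            "trainer.py", name="trainer", kind="job", image="mlrun/mlrun", handler="train"
        )

        # Attach annotations to the user-created function (the call quoted in the note above).
        trainer.with_annotations({"framework": "tensorflow"})

        # Rebuild the function image while discarding build params kept from previous builds;
        # overwrite_build_params is the new flag, the other arguments are illustrative.
        project.build_function("trainer", overwrite_build_params=True)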

    Feature Store

    • Support Redis as an online target for feature sets, for the storey engine only.
    • Support GCP objects as a data source for the feature store.
    • Fully support ingestion with the pandas engine, now equivalent to ingestion with the storey engine: supports DataFrames with a multi-index, and supports MLRun steps (OneHotEncoder, DateExtractor, MapValue, Imputer, and FeatureValidation) when using the pandas engine.
    • Add a new step, DropFeature, for the pandas and storey engines.
    • Add a query parameter to get_offline_features for filtering the output (see the sketch after this list).
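
    A hedged sketch of pandas-engine ingestion plus the new query filter. The feature-set name, entity column, sample data, and filter expression are invented for illustration; verify step and parameter names against the MLRun feature-store reference.

        import pandas as pd
        import mlrun.feature_store as fstore
        from mlrun.feature_store.steps import OneHotEncoder  # one of the steps listed above

        # Illustrative data with an entity column; the pandas engine now indexes by the entity.
        df = pd.DataFrame({"user_id": [1, 2], "country": ["US", "DE"], "age": [31, 45]})

        # Feature set using the pandas engine (names are placeholders).
        users_set = fstore.FeatureSet("users", entities=[fstore.Entity("user_id")], engine="pandas")
        users_set.graph.to(OneHotEncoder(mapping={"country": ["US", "DE"]}))
        fstore.ingest(users_set, df)

        # New in 1.2.0: filter the offline output with a query expression (assumed pandas-query syntax).
        vector = fstore.FeatureVector("adults", ["users.*"])
        resp = fstore.get_offline_features(vector, query="age > 40")
        print(resp.to_dataframe())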

    Frameworks

    • Add HuggingFaceModelServer to mlrun.frameworks at mlrun.frameworks.huggingface to serve HuggingFace models (see the sketch below).
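
    A hedged sketch of wiring the new model server into a serving function. The class path comes from the note above; the function name, model key, and the task/model_name arguments are assumptions to verify against the mlrun.frameworks.huggingface reference.

        import mlrun

        # Placeholder serving function; the class_name path is taken from the release note.
        serving_fn = mlrun.new_function("hf-serving", kind="serving", image="mlrun/mlrun")
        serving_fn.add_model(
            "sentiment",
            class_name="mlrun.frameworks.huggingface.HuggingFaceModelServer",
            # The arguments below are assumed; check the HuggingFaceModelServer docstring.
            task="sentiment-analysis",
            model_name="distilbert-base-uncased-finetuned-sst-2-english",
        )

        # Quick local smoke test using MLRun's mock server.
        server = serving_fn.to_mock_server()
        print(server.test("/v2/models/sentiment/infer", body={"inputs": ["I love MLRun"]}))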

    Installation

    • Add option to install google-cloud requirements using mlrun[google-cloud]: when installing MLRun for integration with GCP clients, only compatible packages are installed.

    Documentation

    • Restructured documentation and new content

    Third party integrations

    • Supports Confluent Kafka (Tech Preview)

    Internal

    • Refactor artifacts endpoints to follow the MLRun convention of /projects/<project>/artifacts/...
    • Add /api/_internal/memory-reports/ endpoints for memory related metrics to better understand the memory consumption of the API.
    • Improve the HTTP retry mechanism.
    • Support a new lightweight mechanism for KFP pods to pull the run state they triggered. Default behavior is legacy, which pulls the logs of the run to figure out the run state. The new behavior can be enabled using a feature flag configured in the API.

    Breaking changes

    • Feature store: Ingestion using pandas now takes the DataFrame and creates an index out of the entity column (removing it from the regular columns of the DataFrame). This can break existing custom steps that use the pandas engine.

    Bug fixes:

    • Support logging artifacts larger than 5GB to V3IO. #2455
    • Limit KFP to kfp~=1.8.0, <1.8.14 due to non-backwards-compatible changes introduced in 1.8.14 for ParallelFor, which aren't compatible with the MLRun-managed KFP server (1.8.1). #2516
    • Add artifact_path enrichment from the project artifact_path. Previously, the parameter wasn't applied to project runs when defining project.artifact_path. #2507
    • Align timeouts for requests that are getting re-routed from worker to chief (for projects/background related endpoints). #2565
    • Fix legacy artifacts load when loading a project. Fixed corner cases when legacy artifacts were saved to yaml and loaded back into the system using load_project(). #2584
    • Fix artifact latest tag enrichment to also happen when the user defined a specific tag. #2572
    • Fix zip source extraction during function build. #2588
    • Fix Docker compose deployment so Nuclio is configured properly with a platformConfig file that sets proper mounts and network configuration for Nuclio functions, meaning that they run in the same network as MLRun. #2601
    • Workaround for background tasks getting cancelled prematurely, due to the current FastAPI version that has a bug in the starlette package it uses. The bug caused the task to get cancelled if the client’s http connection was closed before the task was done. #2618
    • Fix run failing after deploying a function without a defined image. #2530
    • Fix scheduled jobs failing on GKE with a resource quota error. #2520
    • A model can now be deleted via its tag. #2433
  • v1.2.0-rc22(Nov 29, 2022)

    Features / Enhancements

    Bug fixes

    • Run: Fix handler param in run_function was overwritten by default handler, #2631, @yaronha
    • UI: Bug fixes

    Pull requests:

    e6c7e521 [Requirements] Bump storey version to 1.2.4 (#2634) f691e7f9 [Run] Fix handler param in run_function was overwritten by default handler (#2631)

  • v1.2.0-rc21(Nov 28, 2022)

    Features / Enhancements

    Bug fixes

    • Artifacts: Fix querying artifacts with tag name while link artifact is not tagged, #2627, @theSaarco
    • Artifacts: Fix list artifacts when having multiple hyper params runs, #2625, @tankilevitch
    • UI: Bug fixes

    Pull requests:

    d01483fb [Requirements] Bump v3io version to 0.5.20 (#2628) 22852ad5 [Artifacts] Fix querying artifacts with tag name while link artifact is not tagged (#2627) 6581610d [Artifacts] Fix list artifacts when having multiple hyper params runs (#2625)

  • v1.2.0-rc20(Nov 24, 2022)

  • v1.2.0-rc19(Nov 24, 2022)

    Features / Enhancements

    Bug fixes

    • Projects: Fix project doesn't persist changes on a function when using project.build/run_function, #2624, @tankilevitch
    • Docs: Fix docs generating of frameworks, #2623, @yonishelach
    • API: fix check for k8s in get logs, #2622, @yaronha
    • UI: Bug fixes

    Pull requests:

    06e353a8 [Projects] Fix project doesn't persist changes on a function when using project.build/run_function (#2624) dcc030bc [Docs] Fix docs generating of frameworks (#2623) 16e3a481 [API] fix check for k8s in get logs (#2622)

  • v1.2.0-rc18(Nov 22, 2022)

    Features / Enhancements

    Bug fixes

    • Pipelines: Fix how we pull logs when httpdb.logs.pipeline.pull_state.mode=enabled, #2620, @tankilevitch
    • API: Fix Background Tasks Being Cancelled Prematurely, #2618, @quaark
    • System Tests: Add restart to datanode docker registry before cleanup, #2619, @tankilevitch
    • Builder: Build function from source fixes, #2617, @AlonMaor14
    • UI: Bug fixes

    Pull requests:

    3d4802d1 [Pipelines] Fix how we pull logs when httpdb.logs.pipeline.pull_state.mode=enabled (#2620) edc00dfa [SDK] Support string values in min/max replicas (#2606) 9c1b8c52 [API] Fix Background Tasks Being Cancelled Prematurely (#2618) 01730067 [System Tests] Add restart to datanode docker registry before cleanup (#2619) da9da17e [Builder] Build function from source fixes (#2617)

  • v1.2.0-rc17(Nov 21, 2022)

    Features / Enhancements

    • Build: Add overwrite_build_params for build_function, #2604, @tankilevitch
    • Job: Load source at runtime or build time fix, #2588, @AlonMaor14
    • Pipelines: Change the way the kfp pod pulls the run state, #2548, @tankilevitch
    • Frameworks: Add alias for the SciKit-Learn model server, #2615, @guy1992l
    • API: Adding cloud storage to default allowed file paths, #2614, @theSaarco
    • Projects: Support GH fine grained tokens, #2611, @theSaarco
    • Tests: Adopt test_sync_pipeline_chunks to entities set to df index, #2599, @assaf758
    • Docs: figure updated with Function Hub, #2602, @jillnogold
    • UI: Features & enhancement

    Bug fixes

    • SDK: Fix console notification being printed when also ipython notification is displayed, #2610, @quaark
    • Feature Store: Fix target path in spark merger, #2605, @gtopper
    • Tests: Fix test_schedule_on_filtered_by_time, #2598, @gtopper
    • Unknown: Fix Docker Compose Deployment, #2601, @quaark
    • Frameworks: Fixed bug of no validation set in training, #2608, @guy1992l
    • System Tests: Add checkout before copying cleanup.py, #2612, @tankilevitch
    • API: Configure Uvicorn Keep Alive Timeout, #2613, @quaark
    • Client-Spec: Pass logs config through the client spec, #2616, @tankilevitch
    • Run: Ignore returned None values for logging, #2603, @guy1992l
    • UI: Bug fixes

    Pull requests:

    85d5016e [Client-Spec] Pass logs config through the client spec (#2616) 15e20ef6 [Build] Add overwrite_build_params for build_function (#2604) 0ad3530e [SDK] Fix console notification being printed when also ipython notification is displayed (#2610) a57b4b92 [Job] Load source at runtime or build time fix (#2588) edb1b723 [Pipelines] Change the way the kfp pod pulls the run state (#2548) 04a96f2c [Frameworks] Add alias for the SciKit-Learn model server (#2615) 926732ce Fix Docker Compose Deployment (#2601) 0744d011 [Frameworks] Fixed bug of no validation set in training (#2608) efec5fac [API] Configure Uvicorn Keep Alive Timeout (#2613) 1e4db64d [API] Adding cloud storage to default allowed file paths (#2614) b062157f [Projects] Support GH fine grained tokens (#2611) 024aa277 [System Tests] Add checkout before copying cleanup.py (#2612) 2b391945 [Feature Store] Fix target path in spark merger (#2605) 6efd5574 [Tests] Fix test_schedule_on_filtered_by_time (#2598) db4a9be8 [Tests] Adopt test_sync_pipeline_chunks to entities set to df index (#2599) 63f3db6a [Run] Ignore returned None values for logging (#2603) ec1a12bf [Docs] figure updated with Function Hub (#2602)

  • v1.2.0-rc16(Nov 17, 2022)

    Features / Enhancements

    • System Tests: Change order of system tests cleanup and add datanode docker registry restart, #2600, @tankilevitch
    • Docs: Change marketplace to Function Hub, co-located data ingestion topics, #2590, @jillnogold
    • Docs: AWS install with policy, #2591, @gilad-shaham
    • Artifacts: List Artifacts enrich with tag for all tags, #2589, @tankilevitch
    • Frameworks: HuggingFace model server, #2594, @guy1992l
    • Unknown: [DataStore] Fix v3io listdir: fix logic error and move to httpclient transport, #2592, @assaf758
    • UI: Features & enhancement

    Bug fixes

    • Tags: Fix warning Too much data for declared Content-Length in Delete Tags endpoint, #2597, @tankilevitch
    • Tests: Fix test_pandas_write_parquet, #2596, @gtopper
    • Datastore: Fix double passing of named parameter, #2578, @gtopper
    • UI: Bug fixes

    Pull requests:

    f45da1ff [System Tests] Change order of system tests cleanup and add datanode docker registry restart (#2600) c3a97564 [Docs] Change marketplace to Function Hub, co-located data ingestion topics (#2590) d5ad0bce [Tags] Fix warning Too much data for declared Content-Length in Delete Tags endpoint (#2597) 47c6616e [Docs] AWS install with policy (#2591) f4c45fc0 [Artifacts] List Artifacts enrich with tag for all tags (#2589) 747f45ef [Tests] Fix test_pandas_write_parquet (#2596) ff4d0a64 [Frameworks] HuggingFace model server (#2594) 15d7693d [DataStore] Fix v3io listdir: fix logic error and move to httpclient transport (#2592) 62e2e7d6 [Datastore] Fix double passing of named parameter (#2578)

  • v1.1.2-rc3(Nov 18, 2022)
