We'd like to make it super easy to go from writing code in a notebook to training that model in a distributed fashion.
The experience might be something like:
- User writes code in a notebook and executes it in JupyterLab
- User clicks a button which allows them to fill in various settings (e.g. number of GPUs)
- User clicks train
Under the hood, this would cause:
- A Docker image to be built
- A TFJob/PyTorchJob/K8s Job to be created and fired off (a rough sketch of submitting such a job is included below)
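As a very rough sketch (not a design), the submission could go through the Kubernetes Python client's custom objects API. The group/version, namespace, image, command, and TFJob spec fields below are assumptions and depend on which TFJob API version is deployed:

```python
# Sketch only: fire off a TFJob via the Kubernetes custom objects API.
# The namespace, image, command, and spec fields are illustrative assumptions.
from kubernetes import client, config


def submit_tfjob(name, image, namespace="default", workers=2):
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    tfjob = {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [{
                        "name": "tensorflow",
                        "image": image,
                        "command": ["python", "/opt/train.py", "train_model",
                                    "--output=/output/model.h5"],
                    }]}},
                }
            }
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="kubeflow.org", version="v1", namespace=namespace,
        plural="tfjobs", body=tfjob)
```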
I think the biggest challenge is that we probably don't want to execute all the code in the notebook. Typically, there's some amount of refactoring that needs to be done to convert a notebook into a python module suitable for execution as a batch job.
As a concrete example:
Here's the notebook for our GitHub Issue summarization example
Here's the corresponding python module used when training in a K8s job.
The python module only executes a subset of the cells, in particular those that:
- Define the model architecture
- Train the model
Rather than try to auto-convert a notebook like the GitHub issue example, I think we should require users to structure their code to facilitate the conversion.
My suggestion would be to allow any functions defined in the notebook to be used as entry points. So for the GitHub issue summarization example, the user would have a cell like the following:

    import numpy as np
    from keras.callbacks import CSVLogger, ModelCheckpoint

    def train_model(output):
        # seq2seq_Model, encoder_input_data, decoder_input_data and
        # decoder_target_data are defined in earlier cells of the notebook.
        script_name_base = 'tutorial_seq2seq'
        csv_logger = CSVLogger('{:}.log'.format(script_name_base))
        model_checkpoint = ModelCheckpoint(
            '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base),
            save_best_only=True)

        batch_size = 1200
        epochs = 7
        history = seq2seq_Model.fit([encoder_input_data, decoder_input_data],
                                    np.expand_dims(decoder_target_data, -1),
                                    batch_size=batch_size,
                                    epochs=epochs,
                                    validation_split=0.12,
                                    callbacks=[csv_logger, model_checkpoint])
        seq2seq_Model.save(output)

    train_model('seq2seq_model_tutorial.h5')

If a user structures their code this way, we should be able to automatically create and invoke a suitable container entry point. Something like the following:
- Use nbconvert to convert from ipynb to python code
- Post-process the python code
  - Strip out any statements not inside a function (except imports)
- Create a CLI for the functions using a library like python-fire
- Build a Docker image that is the Notebook image + the generated code (a rough sketch of these steps follows)
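Here's a minimal sketch of the conversion steps, assuming hypothetical file names (`notebook.ipynb`, `train.py`) and using the nbconvert Python API, the `ast` module, and python-fire; the stripping logic is deliberately simplistic:

```python
# Sketch: turn a notebook into a python module that keeps only imports and
# function/class definitions, then expose those functions as a CLI with fire.
# The file names (notebook.ipynb, train.py) are illustrative assumptions.
import ast

from nbconvert import PythonExporter


def notebook_to_module(notebook_path, module_path):
    source, _ = PythonExporter().from_filename(notebook_path)
    tree = ast.parse(source)
    # Keep imports and function/class definitions; drop other top-level statements.
    tree.body = [node for node in tree.body
                 if isinstance(node, (ast.Import, ast.ImportFrom, ast.FunctionDef,
                                      ast.AsyncFunctionDef, ast.ClassDef))]
    code = ast.unparse(tree)  # requires Python 3.9+
    # Append a fire entry point so every top-level function becomes a subcommand.
    code += "\n\nif __name__ == '__main__':\n    import fire\n    fire.Fire()\n"
    with open(module_path, "w") as f:
        f.write(code)


notebook_to_module("notebook.ipynb", "train.py")
```

The container entry point could then call something like `python train.py train_model --output=/output/model.h5` (again, a hypothetical invocation).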
A variant of this idea would be to use metaml (by @wbuchwalter). metaml uses metaparticle to allow people to annotate their python code with the information needed to then run it on K8s (e.g. distributed using TFJob). If we went with this approach, I think the flow would be:
- Run nbconvert to go from ipynb -> py
- Use the metaparticle/metaml tool chain to build the docker image and submit the job (a rough illustration of the annotation idea follows)
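I don't have metaml's exact API in front of me, so the decorator below is a made-up stand-in purely to illustrate the annotation idea (attaching run metadata to a training function); metaml/metaparticle would supply the real decorator and handle the image build and job submission:

```python
# Illustration only: a stand-in for a metaml-style annotation. The decorator
# name, parameters, and behavior here are hypothetical, not metaml's real API.
def train_on_k8s(image=None, worker_count=1, gpu_count=0):
    """Attach K8s/TFJob run metadata to a training function."""
    def wrap(fn):
        fn.k8s_run_config = {
            "image": image,
            "worker_count": worker_count,
            "gpu_count": gpu_count,
        }
        return fn  # a real tool would use this metadata to build and submit the job
    return wrap


@train_on_k8s(image="gcr.io/example/issue-summarization:latest",
              worker_count=3, gpu_count=1)
def train_model(output):
    ...  # same training code as in the notebook cell above
```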
@willingc @yuvipanda Is there existing tooling in the Jupyter community other than nbconvert to convert notebooks to code suitable for asynchronous batch execution?
/cc @wbuchwalter @gaocegege @yuvipanda @willingc
priority/p1 area/jupyter area/0.4.0