Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Capital One

Last update: Jan 3, 2023

Overview

Rubicon

Purpose

Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Rubicon's git integration associates these inputs and outputs directly with the model code that produced them, ensuring full auditability and reproducibility for developers and stakeholders alike. While you experiment, the Rubicon dashboard makes it easy to explore, filter, visualize, and share recorded work.

Components

Rubicon is composed of three parts:

A Python library, powered by fsspec, for storing and retrieving model inputs, outputs, and analyses on filesystems
A dashboard, built with dash, for exploring, comparing, and visualizing logged data
A process, leveraging intake, for sharing a selected subset of logged data with collaborators or reviewers

Workflow

Use the Rubicon library to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).

Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Here's a simple example:

from rubicon_ml import Rubicon

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

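# SklearnTrainingMetadata is not defined in this snippet; it is assumed to be
# available from your own code.
experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

# n_estimators, n_features, random_state, rfc, X_test, and y_test are assumed
# to come from your own training code.
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)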

Then explore the project by running the dashboard:

rubicon ui --root-dir /rubicon-root

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

rubicon is available on conda-forge via conda and on PyPI via pip.

conda config --add channels conda-forge
conda install rubicon-ml

pip install rubicon-ml

Develop

rubicon uses conda to manage environments. First, install conda. Then use conda to set up a development environment:

conda env create -f ci/environment.yml
conda activate rubicon-dev

Testing

The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or all at once by simply running pytest.

Note: some integration tests are intentionally marked to control when they are run (i.e. not during CI/CD). These tests include:

Integration tests that connect to physical filesystems (local, S3). You'll want to configure the root_dir appropriately for these tests (tests/integration/test_async_rubicon.py, tests/integration/test_rubicon.py). They can be run with:
```
pytest -m "physical_filesystem_test"
```
Integration tests for the dashboard. To run these integration tests locally, you'll need to install one of the WebDrivers. To do so, follow the Install instructions in the Dash Testing Docs or install via brew with brew cask install chromedriver. You may have to update your permissions in Security & Privacy to install with brew.
```
pytest -m "dashboard_test"
```
Note: The --headless flag can be added to run the dashboard tests in headless mode.

Code Formatting

Install and configure pre-commit to automatically run black, flake8, and isort during commits:

install pre-commit
run pre-commit install to set up the git hook scripts

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

Contributors

Mike McCarty

Sri Ranganathan

Joe Wolfe

Ryan Soley

Diane Lee

Comments

Edgetest action
What

This PR adds the edgetest action to ensure the basic requirements stay up to date, provided the tests pass.

I had to refactor the setup a bit to be PEP517 compliant.

How to Test

Installing locally works for me, but a second set of eyes on this would be really helpful @ryanSoley @shania-m
opened by fdosani 15
auto versioning using git tags
What

in version.py and rubicon/_version.py, use git to get the version from the latest tag

this needs to be run from the repo with tags cloned too, which means it'll only work for devs who've installed from source (or the CI with the fetch_depth maxed out for now, in our case)

to deal with that, setup.py calls version.py's _write_version_file function when the package is bundled

this replaces rubicon/_version.py with a function that returns a hardcoded string (fetched from the git tags in the build environment)

this change won't ever need to be committed to the repo, because the current git solution will always work for anyone installed from source
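The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function names get_version_from_git and write_version_file are hypothetical, and it assumes the version is simply the most recent tag reported by git describe.

```python
import subprocess
from pathlib import Path
from typing import Optional


def get_version_from_git(repo_dir: str = ".") -> Optional[str]:
    """Return the most recent tag reachable from HEAD, or None if git cannot provide one."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--abbrev=0"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Not a git repository, no tags fetched, or git is not installed.
        return None
    return result.stdout.strip() or None


def write_version_file(path: str, version: str) -> None:
    """Overwrite a module with a get_version() that returns a hardcoded string."""
    source = f"def get_version():\n    return {version!r}\n"
    Path(path).write_text(source, encoding="utf-8")
```

At build time, a setup script could call get_version_from_git() once and pass the result to write_version_file(), so that the packaged module no longer needs a git checkout to report its version.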

How to Test

from the repo's root, run pip install -e ., launch a python interpreter, and import rubicon

rubicon.__version__ should return the latest version

navigate out of the repo and try the same thing - get_version will fail since there's no git repo

build rubicon - python setup.py sdist bdist_wheel - and install the wheel file

now rubicon.__version__ should return the latest version from a python interpreter started anywhere

enhancement
opened by ryanSoley 9
"Rubicon" name collides with existing project in the Python ecosystem
Describe the bug

I was just made aware of this project via a PyCon US announcement email.

The problem: the name you've chosen for this project collides with an existing project in the Python ecosystem.

I've been using the name Rubicon in the Python ecosystem since 2014. I'm the owner of the Rubicon record in PyPI, as well as some related projects:

https://pypi.org/project/rubicon/

https://pypi.org/project/rubicon-java/

https://pypi.org/project/rubicon-objc/

These projects are in active use in the Python community, and the Java subproject received funding (indirectly) from the PSF through their support of the BeeWare Android port.

I can only assume this is something you were at least partially aware of, because you've chosen the name rubicon-ml for your PyPI package, and changed the name of the package in setup.py.

Although the projects are in a different domain (language bridging vs numerical processing), I'd argue there is potential for confusion since they're both active projects in the same language ecosystem, and there is some usage of BeeWare tooling in the numerical processing community.

I humbly request you choose a different name for your project that doesn't collide with my pre-existing usage.
bug needs triage
opened by freakboy3742 7
Use conda incubator action for environment setup
Unpin Python in environment file to make sure Python version is not always 3.8

closes: #42

What

Uses the conda-incubator action for more flexible miniconda setup

Unset strict python version in environment file so the version matrix checks all the versions

Add percy to conda instead of using pip

How to Test

I think if the CI passes it works?
opened by gforsyth 5
python-3.10.6-h582c2e5_0_cpython.tar.bz2: 3 vulnerabilities (highest severity is: 9.8)
Vulnerable Library - python-3.10.6-h582c2e5_0_cpython.tar.bz2

General purpose programming language

Library home page: https://api.anaconda.org/download/conda-forge/python/3.10.6/linux-64/python-3.10.6-h582c2e5_0_cpython.tar.bz2

Path to dependency file: /environment.yml

Path to vulnerable library: /onda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2,/home/wss-scanner/anaconda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2

Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

Vulnerabilities

| CVE | Severity | CVSS | Dependency | Type | Fixed in | Remediation Available |
| ------------- | ------------- | ----- | ----- | ----- | --- | --- |
| CVE-2015-20107 | High | 9.8 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
| CVE-2020-10735 | High | 7.5 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
| CVE-2021-28861 | High | 7.4 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | v3.11 | ❌ |

Details

CVE-2015-20107

Dependency Hierarchy:

:x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

Found in base branch: main

Vulnerability Details

In Python (aka CPython) through 3.10.4, the mailcap module does not add escape characters into commands discovered in the system mailcap file. This may allow attackers to inject shell commands into applications that call mailcap.findmatch with untrusted input (if they lack validation of user-provided filenames or arguments).

Publish Date: 2022-04-13
URL: CVE-2015-20107
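
The kind of escaping that the description says is missing can be illustrated with the standard library's shlex.quote. This is only a sketch of the general mitigation, not the CPython fix: build_viewer_command is a hypothetical helper, and the "less %s" template is an assumed mailcap-style command in which %s stands for the filename.

```python
import shlex


def build_viewer_command(template: str, filename: str) -> str:
    """Substitute a shell-quoted filename for the %s placeholder in a command template."""
    return template.replace("%s", shlex.quote(filename))


# An untrusted filename that would otherwise smuggle in a second shell command.
command = build_viewer_command("less %s", "notes.txt; rm -rf ~")
print(command)  # less 'notes.txt; rm -rf ~'
```

Because the filename is quoted, the shell treats it as a single argument rather than as two commands.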

CVSS 3 Score Details (9.8)

Base Score Metrics:

Exploitability Metrics:

Attack Vector: Network

Attack Complexity: Low

Privileges Required: None

User Interaction: None

Scope: Unchanged

Impact Metrics:

Confidentiality Impact: High

Integrity Impact: High

Availability Impact: High

CVE-2020-10735

Dependency Hierarchy:

:x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

Found in base branch: main

Vulnerability Details

A flaw was found in python. In algorithms with quadratic time complexity using non-binary bases, when using int("text"), a system could take 50ms to parse an int string with 100,000 digits and 5s for 1,000,000 digits (float, decimal, int.from_bytes(), and int() for binary bases 2, 4, 8, 16, and 32 are not affected). The highest threat from this vulnerability is to system availability.

Publish Date: 2022-09-09
URL: CVE-2020-10735
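
One way an application can protect itself regardless of interpreter version is to cap the length of digit strings before handing them to int(). The sketch below is an illustration under that assumption: parse_untrusted_int is a hypothetical helper, and its default of 4300 digits mirrors the default of the built-in limit that Python 3.11 and later expose through sys.set_int_max_str_digits.

```python
def parse_untrusted_int(text: str, max_digits: int = 4300) -> int:
    """Parse a base-10 integer, rejecting oversized input before conversion."""
    # Count only the digits, ignoring surrounding whitespace and a leading sign.
    digits = text.strip().lstrip("+-")
    if len(digits) > max_digits:
        raise ValueError(
            f"refusing to parse {len(digits)} digits (limit is {max_digits})"
        )
    return int(text)
```

The length check is linear in the input size, so the quadratic conversion is never reached for oversized strings.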

CVSS 3 Score Details (7.5)

Base Score Metrics:

Exploitability Metrics:

Attack Vector: Network

Attack Complexity: Low

Privileges Required: None

User Interaction: None

Scope: Unchanged

Impact Metrics:

Confidentiality Impact: None

Integrity Impact: None

Availability Impact: High

CVE-2021-28861

Dependency Hierarchy:

:x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

Found in base branch: main

Vulnerability Details

** DISPUTED ** Python 3.x through 3.10 has an open redirection vulnerability in lib/http/server.py due to no protection against multiple (/) at the beginning of URI path which may lead to information disclosure. NOTE: this is disputed by a third party because the http.server.html documentation page states "Warning: http.server is not recommended for production. It only implements basic security checks."

Publish Date: 2022-08-23
URL: CVE-2021-28861
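
The missing protection amounts to normalizing a request path that starts with more than one slash, since a browser may interpret a leading "//" in a redirect target as a scheme-relative URL pointing at another host. A minimal sketch of that normalization, with collapse_leading_slashes as a hypothetical helper rather than the actual patch:

```python
def collapse_leading_slashes(path: str) -> str:
    """Reduce any run of leading slashes in a request path to a single slash."""
    if path.startswith("//"):
        return "/" + path.lstrip("/")
    return path
```

A server would apply this to the raw request path before using it to build a redirect.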

CVSS 3 Score Details (7.4)

Base Score Metrics:

Exploitability Metrics:

Attack Vector: Network

Attack Complexity: Low

Privileges Required: None

User Interaction: Required

Scope: Changed

Impact Metrics:

Confidentiality Impact: High

Integrity Impact: None

Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28861

Release Date: 2022-08-23

Fix Resolution: v3.11

security vulnerability
opened by mend-for-github-com[bot] 4

Issue with the edgetest action and Dask

Describe the bug

@ryanSoley @ak-gupta I was digging into the edgetest action a bit more and I was able to recreate the bug we were seeing.

Steps/Code to reproduce bug

Running the following:

conda create -n test python=3.9 pip conda
conda activate test
pip install dask==2022.2.0 prefect

If I do a pip list I end up with:

dask                    2022.2.0
prefect                 1.1.0

Then if I upgrade the following:

pip install dask prefect --upgrade

Requirement already satisfied: dask in ~/miniconda3/envs/test/lib/python3.9/site-packages (2022.2.0)
Collecting dask
  Using cached dask-2022.3.0-py3-none-any.whl (1.1 MB)
Requirement already satisfied: prefect in ~/miniconda3/envs/test/lib/python3.9/site-packages (1.1.0)
Requirement already satisfied: partd>=0.3.10 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (1.2.0)
Requirement already satisfied: fsspec>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2022.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2.0.0)
Requirement already satisfied: packaging>=20.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (21.3)
Requirement already satisfied: toolz>=0.8.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (0.11.2)
Requirement already satisfied: pyyaml>=5.3.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (6.0)
Requirement already satisfied: requests>=2.25 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.27.1)
Requirement already satisfied: python-box>=5.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.0.1)
Requirement already satisfied: pendulum>=2.0.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.1.2)
Requirement already satisfied: marshmallow>=3.0.0b19 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.15.0)
Requirement already satisfied: toml>=0.9.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.10.2)
Requirement already satisfied: docker>=3.4.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.0.3)
Requirement already satisfied: distributed>=2.17.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.2.0)
Requirement already satisfied: python-slugify>=1.2.6 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.1.1)
Requirement already satisfied: importlib-resources>=3.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.4.0)
Requirement already satisfied: croniter>=0.3.24 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.3.4)
Requirement already satisfied: urllib3>=1.26.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.26.9)
Requirement already satisfied: mypy-extensions>=0.4.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.4.3)
Requirement already satisfied: pytz>=2018.7 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.1)
Requirement already satisfied: msgpack>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.0.3)
Requirement already satisfied: tabulate>=0.8.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.8.9)
Requirement already satisfied: python-dateutil>=2.7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.8.2)
Requirement already satisfied: click>=7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (8.0.4)
Requirement already satisfied: marshmallow-oneofschema>=2.0.0b2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.0.1)
Requirement already satisfied: psutil>=5.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (5.9.0)
Requirement already satisfied: tblib>=1.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (1.7.0)
Requirement already satisfied: jinja2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (3.0.3)
Requirement already satisfied: setuptools in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (60.10.0)
Requirement already satisfied: zict>=0.1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.1.0)
Requirement already satisfied: tornado>=6.0.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (6.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.4.0)
Requirement already satisfied: websocket-client>=0.32.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from docker>=3.4.1->prefect) (1.3.1)
Requirement already satisfied: zipp>=3.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from importlib-resources>=3.0.0->prefect) (3.7.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from packaging>=20.0->dask) (3.0.7)
Requirement already satisfied: locket in ~/miniconda3/envs/test/lib/python3.9/site-packages (from partd>=0.3.10->dask) (0.2.1)
Requirement already satisfied: pytzdata>=2020.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from pendulum>=2.0.4->prefect) (2020.1)
Requirement already satisfied: six>=1.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-dateutil>=2.7.0->prefect) (1.16.0)
Requirement already satisfied: text-unidecode>=1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-slugify>=1.2.6->prefect) (1.3)
Requirement already satisfied: certifi>=2017.4.17 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (3.3)
Requirement already satisfied: heapdict in ~/miniconda3/envs/test/lib/python3.9/site-packages (from zict>=0.1.3->distributed>=2.17.0->prefect) (1.0.1)
Requirement already satisfied: MarkupSafe>=2.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from jinja2->distributed>=2.17.0->prefect) (2.1.1)

Expected behavior

dask should be upgraded to 2022.3.0, but due to some interaction with prefect it isn't. I've tested with other packages, and dask seems to be the only one affected.

Additional context

I'll need to dig in a bit more, but I'm thinking this isn't an edgetest issue but something to do with prefect? Appreciate any thoughts or insights either of you might have.

bug needs triage

opened by fdosani 4

added buttons to select all or remove all columns in UI table
closes: #61

What

Introduced buttons to select all columns or hide all columns in Rubicon UI experiment tables

How to Test

When select all columns button is clicked, all columns appear in the table

When clear all columns button is clicked, no columns appear in the table
opened by shania-m 4

Logging MultiIndex Dataframes Fails

Describe the bug

It appears that internally, rubicon's .log_dataframe() converts pandas dataframes to dask dataframes regardless of the situation. This can cause issues when dask does not support certain dataframe features, such as MultiIndex dataframes.

Steps/Code to reproduce bug

import pandas as pd
from rubicon.client import Rubicon
# Create sample data
df = pd.DataFrame([[0,1,'a'],[1,1,'b'],[2,2,'c'],[3,2,'d']], columns=['a', 'b', 'c'])
df = df.set_index(['b', 'a']) # Set multiindex
df
     c
b a   
1 0  a
  1  b
2 2  c
  3  d

# Log dataframe to rubicon
rubicon = Rubicon(persistence="memory")
project = rubicon.get_or_create_project("test")
exp = project.log_experiment('test_exp')
exp.log_dataframe(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/client/mixin.py", line 251, in log_dataframe
    self.repository.create_dataframe(dataframe, df, project_name, experiment_id=experiment_id)
  File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 426, in create_dataframe
    data = self._convert_to_dask_dataframe(data)
  File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 396, in _convert_to_dask_dataframe
    return dd.from_pandas(df, npartitions=1)
  File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/dask/dataframe/io/io.py", line 202, in from_pandas
    raise NotImplementedError("Dask does not support MultiIndex Dataframes.")
NotImplementedError: Dask does not support MultiIndex Dataframes.

Additional context

I'm not sure why pandas dataframes need to be converted to dask dataframes every time during logging, but a fix would likely involve skipping the dask conversion in cases like this one, where dask does not support MultiIndex dataframes.

bug

opened by Lazea 4

Automatic sklearn pipeline logging
Is your feature request related to a problem? Please describe

One way to log training data to Rubicon would be to extend scikit-learn's Pipeline class so information could be logged before and/or after each step. We could extend the class and override the fit and predict methods to add optional hooks before and after.

Describe the solution you'd like

Something like...

from sklearn.pipeline import Pipeline


class RubiconPipeline(Pipeline):
    def before_fit(self, X, y=None, **fit_params):
        # logs info from self.steps
        ...

    def after_fit(self, X, y=None, **fit_params):
        # logs info from self.steps after fitting
        ...

    def fit(self, X, y=None, **fit_params):
        self.before_fit(X, y)
        retval = super().fit(X, y=y, **fit_params)
        self.after_fit(X, y)
        return retval

Additional context

Three cases to consider:

Inferred logging from inspecting X's, y's and estimator object

Logging through an extended common Rubicon/SKLearn API (optionally call .rubicon_log methods on estimators)

Logging through user defined functions (UDFs) optionally provided to RubiconPipeline.__init__
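
The third case, user defined functions passed to the constructor, might look roughly like the sketch below. To keep it dependency-free, MinimalPipeline is a stand-in for sklearn.pipeline.Pipeline that only implements fit; HookedPipeline, MeanEstimator, and the before_fit/after_fit keyword names are all hypothetical and not part of Rubicon or scikit-learn.

```python
from typing import Any, Callable, List, Optional, Tuple


class MinimalPipeline:
    """Dependency-free stand-in for sklearn.pipeline.Pipeline (fit only)."""

    def __init__(self, steps: List[Tuple[str, Any]]):
        self.steps = steps

    def fit(self, X, y=None, **fit_params):
        for _, step in self.steps:
            step.fit(X, y)
        return self


Hook = Callable[[MinimalPipeline, Any, Any], None]


class HookedPipeline(MinimalPipeline):
    """Calls optional user-defined functions before and after fit."""

    def __init__(
        self,
        steps: List[Tuple[str, Any]],
        before_fit: Optional[Hook] = None,
        after_fit: Optional[Hook] = None,
    ):
        super().__init__(steps)
        self._before_fit = before_fit
        self._after_fit = after_fit

    def fit(self, X, y=None, **fit_params):
        if self._before_fit is not None:
            self._before_fit(self, X, y)
        retval = super().fit(X, y=y, **fit_params)
        if self._after_fit is not None:
            self._after_fit(self, X, y)
        return retval


class MeanEstimator:
    """Toy step that learns the mean of X."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self


# Stand-in for an experiment: each hook appends a (name, value) pair.
logged = []
pipeline = HookedPipeline(
    [("mean", MeanEstimator())],
    before_fit=lambda pipe, X, y: logged.append(("n_samples", len(X))),
    after_fit=lambda pipe, X, y: logged.append(("mean", pipe.steps[0][1].mean_)),
)
pipeline.fit([1.0, 2.0, 3.0])
```

In a real integration the hooks would presumably call something like experiment.log_parameter or experiment.log_metric instead of appending to a list.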

development feature
opened by joe-wolfe21 4
reorganize the existing docs

Is your documentation request related to a problem? Please describe

we would like to update the rubicon-ml docs to follow the Diataxis framework

Describe the solution you'd like

once #207 is completed and we have a plan for reorganizing the docs, this issue will track the actual reorganization work
documentation

opened by ryanSoley 3
`jupyter-dash` proof-of-concept
I've been thinking about how we could get a live example of the dashboard hosted for users for a while now. I saw how the dask examples use JupyterLab through binder to show off their task graphs and stuff, so I thought it'd be great if we had a way to run the UI in JupyterLab. Of course we can launch it from lab, but that runs it on a localhost port which may not always be accessible.

Then I came across this blog and thought if we could just use this it'd solve it.

https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

I was gonna raise this as an issue but it ended up being super easy to implement, so here it is.

What

uses jupyter-dash to instantiate the dashboard app

the default option, "external", runs it exactly the same as dash.Dash would

now there are options for "jupyterlab" and "inline" which display the dashboard in a new JLab window or inline in a notebook respectively

How to Test

run through the added notebook (should we even keep it?)
opened by ryanSoley 3
return rubicon objects with proper parents
Is your enhancement request related to a problem? Please describe

the rubicon objects that can optionally be returned by RubiconJSON are currently utilizing a NoOpParent instead of the proper experiment/project parent

https://github.com/capitalone/rubicon-ml/blob/jsonpath-poc/rubicon_ml/client/rubicon_json.py#L24

this means that any operations that need to actually reach out to the filesystem will not be possible, essentially making the objects read-only

Describe the solution you'd like

we should try to associate the proper parents with each returned object so that they will be fully functional rubicon objects

this will likely require the inspection of the match.value objects returned from each query. the JSON in match.value should have an experiment_id (feature, metric, parameter), or a parent_id (experiment, artifact, dataframe)

projects are the exception - they require a config object rather than a parent. ‼️ this is actually incorrect in the current implementation and I didn't catch it before merging to the integration branch, so we'll have to address it too ‼️

there are a few steps that'll be required in the process:

for each match in the results object:

extract the experiment_id or parent_id from the result

if the queried object is not a project:

if RubiconJSON is instantiated with experiments as an input and the parent of the queried object is an experiment, the parent should be in that list

if RubiconJSON is instantiated with projects as an input and the parent of the queried object is a project, the parent should be in that list

if RubiconJSON is instantiated with projects as an input and the parent of the queried object is an experiment, the parent should be retrievable from one of those projects using project.experiment(id=...

if RubiconJSON is instantiated with top level rubicon objects as an input, we'll need to leverage get_project(id=... and experiment(id=... to retrieve the proper parents whether they be a project or experiment

if the queried object is a project:

we must be retrieving it from a top level rubicon input to RubiconJSON - so we can just take the config object off that top level rubicon object and dump it into the new project
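
The non-project branch of the steps above can be sketched with plain dictionaries. This is only an illustration of the lookup logic: resolve_parent is a hypothetical helper, and it assumes the candidate experiments and projects have already been collected into dictionaries keyed by id, whichever kind of input RubiconJSON was given.

```python
from typing import Any, Dict, Optional


def resolve_parent(
    match_value: Dict[str, Any],
    experiments_by_id: Dict[str, Any],
    projects_by_id: Dict[str, Any],
) -> Optional[Any]:
    """Return the parent object for one query result, or None if it cannot be found."""
    # Features, metrics, and parameters are described as carrying `experiment_id`.
    experiment_id = match_value.get("experiment_id")
    if experiment_id is not None:
        return experiments_by_id.get(experiment_id)

    # Experiments, artifacts, and dataframes are described as carrying `parent_id`,
    # which may point at either an experiment or a project.
    parent_id = match_value.get("parent_id")
    if parent_id is None:
        return None
    if parent_id in experiments_by_id:
        return experiments_by_id[parent_id]
    return projects_by_id.get(parent_id)
```

A result of None would correspond to the fallback mentioned below, where the returned object keeps its NoOpParent.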

Describe alternatives you've considered

if this ends up being infeasible, it's fine to just leave the NoOpParents

enhancement development
opened by ryanSoley 0
add example & docs for `RubiconJSON`

Is your documentation request related to a problem? Please describe

since this is a completely new feature, we'll need a new section in the docs for it

Describe the solution you'd like

design a notebook showing how the new RubiconJSON class works (maybe just adapt the poc one we're basing this all off) for the documentation. add said notebook to the docs. add any new, public python methods to the API reference
documentation example

opened by ryanSoley 0
add python 3.11 to test matrix
What

adds python 3.11 to test matrix in the testing workflow

How to Test

make sure the CI on this branch runs tests for four python versions, 3.8-3.11

enhancement
opened by ryanSoley 1
add example showing rubicon w/ DataProfiler
Is your feature request related to a problem? Please describe

after seeing the DataProfiler demo at PyCon, we decided we could show an example of rubicon tracking data profiles over a project/experiments' lifetime

Describe the solution you'd like

create an example that shows experiments profiling each incoming dataset and logging the profiles to rubicon

use rubicon to illustrate a change in data profiles over experiments

reference new data profiler integration example in "logging training metadata" example, as it is basically an extension of logging training metadata

title the new example "Integrate with DataProfiler" and add it to the "How to..." section of the docs

documentation example
opened by ryanSoley 0
Validate fsspec backends work with Rubicon
Rubicon-ml leverages fsspec for persistence. This issue includes:

Determine which fsspec backends apply to rubicon

Validate each backend works with rubicon-ml

add working backends to docs

documentation discussion
opened by shania-m 1
add test for the value error in `project`, `metric`, `experiment` and the other getters

Describe the solution you'd like

Create a test function that verifies each getter raises a ValueError when neither name nor id is passed as a parameter.

Describe alternatives you've considered

Alternatively, a separate test could be written for each getter, but it would most likely be more efficient to write one test that is shared by all of them.
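
A shared check along these lines could be sketched as follows. FakeProject is a hypothetical stand-in used so the example runs on its own, and getters_missing_value_error is an illustrative helper name; the real test would run against the actual rubicon objects and their getters.

```python
from typing import Iterable, List


class FakeProject:
    """Hypothetical stand-in whose getters require `name` or `id`."""

    def _get(self, kind, name=None, id=None):
        if name is None and id is None:
            raise ValueError(f"`name` or `id` is required to retrieve a {kind}.")
        return {"kind": kind, "name": name, "id": id}

    def experiment(self, name=None, id=None):
        return self._get("experiment", name, id)

    def metric(self, name=None, id=None):
        return self._get("metric", name, id)


def getters_missing_value_error(obj: object, getter_names: Iterable[str]) -> List[str]:
    """Return the names of getters that do NOT raise ValueError when called with no arguments."""
    offenders = []
    for getter_name in getter_names:
        try:
            getattr(obj, getter_name)()
        except ValueError:
            continue
        offenders.append(getter_name)
    return offenders
```

A single test can then assert that the returned list is empty for every object type and getter name of interest.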
enhancement development

opened by andreafehrman 1

Releases(0.4.3)

0.4.3(Dec 14, 2022)
changelog

s3fs dependency now optional and installed via the s3 extra (#326)

renamed ui extra to viz to reflect module name change (#326)

dependency updates via edgetest

0.4.2(Dec 12, 2022)
changelog

json encode/decode numpy objects (#321)

dependency updates via edgetest (#320)

bugfixes

fix tag display in experiments table (#318)

0.4.1(Dec 2, 2022)
changelog

added type checking for tags (#304)

update existing intake catalogs (#308)

log python objects as artifacts directly (#310)

dependency updates via edgetest (#305)

bugfixes

update setup.cfg (#316)

0.4.0(Nov 16, 2022)
changelog

address existing deprecations (#286)

deprecate async submodule (#287)

add new examples & example cleanup (#292, #293, #295)

add failure modes (#301)

dependency updates via edgetest (#283, #289, #291, #296, #297, #300)

bugfixes

fix Binder examples (#284)

fix tag removal bug (#298)

0.3.10(Sep 29, 2022)
changelog

added tagging to features (#278)

added tagging to parameters (#280)

dependency updates via edgetest (#279, #281)

bugfixes

fixes a bug where add_tags and remove_tags did not work properly on entities with names with underscores in them (#280)

0.3.9(Sep 23, 2022)
changelog

added tagging to metrics (#273, #276)

dependency updates via edgetest (#267, #274)

bugfixes

artifacts can now be retrieved by tags (#275)

0.3.8(Sep 16, 2022)
changelog

tags can now be applied to artifacts (#268)

dependency updates via edgetest (#257, #259, #260, #264, #265)

bugfixes

fixes duplicate source registration error from newest intake release (#262)

0.3.7(Jul 6, 2022)
changelog

get edgetest working (#225, #230)

dependency updates via edgetest (#231, #238, #239, #241, #242, #245, #248, #250)

documentation and example updates (#218, #222, #234, #235, #246, #249)

0.3.6(Apr 12, 2022)
changelog

configure new whitesource bot (#217, #220)

dependency management (#211, #219)

0.3.5(Mar 29, 2022)
changelog

publish experiments from experiment table (#202)

new "hello rubicon" example (#204)

bugfixes

__version__ fix (#200)

0.3.4(Mar 17, 2022)
changelog

Added edgetest action for up-to-date requirements (#195)

bugfixes

Update intake dependency to include msgpack when using pip (#199)

0.3.3(Mar 17, 2022)
changelog

Added make_pipeline to rubicon_ml.sklearn.pipeline to create pipelines (#185)

RubiconPipeline constructor also takes memory and verbose arguments directly, without **kwargs (#185)

Added multiple scores and fits to pipelines in rubicon_ml.sklearn.pipeline (#186)

Support score_samples on pipelines in rubicon_ml.sklearn.pipeline (#192)

Add pipeline slicing on pipelines in rubicon_ml.sklearn.pipeline (#194)

bugfixes

Support NoneType values in correlation plot (#189)

0.3.2(Feb 16, 2022)
changelog

move publish into intake_rubicon module (#180)

remove intake project source (#181)

0.3.1(Jan 24, 2022)
bugfixes

fixes broken 0.3.0 PyPI release artifact (#170, #171)

0.3.0(Jan 21, 2022)
changelog

adds new viz module to visualize logged data (#149)

more info in our docs

deprecates ui module in favor of viz

removes old rubicon module that was deprecated in favor of rubicon_ml in #93 (#169)

0.2.11(Nov 29, 2021)
changelog

add ability to instantiate dashboard with Rubicon object (#119)

support Dash 2.0.0 (#121)

preserve logging order on fetches (#129)

add ability to get all rubicon-ml entities by name and ID (#128, #131, #133, #135, #141, #152, #153)

add storage_options passthrough to prefect task (#155)

bugfixes

fix local dataframe logging (#156)

0.2.10(Aug 24, 2021)
changelog

add passthrough for dash.Dash keyword arguments to the Dashboard (#117)

bugfixes

get dashboard working behind Jupyter proxies (#116)

0.2.9(Aug 19, 2021)
bugfixes

preserve the order in which artifacts were logged (#114)

0.2.8(Jul 22, 2021)
bugfixes

fix dashboard so it can render parameters and metrics that have dict values (#112)

0.2.7(Jul 8, 2021)
changelog

log estimator parameters passed to fit in the Scikit-learn pipeline (#111)

bugfixes

properly serialize date types when logging (#108)

properly serialize datetime types with null fields (#111)
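
The two serialization fixes above concern values that Python's standard `json` module cannot encode by default. As a rough illustration only (this is not rubicon's actual encoder; the `DateEncoder` class and the sample `record` are hypothetical), date and datetime values, including null fields, might be handled along these lines:

```python
import json
from datetime import date, datetime


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder: writes date and datetime values as ISO 8601 strings."""

    def default(self, obj):
        # datetime is a subclass of date, so one check covers both types
        if isinstance(obj, date):
            return obj.isoformat()
        # anything else falls back to the stock behavior (raises TypeError)
        return super().default(obj)


record = {
    "created_at": datetime(2021, 7, 8, 12, 30),
    "trained_on": date(2021, 7, 1),
    "finished_at": None,  # null fields are written as JSON null
}

encoded = json.dumps(record, cls=DateEncoder)
decoded = json.loads(encoded)
```

Decoding returns the dates as plain strings, so a reader would still need to parse them back (for example with `datetime.fromisoformat`) if typed values are required.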

0.2.6(Jun 9, 2021)
changelog

add runnable binder example (#99)

bugfixes

check for root_dir before initializing in-memory filesystem (#104)

address whitesource vulnerability (#100)

0.2.5(May 21, 2021)
bugfixes

reset groupings whenever a project is selected (#98)

0.2.4(May 19, 2021)
bugfixes

update manifest paths to use rubicon_ml (#97)

0.2.3(May 17, 2021)
changelog

rename package to rubicon_ml (#93)

0.2.2(Apr 29, 2021)
changelog

adds test suite for example notebooks (#90)

bugfixes

ignore non-rubicon files within data directories (#84)

ignore non-rubicon files in async repo (#91)

fix pytest warnings (#92)

0.2.1(Apr 19, 2021)
bugfixes

revisit examples (#79)

ensures all examples in the notebooks directory are working with the latest version of rubicon

the asynchronous S3 client can now read data back

the dashboard now works with an in-memory filesystem with a default root_dir

0.2.0(Apr 12, 2021)
changelog

jupyter-dash proof-of-concept (#77)

automatic rubicon logging with Scikit-learn pipelines (#50)

0.1.8(Apr 7, 2021)
bugfixes

add passthrough for filesystem storage_options (#69)

0.1.7(Apr 5, 2021)
bugfixes

relaxing constraint for s3fs (#66)

0.1.6(Apr 1, 2021)
changelog

support hiding columns within experiment table and comparison plot (#60)

bugfixes

fixes a bug related to dataframe plotting using hvplot (#62)
