Rubicon
Purpose
Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Rubicon's git integration associates these inputs and outputs directly with the model code that produced them, ensuring full auditability and reproducibility for developers and stakeholders alike. While experimenting, you can use the Rubicon dashboard to explore, filter, visualize, and share recorded work.
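The git integration rests on a simple idea: record which commit produced a given set of results. As a minimal sketch of that idea (not Rubicon's implementation), the hypothetical helper `current_git_info` below reads the current commit hash and branch with the `git` CLI. It assumes `git` is on the PATH and returns `None` when the directory is not inside a repository with at least one commit.

```python
import subprocess
from typing import Optional


def current_git_info(repo_dir: str = ".") -> Optional[dict]:
    """Hypothetical helper: return the commit and branch of repo_dir, or None.

    This only illustrates what a git integration can attach to a logged run;
    it is not Rubicon's own code.
    """
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
        branch = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is missing, repo_dir does not exist, or it is not a repo with commits
        return None
    return {"commit_hash": commit, "branch_name": branch}


if __name__ == "__main__":
    print(current_git_info())
```

Storing a dict like this next to each logged run is what allows a result to be traced back to the exact code that produced it.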
Components
Rubicon is composed of three parts:
- A Python library, powered by fsspec, for storing and retrieving model inputs, outputs, and analyses in filesystems
- A dashboard, built with Dash, for exploring, comparing, and visualizing logged data
- A process, leveraging intake, for sharing a selected subset of logged data with collaborators or reviewers
Workflow
Use the Rubicon library to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).
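Concurrent logging is safe when each experiment writes to its own location, so parallel writers never touch the same file. The sketch below illustrates that idea with the standard library only: a hypothetical `log_run` function writes each run's parameters and metrics as JSON under its own uniquely named directory, and a thread pool logs several runs at once. The directory layout and function names are assumptions for illustration, not Rubicon's storage format or API.

```python
import json
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def log_run(root_dir: Path, parameters: dict, metrics: dict) -> Path:
    """Hypothetical logger: write one run to its own uniquely named directory."""
    run_dir = root_dir / "experiments" / str(uuid.uuid4())
    run_dir.mkdir(parents=True)  # unique per run, so parallel writers never collide
    (run_dir / "parameters.json").write_text(json.dumps(parameters))
    (run_dir / "metrics.json").write_text(json.dumps(metrics))
    return run_dir


def load_runs(root_dir: Path) -> list:
    """Read every logged run back as a dict of parameters and metrics."""
    runs = []
    for run_dir in sorted((root_dir / "experiments").iterdir()):
        runs.append({
            "parameters": json.loads((run_dir / "parameters.json").read_text()),
            "metrics": json.loads((run_dir / "metrics.json").read_text()),
        })
    return runs


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Placeholder metric values; a real run would compute these
            futures = [
                pool.submit(log_run, root, {"n_estimators": n}, {"accuracy": 0.0})
                for n in (10, 50, 100)
            ]
            for future in futures:
                future.result()  # re-raise any error from a worker thread
        print(len(load_runs(root)))  # prints 3
```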
Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.
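The dashboard answers these questions interactively, but the same kind of filtering can be sketched in plain Python. Here each logged experiment is represented as a dict with `tags`, `parameters`, and `metrics` keys (a hypothetical shape, not Rubicon's data model), and the hypothetical `best_by_metric` keeps only experiments with a given tag and sorts them so the best-scoring parameter values come first.

```python
from typing import List


def best_by_metric(experiments: List[dict], tag: str, metric: str) -> List[dict]:
    """Keep experiments carrying `tag` that logged `metric`, best first."""
    tagged = [
        e for e in experiments
        if tag in e.get("tags", []) and metric in e.get("metrics", {})
    ]
    return sorted(tagged, key=lambda e: e["metrics"][metric], reverse=True)


if __name__ == "__main__":
    # Illustrative records only; these are not real results
    experiments = [
        {"tags": ["my_model_name"], "parameters": {"n_estimators": 10}, "metrics": {"accuracy": 0.81}},
        {"tags": ["my_model_name"], "parameters": {"n_estimators": 100}, "metrics": {"accuracy": 0.88}},
        {"tags": ["baseline"], "parameters": {"n_estimators": 50}, "metrics": {"accuracy": 0.90}},
    ]
    for e in best_by_metric(experiments, "my_model_name", "accuracy"):
        print(e["parameters"]["n_estimators"], e["metrics"]["accuracy"])
    # prints "100 0.88" then "10 0.81"; the "baseline" run is filtered out
```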
When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.
Use
Here's a simple example:
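```python
from rubicon import Rubicon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)
project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

# Train a small model so there is something to log (illustrative values)
n_estimators, n_features, random_state = 10, 4, 0
X, y = make_classification(n_features=n_features, random_state=random_state)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
rfc = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
rfc.fit(X_train, y_train)

# SklearnTrainingMetadata is not defined in this snippet; define or import it before running
experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

# Log the hyperparameters that produced this model
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

# Log the held-out accuracy as a metric
accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)
```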
Then explore the project by running the dashboard:
```shell
rubicon ui --root-dir /rubicon-root
```
Documentation
For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.
Install
`rubicon` is available on Conda Forge via conda and PyPI via pip.
```shell
conda config --add channels conda-forge
conda install rubicon-ml
```
or
```shell
pip install rubicon-ml
```
Develop
`rubicon` uses conda to manage environments. First, install conda. Then use conda to set up a development environment:
```shell
conda env create -f ci/environment.yml
conda activate rubicon-dev
```
Testing
The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via `pytest tests/unit` or `pytest tests/integration`, or all at once by simply running `pytest`.
Note: some integration tests are intentionally marked to control when they are run (i.e., not during CI/CD). These tests include:
- Integration tests that connect to physical filesystems (local, S3). You'll want to configure the `root_dir` appropriately for these tests (`tests/integration/test_async_rubicon.py`, `tests/integration/test_rubicon.py`). Run them with `pytest -m "physical_filesystem_test"`.
- Integration tests for the dashboard. To run these locally, you'll need to install one of the WebDrivers. To do so, follow the Install instructions in the Dash Testing Docs or install via Homebrew with `brew install --cask chromedriver` (formerly `brew cask install chromedriver`). You may have to update your permissions in Security & Privacy when installing with brew. Run them with `pytest -m "dashboard_test"`.
Note: The `--headless` flag can be added to run the dashboard tests in headless mode.
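Markers like `physical_filesystem_test` and `dashboard_test` are ordinary pytest markers, so a marker expression such as `pytest -m "not physical_filesystem_test and not dashboard_test"` runs everything except the marked tests. pytest warns about unknown markers unless they are registered; one standard way to register them is a `pytest_configure` hook in `conftest.py` that calls `config.addinivalue_line`. This is a sketch of that pytest mechanism only; the repository may register its markers differently (for example in `setup.cfg`), and the marker descriptions below are assumptions.

```python
# conftest.py (hypothetical sketch; the repository's own configuration may differ)


def pytest_configure(config):
    """Register the custom markers so pytest recognizes them in -m expressions."""
    config.addinivalue_line(
        "markers", "physical_filesystem_test: connects to a real local or S3 filesystem"
    )
    config.addinivalue_line(
        "markers", "dashboard_test: drives the Dash dashboard through a WebDriver"
    )
```

An individual test then opts in with a decorator such as `@pytest.mark.dashboard_test`.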
Code Formatting
Install and configure pre-commit to automatically run `black`, `flake8`, and `isort` during commits:
- install pre-commit
- run `pre-commit install` to set up the git hook scripts
Now `pre-commit` will run automatically on `git commit` and ensure consistent code formatting throughout the project. You can format without committing via `pre-commit run` or skip these checks with `git commit --no-verify`.
Contributors
- Mike McCarty
- Sri Ranganathan
- Joe Wolfe
- Ryan Soley
- Diane Lee