Maykon Schots

Last update: Nov 27, 2022

Related tags

Machine Learning mlops_project

Overview

MLOps

MLOps is the practice of applying continuous integration and continuous delivery to Machine Learning artifacts, treating them as a software product and keeping them inside a loop of Design, Model Development, and Operations.

In this paradigm, teams can easily collaborate on models, with clear tracking of the data throughout cleaning, processing, and feature creation. Automating every repetitive process avoids human error and reduces delivery time, so the team can stay focused on the Business Problem.

Some benefits:

Versioning data and code makes models auditable and reproducible.
Automated tests and builds ensure that artifacts work correctly and are available to the delivery pipelines.
An automated cycle makes deploying new models easier and faster.

The MLOps Project

The MLOps project is a path to learning how to implement a case study that is testable and reproducible within the CI/CD methodology, using the best programming practices.

The scope of this project is shown in the image below.

We will select the best tool for each step, integrate them, and build a Machine Learning Orchestrator. In the end, running and delivering new ML experiments will be as simple as typing a terminal command or clicking a button!

Prerequisites

For mlops_project to work correctly, you should first install the prerequisites.

Contributing

Have an idea for improving this project but don't know where to start? Try contributing.

You can learn how the project is organized here.

How to use?

If you are interested just in using this package, follow the steps below.

Clone the repository

Open a terminal (on Windows, make sure to use git bash), navigate to the desired destination folder, and clone the repository:
```
git clone https://github.com/Schots/mlops_project.git
```
The Makefile in the root folder defines a set of targets that automate repetitive processes in this project. Type "make" in the terminal to see the available targets.

Create an environment & Install requirements

Create a Python virtual environment for the MLOps project on your local machine. Use any tool you desire. Activate the environment and install the requirements using make:
```
make requirements
```
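As one option among many, the environment can be created with Python's built-in venv module before running make requirements. This is a minimal sketch, not the project's prescribed setup: the .venv directory name is an assumption, and on Windows git bash the activation script usually lives under .venv/Scripts instead of .venv/bin.

```shell
# Create a virtual environment in ./.venv (the directory name is an assumption).
python3 -m venv .venv

# Activate it for the current shell session.
# On Windows git bash, use: . .venv/Scripts/activate
. .venv/bin/activate

# Confirm that python now resolves to the interpreter inside the environment.
python -c 'import sys; print(sys.prefix)'
```

With the environment active, make requirements should install the packages into .venv rather than into the system Python.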
Download data

To download the raw dataset, use the get_data target:
```
make get_data
```
Type the dataset name when prompted. The zip file with the data will be downloaded and unzipped under the data/raw folder.
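To confirm that the download step left files under data/raw, a small check like the one below can help. It is only a sketch: check_raw_data is a hypothetical helper, not part of the project's Makefile, and it assumes nothing about the file names inside the archive.

```shell
# Hypothetical helper (not part of the project): report whether the raw data
# folder exists and contains at least one entry.
check_raw_data() {
  dir="${1:-data/raw}"   # folder to inspect; defaults to data/raw
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    # Count the entries, including hidden ones, and strip wc's padding.
    echo "ok: $dir contains $(ls -A "$dir" | wc -l | tr -d ' ') item(s)"
  else
    echo "missing: $dir is empty or absent" >&2
    return 1
  fi
}
```

After make get_data finishes, calling check_raw_data from the repository root should print an "ok" line. A non-zero exit status suggests that the download or the unzip step did not complete.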

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Comments

Bump black from 21.12b0 to 22.1.0
Bumps black from 21.12b0 to 22.1.0.

Release notes

Sourced from black's releases.

22.1.0

At long last, Black is no longer a beta product! This is the first non-beta release and the first release covered by our new stability policy.

Highlights

Remove Python 2 support (#2740)

Introduce the --preview flag (#2752)

Style

Deprecate --experimental-string-processing and move the functionality under --preview (#2789)

For stubs, one blank line between class attributes and methods is now kept if there's at least one pre-existing blank line (#2736)

Black now normalizes string prefix order (#2297)

Remove spaces around power operators if both operands are simple (#2726)

Work around bug that causes unstable formatting in some cases in the presence of the magic trailing comma (#2807)

Use parentheses for attribute access on decimal float and int literals (#2799)

Don't add whitespace for attribute access on hexadecimal, binary, octal, and complex literals (#2799)

Treat blank lines in stubs the same inside top-level if statements (#2820)

Fix unstable formatting with semicolons and arithmetic expressions (#2817)

Fix unstable formatting around magic trailing comma (#2572)

Parser

Fix mapping cases that contain as-expressions, like case {"key": 1 | 2 as password} (#2686)

Fix cases that contain multiple top-level as-expressions, like case 1 as a, 2 as b (#2716)

Fix call patterns that contain as-expressions with keyword arguments, like case Foo(bar=baz as quux) (#2749)

Tuple unpacking on return and yield constructs now implies 3.8+ (#2700)

Unparenthesized tuples on annotated assignments (e.g values: Tuple[int, ...] = 1, 2, 3) now implies 3.8+ (#2708)

Fix handling of standalone match() or case() when there is a trailing newline or a comment inside of the parentheses. (#2760)

from __future__ import annotations statement now implies Python 3.7+ (#2690)

Performance

Speed-up the new backtracking parser about 4X in general (enabled when --target-version is set to 3.10 and higher). (#2728)

Black is now compiled with mypyc for an overall 2x speed-up. 64-bit Windows, MacOS, and Linux (not including musl) are supported. (#1009, #2431)

Configuration

Do not accept bare carriage return line endings in pyproject.toml (#2408)

Add configuration option (python-cell-magics) to format cells with custom magics in Jupyter Notebooks (#2744)

Allow setting custom cache directory on all platforms with environment variable BLACK_CACHE_DIR (#2739).

Enable Python 3.10+ by default, without any extra need to specify --target-version=py310. (#2758)

Make passing SRC or --code mandatory and mutually exclusive (#2804)

Output

Improve error message for invalid regular expression (#2678)

Improve error message when parsing fails during AST safety check by embedding the underlying SyntaxError (#2693)

No longer color diff headers white as it's unreadable in light themed terminals (#2691)

Text coloring added in the final statistics (#2712)

Verbose mode also now describes how a project root was discovered and which paths will be formatted. (#2526)

Packaging

All upper version bounds on dependencies have been removed (#2718)

typing-extensions is no longer a required dependency in Python 3.10+ (#2772)

Set click lower bound to 8.0.0 as Black crashes on 7.1.2 (#2791)

... (truncated)

Commits

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies safe to test
opened by dependabot[bot] 2
build(deps-dev): bump notebook from 6.4.10 to 6.5.1
Bumps notebook from 6.4.10 to 6.5.1.

dependencies
opened by dependabot[bot] 1
build(deps): bump matplotlib from 3.5.1 to 3.6.0
Bumps matplotlib from 3.5.1 to 3.6.0.

Release notes

Sourced from matplotlib's releases.

REL: v3.6.0

Highlights of this release include:

Figure and Axes creation / management

subplots, subplot_mosaic accept height_ratios and width_ratios arguments

Constrained layout is no longer considered experimental

New layout_engine module

Compressed layout added for fixed-aspect ratio Axes

Layout engines may now be removed

Axes.inset_axes flexibility

WebP is now a supported output format

Garbage collection is no longer run on figure close

Plotting methods

Striped lines (experimental)

Custom cap widths in box and whisker plots in bxp and boxplot

Easier labelling of bars in bar plot

New style format string for colorbar ticks

Linestyles for negative contours may be set individually

Improved quad contour calculations via ContourPy

errorbar supports markerfacecoloralt

streamplot can disable streamline breaks

New axis scale asinh (experimental)

stairs(..., fill=True) hides patch edge by setting linewidth

Fix the dash offset of the Patch class

Rectangle patch rotation point

Colors and colormaps

Color sequence registry

Colormap method for creating a different lookup table size

Setting norms with strings

Titles, ticks, and labels

plt.xticks and plt.yticks support minor keyword argument

Legends

Legend can control alignment of title and handles

ncol keyword argument to legend renamed to ncols

Markers

marker can now be set to the string "none"

Customization of MarkerStyle join and cap style

Fonts and Text

Font fallback

List of available font names

math_to_image now has a color keyword argument

Active URL area rotates with link text

rcParams improvements

Allow setting figure label size and weight globally and separately from title

Mathtext parsing can be disabled globally

Double-quoted strings in matplotlibrc

3D Axes improvements

Standardized views for primary plane viewing angles

Custom focal length for 3D camera

3D plots gained a 3rd "roll" viewing angle

... (truncated)

Commits

a302267 REL: v3.6.0

5b20854 DOC: Fix a typo in the what's new

e676dfb DOC: Update version switcher

9e6e8d3 Update security policy

d011a7d Pin to 3.6 version of mpl-sphinx-theme

56a5db2 DOC: Update GitHub stats for 3.6.0

8490024 Merge branch 'v3.5.x' into HEAD

e3a8c45 Merge branch 'v3.5.3-doc' into v3.5.x

df42410 Merge pull request #23814 from QuLogic/relnotes36

e467be4 Remove redundant behaviour changes in 3.6

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps): bump xgboost from 1.5.2 to 1.6.2
Bumps xgboost from 1.5.2 to 1.6.2.

Release notes

Sourced from xgboost's releases.

1.6.1 Patch Release

v1.6.1 (2022 May 9)

This is a patch release for bug fixes and Spark barrier mode support. The R package is unchanged.

Experimental support for categorical data

Fix segfault when the number of samples is smaller than the number of categories. (dmlc/xgboost#7853)

Enable partition-based split for all model types. (dmlc/xgboost#7857)

JVM packages

We replaced the old parallelism tracker with spark barrier mode to improve the robustness of the JVM package and fix the GPU training pipeline.

Fix GPU training pipeline quantile synchronization. (#7823, #7834)

Use barrier model in spark package. (dmlc/xgboost#7836, dmlc/xgboost#7840, dmlc/xgboost#7845, dmlc/xgboost#7846)

Fix shared object loading on some platforms. (dmlc/xgboost#7844)

Artifacts

You can verify the downloaded packages by running this on your Unix shell:

echo "<hash> <artifact>" | shasum -a 256 --check

2633f15e7be402bad0660d270e0b9a84ad6fcfd1c690a5d454efd6d55b4e395b ./xgboost.tar.gz

Release 1.6.0 stable

v1.6.0 (2022 Apr 16)

After a long period of development, XGBoost v1.6.0 is packed with many new features and improvements. We summarize them in the following sections starting with an introduction to some major new features, then moving on to language binding specific changes including new features and notable bug fixes for that binding.

Development of categorical data support

This version of XGBoost features new improvements and full coverage of experimental categorical data support in Python and C package with tree model. Both hist, approx and gpu_hist now support training with categorical data. Also, partition-based categorical split is introduced in this release. This split type is first available in LightGBM in the context of gradient boosting. The previous XGBoost release supported one-hot split where the splitting criteria is of form x \in {c}, i.e. the categorical feature x is tested against a single candidate. The new release allows for more expressive conditions: x \in S where the categorical feature x is tested against multiple candidates. Moreover, it is now possible to use any tree algorithms (hist, approx, gpu_hist) when creating categorical splits. For more information, please see our tutorial on categorical data, along with examples linked on that page. (#7380, #7708, #7695, #7330, #7307, #7322, #7705, #7652, #7592, #7666, #7576, #7569, #7529, #7575, #7393, #7465, #7385, #7371, #7745, #7810)

In the future, we will continue to improve categorical data support with new features and optimizations. Also, we are looking forward to bringing the feature beyond Python binding, contributions and feedback are welcomed! Lastly, as a result of experimental status, the behavior might be subject to change, especially the default value of related

... (truncated)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in xgboost library in reverse chronological order.

v1.6.1 (2022 May 9)

This is a patch release for bug fixes and Spark barrier mode support. The R package is unchanged.

Experimental support for categorical data

Fix segfault when the number of samples is smaller than the number of categories. (dmlc/xgboost#7853)

Enable partition-based split for all model types. (dmlc/xgboost#7857)

JVM packages

We replaced the old parallelism tracker with spark barrier mode to improve the robustness of the JVM package and fix the GPU training pipeline.

Fix GPU training pipeline quantile synchronization. (#7823, #7834)

Use barrier model in spark package. (dmlc/xgboost#7836, dmlc/xgboost#7840, dmlc/xgboost#7845, dmlc/xgboost#7846)

Fix shared object loading on some platforms. (dmlc/xgboost#7844)

v1.6.0 (2022 Apr 16)

After a long period of development, XGBoost v1.6.0 is packed with many new features and improvements. We summarize them in the following sections starting with an introduction to some major new features, then moving on to language binding specific changes including new features and notable bug fixes for that binding.

Development of categorical data support

This version of XGBoost features new improvements and full coverage of experimental categorical data support in Python and C package with tree model. Both hist, approx and gpu_hist now support training with categorical data. Also, partition-based categorical split is introduced in this release. This split type is first available in LightGBM in the context of gradient boosting. The previous XGBoost release supported one-hot split where the splitting criteria is of form x \in {c}, i.e. the categorical feature x is tested against a single candidate. The new release allows for more expressive conditions: x \in S where the categorical feature x is tested against multiple candidates. Moreover, it is now possible to use any tree algorithms (hist, approx, gpu_hist) when creating categorical splits. For more information, please see our tutorial on categorical data, along with examples linked on that page. (#7380, #7708, #7695, #7330, #7307, #7322, #7705, #7652, #7592, #7666, #7576, #7569, #7529, #7575, #7393, #7465, #7385, #7371, #7745, #7810)

In the future, we will continue to improve categorical data support with new features and optimizations. Also, we are looking forward to bringing the feature beyond Python binding, contributions and feedback are welcomed! Lastly, as a result of experimental status, the behavior might be subject to change, especially the default value of related hyper-parameters.

Experimental support for multi-output model

XGBoost 1.6 features initial support for the multi-output model, which includes multi-output regression and multi-label classification. Along with this, the XGBoost classifier has proper support for base margin without to need for the user to flatten the input. In this initial support, XGBoost builds one model for each target similar to the sklearn meta estimator, for more details, please see our quick introduction.

... (truncated)

Commits

See full diff in compare view

dependencies
opened by dependabot[bot] 1
build(deps): bump matplotlib from 3.5.1 to 3.5.3
Bumps matplotlib from 3.5.1 to 3.5.3.

Release notes

Sourced from matplotlib's releases.

REL: v3.5.3

This is the third bugfix release of the 3.5.x series.

This release contains several bug-fixes and adjustments:

Fix alignment of over/under symbols

Fix bugs in colorbars:

alpha of extensions

drawedges=True with extensions

handling of panchor=False

Fix builds on Cygwin and IBM i

Fix contour labels in SubFigures

Fix cursor output:

for imshow with all negative values

when using BoundaryNorm

Fix interactivity in IPython/Jupyter

Fix NaN handling in errorbar

Fix NumPy conversion from AstroPy unit arrays

Fix positional markerfmt passed to stem

Fix unpickling:

crash loading in a separate process

incorrect DPI when HiDPI screens

REL: v3.5.2

This is the second bugfix release of the 3.5.x series.

This release contains several bug-fixes and adjustments:

Add support for Windows on ARM (source-only; no wheels provided yet)

Add year to concise date formatter when displaying less than 12 months

Disable QuadMesh mouse cursor to avoid severe performance regression in pcolormesh

Delay backend selection to allow choosing one in more cases

Fix automatic layout bugs in EPS output

Fix autoscaling of scatter plots

Fix clearing of subfigures

Fix colorbar exponents, inversion of extensions, and use on inset axes

Fix compatibility with various NumPy-like classes (e.g., Pandas, xarray, etc.)

Fix constrained layout bugs with mixed subgrids

Fix errorbar with dashes

Fix errors in conversion to GTK4 and Qt6

Fix figure options accidentally re-ordering data

Fix keyboard focus of TkAgg backend

Fix manual selection of contour labels

Fix path effects on text with whitespace

Fix quiver in subfigures

Fix RangeSlider.set_val displaying incorrectly

Fix regressions in collection data limits

Fix stairs with no edgecolor

Fix some leaks in Tk backends

Fix tight layout DPI confusion

... (truncated)

Commits

d04c8de REL: v3.5.3

318cacc DOC: Update release notes for 3.5.3

f4d4b47 Merge branch 'v3.5.2-doc' into v3.5.x

071413e DOC: Update GitHub stats for 3.5.3

0428306 Merge pull request #23591 from meeseeksmachine/auto-backport-of-pr-23549-on-v...

2f3abfb Merge pull request #23593 from QuLogic/fix-flake8

530457e STY: Fix whitespace error from new flake8

ab78318 Backport PR #23549: Don't clip colorbar dividers

952227e Merge pull request #23528 from meeseeksmachine/auto-backport-of-pr-23523-on-v...

632e4d7 Backport PR #23523: TST: Update Quantity test class

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump dvc from 2.9.5 to 2.12.1
Bumps dvc from 2.9.5 to 2.12.1.

Release notes

Sourced from dvc's releases.

2.12.1 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

🚀 New Features and Enhancements

api: params_show: Raise exception if no params found. (#7938) @daavoo

parsing: Support dict unpacking in cmd. (#7907) @daavoo

Help text for dvc update (#7958) @dberenbaum

Initial support for flexible plots (#7477) @pared

🐛 Bug Fixes

dvc: normalize targets before entering brancher (#7966) @efiop

🔨 Maintenance

deps: bump dvc-data to 0.0.19 (#7979) @efiop

build(deps): Bump dvc-data from 0.0.16 to 0.0.18 (#7968) @dependabot

deps: remove setuptools_scm_git_archive (#7952) @skshetry

Thanks again to @alexmojaki, @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @efiop, @pared, @pre-commit-ci, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉

2.12.0 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-data to 0.0.16 (#7948) @skshetry

setup: bump dvc-data (#7935) @efiop

dvc: use dvc-data 0.0.8 (#7912) @efiop

readme: mention vs code extension (#7859) @dberenbaum

🚀 New Features and Enhancements

s3fs: Adds support for SSE client keys (#7671) @ap-kulkarni

api: Add params_show. (#7613) @daavoo

dvc plots: allow for setting output directory via config (#7911) @ykasimov

api: open: Raise ValueError if rev is used in wrong mode. (#7823) @daavoo

🐛 Bug Fixes

render: image_converter: Support slash in revision. (#7937) @daavoo

exp apply: preserve untracked files (#7910) @pmrowla

run-cache: http: restrict uploads/downloads (#7874) @skshetry

run-cache: fix push from Windows to remote filesystems (#7873) @skshetry

🔨 Maintenance

... (truncated)

Commits

bd93d85 deps: bump dvc-data

eed6a84 data_cloud: remove logger check

954de0c api: params_show: Raise exception if no params found.

9099413 parsing: Support dict unpacking in cmd.

c59f935 Switch uses of %r to %s with quotes to avoid escaped backslashes in logs, esp...

03b789e logger: use lazy formatting

cacc2f1 [pre-commit.ci] pre-commit autoupdate

0d71194 dvc: normalize targets before entering brancher

683d22c build(deps): Bump dvc-data from 0.0.16 to 0.0.18

c687d1b cli: help text for dvc update

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump dvc from 2.9.5 to 2.12.0
Bumps dvc from 2.9.5 to 2.12.0.

Release notes

Sourced from dvc's releases.

2.12.0 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-data to 0.0.16 (#7948) @skshetry

setup: bump dvc-data (#7935) @efiop

dvc: use dvc-data 0.0.8 (#7912) @efiop

readme: mention vs code extension (#7859) @dberenbaum

🚀 New Features and Enhancements

s3fs: Adds support for SSE client keys (#7671) @ap-kulkarni

api: Add params_show. (#7613) @daavoo

dvc plots: allow for setting output directory via config (#7911) @ykasimov

api: open: Raise ValueError if rev is used in wrong mode. (#7823) @daavoo

🐛 Bug Fixes

render: image_converter: Support slash in revision. (#7937) @daavoo

exp apply: preserve untracked files (#7910) @pmrowla

run-cache: http: restrict uploads/downloads (#7874) @skshetry

run-cache: fix push from Windows to remote filesystems (#7873) @skshetry

🔨 Maintenance

build(deps-dev): Bump pytest-mock from 3.7.0 to 3.8.1 (#7936) @dependabot

build(deps-dev): Bump pylint from 2.14.3 to 2.14.4 (#7950) @dependabot

build(deps): Bump styfle/cancel-workflow-action from 0.9.1 to 0.10.0 (#7941) @dependabot

config: remove core.jobs dead code (#7908) @dberenbaum

setup: bump dvc-data to 0.0.12 (#7928) @efiop

readme: more VS Code Extension info. (#7916) @jorgeorpinel

build(deps-dev): Bump pylint from 2.14.2 to 2.14.3 (#7921) @dependabot

setup: bump dvc-data to 0.0.10 (#7919) @efiop

readme: consolidate intro (#7906) @dberenbaum

build(deps-dev): Bump pylint from 2.14.1 to 2.14.2 (#7902) @dependabot

build(deps): Unpin networkx; quote node names for pydot output (#7899) @dependabot

deps: bump dvc-data to 0.0.6; fix imports (#7895) @skshetry

build(deps): Bump dvc-render from 0.0.5 to 0.0.6 (#7893) @dependabot

[pre-commit.ci] pre-commit autoupdate (#7892) @pre-commit-ci

build(deps-dev): Bump pylint from 2.13.9 to 2.14.1 (#7853) @dependabot

build(deps-dev): Bump mypy from 0.942 to 0.961 (#7852) @dependabot

build(deps-dev): Bump filelock from 3.7.0 to 3.7.1 (#7834) @dependabot

build(deps): Bump actions/setup-python from 3 to 4 (#7870) @dependabot

dvc: upgrade to dvc-data 0.0.5 (#7887) @efiop

readme: remove donate requests (#7862) @dberenbaum

Thanks again to @ap-kulkarni, @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @jorgeorpinel, @pmrowla, @pre-commit-ci, @pre-commit-ci[bot], @skshetry, @ykasimov and Yury for the contributions! 🎉

2.11.0 🦉

... (truncated)

Commits

c347d9f build(deps-dev): Bump pytest-mock from 3.7.0 to 3.8.1

efc4787 build(deps-dev): Bump pylint from 2.14.3 to 2.14.4

be3ec9d deps: bump dvc-data to 0.0.16

685a2d5 build(deps): Bump styfle/cancel-workflow-action from 0.9.1 to 0.10.0

c336507 render: image_converter: Support slash in revision.

d448eb5 setup: bump dvc-data

3dcd010 deps: bump dvc-data to 0.0.13

2773e99 config: remove core.jobs dead code

9a1a3a0 Merge pull request #7671 from ap-kulkarni/amey/6141

286fab4 setup: bump dvc-data to 0.0.12

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump dvc from 2.9.5 to 2.11.0
Bumps dvc from 2.9.5 to 2.11.0.

Release notes

Sourced from dvc's releases.

2.11.0 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

scm: fix clone (#7674) @dtrifiro

exp init: create output dirs (#7752) @dberenbaum

checkout: --relink show helpful message on completion (#7709) @ykasimov

setup: set upper bound on networkx (#7849) @efiop

docs: update the package installation guide (#7839) @huang06

setup: bump scmrepo to 0.0.24 (#7837) @efiop

bump scmrepo to 0.0.23 (#7831) @dtrifiro

Revert "dvc ls: not raise PathMissingError on empty dir." (#7728) @karajan1001

dvcfile: preserve 'remote' on add and commit (#7618) @SamKnightGit

plots: image converter return absolute paths (#7664) @pared

🚀 New Features and Enhancements

exp: output troubleshooting link on shallow merge failure (#7791) @pmrowla

exp: show: Include additional info in --json. (#7690) @daavoo

exp list: fix git_remote metavar (#7808) @dberenbaum

fs.callbacks: simplify, ensure None does not break them, lazy richcallbacks (#7722) @skshetry

exp init: fixes #7534; simplifies/updates exp init --live (#7703) @dberenbaum

run/repro/stage add: regroup options/flags (#7524) @jorgeorpinel

🏇 Optimizations

plots: grouping: stop using dpath.util.search (#7811) @pared

fs: path: use flavour.basename (#7764) @dtrifiro

dvc.data: save and try loading raw dir objects (#7597) @dtrifiro

repofs: only use dvcfs when --dvc-only is specified (#7659) @skshetry

exp: speed up repro execution with untracked directories in workspace (#7786) @dtrifiro

🐛 Bug Fixes

sshfs: bump min ver to 2022.6.0 (#7856) @pmrowla

brancher: use scm.root_dir to determine relative cwd (#7845) @efiop

plots: Pass templates_dir to match_renderers. (#7820) @daavoo

dag: mermaid: Use quotation marks to escape node name. (#7803) @daavoo

dvc.stage.cache: fix typo, was using src filesystem to transfer (#7739) @skshetry

Catch correct exception class in params.show() (#7750) @Suor

dag: fix dot file rendering order. (#7725) @tirkarthi

Fail on sync when there is no match for glob. (#7687) @tirkarthi

dvc ls: not raise PathMissingError on empty dir. (#7729) @karajan1001

dvc ls: not raise PathMissingError on empty dir. (#6120) @karajan1001

FileSystem: handle encoding via kwargs. (#7694) @daavoo

🔨 Maintenance

build(deps): Bump pre-commit/action from 2.0.3 to 3.0.0 (#7846) @dependabot

... (truncated)

Commits

c9a3cb8 sshfs: bump min ver to 2022.6.0

41ecd2e scm: fix clone

33b3afa exp init: create output dirs

e849162 exp: speed up repro execution with untracked directories in workspace

c14f963 checkout: --relink show helpful message on completion

88d3582 build(deps): Bump pre-commit/action from 2.0.3 to 3.0.0

8fa0e40 setup: set upper bound on networkx

a3d6b12 brancher: use scm.root_dir to determine relative cwd

20c7b0e plots: grouping: stop using dpath.util.search

6794dd2 docs: update package installation

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump notebook from 6.4.10 to 6.4.12
Bumps notebook from 6.4.10 to 6.4.12.

dependencies
opened by dependabot[bot] 1
build(deps): bump xgboost from 1.5.2 to 1.6.1
Bumps xgboost from 1.5.2 to 1.6.1.

Release notes

Sourced from xgboost's releases.

1.6.1 Patch Release

v1.6.1 (2022 May 9)

This is a patch release for bug fixes and Spark barrier mode support. The R package is unchanged.

Experimental support for categorical data

Fix segfault when the number of samples is smaller than the number of categories. (dmlc/xgboost#7853)

Enable partition-based split for all model types. (dmlc/xgboost#7857)

JVM packages

We replaced the old parallelism tracker with spark barrier mode to improve the robustness of the JVM package and fix the GPU training pipeline.

Fix GPU training pipeline quantile synchronization. (#7823, #7834)

Use barrier model in spark package. (dmlc/xgboost#7836, dmlc/xgboost#7840, dmlc/xgboost#7845, dmlc/xgboost#7846)

Fix shared object loading on some platforms. (dmlc/xgboost#7844)

Artifacts

You can verify the downloaded packages by running this on your Unix shell:

echo "<hash> <artifact>" | shasum -a 256 --check

2633f15e7be402bad0660d270e0b9a84ad6fcfd1c690a5d454efd6d55b4e395b ./xgboost.tar.gz
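Where `shasum` is not available, the same check can be sketched in Python with the standard library. This is only an illustration: the helper names `sha256_of_file` and `verify_artifact` are made up here, and the demo runs on a temporary file rather than the real `xgboost.tar.gz`.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_hex):
    """True when the file's digest equals the published hash."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file, since the real artifact is not bundled here.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo artifact contents")
    demo_path = tmp.name

demo_hash = hashlib.sha256(b"demo artifact contents").hexdigest()
matches = verify_artifact(demo_path, demo_hash)   # correct hash
mismatch = verify_artifact(demo_path, "0" * 64)   # deliberately wrong hash
os.remove(demo_path)
print(matches, mismatch)
```

For the real release, `verify_artifact("./xgboost.tar.gz", "<hash>")` would be called with the hash published above.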

Release 1.6.0 stable

v1.6.0 (2022 Apr 16)

After a long period of development, XGBoost v1.6.0 is packed with many new features and improvements. We summarize them in the following sections starting with an introduction to some major new features, then moving on to language binding specific changes including new features and notable bug fixes for that binding.

Development of categorical data support

This version of XGBoost features new improvements and full coverage of experimental categorical data support in the Python and C packages with tree models. The hist, approx, and gpu_hist tree methods now all support training with categorical data. Also, partition-based categorical split is introduced in this release. This split type was first available in LightGBM in the context of gradient boosting. The previous XGBoost release supported one-hot split, where the splitting criterion is of the form x \in {c}, i.e. the categorical feature x is tested against a single candidate. The new release allows for more expressive conditions: x \in S, where the categorical feature x is tested against multiple candidates. Moreover, it is now possible to use any of the tree algorithms (hist, approx, gpu_hist) when creating categorical splits. For more information, please see our tutorial on categorical data, along with examples linked on that page. (#7380, #7708, #7695, #7330, #7307, #7322, #7705, #7652, #7592, #7666, #7576, #7569, #7529, #7575, #7393, #7465, #7385, #7371, #7745, #7810)
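The difference between the two split conditions can be shown with a toy sketch in plain Python. It does not use XGBoost; the function names and the sample data are invented for illustration only.

```python
def one_hot_split(value, candidate):
    """One-hot split: x in {c}, a test against a single category."""
    return value == candidate


def partition_split(value, candidate_set):
    """Partition-based split: x in S, a test against a set of categories."""
    return value in candidate_set


colors = ["red", "green", "blue", "green", "yellow"]

# One-hot: only rows equal to "green" satisfy the condition.
one_hot_mask = [one_hot_split(c, "green") for c in colors]

# Partition: rows whose category falls anywhere in S satisfy it together.
S = {"green", "yellow"}
partition_mask = [partition_split(c, S) for c in colors]

print(one_hot_mask)
print(partition_mask)
```

A single partition-based split can group several categories at once, where one-hot splits would need one tree node per category.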

In the future, we will continue to improve categorical data support with new features and optimizations. Also, we are looking forward to bringing the feature beyond Python binding, contributions and feedback are welcomed! Lastly, as a result of experimental status, the behavior might be subject to change, especially the default value of related

... (truncated)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in xgboost library in reverse chronological order.

v1.6.0 (2022 Apr 16)

Development of categorical data support

In the future, we will continue to improve categorical data support with new features and optimizations. Also, we are looking forward to bringing the feature beyond Python binding, contributions and feedback are welcomed! Lastly, as a result of experimental status, the behavior might be subject to change, especially the default value of related hyper-parameters.

Experimental support for multi-output model

XGBoost 1.6 features initial support for the multi-output model, which includes multi-output regression and multi-label classification. Along with this, the XGBoost classifier has proper support for base margin without the need for the user to flatten the input. In this initial support, XGBoost builds one model for each target, similar to the sklearn meta estimator. For more details, please see our quick introduction.

(#7365, #7736, #7607, #7574, #7521, #7514, #7456, #7453, #7455, #7434, #7429, #7405, #7381)

External memory support

External memory support for both the approx and hist tree methods is considered feature complete in XGBoost 1.6. Building upon the iterator-based interface introduced in the previous version, both hist and approx now iterate over each batch of data during training and prediction. In previous versions, hist concatenated all the batches into an internal representation, which is removed in this version. As a result, users can expect higher scalability in terms of data size but might experience lower performance due to disk IO. (#7531, #7320, #7638, #7372)

Rewritten approx

... (truncated)

Commits

5d92a7d Bump release version to 1.6.1. (#7872)

c250881 [backport] Use maximum category in sketch. (#7853) (#7866)

b1b6246 [backport] Always use partition based categorical splits. (#7857) (#7865)

f4eb6b9 [backport] jvm-packages 1.6.1 (#7849)

f75c007 Make 1.6.0 release. (#7813)

816e788 [backport] #7808 #7810 (#7811)

3ee3b18 [doc] fix a typo in jvm/index.rst (#7806) [skip ci] (#7807)

ece4dc4 [backport] Backport jvm changes to 1.6. (#7803)

67298cc [backport] Backport JVM fixes and document update to 1.6 (#7792)

78d2312 [CI] Enable faulthandler to show details when 0xC0000005 error occurs (#7771)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump notebook from 6.4.10 to 6.4.11
Bumps notebook from 6.4.10 to 6.4.11.

dependencies
opened by dependabot[bot] 1
build(deps-dev): bump notebook from 6.4.10 to 6.5.2
Bumps notebook from 6.4.10 to 6.5.2.

dependencies
opened by dependabot[bot] 0
build(deps): bump xgboost from 1.5.2 to 1.7.0
Bumps xgboost from 1.5.2 to 1.7.0.

Release notes

Sourced from xgboost's releases.

Release 1.7.0 stable

v1.7.0 (2022 Oct 20)

We are excited to announce the feature-packed XGBoost 1.7 release. The release note will walk through some of the major new features first, then summarize other improvements and language-binding-specific changes.

PySpark

XGBoost 1.7 features initial support for PySpark integration. The new interface is adapted from the existing PySpark XGBoost interface developed by databricks with additional features like QuantileDMatrix and the rapidsai plugin (GPU pipeline) support. The new Spark XGBoost Python estimators not only benefit from PySpark ml facilities for powerful distributed computing but also enjoy the rest of the Python ecosystem. Users can define a custom objective, callbacks, and metrics in Python and use them with this interface on distributed clusters. The support is labeled as experimental with more features to come in future releases. For a brief introduction please visit the tutorial on XGBoost's document page. (#8355, #8344, #8335, #8284, #8271, #8283, #8250, #8231, #8219, #8245, #8217, #8200, #8173, #8172, #8145, #8117, #8131, #8088, #8082, #8085, #8066, #8068, #8067, #8020, #8385)

Due to its initial support status, the new interface has some limitations; categorical features and multi-output models are not yet supported.

Development of categorical data support

More progress on the experimental support for categorical features. In 1.7, XGBoost can handle missing values in categorical features and features a new parameter max_cat_threshold, which limits the number of categories that can be used in the split evaluation. The parameter is enabled when the partitioning algorithm is used and helps prevent over-fitting. Also, the sklearn interface can now accept the feature_types parameter to use data types other than dataframe for categorical features. (#8280, #7821, #8285, #8080, #7948, #7858, #7853, #8212, #7957, #7937, #7934)

Experimental support for federated learning and new communication collective

An exciting addition to XGBoost is the experimental federated learning support. Federated learning is implemented with a gRPC federated server that aggregates allreduce calls, and federated clients that train on local data and use existing tree methods (approx, hist, gpu_hist). Currently, this only supports horizontal federated learning (samples are split across participants, and each participant has all the features and labels). Future plans include vertical federated learning (features split across participants), and stronger privacy guarantees with homomorphic encryption and differential privacy. See Demo with NVFlare integration for example usage with nvflare.
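The horizontal and vertical layouts can be pictured with a small data-partitioning sketch in plain Python. It involves no XGBoost or gRPC code; the table, the column names, and both helper functions are hypothetical.

```python
rows = [
    {"id": 0, "age": 31, "income": 40, "label": 0},
    {"id": 1, "age": 45, "income": 85, "label": 1},
    {"id": 2, "age": 27, "income": 32, "label": 0},
    {"id": 3, "age": 52, "income": 90, "label": 1},
]


def horizontal_split(table, n_parties):
    """Each party gets a subset of the samples with every column."""
    return [table[i::n_parties] for i in range(n_parties)]


def vertical_split(table, column_groups):
    """Each party gets every sample but only its own group of columns."""
    return [
        [{key: row[key] for key in ("id", *group)} for row in table]
        for group in column_groups
    ]


horizontal = horizontal_split(rows, n_parties=2)
vertical = vertical_split(rows, column_groups=[("age",), ("income", "label")])

print([len(part) for part in horizontal])
print([sorted(part[0]) for part in vertical])
```

In the horizontal case every participant can compute gradients locally because it holds the labels, which is the setting the release notes describe as currently supported.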

As part of the work, XGBoost 1.7 has replaced the old rabit module with the new collective module as the network communication interface with added support for runtime backend selection. In previous versions, the backend was defined at compile time and could not be changed once built. In this new release, users can choose between rabit and federated. (#8029, #8351, #8350, #8342, #8340, #8325, #8279, #8181, #8027, #7958, #7831, #7879, #8257, #8316, #8242, #8057, #8203, #8038, #7965, #7930, #7911)

The feature is available in the public PyPI binary package for testing.

Quantile DMatrix

Before 1.7, XGBoost had an internal data structure called DeviceQuantileDMatrix (and its distributed version). We have now extended its support to CPU and renamed it QuantileDMatrix. This data structure is used for optimizing memory usage for the hist and gpu_hist tree methods. The new feature helps reduce CPU memory usage significantly, especially for dense data. The new QuantileDMatrix can be initialized from both CPU and GPU data, and regardless of where the data comes from, the constructed instance can be used by both the CPU algorithm and the GPU algorithm, including training and prediction (with some conversion overhead if the device of the data and the training algorithm don't match). Also, a new parameter ref is added to QuantileDMatrix, which can be used to construct validation/test datasets. Lastly, it's set as the default in the scikit-learn interface when a supported tree method is specified by users. (#7889, #7923, #8136, #8215, #8284, #8268, #8220, #8346, #8327, #8130, #8116, #8103, #8094, #8086, #7898, #8060, #8019, #8045, #7901, #7912, #7922)
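The general idea behind a quantized matrix can be sketched with the standard library: feature values are replaced by small bin indices derived from quantile cut points, and a validation set reuses the training cuts. This is a conceptual illustration, not XGBoost's implementation, and `make_cuts` and `quantize` are hypothetical helpers.

```python
import bisect
import statistics


def make_cuts(values, n_bins):
    """Compute n_bins - 1 quantile cut points from the data."""
    return statistics.quantiles(values, n=n_bins)


def quantize(values, cuts):
    """Replace each float with the integer index of the bin it falls into."""
    return [bisect.bisect_right(cuts, v) for v in values]


train = [0.1, 0.4, 0.5, 1.2, 2.5, 3.1, 4.8, 7.0]
valid = [0.3, 2.0, 9.9]

cuts = make_cuts(train, n_bins=4)     # three cut points from the training data
train_bins = quantize(train, cuts)
valid_bins = quantize(valid, cuts)    # validation data reuses the training cuts

print(cuts)
print(train_bins, valid_bins)
```

Storing a bin index instead of a float is where the memory saving comes from, and sharing the cuts keeps training and validation data on the same binning.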

Mean absolute error

The mean absolute error is a new member of the collection of objectives in XGBoost. It's noteworthy since MAE has a zero Hessian value, which is unusual for XGBoost as XGBoost relies on Newton optimization. Without valid Hessian values, the convergence speed can be slow. As part of the support for MAE, we added line searches into the XGBoost training algorithm to overcome the difficulty of training without valid Hessian values. In the future, we will extend the line search to other objectives where it's appropriate for faster convergence speed. (#8343, #8107, #7812, #8380)
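A short sketch shows why a zero Hessian is a problem for a Newton-style update. It assumes the usual leaf-value form -G / (H + lambda) with lambda set to 0 for clarity. The median step at the end only illustrates that another way of choosing the step exists; it is not a description of how XGBoost's line search is implemented.

```python
import statistics


def squared_error_grad_hess(pred, label):
    """0.5 * (pred - label)^2: the gradient is the residual, the Hessian is 1."""
    return pred - label, 1.0


def mae_grad_hess(pred, label):
    """|pred - label|: the gradient is the sign of the residual, the Hessian is 0."""
    diff = pred - label
    return float((diff > 0) - (diff < 0)), 0.0


def newton_leaf_value(grads, hessians, reg_lambda=0.0):
    """Newton-style step -G / (H + lambda); None when the denominator is zero."""
    denom = sum(hessians) + reg_lambda
    return None if denom == 0 else -sum(grads) / denom


labels = [1.0, 2.0, 10.0]
preds = [0.0, 0.0, 0.0]

se = [squared_error_grad_hess(p, y) for p, y in zip(preds, labels)]
mae = [mae_grad_hess(p, y) for p, y in zip(preds, labels)]

se_step = newton_leaf_value([g for g, _ in se], [h for _, h in se])
mae_step = newton_leaf_value([g for g, _ in mae], [h for _, h in mae])

# A constant that minimises the summed absolute error is the median residual.
median_step = statistics.median(y - p for p, y in zip(preds, labels))

print(se_step, mae_step, median_step)
```

For squared error the Newton step lands on the mean of the labels in one move, while for MAE the step is undefined without regularisation, which is why the step has to be found another way.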

XGBoost on Browser

With the help of the pyodide project, you can now run XGBoost on browsers. (#7954, #8369)

Experimental IPv6 Support for Dask

With the growing adoption of the new internet protocol, XGBoost joined the club. In the latest release, the Dask interface can be used on IPv6 clusters; see XGBoost's Dask tutorial for details. (#8225, #8234)

Optimizations

We have new optimizations for both the hist and gpu_hist tree methods to make XGBoost's training even more efficient.

Hist: Hist now supports optional by-column histogram build, which is automatically configured based on various conditions of input data. This helps the XGBoost CPU hist algorithm to scale better with different shapes of training datasets. (#8233, #8259) Also, the build histogram kernel can now better utilize CPU registers. (#8218)

GPU Hist: GPU hist performance is significantly improved for wide datasets. GPU hist now supports batched node build, which reduces kernel latency and increases throughput. The improvement is particularly significant when growing deep trees with the default depthwise policy. (#7919, #8073, #8051, #8118, #7867, #7964, #8026)

Breaking Changes

Breaking changes made in the 1.7 release are summarized below.

The grow_local_histmaker updater is removed. This updater is rarely used in practice and has no test. We decided to remove it and have XGBoost focus on other more efficient algorithms. (#7992, #8091)

Single precision histogram is removed due to its lack of accuracy caused by significant floating point error. In some cases the error can be difficult to detect due to log-scale operations, which makes the parameter dangerous to use. (#7892, #7828)

Deprecated CUDA architectures are no longer supported in the release binaries. (#7774)

... (truncated)

Changelog

Sourced from xgboost's changelog.

v1.7.0 (2022 Oct 20)

We are excited to announce the feature packed XGBoost 1.7 release. The release note will walk through some of the major new features first, then make a summary for other improvements and language-binding-specific changes.

PySpark

XGBoost 1.7 features initial support for PySpark integration. The new interface is adapted from the existing PySpark XGBoost interface developed by databricks with additional features like QuantileDMatrix and the rapidsai plugin (GPU pipeline) support. The new Spark XGBoost Python estimators not only benefit from PySpark ml facilities for powerful distributed computing but also enjoy the rest of the Python ecosystem. Users can define a custom objective, callbacks, and metrics in Python and use them with this interface on distributed clusters. The support is labeled as experimental with more features to come in future releases. For a brief introduction please visit the tutorial on XGBoost's document page. (#8355, #8344, #8335, #8284, #8271, #8283, #8250, #8231, #8219, #8245, #8217, #8200, #8173, #8172, #8145, #8117, #8131, #8088, #8082, #8085, #8066, #8068, #8067, #8020, #8385)

Due to its initial support status, the new interface has some limitations; categorical features and multi-output models are not yet supported.

Development of categorical data support

More progress on the experimental support for categorical features. In 1.7, XGBoost can handle missing values in categorical features and features a new parameter max_cat_threshold, which limits the number of categories that can be used in the split evaluation. The parameter is enabled when the partitioning algorithm is used and helps prevent over-fitting. Also, the sklearn interface can now accept the feature_types parameter to use data types other than dataframe for categorical features. (#8280, #7821, #8285, #8080, #7948, #7858, #7853, #8212, #7957, #7937, #7934)

Experimental support for federated learning and new communication collective

An exciting addition to XGBoost is the experimental federated learning support. The federated learning is implemented with a gRPC federated server that aggregates allreduce calls, and federated clients that train on local data and use existing tree methods (approx, hist, gpu_hist). Currently, this only supports horizontal federated learning (samples are split across participants, and each participant has all the features and labels). Future plans include vertical federated learning (features split across participants), and stronger privacy guarantees with homomorphic encryption and differential privacy. See Demo with NVFlare integration for example usage with nvflare.

As part of the work, XGBoost 1.7 has replaced the old rabit module with the new collective module as the network communication interface with added support for runtime backend selection. In previous versions, the backend is defined at compile time and can not be changed once built. In this new release, users can choose between rabit and federated. (#8029, #8351, #8350, #8342, #8340, #8325, #8279, #8181, #8027, #7958, #7831, #7879, #8257, #8316, #8242, #8057, #8203, #8038, #7965, #7930, #7911)

The feature is available in the public PyPI binary package for testing.

Quantile DMatrix

Before 1.7, XGBoost has an internal data structure called DeviceQuantileDMatrix (and its distributed version). We now extend its support to CPU and renamed it to QuantileDMatrix. This data structure is used for optimizing memory usage for the hist and gpu_hist tree methods. The new feature helps reduce CPU memory usage significantly, especially for dense data. The new QuantileDMatrix can be initialized from both CPU and GPU data, and regardless of where the data comes from, the constructed instance can be used by both the CPU algorithm and GPU algorithm including training and prediction (with some overhead of conversion if the device of data and training algorithm doesn't match). Also, a new parameter ref is added to QuantileDMatrix, which can be used to construct validation/test datasets. Lastly, it's set as default in the scikit-learn interface when a supported tree method is specified by users. (#7889, #7923, #8136, #8215, #8284, #8268, #8220, #8346, #8327, #8130, #8116, #8103, #8094, #8086, #7898, #8060, #8019, #8045, #7901, #7912, #7922)

Mean absolute error

The mean absolute error is a new member of the collection of objectives in XGBoost. It's noteworthy since MAE has zero hessian value, which is unusual to XGBoost as XGBoost relies on Newton optimization. Without valid Hessian values, the convergence speed can be slow. As part of the support for MAE, we added line searches into the XGBoost training algorithm to overcome the difficulty of training without valid Hessian values. In the future, we will extend the line search to other objectives where it's appropriate for faster convergence speed. (#8343, #8107, #7812, #8380)

XGBoost on Browser

With the help of the pyodide project, you can now run XGBoost on browsers. (#7954, #8369)

Experimental IPv6 Support for Dask

With the growing adaption of the new internet protocol, XGBoost joined the club. In the latest release, the Dask interface can be used on IPv6 clusters, see XGBoost's Dask tutorial for details. (#8225, #8234)

Optimizations

We have new optimizations for both the hist and gpu_hist tree methods to make XGBoost's training even more efficient.

Hist: Hist now supports optional by-column histogram build, which is automatically configured based on various conditions of input data. This helps the XGBoost CPU hist algorithm to scale better with different shapes of training datasets. (#8233, #8259) Also, the build histogram kernel can now better utilize CPU registers. (#8218)

GPU Hist: GPU hist performance is significantly improved for wide datasets. GPU hist now supports batched node build, which reduces kernel latency and increases throughput. The improvement is particularly significant when growing deep trees with the default depthwise policy. (#7919, #8073, #8051, #8118, #7867, #7964, #8026)

Breaking Changes

Breaking changes made in the 1.7 release are summarized below.

The grow_local_histmaker updater is removed. This updater is rarely used in practice and has no tests. We decided to remove it and have XGBoost focus on other, more efficient algorithms. (#7992, #8091)

Single precision histogram is removed due to its lack of accuracy caused by significant floating point error. In some cases the error can be difficult to detect due to log-scale operations, which makes the parameter dangerous to use. (#7892, #7828)

Deprecated CUDA architectures are no longer supported in the release binaries. (#7774)

As part of the federated learning development, the rabit module is replaced with the new collective module. It's a drop-in replacement with added runtime backend selection; see the federated learning section for more details. (#8257)

... (truncated)

Commits

4bc59ef Release 1.7

e43cd60 [backport] Type fix for WebAssembly. (#8369) (#8394)

3f92970 [backport] Fix CUDA async stream. (#8380) (#8392)

e17f701 [backport][doc] Cleanup outdated documents for GPU. [skip ci] (#8378) (#8393)

aa30ce1 [backport][pyspark] Improve tutorial on enabling GPU support. (#8385) [skip c...

153d995 Fix building XGBoost with libomp 15 (#8384) (#8387)

463313d Remove cleanup script in R package. (#8370)

7cf58a2 Make 1.7.0rc1. (#8365)

28a466a Fixes for R checks. (#8330)

5bd849f Unify the partitioner for hist and approx.

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies
opened by dependabot[bot] 0
build(deps): bump matplotlib from 3.5.1 to 3.6.1
Bumps matplotlib from 3.5.1 to 3.6.1.

Release notes

Sourced from matplotlib's releases.

REL: v3.6.1

This is the first bugfix release of the 3.6.x series.

This release contains several bug-fixes and adjustments:

A warning is no longer raised when constrained layout is explicitly disabled and tight layout is applied

Add missing get_cmap method to ColormapRegistry

Adding a colorbar on a ScalarMappable that is not attached to an Axes is now deprecated instead of raising a hard error

Fix barplot being empty when first element is NaN

Fix FigureManager.resize on GTK4

Fix fill_between compatibility with NumPy 1.24 development version

Fix hexbin with empty arrays and log scaling

Fix resize_event deprecation warnings when creating figure on macOS

Fix build in mingw

Fix compatibility with PyCharm's interagg backend

Fix crash on empty Text in PostScript backend

Fix generic font families in SVG exports

Fix horizontal colorbars with hatches

Fix misplaced mathtext using eqnarray

stackplot no longer changes the Axes cycler

REL: v3.6.0

Highlights of this release include:

Figure and Axes creation / management

subplots, subplot_mosaic accept height_ratios and width_ratios arguments

Constrained layout is no longer considered experimental

New layout_engine module

Compressed layout added for fixed-aspect ratio Axes

Layout engines may now be removed

Axes.inset_axes flexibility

WebP is now a supported output format

Garbage collection is no longer run on figure close

Plotting methods

Striped lines (experimental)

Custom cap widths in box and whisker plots in bxp and boxplot

Easier labelling of bars in bar plot

New style format string for colorbar ticks

Linestyles for negative contours may be set individually

Improved quad contour calculations via ContourPy

errorbar supports markerfacecoloralt

streamplot can disable streamline breaks

New axis scale asinh (experimental)

stairs(..., fill=True) hides patch edge by setting linewidth

Fix the dash offset of the Patch class

Rectangle patch rotation point

... (truncated)

Commits

318b234 REL: v3.6.1

b92cccc Update release notes for 3.6.1

746f3ce DOC: Update GitHub stats for 3.6.1

251e3ca Merge branch 'v3.6.0-doc' into v3.6.x

4627a5e Merge pull request #24124 from meeseeksmachine/auto-backport-of-pr-24111-on-v...

3863297 Backport PR #24111: FIX: add missing method to ColormapRegistry

78bcd91 Merge pull request #24117 from meeseeksmachine/auto-backport-of-pr-24113-on-v...

d336a67 Merge pull request #24116 from meeseeksmachine/auto-backport-of-pr-24115-on-v...

305a146 Backport PR #24113: Add exception class to pytest.warns calls

0c248ac Backport PR #24115: Fix mask lookup in fill_between for NumPy 1.24+

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
build(deps-dev): bump dvc from 2.9.5 to 2.13.0
Bumps dvc from 2.9.5 to 2.13.0.

Release notes

Sourced from dvc's releases.

2.13.0 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-data, install dvc-data cli deps in dev mode (#7994) @skshetry

deps: bump dvc-data to 0.0.20 (#7992) @efiop

checkout: use dvcignore (#7963) @alexmojaki

🚀 New Features and Enhancements

metrics: support TOML files (#7965) @alexmojaki

🔨 Maintenance

build(deps-dev): Bump pytest-mock from 3.8.1 to 3.8.2 (#7982) @dependabot

Thanks again to @alexmojaki, @dependabot, @dependabot[bot], @efiop and @skshetry for the contributions! 🎉

2.12.1 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

🚀 New Features and Enhancements

api: params_show: Raise exception if no params found. (#7938) @daavoo

parsing: Support dict unpacking in cmd. (#7907) @daavoo

Help text for dvc update (#7958) @dberenbaum

Initial support for flexible plots (#7477) @pared

🐛 Bug Fixes

dvc: normalize targets before entering brancher (#7966) @efiop

🔨 Maintenance

deps: bump dvc-data to 0.0.19 (#7979) @efiop

build(deps): Bump dvc-data from 0.0.16 to 0.0.18 (#7968) @dependabot

deps: remove setuptools_scm_git_archive (#7952) @skshetry

Thanks again to @alexmojaki, @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @efiop, @pared, @pre-commit-ci, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉

2.12.0 🦉

Refer to https://dvc.org/doc/install for installation instructions.

Changes

deps: bump dvc-data to 0.0.16 (#7948) @skshetry

setup: bump dvc-data (#7935) @efiop

... (truncated)

Commits

d8be684 deps: bump dvc-data to 0.0.23

f6adc49 deps: bump dvc-data to 0.0.22

a4c0ae9 deps: bump dvc-data, install dvc-data cli deps in dev mode

8b3020a deps: bump dvc-data to 0.0.20

182a22b checkout: use dvcignore

892b235 metrics: support TOML files

da93e9a build(deps-dev): Bump pytest-mock from 3.8.1 to 3.8.2

bd93d85 deps: bump dvc-data

eed6a84 data_cloud: remove logger check

954de0c api: params_show: Raise exception if no params found.

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

build(deps): bump numpy from 1.21.5 to 1.21.6

Bumps numpy from 1.21.5 to 1.21.6.

Release notes

Sourced from numpy's releases.

v1.21.6

NumPy 1.21.6 Release Notes

NumPy 1.21.6 is a very small release that achieves two things:

Backs out the mistaken backport of C++ code into 1.21.5.
Provides a 32 bit Windows wheel for Python 3.10.

The provision of the 32 bit wheel is intended to make life easier for oldest-supported-numpy.

Checksums

MD5

5a3e5d7298056bcfbc3246597af474d4  numpy-1.21.6-cp310-cp310-macosx_10_9_universal2.whl
d981d2859842e7b62dc93e24808c7bac  numpy-1.21.6-cp310-cp310-macosx_10_9_x86_64.whl
171313893c26529404d09fadb3537ed3  numpy-1.21.6-cp310-cp310-macosx_11_0_arm64.whl
5a7a6dfdd43069f9b29d3fe6b7f3a2ce  numpy-1.21.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
a9e25375a72725c5d74442eda53af405  numpy-1.21.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
6f9a782477380b2cdb7606f6f7634c00  numpy-1.21.6-cp310-cp310-win32.whl
32a73a348864700a3fa510d2fc4350b7  numpy-1.21.6-cp310-cp310-win_amd64.whl
0db8941ebeb0a02cd839d9cd3c5c20bb  numpy-1.21.6-cp37-cp37m-macosx_10_9_x86_64.whl
67882155be9592850861f4ad8ba36623  numpy-1.21.6-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl
c70e30e1ff9ab49f898c19e7a6492ae6  numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
e32dbd291032c7554a742f1bb9b2f7a3  numpy-1.21.6-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
689bf804c2cd16cb241fd943e3833ffd  numpy-1.21.6-cp37-cp37m-win32.whl
0062a7b0231a07cb5b9f3d7c495e6fe4  numpy-1.21.6-cp37-cp37m-win_amd64.whl
0d08809980ab497659e7aa0df9ce120e  numpy-1.21.6-cp38-cp38-macosx_10_9_universal2.whl
3c67d14ea2009069844b27bfbf74304d  numpy-1.21.6-cp38-cp38-macosx_10_9_x86_64.whl
5f0e773745cb817313232ac1bf4c7eee  numpy-1.21.6-cp38-cp38-macosx_11_0_arm64.whl
fa8011e065f1964d3eb870bb3926fc99  numpy-1.21.6-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl
486cf9d4daab59aad253aa5b84a5aa83  numpy-1.21.6-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
88509abab303c076dfb26f00e455180d  numpy-1.21.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
f7234e2ef837f5f6ddbde8db246fd05b  numpy-1.21.6-cp38-cp38-win32.whl
e1063e01fb44ea7a49adea0c33548217  numpy-1.21.6-cp38-cp38-win_amd64.whl
61c4caad729e3e0e688accbc1424ed45  numpy-1.21.6-cp39-cp39-macosx_10_9_universal2.whl
67488d8ccaeff798f2e314aae7c4c3d6  numpy-1.21.6-cp39-cp39-macosx_10_9_x86_64.whl
128c3713b5d1de45a0f522562bac5263  numpy-1.21.6-cp39-cp39-macosx_11_0_arm64.whl
50e79cd0610b4ed726b3bf08c3716dab  numpy-1.21.6-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl
bd0c9e3c0e488faac61daf3227fb95af  numpy-1.21.6-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
aa5e9baf1dec16b15e481c23f8a23214  numpy-1.21.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
a2405b0e5d3f775ad30177296a997092  numpy-1.21.6-cp39-cp39-win32.whl
f0d20eda8c78f957ea70c5527954303e  numpy-1.21.6-cp39-cp39-win_amd64.whl
9682abbcc38cccb7f56e48aacca7de23  numpy-1.21.6-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
6aa3c2e8ea2886bf593bd8e0a1425c64  numpy-1.21.6.tar.gz
04aea95dcb1d256d13a45df42173aa1e  numpy-1.21.6.zip

SHA256

... (truncated)

Commits

ef0ec78 Merge pull request #21323 from charris/prepare-1.21.6-release
24a8ec0 REL: Prepare for NumPy 1.21.6 release.
68ff2d3 Merge pull request #21318 from charris/revert-20354
30ba38c REV: Revert pull request #20464 from charris/backport-20354
7cfef93 REL: prepare 1.21.x for further development
See full diff in compare view

dependencies

opened by dependabot[bot] 0

Generalizing
Make the repository general-purpose:

A clean version of the repository should be downloadable, without any preset files or configurations.

Currently, a classification problem is implemented. The project needs to work for other types of problems as well, such as regression. The metrics should therefore behave generically and be configurable via .yaml.
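
A minimal sketch of config-driven metrics, under stated assumptions: the registry `METRICS`, the function `evaluate`, and the config keys `task` and `metrics` are all hypothetical names, not part of the current repository. The `config` dictionary stands in for the result of parsing a .yaml file.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the target (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry: maps the names used in the config to metric functions.
METRICS: Dict[str, Metric] = {
    "accuracy": accuracy,
    "mae": mean_absolute_error,
    "mse": mean_squared_error,
}


def evaluate(config, y_true, y_pred):
    """Compute every metric listed under the (assumed) 'metrics' key."""
    results = {}
    for name in config["metrics"]:
        if name not in METRICS:
            raise KeyError(f"Unknown metric in config: {name!r}")
        results[name] = METRICS[name](y_true, y_pred)
    return results


# Stand-in for a parsed .yaml file describing a regression problem.
config = {"task": "regression", "metrics": ["mae", "mse"]}
scores = evaluate(config, [3.0, 5.0], [2.0, 7.0])
print(scores)
```

With this shape, switching from classification to regression only requires changing the metric names in the config file, not the evaluation code.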

make_dataset only downloads a tabular dataset in .csv format, and the whole repo relies on that. It must be guaranteed to work with other file formats (.xml, .txt). Keep in mind as well that not every problem is solved with tabular data: NLP problems need a specific implementation to turn text into a dataset, and that should be available.
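
One possible way to support several formats is to dispatch on the file extension. The sketch below is an illustration only: `LOADERS`, `load_dataset`, and the individual loader functions are hypothetical names, the XML layout (a root element whose children are records) is an assumption, and the loaders take the file contents as a string to keep the example self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path


def load_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def load_txt(text):
    """Treat every non-empty line as one text sample (e.g. for NLP problems)."""
    return [{"text": line} for line in text.splitlines() if line.strip()]


def load_xml(text):
    """Assumes a root element whose children are records with one tag per field."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in record} for record in root]


# Hypothetical registry: file extension -> loader function.
LOADERS = {".csv": load_csv, ".txt": load_txt, ".xml": load_xml}


def load_dataset(filename, text):
    """Pick a loader from the file extension and return a list of records."""
    suffix = Path(filename).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix!r}") from None
    return loader(text)


print(load_dataset("data.csv", "x,y\n1,2\n"))
print(load_dataset("data.xml", "<rows><row><x>1</x><y>2</y></row></rows>"))
print(load_dataset("data.txt", "first sentence\n\nsecond sentence\n"))
```

Every loader returns the same structure, a list of dictionaries, so the downstream steps would not need to know which format the data came from.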

Different forms of data validation should be possible. The type of validation used for a problem could be configurable via a file.
opened by maikereis 3