Machine Learning toolbox for Humans

Reproducible Experiment Platform (REP)

Join the chat at https://gitter.im/yandex/rep

REP is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.

Main features:

  • unified python wrapper for different ML libraries (wrappers follow extended scikit-learn interface)
    • Sklearn
    • TMVA
    • XGBoost
    • uBoost
    • Theanets
    • Pybrain
    • Neurolab
    • MatrixNet service (available to CERN)
  • parallel training of classifiers on cluster
  • classification/regression reports with plots
  • interactive plots supported
  • smart grid-search algorithms with parallel execution
  • research versioning using git
  • pluggable quality metrics for classification
  • meta-algorithm design (aka 'rep-lego')

REP does not try to replace scikit-learn; it extends it and provides a better user experience.

Howto examples

To get started, look at the notebooks in /howto/

Notebooks can be viewed (not executed) online at nbviewer.
There are basic introductory notebooks (about Python and IPython) and more advanced ones (about REP itself).

The example code is written in Python 2, but the library is compatible with both Python 2 and Python 3.

Installation with Docker

We provide a Docker image with REP and all its dependencies. This is the recommended way, especially if you are not experienced with Python.

Installation with bare hands

However, if you want to install REP and all of its dependencies on your machine yourself, follow this manual: installing manually and running manually.

Links

License

Apache 2.0, library is open-source.

Minimal examples

REP wrappers are sklearn compatible:

from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier
clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
probabilities = clf.predict_proba(testX)

A beloved trick of Kagglers is to run bagging over complex algorithms. This is how it is done in REP:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
# wrapping sklearn to REP wrapper
clf = SklearnClassifier(clf)
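
For readers unfamiliar with bagging, the idea can be sketched in plain Python. This is only an illustration of the technique, not how sklearn's BaggingClassifier or REP implement it; `MeanThresholdClassifier`, `bagging_fit` and `bagging_predict_proba` are hypothetical names invented for this sketch, and the toy data is made up.

```python
import random


class MeanThresholdClassifier(object):
    """Hypothetical toy base model for 1-D inputs (not part of REP or sklearn)."""

    def fit(self, X, y):
        overall = sum(X) / float(len(X))
        ones = [x for x, label in zip(X, y) if label == 1]
        zeros = [x for x, label in zip(X, y) if label == 0]
        # If a bootstrap sample misses a class, fall back to the overall mean.
        mean_one = sum(ones) / float(len(ones)) if ones else overall
        mean_zero = sum(zeros) / float(len(zeros)) if zeros else overall
        # The midpoint between the class means serves as the decision threshold.
        self.threshold = (mean_one + mean_zero) / 2.0
        return self

    def predict_proba(self, X):
        # Hard 0/1 "probability" of class 1, enough for the illustration.
        return [1.0 if x > self.threshold else 0.0 for x in X]


def bagging_fit(make_model, X, y, n_estimators=10, seed=0):
    """Train n_estimators models, each on a bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(make_model().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict_proba(models, X):
    """Average the class-1 probabilities of all models."""
    per_model = [model.predict_proba(X) for model in models]
    return [sum(column) / float(len(models)) for column in zip(*per_model)]


X = [0.1, 0.4, 0.5, 0.9, 1.6, 2.0, 2.2, 2.7]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(MeanThresholdClassifier, X, y, n_estimators=10, seed=0)
proba = bagging_predict_proba(models, [0.0, 3.0])
```

Each model sees a different resampled training set, and averaging their outputs reduces the variance of the final prediction.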

Another useful trick is to use folding instead of splitting data into train/test. This is especially useful when you're using some kind of complex stacking:

from rep.metaml import FoldingClassifier
clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
probabilities = clf.fit(X, y).predict_proba(X)

In the example above, all data are split into 3 folds, and each fold is predicted by a classifier that was trained on the other 2 folds.
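
The folding idea can be sketched without any library. The code below is a minimal illustration under stated assumptions, not REP's FoldingClassifier: `fold_indices`, `out_of_fold_predictions`, `fit_mean` and `predict_mean` are hypothetical helpers, the fold assignment is a simple unshuffled `i % n_folds`, and the "model" is just the mean of the training labels.

```python
def fold_indices(n_samples, n_folds):
    """Assign sample i to fold i % n_folds (a simple, unshuffled split)."""
    return [[i for i in range(n_samples) if i % n_folds == k] for k in range(n_folds)]


def out_of_fold_predictions(fit, predict, X, y, n_folds=3):
    """Predict every sample with a model that never saw that sample in training."""
    predictions = [None] * len(X)
    for held_out in fold_indices(len(X), n_folds):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict(model, [X[i] for i in held_out])):
            predictions[i] = p
    return predictions


def fit_mean(X, y):
    """Toy 'training': remember the mean label of the training part."""
    return sum(y) / float(len(y))


def predict_mean(model, X):
    """Toy 'prediction': return the remembered mean for every sample."""
    return [model for _ in X]


X = list(range(6))
y = [1, 0, 0, 0, 0, 0]
oof = out_of_fold_predictions(fit_mean, predict_mean, X, y, n_folds=3)
```

Sample 0 is the only positive, yet its own prediction is 0.0, because the model that predicts it was trained only on the other two folds. This is why folded predictions can be fed to a second-level model in stacking without leaking labels.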

REP classifiers also provide reports:

from rep.report.metrics import RocAuc
report = clf.test_on(testX, testY)
report.roc().plot()  # plot ROC curve
# learning curves are useful when training GBDT!
report.learning_curve(RocAuc(), steps=10)

You can read about other REP tools (like smart distributed grid search, folding and factory) in the documentation and howto examples.

Comments
  • Problem with TMVAClassifier

    Problem with TMVAClassifier

    After installing REP from here, I ran into the following problem when fitting TMVAClassifier. An IOError is raised after these lines:

    baseline = TMVAClassifier(method='kBDT', features=variables, BoostType='Grad', NTrees=40, Shrinkage=0.01, MaxDepth=7, UseNvars=6, nCuts=-1)
    baseline.fit(train, train['signal'])

    The stack trace is:

    IOError Traceback (most recent call last)
    in ()
          3 UseNvars=6, nCuts=-1)
          4 # baseline = TMVAClassifier(method='kBDT', NTrees=50, Shrinkage=0.05, features=variables)
    ----> 5 baseline.fit(train, train['signal'])

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in fit(self, X, y, sample_weight)
        288 self.factory_options = '{}:AnalysisType=Multiclass'.format(self.factory_options)
        289
    --> 290 return self._fit(X, y, sample_weight=sample_weight)
        291
        292 def predict_proba(self, X):

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in _fit(self, X, y, sample_weight, model_type)
        104 add_info = _AdditionalInformation(directory, model_type=model_type)
        105 try:
    --> 106 self._run_tmva_training(add_info, X, y, sample_weight)
        107 finally:
        108 self._remove_tmp_directory(directory)

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in _run_tmva_training(self, info, X, y, sample_weight)
        134 xml_filename = os.path.join(info.directory, 'weights',
        135 '{job}{name}.weights.xml'.format(job=info.tmva_job, name=self._method_name))
    --> 136 with open(xml_filename, 'r') as xml_file:
        137 self.formula_xml = xml_file.read()
        138

    IOError: [Errno 2] No such file or directory: '/home/artem/Documents/IPython Notebooks/CERN + Yandex/Original Baseline/flavours-of-physics-start/tmp0Fhtqe/weights/TMVAEstimation_REP_Estimator.weights.xml'

    As far as I can tell, the weights/ folder was created outside the temporary folder instead of inside it, which causes the error above.

    ROOT 5.34, Python 2.7, GCC 4.8, Ubuntu 14.04 LTS (x64). All requirements for REP were installed successfully (from requirements.txt).

    bug 
    opened by HolyBayes 9
  • FoldingClassifier: KFold vs StratifiedKFold

    FoldingClassifier: KFold vs StratifiedKFold

    Hey,

    first of all a compliment: I really like your repo and I have built a lot of code on it, it's so useful! About the FoldingClassifier: there was already a request to implement StratifiedKFold in addition to the "normal" KFold. I would be very glad to see this, but I'd even go a step further: why don't you completely replace the KFold with a StratifiedKFold?

    I think, from an ML point of view, it is always better (or, at worst, equally good) to use a stratified one. Using a normal KFold only introduces different class balances, which (usually) result in "shifted" probabilities among the different classifiers, whereas a stratified one does not and therefore makes each trained classifier's predictions "comparable".

    Or in other words: I cannot think of any case where you want to have a non-stratified KFolding instead of a stratified one.
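
The class-balance effect described above can be illustrated with a small pure-Python sketch. It is a deliberately extreme case (labels sorted by class, folds taken as contiguous unshuffled blocks) and does not reproduce sklearn's KFold or StratifiedKFold implementations; `plain_folds`, `stratified_folds` and `positive_fraction` are hypothetical helpers written for this example.

```python
def plain_folds(labels, n_folds):
    """Contiguous blocks of indices, as an unshuffled K-fold split would give."""
    n = len(labels)
    bounds = [n * k // n_folds for k in range(n_folds + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(n_folds)]


def stratified_folds(labels, n_folds):
    """Deal the indices of each class round-robin across the folds."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        for position, i in enumerate(members):
            folds[position % n_folds].append(i)
    return folds


def positive_fraction(labels, fold):
    """Share of class-1 samples inside one fold."""
    return sum(labels[i] for i in fold) / float(len(fold))


labels = [0] * 9 + [1] * 3  # 25% positives overall, sorted by class
plain = [positive_fraction(labels, f) for f in plain_folds(labels, 3)]
strat = [positive_fraction(labels, f) for f in stratified_folds(labels, 3)]
```

Here the plain split puts all positives into the last fold, while the stratified split keeps the overall 25% positive rate in every fold, so classifiers trained on different folds see the same class balance.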

    What do you think?

    Best, Mayou

    enhancement 
    opened by jonas-eschle 5
  • Support for a build hosted on (ana)conda

    Support for a build hosted on (ana)conda

    I see that some of the continuous integration scripts support conda builds, although not all the dependencies are installed this way. Is there any hope of seeing a build on conda soon for Linux x86_64 systems?

    The reason I ask is that I have accounts on numerous batch systems, none of which give me root access or any way to use Docker. They're all Linux-based though, as is the norm. As far as I know, this is the case for many researchers.

    It'd be great to see a way to quickly install REP on these systems. This would:

    • Cut down on the time needed to introduce people to REP
    • Hook into the environment management and environment logging provided by conda
    • Easily and quickly deploy REP on supercomputing nodes while requiring little of their filesystem

    This is especially useful for ensuring the ROOT install is sane. I know there has already been a lot of work in the direction of making REP easy to access and install. Perhaps this could be a healthy addition?

    question 
    opened by ewengillies 5
  • Add ability to initialise FoldingBase objects with external parser

    Add ability to initialise FoldingBase objects with external parser

    If you would like to run REP with, e.g., a StratifiedKFold instead of a normal KFold, this will be possible after this pull request. If no external folding object is passed, the default KFold algorithm is used.

    opened by mschlupp 5
  • test_xgboost file is not running on windows 10

    test_xgboost file is not running on windows 10

    The test_xgboost file does not run on Windows 10:

    File "c:\Sander\my_code\rep-master\tests\test_xgboost.py", line 4, in
    from rep.estimators import XGBoostClassifier, XGBoostRegressor
    ImportError: cannot import name XGBoostClassifier

    This happens when the rep installation succeeds but the xgboost installation fails:

    Microsoft Windows Version 10.0.10586 2015 Microsoft Corporation. All rights reserved.

    c:\Sander>pip install rep --no-dependencies
    Collecting rep
    Downloading rep-0.6.5.tar.gz (72kB)
    100% |################################| 81kB 511kB/s
    Building wheels for collected packages: rep
    Running setup.py bdist_wheel for rep ... done
    Stored in directory: C:\Users\Sander\AppData\Local\pip\Cache\wheels\db\ee\06\ac6e3f3ec208edaee29654f0b55ffaf2719a51de799c396b91
    Successfully built rep
    Installing collected packages: rep
    Successfully installed rep-0.6.5
    You are using pip version 8.1.0, however version 8.1.2 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    c:\Sander>pip install xgboost==0.4a30
    Collecting xgboost==0.4a30
    Downloading xgboost-0.4a30.tar.gz (753kB)
    100% |################################| 757kB 553kB/s
    No files/directories in c:\users\sander\appdata\local\temp\pip-build-exobfm\xgboost\pip-egg-info (from PKG-INFO)
    You are using pip version 8.1.0, however version 8.1.2 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    c:\Sander>

    opened by Sandy4321 5
  • Manual Install on Windows

    Manual Install on Windows

    Hi! Is there a way to install REP manually in a Windows environment? When installing the dependencies, I get an error while installing gnureadline:

    Error: this module is not meant to work on Windows (try pyreadline instead)

    Is there a way to use pyreadline for Windows users?

    wontfix 
    opened by funkindy 4
  • Mac OS installation with docker

    Mac OS installation with docker

    It seems the latest Docker release deprecates boot2docker http://docs.docker.com/installation/mac/ "This release of Docker deprecates the Boot2Docker command line in favor of Docker Machine"

    How do I install REP with the latest Docker release?

    opened by pupadupa 4
  • test failed

    test failed

    After python setup.py install, I run cd tests ; nosetests . It runs for a long time and ends with errors:

    ..Info in <TCanvas::Print>: png file /tmp/tmpBg1dar.png has been created
    Error in <TFile::TFile>: file toy_datasets/toyMC_bck_mass.root does not exist
    E..E.
    ======================================================================
    ERROR: tests.z_test_notebook.test_notebooks_in_folder('/root/rep/howto/00-intro-ROOT.ipynb',)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
        self.test(*self.arg)
      File "/root/rep/rep/test/test_notebooks.py", line 43, in check_single_notebook
        raise RuntimeError(description)
    RuntimeError: Cell failed: 'T.Draw("min_DOCA")
    c1'
    
     Traceback:
    ---------------------------------------------------------------------------
    ReferenceError                            Traceback (most recent call last)
    <ipython-input-5-aa6c7320180d> in <module>()
    ----> 1 T.Draw("min_DOCA")
          2 c1
    
    ReferenceError: attempt to access a null-pointer
    

    What am I missing?

    opened by anaderi 3
  • Updating numpy in 0.6.6 docker breaks matplotlib

    Updating numpy in 0.6.6 docker breaks matplotlib

    % docker run -ti yandex/rep:0.6.6 bash -lc 'pip install -U numpy; python -c "from matplotlib import pyplot as plt; plt.figure()"'
    Activate: ROOT has been sourced. Environment settings are ready.
    ROOTSYS=/root/miniconda/envs/rep_py2
    Deactivate:Unsetting ROOT environment variables..
    Activate: ROOT has been sourced. Environment settings are ready.
    ROOTSYS=/root/miniconda/envs/rep_py2
    Collecting numpy
      Downloading numpy-1.11.2-cp27-cp27mu-manylinux1_x86_64.whl (15.3MB)
        100% |################################| 15.3MB 46kB/s
    Installing collected packages: numpy
      Found existing installation: numpy 1.10.4
        DEPRECATION: Uninstalling a distutils installed project (numpy) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
        Uninstalling numpy-1.10.4:
          Successfully uninstalled numpy-1.10.4
    Successfully installed numpy-1.11.2
    /root/miniconda/envs/rep_py2/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
      warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
    bash: line 1:   222 Illegal instruction     python -c "from matplotlib import pyplot as plt; plt.figure()"
    
    opened by sashabaranov 2
  • do we need to measure fit/predict time without %time?

    do we need to measure fit/predict time without %time?

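    This is useful if the Jupyter frontend disconnects during fit/predict execution.

    The following snippet might be handy for such cases:

    import datetime
    import time


    class Stopwatch(object):
        """Context manager that records the wall-clock time spent inside the with block."""

        def __enter__(self):
            self.t0 = datetime.datetime.now()
            return self

        def __exit__(self, type, value, traceback):
            self.t1 = datetime.datetime.now()

        def __repr__(self):
            return "delta: (%s)" % (self.t1 - self.t0)


    with Stopwatch() as sfit:
        time.sleep(1)  # stand-in for the fit call
    with Stopwatch() as spredict:
        time.sleep(1)  # stand-in for the predict call

    # print() with a single argument works on both Python 2 and Python 3
    print("fit: %s spredict: %s" % (sfit, spredict))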
    
    opened by anaderi 2
  • New REP docker version running in /var/lib/docker/volumes/ instead of ~/rep_container

    New REP docker version running in /var/lib/docker/volumes/ instead of ~/rep_container

    Hi.

    I had old REP docker version in ~/rep_container which started with run.sh script on 8080 port. I updated REP and it broke: sudo $REPDIR/run.sh worked, but I couldn't connect to localhost:8080 (connection refused). I've decided to update docker and REP according to new instructions: https://github.com/yandex/rep/wiki/Install-REP-with-Docker-(Linux).

    1. I installed Docker, according to instructions.
    2. netstat -anl | grep 8888 gave empty result
    3. git checkout https://github.com/yandex/rep.git didn't work (pathspec did not match any file(s) known to git), so I used git clone instead.
    4. First run of sudo make run was successful and installed container.
    5. I rebooted and second sudo make run gave the following

    docker run -ti --rm -p 8888:8888 --name rep yandex/rep:0.6.4
    Error response from daemon: Conflict. The name "rep" is already in use by container 3af0884aeedb. You have to remove (or rename) that container to be able to reuse that name.
    make: *** [run] Error 1

    6. I ran sudo docker images

    REPOSITORY   TAG     IMAGE ID      CREATED       VIRTUAL SIZE
    yandex/rep   0.6.4   18a48bc5a3b6  8 hours ago   2.635 GB
    anaderi/rep  latest  63c3db2850b6  4 months ago  1.649 GB
                         91c95931e552  7 months ago  910 B

    7. I tried sudo docker start rep. It worked and I opened REP on localhost:8888. But its working folder changed. Now it is /var/lib/docker/volumes/dbcc7ff99538007d9c6b244fb6b8f03bdcfd564f6076b36d79fa3330d2041107/_data/. This is quite inconvenient, because it requires superuser rights to access and is not conveniently located at all.

    Question: Is this the new setup, or did I do something wrong? If the latter, how do I fix it and run the REP container in a convenient folder?

    opened by lodurality 2
  • Bump notebook from 4.2.1 to 6.4.12

    Bump notebook from 4.2.1 to 6.4.12

    Bumps notebook from 4.2.1 to 6.4.12.

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Update lib

    Update lib

    Issue:

    ModuleNotFoundError Traceback (most recent call last) in
          5 from sklearn.ensemble import HistGradientBoostingClassifier
          6 from rep.report.metrics import RocAuc
    ----> 7 from rep.metaml import GridOptimalSearchCV, FoldingScorer, RandomParameterOptimizer
          8 from rep.estimators import SklearnClassifier

    ~/.local/lib/python3.8/site-packages/rep/metaml/__init__.py in
          2
          3 from .factory import ClassifiersFactory, RegressorsFactory
    ----> 4 from .folding import FoldingClassifier, FoldingRegressor
          5 from .gridsearch import GridOptimalSearchCV
          6 from .stacking import FeatureSplitter

    ~/.local/lib/python3.8/site-packages/rep/metaml/folding.py in
         11
         12 from sklearn import clone
    ---> 13 from sklearn.cross_validation import KFold
         14 from sklearn.utils import check_random_state
         15 from . import utils

    ModuleNotFoundError: No module named 'sklearn.cross_validation'

    Correction suggested based on https://stackoverflow.com/questions/30667525/importerror-no-module-named-sklearn-cross-validation
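
For context, `sklearn.cross_validation` was removed from newer scikit-learn releases and `KFold` now lives in `sklearn.model_selection`. One way to tolerate both locations is a small import fallback, sketched below. This is not the change made in the pull request; `import_first` is a hypothetical helper, and the demo uses standard-library modules so that it runs without scikit-learn installed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # also covers ModuleNotFoundError on Python 3
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("none of %r could be imported" % (candidates,))


# Demo with the standard library: the first module does not exist,
# so the helper falls through to the second candidate.
OrderedDict = import_first([
    ("no_such_module_for_demo", "OrderedDict"),
    ("collections", "OrderedDict"),
])

# Assumed usage for the error above (requires scikit-learn, so left commented out):
# KFold = import_first([("sklearn.model_selection", "KFold"),
#                       ("sklearn.cross_validation", "KFold")])
```

Note that the two `KFold` classes take different constructor arguments (`n_splits` in `model_selection`, `n` and `n_folds` in the old module), so the call sites would need adapting as well as the import.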

    opened by RobsonRocha 1
  • Bump requests from 2.9.1 to 2.20.0

    Bump requests from 2.9.1 to 2.20.0

    Bumps requests from 2.9.1 to 2.20.0.

    Changelog

    Sourced from requests's changelog.

    2.20.0 (2018-10-18)

    Bugfixes

    • Content-Type header parsing is now case-insensitive (e.g. charset=utf8 v Charset=utf8).
    • Fixed exception leak where certain redirect urls would raise uncaught urllib3 exceptions.
    • Requests removes Authorization header from requests redirected from https to http on the same hostname. (CVE-2018-18074)
    • should_bypass_proxies now handles URIs without hostnames (e.g. files).

    Dependencies

    • Requests now supports urllib3 v1.24.

    Deprecations

    • Requests has officially stopped support for Python 2.6.

    2.19.1 (2018-06-14)

    Bugfixes

    • Fixed issue where status_codes.py's init function failed trying to append to a __doc__ value of None.

    2.19.0 (2018-06-12)

    Improvements

    • Warn user about possible slowdown when using cryptography version < 1.3.4
    • Check for invalid host in proxy URL, before forwarding request to adapter.
    • Fragments are now properly maintained across redirects. (RFC7231 7.1.2)
    • Removed use of cgi module to expedite library load time.
    • Added support for SHA-256 and SHA-512 digest auth algorithms.
    • Minor performance improvement to Request.content.
    • Migrate to using collections.abc for 3.7 compatibility.

    Bugfixes

    • Parsing empty Link headers with parse_header_links() no longer return one bogus entry.
    ... (truncated)
    Commits
    • bd84045 v2.20.0
    • 7fd9267 remove final remnants from 2.6
    • 6ae8a21 Add myself to AUTHORS
    • 89ab030 Use comprehensions whenever possible
    • 2c6a842 Merge pull request #4827 from webmaven/patch-1
    • 30be889 CVE URLs update: www sub-subdomain no longer valid
    • a6cd380 Merge pull request #4765 from requests/encapsulate_urllib3_exc
    • bbdbcc8 wrap url parsing exceptions from urllib3's PoolManager
    • ff0c325 Merge pull request #4805 from jdufresne/https
    • b0ad249 Prefer https:// for URLs throughout project
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Changes to TMVA API in new ROOT versions break TMVAClassifier

    Changes to TMVA API in new ROOT versions break TMVAClassifier

    Hi all,

    first of all, I wanted to thank and compliment the developers for this brilliant library. I finally had the chance to start playing with it today, but I was stopped in my tracks when trying to use a TMVAClassifier:

    AssertionError: ERROR: TMVA process is incorrect finished 
     LOG: None 
     Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/ludo/miniconda3/envs/pyroot/lib/python2.7/site-packages/rep/estimators/_tmvaFactory.py", line 86, in main
        tmva_process(classifier, info, data, labels, sample_weight)
      File "/home/ludo/miniconda3/envs/pyroot/lib/python2.7/site-packages/rep/estimators/_tmvaFactory.py", line 40, in tmva_process
        factory.AddVariable(var)
    AttributeError: 'Factory' object has no attribute 'AddVariable'
    

    My ROOT/TMVA versions are:

    You are running ROOT Version: 6.08/00, Nov 4, 2016
    TMVA Version 4.2.1, Feb 5, 2015
    

    Searching the web for this error message led me to this post on the ROOT forum: https://root-forum.cern.ch/t/25090, where the cause of problem is indicated as being due to a breaking change in the TMVA API:

    In recent ROOT versions (6.06 or 6.08, don't remember exactly), the TMVA interface has changed. You need to create a TMVA::DataLoader and call AddVariable on the dataloader object.

    As I understand, this is related to what was mentioned by @gandreassi in a comment to #104. Any idea on how complicated it would be to adapt tmva_process to the new interface?

    opened by fndari 1
Releases(0.6.6)
  • 0.6.6(Aug 9, 2016)

    • python2 and python3 dockers
    • updated libraries
    • added CacheClassifier
    • minimized size of docker image, simplified building process
    • some fixes for ML libraries
    • some documentation updates
    • deleted plot.ly
    • solved theanets reproducibility
  • 0.6.5(Feb 3, 2016)

    Fixes:

    • TMVA process correct termination
    • TMVA fix for Mac OS El Capitan (problems with dynamic library paths)
    • fix travis (show not passed tests, create docker on dockerhub)
    • fix wget in notebooks
    • fix errors calculation in efficiencies (for flatness property)
    • added Makefile
    • fix normalization in the multidimensional metric
  • 0.6.4(Nov 21, 2015)

    • Add continuous integration
    • Python 3 support
    • Conda installation in docker and travis
    • Kitematic-friendly docker
    • Update all libraries versions
    • added Folding Regressor, added feature importances for folding
    • added minimization to gridsearch, added random gridsearch from distributions
    • added folding scorer for regressor to gridsearch
    • faster tests
    • updated notebooks
    • Fixes:
      • tmva termination
      • documentation for grid search
      • Gridsearch bugs with metrics (metric fit)
      • learning curve with mask for folding
  • 0.6.3(Jul 30, 2015)

  • 0.6.2(Jul 6, 2015)

    • Support of neural networks in common interface:

      • theanets
      • neurolab
      • pybrain

      Now all the REP stuff is available for classifiers and regressors from these libraries:

      • usage inside sklearn pipeline
      • grid_search for hyper parameter optimization
      • reports, parallel training on cluster
    • New lovely documentation, check it out!

    • Fixes in metaclassifiers connected with usage of expressions-as-features

    • Rewritten FeatureSplitter

    • Switched to sklearn 0.16

    • New method train_test_split_group - splitting into train and test by the value of special column. Samples with same values are either both in train or both in test.

    • Update howto/notebooks with new open physical datasets
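
The group-wise splitting described for train_test_split_group above can be sketched as follows. This is a minimal illustration of the idea, not REP's implementation or signature: `train_test_split_by_group` is a hypothetical function, it returns index lists only, and `test_fraction` applies to the number of distinct group values rather than to the number of samples.

```python
import random


def train_test_split_by_group(groups, test_fraction=0.5, seed=0):
    """Return (train_idx, test_idx) so that no group value appears on both sides."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Choose whole groups for the test side; int() keeps this valid on Python 2 too.
    n_test = int(round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# The "special column": samples sharing a value must stay together.
groups = ["a", "a", "b", "c", "c", "c", "d"]
train_idx, test_idx = train_test_split_by_group(groups, test_fraction=0.5, seed=0)
train_groups = set(groups[i] for i in train_idx)
test_groups = set(groups[i] for i in test_idx)
```

Because whole groups are assigned to one side, correlated samples (for example several candidates from the same event) cannot leak between train and test.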

  • 0.6.1(May 22, 2015)

    • Tmva implementation enhancement with root_numpy https://github.com/yandex/rep/issues/2.
    • Add FPRatTPR (return fpr value at fixed tpr) and TPRatFPR (return tpr value at fixed fpr) metrics, which are required, e.g. for tuning online triggering system. Moreover learning curves are available for these metrics now.
    • Many improvements in documentation.
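
To make the "TPR at fixed FPR" metric mentioned above concrete, here is a small unweighted sketch. It is one plausible reading of the metric, not REP's TPRatFPR code: `tpr_at_fpr` is a hypothetical function, it ignores sample weights, and it only considers thresholds equal to observed scores.

```python
def tpr_at_fpr(labels, scores, max_fpr):
    """Largest TPR reachable by a threshold whose FPR does not exceed max_fpr."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    best = 0.0
    # A sample is predicted positive when its score is >= threshold.
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in negatives) / float(len(negatives))
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in positives) / float(len(positives))
            best = max(best, tpr)
    return best


labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
loose = tpr_at_fpr(labels, scores, max_fpr=0.25)   # one false positive allowed
strict = tpr_at_fpr(labels, scores, max_fpr=0.0)   # no false positives allowed
```

With one of four negatives allowed through, every positive is kept (`loose` is 1.0); with no false positives allowed, only the two highest-scoring positives survive (`strict` is 0.5). This is the kind of trade-off a trigger tuned to a fixed background rate has to make.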
  • 0.6.0(May 12, 2015)

    • unified classifiers wrapper for variety of implementations: TMVA, Sklearn, XGBoost, uBoost
    • parallel training of classifiers on cluster
    • classification/regression reports with plots
    • support of interactive plots (bokeh, plotly)
    • grid-search with parallelized execution on a cluster
    • git, versioning of research
    • computation of different classification metrics
    • partial support of python 3.
Owner
Yandex
Yandex open source projects and technologies