A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

Overview

Package information: Python 2.7, Python 3.5

scikit-rebate

This package includes a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for machine learning. These Relief-based algorithms (RBAs) are designed for feature weighting/selection as part of a supervised machine learning pipeline. Presently this includes the following core RBAs: ReliefF, SURF, SURF*, MultiSURF*, and MultiSURF. Implementations of the iterative TuRF mechanism and VLSRelief are also included. The package is still under active development, and we encourage you to check back on this repository regularly for updates.

These algorithms offer a computationally efficient way to perform feature selection that is sensitive to feature interactions as well as simple univariate associations, unlike most currently available filter-based feature selection methods. The main benefit of Relief algorithms is that they identify feature interactions without having to exhaustively check every pairwise interaction, thus taking significantly less time than exhaustive pairwise search.

Certain algorithms require user-specified run parameters (e.g., ReliefF requires the user to specify the number of nearest neighbors, 'k').
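To make the role of 'k' concrete, here is a minimal sketch of ReliefF-style scoring for discrete features. It follows the textbook idea (compare each sample with its k nearest same-class "hits" and k nearest other-class "misses") and is not the code used inside scikit-rebate; the function name simple_relieff and the toy data are assumptions made for illustration.

```python
import itertools

import numpy as np


def simple_relieff(X, y, k=1):
    """Toy ReliefF-style scoring for discrete features (illustrative only)."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    scores = np.zeros(n_features)
    for i in range(n_samples):
        # Per-feature mismatch (0 or 1) between sample i and every sample.
        diff = (X != X[i]).astype(float)
        dist = diff.sum(axis=1)  # Hamming distance to sample i
        same_class = y == y[i]
        same_class[i] = False  # a sample is not its own neighbor
        hits = np.flatnonzero(same_class)
        misses = np.flatnonzero(y != y[i])
        near_hits = hits[np.argsort(dist[hits], kind="stable")[:k]]
        near_misses = misses[np.argsort(dist[misses], kind="stable")[:k]]
        # Differing from a near hit lowers a feature's score;
        # differing from a near miss raises it.
        scores -= diff[near_hits].mean(axis=0)
        scores += diff[near_misses].mean(axis=0)
    return scores / n_samples


# Toy data: the class is the XOR of features 0 and 1; feature 2 is noise.
X_toy = np.array(list(itertools.product([0, 1], repeat=3)))
y_toy = X_toy[:, 0] ^ X_toy[:, 1]
toy_scores = simple_relieff(X_toy, y_toy, k=1)
print(toy_scores)
```

On this toy data the noise feature receives a negative score while the two interacting features share the positive weight, even though neither of them is associated with the class on its own. With this averaging, each score stays within [-1, 1].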

Relief algorithms are commonly applied to genetic analyses, where epistasis (i.e., feature interactions) is common. However, the algorithms implemented in this package can be applied to almost any supervised learning data set and support:

  • Feature sets that are discrete/categorical, continuous-valued or a mix of both

  • Data with missing values

  • Binary endpoints (i.e., classification)

  • Multi-class endpoints (i.e., classification)

  • Continuous endpoints (i.e., regression)

Built into this code is a strategy to automatically detect these relevant characteristics from the loaded data.
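As a rough illustration of what such detection can look like, the sketch below classifies a data set by counting unique values per feature and checking for NaNs. The function describe_dataset, its return keys, and the use of a threshold of 10 unique values are assumptions for this example, not the package's internal logic.

```python
import numpy as np


def describe_dataset(X, y, discrete_threshold=10):
    """Guess feature type, missingness, and endpoint type (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    has_missing = bool(np.isnan(X).any())
    kinds = set()
    for column in X.T:
        # Count distinct observed values, ignoring missing entries.
        n_unique = np.unique(column[~np.isnan(column)]).size
        kinds.add("discrete" if n_unique <= discrete_threshold else "continuous")
    data_type = "mixed" if len(kinds) > 1 else kinds.pop()
    n_labels = np.unique(y).size
    if n_labels == 2:
        endpoint = "binary"
    elif n_labels <= discrete_threshold:
        endpoint = "multiclass"
    else:
        endpoint = "continuous"
    return {"data_type": data_type, "has_missing": has_missing, "endpoint": endpoint}


# One binary feature, one continuous feature with a missing value, binary labels.
X_demo = np.column_stack([np.arange(20) % 2, np.linspace(0.0, 1.0, 20)])
X_demo[3, 1] = np.nan
y_demo = np.arange(20) % 2
summary = describe_dataset(X_demo, y_demo)
print(summary)
```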

Of our two initial ReBATE software releases, this scikit-learn-compatible version primarily focuses on ease of incorporation into a scikit-learn analysis pipeline. This code is most appropriate for scikit-learn users, Windows operating system users, beginners, or those looking for the most recent ReBATE developments.

An alternative 'stand-alone' version of ReBATE is also available that focuses on improving run-time with the use of Cython for optimization. This implementation also outputs feature names and associated feature scores as a text file by default.

License

Please see the repository license for the licensing and usage information for scikit-rebate.

Generally, we have licensed scikit-rebate to make it as widely usable as possible.

Installation

scikit-rebate is built on top of the following existing Python packages:

  • NumPy

  • SciPy

  • scikit-learn

All of the necessary Python packages can be installed via the Anaconda Python distribution, which we strongly recommend that you use. We also strongly recommend that you use Python 3 over Python 2 if you're given the choice.

NumPy, SciPy, and scikit-learn can be installed in Anaconda via the command:

conda install numpy scipy scikit-learn

Once the prerequisites are installed, you should be able to install scikit-rebate with a pip command:

pip install skrebate

Please file a new issue if you run into installation problems.

Usage

We have designed the Relief algorithms to be integrated directly into scikit-learn machine learning workflows. For example, the ReliefF algorithm can be used as a feature selection step in a scikit-learn pipeline as follows.

import pandas as pd
import numpy as np
from sklearn.pipeline import make_pipeline
from skrebate import ReliefF
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
                           'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
                           sep='\t', compression='gzip')

features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values

clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100),
                    RandomForestClassifier(n_estimators=100))

print(np.mean(cross_val_score(clf, features, labels)))
>>> 0.795

For more information on the Relief algorithms available in this package and how to use them, please refer to our usage documentation.

Contributing to scikit-rebate

We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to scikit-rebate, please file a new issue so we can discuss it.

Please refer to our contribution guidelines prior to working on a new feature or bug fix.

Citing scikit-rebate

If you use scikit-rebate in a scientific publication, please consider citing the following paper:

Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore (2017). Benchmarking Relief-Based Feature Selection Methods. arXiv preprint, under review.

BibTeX entry:

@misc{Urbanowicz2017Benchmarking,
    author = {Urbanowicz, Ryan J. and Olson, Randal S. and Schmitt, Peter and Meeker, Melissa and Moore, Jason H.},
    title = {Benchmarking Relief-Based Feature Selection Methods},
    year = {2017},
    howpublished = {arXiv e-print. https://arxiv.org/abs/1711.08477},
}
Comments
  • The range of importance scores is not between -1 and 1 in skrebate 0.3-0.4

    I tested the code below with skrebate 0.4:

    import pandas as pd
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from skrebate import ReliefF
    
    genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
                               'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
                               sep='\t', compression='gzip')
    
    features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values
    
    
    fs = ReliefF()
    fs.fit(features, labels )
    
    for feature_name, feature_score in zip(genetic_data.drop('class', axis=1).columns,
                                           fs.feature_importances_):
        print(feature_name, '\t', feature_score)
    

    The stdout is shown below; it does not match the example:

    N0       -24.5
    N1       -396.5
    N2       -753.0
    N3       -640.0
    N4       -614.5
    N5       -668.5
    N6       -810.5
    N7       -767.5
    N8       -486.0
    N9       -637.5
    N10      -47.5
    N11      -666.0
    N12      -553.5
    N13      -253.0
    N14      -254.0
    N15      -816.5
    N16      -564.0
    N17      -632.0
    P1       9543.0
    P2       9782.5
    

    Then I downgraded to skrebate 0.2 and ran the same script. The stdout changed to:

    N0       -0.00025
    N1       -0.00521875
    N2       -0.0086125
    N3       -0.00770625
    N4       -0.00755
    N5       -0.00968125
    N6       -0.01015625
    N7       -0.00959375
    N8       -0.00599375
    N9       -0.0082625
    N10      -0.001125
    N11      -0.00750625
    N12      -0.00778125
    N13      -0.00265625
    N14      -0.00309375
    N15      -0.0104125
    N16      -0.007375
    N17      -0.00760625
    P1       0.1201375
    P2       0.1222
    

    @ryanurbs and I think it may be a normalization issue.

    question 
    opened by weixuanfu 6
  • Bug with pre_normalize with no missing values for continuous data

    When pre_normalize is applied to continuous data with no missing values,

    x[idx] -= cmin
    x[idx] /= diff
    

    is not correct, because x[idx] here is a sample (a row), whereas what we need is a feature (a column). The input to pre_normalize should therefore be self._X.transpose(), and the function should return x.transpose(). There are also problems with mixed data.

    def _distarray_no_missing(self, xc, xd):
        """Distance array for data with no missing values"""
        from scipy.spatial.distance import pdist, squareform
        attr = self._get_attribute_info()
        #------------------------------------------#
        def pre_normalize(x):
            """Normalizes continuous features so they are in the same range"""
            idx = 0
            print(x.shape)
            for i in attr:
                cmin = attr[i][2]
                diff = attr[i][3]
                x[idx] -= cmin
                x[idx] /= diff
                idx += 1
            return x
        #------------------------------------------#
        if self.data_type == 'discrete':
            return squareform(pdist(self._X, metric='hamming'))
        elif self.data_type == 'mixed':
            d_dist = squareform(pdist(xd, metric='hamming'))
            c_dist = squareform(pdist(pre_normalize(xc), metric='cityblock'))
            return np.add(d_dist, c_dist) / self._num_attributes
        else:
            self._X = pre_normalize(self._X)
            return squareform(pdist(self._X, metric='cityblock'))
    
    bug 
    opened by hongliuuuu 5
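For the pre_normalize report above, the row-versus-column distinction can be shown with a small NumPy sketch. This is not a patch for the library: normalize_features is a hypothetical helper that computes the per-feature minimum and range itself and assumes an array with samples in rows, features in columns, and no missing values.

```python
import numpy as np


def normalize_features(X):
    """Min-max scale each column (feature) of X into [0, 1] (illustrative only)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)             # one minimum per feature
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0     # constant features stay at 0 instead of dividing by zero
    # Broadcasting applies each feature's min and range down its column,
    # which is the column-wise behavior the issue asks for.
    return (X - col_min) / col_range


X_demo = np.array([[0.0, 10.0],
                   [5.0, 20.0],
                   [10.0, 30.0]])
X_scaled = normalize_features(X_demo)
print(X_scaled)
```

Indexing with x[idx], as in the quoted code, selects row idx (one sample); the column for feature idx would be x[:, idx].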
  • ModuleNotFoundError: No module named 'sklearn.externals.joblib'

    ---> 33 from skrebate import ReliefF
         34 import numpy as np
         35 from sklearn.linear_model import ARDRegression as ard

    ~/opt/anaconda3/envs/causal_bi/lib/python3.8/site-packages/skrebate/__init__.py in <module>
         26
         27 from ._version import __version__
    ---> 28 from .relieff import ReliefF
         29 from .surf import SURF
         30 from .surfstar import SURFstar

    ~/opt/anaconda3/envs/causal_bi/lib/python3.8/site-packages/skrebate/relieff.py in <module>
         31 import sys
         32 from sklearn.base import BaseEstimator
    ---> 33 from sklearn.externals.joblib import Parallel, delayed
         34 from .scoring_utils import get_row_missing, ReliefF_compute_scores
         35

    ModuleNotFoundError: No module named 'sklearn.externals.joblib'

    opened by ghost 3
  • add MANIFEST.in and include_package_data

    Hi folks!

    This little PR just ensures that the LICENSE file gets included along with the distribution. If you'd prefer just the package_data line rather than a MANIFEST.in, I can re-submit, but this seems to be the more conventional approach.

    Background: working on packaging this over on conda-forge: https://github.com/conda-forge/staged-recipes/pull/4708

    Of course let me know if anyone from the team would like to co-maintain: otherwise I'll be bumping versions on a best-effort basis, and always welcome a heads-up issue!

    I looked into including the tests and saw that they bring along some rather large fixture data, so I didn't just drop those right in, but we'll run them, too, if you include them in your distribution!

    enhancement 
    opened by bollwyvl 3
  • Release 0.62 does not correspond to commit 16798854 (master head)

    Hello, my team would like to use the latest commit in the master branch (16798854e7fbca553416409be8f9ff6f71204dac). There is a release on PyPI (https://pypi.org/project/skrebate/0.62/#history) dated the same day as this commit (Feb 15th, 2021), but this release does not contain the changes from this commit. For example, the class skrebate.TURF is still skrebate.TuRF.

    Could you please give more details on when this last commit will be released?

    Thanks a lot for your help. Arnaud

    opened by arnaud-fossop 2
  • Weights are different for different runs

    Hi,

    I'm new to ML and was implementing this code to check how it works. The feature scores were constant at first, but now they change on every run. As I understand it, they should always be the same, since there is no dependency on a classifier. Could you please confirm whether this is expected?

    Code:

    features, classes = df.drop('Class', axis=1).values, df['Class'].values
    X_train, X_test, y_train, y_test = train_test_split(features, classes)

    arr = X_train.astype('float64')
    fs = ReliefF()
    fs.fit(arr, y_train)

    top_n = []
    names = []
    for feature_name, feature_score in zip(df.drop('Class', axis=1).columns, fs.feature_importances_):
        top_n.append(feature_score)
        names.append(feature_name)

    a = pd.DataFrame(top_n)
    b = pd.DataFrame(names)

    info = pd.concat([a, b], axis=1)
    info.columns = ['Score', 'Features']

    top = info.nlargest(50, 'Score')
    ft = np.array(top['Features'])
    ft

    opened by moumitam28 2
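One possible explanation for the report above, not confirmed in the thread: scikit-learn's train_test_split shuffles the rows randomly unless its random_state parameter is fixed, so each run would fit ReliefF on a different training subset. The standard-library sketch below mimics that behavior; split_indices is a hypothetical stand-in, not a scikit-learn function.

```python
import random


def split_indices(n_samples, test_fraction=0.25, seed=None):
    """Shuffle row indices and split them into train/test parts (illustrative only)."""
    rng = random.Random(seed)  # seed=None draws a fresh seed on every call
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# With a fixed seed the training rows are identical from run to run.
train_a, test_a = split_indices(100, seed=42)
train_b, test_b = split_indices(100, seed=42)
print(train_a == train_b)  # True

# Without a seed, the selected rows will generally differ between calls.
train_c, _ = split_indices(100)
train_d, _ = split_indices(100)
```

If this is the cause, passing a fixed random_state to train_test_split in the issue's code should make the training rows, and therefore the inputs to ReliefF, repeatable.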
  • DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib

    https://github.com/EpistasisLab/scikit-rebate/blob/c15a57f12113efd66d499356cec64178f0f90f54/skrebate/relieff.py#L33

    The deprecation warning in the title shows when importing ReliefF from the latest version of skrebate published to anaconda. I believe it just requires replacing the sklearn.externals.joblib import with joblib?

    Thanks

    opened by raomidi 2
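The replacement suggested in the issue above can be written defensively, so that the standalone joblib package is tried first and the old scikit-learn path is used only as a fallback. The helper below is a generic sketch of that pattern; import_first_available is a hypothetical name, and the demonstration uses standard-library modules so that it runs without either package installed.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )


# Demonstrated with standard-library names so the snippet runs anywhere.
# For the issue above, the candidate list would be
# ["joblib", "sklearn.externals.joblib"], followed by
# Parallel, delayed = module.Parallel, module.delayed
module = import_first_available(["module_that_does_not_exist", "json"])
print(module.__name__)  # json
```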
  • Use scikit-learn's interface to joblib for multiprocessing

    1. Removed the joblib dependency and replaced it with scikit-learn's customized joblib

    2. Moved the scoring instance method out of the class to make it picklable in Python 2.7

    3. Works on both Windows and Python 2.7

    Related issue #22

    Test code

    import pandas as pd
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from skrebate import ReliefF, SURF, SURFstar, MultiSURF
    from sklearn.feature_selection import RFE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    
    data_link = ('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
                'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz')
    
    genetic_data = pd.read_csv(data_link, sep='\t', compression='gzip')
    
    features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values
    
    
    if __name__ == "__main__":
        # NOTE: we put the following in a 'if __name__ == "__main__"' protected
        # block to be able to use a multi-core algorithm that also works under
        # Windows, see: http://docs.python.org/library/multiprocessing.html#windows
        # The multiprocessing module is used as the backend of joblib.Parallel
        # that is used when n_jobs != 1
    
    
        # ReliefF
    
        clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100, n_jobs=-1),
                            RandomForestClassifier(n_estimators=100))
    
        print('ReliefF',np.mean(cross_val_score(clf, features, labels)))
    
    
        # SURF
    
        clf = make_pipeline(SURF(n_features_to_select=2, n_jobs=-1),
                            RandomForestClassifier(n_estimators=100))
        print('SURF',np.mean(cross_val_score(clf, features, labels)))
    
        # SURF*
    
        clf = make_pipeline(SURFstar(n_features_to_select=2, n_jobs=-1),
                            RandomForestClassifier(n_estimators=100))
    
        print('SURF*',np.mean(cross_val_score(clf, features, labels)))
    
        # MultiSURF
    
        clf = make_pipeline(MultiSURF(n_features_to_select=2, n_jobs=-1),
                            RandomForestClassifier(n_estimators=100))
    
        print('MultiSURF',np.mean(cross_val_score(clf, features, labels)))
    
        # TURF
    
        clf = make_pipeline(RFE(ReliefF(n_jobs=-1), n_features_to_select=2),
                            RandomForestClassifier(n_estimators=100))
    
        print('TURF',np.mean(cross_val_score(clf, features, labels)))
    
    enhancement 
    opened by weixuanfu 2
  • Look into using scikit-learn's interface to joblib for multiprocessing

    @weixuanfu2016 is looking into using scikit-learn's interface to joblib for multiprocessing. This may allow scikit-rebate to have multiprocessing on Python 2 as well as Windows.

    @weixuanfu2016, please report your findings on this issue.

    enhancement 
    opened by rhiever 2
  • VLSRelief + MultiSURF run time

    Thank you for this library, it is most useful!

    I'm currently running VLSRelief with 230k features, 2350 samples, and 3 classes (it's a very noisy dataset, BTW). It has been running for 14 hours 30 minutes on a Google Cloud instance with 64 vCPUs (CPU utilization is near 100% at all times) and has not yet finished.

    Here are the settings I've used:

    selector = VLSRelief(
        core_algorithm="MultiSURF",
        num_feature_subset=200,
        size_feature_subset=1000,
        n_jobs=-1
    )
    

    Is it expected to take that long on a dataset of this size? Any suggestions on how to speed it up while keeping a high accuracy?

    Thank you.

    opened by brunofacca 1
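A back-of-the-envelope estimate can put the settings above in perspective. It assumes that each Relief run costs on the order of samples squared times features (pairwise comparisons across every feature in the subset), which matches the O(num_rows ^ 2) remark in the performance PR listed further down, and it ignores constant factors, MultiSURF-specific work, and parallel speedup.

```python
# Values taken from the issue above.
n_samples = 2350
size_feature_subset = 1000
num_feature_subset = 200

# Rough count of per-feature comparisons for one subset and for all subsets.
per_subset = n_samples ** 2 * size_feature_subset
total = per_subset * num_feature_subset

print(f"{per_subset:.2e} per subset, {total:.2e} in total")
```

Under these assumptions the run involves roughly 1.1 trillion elementary comparisons, so reducing the number of samples has a quadratic effect on run time while reducing the number or size of feature subsets has only a linear one.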
  • How to use this package without sklearn?

    I want to do some feature selection work to find candidate features. The wiki code is combined with sklearn. How can I use this package to get feature weights without an sklearn pipeline?

    opened by l0o0 1
  • ENH: Implemented performance speedup for binary ReliefF + bug fixes

    Context

    TL;DR: I was able to implement some significant performance improvements for ReliefF on binary + discrete data. For a GAMETES-generated binary-class discrete data file with 3200 rows and 400 attributes, the statistics are:

    • ReliefF w/ 10 neighbors, before: 59.05 seconds (mean) | 0.73 seconds (std) (n=4 runs)
    • ReliefF w/ 10 neighbors, after: 12.68 seconds (mean) | 0.65 seconds (std) (n=4 runs)

    Process

    I set out to see whether I could speed up the implementation of these algorithms. I had a few hypotheses:

    • The algorithms are currently overindexed on code-deduplication, with central functions in scoring_utils.py that try to serve all types of algorithms
    • We can improve parallelization of the algorithms
    • We can strategically use numba to speed up frequently called functions
    • We can optimize functions to rely more on numpy array math

    My general plan of action was as follows:

    • Write a simple performance benchmarking tool for the library.
    • Pick one algorithm and data type case (continuous vs. discrete features, etc.)
    • Create one optimized scoring function for that (algorithm / data) case using numpy / numba

    Changes in This PR

    For benchmarking I wrote a simple tool that takes some of the common test cases and runs timeit repeatedly. I drop the results of the first run to avoid measuring compilation overhead, which can be quite high for numba when we're using small testing datasets. The tool prints the results and also dumps them to a csv file in the same directory. To run this tool, use python performance_tests.py. Additionally, you can run a parameter sweep to estimate how performance varies across different parameter values with python performance_tests.py sweep.
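    The approach described in this paragraph (repeat timeit, discard the first run, report mean and standard deviation, write a csv) can be sketched with the standard library alone. This is not the PR's performance_tests.py; the function names, the csv columns, and the toy workload are assumptions for illustration.

```python
import csv
import io
import statistics
import timeit


def benchmark(func, repeats=5):
    """Time func several times and summarize all runs except the first."""
    timings = timeit.repeat(func, number=1, repeat=repeats)
    kept = timings[1:]  # drop the first run (warm-up or compilation overhead)
    return {"mean": statistics.mean(kept), "std": statistics.stdev(kept), "n": len(kept)}


def write_results_csv(results, stream):
    """Write one csv row per benchmarked case."""
    writer = csv.DictWriter(stream, fieldnames=["case", "mean", "std", "n"])
    writer.writeheader()
    for case, stats in results.items():
        writer.writerow({"case": case, **stats})


# Toy workload standing in for a call such as fitting ReliefF on a test dataset.
results = {"sum_of_squares": benchmark(lambda: sum(i * i for i in range(10_000)))}
buffer = io.StringIO()  # an in-memory stand-in for the csv file
write_results_csv(results, buffer)
print(buffer.getvalue())
```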

    I also wrote a simple shell script to automate the generation of profiling graphs using Graphviz. This is available at ./run_performance_benchmark.sh and creates a performance csv for selected test cases, a python cprofile data file, and a png performance graph.

    Using these tools I confirmed that the bulk of the runtime was in finding relevant neighbors for the relief algorithm and scoring. This is unsurprising as those functions are called roughly O(num_rows ^ 2) times and were not currently performance optimized.

    Additional Notes & Results

    In the interests of time I focused on a proof of concept for ReliefF with binary features and discrete data. The main scoring functions in scoring_utils.py were indeed very complex, with a lot of switch statements and non-performant operations. There is an elegance to the centralization, but I think dedicated scoring functions are not only clearer to a reader but also allow for dedicated optimization. My approach was to create a dedicated compute_binary_score function which uses optimized numpy functions. I also parallelized the selection of neighbors, and optimized the neighbor selection function using numba. Doing this caused me to have to refactor some code, but I aimed to try and keep the contracts the same so that the rest of the codebase would not be affected.

    Lastly, all tests were failing when I checked out the repo. I determined this was due to Imputer being deprecated from sklearn, so I updated it to the new SimpleImputer. After that change only 5 tests fail, but looking at recent repo history I don't believe all tests had been passing. It seems to be a more general bug with how the algorithms treat data with a mixture of continuous and discrete attributes. I did not attempt to fix this problem.

    opened by CaptainKanuk 0
  • Have the implementation of reliefF weighted with the prior probability of each class?

    File: scoring_utils.py
    Function: compute_score(attr, mcmap, NN, feature, inst, nan_entries, headers, class_type, X, y, labels_std, data_type, near=True)

    In compute_score, the parameter mcmap stores class frequencies, but it does not seem to be used in the normalization process.

    opened by dahaiyu 0
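For context on the question above: in the textbook multi-class ReliefF formulation, the contribution of misses from class C is weighted by P(C) / (1 - P(class of the target instance)). The sketch below computes only those weights from class counts. It illustrates the published formula, not how scoring_utils.py uses mcmap; miss_class_weights is a hypothetical helper, and it assumes at least two classes are present.

```python
from collections import Counter


def miss_class_weights(labels, target_class):
    """Prior-based weights for each miss class, given the target instance's class."""
    counts = Counter(labels)
    n_samples = len(labels)
    p_target = counts[target_class] / n_samples
    return {
        cls: (count / n_samples) / (1.0 - p_target)
        for cls, count in counts.items()
        if cls != target_class
    }


# 5 samples of class A, 3 of class B, 2 of class C.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
weights = miss_class_weights(labels, "A")
print(weights)
```

For a target instance of class A, the weights for B and C come out near 0.6 and 0.4, so they sum to one and more frequent miss classes count for more.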
  • Problem with headers in VLSRelief and TurF during the fit

    Hi,

    I ran VLSRelief on a small dataset (100 features) to check that it runs without any problems. However, after I reran it on a large dataset (>160,000 features), I got the following error at the end of the fit:

    from skrebate.vlsrelief import VLSRelief
    fs = VLSRelief(core_algorithm="ReliefF", n_features_to_select=1000, num_feature_subset=100,
                   size_feature_subset=1630, verbose=True, n_jobs=-1)
    headers = list(X)
    fs.fit(np.array(X), y_encoded, headers)

    Traceback (most recent call last):
      File "", line 1, in
      File "/Users/pmc/opt/anaconda3/lib/python3.7/site-packages/skrebate/vlsrelief.py", line 139, in fit
        self.headers_model = list(np.array(self.headers)[head_idx])
    IndexError: index 101 is out of bounds for axis 0 with size 100

    For TuRF I got a similar error at the end of the run, but in this case already with 100 features:

    fs = TuRF(core_algorithm="ReliefF", n_features_to_select=10, pct=0.7, verbose=True, n_jobs=-1)

    Traceback (most recent call last):
      File "", line 1, in
      File "/Users/pmc/opt/anaconda3/lib/python3.7/site-packages/skrebate/turf.py", line 166, in fit
        score_index = self.headers.index(i)
    ValueError: 'XYZ' is not in list

    opened by patriciamartinsconde 0
  • Questions about sample code?

    Hello, I am new to python and machine learning but need to use the library for a project. I read the website and the sample code but am still confused on how I can retrieve the features that have been (selected?) by each of the Relief algorithms.

    Apologies if the site goes over this, but I didn't see any information on this. I had a couple questions:

    1. How do we get back the features selected by each algorithm?
    2. The sample code below for the ReliefF algorithm prints a number at the end of the run. Is this number relevant to feature selection?
    import pandas as pd
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from skrebate import ReliefF
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    
    genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
                               'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
                               sep='\t', compression='gzip')
    
    features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values
    
    clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100),
                        RandomForestClassifier(n_estimators=100))
    
    print(np.mean(cross_val_score(clf, features, labels)))
    >>> 0.795
    
    

    Thanks for any help, I've been trying to figure out this code using the internet for a couple weeks now but have not really gotten anywhere

    opened by megancooper 7
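A short sketch may help with the two questions above. After fitting, the estimators expose per-feature scores through feature_importances_, as used in the first issue on this page, and selecting features amounts to ranking those scores. The example below ranks four of the scores reported there for skrebate 0.2; it uses plain Python so that it runs without the library installed.

```python
# Feature names and scores copied from the skrebate 0.2 output shown earlier.
names = ["N0", "N10", "P1", "P2"]
scores = [-0.00025, -0.001125, 0.1201375, 0.1222]

# Sort (name, score) pairs from highest to lowest score.
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# Keep the two best features, mirroring n_features_to_select=2 in the sample code.
top_two = [name for name, _ in ranked[:2]]
print(top_two)  # ['P2', 'P1']
```

As for the second question, the number printed by the sample pipeline is the mean cross-validation score of the whole pipeline (accuracy by default for a classifier), so it measures how well the classifier does on the selected features rather than being a feature score itself.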
  • TuRF doesn't work with odd number of features

    When the number of features is odd, TuRF often leaves out one feature (causing a value error at this line https://github.com/EpistasisLab/scikit-rebate/blob/master/skrebate/turf.py#L166) because segmenting of features into selected and non_selected is based on the number of features to retain: https://github.com/EpistasisLab/scikit-rebate/blob/master/skrebate/turf.py#L131

    This code fixes the problem for me:

    num_features_non_select = len(features_iter[iter_count]) - num_features
    non_select = np.array(features_iter[iter_count].argsort()[:num_features_non_select])
    
    opened by jgoecks 1
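The idea behind the fix above can be checked on a tiny example: if the number of discarded features is derived from the number actually present, the retained and discarded sets always partition the features, whether the count is odd or even. The scores and the name select below are made up for illustration and are not taken from turf.py.

```python
import numpy as np

# Hypothetical scores for an odd number of features (5).
scores = np.array([0.4, -0.1, 0.3, 0.0, 0.2])
num_features = 3  # features to retain in this iteration

order = scores.argsort()  # feature indices from lowest to highest score
num_features_non_select = len(scores) - num_features
non_select = order[:num_features_non_select]
select = order[num_features_non_select:]

print(non_select.tolist(), select.tolist())  # [1, 3] [4, 2, 0]
```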
Owner
Epistasis Lab at UPenn
Prof. Jason H. Moore's research lab at the University of Pennsylvania
A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

MDR A scikit-learn-compatible Python implementation of Multifactor Dimensionality Reduction (M

Epistasis Lab at UPenn 122 Jul 6, 2022
scikit-learn addon to operate on set/"group"-based features

skl-groups skl-groups is a package to perform machine learning on sets (or "groups") of features in Python. It extends the scikit-learn library with s

Danica J. Sutherland 41 Apr 6, 2022
open-source feature selection repository in python

scikit-feature Feature selection repository scikit-feature in Python. scikit-feature is an open-source feature selection repository in Python develope

Jundong Li 1.3k Jan 5, 2023
Python implementations of the Boruta all-relevant feature selection method.

boruta_py This project hosts Python implementations of the Boruta all-relevant feature selection method. Related blog post How to install Install with

null 1.2k Jan 4, 2023
A fast xgboost feature selection algorithm

BoostARoota A Fast XGBoost Feature Selection Algorithm (plus other sklearn tree-based classifiers) Why Create Another Algorithm? Automated processes l

Chase DeHan 187 Dec 22, 2022
An open source python library for automated feature engineering

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to

alteryx 6.4k Jan 5, 2023
a feature engineering wrapper for sklearn

Few Few is a Feature Engineering Wrapper for scikit-learn. Few looks for a set of feature transformations that work best with a specified machine lear

William La Cava 47 Nov 18, 2022
zoofs is a Python library for performing feature selection using a variety of nature-inspired wrapper algorithms. The algorithms range from swarm-intelligence to physics-based to evolutionary. It is an easy-to-use, flexible, and powerful tool to reduce your feature size.

Jaswinder Singh 168 Dec 30, 2022
Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

Manuel Calzolari 260 Dec 14, 2022
Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn.

Repository Status for Scikit-learn Live webpage Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn. Running local

Thomas J. Fan 6 Dec 27, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 5, 2022
Donors data of Tamil Nadu Chief Ministers Relief Fund scrapped from https://ereceipt.tn.gov.in/cmprf/Interface/CMPRF/MonthWiseReport

Tamil Nadu Chief Minister's Relief Fund Donors Scrapped data from https://ereceipt.tn.gov.in/cmprf/Interface/CMPRF/MonthWiseReport Scrapper scrapper.p

Arunmozhi 5 May 18, 2021
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

null 4.9k Dec 31, 2022
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

null 3.8k Feb 13, 2021
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

null 4.9k Jan 3, 2023
Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

null 213 Jan 2, 2023
A scikit-learn-compatible module for estimating prediction intervals.

|Anaconda|_ MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn

SimAI 584 Dec 27, 2022
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Chris Santiago 0 Mar 30, 2022