A Python library for dynamic classifier and ensemble selection

Overview

DESlib

DESlib is an easy-to-use ensemble learning library focused on the implementation of state-of-the-art techniques for dynamic classifier and ensemble selection. The library is based on scikit-learn, using the same method signatures: fit, predict, predict_proba and score. All dynamic selection techniques were implemented according to the definitions from [1].

Dynamic Selection:

Dynamic Selection (DS) refers to techniques in which the base classifiers are selected dynamically at test time, according to each new sample to be classified. Only the most competent classifier, or an ensemble of the most competent classifiers, is selected to predict the label of a specific test sample. The rationale for these techniques is that not every classifier in the pool is an expert in classifying all unknown samples; rather, each base classifier is an expert in a different local region of the feature space.

DS is one of the most promising approaches in Multiple Classifier Systems (MCS), due to an increasing number of empirical studies reporting superior performance over static combination methods. Such techniques achieve better classification performance especially when dealing with small-sized and imbalanced datasets.

Installation:

The package can be installed using pip:

Stable version:

pip install deslib

Latest version (under development):

pip install git+https://github.com/scikit-learn-contrib/DESlib

Dependencies:

DESlib is tested to work with Python 3.5, 3.6 and 3.7. The dependency requirements are:

  • scipy (>=1.4.0)
  • numpy (>=1.17.0)
  • scikit-learn (>=0.20.0)

These dependencies are automatically installed using the pip commands above.

Examples:

Here we show an example using the KNORA-E method with a random forest as the pool of classifiers:

from sklearn.ensemble import RandomForestClassifier
from deslib.des.knora_e import KNORAE

# Train a pool of 10 classifiers
pool_classifiers = RandomForestClassifier(n_estimators=10)
pool_classifiers.fit(X_train, y_train)

# Initialize the DES model
knorae = KNORAE(pool_classifiers)

# Preprocess the Dynamic Selection dataset (DSEL)
knorae.fit(X_dsel, y_dsel)

# Predict new examples:
knorae.predict(X_test)

The library accepts any list of classifiers (compatible with scikit-learn) as input, including a list containing different classifier models (heterogeneous ensembles). More examples on how to use the API can be found in the documentation and in the Examples directory.
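
For instance, a heterogeneous pool is simply a list of fitted scikit-learn estimators. A minimal sketch, assuming X_train, y_train, X_dsel, y_dsel and X_test are defined as in the example above:

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from deslib.dcs.ola import OLA

# Base classifiers must be fitted before being passed to a DS method
pool = [LogisticRegression().fit(X_train, y_train),
        DecisionTreeClassifier().fit(X_train, y_train),
        GaussianNB().fit(X_train, y_train)]

ola = OLA(pool_classifiers=pool)  # any DES/DCS technique accepts such a list
ola.fit(X_dsel, y_dsel)
ola.predict(X_test)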

Organization:

The library is divided into four modules (an import sketch covering each module follows the technique lists below):

  1. deslib.des: Implementation of DES techniques (Dynamic Ensemble Selection).
  2. deslib.dcs: Implementation of DCS techniques (Dynamic Classifier Selection).
  3. deslib.static: Implementation of baseline ensemble methods.
  4. deslib.util: A collection of aggregation functions and diversity measures for ensembles of classifiers.
  • DES techniques currently available are:
    1. META-DES [7] [8] [15]
    2. K-Nearest-Oracle-Eliminate (KNORA-E) [3]
    3. K-Nearest-Oracle-Union (KNORA-U) [3]
    4. Dynamic Ensemble Selection-Performance (DES-P) [12]
    5. K-Nearest-Output Profiles (KNOP) [9]
    6. Randomized Reference Classifier (DES-RRC) [10]
    7. DES Kullback-Leibler Divergence (DES-KL) [12]
    8. DES-Exponential [21]
    9. DES-Logarithmic [11]
    10. DES-Minimum Difference [21]
    11. DES-Clustering [16]
    12. DES-KNN [16]
    13. DES Multiclass Imbalance (DES-MI) [24]
  • DCS techniques currently available are:
    1. Modified Classifier Rank (Rank) [19]
    2. Overall Local Accuracy (OLA) [4]
    3. Local Class Accuracy (LCA) [4]
    4. Modified Local Accuracy (MLA) [23]
    5. Multiple Classifier Behaviour (MCB) [5]
    6. A Priori Selection (A Priori) [6]
    7. A Posteriori Selection (A Posteriori) [6]
  • Baseline methods:
    1. Oracle [20]
    2. Single Best [2]
    3. Static Selection [2]
    4. Stacked Classifier [25]
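
For orientation, a quick import sketch with one representative name from each of the four modules (module paths as in the DESlib API documentation):

from deslib.des.knora_u import KNORAU              # DES technique
from deslib.dcs.mcb import MCB                     # DCS technique
from deslib.static.single_best import SingleBest   # baseline ensemble method
from deslib.util.diversity import double_fault     # diversity measure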

Variations of each DES technique are also provided by the library (e.g., different versions of the META-DES framework).

The following options are also available for all methods (a short sketch follows this list):
  • For DES techniques, the combination of the selected classifiers can be done as Dynamic Selection (majority voting), Dynamic Weighting (weighted majority voting) or a Hybrid (selection + weighting).
  • For all DS techniques, Dynamic Frienemy Pruning (DFP) [13] can be used.
  • For all DS techniques, Instance Hardness (IH) can be used to classify easy samples with a KNN and hard samples using the DS technique. More details on IH and Dynamic Selection can be found in [14].
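
A minimal sketch combining these options on one DES technique, assuming a fitted pool as in the examples above (mode, DFP, with_IH and IH_rate are the corresponding DESlib parameters):

from deslib.des.knora_u import KNORAU

knorau = KNORAU(pool_classifiers=pool,
                mode='hybrid',   # selection + weighting
                DFP=True,        # apply Dynamic Frienemy Pruning
                with_IH=True,    # use a KNN for easy samples
                IH_rate=0.3)     # instance hardness threshold
knorau.fit(X_dsel, y_dsel)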

As an optional requirement, the fast KNN implementation from FAISS can be used to speed up the computation of the region of competence.
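
For example, switching the region of competence search to FAISS is done through the knn_classifier parameter (a minimal sketch, assuming faiss is installed and a fitted pool as above):

from deslib.des.knora_e import KNORAE

# 'faiss' selects the FAISS-based KNN wrapper instead of sklearn's KNN
knorae = KNORAE(pool_classifiers=pool, knn_classifier='faiss')
knorae.fit(X_dsel, y_dsel)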

Citation

If you use DESlib in a scientific paper, please consider citing the following paper:

Rafael M. O. Cruz, Luiz G. Hafemann, Robert Sabourin and George D. C. Cavalcanti. DESlib: A Dynamic ensemble selection library in Python. Journal of Machine Learning Research, 21(8):1-5, 2020.

@article{JMLR:v21:18-144,
    author  = {Rafael M. O. Cruz and Luiz G. Hafemann and Robert Sabourin and George D. C. Cavalcanti},
    title   = {DESlib: A Dynamic ensemble selection library in Python},
    journal = {Journal of Machine Learning Research},
    year    = {2020},
    volume  = {21},
    number  = {8},
    pages   = {1-5},
    url     = {http://jmlr.org/papers/v21/18-144.html}
}

References:

[1] : R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti, "Dynamic classifier selection: Recent advances and perspectives," Information Fusion, vol. 41, pp. 195–216, 2018.
[2] : A. S. Britto, R. Sabourin, L. E. S. de Oliveira, Dynamic selection of classifiers – a comprehensive review, Pattern Recognition 47 (11) (2014) 3665–3680.
[3] : A. H. R. Ko, R. Sabourin, A. S. Britto, Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (2008) 1735–1748.
[4] : K. Woods, W. P. Kegelmeyer, Jr., K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 405–410.
[5] : G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier behaviour, Pattern Recognition 34 (2001) 1879–1881.
[6] : L. Didaci, G. Giacinto, F. Roli, G. L. Marcialis, A study on the performances of dynamic classifier selection based on local accuracy estimation, Pattern Recognition 38 (11) (2005) 2188–2191.
[7] : R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, T. I. Ren, META-DES: A dynamic ensemble selection framework using meta-learning, Pattern Recognition 48 (5) (2015) 1925–1935.
[8] : R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, META-DES.H: A dynamic ensemble selection technique using meta-learning and a dynamic weighting approach, in: International Joint Conference on Neural Networks (IJCNN), 2015, pp. 1–8.
[9] : P. R. Cavalin, R. Sabourin, C. Y. Suen, Dynamic selection approaches for multiple classifier systems, Neural Computing and Applications 22 (3-4) (2013) 673–688.
[10] : T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognition 44 (2011) 2656–2668.
[11] : T. Woloszynski, M. Kurzynski, A measure of competence based on randomized reference classifier for dynamic ensemble selection, in: International Conference on Pattern Recognition (ICPR), 2010, pp. 4194–4197.
[12] : T. Woloszynski, M. Kurzynski, P. Podsiadlo, G. W. Stachowiak, A measure of competence based on random classification for dynamic ensemble selection, Information Fusion 13 (3) (2012) 207–213.
[13] : D. V. R. Oliveira, G. D. C. Cavalcanti, R. Sabourin, Online pruning of base classifiers for dynamic ensemble selection, Pattern Recognition 72 (2017) 44–58.
[14] : R. M. O. Cruz, H. H. Zakane, R. Sabourin, G. D. C. Cavalcanti, Dynamic ensemble selection VS K-NN: Why and when dynamic selection obtains higher classification performance?, in: International Conference on Image Processing Theory, Tools and Applications (IPTA), 2017.
[15] : R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, META-DES.Oracle: Meta-learning and feature selection for dynamic ensemble selection, Information Fusion 38 (2017) 84–103.
[16] : R. G. F. Soares, A. Santana, A. M. P. Canuto, M. C. P. de Souto, Using accuracy and diversity to select classifiers to build ensembles, in: Proceedings of the International Joint Conference on Neural Networks, 2006, pp. 1310–1316.
[17] : L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.
[18] : C. A. Shipp, L. I. Kuncheva, Relationships between combination methods and measures of diversity in combining classifiers, Information Fusion 3 (2) (2002) 135–148.
[19] : M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for handprinted digit recognition, in: International Conference on Document Analysis and Recognition, 1993, pp. 163–166.
[20] : L. I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2) (2002) 281–286.
[21] : B. Antosik, M. Kurzynski, New measures of classifier competence – heuristics and application to the design of multiple classifier systems, in: Computer Recognition Systems 4, 2011, pp. 197–206.
[22] : M. R. Smith, T. Martinez, C. Giraud-Carrier, An instance level analysis of data complexity, Machine Learning 95 (2) (2014) 225–256.
[23] : P. C. Smits, Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection, IEEE Transactions on Geoscience and Remote Sensing 40 (4) (2002) 801–813.
[24] : S. García, Z.-L. Zhang, A. Altalhi, S. Alshomrani, F. Herrera, Dynamic ensemble selection for multi-class imbalanced datasets, Information Sciences 445 (2018) 22–37.
[25] : D. H. Wolpert, Stacked generalization, Neural Networks 5 (2) (1992) 241–259.
Comments
  • index is out of bounds

    I was trying to run StaticSelection on my dataset using Jupyter Notebook.

I used a loop to run on 40 different datasets (1 dataset = 1 subject).

    For other classifiers (SVM, KNN, etc), nothing was wrong.

    But for StaticSelection, I got the error below.

    It seems it has an issue with the test set and its label.

Why did this happen only with StaticSelection (and also SingleBest)?

    IndexError                                Traceback (most recent call last)
    <ipython-input-82-a2f7f60d152c> in <module>
         76 
         77 result_stacked_user = model_stacked.score(X_test, y_test)
    ---> 78 result_static_selection_user = model_static_selection.score(X_test, y_test)
         79 #result_single_best_user = model_single_best.score(X_test, y_test)
    
    
    ~\Anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
        355         """
        356         from .metrics import accuracy_score
    --> 357         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        358 
        359 
    
    ~\Anaconda3\lib\site-packages\deslib\static\static_selection.py in predict(self, X)
        113         predicted_labels = majority_voting(self.ensemble_, X).astype(int)
        114 
    --> 115         return self.classes_.take(predicted_labels)
        116 
        117     def _check_is_fitted(self):
    
    IndexError: index 11 is out of bounds for size 2
    
    opened by jayahm 18
  • Results of the same methods are not the same

    Hi,

    I faced some issue with my experiments.

Even though I have set the random_state, the results do not seem to be the same (for several methods).

The first and second experiments were run in two different Jupyter files.

Is this normal, or could it be some mistake?

Also, the results of the single base classifiers are just fine; only the DS methods differ.

    opened by jayahm 13
  • "invalid value encountered in true_divide"

    Hi,

    I got the following warnings when running "Probabilistic DES"

    C:\Users\razai0002\Anaconda3\lib\site-packages\deslib\des\probabilistic\base.py:133: RuntimeWarning: invalid value encountered in true_divide
      competences = competences / sum_potential.reshape(-1, 1)
    
    C:\Users\razai0002\Anaconda3\lib\site-packages\deslib\des\probabilistic\base.py:164: 
    RuntimeWarning: invalid value encountered in greater
      selected_classifiers = (competences > selection_threshold)
    

    Where should I check to fix this?

    opened by jayahm 10
  • How can we create a pool of classifiers?

As you have mentioned in your examples, a BaggingClassifier or RandomForest classifier is considered a pool of classifiers in itself.

I am wondering whether it is possible to create a pool of classifiers that includes traditional ensemble methods like RF and AdaBoost in combination with single classifiers like SVM and kNN?

    Thanks

    opened by sara-eb 10
  • Fitting DESClustering on X_dsel raises a value error when I have a pool of (pre-trained) Random Forest classifier models

I have trained the random forest on X_train in advance and load the model to create the pool of classifiers:

    model_rf = load(base_model_dir + 'model_rf800.joblib')
    pool_classifiers = [model_rf]  
    

It seems that other DS methods (e.g., OLA, MLA, DESP) fit X_dsel successfully; however, DESClustering raises the value error:

    print("Fitting DES-Clustering on X_DSEL dataset")
    kmeans = KMeans(n_clusters=80, random_state = rng)
    desclustering = DESClustering(pool_classifiers=pool_classifiers, random_state = rng, clustering = kmeans)
    

which is raised at this line:

    ValueError                                Traceback (most recent call last)
    <ipython-input-10-6583b8d75519> in <module>
         11 desclustering = DESClustering(pool_classifiers=pool_classifiers, random_state = rng, clustering = kmeans)
         12 
    ---> 13 desclustering.fit(X_dsel, y_dsel)
         14 end = time.clock()
         15 print(" DES-Clustering fitting time for 5 patients in DSEL = {}".format(end))
    
    ~/deslib-env/lib/python3.6/site-packages/deslib/des/des_clustering.py in fit(self, X, y)
        132         self.J_ = int(np.ceil(self.n_classifiers_ * self.pct_diversity))
        133 
    --> 134         self._check_parameters()
        135 
        136         if self.clustering is None:
    
    ~/deslib-env/lib/python3.6/site-packages/deslib/des/des_clustering.py in _check_parameters(self)
        374         if self.N_ <= 0 or self.J_ <= 0:
        375             raise ValueError("The values of N_ and J_ should be higher than 0"
    --> 376                              "N_ = {}, J_= {} ".format(self.N_, self.J_))
        377         if self.N_ < self.J_:
        378             raise ValueError(
    
    ValueError: The values of N_ and J_ should be higher than 0N_ = 0, J_= 1 
    
1. Why is this happening when a Random Forest is given as the pool to DESClustering?

2. Is it because I have a pre-trained RF model and am loading it as the pool? Is there any difference between loading a pre-trained classifier and training the pool, as shown here:

    pool_classifiers = BaggingClassifier(Perceptron(max_iter=100), random_state=rng)
    pool_classifiers.fit(X_train, y_train)
    

    Your expert opinion is really appreciated. Thanks

    opened by sara-eb 8
  • Warning

I'm getting these warnings in Jupyter Notebook:

    ...\Anaconda3\lib\site-packages\sklearn\model_selection_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning. warnings.warn(CV_WARNING, FutureWarning)

    ...\Anaconda3\lib\site-packages\sklearn\calibration.py:455: RuntimeWarning: invalid value encountered in multiply TEP_minus_T1P = P * (T * E - T1)

    ...\Anaconda3\lib\site-packages\sklearn\calibration.py:455: RuntimeWarning: invalid value encountered in multiply TEP_minus_T1P = P * (T * E - T1)

    ...\PhD\Anaconda3\lib\site-packages\sklearn\calibration.py:455: RuntimeWarning: invalid value encountered in multiply TEP_minus_T1P = P * (T * E - T1)

    ...\Anaconda3\lib\site-packages\sklearn\model_selection_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning. warnings.warn(CV_WARNING, FutureWarning)

    ...\Anaconda3\lib\site-packages\sklearn\svm\base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. "the number of iterations.", ConvergenceWarning)

    ...\Anaconda3\lib\site-packages\sklearn\svm\base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. "the number of iterations.", ConvergenceWarning)

    ...\Anaconda3\lib\site-packages\sklearn\svm\base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. "the number of iterations.", ConvergenceWarning)

    ...\Anaconda3\lib\site-packages\sklearn\svm\base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning. "avoid this warning.", FutureWarning)

    ...\Anaconda3\lib\site-packages\deslib\des\probabilistic\base.py:191: RuntimeWarning: invalid value encountered in true_divide competences = competences / sum_potential.reshape(-1, 1)

    ...\Anaconda3\lib\site-packages\deslib\des\probabilistic\base.py:222: RuntimeWarning: invalid value encountered in greater selected_classifiers = (competences > selection_threshold)

    opened by jayahm 8
  • Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

    Hi,

I got this error for the Single Best method.

Other methods so far were fine, but for SB it produced this error:

    ~\anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
        367         """
        368         from .metrics import accuracy_score
    --> 369         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        370 
        371 
    
    ~\anaconda3\lib\site-packages\deslib\static\single_best.py in predict(self, X)
        110         self._check_is_fitted()
        111         predicted_labels = self._encode_base_labels(self.best_clf_.predict(X))
    --> 112         return self.classes_.take(predicted_labels)
        113 
        114     def predict_proba(self, X):
    
    TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'
    
    opened by jayahm 7
  • Problems with probabilities and outputs

    I have been using the Deslib library to classify a binary problem and used various DES algorithms. When I applied the model on the test dataset, I ran into some problems.

    According to my understanding, for a binary classification problem, 0.5 is the default threshold. If the "predict_proba" is higher than 0.5, then it should be classified as 1, otherwise 0.

However, for some instances when the predicted probabilities were higher than 0.5, I got a 0 (instead of 1), and when the predicted probabilities were less than 0.5, I got a 1 (instead of 0).

    I have never seen this kind of phenomenon whilst using the scikit-learn library, therefore I would like to know if this is normal behaviour in DES?

    Please note that I am using the latest version (0.3.5) of DESlib.

    opened by atifov 6
  • Random subspace method

    Hi

I was trying to generate a pool of classifiers based on the random subspace method using BaggingClassifier, but got this error:

    I used exactly your example code but only modified the bagging part.

    Here is the code: https://www.dropbox.com/s/z4iseijtawb53ey/plot_comparing_dynamic_static.ipynb?dl=0

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-2-43bece03c684> in <module>
         70 scores = []
         71 for method, name in zip(methods, names):
    ---> 72     method.fit(X_dsel, y_dsel)
         73     scores.append(method.score(X_test, y_test))
         74     print("Classification accuracy {} = {}"
    
    ~\anaconda3\lib\site-packages\deslib\static\single_best.py in fit(self, X, y)
         80             y_encoded = self.enc_.transform(y)
         81 
    ---> 82         performances = self._estimate_performances(X, y_encoded)
         83         self.best_clf_index_ = np.argmax(performances)
         84         self.best_clf_ = self.pool_classifiers_[self.best_clf_index_]
    
    ~\anaconda3\lib\site-packages\deslib\static\single_best.py in _estimate_performances(self, X, y)
         90         for idx, clf in enumerate(self.pool_classifiers_):
         91             scorer = check_scoring(clf, self.scoring)
    ---> 92             performances[idx] = scorer(clf, X, y)
         93         return performances
         94 
    
    ~\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in _passthrough_scorer(estimator, *args, **kwargs)
        369 def _passthrough_scorer(estimator, *args, **kwargs):
        370     """Function that wraps estimator.score"""
    --> 371     return estimator.score(*args, **kwargs)
        372 
        373 
    
    ~\anaconda3\lib\site-packages\sklearn\base.py in score(self, X, y, sample_weight)
        367         """
        368         from .metrics import accuracy_score
    --> 369         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        370 
        371 
    
    ~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in predict(self, X, check_input)
        417         """
        418         check_is_fitted(self)
    --> 419         X = self._validate_X_predict(X, check_input)
        420         proba = self.tree_.predict(X)
        421         n_samples = X.shape[0]
    
    ~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
        389                              "match the input. Model n_features is %s and "
        390                              "input n_features is %s "
    --> 391                              % (self.n_features_, n_features))
        392 
        393         return X
    
    ValueError: Number of features of the model must match the input. Model n_features is 10 and input n_features is 20 
    
    opened by jayahm 6
  • Problem with predicting and scoring when Faiss KNN is used for the kNN calculation in the DS models

    Hi everyone,

I used the Faiss method for the kNN calculation. I created and saved the DS models, then fitted the model on the DSEL dataset. However, once I try to do prediction and scoring on the test set, I am facing an error:

     score = knorae.score(X_test, y_test)
      File "/home/esara/deslib-env/lib/python3.6/site-packages/sklearn/base.py", line 357, in score
        return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
      File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/base.py", line 440, in predict
        distances, neighbors = self._get_region_competence(X_DS)
      File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/base.py", line 381, in _get_region_competence
        return_distance=True)
      File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/util/faiss_knn_wrapper.py", line 112, in kneighbors
        dist, idx = self.index_.search(X, n_neighbors)
    AttributeError: 'numpy.ndarray' object has no attribute 'search'
    

    Do you have any idea about that? Your help and expert opinion is appreciated in advance.

    opened by sara-eb 6
  • Allow approximate NN searches on the FaissKNNClassifier class

The current FaissKNNClassifier class only performs brute-force search. The class should also allow approximate methods in order to speed up inference on large datasets.

    More info about approximate search on faiss can be found in the following links: https://github.com/facebookresearch/faiss/wiki/Getting-started and https://github.com/facebookresearch/faiss/wiki/Faster-search
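
For illustration, a minimal sketch of an approximate (IVF) index next to the brute-force one, using only the public faiss API; the data and parameters below are made up:

import numpy as np
import faiss

d = 64                                           # feature dimensionality
xb = np.random.rand(10000, d).astype('float32')  # toy database vectors

quantizer = faiss.IndexFlatL2(d)                 # brute-force coarse quantizer
index = faiss.IndexIVFFlat(quantizer, d, 100)    # 100 inverted lists (cells)
index.train(xb)                                  # learn the cell centroids
index.add(xb)
index.nprobe = 8                                 # cells visited per query
dist, idx = index.search(xb[:5], 7)              # 7 approximate nearest neighbors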

    enhancement 
    opened by Menelau 6
  • Is DES applicable for Graph Convolutional Networks (GCNs)?

Hi, thank you for your great contribution to science. I have a question: can DESlib be applied to GCNs, or is there any other library similar to DESlib that has extended its algorithms for dynamic selection of GCNs?

    opened by qm-intel 2
  • Multi-dataset features in DESlib

This PR allows DESlib to be used with multiple datasets. In simple terms, instead of using data with the shape (samples x dimensions), we can use data with the shape (datasets x samples x dimensions). This is useful when we have different types of data for the same samples.

It's not compatible with all techniques (some static techniques are missing), but it works for a lot of them. Some checks in base.py have been disabled to make the feature work. Finally, some code has been left unfinished or blank, as not all normal technique features were required in the original research this came from.

    opened by pierremarcthibault 0
  • Proper way of fitting classifiers before creating a heterogeneous pool

Hey, I'm working on a research paper focused on building a binary classification model in the biomedical domain. The dataset comprises approximately 800 data points. Let's say I want to feed a heterogeneous pool of classifiers to the dynamic selection methods. By following the instructions in the examples, I've found two different ways of splitting the dataset and fitting the base classifiers of the pool.

1. Split into train/test (e.g., 75/25) and then split the training portion into train/DSEL (e.g., 50/50). In this random forest example, the RF is fitted on the 75% training portion and the DS methods on the 50% DSEL portion (sketched below).
2. In all the other examples, the 50% training portion is used to fit the classifiers and the 50% DSEL portion is used to fit the DS methods.
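
For reference, a minimal sketch of the first strategy, assuming X and y hold the full dataset (split ratios as described above):

from sklearn.model_selection import train_test_split

# 75/25 train/test, then 50/50 train/DSEL within the training portion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train, test_size=0.5)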

Furthermore, I wanted to point out this tip taken from the tutorial:

    An important point here is that in case of small datasets or when the base classifier models in the pool are weak estimators such as Decision Stumps or Perceptrons, an overlap between the training data and DSEL may be beneficial for achieving better performance.

That seems to be my case, as my dataset is rather small compared to most datasets in the ML domain. Hence, I was thinking of fitting my base classifiers on the 75% part and then leveraging some overlap to get better performance (and this is really the case! In fact, overlapping leads to a median AUC of 0.76 whereas non-overlapping gives 0.71).

What would be the best way of dealing with the problem?

    opened by francescopisu 4
  • Issue with using XGBClassifier

When I use XGBClassifier (from the XGBoost library) with any DES or DCS algorithm, I am getting a feature_names mismatch error. I have pooled several other classifiers successfully. This error only arises when XGBoost is included in the pooled classifiers. Moreover, I have successfully been able to use XGBoost in the DESlib Stacked Classifier algorithm. Please note, as per previous advice, I have installed the latest version (0.3.5) of the library using the command:

    pip install git+https://github.com/scikit-learn-contrib/DESlib

    Following is the main code:

# Imports added for completeness (module paths assumed from the respective libraries)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from imblearn.ensemble import RUSBoostClassifier
from xgboost import XGBClassifier
from deslib.des.probabilistic import RRC

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifiers = [LogisticRegression(), RandomForestClassifier(), RUSBoostClassifier(), XGBClassifier(), DecisionTreeClassifier()]

for c in classifiers:
    c.fit(X_train, Y_train)

model = RRC(pool_classifiers=classifiers, random_state=0)

model.fit(X_train, Y_train)
    
    

    ValueError: feature_names mismatch:

    opened by atifov 1
  • Faiss Powered Multi-Label Classification

    Do any of the implementations of KNN within this repo extend to the multi-label case? That is to say, usable in the context of multi-object detection from neural network embeddings or similar.

I was particularly hopeful that KNNE might work, but it's not obvious from the relevant literature.

    Thanks a lot.

    opened by GeorgePearse 2