A library of extension and helper modules for Python's data analysis and machine learning libraries.

Overview


Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks.


Sebastian Raschka 2014-2021


Installing mlxtend

PyPI

To install mlxtend, just execute

pip install mlxtend  

Alternatively, you could download the package manually from the Python Package Index https://pypi.python.org/pypi/mlxtend, unzip it, navigate into the package, and use the command:

python setup.py install

Conda

If you use conda, to install mlxtend just execute

conda install -c conda-forge mlxtend 

Dev Version

The mlxtend version on PyPI may always be one step behind; you can install the latest development version from the GitHub repository by executing

pip install git+https://github.com/rasbt/mlxtend.git#egg=mlxtend

Or, you can fork the GitHub repository from https://github.com/rasbt/mlxtend and install mlxtend from your local drive via

python setup.py install


Examples

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import itertools
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions

# Initializing Classifiers
clf1 = LogisticRegression(random_state=0)
clf2 = RandomForestClassifier(random_state=0)
clf3 = SVC(random_state=0, probability=True)
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[2, 1, 1], voting='soft')

# Loading some example data
X, y = iris_data()
X = X[:,[0, 2]]

# Plotting Decision Regions
gs = gridspec.GridSpec(2, 2)
fig = plt.figure(figsize=(10, 8))

for clf, lab, grd in zip([clf1, clf2, clf3, eclf],
                         ['Logistic Regression', 'Random Forest', 'RBF kernel SVM', 'Ensemble'],
                         itertools.product([0, 1], repeat=2)):
    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
    plt.title(lab)
plt.show()


If you use mlxtend as part of your workflow in a scientific publication, please consider citing the mlxtend repository with the following DOI:

@article{raschkas_2018_mlxtend,
  author       = {Sebastian Raschka},
  title        = {MLxtend: Providing machine learning and data science 
                  utilities and extensions to Python’s  
                  scientific computing stack},
  journal      = {The Journal of Open Source Software},
  volume       = {3},
  number       = {24},
  month        = apr,
  year         = 2018,
  publisher    = {The Open Journal},
  doi          = {10.21105/joss.00638},
  url          = {http://joss.theoj.org/papers/10.21105/joss.00638}
}
  • Raschka, Sebastian (2018) MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack. J Open Source Softw 3(24).

License

  • This project is released under a permissive new BSD open source license (LICENSE-BSD3.txt) and is commercially usable. There is no warranty, not even for merchantability or fitness for a particular purpose.
  • In addition, you may use, copy, modify and redistribute all artistic creative works (figures and images) included in this distribution under the directory according to the terms and conditions of the Creative Commons Attribution 4.0 International License. See the file LICENSE-CC-BY.txt for details. (Computer-generated graphics such as the plots produced by matplotlib fall under the BSD license mentioned above).

Contact

I received a lot of feedback and questions about mlxtend recently, and I thought that it would be worthwhile to set up a public communication channel. Before you write an email with a question about mlxtend, please consider posting it here since it can also be useful to others! Please join the Google Groups Mailing List!

If Google Groups is not for you, please feel free to write me an email or consider filing an issue on GitHub's issue tracker for new feature requests or bug reports. In addition, I set up a Gitter channel for live discussions.

Issues
  • Group time series split

    Description

    Adds a group time series cross-validator implementation, with tests at 100% coverage using pytest. I decided to open the pull request before writing the documentation and the changelog entry, so that we can discuss the current implementation and the next steps.

    Related issues or pull requests

    Fixes #910

    Pull Request Checklist

    • [x] Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
    • [X] Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
    • [x] Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
    • [X] Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
    • [X] Checked for style issues by running flake8 ./mlxtend
    opened by labdmitriy 40
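
To illustrate the idea behind a group-aware time series splitter, here is a minimal NumPy sketch. It is not the implementation from this pull request: the function name `group_time_series_splits` and its parameters are hypothetical, and it assumes that groups appear in chronological order. A fixed window of training groups followed by test groups slides forward one group at a time, so a group is never divided between train and test.

```python
import numpy as np


def group_time_series_splits(groups, n_train_groups, n_test_groups):
    """Yield (train_idx, test_idx) pairs; hypothetical helper for illustration."""
    groups = np.asarray(groups)
    # Unique groups in order of first appearance (assumed chronological).
    _, first_seen = np.unique(groups, return_index=True)
    ordered = groups[np.sort(first_seen)]
    window = n_train_groups + n_test_groups
    for start in range(len(ordered) - window + 1):
        train_groups = ordered[start:start + n_train_groups]
        test_groups = ordered[start + n_train_groups:start + window]
        train_idx = np.flatnonzero(np.isin(groups, train_groups))
        test_idx = np.flatnonzero(np.isin(groups, test_groups))
        yield train_idx, test_idx


groups = [0, 0, 1, 1, 2, 2, 3]
splits = list(group_time_series_splits(groups, n_train_groups=2, n_test_groups=1))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every training group precedes every test group within a split, which is what prevents leakage from the future into the past.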
  • Adds plot_decision_region_slices function

    Description

    Adds plot_decision_region_slices function to mlxtend/plotting/decision_regions.py

    Related issues or pull requests

    See issue #188

    Pull Request requirements

    • [x] Added appropriate unit test functions in the ./mlxtend/*/tests directories
    • [x] Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
    • [ ] Checked the test coverage by running nosetests ./mlxtend --with-coverage
    • [x] Checked for style issues by running flake8 ./mlxtend
    • [x] Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
    • [x] Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
    • [x] Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
    opened by jrbourbeau 27
  • Difference between RFECV and SFS ?

    Hi there,

    I used scikit-learn's RFECV and mlxtend's SFS (backward) on the same data, with the same classifier, same cv, same scorer, and so on. I also ran a third version with sample weights passed to SFS's estimator.

    [Figure: RFE]

    I'm puzzled to see so many discrepancies. Shouldn't the largest subsets, at least, have more or less the same CV score?

    From my understanding, RFECV and backward SFS should work the same way, except that the first selects subsets based on feature importance and the second based on the provided metric.

    Thanks for any input.

    Question 
    opened by lschneidpro 26
  • Module not found

    I'm new to python, so apologies if this is a silly question.

    I'm trying to use mlxtend, and have installed it using pip. Pip confirms that it is installed (when I type "pip install mlxtend" it notes that the requirement is already satisfied). However, when I try and import mlxtend in python using "import mlxtend as ml", I get the error: "ModuleNotFoundError: No module named 'mlxtend'". I used the same process for installing and importing pandas and numpy, and they both worked. Any advice?

    I should note that I have resorted to dropping in the specific code I need from mlxtend (apriori and association rules), which is working, but hardly a good long term strategy!

    I'm using python version 2.7

    Thanks!
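
A frequent cause of this symptom is that `pip` and `python` belong to different interpreters. Whether that applies to this report is an assumption. The following stdlib-only sketch (Python 3; the helper name `is_importable` is made up for illustration) prints which interpreter is running and whether it can locate the package.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can locate the named module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is actually running this script.
print("Interpreter:", sys.executable)
print("mlxtend importable:", is_importable("mlxtend"))
# If this prints False, install with the same interpreter:
#     <path printed above> -m pip install mlxtend
```

Running pip as `python -m pip` ties the installation to the interpreter that will later do the import.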

    Question 
    opened by Minitabb 25
  • Out of fold stacking regressor

    Description

    I've implemented a new ensemble regressor, for out-of-fold stacking. It's a different approach for training base regressors, that better avoids overfitting. For description of algorithm, see:

    https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/#Stacking

    I've only implemented the algorithm and some basic tests, and have not written documentation yet. Right now I'd like to know whether you are interested in including this algorithm in the mlxtend code base before I spend more time on it.

    If you're interested in including this, I'm happy to iterate on review and code!

    Thanks for taking a look at this!
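
For readers unfamiliar with out-of-fold stacking, the general idea can be sketched with scikit-learn alone. This is not the code from this pull request: the synthetic data, the choice of base models, and the fold count are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each column holds one base model's out-of-fold predictions: every row is
# predicted by a model that never saw that row during training.
meta_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=cv) for model in base_models]
)

# The meta-regressor learns how to combine the base predictions.
meta_model = LinearRegression().fit(meta_features, y)

# For prediction on new data, the base models are refit on the full training set.
fitted = [model.fit(X, y) for model in base_models]
stacked_pred = meta_model.predict(
    np.column_stack([model.predict(X[:5]) for model in fitted])
)
print(stacked_pred)
```

Training the meta-regressor on out-of-fold predictions, rather than on predictions for rows the base models were fit on, is what reduces the overfitting mentioned above.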

    Pull Request requirements

    • [X] Added appropriate unit test functions in the ./mlxtend/*/tests directories
    • [X] Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
    • [X] Checked the test coverage by running nosetests ./mlxtend --with-coverage
    • [X] Checked for style issues by running flake8 ./mlxtend
    • [X] Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
    • [ ] Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
    • [X] Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
    New Feature good-to-merge 
    opened by EikeDehling 25
  • Added optional name field to utils.counter

    Description

    Added a naming variable to utils.counter and changed the time precision

    Related issues or pull requests

    Link related issues/pull requests here

    Pull Request requirements

    • [ ] Added appropriate unit test functions in the ./mlxtend/*/tests directories
    • [ ] Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
    • [ ] Checked the test coverage by running nosetests ./mlxtend --with-coverage
    • [ ] Checked for style issues by running flake8 ./mlxtend
    • [x] Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
    • [ ] Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
    • [ ] Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
    opened by matsavage 21
  • Stackingcvclassifier

    Please make sure that these boxes are checked before submitting a new pull request -- thank you!

    • [x] Create and check out a new topic branch
    • [x] Implement the new feature or apply the bug-fix
    • [x] Add appropriate unit test functions in the ./mlxtend/*/tests directories
    • [x] Run nosetests ./mlxtend -sv and make sure that all unit tests pass
    • [x] Check/improve the test coverage by running nosetests ./mlxtend --with-coverage
    • [x] Check for style issues by running flake8 ./mlxtend (you may want to run nosetests again after you made modifications to the code)
    • [x] Add a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
    • [ ] Modify documentation in the appropriate location under mlxtend/docs/sources/
    • [x] Push the topic branch to the server and create a pull request
    • [x] Check the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
    • [x] Squash (many small) commits to a larger commit

    For more information and instructions, please see http://rasbt.github.io/mlxtend/contributing/

    opened by reiinakano 20
  • getting a "ValueError: all the input arrays must have same number of dimensions" only for CV

    Hi,

    Thanks for the nice program. I am trying to use it for stacking and started with the CV version but can't seem to make it work.

    I am passing X & Y as below. They have the right shape and the code runs for the two classifiers clf1 and clf2. however, it hits a snag when it does the meta classifier. It complains:

    ValueError: all the input arrays must have same number of dimensions

    I don't understand this since the arrays have the right dimensions: X = (49352, 217) y = (49352,)

    Could someone help? Thanks! Sub

    Here's the code:

    clf1 = KNeighborsClassifier(n_neighbors=1)
    clf2 = RandomForestClassifier(random_state=1)
    # clf3 = GaussianNB()
    lr = LogisticRegression()
    gb = xgb.XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5,
                           min_child_weight=1, gamma=0, subsample=0.8,
                           colsample_bytree=0.8, objective='multi:softprob',
                           nthread=4, scale_pos_weight=1, seed=27)

    sclf = StackingCVClassifier(classifiers=[clf1, clf2], use_probas=True,
                                meta_classifier=gb, random_state=42)

    X = train_X
    y = train_y

    print(train_X.shape, train_y.shape)

    print('3-fold cross validation:\n')

    for clf, label in zip([clf1, clf2, sclf],
                          ['KNN', 'Random Forest', 'StackingClassifier']):
        scores = model_selection.cross_val_score(clf, X, y,
                                                 cv=3, scoring='accuracy')
        print("Accuracy: %0.2f (+/- %0.2f) [%s]"
              % (scores.mean(), scores.std(), label))
    

    =========== OUTPUT BELOW ====

    (49352, 217) (49352,)

    3-fold cross validation:

    Accuracy: 0.59 (+/- 0.01) [KNN]
    Accuracy: 0.71 (+/- 0.00) [Random Forest]

    ValueError Traceback (most recent call last) in () 29 30 scores = model_selection.cross_val_score(clf, X, y, ---> 31 cv=3, scoring='accuracy') 32 print("Accuracy: %0.2f (+/- %0.2f) [%s]" 33 % (scores.mean(), scores.std(), label))

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\model_selection_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch) 138 train, test, verbose, None, 139 fit_params) --> 140 for train, test in cv_iter) 141 return np.array(scores)[:, 0] 142

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in call(self, iterable) 756 # was dispatched. In particular this covers the edge 757 # case of Parallel used with an exhausted iterator. --> 758 while self.dispatch_one_batch(iterator): 759 self._iterating = True 760 else:

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator) 606 return False 607 else: --> 608 self._dispatch(tasks) 609 return True 610

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch) 569 dispatch_timestamp = time.time() 570 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self) --> 571 job = self._backend.apply_async(batch, callback=cb) 572 self._jobs.append(job) 573

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib_parallel_backends.py in apply_async(self, func, callback) 107 def apply_async(self, func, callback=None): 108 """Schedule a func to be run""" --> 109 result = ImmediateResult(func) 110 if callback: 111 callback(result)

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib_parallel_backends.py in init(self, batch) 324 # Don't delay the application, to avoid keeping the input 325 # arguments in memory --> 326 self.results = batch() 327 328 def get(self):

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in call(self) 129 130 def call(self): --> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items] 132 133 def len(self):

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in (.0) 129 130 def call(self): --> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items] 132 133 def len(self):

    C:\Users\tickles\Anaconda3\lib\site-packages\sklearn\model_selection_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score) 236 estimator.fit(X_train, **fit_params) 237 else: --> 238 estimator.fit(X_train, y_train, **fit_params) 239 240 except Exception as e:

    C:\Users\tickles\Anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py in fit(self, X, y) 187 y[test_index])) 188 reordered_features = np.concatenate((reordered_features, --> 189 X[test_index])) 190 191 # Fit the base models correctly this time using ALL the training set

    ValueError: all the input arrays must have same number of dimensions
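
The traceback shows the error being raised by an `np.concatenate` call. As a minimal sketch of how NumPy produces this kind of error, the snippet below concatenates a 1-D array with a 2-D one. Whether this is what happens inside `StackingCVClassifier.fit` for the data in this report is an assumption; the variable names are invented for illustration.

```python
import numpy as np

collected = np.array([])   # 1-D, shape (0,)
fold = np.ones((3, 2))     # 2-D, shape (3, 2)

try:
    np.concatenate((collected, fold))
    message = None
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Starting from an empty array with a matching second dimension works.
collected_2d = np.empty((0, 2))
stacked = np.concatenate((collected_2d, fold))
print(stacked.shape)
```

When debugging a report like this, checking `ndim` and `shape` of each array just before the concatenation usually reveals which operand is not what was expected.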

    opened by subratac 19
  • Rule generation

    Description

    Implemented a rule generation algorithm to extract rules from frequent itemsets based on the metrics confidence, lift, and conviction.

    Related issues or pull requests

    Fixes #157

    Pull Request requirements

    • [x] Added appropriate unit test functions in the ./mlxtend/*/tests directories
    • [x] Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
    • [x] Checked the test coverage by running nosetests ./mlxtend --with-coverage
    • [x] Checked for style issues by running flake8 ./mlxtend
    • [x] Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
    • [x] Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
    • [x] Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
    opened by jgoerner 19
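
The three metrics named in this pull request have standard definitions for a rule A -> C in terms of itemset supports. The sketch below computes them directly; the function name `rule_metrics` and the example support values are made up for illustration and are not the pull request's code.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Return (confidence, lift, conviction) for the rule A -> C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    # Conviction is infinite when the rule always holds (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return confidence, lift, conviction


# Example: A occurs in 40% of transactions, C in 50%, both together in 30%.
confidence, lift, conviction = rule_metrics(0.4, 0.5, 0.3)
print(confidence, lift, conviction)
```

A lift above 1 means A and C occur together more often than independence would predict, and a conviction above 1 means the rule is wrong less often than it would be by chance.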
  • Stoppable Sequential Feature Selection

    Large SFS jobs can be pretty slow at times, so it’s useful to track not only the number of features that have been selected (the current print_progress), but also the avg. score at that step, and allow the user to control+c to raise a keyboard interrupt, and then catch and handle that safely to get back the subset of selected features.

    This PR accomplishes these by adding an optional verbose parameter (similar to other libraries), which allows the user to get not only the existing progress, but also to raise the verbosity to get detailed progress updates like:

    [2016-09-27 15:45:52] Features: 1/50 -- avg. score: 0.566246124946
    [2016-09-27 15:46:29] Features: 2/50 -- avg. score: 0.592720840564
    [2016-09-27 15:47:06] Features: 3/50 -- avg. score: 0.606385302233
    [2016-09-27 15:47:46] Features: 4/50 -- avg. score: 0.625515735334
    [2016-09-27 15:48:26] Features: 5/50 -- avg. score: 0.633775958769

    When the user is happy with the score, or unhappy with the runtime, they can raise a keyboard interrupt and safely get back the data for the 5 features that have been selected so far.

    No core functionality of the feature selection is changed.
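
The interrupt-and-keep-progress pattern described here can be sketched in plain Python. This is not the code from this pull request: `greedy_forward_selection` and its `score_fn` argument are hypothetical, and the scoring function is a toy stand-in for cross-validation.

```python
def greedy_forward_selection(n_features, score_fn, k_features):
    """Greedy forward selection that returns partial results on Ctrl+C.

    `score_fn` maps a tuple of feature indices to a score (higher is better).
    """
    selected = []
    try:
        while len(selected) < k_features:
            remaining = [f for f in range(n_features) if f not in selected]
            scores = {f: score_fn(tuple(selected + [f])) for f in remaining}
            best = max(scores, key=scores.get)
            selected.append(best)
            print(f"Features: {len(selected)}/{k_features} -- score: {scores[best]:.3f}")
    except KeyboardInterrupt:
        print("Interrupted; returning the features selected so far.")
    return tuple(selected)


# Toy scoring function: prefers low feature indices.
subset = greedy_forward_selection(5, lambda idx: -sum(idx), k_features=3)
print(subset)
```

Because a feature is appended only after a full round of scoring completes, an interrupt in the middle of a round leaves the previously selected subset intact.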

    opened by wdm0006 19
  • Take the best performance during a Sequential Feature Selector with Pipeline process

    Hi Sebastian,

    I'm posting this issue following our exchange on Twitter (I hope it can help other people).

    I would like to use a Sequential Feature Selector (SFS) in a Pipeline. But at the end of the process, SFS keeps SFS.k_features features (25 in this example):

    clf1 = LogisticRegression(class_weight='balanced', solver='newton-cg', C=100.0, random_state=17)
    
    sfs1 = SFS(clf1, 
               k_features=25, 
               forward=True, 
               floating=False, 
               scoring='roc_auc',
               cv=5)
    sfs1 = sfs1.fit(data.values, y.values)
    
    clf1_pipe = Pipeline([('sfs1', sfs1),
                          ('Logistic Newton', clf1)])
    
    print(clf1_pipe.named_steps['sfs1'].k_feature_idx_)
    # (0, 1, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23, 25, 27, 29, 30, 31, 34, 35)
    

    The score clf1_pipe.named_steps['sfs1'].k_score_ is 0.6956081, but it is not the best score (performance) we got. In fact, we get a better score with 10 features:

    result_clf1_pipe = pd.DataFrame.from_dict(clf1_pipe.named_steps['sfs1'].get_metric_dict(confidence_interval=0.90)).T
    result_clf1_pipe.sort_values('avg_score', ascending=0, inplace=True)
    result_clf1_pipe.head()
    

    Can the Pipeline automatically use the feature subset that corresponds to the best performance?

    You can find the notebook with the pipeline process for SFS (section "Using Pipieline to do it").

    Edit: I manually searched for the best number of k_features for SFS for each of my estimators, then plugged them into an EnsembleVoteClassifier. The result is not what I expected (see "Find Manually the best k_features for SFS and fit our ensemble").

    Enhancement moderate in progress 
    opened by armgilles 18
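
One way to recover the best-performing subset by hand is to scan the dictionary returned by `get_metric_dict()`. The sketch below works on a hand-written dictionary with the same `feature_idx` and `avg_score` keys; the helper `best_subset` is hypothetical, and the numbers are illustrative rather than results from this issue.

```python
def best_subset(metric_dict):
    """Return (k, feature_idx, avg_score) for the highest-scoring subset size."""
    best_k = max(metric_dict, key=lambda k: metric_dict[k]["avg_score"])
    entry = metric_dict[best_k]
    return best_k, entry["feature_idx"], entry["avg_score"]


# Hand-written stand-in for the output of sfs.get_metric_dict();
# the numbers are illustrative, not results from the issue above.
metric_dict = {
    1: {"feature_idx": (3,), "avg_score": 0.61},
    2: {"feature_idx": (3, 7), "avg_score": 0.70},
    3: {"feature_idx": (1, 3, 7), "avg_score": 0.68},
}
print(best_subset(metric_dict))
```

Depending on the installed mlxtend version, `k_features` may also accept a range tuple such as `(1, 25)` or the string `'best'`, which would make the selector pick the best-scoring size itself; check the documentation for your version before relying on this.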
  • Add Restacking references

    Add https://github.com/kaz-Anova/StackNet as a reference for use_features_in_secondary for StackingClassifier and StackingCVClassifier. They point to a Kaggle competition and blog post that might also be worthwhile mentioning.

    Documentation 
    opened by rasbt 0
  • ExhaustiveFeatureSelector(print_progress=True) doesn't do anything

    Describe the bug

    Waiting through a long EFS run, hoping to get some progress feedback, I don't see any progress shown anywhere.

    Steps/Code to Reproduce

    import os
    from sklearn.linear_model import LinearRegression
    from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
    lr = LinearRegression()
    efs_def = EFS(
        lr,
        scoring="neg_mean_squared_error",
        print_progress=True,
        min_features=1,
        max_features=n_features,
        n_jobs=os.cpu_count(),
    )
    efsl = efs_def.fit(X.values, y.values)
    

    X is a Pandas dataframe with 19 columns and 90 rows. y is a single-column dataframe.
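
To put the length of this run in perspective, the number of candidate subsets can be counted directly. The sketch assumes that `n_features` equals the 19 columns mentioned above, so that subsets of every size from 1 to 19 are evaluated.

```python
from math import comb

n_features = 19  # number of columns reported in the issue
min_features, max_features = 1, n_features

n_subsets = sum(comb(n_features, k) for k in range(min_features, max_features + 1))
print(n_subsets)
```

This is 2**19 - 1 subsets, each of which is scored with cross-validation, which is why progress feedback matters for a run of this size.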

    This runs in Jupyter Notebook on Windows 10.

    Expected Results

    Some messages somewhere?

    Actual Results

    Nothing is printed in the notebook.

    Nothing is printed in the Command Prompt where I run jupyter notebook.

    Versions

    MLxtend 0.20.0
    Windows-10-10.0.19044-SP0
    Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]
    Scikit-learn 1.1.1
    NumPy 1.22.4
    SciPy 1.8.1

    Bug 
    opened by FlorinAndrei 2
  • SequentialFeatureSelector with estimator that requires a pandas DataFrame input

    Describe the workflow you want to enable

    When passing a pandas DataFrame into SequentialFeatureSelector, pass the DataFrame into the estimator's fit method (sequential_feature_selector.py#L432), not a NumPy array as is currently done (sequential_feature_selector.py#L337).

    Describe your proposed solution

    Pass the provided X dataset into the _calc_score functions without converting it to a NumPy array. If the dataset is a NumPy array, use X[:, k_idx] to select the columns. If the dataset is a pandas DataFrame, use X.iloc[:, k_idx] to select the columns.

    Describe alternatives you've considered, if relevant

    I've tried using numpy arrays in my estimator, but it selects features based on their dtype. Pandas dataframes have unique dtypes per column, whereas numpy arrays lose this information and only have a global dtype.
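
The proposal above amounts to a small dispatch on the container type. The following sketch shows one way it could look; the helper name `select_columns` is hypothetical and not part of mlxtend.

```python
import numpy as np
import pandas as pd


def select_columns(X, k_idx):
    """Select feature columns by position, preserving the container type."""
    k_idx = list(k_idx)
    if isinstance(X, pd.DataFrame):
        return X.iloc[:, k_idx]  # keeps per-column dtypes and column names
    return np.asarray(X)[:, k_idx]


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5]})
subset_df = select_columns(df, (0, 2))
subset_np = select_columns(df.to_numpy(), (0, 2))
print(subset_df.dtypes.tolist())  # per-column dtypes survive
print(subset_np.dtype)            # a single dtype for the whole array
```

The example also illustrates the dtype point made in the issue: converting a mixed-type DataFrame to a NumPy array collapses all columns into one shared dtype.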

    New Feature 
    opened by shane-breeze 2
  • Disable unnecessary warning in EnsembleVoteClassifier

    There is an unnecessary warning

     /Users/sebastian/miniforge3/lib/python3.9/site-packages/mlxtend/classifier/ensemble_vote.py:166: UserWarning: fit_base_estimators=False enforces use_clones to be `False`
      warnings.warn("fit_base_estimators=False "
    

    when both use_clones and fit_base_estimators are False:

    eclf = EnsembleVoteClassifier(clfs=(clf1, clf2, clf3),
                                  weights=(1, 1, 1),
                                  use_clones=False,
                                  fit_base_estimators=False)
    eclf.fit(X_train, y_train)
    

    This warning should only be shown if

    • fit_base_estimators=False and use_clones=True

    not

    • fit_base_estimators=False and use_clones=False
    easy 
    opened by rasbt 0
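
The intended condition can be sketched as a small function. This is an illustration of the logic requested in the issue, not the actual code in ensemble_vote.py; the helper name `resolve_use_clones` is hypothetical.

```python
import warnings


def resolve_use_clones(use_clones, fit_base_estimators):
    """Hypothetical helper: return the effective value of use_clones."""
    if not fit_base_estimators and use_clones:
        # Only warn when the user's setting is actually being overridden.
        warnings.warn("fit_base_estimators=False enforces use_clones to be `False`")
        return False
    return use_clones


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolve_use_clones(use_clones=False, fit_base_estimators=False)  # silent
    n_silent = len(caught)
    resolve_use_clones(use_clones=True, fit_base_estimators=False)   # warns
    n_after = len(caught)
print(n_silent, n_after)
```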
  • Update contributor guide regarding setup.py develop

    I think the more modern version of

    python setup.py develop
    

    is

    pip install -e .
    

    The contributor guide should probably be updated with

    pip install -r requirements.txt
    pip install -e .
    
    Documentation 
    opened by rasbt 0
  • Contributor guide updates

    Just a suggestion: You may want to add to the contributor guide that the contributors should do git add and git commit after correcting the format with black, before pushing it to the upstream. (Maybe, there is a particular message that let users know about this. I didn't pay attention. If there is a message, then things should be clear.)

    As discussed in #911

    Documentation 
    opened by rasbt 0
Releases(v0.20.0)
  • v0.20.0(May 27, 2022)

    New Features and Enhancements
    • The mlxtend.evaluate.bootstrap_point632_score now supports fit_params. (#861)
    • The mlxtend/plotting/decision_regions.py function now has a contourf_kwargs for matplotlib to change the look of the decision boundaries if desired. (#881 via [pbloem])
    • Add a norm_colormap parameter to mlxtend.plotting.plot_confusion_matrix, to allow normalizing the colormap, e.g., using matplotlib.colors.LogNorm() (#895)
    • Add new GroupTimeSeriesSplit class for evaluation in time series tasks with support of custom groups and additional parameters in comparison with scikit-learn's TimeSeriesSplit. (#915 via Dmitry Labazkin)
    Changes
    • Due to compatibility issues with newer package versions, certain functions from six.py have been removed so that mlxtend may not work anymore with Python 2.7.
    • As an internal change to speed up unit testing, unit testing is now facilitated by GitHub workflows, and Travis CI and Appveyor hooks have been removed.
    • Improved axis label rotation in mlxtend.plotting.heatmap and mlxtend.plotting.plot_confusion_matrix (#872)
    • Fix various typos in McNemar guides.
    • Raises a warning if non-bool arrays are used in the frequent pattern functions apriori, fpmax, and fpgrowth. (#934 via NimaSarajpoor)
    Bug Fixes
    • Fix unreadable labels in heatmap for certain colormaps. (#852)
    • Fix an issue in mlxtend.plotting.plot_confusion_matrix when string class names are passed (#894)
  • v0.19.0(Sep 2, 2021)

    Version 0.19.0 (09/02/2021)

    New Features
    • Adds a second "balanced accuracy" interpretation ("balanced") to evaluate.accuracy_score in addition to the existing "average" option to compute the scikit-learn-style balanced accuracy. (#764)
    • Adds new scatter_hist function to mlxtend.plotting for generating a scattered histogram. (#757 via Maitreyee Mhasaka)
    • The evaluate.permutation_test function now accepts a paired argument to support paired permutation/randomization tests. (#768)
    • The StackingCVRegressor now also supports multi-dimensional targets similar to StackingRegressor via StackingCVRegressor(..., multi_output=True). (#802 via Marco Tiraboschi)
    Changes
    • Updates unit tests for scikit-learn 0.24.1 compatibility. (#774)
    • StackingRegressor now requires setting StackingRegressor(..., multi_output=True) if the target is multi-dimensional; this allows for better input validation. (#802)
    • Removes deprecated res argument from plot_decision_regions. (#803)
    • Adds a title_fontsize parameter to plot_learning_curves for controlling the title font size; also the plot style is now the matplotlib default. (#818)
    • Internal change using 'c': 'none' instead of 'c': '' in mlxtend.plotting.plot_decision_regions's scatterplot highlights to stay compatible with Matplotlib 3.4 and newer. (#822)
    • Adds a fontcolor_threshold parameter to the mlxtend.plotting.plot_confusion_matrix function as an additional option for determining the font color cut-off manually. (#827)
    • The frequent_patterns.association_rules now raises a ValueError if an empty frequent itemset DataFrame is passed. (#843)
    • The .632 and .632+ bootstrap method implemented in the mlxtend.evaluate.bootstrap_point632_score function now use the whole training set for the resubstitution weighting term instead of the internal training set that is a new bootstrap sample in each round. (#844)
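
The two accuracy interpretations mentioned in the New Features of this release can be contrasted with a NumPy sketch. It assumes that "balanced" means the mean of per-class recall (the scikit-learn convention) and that "average" means the mean of one-vs-rest accuracies. The functions below are written for illustration and are not mlxtend's implementation.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (scikit-learn-style balanced accuracy)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def average_per_class_accuracy(y_true, y_pred):
    """Mean of one-vs-rest accuracies, one binary problem per class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accs = [np.mean((y_true == c) == (y_pred == c)) for c in np.unique(y_true)]
    return float(np.mean(accs))


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
print(balanced_accuracy(y_true, y_pred))
print(average_per_class_accuracy(y_true, y_pred))
```

The two numbers differ because the one-vs-rest accuracy also credits true negatives, while per-class recall only looks at how many samples of each class were recovered.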
    Bug Fixes
  • 0.18.0(Nov 26, 2020)

    New Features
    • The bias_variance_decomp function now supports optional fit_params for the estimators that are fit on bootstrap samples. (#748)
    • The bias_variance_decomp function now supports Keras estimators. (#725 via @hanzigs)
    • Adds new mlxtend.classifier.OneRClassifier (One Rule Classifier) class, a simple rule-based classifier that is often used as a performance baseline or simple interpretable model. (#726)
    • Adds new create_counterfactual method for creating counterfactuals to explain model predictions. (#740)
    Changes
    • permutation_test (mlxtend.evaluate.permutation) is corrected to give the proportion of permutations whose statistic is at least as extreme as the one observed. (#721 via Florian Charlier)
    • Fixes the McNemar confusion matrix layout to match the convention (and documentation), swapping the upper left and lower right cells. (#744 via mmarius)
    Bug Fixes
    • The loss in LogisticRegression for logging purposes didn't include the L2 penalty for the first weight in the weight vector (this is not the bias unit). However, since this loss function was only used for logging purposes, and the gradient remains correct, this does not have an effect on the main code. (#741)
    • Fixes a bug in bias_variance_decomp where when the mse loss was used, downcasting to integers caused imprecise results for small numbers. (#749)
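
The permutation test change in this release concerns what the p-value counts: permutations whose statistic is at least as extreme as the observed one. The stdlib-only sketch below shows that definition for an exact two-sided test on the difference of means. It is an illustration, not mlxtend's implementation, and the function name is made up.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation test on the difference of means."""
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(indices):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(range(n_x))
    all_stats = [stat(idx) for idx in combinations(range(len(pooled)), n_x)]
    # Proportion of permutations at least as extreme as the observed one
    # (a small tolerance guards against floating-point ties).
    at_least_as_extreme = sum(s >= observed - 1e-12 for s in all_stats)
    return at_least_as_extreme / len(all_stats)


p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
print(p_value)
```

Using "at least as extreme" (>=) rather than "strictly more extreme" (>) means the observed arrangement itself is counted, so the p-value can never be zero.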
  • 0.17.3(Jul 28, 2020)

    New Features
    • Add predict_proba kwarg to bootstrap methods, to allow bootstrapping of scoring functions that take in probability values. (#700 via Adam Li)
    • Add a cell_values parameter to mlxtend.plotting.heatmap() to optionally suppress cell annotations by setting cell_values=False. (#703)
    Changes
    • Implemented both use_clones and fit_base_estimators (previously refit in EnsembleVoteClassifier) for EnsembleVoteClassifier and StackingClassifier. (#670 via Katrina Ni)
    • Switched to using raw strings for regex in mlxtend.text to prevent deprecation warning in Python 3.8 (#688)
    • Slice data in sequential forward selection before sending to parallel backend, reducing memory consumption.
    Bug Fixes
    • Fixes axis DeprecationWarning in matplotlib v3.1.0 and newer. (#673)
    • Fixes an issue with using meshgrid in no_information_rate function used by the bootstrap_point632_score function for the .632+ estimate. (#688)
    • Fixes an issue in fpmax that could lead to incorrect support values. (#692 via Steve Harenberg)
  • v0.17.2(Feb 24, 2020)

    New Features
    Changes
    • The previously deprecated OnehotTransactions has been removed in favor of the TransactionEncoder.
    • Removed SparseDataFrame support in frequent pattern mining functions in favor of pandas >=1.0's new way for working sparse data. If you used SparseDataFrame formats, please see pandas' migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating (#667)
    Bug Fixes
  • v0.17.1(Jan 29, 2020)

    New Features
    • The SequentialFeatureSelector now supports using pre-specified feature sets via the fixed_features parameter. (#578)
    • Adds a new accuracy_score function to mlxtend.evaluate for computing basic classification accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
    • StackingClassifier and StackingCVClassifier now have a decision_function method, which serves as a preferred choice over predict_proba in calculating roc_auc and average_precision scores when the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)
    Changes
    • Improve the runtime performance of the apriori frequent itemset generating function when low_memory=True. Setting low_memory=False (default) is still faster for small itemsets, but low_memory=True can be much faster for large itemsets and requires less memory. Also, input validation for apriori, fpgrowth, and fpmax takes a significant amount of time when the input pandas DataFrame is large; this is now dramatically reduced when the input contains boolean values (and not zeros/ones), which is the case when using TransactionEncoder. (#619 via Denis Barbier)
    • Add support for the newer sparse pandas DataFrame format for frequent itemset algorithms. Also, input validation for apriori, fpgrowth, and fpmax runs much faster on a sparse DataFrame when the input pandas DataFrame contains integer values. (#621 via Denis Barbier)
    • Let fpgrowth and fpmax work directly on sparse DataFrames; they were previously converted into dense NumPy arrays. (#622 via Denis Barbier)
    Bug Fixes
    • Fixes a bug in mlxtend.plotting.plot_pca_correlation_graph that caused the explained variances not summing up to 1. Also, improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
    • Behavior of fpgrowth and apriori is now consistent for edge cases such as min_support=0. (#573 via Steve Harenberg)
    • fpmax returns an empty data frame now instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
    • Fixes an issue in mlxtend.plotting.plot_confusion_matrix, where the font-color choice for medium-dark cells was not ideal and hard to read. (#588 via sohrabtowfighi)
    • The svd mode of mlxtend.feature_extraction.PrincipalComponentAnalysis now also uses n-1 degrees of freedom instead of n d.o.f. when computing the eigenvalues to match the behavior of eigen. (#595)
    • Disable input validation for StackingCVClassifier because it causes issues if pipelines are used as input. (#606)
  • v0.17.0(Jul 19, 2019)

    New Features
    • Added an enhancement to the existing iris_data() such that both the UCI Repository version of the Iris dataset and the corrected, original version of the dataset can be loaded. The two versions differ slightly in two data points (the corrected version is consistent with Fisher's paper and is the same as in R). (#539 via janismdhanbad)
    • Added optional groups parameter to SequentialFeatureSelector and ExhaustiveFeatureSelector fit() methods for forwarding to sklearn CV (#537 via arc12)
    • Added a new plot_pca_correlation_graph function to the mlxtend.plotting submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
    • Added a zoom_factor parameter to the mlxtend.plotting.plot_decision_regions function that allows users to zoom in and out of the decision region plots. (#545)
    • Added a function fpgrowth that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing apriori algorithm. (#550 via Steve Harenberg)
    • New heatmap function in mlxtend.plotting. (#552)
    • Added a function fpmax that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the fpgrowth algorithm. (#553 via Steve Harenberg)
    • New figsize parameter for the plot_decision_regions function in mlxtend.plotting. (#555 via Mirza Hasanbasic)
    • New low_memory option for the apriori frequent itemset generating function. Setting low_memory=False (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (low_memory=True). (#567 via jmayse)
    Changes
    • Now uses the latest joblib library under the hood for multiprocessing instead of sklearn.externals.joblib. (#547)
    • Changes to StackingCVClassifier and StackingCVRegressor such that first-level models are allowed to generate output of non-numeric type. (#562)
    Bug Fixes
    • Fixed documentation of iris_data() under iris.py by adding a note about differences in the iris data in R and UCI machine learning repo.
    • Make sure that if the 'svd' mode is used in PCA, the number of eigenvalues is the same as when using 'eigen' (zeros are appended in that case). (#565)
  • v0.16.0(May 12, 2019)

    New Features
    • StackingCVClassifier and StackingCVRegressor now support random_state parameter, which, together with shuffle, controls the randomness in the cv splitting. (#523 via Qiang Gu)
    • StackingCVClassifier and StackingCVRegressor now have a new drop_last_proba parameter. If True, it drops the last "probability" column in the feature set because it is redundant: p(y_c) = 1 - (p(y_1) + p(y_2) + ... + p(y_{c-1})). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
    • Other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor, support grid search over the regressors and even a single base regressor. (#522 via Qiang Gu)
    • Adds multiprocessing support to StackingCVClassifier. (#522 via Qiang Gu)
    • Adds multiprocessing support to StackingCVRegressor. (#512 via Qiang Gu)
    • Now, the StackingCVRegressor also enables grid search over the regressors and even a single base regressor. When there are level-mixed parameters, GridSearchCV will try to replace hyperparameters in a top-down order (see the documentation for example details). (#515 via Qiang Gu)
    • Adds a verbose parameter to apriori to show the current iteration number as well as the itemset size currently being sampled. (#519)
    • Adds an optional class_name parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)
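
    The redundancy behind the drop_last_proba entry above (the last class probability equals one minus the sum of the others) can be checked with a few lines of plain Python. The probability rows are made-up values, and the snippet does not use any mlxtend code.

```python
import math

# Made-up class-probability rows for a 3-class problem (each row sums to 1).
probas = [[0.2, 0.5, 0.3], [0.7, 0.1, 0.2]]

# Dropping the last column loses no information ...
reduced = [row[:-1] for row in probas]

# ... because it can be recovered as 1 minus the sum of the remaining columns.
recovered_last = [1.0 - sum(row) for row in reduced]

for original, last in zip(probas, recovered_last):
    print(original[-1], "recovered as", round(last, 10))
```

    Because the full set of columns always sums to one, keeping all of them gives the meta-classifier perfectly collinear features, which is what the parameter is meant to avoid.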
    Changes
    • Due to new features, restructuring, and better scikit-learn support (for GridSearchCV, etc.) the StackingCVRegressor's meta regressor is now being accessed via 'meta_regressor__*' in the parameter grid. E.g., if a RandomForestRegressor as meta-regressor was previously tuned via 'randomforestregressor__n_estimators', this has now changed to 'meta_regressor__n_estimators'. (#515 via Qiang Gu)
    • The same change mentioned above is now applied to other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor. (#522 via Qiang Gu)
    Bug Fixes
    • The feature_selection.ColumnSelector now also supports column names of type int (in addition to str names) if the input is a pandas DataFrame. (#500 via tetrar124)
    • Fix unreadable labels in plot_confusion_matrix for imbalanced datasets if show_absolute=True and show_normed=True. (#504)
    • Raises a more informative error if a SparseDataFrame is passed to apriori and the dataframe has integer column names that don't start with 0 due to current limitations of the SparseDataFrame implementation in pandas. (#503)
    • SequentialFeatureSelector now supports DataFrame as input for all operating modes (forward/backward/floating). (#506)
    • mlxtend.evaluate.feature_importance_permutation now correctly accepts scoring functions with proper function signature as metric argument. (#528)
  • v0.15.0(Jan 19, 2019)

    New Features
    • Adds a new transformer class to mlxtend.image, EyepadAlign, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
    • Adds a new function, mlxtend.evaluate.bias_variance_decomp that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
    • Adds a whitening parameter to PrincipalComponentAnalysis, to optionally whiten the transformed data such that the features have unit variance. (#475)
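
    For the squared-error case, the bias-variance decomposition mentioned above can be sketched without any library. This is an illustration under stated assumptions (noise-free targets, predictions collected from several training rounds), not the code of mlxtend.evaluate.bias_variance_decomp; the function name and the numbers are hypothetical.

```python
from statistics import mean


def bias_variance_mse(all_preds, y_true):
    """all_preds[r][i] is the prediction of training round r for test point i."""
    n_points = len(y_true)
    # Average squared error over all rounds and test points.
    avg_loss = mean(
        (preds[i] - y_true[i]) ** 2 for preds in all_preds for i in range(n_points)
    )
    # "Main" prediction per test point: the mean prediction across rounds.
    main_preds = [mean(preds[i] for preds in all_preds) for i in range(n_points)]
    # Squared bias: error of the main prediction, averaged over test points.
    bias = mean((main_preds[i] - y_true[i]) ** 2 for i in range(n_points))
    # Variance: spread of the predictions around the main prediction.
    variance = mean(
        (preds[i] - main_preds[i]) ** 2 for preds in all_preds for i in range(n_points)
    )
    return avg_loss, bias, variance


# Made-up predictions from three rounds for two test points.
all_preds = [[1.0, 2.5], [1.4, 1.5], [0.6, 2.0]]
y_true = [1.0, 3.0]
loss, bias, var = bias_variance_mse(all_preds, y_true)
print(loss, bias, var)
```

    For squared error, the average loss equals the bias term plus the variance term; the decomposition for 0-1 loss is defined differently and is not covered by this sketch.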
    Changes
    • Changed the default solver in PrincipalComponentAnalysis to 'svd' instead of 'eigen' to improve numerical stability. (#474)
    • The mlxtend.image.extract_face_landmarks now returns None if no facial landmarks were detected instead of an array of all zeros. (#466)
    Bug Fixes
    • The eigenvectors may not have been sorted in certain edge cases if solver was 'eigen' in PrincipalComponentAnalysis and LinearDiscriminantAnalysis. (#477, #478)
  • v0.14.0(Nov 10, 2018)

    New Features
    • Added a scatterplotmatrix function to the plotting module. (#437)
    • Added sample_weight option to StackingRegressor, StackingClassifier, StackingCVRegressor, StackingCVClassifier, EnsembleVoteClassifier. (#438)
    • Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#442)
    • Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#443)
    • Created a new mlxtend.image submodule for working on image processing-related tasks. (#457)
    • Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image. (#458)
    • Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate (#459)
    • Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap (#459)
    • Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
    • Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform a combined 5x2cv F-test for comparing the performance of two models. (#461)
    • Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) (#462)
    Changes
    • Addressed deprecation warnings in NumPy 1.15. (#425)
    • Because of complications in PR (#459), Python 2.7 support was dropped. Since official support for Python 2.7 by the Python Software Foundation is ending in approximately 12 months anyway, this refocusing will hopefully free up some developer time by removing the need to worry about backward compatibility.
    Bug Fixes
    • Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix. (#428)
  • 0.13.0(Jul 21, 2018)

    Version 0.13.0 (07/20/2018)

    New Features
    • A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector. (#377)
    • The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. (#379)
    • The SequentialFeatureSelector is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. (#379)
    • ColumnSelector now works with Pandas DataFrames columns. (#378 by Manuel Garrido)
    • The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c. (#380)
    • Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
    • mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrames to generate frequent itemsets. (#404 via Daniel Morales)
    • The plot_confusion_matrix function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
    • Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary) like it is already supported in the other Stacking classes. (#418)
    • Added a support_only parameter to the association_rules function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
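
    The Gram-Schmidt process named in the vectorspace_orthonormalization entry above can be sketched as follows. This is a plain-Python illustration that treats each input list as one vector; it is not mlxtend's implementation, and the function name gram_schmidt is chosen for the example.

```python
import math


def gram_schmidt(vectors):
    """Turn linearly independent vectors (lists of floats) into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w that points along the basis vector b.
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / norm for wi in w])
    return basis


basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for b in basis:
    print([round(x, 4) for x in b])
```

    The number of vectors that survive this process without hitting the linear-dependence check corresponds to the dimensionality of the spanned vector space.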
    Changes
    • Itemsets generated with apriori are now frozensets (#393 by William Laney and #394)
    • Now raises an error if an input DataFrame to apriori contains values other than 0, 1, True, or False. (#419)
    Bug Fixes
    • Allow mlxtend estimators to be cloned via scikit-learn's clone function. (#374)
    • Fixes bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor (#384 and #385 by selay01)
    • Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True (#408 by Floris Hoogenbook)
    • Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True (#416)
    • Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True (#417)
  • v0.12.0(Apr 21, 2018)

    New Features
    • A new feature_importance_permutation function to compute the feature importance in classifiers and regressors via the permutation importance method. (#358)
    • The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#354 by Zach Griffith)
    • The fit method of the SequentialFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#350 by Zach Griffith)
    Changes
    • Replaced plot_decision_regions colors by a colorblind-friendly palette and adds contour lines for decision regions. (#348)
    • All stacking estimators now raise a NotFittedError if any method for inference is called prior to fitting the estimator. (#353)
    • Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. (#368)
    Bug Fixes
    • Various changes in the documentation and documentation tools to fix formatting issues (#363)
    • Fixes a bug where the StackingCVClassifier's meta features were not stored in the original order when shuffle=True (#370)
    • Many documentation improvements, including links to the User Guides in the API docs (#371)
  • v0.11.0(Mar 15, 2018)

    New Features
    • New function implementing the resampled paired t-test procedure (paired_ttest_resampled) to compare the performance of two models (also called k-hold-out paired t-test). (#323)
    • New function implementing the k-fold paired t-test procedure (paired_ttest_kfold_cv) to compare the performance of two models (also called k-hold-out paired t-test). (#324)
    • New function implementing the 5x2cv paired t-test procedure (paired_ttest_5x2cv) proposed by Dietterich (1998) to compare the performance of two models. (#325)
    • A refit parameter was added to stacking classes (similar to the refit parameter in the EnsembleVoteClassifier), to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's clone function. (#325)
    • The ColumnSelector now has a drop_axis argument to use it in pipelines with CountVectorizers. (#333)
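
    The test statistic of the 5x2cv paired t-test listed above can be sketched from precomputed score differences. This is a hand-written illustration of the textbook formula, not the body of paired_ttest_5x2cv; the helper name and the numbers are made up.

```python
import math


def t_stat_5x2cv(diffs):
    """diffs: five (d1, d2) pairs, the score differences between two models
    on the two folds of each of five repetitions of 2-fold cross-validation."""
    variances = []
    for d1, d2 in diffs:
        d_mean = (d1 + d2) / 2.0
        # Variance estimate of the differences within one repetition.
        variances.append((d1 - d_mean) ** 2 + (d2 - d_mean) ** 2)
    # Numerator: the difference from the first fold of the first repetition.
    return diffs[0][0] / math.sqrt(sum(variances) / len(variances))


# Made-up accuracy differences (model A minus model B).
diffs = [(0.02, 0.04), (0.01, 0.03), (0.03, 0.01), (0.02, 0.02), (0.04, 0.02)]
t = t_stat_5x2cv(diffs)
print(round(t, 4))
```

    Under the null hypothesis that both models perform equally well, this statistic is compared against a t distribution with 5 degrees of freedom.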
    Changes
    • Raises an informative error message if predict or predict_meta_features is called prior to calling the fit method in StackingRegressor and StackingCVRegressor. (#315)
    • The plot_decision_regions function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old res parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
    • Apriori code is faster due to optimization in onehot transformation and the amount of candidates generated by the apriori algorithm. (#327 by Jakub Smid)
    • The OnehotTransactions class (which is typically used in combination with the apriori function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the OnehotTransactions class can now be provided with a sparse argument to generate sparse representations of the onehot matrix to further improve memory efficiency. (#328 by Jakub Smid)
    • The OnehotTransactions class has been deprecated and replaced by the TransactionEncoder. (#332)
    • The plot_decision_regions function now has three new parameters, scatter_kwargs, contourf_kwargs, and scatter_highlight_kwargs, that can be used to modify the plotting style. (#342 by James Bourbeau)
    Bug Fixes
    • Fixed issue when class labels were provided to the EnsembleVoteClassifier when refit was set to false. (#322)
    • Allow arrays with 16-bit and 32-bit precision in plot_decision_regions function. (#337)
    • Fixed bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. (#340)
  • v0.10.0(Dec 22, 2017)

    New Features
    • New store_train_meta_features parameter for fit in StackingCVRegressor. If True, train meta-features are stored in self.train_meta_features_. New pred_meta_features method for StackingCVRegressor, which can be used to get test meta-features. (#294 via takashioya)
    • The new store_train_meta_features attribute and pred_meta_features method for the StackingCVRegressor were also added to the StackingRegressor, StackingClassifier, and StackingCVClassifier (#299 & #300)
    • New function (evaluate.mcnemar_tables) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. (#307)
    • New function (evaluate.cochrans_q) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
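
    The statistic behind Cochran's Q test can be sketched in plain Python. The function below is a hypothetical helper written for this illustration (it is not evaluate.cochrans_q), it returns only the Q statistic and not a p-value, and the labels are made up.

```python
def cochrans_q_statistic(y_true, *predictions):
    """Cochran's Q statistic for the predictions of k >= 2 classifiers."""
    k = len(predictions)
    # correct[i][j] is 1 if classifier j labels sample i correctly, else 0.
    correct = [
        [int(pred[i] == y_true[i]) for pred in predictions]
        for i in range(len(y_true))
    ]
    col_totals = [sum(row[j] for row in correct) for j in range(k)]
    row_totals = [sum(row) for row in correct]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total * total)
    # The denominator is zero if every sample is classified correctly by all
    # classifiers or by none of them; that case is not handled in this sketch.
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator


y_true = [0, 1, 1, 0, 1, 0]
pred_1 = [0, 1, 1, 0, 1, 0]
pred_2 = [0, 1, 0, 0, 1, 1]
pred_3 = [1, 0, 0, 0, 1, 1]
q = cochrans_q_statistic(y_true, pred_1, pred_2, pred_3)
print(q)
```

    Under the null hypothesis that all classifiers have the same accuracy, Q approximately follows a chi-squared distribution with k - 1 degrees of freedom.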
    Changes
    Bug Fixes
    • Improved numerical stability for p-values computed via the exact McNemar test (#306)
    • nose is not required to use the library (#302)
  • v.0.9.1(Nov 19, 2017)

    Version 0.9.1 (2017-11-19)

    New Features
    • Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. (#283)
    • New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. (#270)
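
    The idea behind the .632 bootstrap can be sketched with the standard library and a toy 1-nearest-neighbour classifier. This is an assumption-laden illustration, not bootstrap_point632_score: all function names and data are hypothetical, and the resubstitution accuracy is computed on the bootstrap sample itself, which is only one of several conventions.

```python
import random


def fit_predict_1nn(train, points):
    """Toy 1-nearest-neighbour classifier on (x, label) pairs with scalar x."""
    return [min(train, key=lambda s: abs(s[0] - x))[1] for x in points]


def accuracy(data, train):
    preds = fit_predict_1nn(train, [x for x, _ in data])
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)


def bootstrap_632(data, n_rounds=50, seed=0):
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]  # out-of-bag samples
        if not oob:  # skip the rare draw that leaves nothing out
            continue
        train = [data[i] for i in idx]
        # Weighted mix of the pessimistic out-of-bag accuracy and the
        # optimistic resubstitution accuracy.
        scores.append(0.632 * accuracy(oob, train) + 0.368 * accuracy(train, train))
    return sum(scores) / len(scores)


# Made-up one-dimensional data set with two classes.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.35, 0), (0.8, 1), (0.9, 1), (1.0, 1), (1.1, 1)]
estimate = bootstrap_632(data)
print(round(estimate, 3))
```

    The 0.632 weight reflects the expected fraction of distinct samples that end up in a bootstrap sample of the same size as the data set.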
    Changes
    • All feature index tuples in SequentialFeatureSelector are now in sorted order. (#262)
    • The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
    • utils.Counter now accepts a name variable to help distinguish between multiple counters, time precision can be set with the 'precision' kwarg and the new attribute end_time holds the time the last iteration completed. (#278 via Mathew Savage)
    Bug Fixes
    • Fixed a deprecation error that occurred with McNemar test when using SciPy 1.0. (#283)
  • v0.9.0(Oct 22, 2017)

    New Features
    • Added evaluate.permutation_test, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. Or in other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
    • Added 'leverage' and 'conviction' as evaluation metrics to the frequent_patterns.association_rules function. (#246 & #247)
    • Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. (#251)
    • Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
    • New make_multiplexer_dataset function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
    • Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
    • The parameters for StackingClassifier, StackingCVClassifier, StackingRegressor, StackingCVRegressor, and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV (#254 via James Bourbeau)
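
    A permutation test of the kind described in the evaluate.permutation_test entry above can be sketched as follows. The sketch uses random shuffling (an approximate test) and the absolute difference of means as the test statistic; both choices, the function signature, and the data are assumptions made for this example.

```python
import random
from statistics import mean


def permutation_test(x, y, n_rounds=2000, seed=0):
    """Approximate two-sided permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_rounds):
        # Reassign the pooled values to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Fraction of shuffles that are at least as extreme as the observed split.
    return hits / n_rounds


# Made-up treatment and control measurements.
p_separated = permutation_test([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
p_identical = permutation_test([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

    A small p-value suggests that the observed group difference would be unlikely if the group labels were exchangeable.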
    Changes
    • The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of "antecedent union consequent", and new 'antecedent support' and 'consequent support' columns were added to avoid ambiguity. (#245)
    • Allow the OnehotTransactions to be cloned via scikit-learn's clone function, which is required by e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi). (#249)
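
    How the three support columns and the rule metrics mentioned in these release notes relate to each other can be sketched for a single rule. The helper below is hypothetical and uses the commonly cited definitions of confidence, lift, leverage, and conviction; it is not the association_rules implementation, and the transactions are made up.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Metrics for the rule antecedent -> consequent over a list of item sets."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    support_a = sum(a <= set(t) for t in transactions) / n
    support_c = sum(c <= set(t) for t in transactions) / n
    # Support of the rule: transactions containing antecedent AND consequent.
    support = sum((a | c) <= set(t) for t in transactions) / n
    confidence = support / support_a
    return {
        "antecedent support": support_a,
        "consequent support": support_c,
        "support": support,
        "confidence": confidence,
        "lift": confidence / support_c,
        "leverage": support - support_a * support_c,
        # Conviction is unbounded when the rule never fails.
        "conviction": float("inf") if confidence == 1
        else (1 - support_c) / (1 - confidence),
    }


transactions = [{"milk", "bread"}, {"bread"}, {"milk", "bread", "eggs"}, {"eggs"}]
metrics = rule_metrics(transactions, ["milk"], ["bread"])
for name, value in metrics.items():
    print(name, value)
```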
    Bug Fixes
    • Fix issues with self._init_time parameter in _IterativeModel subclasses. (#256)
    • Fix imprecision bug that occurred in plot_ecdf when run on Python 2.7. (#264)
    • The vectors from SVD in PrincipalComponentAnalysis are now being scaled so that the eigenvalues obtained via solver='eigen' and solver='svd' have the same magnitudes. (#251)
  • v0.8.0(Sep 9, 2017)

    New Features
    • Added a mlxtend.evaluate.bootstrap function that implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). (#232)
    • SequentialFeatureSelector's k_features now accepts a string argument "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. (#238)
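
    The difference between the "best" and "parsimonious" choices can be sketched on a small table of cross-validation results. The dictionary layout, the helper name pick_k_features, and the use of the best subset's standard error as the tolerance are assumptions for this example rather than the library's internals.

```python
def pick_k_features(results, mode="best"):
    """results maps subset size -> (mean CV score, standard error)."""
    best_k = max(results, key=lambda k: results[k][0])
    if mode == "best":
        return best_k
    best_score, best_se = results[best_k]
    # Smallest subset whose score is within one standard error of the best score.
    return min(k for k, (score, _) in results.items() if score >= best_score - best_se)


# Made-up cross-validation results for subsets of 1 to 4 features.
results = {1: (0.80, 0.02), 2: (0.90, 0.02), 3: (0.93, 0.02), 4: (0.94, 0.02)}
print(pick_k_features(results, "best"))
print(pick_k_features(results, "parsimonious"))
```

    In this made-up table the four-feature subset scores highest, while the three-feature subset is the smallest one that stays within one standard error of it.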
    Changes
    • SequentialFeatureSelector now uses np.nanmean over normal mean to support scorers that may return np.nan #211 (via mrkaiser)
    • The skip_if_stuck parameter was removed from SequentialFeatureSelector in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached #237
    • ExhaustiveFeatureSelector was modified to consume substantially less memory #195 (via Adam Erickson)
    Bug Fixes
    • Fixed a bug where the SequentialFeatureSelector selected a feature subset larger than specified via the k_features tuple max-value #213
  • v0.7.0(Jun 23, 2017)

    Version 0.7.0 (2017-06-22)

    New Features
    Changes
    • The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build on estimators, which render those implementations obsolete.
    • plot_decision_regions now supports plotting decision regions for more than 2 training features (#189, via James Bourbeau).
    • Parallel execution in mlxtend.feature_selection.SequentialFeatureSelector and mlxtend.feature_selection.ExhaustiveFeatureSelector is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large (#193, via @whalebot-helmsman).
    • Raise meaningful error messages if pandas DataFrames or Python lists of lists are fed into the StackingCVClassifier as fit arguments (#198).
    • The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross validation technique that is available from scikit-learn. For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) (#203, via Konstantinos Paliouras).
    Bug Fixes
    • SequentialFeatureSelector now correctly accepts a None argument for the scoring parameter to infer the default scoring metric from scikit-learn classifiers and regressors (#171).
    • The plot_decision_regions function now supports pre-existing axes objects generated via matplotlib's plt.subplots. (#184, see example)
    • Made math.num_combinations and math.num_permutations numerically stable for large numbers of combinations and permutations (#200).
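
    One common way to keep "n choose k" stable for large inputs is to avoid large factorials and floating-point division altogether. The sketch below shows that general technique with exact integer arithmetic; it is not necessarily how math.num_combinations was fixed, and the name n_choose_k is hypothetical.

```python
def n_choose_k(n, k):
    """Number of k-element combinations of n items, using exact integers."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # exploit symmetry to shorten the loop
    result = 1
    for i in range(1, k + 1):
        # After this step, result equals C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


print(n_choose_k(52, 5))
print(n_choose_k(1000, 500) > 10 ** 299)
```

    Because Python integers have arbitrary precision, the intermediate products never overflow the way float-based factorial ratios can.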
  • v0.6.0(Mar 18, 2017)

    Version 0.6.0 (2017-03-18)

    New Features
    • An association_rules function is implemented that allows generating rules based on a list of frequent itemsets (via Joshua Goerner).
    Changes
    • Adds a black edgecolor to plots via plotting.plot_decision_regions to make markers more distinguishable from the background in matplotlib>=2.0.
    • The association submodule was renamed to frequent_patterns.
    Bug Fixes
    • The DataFrame index of apriori results is now unique and ordered.
  • v0.5.1(Feb 14, 2017)

    Version 0.5.1 (2017-02-14)

    The CHANGELOG for the current development version is available at https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.

    New Features
    • The EnsembleVoteClassifier has a new refit attribute that prevents refitting classifiers if refit=False to save computational time.
    • Added a new lift_score function in evaluate to compute lift score (via Batuhan Bardak).
    • StackingClassifier and StackingRegressor support multivariate targets if the underlying models do (via kernc).
    • StackingClassifier has a new use_features_in_secondary attribute like StackingCVClassifier.
    Changes
    • Changed default verbosity level in SequentialFeatureSelector to 0
    • The EnsembleVoteClassifier now raises a NotFittedError if the estimator wasn't fit before calling predict. (via Anton Loss)
    • Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0
    Bug Fixes
    • Fixed wrong default value for k_features in SequentialFeatureSelector
    • Cast selected feature subsets in the SequentialFeatureSelector as sets to prevent the iterator from getting stuck if the k_idx are different permutations of the same combination (via Zac Wellmer).
    • Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko)
    • Fixed a bug that could occur in the SequentialFeatureSelector if there are similarly-well performing subsets in the floating variants (via Zac Wellmer).
  • v0.5.0(Nov 11, 2016)

    Version 0.5.0

    New Features
    • New ExhaustiveFeatureSelector estimator in mlxtend.feature_selection for evaluating all feature combinations in a specified range
    • The StackingClassifier has a new parameter average_probas that is set to True by default to maintain the current behavior. A deprecation warning was added though, and it will default to False in future releases (0.6.0); average_probas=False will result in stacking of the level-1 predicted probabilities rather than averaging these.
    • New StackingCVClassifier estimator in 'mlxtend.classifier' for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano)
    • New OnehotTransactions encoder class added to the preprocessing submodule for transforming transaction data into a one-hot encoded array
    • The SequentialFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c, and deprecated print_progress in favor of a more tunable verbose parameter (Will McGinnis)
    • New apriori function in association to extract frequent itemsets from transaction data for association rule mining
    • New checkerboard_plot function in plotting to plot checkerboard tables / heat maps
    • New mcnemar_table and mcnemar functions in evaluate to compute 2x2 contingency tables and McNemar's test
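
    The 2x2 contingency table and the chi-squared statistic of McNemar's test can be sketched in a few lines. Both helpers are hypothetical stand-ins written for this illustration (they are not the mlxtend functions of the same purpose), the table layout is a choice made here, and the labels are made up.

```python
def mcnemar_table(y_true, pred_a, pred_b):
    """2x2 table: row 0/1 = model A correct/wrong, column 0/1 = model B correct/wrong."""
    table = [[0, 0], [0, 0]]
    for t, a, b in zip(y_true, pred_a, pred_b):
        table[int(a != t)][int(b != t)] += 1
    return table


def mcnemar_chi2(table, corrected=True):
    """Chi-squared statistic computed from the two discordant cells."""
    b, c = table[0][1], table[1][0]
    # Optional continuity correction; b + c == 0 is not handled in this sketch.
    diff = abs(b - c) - (1 if corrected else 0)
    return diff * diff / (b + c)


y_true = [0, 1, 0, 1, 1, 0, 1, 0]
pred_a = [0, 1, 0, 1, 1, 0, 0, 0]
pred_b = [0, 0, 1, 1, 0, 0, 0, 1]
table = mcnemar_table(y_true, pred_a, pred_b)
chi2 = mcnemar_chi2(table)
print(table, chi2)
```

    Only the discordant cells (one model right, the other wrong) enter the statistic, which is compared against a chi-squared distribution with one degree of freedom.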
    Changes
    • All plotting functions have been moved to mlxtend.plotting for compatibility reasons with continuous integration services and to make the installation of matplotlib optional for users of mlxtend's core functionality
    • Added a compatibility layer for scikit-learn 0.18 using the new model_selection module while maintaining backwards compatibility to scikit-learn 0.17.
    Bug Fixes
    • mlxtend.plotting.plot_decision_regions now draws decision regions correctly if more than 4 class labels are present
    • Raise AttributeError in plot_decision_regions when the X_highlight argument is a 1D array (chkoar)
  • v0.4.2(Aug 25, 2016)

    Version 0.4.2 (2016-08-24)

    New Features
    • Added preprocessing.CopyTransformer, a mock class that returns copies of input arrays via transform and fit_transform
    Changes
    • Added AppVeyor to CI to ensure MS Windows compatibility
    • Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects
    • feature_selection.SequentialFeatureSelector now supports the selection of k_features using a tuple to specify a "min-max" k_features range
    • Added "SVD solver" option to the PrincipalComponentAnalysis
    • Raise an AttributeError with "not fitted" message in SequentialFeatureSelector if transform or get_metric_dict are called prior to fit
    • Use small, positive bias units in TfMultiLayerPerceptron's hidden layer(s) if the activations are ReLUs in order to avoid dead neurons
    • Added an optional clone_estimator parameter to the SequentialFeatureSelector that defaults to True, avoiding the modification of the original estimator objects
    • More rigorous type and shape checks in the evaluate.plot_decision_regions function
    • DenseTransformer now doesn't raise an error if the input array is not sparse
    • API clean-up using scikit-learn's BaseEstimator as parent class for feature_selection.ColumnSelector
    Bug Fixes
    • Fixed a problem when a tuple-range was provided as argument to the SequentialFeatureSelector's k_features parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) via wahutch
    • Fixed an AttributeError issue when verbose > 1 in StackingClassifier
    • Fixed a bug in classifier.SoftmaxRegression where the mean values of the offsets were used to update the bias units rather than their sum
    • Fixed rare bug in MLP _layer_mapping functions that caused a swap between the random number generation seed when initializing weights and biases
  • 0.4.1(May 2, 2016)

    Version 0.4.1 (2016-05-01)

    New Features
    Changes
    • Due to refactoring of the estimator classes, the init_weights parameter of the fit methods was globally renamed to init_params
    • Overall performance improvements of estimators due to code clean-up and refactoring
    • Added several additional checks for correct array types and more meaningful exception messages
    • Added optional dropout to the tf_classifier.TfMultiLayerPerceptron classifier for regularization
    • Added an optional decay parameter to the tf_classifier.TfMultiLayerPerceptron classifier for adaptive learning via an exponential decay of the learning rate eta
    • Replaced old NeuralNetMLP by more streamlined MultiLayerPerceptron (classifier.MultiLayerPerceptron); now also with softmax in the output layer and categorical cross-entropy loss.
    • Unified init_params parameter for fit functions to continue training where the algorithm left off (if supported)
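The decay parameter mentioned above refers to an exponential decay of the learning rate eta. A minimal sketch of one common form of such a schedule follows; the function name decayed_eta and the exact formula eta * decay_rate ** (epoch / decay_steps) are assumptions for illustration and may differ from what TfMultiLayerPerceptron does internally.

```python
def decayed_eta(eta, decay_rate, decay_steps, epoch):
    """Exponentially decayed learning rate (assumed form, for illustration only)."""
    return eta * decay_rate ** (epoch / decay_steps)


# Starting at eta = 0.5 and halving the rate every 10 epochs.
schedule = [decayed_eta(0.5, 0.5, 10, epoch) for epoch in (0, 10, 20)]
```

Under these assumptions the learning rate starts at 0.5 and halves every 10 epochs, so early epochs take large steps while later epochs fine-tune with smaller ones.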
  • v0.3.0 (Feb 1, 2016)

    Version 0.3.0 (2016-01-31)

    • The mlxtend.preprocessing.standardize function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the standardize function smarter in order to avoid zero-division errors
    • Added a progress bar tracker to classifier.NeuralNetMLP
    • Added a function to score predicted vs. target class labels (evaluate.scoring)
    • Added confusion matrix functions to create (evaluate.confusion_matrix) and plot (evaluate.plot_confusion_matrix) confusion matrices
    • Cosmetic improvements to the evaluate.plot_decision_regions function such as hiding plot axes
    • Renamed classifier.EnsembleClassifier to classifier.EnsembleVoteClassifier
    • Improved random weight initialization in Perceptron, Adaline, LinearRegression, and LogisticRegression
    • Changed learning parameter of mlxtend.classifier.Adaline to solver and added "normal equation" as closed-form solution solver
    • New style parameter and improved axis scaling in mlxtend.evaluate.plot_learning_curves
    • Hide y-axis labels in mlxtend.evaluate.plot_decision_regions in one-dimensional evaluations
    • Added loadlocal_mnist to mlxtend.data for streaming MNIST from local byte files into numpy arrays
    • New NeuralNetMLP parameters: random_weights, shuffle_init, shuffle_epoch
    • Sequential Feature Selection algorithms were unified into a single SequentialFeatureSelector class with parameters to enable floating selection and toggle between forward and backward selection.
    • New SFS features such as the generation of pandas DataFrame results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars)
    • Added support for regression estimators in SFS
    • Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories)
    • Added Boston housing dataset
    • Renamed mlxtend.plotting to mlxtend.general_plotting in order to distinguish general plotting functions from specialized utility functions such as evaluate.plot_decision_regions
    • Shuffle fix and new shuffle parameter for classifier.NeuralNetMLP
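The first item in this release, returning the estimated standardization parameters for re-use and avoiding zero-division errors, can be illustrated with a small standard-library sketch. The standardize function below is a hypothetical re-implementation for illustration; its signature and the "avgs"/"stds" dictionary keys are assumptions, not the documented mlxtend.preprocessing.standardize API.

```python
from statistics import mean, pstdev


def standardize(rows, params=None):
    """Z-score each column of a list of rows; returns (scaled_rows, params)."""
    n_cols = len(rows[0])
    if params is None:
        columns = [[row[j] for row in rows] for j in range(n_cols)]
        avgs = [mean(col) for col in columns]
        # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
        stds = [pstdev(col) or 1.0 for col in columns]
        params = {"avgs": avgs, "stds": stds}
    scaled = [
        [(row[j] - params["avgs"][j]) / params["stds"][j] for j in range(n_cols)]
        for row in rows
    ]
    return scaled, params


train = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second column is constant
test = [[4.0, 5.0]]

train_std, params = standardize(train)
# Re-use the training parameters so test data is scaled consistently.
test_std, _ = standardize(test, params=params)
```

Re-using the parameters estimated on the training data keeps training and test features on the same scale, and the fallback value keeps the constant column at 0.0 instead of raising a ZeroDivisionError.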
Owner
Sebastian Raschka
Machine Learning researcher & open source contributor. Author of "Python Machine Learning." Asst. Prof. of Statistics @ UW-Madison.