pandas, scikit-learn, xgboost and seaborn integration

Last update: Dec 30, 2022

Overview

pandas-ml

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split to training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
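
The table printed by test_df.metrics.confusion_matrix() counts, for each true label (rows), how often each label was predicted (columns). A minimal pure-Python sketch of that tabulation follows. It is not pandas-ml's implementation; the function name confusion_counts and the sample labels are made up for illustration.

```python
from collections import Counter


def confusion_counts(targets, predictions):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true label is labels[i] and whose predicted label is labels[j]."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    labels = sorted(set(targets) | set(predictions))
    pair_counts = Counter(zip(targets, predictions))
    matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical labels, not taken from the digits example above.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
labels, matrix = confusion_counts(y_true, y_pred)
for label, row in zip(labels, matrix):
    print(label, row)
```

Correct predictions land on the diagonal, so the sum of the diagonal divided by the number of samples gives the accuracy.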

Supported Packages

scikit-learn
patsy
xgboost

Comments

Fixed imports of deprecated modules which were removed in pandas 0.24.0

Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

opened by kristofve 8
REL: v0.4.0
[x] Compat/test for sklearn 0.18.0 (#81)

[x] initial fix (#81)

[x] wrapper for cross validation classes (re-enable skipped tests) (#85)

[x] tests for multioutput (#86)

[x] Update doc

[x] Compat/test for pandas 0.19.0 (#83)

[x] Update release note (#88)
opened by sinhrks 4
Importation error

I tried to import pandas_ml, but it raised this error:

AttributeError: type object 'NDFrame' has no attribute 'groupby'

I'm running python3.8.1 and I installed pandas_ml via pip (version 20.0.2)

I dug into the code; the error is at line 80 of series.py:

@Appender(pd.core.generic.NDFrame.groupby.__doc__)

Here pandas is imported at the top of the file with a classic import pandas as pd

I guess there is a problem with the versions...

Thanks in advance for any help

opened by ierezell 2
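
The traceback points at a decorator argument, and decorator arguments are evaluated while the class body runs, which is why the failure surfaces on import rather than on a later method call. The sketch below reproduces that mechanism with stand-in classes and shows one possible guard. OldBase, NewBase, append_doc and both build functions are hypothetical names for illustration, not pandas or pandas-ml code.

```python
def append_doc(text):
    """Toy decorator factory: attach `text` as the function's docstring."""
    def decorator(func):
        func.__doc__ = text
        return func
    return decorator


class OldBase:
    """Stand-in for a base class that still defines groupby."""
    def groupby(self):
        """Group the data."""


class NewBase:
    """Stand-in for a later version that no longer defines groupby."""


def build_wrapper(base):
    # The class statement runs immediately, so `base.groupby.__doc__`
    # is looked up right here, not when groupby is first called.
    class Wrapper(base):
        @append_doc(base.groupby.__doc__)
        def groupby(self):
            return "grouped"
    return Wrapper


def build_wrapper_safely(base):
    # Fall back to an empty docstring when the attribute is missing.
    method = getattr(base, "groupby", None)
    doc = method.__doc__ if method is not None else ""

    class Wrapper(base):
        @append_doc(doc)
        def groupby(self):
            return "grouped"
    return Wrapper


build_wrapper(OldBase)  # works: the attribute exists
try:
    build_wrapper(NewBase)  # fails while the class is being defined
except AttributeError as exc:
    print("definition failed:", exc)

SafeWrapper = build_wrapper_safely(NewBase)
print(SafeWrapper().groupby())
```

A guard like this only hides the symptom; whether it is appropriate depends on where the attribute lives in the installed pandas version.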
Confusion Matrix not accessible

Hi,

I've been using confusion_matrix since it was an independent package. I've installed pandas_ml to continue using the package, but it seems that the setup.py script does not install the package.

Could it be an issue with the find_packages function?

opened by mmartinortiz 2

Seaborn Scatterplot matrix / pairplot integration

import seaborn as sns
sns.set()

df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")

displays

iris_scatter_matrix

but pairplot doesn't work the same way with ModelFrame

import pandas as pd
pd.set_option('max_rows', 10)
import sklearn.datasets as datasets
import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
import seaborn as sns
import matplotlib.pyplot as plt
df = pdml.ModelFrame(datasets.load_iris())
sns.pairplot(df, hue=".target")

iris_modelframe

There are some useless subplots.

opened by scls19fr 2

Error while running train.py from speech commands in tensorflow examples.

Have the following error:

  File "train.py", line 27, in <module>
    from callbacks import ConfusionMatrixCallback
  File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
    from pandas_ml import ConfusionMatrix
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
    from pandas_ml.core import ModelFrame, ModelSeries # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
    from pandas_ml.core.frame import ModelFrame # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
    from pandas_ml.core.series import ModelSeries
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
    class ModelSeries(ModelTransformer, pd.Series):
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
    @Appender(pd.core.generic.NDFrame.groupby.__doc__)
AttributeError: type object 'NDFrame' has no attribute 'groupby'

Happening with both version 5 and 6.1

opened by ayush7 1
error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

Code from the example at https://pandas-ml.readthedocs.io/en/latest/xgboost.html:

'''
import pandas_ml as pdml
import sklearn.datasets as datasets
df = pdml.ModelFrame(datasets.load_digits())
train_df, test_df = df.cross_validation.train_test_split()
estimator = df.xgboost.XGBClassifier()
train_df.fit(estimator)
predicted = test_df.predict(estimator)
q=1
test_df.metrics.confusion_matrix()
train_df.xgboost.plot_importance()

tuned_parameters = [{'max_depth': [3, 4]}]
cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

df.fit(cv)
df.grid_search.describe(cv)
q=1
'''

gives error

'''
File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in
    train_df.xgboost.plot_importance()
File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
    return xgb.plot_importance(self._df.estimator.booster(),

builtins.TypeError: 'str' object is not callable
'''

I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

opened by Sandy4321 1
pandas 0.24.0 has deprecated pandas.util.decorators

See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

Looks like just need to change it to 'from pandas.utils'

opened by usul83 1
'mean_absoloute_error'

from sklearn import metrics
print('MAE:',metrics.mean_absoloute_error(y_test,y_pred))

module 'sklearn.metrics' has no attribute 'mean_absoloute_error'

This error occurred. Is there any solution?

opened by vikramk1507 0
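
The lookup fails because of a spelling slip: scikit-learn's function is metrics.mean_absolute_error, not mean_absoloute_error. For reference, the quantity it reports is the average of |y_true - y_pred|. The pure-Python sketch below computes that by hand; the helper name mae and the sample numbers are illustrative only and are not taken from the post.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted|."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Made-up values for illustration.
y_test = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print("MAE:", mae(y_test, y_pred))
```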
AttributeError: type object 'NDFrame' has no attribute 'groupby'

from pandas_ml import ConfusionMatrix
cm = ConfusionMatrix(actu, pred)
cm.print_stats()

AttributeError Traceback (most recent call last)
in
----> 1 from pandas_ml import confusion_matrix
      2
      3 cm = ConfusionMatrix(actu, pred)
      4 cm.print_stats()

/usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa
      4 from pandas_ml.tools import info # noqa
      5 from pandas_ml.version import version as __version__ # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core.frame import ModelFrame # noqa
      4 from pandas_ml.core.series import ModelSeries # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in
     16 from pandas_ml.core.accessor import _AccessorMethods
     17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
---> 18 from pandas_ml.core.series import ModelSeries
     19
     20

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in
      9
     10
---> 11 class ModelSeries(ModelTransformer, pd.Series):
     12     """
     13     Wrapper for pandas.Series to support sklearn.preprocessing

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
     78         return df
     79
---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
     81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
     82                 group_keys=True, squeeze=False):

AttributeError: type object 'NDFrame' has no attribute 'groupby'

opened by gfranco008 5
AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

opened by petraknovak 11
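
scikit-learn deprecated jaccard_similarity_score in 0.21 in favour of jaccard_score and removed the old name in 0.23, which matches the version reported here. One way calling code can tolerate such a rename is to look the function up under several candidate names. The sketch below shows the idea with SimpleNamespace stand-ins rather than the real sklearn.metrics module; resolve_first and both stand-in objects are hypothetical. The two scikit-learn functions do not compute the same value in every case, so a name fallback alone may not preserve results.

```python
from types import SimpleNamespace


def resolve_first(namespace, *names):
    """Return the first attribute of `namespace` found among `names`."""
    for name in names:
        if hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError(f"none of {names} found on {namespace!r}")


# Stand-ins for two library versions; these are NOT the real sklearn modules.
old_metrics = SimpleNamespace(jaccard_similarity_score=lambda a, b: "old API")
new_metrics = SimpleNamespace(jaccard_score=lambda a, b: "new API")

for metrics in (old_metrics, new_metrics):
    score = resolve_first(metrics, "jaccard_score", "jaccard_similarity_score")
    print(score([1, 0], [1, 1]))
```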

Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

The scikit-learn documentation marks sklearn.preprocessing.Imputer as DEPRECATED:

class sklearn.preprocessing.Imputer(*args, **kwargs)
Imputation transformer for completing missing values.

Releases (v0.6.1)

v0.6.1 (Mar 5, 2019)

v0.6.0 (Jan 15, 2019)

v0.5.0 (Nov 16, 2017)

v0.4.0 (Oct 15, 2016)
Support scikit-learn v0.17.x and v0.18.0.

Support imbalanced-learn via .imbalance accessor.

Added pandas_ml.ConfusionMatrix class for easier classification results evaluation.

v0.3.0 (Oct 22, 2015)

v0.2.0 (Sep 12, 2015)

v0.1.1 (Mar 13, 2015)

v0.1.0 (Mar 7, 2015)

v0.0.1 (Mar 1, 2015)

