🤖⚡ scikit-learn tips

New tips are posted on LinkedIn, Twitter, and Facebook.

👉 Sign up to receive 2 video tips by email every week! 👈

List of all tips

Click to discuss the tip on LinkedIn, click to view the Jupyter notebook for a tip, or click to watch the tip video on YouTube:

# Description Links
1 Use ColumnTransformer to apply different preprocessing to different columns
2 Seven ways to select columns using ColumnTransformer
3 What is the difference between "fit" and "transform"?
4 Use "fit_transform" on training data, but "transform" (only) on testing/new data
5 Four reasons to use scikit-learn (not pandas) for ML preprocessing
6 Encode categorical features using OneHotEncoder or OrdinalEncoder
7 Handle unknown categories with OneHotEncoder by encoding them as zeros
8 Use Pipeline to chain together multiple steps
9 Add a missing indicator to encode "missingness" as a feature
10 Set a "random_state" to make your code reproducible
11 Impute missing values using KNNImputer or IterativeImputer
12 What is the difference between Pipeline and make_pipeline?
13 Examine the intermediate steps in a Pipeline
14 HistGradientBoostingClassifier natively supports missing values
15 Three reasons not to use drop='first' with OneHotEncoder
16 Use cross_val_score and GridSearchCV on a Pipeline
17 Try RandomizedSearchCV if GridSearchCV is taking too long
18 Display GridSearchCV or RandomizedSearchCV results in a DataFrame
19 Important tuning parameters for LogisticRegression
20 Plot a confusion matrix
21 Compare multiple ROC curves in a single plot
22 Use the correct methods for each type of Pipeline
23 Display the intercept and coefficients for a linear model
24 Visualize a decision tree two different ways
25 Prune a decision tree to avoid overfitting
26 Use stratified sampling with train_test_split
27 Two ways to impute missing values for a categorical feature
28 Save a model or Pipeline using joblib
29 Vectorize two text columns in a ColumnTransformer
30 Four ways to examine the steps of a Pipeline
31 Shuffle your dataset when using cross_val_score
32 Use AUC to evaluate multiclass problems
33 Use FunctionTransformer to convert functions into transformers
34 Add feature selection to a Pipeline
35 Don't use .values when passing a pandas object to scikit-learn
36 Most parameters should be passed as keyword arguments
37 Create an interactive diagram of a Pipeline in Jupyter
38 Get the feature names output by a ColumnTransformer
39 Load a toy dataset into a DataFrame
40 Estimators only print parameters that have been changed
41 Drop the first category from binary features (only) with OneHotEncoder
42 Passthrough some columns and drop others in a ColumnTransformer
43 Use OrdinalEncoder instead of OneHotEncoder with tree-based models
44 Speed up GridSearchCV using parallel processing
45 Create feature interactions using PolynomialFeatures
46 Ensemble multiple models using VotingClassifier or VotingRegressor
47 Tune the parameters of a VotingClassifier or VotingRegressor
48 Access part of a Pipeline using slicing
49 Tune multiple models simultaneously with GridSearchCV
50 Adapt this pattern to solve many Machine Learning problems
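
As a rough illustration of how several of these tips fit together (ColumnTransformer from tip 1, a pipeline from tips 8 and 12, and cross-validation of the whole pipeline from tip 16), here is a minimal sketch. The toy DataFrame, its column names, and the choice of LogisticRegression are assumptions made for this example only; they are not taken from the tip notebooks.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature, one numeric feature
# with a missing value, and a binary target.
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "S", "C", "Q", "S", "C", "S"],
    "fare": [7.25, 71.28, 7.93, None, 8.05, 30.07, 8.46, 51.86, 11.13, 16.70],
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
})
X = df[["embarked", "fare"]]
y = df["survived"]

# Apply different preprocessing to different columns.
ct = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
    (SimpleImputer(strategy="median"), ["fare"]),
)

# Chain preprocessing and the model so that both are refit inside each fold.
pipe = make_pipeline(ct, LogisticRegression(random_state=0, max_iter=1000))

# Cross-validate the entire pipeline, not just the model.
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")
mean_accuracy = scores.mean()

# Fit on all of the data, then predict for new (here: the same) rows.
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Because preprocessing lives inside the pipeline, it is learned only from the training portion of each fold, which is one reason to prefer this over preprocessing the full dataset up front.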

You can interact with all of these notebooks online using Binder:

Note: Some of the tips do not include any code, and can only be viewed on LinkedIn.
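
For readers who want a quick feel for tips 4 and 7 without opening a notebook, the following is a minimal sketch, not code from the tips themselves. The column name and category values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "triangle" appears only in the testing data.
X_train = pd.DataFrame({"shape": ["circle", "square", "circle"]})
X_test = pd.DataFrame({"shape": ["square", "triangle"]})

# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
ohe = OneHotEncoder(handle_unknown="ignore")

# Training data: learn the categories AND encode (fit_transform).
train_encoded = ohe.fit_transform(X_train).toarray()

# Testing/new data: encode only, reusing the learned categories (transform).
test_encoded = ohe.transform(X_test).toarray()
```

The learned categories are "circle" and "square", so the "triangle" row in `test_encoded` is a row of zeros rather than an error.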

Who creates these tips?

Hi! I'm Kevin Markham, the founder of Data School. I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.

How can I get better at scikit-learn?

I teach three courses:

👉 Find out which course is right for you! 👈

Do you have any other tips?

Yes! In 2019, I posted 100 pandas tricks. I also created a video featuring my top 25 pandas tricks.

© 2020-2021 Data School. All rights reserved.

Comments
  • scikit-learn tip #37: issue with set_config() function

    Dear all,

    Thanks to Kevin's fantastic work spreading his data science knowledge across the web, I discovered the new interactive pipeline diagram in sklearn v0.23. I tried to use it in one of my Jupyter notebooks after running conda update --all to rule out compatibility issues, and I am really struggling with it. I can import the package, but when I try to set the display parameter to 'diagram', I just get: "set_config() got an unexpected keyword argument 'display'"

    I have been trying to fix this issue in my notebook for nearly an hour, but I can't find what went wrong. Does anyone have an idea how to run this new function?

    Thanks a lot for your support, and take good care!

    opened by Xavier-F 4
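
One possible explanation for the error in this issue, offered as an assumption and not a confirmed diagnosis, is that the notebook kernel is still running a scikit-learn release older than 0.23, where set_config() does not accept display. A minimal check along those lines might look like this (the diagram_supported flag is a name introduced for this sketch):

```python
import sklearn
from sklearn import set_config

# Show the version the running kernel actually imports; this can differ
# from the version installed in another environment.
print(sklearn.__version__)

try:
    # Accepted by releases that support the interactive diagram (0.23+).
    set_config(display="diagram")
    diagram_supported = True
except TypeError:
    # Older releases raise "unexpected keyword argument 'display'".
    diagram_supported = False
```

If diagram_supported is False, the printed version is the first thing to compare against 0.23.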
Owner
Kevin Markham
Founder of Data School