Materials for my scikit-learn tutorial

Overview

Scikit-learn Tutorial

Jake VanderPlas

This repository contains notebooks and other files associated with my Scikit-learn tutorial.

Installation Notes

This tutorial requires the following packages: numpy, scipy, matplotlib, scikit-learn, IPython with notebook support, and seaborn.

The easiest way to get these is to use the conda environment manager. I suggest downloading and installing miniconda.

Once this is installed, the following command will install all required packages in your Python environment:

Original install (2015)
$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn

Or for current versions of Anaconda (Mar 2018)
 
$ conda create -n skl_tut python=3.4.5 ipywidgets=5.2.2 numpy scipy matplotlib scikit-learn ipython-notebook seaborn pillow

$ activate skl_tut

$ jupyter notebook --notebook-dir='<tutorial folder>'

Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.
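
Once the packages are installed, a quick check from Python can confirm that they are importable. The following is a minimal sketch, not part of the original tutorial: the helper name check_packages and the REQUIRED list are illustrative, and the import names are assumed from the conda command above (scikit-learn is imported as sklearn).

```python
import importlib

# Import names assumed from the conda command above; note that the
# scikit-learn package is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn", "seaborn"]


def check_packages(names):
    """Return a dict mapping each module name to its version string,
    or to None if the module cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to a label.
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(name, "MISSING" if version is None else version)
```

Any package reported as MISSING can be installed with the conda commands listed earlier in this section.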

Downloading the Tutorial Materials

I would highly recommend using git, not only for this tutorial, but for the general betterment of your life. Once git is installed, you can clone the material in this tutorial by using the git address shown above:

git clone https://github.com/jakevdp/sklearn_tutorial.git

If you can't or don't want to install git, there is a link above to download the contents of this repository as a zip file. I may make minor changes to the repository in the days before the tutorial, however, so cloning the repository is a much better option.

Notebook Listing

You can view the tutorial materials using the excellent nbviewer service.

Note, however, that you cannot modify or run the contents within nbviewer. To modify them, first download the tutorial repository, change to the notebooks directory, and run ipython notebook (jupyter notebook on newer installations). You should see the list of notebooks on the launch page in your web browser. For more information on the IPython notebook, see http://ipython.org/notebook.html

Note also that some of the code in these notebooks will not work outside the directory structure of this tutorial, so it is important to clone the full repository if possible.
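
One way to catch this early is to check the working directory before running a notebook. This is a hypothetical sketch: the helper in_tutorial_notebooks_dir is not part of the repository, and it assumes that the notebooks rely on the fig_code directory that sits next to them (notebooks/fig_code).

```python
from pathlib import Path


def in_tutorial_notebooks_dir(path="."):
    """Return True if `path` contains a fig_code directory.

    Assumption: the notebooks use helper modules stored in
    notebooks/fig_code, so its presence is a cheap sanity check.
    """
    return (Path(path) / "fig_code").is_dir()


if __name__ == "__main__":
    if not in_tutorial_notebooks_dir():
        print("fig_code not found: start the notebook server from the notebooks directory")
```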

Comments
  • Adds requirements file, updates API

    There were some references to sklearn.cross_validation in some places, etc.

    This diff is probably much larger than the changes I actually made. I tried to mitigate this by executing a "kernel restart and collapse output", but ipynb...

    opened by arokem 1
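
For context, sklearn.cross_validation was deprecated in scikit-learn 0.18 in favor of sklearn.model_selection and was later removed. The sketch below shows one way to write an import that works with either layout. The helper import_first_available is hypothetical; it is not something in this repository or in scikit-learn.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the newer module; fall back to the old name on older installs.
model_selection = import_first_available(
    ["sklearn.model_selection", "sklearn.cross_validation"]
)
if model_selection is not None:
    train_test_split = model_selection.train_test_split
```

If scikit-learn is not installed at all, model_selection is None and the final assignment is skipped.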
  • UnicodeEncodeError seen during starting 02.2-Basic-Principles.ipynb

    [E 03:25:30.511 NotebookApp] Uncaught exception GET /api/contents/notebooks/02.2-Basic-Principles.ipynb?type=notebook&_=1579470930307 (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/contents/notebooks/02.2-Basic-Principles.ipynb?type=notebook&_=1579470930307', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Language': 'en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7', 'Accept-Encoding': 'gzip, deflate, br', 'X-Xsrftoken': '2|1649b2cc|42b280df55672c38a719ab21acbad7d7|1579468890', 'Sec-Fetch-Site': 'same-origin', 'Host': 'localhost:8888', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36', 'Dnt': '1', 'Connection': 'keep-alive', 'X-Requested-With': 'XMLHttpRequest', 'Sec-Fetch-Mode': 'cors', 'Referer': 'http://localhost:8888/notebooks/notebooks/02.2-Basic-Principles.ipynb', 'Cookie': '_xsrf=2|1649b2cc|42b280df55672c38a719ab21acbad7d7|1579468890; username-localhost-8888="2|1:0|10:1579468890|23:username-localhost-8888|44:ZjE0MWRlNWVhMTIzNGFlMGI3MGRlOWE2YmJkOGI4OWE=|52c1232bd8aa482e6e27972d6b5b71bf99de9d0f65c84a020774444cd1b115b7"'})
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1511, in _execute
        result = yield result
      File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1055, in run
        value = future.result()
      File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 307, in wrapper
        yielded = next(result)
      File "/usr/local/lib/python2.7/dist-packages/notebook/services/contents/handlers.py", line 112, in get
        path=path, type=type, format=format, content=content,
      File "/usr/local/lib/python2.7/dist-packages/notebook/services/contents/filemanager.py", line 433, in get
        model = self._notebook_model(path, content=content)
      File "/usr/local/lib/python2.7/dist-packages/notebook/services/contents/filemanager.py", line 392, in _notebook_model
        self.mark_trusted_cells(nb, path)
      File "/usr/local/lib/python2.7/dist-packages/notebook/services/contents/manager.py", line 503, in mark_trusted_cells
        trusted = self.notary.check_signature(nb)
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 438, in check_signature
        signature = self.compute_signature(nb)
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 417, in compute_signature
        for b in yield_everything(nb):
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 272, in yield_everything
        for b in yield_everything(value):
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 276, in yield_everything
        for b in yield_everything(element):
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 272, in yield_everything
        for b in yield_everything(value):
      File "/usr/local/lib/python2.7/dist-packages/nbformat/sign.py", line 281, in yield_everything
        yield str(obj).encode('utf8')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 118: ordinal not in range(128)
    [W 03:25:30.512 NotebookApp] Unhandled error
    [E 03:25:30.512 NotebookApp] {
      "Accept-Language": "en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7",
      "Accept-Encoding": "gzip, deflate, br",
      "X-Xsrftoken": "2|1649b2cc|42b280df55672c38a719ab21acbad7d7|1579468890",
      "Sec-Fetch-Site": "same-origin",
      "Host": "localhost:8888",
      "Accept": "application/json, text/javascript, */*; q=0.01",
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
      "Dnt": "1",
      "Connection": "keep-alive",
      "X-Requested-With": "XMLHttpRequest",
      "Sec-Fetch-Mode": "cors",
      "Referer": "http://localhost:8888/notebooks/notebooks/02.2-Basic-Principles.ipynb",
      "Cookie": "_xsrf=2|1649b2cc|42b280df55672c38a719ab21acbad7d7|1579468890; username-localhost-8888="2|1:0|10:1579468890|23:username-localhost-8888|44:ZjE0MWRlNWVhMTIzNGFlMGI3MGRlOWE2YmJkOGI4OWE=|52c1232bd8aa482e6e27972d6b5b71bf99de9d0f65c84a020774444cd1b115b7""
    }
    [E 03:25:30.513 NotebookApp] 500 GET /api/contents/notebooks/02.2-Basic-Principles.ipynb?type=notebook&_=1579470930307 (127.0.0.1) 40.91ms referer=http://localhost:8888/notebooks/notebooks/02.2-Basic-Principles.ipynb

    opened by shobhitmittal 2
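
The last frame of the traceback points to the likely cause: under Python 2.7, calling str() on a unicode object that contains a non-ASCII character (here u'\u2013', an en dash) triggers an implicit ASCII encode, which fails before .encode('utf8') is reached. The sketch below reproduces the same class of error in Python 3, where the encode has to be requested explicitly. The sample string is made up for illustration.

```python
# Hypothetical sample text containing an en dash (U+2013).
text = "pages 10\u201320"

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ASCII encoding failed:", exc.reason)

# Encoding directly to UTF-8 handles the en dash without error.
utf8_bytes = text.encode("utf8")
print(utf8_bytes)
```

Running the notebook server under Python 3 is one way to sidestep the implicit ASCII step, since str is already Unicode there.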
  • Docker with JupyterLab and pre-installed libraries

    To avoid manual package installation and version conflicts, use Docker with the necessary libraries pre-installed.

    Features:

    1. Python 3.6 (all notebooks checked manually)
    2. JupyterLab - looks much nicer :)
    3. Makefile - allows installation and use in 2 commands
    4. Adjusted README with an explanation of how to install and use it
    opened by DanilBaibak 0
  • Update to latest scikit-learn syntax (0.18.1)

    This leaves the content of the tutorial unchanged, but updates some of the syntax in imports and in the code, depending on what has changed in the scikit-learn API. It should make the code compliant at least until scikit-learn 0.20.

    opened by teoguso 0
  • Deprecation warning in machine learning intro

    I am seeing a Deprecation Warning in code line number two where the "plot_sgd_separator()" function is called. The warning is as follows.

    DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

    opened by sahaia1 2
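
The two reshapes named in the warning can be illustrated with NumPy alone. This is a minimal sketch with made-up values, not the code from the notebook: reshape(-1, 1) treats each value as a separate sample with one feature, while reshape(1, -1) treats all values as the features of a single sample.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5])  # made-up 1d data

# Three samples with one feature each: shape (3, 1).
as_single_feature = values.reshape(-1, 1)

# One sample with three features: shape (1, 3).
as_single_sample = values.reshape(1, -1)

print(as_single_feature.shape, as_single_sample.shape)
```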
  • Small Changes

    • In notebooks/fig_code/figures.py, added plt.axis('equal') in plot_kmeans_interactive. The previous aspect ratio made it unclear why some points were assigned to a given centroid; it looked as if another centroid was nearer.

    • In notebooks/fig_code/sgd_separator.py, changed clf.decision_function([x1, x2]) to clf.decision_function(np.array([[x1, x2]])) to fix a DeprecationWarning.

    • In notebooks/04.2-Clustering-KMeans.ipynb, the last example produces this warning with Python 3.5 and scikit-learn 0.17.1 (the latest right now):

      DeprecationWarning: This function is deprecated. Please call randint(0, 273279 + 1) instead 0, n_samples - 1, init_size)

      so I added a couple of lines to avoid it.

      I also added plt.axis('equal') to all k-means plots to agree with the first change (in plot_kmeans_interactive).

    opened by ghost 0
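
The wording of the randint warning above matches the one NumPy emits for np.random.random_integers, whose upper bound is inclusive. Its replacement, randint, excludes the upper bound, hence the "+ 1". The sketch below, with made-up numbers, shows the equivalent randint call. It is an illustration only, not the change made in this pull request.

```python
import numpy as np

n_samples = 10  # made-up size for illustration
rng = np.random.RandomState(0)

# randint excludes its upper bound, so passing (n_samples - 1) + 1 draws
# indices from 0 to n_samples - 1 inclusive.
indices = rng.randint(0, (n_samples - 1) + 1, size=1000)

print(indices.min(), indices.max())
```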