Mypy stubs, i.e., type information, for numpy, pandas and matplotlib

Overview

Mypy type stubs for NumPy, pandas, and Matplotlib

Join the chat at https://gitter.im/data-science-types/community

This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy and pandas. Once this package is installed, the mypy type checker (or pytype or PyCharm) can recognize the types in these packages.

NOTE: This is a work in progress

Many functions are already typed, but a lot is still missing (NumPy and pandas are huge libraries). Chances are, you will see a message from Mypy claiming that a function does not exist when it does exist. If you encounter missing functions, we would be delighted for you to send a PR. If you are unsure of how to type a function, we can discuss it.
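To give a rough idea of what such a contribution involves: a stub entry is a function signature whose body is replaced by `...`. The sketch below uses a made-up function, `clip_values`, and is not taken from the stubs in this repository. In a real `.pyi` file only the signatures would appear; the implementation is included here solely so that the example runs.

```python
from typing import List, Sequence, Union, overload

# Hypothetical example: clip_values is an illustrative name and does not
# come from the stubs in this repository.


@overload
def clip_values(data: int, upper: int) -> int: ...
@overload
def clip_values(data: Sequence[int], upper: int) -> List[int]: ...


def clip_values(
    data: Union[int, Sequence[int]], upper: int
) -> Union[int, List[int]]:
    # Runtime implementation, present only to make the sketch runnable.
    if isinstance(data, int):
        return min(data, upper)
    return [min(item, upper) for item in data]
```

With these overloads, a type checker infers `int` for `clip_values(5, 3)` and `List[int]` for `clip_values([1, 5], 3)`, which is the kind of distinction the overloads in the stubs are meant to capture.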

Installing

You can get this package from PyPI:

pip install data-science-types

To get the most up-to-date version, install it directly from GitHub:

pip install git+https://github.com/predictive-analytics-lab/data-science-types

Or clone the repository somewhere and run pip install -e . from inside it.

Examples

These are the kinds of things that can be checked:

Array creation

import numpy as np

arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])  # OK
arr2: np.ndarray[np.int32] = np.array([3, 7, 39, -3])  # Type error
arr3: np.ndarray[np.int32] = np.array([3, 7, 39, -3], dtype=np.int32)  # OK
arr4: np.ndarray[float] = np.array([3, 7, 39, -3], dtype=float)  # Type error: the type of ndarray can not be just "float"
arr5: np.ndarray[np.float64] = np.array([3, 7, 39, -3], dtype=float)  # OK

Operations

import numpy as np

arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])
arr2: np.ndarray[np.int64] = np.array([4, 12, 9, -1])

result1: np.ndarray[np.int64] = np.divide(arr1, arr2)  # Type error
result2: np.ndarray[np.float64] = np.divide(arr1, arr2)  # OK

compare: np.ndarray[np.bool_] = (arr1 == arr2)

Reductions

import numpy as np

arr: np.ndarray[np.float64] = np.array([[1.3, 0.7], [-43.0, 5.6]])

sum1: int = np.sum(arr)  # Type error
sum2: np.float64 = np.sum(arr)  # OK
sum3: float = np.sum(arr)  # Also OK: np.float64 is a subclass of float
sum4: np.ndarray[np.float64] = np.sum(arr, axis=0)  # OK

# the same works with np.max, np.min and np.prod

Philosophy

The goal is not to recreate the APIs exactly. The main goal is to have useful checks on our code. Often the actual APIs in the libraries are more permissive than the type signatures in our stubs, but this is (usually) a feature and not a bug.
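A library-independent illustration of this trade-off, using a made-up function rather than anything from the stubs: the function below runs with any iterable of numbers, but its annotation only admits `List[float]`, so a type checker rejects a call that would work at runtime.

```python
from typing import List


def total(values: List[float]) -> float:
    """Sum the values; the annotation is deliberately narrower than what runs."""
    result = 0.0
    for value in values:
        result += value
    return result


ok = total([1.5, 2.5])  # accepted by the type checker and at runtime
also_runs = total((1.5, 2.5))  # runs fine, but a type checker reports an error
```

The narrower signature flags the second call even though nothing goes wrong when it runs. Signatures in the stubs can be stricter than the libraries in the same way.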

Contributing

We always welcome contributions. All pull requests are subject to CI checks. We check for compliance with Mypy and that the file formatting conforms to our Black specification.

You can install these dev dependencies via

pip install -e '.[dev]'

This will also install NumPy, pandas, and Matplotlib to be able to run the tests.

Running CI locally (recommended)

We include a script for running the CI checks that are triggered when a PR is opened. To test these out locally, you need to install the type stubs in your environment. Typically, you would do this with

pip install -e .

Then use the check_all.sh script to run all tests:

./check_all.sh

Below we describe how to run the various checks individually, but check_all.sh should be easier to use.

Checking compliance with Mypy

The settings for Mypy are specified in the mypy.ini file in the repository. Just running

mypy tests

from the base directory should take these settings into account. We enforce 0 Mypy errors.

Formatting with black

We use Black to format the stub files. First, install black and then run

black .

from the base directory.

Pytest

python -m pytest -vv tests/

Flake8

flake8 *-stubs
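The individual checks above can also be chained from Python. This is only a sketch, assuming that mypy, black, pytest and flake8 are installed and that it is run from the base directory. It is not the content of check_all.sh, and the helper names `build_commands` and `run_checks` are made up.

```python
import glob
import subprocess
import sys
from typing import List


def build_commands(stub_dirs: List[str]) -> List[List[str]]:
    """Return the commands from the sections above, one list per check."""
    return [
        ["mypy", "tests"],
        ["black", "."],
        [sys.executable, "-m", "pytest", "-vv", "tests/"],
        ["flake8", *stub_dirs],
    ]


def run_checks() -> int:
    """Run each check in turn; return the first non-zero exit code, else 0."""
    # A shell would expand "*-stubs"; here the pattern is expanded explicitly.
    stub_dirs = sorted(glob.glob("*-stubs"))
    for command in build_commands(stub_dirs):
        print("Running:", " ".join(command))
        exit_code = subprocess.run(command).returncode
        if exit_code != 0:
            return exit_code
    return 0
```

Calling `run_checks()` from the base directory stops at the first failing check and returns its exit code.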

License

Apache 2.0

Issues
  • Update pandas read_csv and to_csv

    Hey! I updated pandas read_csv and to_csv, and also made a small fix to pandas.Series (the map function).

    There are some small changes made by the Black formatter (was it badly formatted before, or did I have something wrong in my settings?).

    I would appreciate it if you could review this.

    opened by hellocoldworld 9
  • Support str and int as dtypes.

    Extend the set of dtypes with the str and int literals.

    Note -- it would help to add some comments to describe the intended use of the _Dtype types -- it was hard for me to guess if I needed to also extend any of these.

    Fix for #73

    opened by rpgoldman 9
  • Small additions to DataFrame and Series

    I've made a few more additions to the stub, fleshing it out as I found I needed more for my work. I've corrected the issue I found - thanks again, thomkeh! - and hope that others can benefit from this work.

    Thank you!

    opened by ZHSimon 8
  • Add Series.sort_index() signature

    • [x] Adds Series.sort_index based on stable version documentation
    • [x] Fixes wrong order of arguments in Series.sort_values() (ascending should go before inplace)
    • [x] Adds missing arguments to Series.sort_values()
    opened by krassowski 8
  • Flesh out pandas and numpy a bit more

    This is the result of testing data-science-types against another project I contribute to: https://github.com/jldbc/pybaseball

    • I added some common .gitignores for venv and vscode
    • I found a few Pandas tweaks to support functions and parameters that we are using.
      • Tweak DataFrame.apply, DataFrame.drop, DataFrame.merge, DataFrame.rank, DataFrame.reindex, DataFrame.replace
      • Add DataFrame.assign, DataFrame.filter
      • Tweak Series.rank
      • Add pandas.isnull
      • Tweak DataFrame.loc
    • A few changes to numpy as well
      • Allow tuples -> numpy.array
      • Tweak numpy.setdiff1d
      • Add numpy.cos, numpy.deg2rad, numpy.sin, numpy.cos

    Everything was done using the latest Pandas docs for reference to data types:

    https://pandas.pydata.org/pandas-docs/stable/reference/

    I also did my best to add tests to support the changes as well

    opened by TheCleric 7
  • Shelelena/pandas improvements

    Improvements in pandas DataFrame, DataFrameGroupBy and SeriesGroupBy

    • specify DataFrame.groupby
    • add DataFrameGroupBy.aggregate
    • adjust data type in DataFrame.__init__
    • add __getattr__ to get columns in DataFrame and DataFrameGroupBy
    • correct return type of DataFrameGroupBy.__getitem__
    • add some missing statistical methods to DataFrameGroupBy and SeriesGroupBy
    opened by Shelelena 7
  • Additional type signatures for DataFrame and Series methods

    Hi,

    I'm not very sure about the signatures in Series and would be happy to discuss them. I'm still inexperienced with Mypy.

    opened by aguillon 7
  • Small additions to Pandas

    Added read_sql method, added usecols argument; added fillna and isin methods to DataFrame; added parameter information to to_frame in Series.

    opened by ZHSimon 6
  • add MANIFEST.in

    close #204

    opened by ickc 6
  • No overload variant of "subplots" matches argument type "bool"

    When I perform the following:

    from matplotlib.pyplot import subplots
    FIG, AXES = subplots(constrained_layout=True)
    

    I get the warning:

    No overload variant of "subplots" matches argument type "bool".

    Does that need to be added?

    opened by uihsnv 0
  • Missing stub for np.count_nonzero

    opened by jgonsior 0
  • Missing stub for numpy.argwhere

    opened by jgonsior 0
  • Add missing stubs for notna and notnull

    Add missing stubs for the functions notna and notnull on Index, Series, DataFrame and the pandas module

    opened by eganjs 0
  • Missing stub for numpy.quantile

    Module has no attribute "quantile"

    Code used: np.quantile(ors, 0.75)

    opened by Bradley-Butcher 0
  • test_frame_iloc fails on Pandas 1.2

    tests/pandas_test.py line 92 fails on Pandas 1.2

    Extracting the relevant code

    import pandas as pd
    df: pd.DataFrame = pd.DataFrame(
        [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]],
        index=["cobra", "viper", "sidewinder"],
        columns=["max_speed", "shield"],
    )
    s: "pd.Series[float]" = df["shield"].copy()
    df.iloc[0] = s
    

    Results in

    ValueError: could not broadcast input array from shape (3) into shape (2)
    

    This runs fine on Pandas 1.1.5

    opened by EdwardJRoss 0
  • Pandas `DataFrame.concat` missing some arguments

    The concat method for joining multiple DataFrames appears to be missing several arguments, such as join, keys, levels, and more.

    https://github.com/predictive-analytics-lab/data-science-types/blob/faebf595b16772d3aa70d56ea179a2eaffdbd565/pandas-stubs/__init__.pyi#L37-L42

    Compare to the Pandas docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

    opened by kevinhu 0
  • Add `isfinite`, `isinf`, `isscalar`, `diagonal`, `allclose`

    Add `isfinite`, `isinf`, `isscalar`, `diagonal` and `allclose`, and improve isnan support in the NumPy stubs. Also add support for scalar + array.

    fixes #209

    opened by tadeu 0
  • Pandas `DataFrame.drop_duplicates` missing keywords 'subset' and 'inplace'

    Script used:

    import pandas as pd
    
    df: pd.DataFrame = pd.DataFrame([[1, 2], [1, 4]], columns=["a", "b"], index=["c", "d"])
    
    df.drop_duplicates(subset=["a"], inplace=True)
    
    print(df)
    
    opened by kevinhu 0
  • Pandas has no method 'read_hdf'

    Script used:

    import pandas as pd
    
    x: pd.DataFrame = pd.read_hdf("your_hdf_here.hdf")
    
    opened by kevinhu 0
Releases(v0.2.23)
Owner
Predictive Analytics Lab