Create HTML profiling reports from pandas DataFrame objects

Overview

Pandas Profiling


Documentation | Slack | Stack Overflow

Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
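
For example, a minimal sketch (the CSV filename is hypothetical; importing pandas_profiling is what registers the accessor on DataFrame objects):

import pandas as pd
import pandas_profiling  # registers the df.profile_report() accessor

df = pd.read_csv("data.csv")  # "data.csv" is a hypothetical input file
profile = df.profile_report(title="Quick EDA")
profile.to_file("report.html")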

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
  • File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information

Announcements

Version v2.11.0 released featuring an exciting integration with Great Expectations that many of you requested (see details below).

Spark backend in progress: We can happily announce that we're nearing v1 for the Spark backend for generating profile reports. Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project directly through GitHub Sponsors! Your support helps us continue maintaining this package. It's extra exciting that GitHub matches your contribution for the first year.


February 20, 2021 💘


Contents: Examples | Installation | Documentation | Large datasets | Command line usage | Advanced usage | Integrations | Support | Types | How to contribute | Editor Integration | Dependencies


Examples

The examples in the documentation, covering both specific features and tutorials, can give you an impression of what the package can do.

Installation

Using pip


You can install using the pip package manager by running:

pip install pandas-profiling[notebook]

Alternatively, you could install the latest version directly from GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda


You can install using the conda package manager by running:

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page.

Install by navigating to the proper directory and running:

python setup.py install

Documentation

The documentation for pandas_profiling can be found here. Previous documentation is still available here.

Getting started

Start by loading your pandas DataFrame, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

To generate the report, run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Explore deeper

You can configure the profile report in any way you like. The example code below loads the explorative configuration file, which includes many features for text (length distribution, Unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in the exact settings that were used, you can compare it with the default configuration file.

profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

Learn more about configuring pandas-profiling on the Advanced usage page.

Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook. There are two interfaces (see animations below): through widgets and through an HTML report.

Notebook Widgets

This is achieved by simply displaying the report. In the Jupyter Notebook, run:

profile.to_widgets()

HTML

The HTML report can also be embedded in a Jupyter notebook. Run the following code:

profile.to_notebook_iframe()

Saving the report

If you want to generate an HTML report file, store the ProfileReport in a variable and use the to_file() function:

profile.to_file("your_report.html")

Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Large datasets

Version 2.4 introduced minimal mode, a configuration that disables expensive computations (such as correlations and dynamic binning).

Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)
profile.to_file("output.html")
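
If even minimal mode is too slow, a common complementary approach (plain pandas, not a pandas-profiling feature) is to profile a random sample of the data first:

# Sketch: profile a 10,000-row random sample of the full dataset
sample = large_dataset.sample(n=10000, random_state=42)
profile = ProfileReport(sample, title="Sample of the dataset", minimal=True)
profile.to_file("sample_output.html")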

Command line usage

For standard-formatted CSV files that pandas can read directly, you can use the pandas_profiling executable.

Run the following for information about options and arguments:

pandas_profiling -h
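
A typical invocation reads a CSV and writes the report to an HTML file (a sketch; the filenames are hypothetical, and the authoritative list of flags comes from the -h output above):

pandas_profiling data.csv report.html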

Advanced usage

A set of options is available to adapt the generated report.

  • title (str): Title for the report ('Pandas Profiling Report' by default).
  • pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
  • progress_bar (bool): If True, pandas-profiling will display a progress bar.
  • infer_dtypes (bool): When True (the default), variable dtypes are inferred by visions using its typeset logic (for instance, a column of integers stored as strings will be analyzed as numeric). A sketch combining these options follows this list.
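
As an illustration, the options above can be combined in a single call (a sketch using only the parameters documented in this list):

profile = ProfileReport(
    df,
    title="Pandas Profiling Report",
    pool_size=4,          # four workers in the thread pool
    progress_bar=False,   # e.g. when running in non-interactive scripts
    infer_dtypes=False,   # analyze columns using their raw pandas dtypes
)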

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

You can find the configuration docs on the Advanced usage page.

Example

profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file("output.html")

Integrations

Great Expectations


Profiling your data is closely related to data validation: often validation rules are defined in terms of well-known statistics. For that purpose, pandas-profiling integrates with Great Expectations, a world-class open-source library that helps you maintain data quality and improve communication about data between teams. Great Expectations allows you to create Expectations (which are basically unit tests for your data) and Data Docs (conveniently shareable HTML data reports). pandas-profiling features a method to create a suite of Expectations based on the results of your ProfileReport, which you can store and use to validate another (or future) dataset.

You can find more details on the Great Expectations integration here.
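
A minimal sketch of how this looks in code, assuming great_expectations is installed (to_expectation_suite is the method exposed by the integration; the suite name here is hypothetical):

from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Pandas Profiling Report")

# Build an Expectation Suite from the profiling results
suite = profile.to_expectation_suite(suite_name="my_dataset_expectations")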

Supporting open source

Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without the support of our gracious sponsors.

Lambda Labs

Lambda workstations, servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. Lambda Cloud offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN.

We would like to thank our generous GitHub Sponsors supporters who make pandas-profiling possible:

Martin Sotir, Brian Lee, Stephanie Rivera, abdulAziz, gramster

More info if you would like to appear here: GitHub Sponsors page

Types

Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float, etc.). pandas-profiling currently recognizes the following types: Boolean, Numerical, Date, Categorical, URL, Path, File and Image.

We have developed a type system for Python tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of your analysis code. Future versions of pandas-profiling will have extended type support through visions!

Contributing

Read the Contribution Guide to learn how to get involved.

A low-threshold way to ask questions or start contributing is to reach out on the pandas-profiling Slack. Join the Slack community.

Editor integration

PyCharm integration

  1. Install pandas-profiling via the instructions above
  2. Locate your pandas-profiling executable.
    • On macOS / Linux / BSD:
      $ which pandas_profiling
      (example) /usr/local/bin/pandas_profiling
    • On Windows:
      $ where pandas_profiling
      (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
  3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools
  4. Click the + icon to add a new external tool
  5. Insert the following values
    • Name: Pandas Profiling
    • Program: The location obtained in step 2
    • Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
    • Working Directory: $ProjectFileDir$

PyCharm Integration

To use the PyCharm Integration, right click on any dataset file:

External Tools > Pandas Profiling.

Other integrations

Other editor integrations may be contributed via pull requests.

Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename               Requirements
requirements.txt       Package requirements
requirements-dev.txt   Requirements for development
requirements-test.txt  Requirements for testing
setup.py               Requirements for widgets etc.

Comments
  • pandas-profiling not compatible with pandas v1.0


    Describe the bug

    pandas-profiling is not compatible with pandas v1.0. The key entry point ProfileReport raises "TypeError: concat() got an unexpected keyword argument 'join_axes'", as join_axes was removed in pandas v1.0. https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html?highlight=concat
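
    For context, the pandas changelog suggests reindexing the result instead of passing join_axes. Roughly (a sketch, with hypothetical frames a and b):

    # pandas < 1.0 (no longer works):
    # combined = pd.concat([a, b], axis=1, join_axes=[a.index])

    # pandas >= 1.0: reindex the concatenated result instead
    combined = pd.concat([a, b], axis=1).reindex(a.index)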

    To Reproduce

    import pandas as pd
    import pandas_profiling

    def test_issueXXX():
        df = pd.read_csv(r'')
        pf = pandas_profiling.ProfileReport(df)

    TypeError: concat() got an unexpected keyword argument 'join_axes'

    Version information:

    • Python version: 3.7.
    • Environment: Command Line and Pycharm
    • pandas-profiling: 1.4.1
    • pandas: 1.0
    bug 🐛 
    opened by mantou16 27
  • AttributeError: 'DataFrame' object has no attribute 'profile_report'


    Describe the bug

    Running the example in the readme generates an error.

    To Reproduce

    Running:

    import numpy as np
    import pandas as pd
    import pandas_profiling
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df.profile_report()
    

    in a Jupyter notebook gives:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-16-f9a7584e785c> in <module>
    ----> 1 df.profile_report()
    
    ~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
       5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
       5066                 return self[name]
    -> 5067             return object.__getattribute__(self, name)
       5068 
       5069     def __setattr__(self, name, value):
    
    AttributeError: 'DataFrame' object has no attribute 'profile_report'
    

    Version information (abridged from the full pip freeze): Python 3.7 (Anaconda), pandas==0.24.2, pandas-profiling==1.4.1, numpy==1.16.2, matplotlib==3.0.3.


    bug 🐛 
    opened by bdch1234 22
  • Plotting a response variable on the histograms


    Hey,

    Great job with pandas-profiling, I love it. I think it would be great to have an extra parameter to specify a response column. Plotting the average response for every bin of the histograms (for each variable) would allow us to see obvious trends/correlations and would be useful for any regression problem (it might be more tricky for classification, where the responses are discrete). What do you think?

    Thanks!

    feature request 💬 
    opened by Optimox 17
  • feat: added filter to locate columns


    This is a follow-up PR to the PR made earlier (#1096). Closes #638. I have changed the input from a text field to a dropdown as per @fabclmnt's suggestion.

    Here's how it looks and works now:

    https://user-images.githubusercontent.com/57868024/194428807-a7642deb-6ba5-4404-95ef-3e9605ba10cd.mp4

    The dropdown isn't visible due to restrictions on the screen-recorder, here's an image of it in action for reference.

    image

    P.S. I'm sorry for the hassle in the previous PR, I haven't worked with git very much. Thank you for your patience.

    opened by g-kabra 16
  • Potential incompatibility with Pandas 1.4.0


    Describe the bug

    Pandas 1.4.0 was released a few days ago and some tests started failing. I was able to reproduce the problem with a minimal example that fails with pandas 1.4.0 and works with pandas 1.3.5.

    To Reproduce

    import pandas as pd
    import pandas_profiling
    
    data = {"col1": [1, 2], "col2": [3, 4]}
    dataframe = pd.DataFrame(data=data)
    
    profile = pandas_profiling.ProfileReport(dataframe, minimal=False)
    profile.to_html()
    

    When running with Pandas 1.4.0, I get the following traceback:

    Traceback (most recent call last):
      File "/tmp/bug.py", line 8, in <module>
        profile.to_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 368, in to_html
        return self.html
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 185, in html
        self._html = self._render_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 287, in _render_html
        report = self.report
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 179, in report
        self._report = get_report_structure(self.config, self.description_set)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 161, in description_set
        self._description_set = describe_df(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/describe.py", line 71, in describe
        series_description = get_series_descriptions(
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 92, in pandas_get_series_descriptions
        for i, (column, description) in enumerate(
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 72, in multiprocess_1d
        return column, describe_1d(config, series, summarizer, typeset)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 50, in pandas_describe_1d
        return summarizer.summarize(config, series, dtype=vtype)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summarizer.py", line 37, in summarize
        _, _, summary = self.handle(str(dtype), config, series, {"type": str(dtype)})
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 62, in handle
        return op(*args)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 17, in func2
        res = g(*x)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 65, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 82, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 205, in pandas_describe_categorical_1d
        summary.update(length_summary_vc(value_counts))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 162, in length_summary_vc
        "median_length": weighted_median(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/utils_pandas.py", line 13, in weighted_median
        w_median = (data[weights == np.max(weights)])[0]
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
    

    If I change minimal from False to True, the script passes.

    Version information:

    Failing environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.4.0 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.4.0
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Working environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.3.5 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.3.5
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Let me know if I can provide more details and thank you for your good work!

    bug 🐛 
    opened by Lothiraldan 15
  • TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.


        stats['range'] = stats['max'] - stats['min']
    TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
    

    I got this error
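
    The traceback points at stats['range'] = stats['max'] - stats['min'] being evaluated for a boolean column, which NumPy forbids. A possible user-side workaround (a sketch, not a fix in the library) is to cast boolean columns to integers before profiling:

    # Cast boolean columns to int so that max - min is well-defined
    bool_cols = df.select_dtypes(include=bool).columns
    df[bool_cols] = df[bool_cols].astype(int)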

    bug 🐛 information requested ❔ help wanted 🙋 
    opened by eyadsibai 15
  • 2.10.0 - TraitError: The 'value' trait of a HTML instance must be a unicode string...


    Describe the bug

    Hi there - it looks like the latest release (2.10.0) has broken the to_widgets functionality outlined in the Getting started section of the docs. I confirmed that rolling back to 2.9.0 does not produce the issue.

    To Reproduce

    # pandas_profiling==2.10.0
    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=["a", "b", "c", "d", "e"]
    )
    
    profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
    
    profile.to_widgets()
    
    

    Returns:

    TraitError: The 'value' trait of a HTML instance must be a unicode string, but a value of Numeric <class 'visions.types.type.VisionsBaseTypeMeta'> was specified.
    

    Version information: 2.10.0

    Additional context
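
    Until a fix lands, pinning the previous release restores to_widgets(), as confirmed above:

    pip install "pandas-profiling==2.9.0"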

    opened by rynmccrmck 14
  • ZeroDivisionError when using version 1.4.1


    There was a change in behavior between versions 1.4.0 and 1.4.1 where some calls to ProfileReport that previously succeeded will now raise a ZeroDivisionError.

    An example reproduction is to take the following code and run it in a Jupyter notebook cell:

    import pandas
    import pandas_profiling
    
    import IPython
    
    df = pandas.DataFrame({'c': 'v'}, index=['c'])
    report = pandas_profiling.ProfileReport(df)
    IPython.core.display.HTML(report.html)
    

    With version 1.4.0 this produced an HTML report, but with version 1.4.1 it produces the following stack trace:

    ZeroDivisionErrorTraceback (most recent call last)
    <ipython-input-2-ffb5392b4284> in <module>()
          5 
          6 df = pandas.DataFrame({'c': 'v'}, index=['c'])
    ----> 7 report = pandas_profiling.ProfileReport(df)
          8 IPython.core.display.HTML(report.html)
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/__init__.pyc in __init__(self, df, **kwargs)
         67 
         68         self.html = to_html(sample,
    ---> 69                             description_set)
         70 
         71         self.description_set = description_set
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/report.pyc in to_html(sample, stats_object)
        192 
        193     # Add plot of matrix correlation
    --> 194     pearson_matrix = plot.correlation_matrix(stats_object['correlations']['pearson'], 'Pearson')
        195     spearman_matrix = plot.correlation_matrix(stats_object['correlations']['spearman'], 'Spearman')
        196     correlations_html = templates.template('correlations').render(
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/plot.pyc in correlation_matrix(corrdf, title, **kwargs)
        134     plt.title(title, size=18)
        135     plt.colorbar(matrix_image)
    --> 136     axes_cor.set_xticks(np.arange(0, corrdf.shape[0], corrdf.shape[0] * 1.0 / len(labels)))
        137     axes_cor.set_yticks(np.arange(0, corrdf.shape[1], corrdf.shape[1] * 1.0 / len(labels)))
        138     axes_cor.set_xticklabels(labels, rotation=90)
    
    ZeroDivisionError: float division by zero
    
    opened by ojarjur 14
  • pandas_profiling.utils.cache


    ModuleNotFoundError: No module named 'pandas_profiling.utils'

    To Reproduce

    Version information:

    Additional context

    information requested ❔ 
    opened by ajaimes07 13
  • This call to matplotlib.use() has no effect because the backend has already


    /home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/pandas_profiling/base.py:20: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.

    The backend was originally set to 'module://ipykernel.pylab.backend_inline' by ipykernel; the warning is followed by a long traceback (through runpy, tornado, zmq and IPython) showing where matplotlib.pyplot was first imported.

    matplotlib.use('Agg')

    opened by iweey 13
  • IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


    Describe the bug

    Running the example below gives this error: IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    latest version on conda-forge

    To Reproduce

    wine.csv

    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.read_csv("wine.csv")
    
    profile = ProfileReport(df, title="Pandas Profiling Report")
    
    profile.to_file("tmp.html")
    

    Version information:

    • Python 3.9
    • pandas-profiling 3.1.0 pyhd8ed1ab_0 conda-forge
    • pandas 1.4.2 py39h1832856_1 conda-forge
    bug 🐛 
    opened by darenr 12
  • chore(deps): update dependency scipy to >=1.10, <1.11


    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | scipy (source) | >=1.4.1, <1.10 -> >=1.10, <1.11 |


    Release Notes

    scipy/scipy

    v1.10.0: SciPy 1.10.0

    Compare Source

    SciPy 1.10.0 Release Notes

    SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Before upgrading, we recommend that users check that their own code does not use deprecated SciPy functionality (to do so, run your code with python -Wd and check for DeprecationWarning messages). Our development attention will now shift to bug-fix releases on the 1.10.x branch, and on adding new features on the main branch.

    This release requires Python 3.8+ and NumPy 1.19.5 or greater.

    For running on PyPy, PyPy3 6.0+ is required.

    Highlights of this release

    • A new dedicated datasets submodule (scipy.datasets) has been added, and is now preferred over usage of scipy.misc for dataset retrieval.
    • A new scipy.interpolate.make_smoothing_spline function was added. This function constructs a smoothing cubic spline from noisy data, using the generalized cross-validation (GCV) criterion to find the tradeoff between smoothness and proximity to data points.
    • scipy.stats has three new distributions, two new hypothesis tests, three new sample statistics, a class for greater control over calculations involving covariance matrices, and many other enhancements.

    New features

    scipy.datasets introduction

    • A new dedicated datasets submodule has been added. The submodule is meant for datasets that are relevant to other SciPy submodules and their content (tutorials, examples, tests), as well as a curated set of datasets that are of wider interest. As of this release, all the datasets from scipy.misc have been added to scipy.datasets (and deprecated in scipy.misc).

    • The submodule is based on Pooch (a new optional dependency for SciPy), a Python package to simplify fetching data files. This move will, in a subsequent release, facilitate SciPy to trim down the sdist/wheel sizes, by decoupling the data files and moving them out of the SciPy repository, hosting them externally and downloading them when requested. After downloading the datasets once, the files are cached to avoid network dependence and repeated usage.

    • Added datasets from scipy.misc: scipy.datasets.face, scipy.datasets.ascent, scipy.datasets.electrocardiogram

    • Added download and caching functionality:

      • scipy.datasets.download_all: a function to download all the scipy.datasets associated files at once.
      • scipy.datasets.clear_cache: a simple utility function to clear cached dataset files from the file system.
      • scipy/datasets/_download_all.py can be run as a standalone script for packaging purposes to avoid any external dependency at build or test time. This can be used by SciPy packagers (e.g., for Linux distros) which may have to adhere to rules that forbid downloading sources from external repositories at package build time.

    scipy.integrate improvements

    • Added parameter complex_func to scipy.integrate.quad, which can be set True to integrate a complex integrand.

    scipy.interpolate improvements

    • scipy.interpolate.interpn now supports tensor-product interpolation methods (slinear, cubic, quintic and pchip)
    • Tensor-product interpolation methods (slinear, cubic, quintic and pchip) in scipy.interpolate.interpn and scipy.interpolate.RegularGridInterpolator now allow values with trailing dimensions.
    • scipy.interpolate.RegularGridInterpolator has a new fast path for method="linear" with 2D data, and RegularGridInterpolator is now easier to subclass
    • scipy.interpolate.interp1d now can take a single value for non-spline methods.
    • A new extrapolate argument is available to scipy.interpolate.BSpline.design_matrix, allowing extrapolation based on the first and last intervals.
    • A new function scipy.interpolate.make_smoothing_spline has been added. It is an implementation of the generalized cross-validation spline smoothing algorithm. The lam=None (default) mode of this function is a clean-room reimplementation of the classic gcvspl.f Fortran algorithm for constructing GCV splines.
    • A new method="pchip" mode was added to scipy.interpolate.RegularGridInterpolator. This mode constructs an interpolator using tensor products of C1-continuous monotone splines (essentially, a scipy.interpolate.PchipInterpolator instance per dimension).

    scipy.sparse.linalg improvements

    • The spectral 2-norm is now available in scipy.sparse.linalg.norm.

    • The performance of scipy.sparse.linalg.norm for the default case (Frobenius norm) has been improved.

    • LAPACK wrappers were added for trexc and trsen.

    • The scipy.sparse.linalg.lobpcg algorithm was rewritten, yielding the following improvements:

      • a simple tunable restart potentially increases the attainable accuracy for edge cases,
      • internal postprocessing runs one final exact Rayleigh-Ritz method giving more accurate and orthonormal eigenvectors,
      • output the computed iterate with the smallest max norm of the residual and drop the history of subsequent iterations,
      • remove the check for LinearOperator format input and thus allow a simple function handle of a callable object as an input,
      • better handling of common user errors with input data, rather than letting the algorithm fail.

    scipy.linalg improvements

    • scipy.linalg.lu_factor now accepts rectangular arrays instead of being restricted to square arrays.

    scipy.ndimage improvements

    • The new scipy.ndimage.value_indices function provides a time-efficient method to search for the locations of individual values within an array of image data.
    • A new radius argument is supported by scipy.ndimage.gaussian_filter1d and scipy.ndimage.gaussian_filter for adjusting the kernel size of the filter.

    scipy.optimize improvements

    • scipy.optimize.brute now coerces non-iterable/single-value args into a tuple.
    • scipy.optimize.least_squares and scipy.optimize.curve_fit now accept scipy.optimize.Bounds for bounds constraints.
    • Added a tutorial for scipy.optimize.milp.
    • Improved the pretty-printing of scipy.optimize.OptimizeResult objects.
    • Additional options (parallel, threads, mip_rel_gap) can now be passed to scipy.optimize.linprog with method='highs'.

    scipy.signal improvements

    • The new window function scipy.signal.windows.lanczos was added to compute a Lanczos window, also known as a sinc window.

    scipy.sparse.csgraph improvements

    • The performance of scipy.sparse.csgraph.dijkstra has been improved, and star graphs in particular see a marked performance improvement

    scipy.special improvements

    • The new function scipy.special.powm1, a ufunc with signature powm1(x, y), computes x**y - 1. The function avoids the loss of precision that can result when y is close to 0 or when x is close to 1.
    • scipy.special.erfinv is now more accurate as it leverages the Boost equivalent under the hood.

    scipy.stats improvements

    • Added scipy.stats.goodness_of_fit, a generalized goodness-of-fit test for use with any univariate distribution, any combination of known and unknown parameters, and several choices of test statistic (Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling).

    • Improved scipy.stats.bootstrap: Default method 'BCa' now supports multi-sample statistics. Also, the bootstrap distribution is returned in the result object, and the result object can be passed into the function as parameter bootstrap_result to add additional resamples or change the confidence interval level and type.

    • Added maximum spacing estimation to scipy.stats.fit.

    • Added the Poisson means test ("E-test") as scipy.stats.poisson_means_test.

    • Added new sample statistics.

      • Added scipy.stats.contingency.odds_ratio to compute both the conditional and unconditional odds ratios and corresponding confidence intervals for 2x2 contingency tables.
      • Added scipy.stats.directional_stats to compute sample statistics of n-dimensional directional data.
      • Added scipy.stats.expectile, which generalizes the expected value in the same way as quantiles are a generalization of the median.
    • Added new statistical distributions.

      • Added scipy.stats.uniform_direction, a multivariate distribution to sample uniformly from the surface of a hypersphere.
      • Added scipy.stats.random_table, a multivariate distribution to sample uniformly from m x n contingency tables with provided marginals.
      • Added scipy.stats.truncpareto, the truncated Pareto distribution.
    • Improved the fit method of several distributions.

      • scipy.stats.skewnorm and scipy.stats.weibull_min now use an analytical solution when method='mm', which also serves as a starting guess to improve the performance of method='mle'.
      • scipy.stats.gumbel_r and scipy.stats.gumbel_l: analytical maximum likelihood estimates have been extended to the cases in which location or scale are fixed by the user.
      • Analytical maximum likelihood estimates have been added for scipy.stats.powerlaw.
    • Improved random variate sampling of several distributions.

      • Drawing multiple samples from scipy.stats.matrix_normal, scipy.stats.ortho_group, scipy.stats.special_ortho_group, and scipy.stats.unitary_group is faster.
      • The rvs method of scipy.stats.vonmises now wraps to the interval [-np.pi, np.pi].
      • Improved the reliability of scipy.stats.loggamma rvs method for small values of the shape parameter.
    • Improved the speed and/or accuracy of functions of several statistical distributions.

      • Added scipy.stats.Covariance for better speed, accuracy, and user control in multivariate normal calculations.
      • The cdf, sf, ppf, and isf methods of scipy.stats.skewnorm now use the implementations from Boost, improving speed while maintaining accuracy. The calculation of higher-order moments is also faster and more accurate.
      • The ppf and isf methods of scipy.stats.invgauss now use the implementations from Boost, improving speed and accuracy.
      • scipy.stats.invweibull methods sf and isf are more accurate for small probability masses.
      • scipy.stats.nct and scipy.stats.ncx2 now rely on the implementations from Boost, improving speed and accuracy.
      • Implemented the logpdf method of scipy.stats.vonmises for reliability in extreme tails.
      • Implemented the isf method of scipy.stats.levy for speed and accuracy.
      • Improved the robustness of scipy.stats.studentized_range for large df by adding an infinite degree-of-freedom approximation.
      • Added a parameter lower_limit to scipy.stats.multivariate_normal, allowing the user to change the integration limit from -inf to a desired value.
      • Improved the robustness of entropy of scipy.stats.vonmises for large concentration values.
    • Enhanced scipy.stats.gaussian_kde.

      • Added scipy.stats.gaussian_kde.marginal, which returns the desired marginal distribution of the original kernel density estimate distribution.
      • The cdf method of scipy.stats.gaussian_kde now accepts a lower_limit parameter for integrating the PDF over a rectangular region.
      • Moved calculations for scipy.stats.gaussian_kde.logpdf to Cython, improving speed.
      • The global interpreter lock is released by the pdf method of scipy.stats.gaussian_kde for improved multithreading performance.
      • Replaced explicit matrix inversion with Cholesky decomposition for speed and accuracy.
    • Enhanced the result objects returned by many scipy.stats functions

      • Added a confidence_interval method to the result object returned by scipy.stats.ttest_1samp and scipy.stats.ttest_rel.
      • The scipy.stats functions combine_pvalues, fisher_exact, chi2_contingency, median_test and mood now return bunch objects rather than plain tuples, allowing attributes to be accessed by name.
      • Attributes of the result objects returned by multiscale_graphcorr, anderson_ksamp, binomtest, crosstab, pointbiserialr, spearmanr, kendalltau, and weightedtau have been renamed to statistic and pvalue for consistency throughout scipy.stats. Old attribute names are still allowed for backward compatibility.
      • scipy.stats.anderson now returns the parameters of the fitted distribution in a scipy.stats._result_classes.FitResult object.
      • The plot method of scipy.stats._result_classes.FitResult now accepts a plot_type parameter; the options are 'hist' (histogram, default), 'qq' (Q-Q plot), 'pp' (P-P plot), and 'cdf' (empirical CDF plot).
      • Kolmogorov-Smirnov tests (e.g. scipy.stats.kstest) now return the location (argmax) at which the statistic is calculated and the variant of the statistic used.
    • Improved the performance of several scipy.stats functions.

      • Improved the performance of scipy.stats.cramervonmises_2samp and scipy.stats.ks_2samp with method='exact'.
      • Improved the performance of scipy.stats.siegelslopes.
      • Improved the performance of scipy.stats.mstats.hdquantile_sd.
      • Improved the performance of scipy.stats.binned_statistic_dd for several NumPy statistics, and binned statistics methods now support complex data.
    • Added the scramble optional argument to scipy.stats.qmc.LatinHypercube. It replaces centered, which is now deprecated.

    • Added a parameter optimization to all scipy.stats.qmc.QMCEngine subclasses to improve characteristics of the quasi-random variates.

    • Added tie correction to scipy.stats.mood.

    • Added tutorials for resampling methods in scipy.stats.

    • scipy.stats.bootstrap, scipy.stats.permutation_test, and scipy.stats.monte_carlo_test now automatically detect whether the provided statistic is vectorized, so passing the vectorized argument explicitly is no longer required to take advantage of vectorized statistics.

    • Improved the speed of scipy.stats.permutation_test for permutation types 'samples' and 'pairings'.

    • Added axis, nan_policy, and masked array support to scipy.stats.jarque_bera.

    • Added the nan_policy optional argument to scipy.stats.rankdata.

    Deprecated features

    • scipy.misc module and all the methods in misc are deprecated in v1.10 and will be completely removed in SciPy v2.0.0. Users are suggested to utilize the scipy.datasets module instead for the dataset methods.
    • scipy.stats.qmc.LatinHypercube parameter centered has been deprecated. It is replaced by the scramble argument for more consistency with other QMC engines.
    • scipy.interpolate.interp2d class has been deprecated. The docstring of the deprecated routine lists recommended replacements.

    Expired Deprecations

    • There is an ongoing effort to follow through on long-standing deprecations.

    • The following previously deprecated features are affected:

      • Removed cond & rcond kwargs in linalg.pinv
      • Removed wrappers scipy.linalg.blas.{clapack, flapack}
      • Removed scipy.stats.NumericalInverseHermite and removed tol & max_intervals kwargs from scipy.stats.sampling.NumericalInverseHermite
      • Removed local_search_options kwarg from scipy.optimize.dual_annealing.

    Other changes

    • scipy.stats.bootstrap, scipy.stats.permutation_test, and scipy.stats.monte_carlo_test now automatically detect whether the provided statistic is vectorized by looking for an axis parameter in the signature of statistic. If an axis parameter is present in statistic but should not be relied on for vectorized calls, users must pass option vectorized==False explicitly.
    • scipy.stats.multivariate_normal will now raise a ValueError when the covariance matrix is not positive semidefinite, regardless of which method is called.

    Authors

    • Name (commits)
    • h-vetinari (10)
    • Jelle Aalbers (1)
    • Oriol Abril-Pla (1) +
    • Alan-Hung (1) +
    • Tania Allard (7)
    • Oren Amsalem (1) +
    • Sven Baars (10)
    • Balthasar (1) +
    • Ross Barnowski (1)
    • Christoph Baumgarten (2)
    • Peter Bell (2)
    • Sebastian Berg (1)
    • Aaron Berk (1) +
    • boatwrong (1) +
    • boeleman (1) +
    • Jake Bowhay (50)
    • Matthew Brett (4)
    • Evgeni Burovski (93)
    • Matthias Bussonnier (6)
    • Dominic C (2)
    • Mingbo Cai (1) +
    • James Campbell (2) +
    • CJ Carey (4)
    • cesaregarza (1) +
    • charlie0389 (1) +
    • Hood Chatham (5)
    • Andrew Chin (1) +
    • Daniel Ching (1) +
    • Leo Chow (1) +
    • chris (3) +
    • John Clow (1) +
    • cm7S (1) +
    • cmgodwin (1) +
    • Christopher Cowden (2) +
    • Henry Cuzco (2) +
    • Anirudh Dagar (12)
    • Hans Dembinski (2) +
    • Jaiden di Lanzo (24) +
    • Felipe Dias (1) +
    • Dieter Werthmüller (1)
    • Giuseppe Dilillo (1) +
    • dpoerio (1) +
    • drpeteb (1) +
    • Christopher Dupuis (1) +
    • Jordan Edmunds (1) +
    • Pieter Eendebak (1) +
    • Jérome Eertmans (1) +
    • Fabian Egli (2) +
    • Sebastian Ehlert (2) +
    • Kian Eliasi (1) +
    • Tomohiro Endo (1) +
    • Stefan Endres (1)
    • Zeb Engberg (4) +
    • Jonas Eschle (1) +
    • Thomas J. Fan (9)
    • fiveseven (1) +
    • Neil Flood (1) +
    • Franz Forstmayr (1)
    • Sara Fridovich-Keil (1)
    • David Gilbertson (1) +
    • Ralf Gommers (251)
    • Marco Gorelli (2) +
    • Matt Haberland (387)
    • Andrew Hawryluk (2) +
    • Christoph Hohnerlein (2) +
    • Loïc Houpert (2) +
    • Shamus Husheer (1) +
    • ideasrule (1) +
    • imoiwm (1) +
    • Lakshaya Inani (1) +
    • Joseph T. Iosue (1)
    • iwbc-mzk (1) +
    • Nathan Jacobi (3) +
    • Julien Jerphanion (5)
    • He Jia (1)
    • jmkuebler (1) +
    • Johannes Müller (1) +
    • Vedant Jolly (1) +
    • Juan Luis Cano Rodríguez (2)
    • Justin (1) +
    • jvavrek (1) +
    • jyuv (2)
    • Kai Mühlbauer (1) +
    • Nikita Karetnikov (3) +
    • Reinert Huseby Karlsen (1) +
    • kaspar (2) +
    • Toshiki Kataoka (1)
    • Robert Kern (3)
    • Joshua Klein (1) +
    • Andrew Knyazev (7)
    • Jozsef Kutas (16) +
    • Eric Larson (4)
    • Lechnio (1) +
    • Antony Lee (2)
    • Aditya Limaye (1) +
    • Xingyu Liu (2)
    • Christian Lorentzen (4)
    • Loïc Estève (2)
    • Thibaut Lunet (2) +
    • Peter Lysakovski (1)
    • marianasalamoni (2) +
    • mariprudencio (1) +
    • Paige Martin (1) +
    • Arno Marty (1) +
    • matthewborish (3) +
    • Damon McDougall (1)
    • Nicholas McKibben (22)
    • McLP (1) +
    • mdmahendri (1) +
    • Melissa Weber Mendonça (9)
    • Jarrod Millman (1)
    • Naoto Mizuno (2)
    • Shashaank N (1)
    • Pablo S Naharro (1) +
    • nboudrie (2) +
    • Andrew Nelson (52)
    • Nico Schlömer (1)
    • NiMlr (1) +
    • o-alexandre-felipe (1) +
    • Maureen Ononiwu (1) +
    • Dimitri Papadopoulos (2) +
    • partev (1) +
    • Tirth Patel (10)
    • Paulius Šarka (1) +
    • Josef Perktold (1)
    • Giacomo Petrillo (3) +
    • Matti Picus (1)
    • Rafael Pinto (1) +
    • PKNaveen (1) +
    • Ilhan Polat (6)
    • Akshita Prasanth (2) +
    • Sean Quinn (1)
    • Tyler Reddy (155)
    • Martin Reinecke (1)
    • Ned Richards (1)
    • Marie Roald (1) +
    • Sam Rosen (4) +
    • Pamphile Roy (105)
    • sabonerune (2) +
    • Atsushi Sakai (94)
    • Daniel Schmitz (27)
    • Anna Scholtz (1) +
    • Eli Schwartz (11)
    • serge-sans-paille (2)
    • JEEVANSHI SHARMA (1) +
    • ehsan shirvanian (2) +
    • siddhantwahal (2)
    • Mathieu Dutour Sikiric (1) +
    • Sourav Singh (1)
    • Alexander Soare (1) +
    • Bjørge Solli (2) +
    • Scott Staniewicz (1)
    • Ethan Steinberg (3) +
    • Albert Steppi (3)
    • Thomas Stoeger (1) +
    • Kai Striega (4)
    • Tartopohm (1) +
    • Mamoru TASAKA (2) +
    • Ewout ter Hoeven (5)
    • TianyiQ (1) +
    • Tiger (1) +
    • Will Tirone (1)
    • Ajay Shanker Tripathi (1) +
    • Edgar Andrés Margffoy Tuay (1) +
    • Dmitry Ulyumdzhiev (1) +
    • Hari Vamsi (1) +
    • VitalyChait (1) +
    • Rik Voorhaar (1) +
    • Samuel Wallan (4)
    • Stefan van der Walt (2)
    • Warren Weckesser (145)
    • wei2222 (1) +
    • windows-server-2003 (3) +
    • Marek Wojciechowski (2) +
    • Niels Wouda (1) +
    • WRKampi (1) +
    • Yeonjoo Yoo (1) +
    • Rory Yorke (1)
    • Xiao Yuan (2) +
    • Meekail Zain (2) +
    • Fabio Zanini (1) +
    • Steffen Zeile (1) +
    • Egor Zemlyanoy (19)
    • Gavin Zhang (3) +

    A total of 184 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete.


    Configuration

    📅 Schedule: Branch creation - "after 9am and before 1pm every weekday" (UTC), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    opened by renovate[bot] 1
  • chore(deps): update dependency numpy to >=1.24.1,<1.25


    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | numpy (source) | >=1.16.0,<1.24 -> >=1.24.1,<1.25 |


    Release Notes

    numpy/numpy

    v1.24.1

    Compare Source

    NumPy 1.24.1 Release Notes

    NumPy 1.24.1 is a maintenance release that fixes bugs and regressions discovered after the 1.24.0 release. The Python versions supported by this release are 3.8-3.11.

    Contributors

    A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

    • Andrew Nelson
    • Ben Greiner +
    • Charles Harris
    • Clément Robert
    • Matteo Raso
    • Matti Picus
    • Melissa Weber Mendonça
    • Miles Cranmer
    • Ralf Gommers
    • Rohit Goswami
    • Sayed Adel
    • Sebastian Berg

    Pull requests merged

    A total of 18 pull requests were merged for this release.

    • #​22820: BLD: add workaround in setup.py for newer setuptools
    • #​22830: BLD: CIRRUS_TAG redux
    • #​22831: DOC: fix a couple typos in 1.23 notes
    • #​22832: BUG: Fix refcounting errors found using pytest-leaks
    • #​22834: BUG, SIMD: Fix invalid value encountered in several ufuncs
    • #​22837: TST: ignore more np.distutils.log imports
    • #​22839: BUG: Do not use getdata() in np.ma.masked_invalid
    • #​22847: BUG: Ensure correct behavior for rows ending in delimiter in...
    • #​22848: BUG, SIMD: Fix the bitmask of the boolean comparison
    • #​22857: BLD: Help raspian arm + clang 13 about __builtin_mul_overflow
    • #​22858: API: Ensure a full mask is returned for masked_invalid
    • #​22866: BUG: Polynomials now copy properly (#​22669)
    • #​22867: BUG, SIMD: Fix memory overlap in ufunc comparison loops
    • #​22868: BUG: Fortify string casts against floating point warnings
    • #​22875: TST: Ignore nan-warnings in randomized out tests
    • #​22883: MAINT: restore npymath implementations needed for freebsd
    • #​22884: BUG: Fix integer overflow in in1d for mixed integer dtypes #​22877
    • #​22887: BUG: Use whole file for encoding checks with charset_normalizer.

    Checksums

    MD5
    9e543db90493d6a00939bd54c2012085  numpy-1.24.1-cp310-cp310-macosx_10_9_x86_64.whl
    4ebd7af622bf617b4876087e500d7586  numpy-1.24.1-cp310-cp310-macosx_11_0_arm64.whl
    0c0a3012b438bb455a6c2fadfb1be76a  numpy-1.24.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0bddb527345449df624d3cb9aa0e1b75  numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b246beb773689d97307f7b4c2970f061  numpy-1.24.1-cp310-cp310-win32.whl
    1f3823999fce821a28dee10ac6fdd721  numpy-1.24.1-cp310-cp310-win_amd64.whl
    8eedcacd6b096a568e4cb393d43b3ae5  numpy-1.24.1-cp311-cp311-macosx_10_9_x86_64.whl
    50bddb05acd54b4396100a70522496dd  numpy-1.24.1-cp311-cp311-macosx_11_0_arm64.whl
    2a76bd9da8a78b44eb816bd70fa3aee3  numpy-1.24.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    9e86658a414272f9749bde39344f9b76  numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    915dfb89054e1631574a22a9b53a2b25  numpy-1.24.1-cp311-cp311-win32.whl
    ab7caa2c6c20e1fab977e1a94dede976  numpy-1.24.1-cp311-cp311-win_amd64.whl
    8246de961f813f5aad89bca3d12f81e7  numpy-1.24.1-cp38-cp38-macosx_10_9_x86_64.whl
    58366b1a559baa0547ce976e416ed76d  numpy-1.24.1-cp38-cp38-macosx_11_0_arm64.whl
    a96f29bf106a64f82b9ba412635727d1  numpy-1.24.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    4c32a43bdb85121614ab3e99929e33c7  numpy-1.24.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    09b20949ed21683ad7c9cbdf9ebb2439  numpy-1.24.1-cp38-cp38-win32.whl
    9e9f1577f874286a8bdff8dc5551eb9f  numpy-1.24.1-cp38-cp38-win_amd64.whl
    4383c1137f0287df67c364fbdba2bc72  numpy-1.24.1-cp39-cp39-macosx_10_9_x86_64.whl
    987f22c49b2be084b5d72f88f347d31e  numpy-1.24.1-cp39-cp39-macosx_11_0_arm64.whl
    848ad020bba075ed8f19072c64dcd153  numpy-1.24.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    864b159e644848bc25f881907dbcf062  numpy-1.24.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    db339ec0b2693cac2d7cf9ca75c334b1  numpy-1.24.1-cp39-cp39-win32.whl
    fec91d4c85066ad8a93816d71b627701  numpy-1.24.1-cp39-cp39-win_amd64.whl
    619af9cd4f33b668822ae2350f446a15  numpy-1.24.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    46f19b4b147f8836c2bd34262fabfffa  numpy-1.24.1-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    e85b245c57a10891b3025579bf0cf298  numpy-1.24.1-pp38-pypy38_pp73-win_amd64.whl
    dd3aaeeada8e95cc2edf9a3a4aa8b5af  numpy-1.24.1.tar.gz
    
    SHA256
    179a7ef0889ab769cc03573b6217f54c8bd8e16cef80aad369e1e8185f994cd7  numpy-1.24.1-cp310-cp310-macosx_10_9_x86_64.whl
    b09804ff570b907da323b3d762e74432fb07955701b17b08ff1b5ebaa8cfe6a9  numpy-1.24.1-cp310-cp310-macosx_11_0_arm64.whl
    f1b739841821968798947d3afcefd386fa56da0caf97722a5de53e07c4ccedc7  numpy-1.24.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0e3463e6ac25313462e04aea3fb8a0a30fb906d5d300f58b3bc2c23da6a15398  numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b31da69ed0c18be8b77bfce48d234e55d040793cebb25398e2a7d84199fbc7e2  numpy-1.24.1-cp310-cp310-win32.whl
    b07b40f5fb4fa034120a5796288f24c1fe0e0580bbfff99897ba6267af42def2  numpy-1.24.1-cp310-cp310-win_amd64.whl
    7094891dcf79ccc6bc2a1f30428fa5edb1e6fb955411ffff3401fb4ea93780a8  numpy-1.24.1-cp311-cp311-macosx_10_9_x86_64.whl
    28e418681372520c992805bb723e29d69d6b7aa411065f48216d8329d02ba032  numpy-1.24.1-cp311-cp311-macosx_11_0_arm64.whl
    e274f0f6c7efd0d577744f52032fdd24344f11c5ae668fe8d01aac0422611df1  numpy-1.24.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0044f7d944ee882400890f9ae955220d29b33d809a038923d88e4e01d652acd9  numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    442feb5e5bada8408e8fcd43f3360b78683ff12a4444670a7d9e9824c1817d36  numpy-1.24.1-cp311-cp311-win32.whl
    de92efa737875329b052982e37bd4371d52cabf469f83e7b8be9bb7752d67e51  numpy-1.24.1-cp311-cp311-win_amd64.whl
    b162ac10ca38850510caf8ea33f89edcb7b0bb0dfa5592d59909419986b72407  numpy-1.24.1-cp38-cp38-macosx_10_9_x86_64.whl
    26089487086f2648944f17adaa1a97ca6aee57f513ba5f1c0b7ebdabbe2b9954  numpy-1.24.1-cp38-cp38-macosx_11_0_arm64.whl
    caf65a396c0d1f9809596be2e444e3bd4190d86d5c1ce21f5fc4be60a3bc5b36  numpy-1.24.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    b0677a52f5d896e84414761531947c7a330d1adc07c3a4372262f25d84af7bf7  numpy-1.24.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    dae46bed2cb79a58d6496ff6d8da1e3b95ba09afeca2e277628171ca99b99db1  numpy-1.24.1-cp38-cp38-win32.whl
    6ec0c021cd9fe732e5bab6401adea5a409214ca5592cd92a114f7067febcba0c  numpy-1.24.1-cp38-cp38-win_amd64.whl
    28bc9750ae1f75264ee0f10561709b1462d450a4808cd97c013046073ae64ab6  numpy-1.24.1-cp39-cp39-macosx_10_9_x86_64.whl
    84e789a085aabef2f36c0515f45e459f02f570c4b4c4c108ac1179c34d475ed7  numpy-1.24.1-cp39-cp39-macosx_11_0_arm64.whl
    8e669fbdcdd1e945691079c2cae335f3e3a56554e06bbd45d7609a6cf568c700  numpy-1.24.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    ef85cf1f693c88c1fd229ccd1055570cb41cdf4875873b7728b6301f12cd05bf  numpy-1.24.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    87a118968fba001b248aac90e502c0b13606721b1343cdaddbc6e552e8dfb56f  numpy-1.24.1-cp39-cp39-win32.whl
    ddc7ab52b322eb1e40521eb422c4e0a20716c271a306860979d450decbb51b8e  numpy-1.24.1-cp39-cp39-win_amd64.whl
    ed5fb71d79e771ec930566fae9c02626b939e37271ec285e9efaf1b5d4370e7d  numpy-1.24.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    ad2925567f43643f51255220424c23d204024ed428afc5aad0f86f3ffc080086  numpy-1.24.1-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    cfa1161c6ac8f92dea03d625c2d0c05e084668f4a06568b77a25a89111621566  numpy-1.24.1-pp38-pypy38_pp73-win_amd64.whl
    2386da9a471cc00a1f47845e27d916d5ec5346ae9696e01a8a34760858fe9dd2  numpy-1.24.1.tar.gz
    

    v1.24.0

    Compare Source

    NumPy 1.24 Release Notes

    The NumPy 1.24.0 release continues the ongoing work to improve the handling and promotion of dtypes, increase the execution speed, and clarify the documentation. There are also a large number of new and expired deprecations due to changes in promotion and cleanups. This might be called a deprecation release. Highlights are:

    • Many new deprecations, check them out.
    • Many expired deprecations.
    • New F2PY features and fixes.
    • New "dtype" and "casting" keywords for stacking functions.

    See below for the details.

    This release supports Python versions 3.8-3.11.

    Deprecations

    Deprecate fastCopyAndTranspose and PyArray_CopyAndTranspose

    The numpy.fastCopyAndTranspose function has been deprecated. Use the corresponding copy and transpose methods directly:

    arr.T.copy()
    

    The underlying C function PyArray_CopyAndTranspose has also been deprecated from the NumPy C-API.

    (gh-22313)

    Conversion of out-of-bound Python integers

    Attempting a conversion from a Python integer to a NumPy value will now always check whether the result can be represented by NumPy. This means the following examples will fail in the future and give a DeprecationWarning now:

    np.uint8(-1)
    np.array([3000], dtype=np.int8)
    

    Many of these did succeed before. Such code was mainly useful for unsigned integers with negative values such as np.uint8(-1) giving np.iinfo(np.uint8).max.

    Note that conversion between NumPy integers is unaffected, so that np.array(-1).astype(np.uint8) continues to work and use C integer overflow logic. For negative values, it will also work to view the array: np.array(-1, dtype=np.int8).view(np.uint8). In some cases, using np.iinfo(np.uint8).max or val % 2**8 may also work well.

    In rare cases, input data may mix both negative values and very large unsigned values (e.g. -1 and 2**63). In those cases it is unfortunately necessary to use % on the Python value, or to use signed or unsigned conversion depending on whether negative values are expected.
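
    As a sketch, the workarounds described above look like this (the values are just examples):

    import numpy as np

    np.array(-1).astype(np.uint8)               # NumPy-to-NumPy cast, C overflow logic -> 255
    np.array(-1, dtype=np.int8).view(np.uint8)  # reinterpret the same bytes -> 255
    np.uint8(-1 % 2**8)                         # wrap explicitly on the Python side -> 255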

    (gh-22385)

    Deprecate msort

    The numpy.msort function is deprecated. Use np.sort(a, axis=0) instead.
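
    For example:

    import numpy as np

    a = np.array([[3, 1], [2, 4]])
    np.sort(a, axis=0)   # drop-in replacement for the deprecated np.msort(a)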

    (gh-22456)

    np.str0 and similar are now deprecated

    The scalar type aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually be removed.
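
    The canonical scalar types remain available; a sketch of the one-to-one replacements:

    import numpy as np

    np.object_  # instead of np.object0
    np.str_     # instead of np.str0
    np.bytes_   # instead of np.bytes0
    np.void     # instead of np.void0
    np.intp     # instead of np.int0
    np.uintp    # instead of np.uint0
    np.bool_    # instead of np.bool8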

    (gh-22607)

    Expired deprecations

    • The normed keyword argument has been removed from np.histogram, np.histogram2d, and np.histogramdd. Use density instead. If normed was passed by position, density is now used.

      (gh-21645)

    • Ragged array creation will now always raise a ValueError unless dtype=object is passed. This includes very deeply nested sequences.

      (gh-22004)

    • Support for Visual Studio 2015 and earlier has been removed.

    • Support for the Windows Interix POSIX interop layer has been removed.

      (gh-22139)

    • Support for Cygwin < 3.3 has been removed.

      (gh-22159)

    • The mini() method of np.ma.MaskedArray has been removed. Use either np.ma.MaskedArray.min() or np.ma.minimum.reduce().

    • The single-argument form of np.ma.minimum and np.ma.maximum has been removed. Use np.ma.minimum.reduce() or np.ma.maximum.reduce() instead.

      (gh-22228)

    • Passing dtype instances other than the canonical (mainly native byte-order) ones to dtype= or signature= in ufuncs will now raise a TypeError. We recommend passing the strings "int8" or scalar types np.int8 since the byte-order, datetime/timedelta unit, etc. are never enforced. (Initially deprecated in NumPy 1.21.)

      (gh-22540)

    • The dtype= argument to comparison ufuncs is now applied correctly. That means that only bool and object are valid values and dtype=object is enforced.

      (gh-22541)

    • The deprecation of the aliases np.object, np.bool, np.float, np.complex, np.str, and np.int has expired (introduced in NumPy 1.20). Some of these will now give a FutureWarning in addition to raising an error, since they will be mapped to the NumPy scalars in the future.

      (gh-22607)

    Compatibility notes

    array.fill(scalar) may behave slightly differently

    numpy.ndarray.fill may in some cases behave slightly differently now, because the logic is aligned with item assignment:

    arr = np.array([1])  # with any dtype/value
    arr.fill(scalar)
    

    is now identical to:

    arr[0] = scalar
    

    Previously casting may have produced slightly different answers when using values that could not be represented in the target dtype or when the target had object dtype.

    (gh-20924)

    Subarray to object cast now copies

    Casting a dtype that includes a subarray to an object will now ensure a copy of the subarray. Previously an unsafe view was returned:

    arr = np.ones(3, dtype=[("f", "i", 3)])
    subarray_fields = arr.astype(object)[0]
    subarray = subarray_fields[0]  # "f" field
    
    np.may_share_memory(subarray, arr)
    

    This is now always False, while previously it was True for this specific cast.

    (gh-21925)

    Returned arrays respect uniqueness of dtype kwarg objects

    When the dtype keyword argument is used with np.array() or asarray(), the dtype of the returned array now always exactly matches the dtype provided by the caller.

    In some cases this change means that a view rather than the input array is returned. The following is an example of this on 64-bit Linux, where long and longlong have the same precision but are different dtypes:

    >>> arr = np.array([1, 2, 3], dtype="long")
    >>> new_dtype = np.dtype("longlong")
    >>> new = np.asarray(arr, dtype=new_dtype)
    >>> new.dtype is new_dtype
    True
    >>> new is arr
    False
    

    Before the change, the dtype did not match because new is arr was True.

    (gh-21995)

    DLPack export raises BufferError

    When an array buffer cannot be exported via DLPack a BufferError is now always raised where previously TypeError or RuntimeError was raised. This allows falling back to the buffer protocol or __array_interface__ when DLPack was tried first.
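
    A minimal sketch of the fallback pattern this enables (the array here is an ordinary NumPy array, so the DLPack path succeeds; the except branch is what a consumer would do for non-exportable buffers):

    import numpy as np

    arr = np.arange(4)
    try:
        capsule = arr.__dlpack__()   # try the DLPack protocol first
    except BufferError:
        view = memoryview(arr)       # fall back to the buffer protocol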

    (gh-22542)

    NumPy builds are no longer tested on GCC-6

    Ubuntu 18.04 is deprecated for GitHub Actions and GCC-6 is not available on Ubuntu 20.04, so builds using that compiler are no longer tested. We still test builds using GCC-7 and GCC-8.

    (gh-22598)

    New Features

    New attribute symbol added to polynomial classes

    The polynomial classes in the numpy.polynomial package have a new symbol attribute which is used to represent the indeterminate of the polynomial. This can be used to change the value of the variable when printing:

    >>> P_y = np.polynomial.Polynomial([1, 0, -1], symbol="y")
    >>> print(P_y)
    1.0 + 0.0·y¹ - 1.0·y²
    

    Note that the polynomial classes only support 1D polynomials, so operations that involve polynomials with different symbols are disallowed when the result would be multivariate:

    >>> P = np.polynomial.Polynomial([1, -1])  # default symbol is "x"
    >>> P_z = np.polynomial.Polynomial([1, 1], symbol="z")
    >>> P * P_z
    Traceback (most recent call last)
       ...
    ValueError: Polynomial symbols differ
    

    The symbol can be any valid Python identifier. The default is symbol="x", consistent with existing behavior.

    (gh-16154)

    F2PY support for Fortran character strings

    F2PY now supports wrapping Fortran functions with:

    • character (e.g. character x)
    • character array (e.g. character, dimension(n) :: x)
    • character string (e.g. character(len=10) x)
    • and character string array (e.g. character(len=10), dimension(n, m) :: x)

    arguments, including passing Python unicode strings as Fortran character string arguments.

    (gh-19388)

    New function np.show_runtime

    A new function numpy.show_runtime has been added to display the runtime information of the machine in addition to numpy.show_config which displays the build-related information.
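
    Usage is a single call (output depends on the machine):

    import numpy as np

    np.show_runtime()   # prints SIMD extensions and runtime library information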

    (gh-21468)

    strict option for testing.assert_array_equal

    The strict option is now available for testing.assert_array_equal. Setting strict=True will disable the broadcasting behaviour for scalars and ensure that input arrays have the same data type.
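
    A small sketch of the difference (array values are just an example):

    import numpy as np
    from numpy.testing import assert_array_equal

    a = np.array([2, 2], dtype=np.int64)

    assert_array_equal(a, 2)                   # passes: the scalar broadcasts
    try:
        assert_array_equal(a, 2, strict=True)  # raises: broadcasting is disabled
    except AssertionError:
        pass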

    (gh-21595)

    New parameter equal_nan added to np.unique

    np.unique was changed in 1.21 to treat all NaN values as equal and return a single NaN. Setting equal_nan=False will restore pre-1.21 behavior to treat NaNs as unique. Defaults to True.
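
    For example:

    import numpy as np

    a = np.array([1.0, np.nan, np.nan])
    np.unique(a)                   # array([ 1., nan]) - NaNs treated as equal
    np.unique(a, equal_nan=False)  # array([ 1., nan, nan]) - pre-1.21 behaviour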

    (gh-21623)

    casting and dtype keyword arguments for numpy.stack

    The casting and dtype keyword arguments are now available for numpy.stack. To use them, write np.stack(..., dtype=None, casting='same_kind').

    casting and dtype keyword arguments for numpy.vstack

    The casting and dtype keyword arguments are now available for numpy.vstack. To use them, write np.vstack(..., dtype=None, casting='same_kind').

    casting and dtype keyword arguments for numpy.hstack

    The casting and dtype keyword arguments are now available for numpy.hstack. To use them, write np.hstack(..., dtype=None, casting='same_kind').
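
    A combined sketch for the three stacking functions above (array values are just an example; int64 to float64 is a safe cast, so it passes the same_kind check):

    import numpy as np

    a = np.ones(3, dtype=np.int32)
    b = np.ones(3, dtype=np.int64)

    np.stack([a, b], dtype=np.float64, casting="same_kind").dtype   # float64
    np.vstack([a, b], dtype=np.float64).dtype                       # float64
    np.hstack([a, b], dtype=np.float64).dtype                       # float64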

    (gh-21627)

    The bit generator underlying the singleton RandomState can be changed

    The singleton RandomState instance exposed in the numpy.random module is initialized at startup with the MT19937 bit generator. The new function set_bit_generator allows the default bit generator to be replaced with a user-provided bit generator. This function has been introduced to provide a method allowing seamless integration of a high-quality, modern bit generator in new code with existing code that makes use of the singleton-provided random variate generating functions. The companion function get_bit_generator returns the current bit generator being used by the singleton RandomState. This is provided to simplify restoring the original source of randomness if required.

    The preferred method to generate reproducible random numbers is to use a modern bit generator in an instance of Generator. The function default_rng simplifies instantiation:

    >>> rg = np.random.default_rng(3728973198)
    >>> rg.random()
    

    The same bit generator can then be shared with the singleton instance so that calling functions in the random module will use the same bit generator:

    >>> orig_bit_gen = np.random.get_bit_generator()
    >>> np.random.set_bit_generator(rg.bit_generator)
    >>> np.random.normal()
    

    The swap is permanent (until reversed) and so any call to functions in the random module will use the new bit generator. The original can be restored if required for code to run correctly:

    >>> np.random.set_bit_generator(orig_bit_gen)
    

    (gh-21976)

    np.void now has a dtype argument

    NumPy now allows constructing structured void scalars directly by passing the dtype argument to np.void.
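
    For example, with a simple structured dtype:

    import numpy as np

    dt = np.dtype([("x", "i4"), ("y", "f8")])
    v = np.void((1, 2.0), dtype=dt)   # structured void scalar
    v["x"], v["y"]                    # (1, 2.0)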

    (gh-22316)

    Improvements

    F2PY Improvements
    • The generated extension modules don't use the deprecated NumPy-C API anymore
    • Improved f2py generated exception messages
    • Numerous bug and flake8 warning fixes
    • Various CPP macros that can be used within C-expressions of signature files are now prefixed with f2py_. For example, one should use f2py_len(x) instead of len(x)
    • A new construct character(f2py_len=...) is introduced to support returning assumed length character strings (e.g. character(len=*)) from wrapper functions

    A hook to support rewriting f2py internal data structures after reading all its input files is introduced. This is required, for instance, for backward compatibility of SciPy support, where character arguments are treated as character string arguments in C expressions.

    (gh-19388)

    IBM zSystems Vector Extension Facility (SIMD)

    Added support for SIMD extensions of zSystem (z13, z14, z15), through the universal intrinsics interface. This support leads to performance improvements for all SIMD kernels implemented using the universal intrinsics, including the following operations: rint, floor, trunc, ceil, sqrt, absolute, square, reciprocal, tanh, sin, cos, equal, not_equal, greater, greater_equal, less, less_equal, maximum, minimum, fmax, fmin, argmax, argmin, add, subtract, multiply, divide.

    (gh-20913)

    NumPy now gives floating point errors in casts

    In most cases, NumPy previously did not give floating point warnings or errors when these happened during casts. For example, casts like:

    np.array([2e300]).astype(np.float32)  # overflow for float32
    np.array([np.inf]).astype(np.int64)
    

    should now generally give floating point warnings. These warnings indicate that floating point overflow occurred. When converting floating point values to integers, users should expect invalid value warnings instead.

    Users can modify the behavior of these warnings using np.errstate.
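
    For instance, a sketch that promotes the overflow warning to an exception:

    import numpy as np

    with np.errstate(over="raise"):
        np.array([2e300]).astype(np.float32)   # raises FloatingPointError: overflow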

    Note that for float to int casts, the exact warnings that are given may be platform dependent. For example:

    arr = np.full(100, value=1000, dtype=np.float64)
    arr.astype(np.int8)
    

    may give a result equivalent to the following (the intermediate cast means no warning is given):

    arr.astype(np.int64).astype(np.int8)
    

    Alternatively, it may return an undefined result, with a warning set:

    RuntimeWarning: invalid value encountered in cast
    

    The precise behavior is subject to the C99 standard and its implementation in both software and hardware.

    (gh-21437)

    F2PY supports the value attribute

    The Fortran standard requires that variables declared with the value attribute must be passed by value instead of reference. F2PY now supports this use pattern correctly. So integer, intent(in), value :: x in Fortran codes will have correct wrappers generated.

    (gh-21807)

    Added pickle support for third-party BitGenerators

    The pickle format for bit generators was extended to allow each bit generator to supply its own constructor when pickling. Previous versions of NumPy only supported unpickling Generator instances created with one of the core set of bit generators supplied with NumPy. Attempting to unpickle a Generator that used a third-party bit generator would fail, since the constructor used during unpickling was only aware of the bit generators included in NumPy.

    (gh-22014)

    arange() now explicitly fails with dtype=str

    Previously, the np.arange(n, dtype=str) function worked for n=1 and n=2, but raised an exception with a non-specific message for other values of n. Now, it raises a TypeError informing that arange does not support string dtypes:

    >>> np.arange(2, dtype=str)
    Traceback (most recent call last)
       ...
    TypeError: arange() not supported for inputs with DType <class 'numpy.dtype[str_]'>.
    

    (gh-22055)

    numpy.typing protocols are now runtime checkable

    The protocols used in numpy.typing.ArrayLike and numpy.typing.DTypeLike are now properly marked as runtime checkable, making them easier to use for runtime type checkers.

    (gh-22357)

    Performance improvements and changes

    Faster version of np.isin and np.in1d for integer arrays

    np.in1d (used by np.isin) can now switch to a faster algorithm (up to >10x faster) when it is passed two integer arrays. This is often automatically used, but you can use kind="sort" or kind="table" to force the old or new method, respectively.
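
    For example:

    import numpy as np

    a = np.arange(10_000)
    b = np.array([3, 4_000, 9_999])

    np.isin(a, b, kind="table")   # force the new table-based method
    np.isin(a, b, kind="sort")    # force the previous sort-based method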

    (gh-12065)

    Faster comparison operators

    The comparison functions (numpy.equal, numpy.not_equal, numpy.less, numpy.less_equal, numpy.greater and numpy.greater_equal) are now much faster as they are now vectorized with universal intrinsics. For a CPU with SIMD extension AVX512BW, the performance gain is up to 2.57x, 1.65x and 19.15x for integer, float and boolean data types, respectively (with N=50000).

    (gh-21483)

    Changes

    Better reporting of integer division overflow

    Integer division overflow of scalars and arrays used to provide a RuntimeWarning, and the return value was undefined, leading to crashes on rare occasions:

    >>> np.array([np.iinfo(np.int32).min]*10, dtype=np.int32) // np.int32(-1)
    <stdin>:1: RuntimeWarning: divide by zero encountered in floor_divide
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
    

    Integer division overflow now returns the input dtype's minimum value and raises the following RuntimeWarning:

    >>> np.array([np.iinfo(np.int32).min]*10, dtype=np.int32) // np.int32(-1)
    <stdin>:1: RuntimeWarning: overflow encountered in floor_divide
    array([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648, -2147483648, -2147483648, -2147483648, -2147483648],
          dtype=int32)
    

    (gh-21506)

    masked_invalid now modifies the mask in-place

    When used with copy=False, numpy.ma.masked_invalid now modifies the input masked array in-place. This makes it behave identically to masked_where and better matches the documentation.
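
    A small sketch of the in-place behaviour (array values are just an example):

    import numpy as np

    a = np.ma.masked_array([1.0, np.nan, 3.0], mask=[False, False, False])
    np.ma.masked_invalid(a, copy=False)
    a.mask   # array([False,  True, False]) - the input's mask was updated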

    (gh-22046)

    nditer/NpyIter allows allocating all operands

    The NumPy iterator available through np.nditer in Python and as NpyIter in C now supports allocating all arrays. The iterator shape defaults to () in this case. The operand dtypes must be provided, since a "common dtype" cannot be inferred from the other inputs.
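
    A sketch of an iterator in which every operand is allocated (the dtypes here are arbitrary examples):

    import numpy as np

    it = np.nditer(
        [None, None],
        op_flags=[["writeonly", "allocate"], ["writeonly", "allocate"]],
        op_dtypes=[np.float64, np.int64],
    )
    it.operands[0].shape   # () - the default iterator shape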

    (gh-22457)

    Checksums

    MD5
    d60311246bd71b177258ce06e2a4ec57  numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl
    02022b335938af55cb83bbaebdbff8e1  numpy-1.24.0-cp310-cp310-macosx_11_0_arm64.whl
    02b35d6612369fcc614c6223aaec0119  numpy-1.24.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    7b8ad389a9619db3e1f8243fc0cfe63d  numpy-1.24.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    6ff4acbb7b1258ccbd528c151eb0fe84  numpy-1.24.0-cp310-cp310-win32.whl
    d194c96601222db97b0af54fce1cfb1d  numpy-1.24.0-cp310-cp310-win_amd64.whl
    5fe4eb551a9312e37492da9f5bfb8545  numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
    a8e836a768f73e9f509b11c3873c7e09  numpy-1.24.0-cp311-cp311-macosx_11_0_arm64.whl
    10404d6d1a5a9624f85018f61110b2be  numpy-1.24.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    cfdb0cb844f1db9be2cde998be54d65f  numpy-1.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    73bc66ad3ae8656ba18d64db98feb5e1  numpy-1.24.0-cp311-cp311-win32.whl
    4bbc30a53009c48d364d4dc2c612af95  numpy-1.24.0-cp311-cp311-win_amd64.whl
    94ce5f6a09605a9675a0d464b1ec6597  numpy-1.24.0-cp38-cp38-macosx_10_9_x86_64.whl
    e5e42b69a209eda7e6895dda39ea8610  numpy-1.24.0-cp38-cp38-macosx_11_0_arm64.whl
    36eb6143d1e2aac3c618275edf636983  numpy-1.24.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    712c3718e8b53ff04c626cc4c78492aa  numpy-1.24.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    0a1a48a8e458bd4ce581169484c17e4f  numpy-1.24.0-cp38-cp38-win32.whl
    c8ab7e4b919548663568a5b5a8b5eab4  numpy-1.24.0-cp38-cp38-win_amd64.whl
    1783a5d769566111d93c474c79892c01  numpy-1.24.0-cp39-cp39-macosx_10_9_x86_64.whl
    c9e77130674372c73f8209d58396624d  numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl
    14c0f2f52f20f13a81bba7df27f30145  numpy-1.24.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    c106393b46fa0302dbac49b14a4dfed4  numpy-1.24.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    c83e6d6946f32820f166c3f1ff010ab6  numpy-1.24.0-cp39-cp39-win32.whl
    acd5a4737d1094d5f40afa584dbd6d79  numpy-1.24.0-cp39-cp39-win_amd64.whl
    26e32f942c9fd62f64fd9bf6df95b5b1  numpy-1.24.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    4f027df0cc313ca626b106849999de13  numpy-1.24.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    ac58db9a90d0bec95bc7850b9e462f34  numpy-1.24.0-pp38-pypy38_pp73-win_amd64.whl
    1ca41c84ad9a116402a025d21e35bc64  numpy-1.24.0.tar.gz
    
    SHA256
    6e73a1f4f5b74a42abb55bc2b3d869f1b38cbc8776da5f8b66bf110284f7a437  numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl
    9387c7d6d50e8f8c31e7bfc034241e9c6f4b3eb5db8d118d6487047b922f82af  numpy-1.24.0-cp310-cp310-macosx_11_0_arm64.whl
    7ad6a024a32ee61d18f5b402cd02e9c0e22c0fb9dc23751991b3a16d209d972e  numpy-1.24.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    73cf2c5b5a07450f20a0c8e04d9955491970177dce8df8d6903bf253e53268e0  numpy-1.24.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    cec79ff3984b2d1d103183fc4a3361f5b55bbb66cb395cbf5a920a4bb1fd588d  numpy-1.24.0-cp310-cp310-win32.whl
    4f5e78b8b710cd7cd1a8145994cfffc6ddd5911669a437777d8cedfce6c83a98  numpy-1.24.0-cp310-cp310-win_amd64.whl
    4445f472b246cad6514cc09fbb5ecb7aab09ca2acc3c16f29f8dca6c468af501  numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
    ec3e5e8172a0a6a4f3c2e7423d4a8434c41349141b04744b11a90e017a95bad5  numpy-1.24.0-cp311-cp311-macosx_11_0_arm64.whl
    f9168790149f917ad8e3cf5047b353fefef753bd50b07c547da0bdf30bc15d91  numpy-1.24.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    ada6c1e9608ceadaf7020e1deea508b73ace85560a16f51bef26aecb93626a72  numpy-1.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    f3c4a9a9f92734a4728ddbd331e0124eabbc968a0359a506e8e74a9b0d2d419b  numpy-1.24.0-cp311-cp311-win32.whl
    90075ef2c6ac6397d0035bcd8b298b26e481a7035f7a3f382c047eb9c3414db0  numpy-1.24.0-cp311-cp311-win_amd64.whl
    0885d9a7666cafe5f9876c57bfee34226e2b2847bfb94c9505e18d81011e5401  numpy-1.24.0-cp38-cp38-macosx_10_9_x86_64.whl
    e63d2157f9fc98cc178870db83b0e0c85acdadd598b134b00ebec9e0db57a01f  numpy-1.24.0-cp38-cp38-macosx_11_0_arm64.whl
    cf8960f72997e56781eb1c2ea256a70124f92a543b384f89e5fb3503a308b1d3  numpy-1.24.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    2f8e0df2ecc1928ef7256f18e309c9d6229b08b5be859163f5caa59c93d53646  numpy-1.24.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fe44e925c68fb5e8db1334bf30ac1a1b6b963b932a19cf41d2e899cf02f36aab  numpy-1.24.0-cp38-cp38-win32.whl
    d7f223554aba7280e6057727333ed357b71b7da7422d02ff5e91b857888c25d1  numpy-1.24.0-cp38-cp38-win_amd64.whl
    ab11f6a7602cf8ea4c093e091938207de3068c5693a0520168ecf4395750f7ea  numpy-1.24.0-cp39-cp39-macosx_10_9_x86_64.whl
    12bba5561d8118981f2f1ff069ecae200c05d7b6c78a5cdac0911f74bc71cbd1  numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl
    9af91f794d2d3007d91d749ebc955302889261db514eb24caef30e03e8ec1e41  numpy-1.24.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    8b1ddfac6a82d4f3c8e99436c90b9c2c68c0bb14658d1684cdd00f05fab241f5  numpy-1.24.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    ac4fe68f1a5a18136acebd4eff91aab8bed00d1ef2fdb34b5d9192297ffbbdfc  numpy-1.24.0-cp39-cp39-win32.whl
    667b5b1f6a352419e340f6475ef9930348ae5cb7fca15f2cc3afcb530823715e  numpy-1.24.0-cp39-cp39-win_amd64.whl
    4d01f7832fa319a36fd75ba10ea4027c9338ede875792f7bf617f4b45056fc3a  numpy-1.24.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    dbb0490f0a880700a6cc4d000384baf19c1f4df59fff158d9482d4dbbca2b239  numpy-1.24.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    0104d8adaa3a4cc60c2777cab5196593bf8a7f416eda133be1f3803dd0838886  numpy-1.24.0-pp38-pypy38_pp73-win_amd64.whl
    c4ab7c9711fe6b235e86487ca74c1b092a6dd59a3cb45b63241ea0a148501853  numpy-1.24.0.tar.gz
    

    Configuration

    📅 Schedule: Branch creation - "after 9am and before 1pm every weekday" (UTC), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    opened by renovate[bot] 1
  • fix: Issue#915, Error for large integers in Series

    fix: Issue#915, Error for large integers in Series

    This patch fixes issue #915: an error for series with large integers.

    The issue arises when the numpy.histogram function is used with large integers, which cause unevenly spaced bin edges, here:

    https://github.com/ydataai/pandas-profiling/blob/5b1abac48ed9ed5a9e7e662be30c913acc3e7a5b/src/pandas_profiling/model/summary_algorithms.py#L39

    and here:

    https://github.com/ydataai/pandas-profiling/blob/5b1abac48ed9ed5a9e7e662be30c913acc3e7a5b/src/pandas_profiling/model/summary_algorithms.py#L52

    This can cause the resulting histogram to be distorted or misleading, as the bin sizes may not be uniform.

    To resolve this issue, I used the numpy.histogram_bin_edges function to compute the bin edges for the data before passing them to the numpy.histogram function. This function allows specifying the number of bins and the range of the data, and computes the bin edges so that they are evenly spaced. With this fix, the error from the bug report is no longer raised and the report is generated successfully. I have also included a test file, test_issue915.py, for testing the generation of the report.
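
    A minimal sketch of the described approach (the values below are hypothetical large integers of the kind that trigger the issue):

    import numpy as np

    values = np.array([0, 2**63 - 2, 2**63 - 1], dtype=np.uint64)

    # compute evenly spaced edges first, then histogram against those edges
    edges = np.histogram_bin_edges(values, bins=10)
    counts, _ = np.histogram(values, bins=edges)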

    opened by Sohaib90 0
  • Feature Request - Override templates of the html flavour

    Feature Request - Override templates of the html flavour

    Missing functionality

    Override templates of the html flavour.

    Proposed feature

    Allow overriding (some) templates in src/pandas_profiling/report/presentation/flavours/html/templates/ to personalize pdp.

    Alternatives considered

    I monkey-patched pdp to support overriding templates, e.g. to change the layout a bit (jinja2 supports this), but this isn't a clean way to do it.

    from pandas_profiling.report.presentation.flavours.html import templates
    from pandas_profiling.report.formatters import fmt, fmt_badge, fmt_numeric, fmt_percent
    import jinja2
    from jinja2 import ChoiceLoader, FileSystemLoader

    some_path = "my_templates/"  # directory containing the overriding templates

    # look up templates in `some_path` first, fall back to the packaged ones
    templates.package_loader = ChoiceLoader([
        FileSystemLoader(some_path),
        templates.package_loader,
    ])

    # rebuild the environment with the new loader and re-register the filters
    templates.jinja2_env = jinja2.Environment(
        lstrip_blocks=True, trim_blocks=True, loader=templates.package_loader
    )
    templates.jinja2_env.filters["is_list"] = lambda x: isinstance(x, list)
    templates.jinja2_env.filters["fmt_badge"] = fmt_badge
    templates.jinja2_env.filters["fmt_percent"] = fmt_percent
    templates.jinja2_env.filters["fmt_numeric"] = fmt_numeric
    templates.jinja2_env.filters["fmt"] = fmt
    

    Additional context

    No response

    feature request 💬 
    opened by prhbrt 0
  • Interaction plots for time series data

    Interaction plots for time series data

    Missing functionality

    Interaction plots for numeric time series variables.

    Proposed feature

    Calculate interaction plots for both numeric and numeric time series variables. Is there a setting to enable this?

    Alternatives considered

    I considered setting tsmode=False, but then I lose the autocorrelation plots.

    needs-triage 
    opened by MauritsDescamps 0
  • Bug Report: KeyError: 'max_length' when comparing two profile_report (`minimal=True` is used to generate these reports)

    Bug Report: KeyError: 'max_length' when comparing two profile_report (`minimal=True` is used to generate these reports)

    Current Behaviour

    There is an error message:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /var/folders/60/6qphmx_d7x7_11vpj8524vf40000gn/T/ipykernel_17862/709405443.py in <module>
          7 
          8 # Save report to file
    ----> 9 comparison_report.to_file("comparison.html")
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
        307                 create_html_assets(self.config, output_file)
        308 
    --> 309             data = self.to_html()
        310 
        311             if output_file.suffix != ".html":
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_html(self)
        418 
        419         """
    --> 420         return self.html
        421 
        422     def to_json(self) -> str:
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in html(self)
        229     def html(self) -> str:
        230         if self._html is None:
    --> 231             self._html = self._render_html()
        232         return self._html
        233 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in _render_html(self)
        337         from pandas_profiling.report.presentation.flavours import HTMLReport
        338 
    --> 339         report = self.report
        340 
        341         with tqdm(
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in report(self)
        223     def report(self) -> Root:
        224         if self._report is None:
    --> 225             self._report = get_report_structure(self.config, self.description_set)
        226         return self._report
        227 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in get_report_structure(config, summary)
        376                     items=list(summary["variables"]),
        377                     item=Container(
    --> 378                         render_variables_section(config, summary),
        379                         sequence_type="accordion",
        380                         name="Variables",
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in render_variables_section(config, dataframe_summary)
        157             variable_type = summary["type"]
        158         render_map_type = render_map.get(variable_type, render_map["Unsupported"])
    --> 159         template_variables.update(render_map_type(config, template_variables))
        160 
        161         # Ignore these
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical(config, summary)
        405 
        406     if length:
    --> 407         length_table, length_histo = render_categorical_length(config, summary, varid)
        408         overview_items.append(length_table)
        409 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical_length(config, summary, varid)
         61             {
         62                 "name": "Max length",
    ---> 63                 "value": fmt_number(summary["max_length"]),
         64                 "alert": False,
         65             },
    
    KeyError: 'max_length'
    

    Expected Behaviour

    Run without error

    Data Description

    I'm running the code for dataset comparison. The original code in that link works well, but when I set minimal=True to create the reports and then compare them, an error occurs.

    Code that reproduces the bug

    import pandas as pd
    from pandas_profiling import ProfileReport
    
    train_df = pd.read_csv("train.csv")
    train_report = ProfileReport(train_df, title="Train", minimal=True)
    
    test_df = pd.read_csv("test.csv")
    test_report = ProfileReport(test_df, title="Test", minimal=True)
    
    comparison_report = train_report.compare(test_report)
    comparison_report.to_file("comparison.html")
    

    pandas-profiling version

    v3.5.0

    Dependencies

    pandas                       1.4.2
    pandas-profiling             3.5.0
    

    OS

    Mac

    Checklist

    • [X] There is not yet another bug report for this issue in the issue tracker
    • [X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
    • [X] The issue has not been resolved by the entries listed under Common Issues.
    bug 🐛 
    opened by xiaoye-hua 0
Releases(v3.6.2)
  • v3.6.2(Jan 2, 2023)

  • v3.6.1(Dec 23, 2022)

  • v3.6.0(Dec 21, 2022)

    3.6.0 (2022-12-21)

    Bug Fixes

    • add css to cope with large tables (7f42f87)
    • adjust categoricals layout (f0bb45a)
    • categorical data not being obscured in the common values plot (40236bc)
    • compare report ignoring config parameter (3d60556)
    • compare report warnings always showing the last alert type (6b3c13d)
    • comparison fails when duplicates are disabled (#1208) (6d19620)
    • do not raise exception for percentage formatter (3ea626d)
    • enforce recomputation of description sets (a9fd1c8)
    • error comparing only one precomputed profile (00646cd)
    • html: sensible cloud-platform notebook html rendering (b22ece2)
    • ignoring config of precomputed reports (6478c40)
    • only compute auto correlation when no config is specified (d5d4f58)
    • remove malfunctioning hook (e2593f5)
    • remove unused test (2170338)
    • return the proper type for widgets (4c0b358)
    • set compute default to false (c70e491)
    • solve mypy error (9c4266e)
    • solve mypy issue (e3e7788)
    • uses colors from the specified config (c0c556d)
    • utils: use 'urllib.request' instead of 'requests' (#1177) (e4d020b), closes #1168

    Features

    • add heatmap values as a table under correlations (fc5da9e)
    • allow to specify the configuration for the comparison report (ad725b0)
    • design improvements on the correlations section (e5cd8cf)
    • implement imbalanced warning (ce84c81)
    • update variables layout (#1207) (cf0e0a7)
    Source code(tar.gz)
    Source code(zip)
  • v3.5.0(Nov 22, 2022)

    3.5.0 (2022-11-22)

    Bug Fixes

    Features

    Source code(tar.gz)
    Source code(zip)
  • v3.4.0(Oct 20, 2022)

    3.4.0 (2022-10-20)

    Bug Fixes

    Features

    Source code(tar.gz)
    Source code(zip)
  • v3.3.0(Sep 7, 2022)

  • v3.2.0(May 2, 2022)

  • v3.1.0(Sep 27, 2021)

  • v3.0.0(May 11, 2021)

  • v2.13.0(May 8, 2021)

  • v2.12.0(May 5, 2021)

  • v2.11.0(Feb 20, 2021)

  • v2.10.1(Feb 7, 2021)

  • v2.10.0rc1(Jan 5, 2021)

  • v2.9.0(Sep 3, 2020)

  • v2.9.0rc1(Jul 12, 2020)

    This release candidate improves handling of sensitive data and furthermore reduces technical debt with various fixes. The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.

    A warm thank you to everyone who has contributed to this release: @gauravkumar37 @Jooong @smaranjitghose @XavierBanos Tam Nguyen @andycraig @mgorsk1 @mbh86 @MHUNCHO @GaelVaroquaux @AmauryLepicard @baluyotraf @pvojnisek @abegong

    Source code(tar.gz)
    Source code(zip)
  • v2.8.0(May 12, 2020)

    pandas-profiling now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions, and scanning for truncated images or those containing EXIF information. Moreover, the text analysis features have been reworked, providing more informative statistics.

    Read the changelog v2.8.0 for more details.

    Contributors: @loopyme @Bradley-Butcher @willemhendriks, @IscaAy, @frellnick, @dataverz @ieaves

    Source code(tar.gz)
    Source code(zip)
  • v2.7.1(May 11, 2020)

  • v2.7.0(May 7, 2020)

    Announcement and changelog are available in the documentation.

    We are grateful to @loopyme and @kyleYang for creating parts of the features in this release.

    Thanks to all contributors who made this release possible @1313e @dataprofessor @neomatrix369 @jiangfangfangxm @WesleyTheGeolien @NickYi1990 @ricgu8086.

    Source code(tar.gz)
    Source code(zip)
  • v2.6.0(Apr 13, 2020)

    Dependency policy

    The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (a high guarantee to work), but has the downside of requiring frequent maintenance and introducing compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and instead specify a minimum version.

    Pandas v1

    Early releases of pandas v1 demonstrated many regressions that broke functionality (as acknowledged by the authors here). At this point, pandas is more stable and we notice high demand for compatibility, so we are moving on to support pandas' latest versions. To ensure compatibility with both versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

    Python 3.6+ features

    Python 3.6 introduced ordered dicts and f-strings, which we now rely on. This means that from pandas-profiling 2.6 onwards, you need at least Python 3.6. Users who for some reason cannot update can use pandas-profiling 2.5.0, but unfortunately won't benefit from updates or maintenance.

    Extended continuous integration

    Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability. Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.0(Feb 14, 2020)

    • Progress bar added (#224)
    • Character analysis for Text/NLP (#278)
    • Themes: configuration and demo's (Orange, Dark)
    • Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.
    • Toggle descriptions at correlations.

    Deprecation:

    • This is the last version to support Python 3.5.

    Stability:

    • The order of columns changed when sort="None" (#377, fixed).
    • Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)
    • Improved mixed type detection (#351)
    • Refactor of report structures.
    • Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).
    • Distinct counts exclude NaNs.
    • Fixed alerts in notebooks.

    Other improvements:

    • Warnings are now sorted.
    • Links to Binder and Google Colab are added for notebooks (#349)
    • The overview section is tabbed.
    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Jan 8, 2020)

    The v2.4.0 release decouples the data structure of reports from the actual rendering. It's now much simpler to change the user interface, whether the user is in a Jupyter notebook, a webpage, a native application, or just wants a JSON view of the data.

    We are also proud to announce that we have been accepted into the GitHub Sponsors programme. You are cordially invited to support me through this programme if you want to see me continue working on this project; to boost community funding, GitHub will match your contribution!

    Other improvements:

    • extended configuration with better defaults, including minimal mode for big data (#258, #310)
    • more example datasets
    • rejection of highly correlated variables is generalized (#284, #299)
    • many structural and stability improvements (#254, #274, #239)

    Special thanks to @marco-cardoso @ajupton @lvwerra @gliptak @neomatrix369 for their contributions.

    Source code(tar.gz)
    Source code(zip)
  • v2.3.0(Jul 27, 2019)

    • (Experimental) Support for "path" type
    • Fix numeric precision (#225)
    • Force labels in missing values diagram for large number of columns (#222)
    • Add pull request template
    • Add Census Dataset from the UCI ML Repository

    Thanks @bensdm and @huaiweicheng for your valuable contributions to this version!

    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Jul 22, 2019)

    New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.

    • Added variable bin sizing via astropy's Bayesian Blocks (feature request [#216])
    • PyCharm integration, console attempts to detect file type.
    • Fixed bug [#215].
    • Updated the missingno package to 0.4.2, fixing the font size in the bar diagram.
    • Various optimizations

    Thanks to: @Utsav37 @mansenfranzen @jakevdp

    Source code(tar.gz)
    Source code(zip)
  • v2.1.2(Jul 11, 2019)

  • v2.1.1(Jul 11, 2019)

  • v2.1.0(Jul 6, 2019)

    The pandas-profiling release version 2.1.0 includes:

    • Correlations: correlation calculations are now more fault tolerant ([#51] and [#197]), correlation names in the report are clarified.
    • Jupyter Notebook: rendering a profiling report is done inside the srcdoc attribute (which fixes [#199]), a full-width option is added and the column layout is improved.
    • User experience: The table styling and sample section formatting is improved.
    • Warnings: detection added for categorical variables that are suspected to be of the datetime type.
    • Documentation and community:
      • The Contribution page helps users that want to contribute.
      • Typos fixed [#195], thank you @abhilashshakti
      • Added more examples.
    • Other bugfixes and improvements:
      • Add version information to console interface.
      • Fix: Remove one-time used logger [#202]
      • Fix: Dealing with string indices [#200]

    Contributors: @abhilashshakti @adamrossnelson @manycoding @InsciteAnalytics

    Source code(tar.gz)
    Source code(zip)
  • v2.0.3(Jun 23, 2019)

  • v2.0.2(Jun 22, 2019)

    Revised version structure, fixed recursion preventing installation of dependencies ([#184]).

    The setup.py file used to include utils from the package prior to installation, which caused errors when the dependencies were not yet present.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jun 21, 2019)
