Create HTML profiling reports from pandas DataFrame objects

Last update: Jan 1, 2023

Related tags

Data Visualization python data-science machine-learning statistics deep-learning jupyter pandas-dataframe exploratory-data-analysis jupyter-notebook eda pandas artificial-intelligence exploration data-analysis html-report data-exploration pandas-profiling data-quality data-profiling big-data-analytics

Overview

Pandas Profiling


Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

Type inference: detect the types of columns in a dataframe.
Essentials: type, unique values, missing values
Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
Most frequent values
Histogram
Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
Missing values: matrix, count, heatmap and dendrogram of missing values
Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information
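
To make the quantile and descriptive statistics listed above concrete, the following minimal sketch computes a few of them with Python's standard library on a small made-up sample. It only illustrates the definitions and is not how pandas-profiling computes them internally; the helper name `summarize` is hypothetical.

```python
import statistics


def summarize(values):
    """Compute a handful of the statistics listed for a numeric column."""
    # Quartiles using the "inclusive" method (sample treated as the full population).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,  # interquartile range
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation (undefined when mean is 0)
        # Median absolute deviation around the median.
        "mad": statistics.median(abs(v - median) for v in values),
    }


stats = summarize([1, 2, 3, 4, 5])
```

Other quantile conventions give slightly different quartiles, so values from a report may not match this sketch digit for digit.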

Announcements

Version v2.10.1 released: containing stability fixes for the previous release, which included a major overhaul of the type system, now fully reliant on visions. See the changelog below to know what has changed.

Spark backend in progress: We can happily announce that we're nearing v1 for the Spark backend for generating profile reports. Stay tuned.

Support `pandas-profiling`

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project directly through GitHub Sponsors! Please help me to continue to support this package. It's extra exciting that GitHub matches your contribution for the first year.

Find more information here:

February 7, 2021 💘

Examples

The following examples can give you an impression of what the package can do:

Census Income (US Adult Census data relating income)
NASA Meteorites (comprehensive set of meteorite landings)
Titanic (the "Wonderwall" of datasets)
NZA (open data from the Dutch Healthcare Authority)
Stata Auto (1978 Automobile data)
Vektis (Vektis Dutch Healthcare data)
Colors (a simple colors dataset)
UCI Bank Dataset (banking marketing dataset)

Specific features:

Russian Vocabulary (demonstrates text analysis)
Cats and Dogs (demonstrates image analysis from the file system)
Celebrity Faces (demonstrates image analysis with EXIF information)
Website Inaccessibility (demonstrates URL analysis)
Orange prices and Coal prices (showcases report themes)

Tutorials:

Tutorial: report structure using Kaggle data (advanced) (modify the report's structure)

Installation

Using pip

You can install using the pip package manager by running

pip install pandas-profiling[notebook]

Alternatively, you could install the latest version directly from Github:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda

You can install using the conda package manager by running

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page.

Install by navigating to the proper directory and running:

python setup.py install

Documentation

The documentation for pandas_profiling can be found here. Previous documentation is still available here.

Getting started

Start by loading in your pandas DataFrame, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

To generate the report, run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Explore deeper

You can configure the profile report in any way you like. The example code below loads the explorative configuration file, which includes many features for text (length distribution, Unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in exactly which settings are used, you can compare it with the default configuration file.

profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

Learn more about configuring pandas-profiling on the Advanced usage page.

Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook. There are two interfaces: widgets and an embedded HTML report.

The widgets interface is obtained by simply displaying the report. In the Jupyter Notebook, run:

profile.to_widgets()

The HTML report can be included in a Jupyter notebook:

Run the following code:

profile.to_notebook_iframe()

Saving the report

If you want to generate an HTML report file, save the ProfileReport to an object and use the to_file() function:

profile.to_file("your_report.html")

Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Large datasets

Version 2.4 introduces minimal mode.

This is a default configuration that disables expensive computations (such as correlations and dynamic binning).

Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)
profile.to_file("output.html")

Command line usage

For standard formatted CSV files that can be read immediately by pandas, you can use the pandas_profiling executable.

Run the following for information about options and arguments.

pandas_profiling -h

Advanced usage

A set of options is available in order to adapt the report generated.

title (str): Title for the report ('Pandas Profiling Report' by default).
pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
progress_bar (bool): If True, pandas-profiling will display a progress bar.
infer_dtypes (bool): When True (default), the dtypes of variables are inferred by visions using the typeset logic (for instance, a column that has integers stored as strings will be analyzed as numeric).

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

You can find the configuration docs on the advanced usage page.

Example

profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file("output.html")

Supporting open source

Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without the support of our gracious sponsors.

Lambda workstations, servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. Lambda Cloud offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN.

We would like to thank our generous Github Sponsors supporters who make pandas-profiling possible:

Martin Sotir, Joseph Yuen, Brian Lee, Stephanie Rivera, nscsekhar, abdulAziz

More info if you would like to appear here: Github Sponsor page

Types

Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float etc.). pandas-profiling currently recognizes the following types: Boolean, Numerical, Date, Categorical, URL, Path, File and Image.

We have developed a type system for Python, tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of the code of your analysis. Future versions of pandas-profiling will have extended type support through visions!

Contributing

Read about getting involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the pandas-profiling Slack. Join the Slack community.

Editor integration

PyCharm integration

1. Install pandas-profiling via the instructions above.

2. Locate your pandas-profiling executable.

   On macOS / Linux / BSD:

   $ which pandas_profiling
   (example) /usr/local/bin/pandas_profiling

   On Windows:

   $ where pandas_profiling
   (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe

3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools.
4. Click the + icon to add a new external tool.
5. Insert the following values:
   - Name: Pandas Profiling
   - Program: The location obtained in step 2
   - Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
   - Working Directory: $ProjectFileDir$

To use the PyCharm Integration, right click on any dataset file:

External Tools > Pandas Profiling.
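
The Arguments field above passes the dataset path followed by the report path. As a rough sketch of what those PyCharm macros expand to, the hypothetical helper below derives the same pair of paths in Python. It assumes that `$FileNameWithoutAllExtensions$` strips every extension, and the example path is made up.

```python
from pathlib import PurePosixPath


def report_arguments(dataset_path):
    """Mimic the external-tool arguments: the input file, then <name>_report.html beside it."""
    path = PurePosixPath(dataset_path)
    # Drop every extension, e.g. "data.csv.gz" -> "data".
    stem = path.name.split(".")[0]
    report = path.parent / f"{stem}_report.html"
    return [str(path), str(report)]


args = report_arguments("/data/titanic.csv")
```

For the made-up path, the report lands next to the dataset as `/data/titanic_report.html`.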

Other integrations

Other editor integrations may be contributed via pull requests.

Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename	Requirements
requirements.txt	Package requirements
requirements-dev.txt	Requirements for development
requirements-test.txt	Requirements for testing
setup.py	Requirements for Widgets etc.

Comments

pandas-profiling not compatible with pandas v1.0
Describe the bug

pandas-profiling is not compatible with pandas v1.0. The key method "ProfileReport" raises "TypeError: concat() got an unexpected keyword argument 'join_axes'" because join_axes was removed in pandas v1.0. https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html?highlight=concat

To Reproduce

import pandas as pd
import pandas_profiling

def test_issueXXX():
    df = pd.read_csv(r'')
    pf = pandas_profiling.ProfileReport(df)

TypeError: concat() got an unexpected keyword argument 'join_axes'

Version information:

Python version: 3.7.

Environment: Command Line and Pycharm

pandas-profiling: 1.4.1

pandas: 1.0
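
For context, `join_axes` was a keyword of `pandas.concat` that is no longer accepted from pandas 1.0 on; the pandas release notes suggest reindexing the result instead. The sketch below uses made-up frames to show that replacement pattern. It illustrates the pandas API change only and is not the patch applied in pandas-profiling.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"y": [10, 20, 30]}, index=[1, 2, 3])

# Before pandas 1.0: pd.concat([left, right], axis=1, join_axes=[left.index])
# From pandas 1.0 on: concatenate first, then reindex to the wanted axis.
combined = pd.concat([left, right], axis=1).reindex(left.index)
```

Rows that exist only in `right` are dropped by the reindex, and rows missing from `right` get NaN in column `y`.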

bug 🐛
opened by mantou16 27
AttributeError: 'DataFrame' object has no attribute 'profile_report'
Describe the bug Running the example in readme generates an error.

To Reproduce Running:

import numpy as np
import pandas as pd
import pandas_profiling

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=['a', 'b', 'c', 'd', 'e']
)
df.profile_report()

in a Jupyter notebook gives:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-f9a7584e785c> in <module>
----> 1 df.profile_report()

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'profile_report'

Version information: alabaster==0.7.12 anaconda-client==1.7.2 anaconda-navigator==1.9.7 anaconda-project==0.8.2 asn1crypto==0.24.0 astroid==2.2.5 astropy==3.1.2 atomicwrites==1.3.0 attrs==19.1.0 Babel==2.6.0 backcall==0.1.0 backports.os==0.1.1 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.7.1 bitarray==0.8.3 bkcharts==0.2 bleach==3.1.0 bokeh==1.0.4 boto==2.49.0 Bottleneck==1.2.1 certifi==2019.3.9 cffi==1.12.2 chardet==3.0.4 Click==7.0 cloudpickle==0.8.0 clyent==1.2.2 colorama==0.4.1 conda==4.6.14 conda-build==3.17.8 conda-verify==3.1.1 confuse==1.0.0 contextlib2==0.5.5 cryptography==2.6.1 cycler==0.10.0 Cython==0.29.6 cytoolz==0.9.0.1 dask==1.1.4 decorator==4.4.0 defusedxml==0.5.0 distributed==1.26.0 docutils==0.14 entrypoints==0.3 et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2 future==0.17.1 gevent==1.4.0 glob2==0.6 gmpy2==2.0.8 greenlet==0.4.15 h5py==2.9.0 heapdict==1.0.0 hpat==0.28.1 html5lib==1.0.1 htmlmin==0.1.12 idna==2.8 imageio==2.5.0 imagesize==1.1.0 importlib-metadata==0.0.0 ipykernel==5.1.0 ipyparallel==6.2.4 ipython==7.4.0 ipython-genutils==0.2.0 ipywidgets==7.4.2 isort==4.3.16 itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.3 jeepney==0.4 Jinja2==2.10 jsonschema==3.0.1 jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0 jupyter-core==4.4.0 jupyterlab==0.35.4 jupyterlab-server==0.2.0 keyring==18.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1 libarchive-c==2.8 lief==0.9.0 lightgbm==2.2.3 llvmlite==0.28.0 locket==0.2.0 lxml==4.3.2 MarkupSafe==1.1.1 matplotlib==3.0.3 mccabe==0.6.1 missingno==0.4.1 mistune==0.8.4 mkl-fft==1.0.10 mkl-random==1.0.2 mock==2.0.0 more-itertools==6.0.0 mpi4py==3.0.1 mpmath==1.1.0 msgpack==0.6.1 multipledispatch==0.6.0 navigator-updater==0.2.1 nbconvert==5.4.1 nbformat==4.4.0 networkx==2.2 nltk==3.4 nose==1.3.7 notebook==5.7.8 numba==0.43.1 numerapi==1.5.1 numerox==3.7.0 numexpr==2.6.9 numpy==1.16.2 numpydoc==0.8.0 olefile==0.46 openpyxl==2.6.1 packaging==19.0 pandas==0.24.2 
pandas-profiling==1.4.1 pandocfilters==1.4.2 parso==0.3.4 partd==0.3.10 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1 pbr==5.2.0 pep8==1.7.1 pexpect==4.6.0 phik==0.9.8 pickleshare==0.7.5 Pillow==5.4.1 pkginfo==1.5.0.1 plotly==3.8.1 pluggy==0.9.0 ply==3.11 prometheus-client==0.6.0 prompt-toolkit==2.0.9 psutil==5.6.1 ptyprocess==0.6.0 py==1.8.0 pyarrow==0.11.1 pycodestyle==2.5.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1 pycurl==7.43.0.2 pyflakes==2.1.1 Pygments==2.3.1 pylint==2.3.1 pyodbc==4.0.26 pyOpenSSL==19.0.0 pyparsing==2.3.1 pyrsistent==0.14.11 PySocks==1.6.8 pytest==4.3.1 pytest-arraydiff==0.3 pytest-astropy==0.5.0 pytest-doctestplus==0.3.0 pytest-openfiles==0.3.2 pytest-pylint==0.14.0 pytest-remotedata==0.3.1 python-dateutil==2.8.0 python-igraph==0.7.1.post6 pytz==2018.9 PyWavelets==1.0.2 PyYAML==5.1 pyzmq==18.0.0 QtAwesome==0.5.7 qtconsole==4.4.3 QtPy==1.7.0 requests==2.21.0 retrying==1.3.3 rope==0.12.0 ruamel-yaml==0.15.46 scikit-image==0.14.2 scikit-learn==0.20.3 scipy==1.2.1 seaborn==0.9.0 SecretStorage==3.1.1 Send2Trash==1.5.0 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1 sortedcollections==1.1.2 sortedcontainers==2.1.0 soupsieve==1.8 Sphinx==1.8.5 sphinxcontrib-websupport==1.1.0 spyder==3.3.3 spyder-kernels==0.4.2 SQLAlchemy==1.3.1 statsmodels==0.9.0 sympy==1.3 tables==3.5.1 tblib==1.3.2 terminado==0.8.1 testpath==0.4.2 toolz==0.9.0 tornado==6.0.2 tqdm==4.31.1 traitlets==4.3.2 typed-ast==1.4.0 unicodecsv==0.14.1 urllib3==1.24.1 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.4.2 wrapt==1.11.1 wurlitzer==1.0.2 xlrd==1.2.0 XlsxWriter==1.1.5 xlwt==1.3.0 zict==0.1.4 zipp==0.3.3
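
The version list above shows pandas-profiling 1.4.1, which appears to predate the `df.profile_report()` accessor described in the README. A quick way to check what is installed is sketched here with the standard library only; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="pandas-profiling"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("pandas-profiling version:", version)
```

If the reported version is older than expected, upgrading with one of the commands from the Installation section is the likely remedy.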

bug 🐛
opened by bdch1234 22
Plotting a response variable on the histograms

Hey,

Great job with pandas-profiling, I love it. I think it would be great to have an extra parameter to specify a response column. Plotting the average response for every bin of the histograms (for each variable) would make obvious trends and correlations visible and would be useful for any regression problem (it might be trickier for classification, where the responses are discrete). What do you think?

Thanks!
feature request 💬

opened by Optimox 17
feat: added filter to locate columns

This is a follow-up PR to the PR made earlier (#1096). Closes #638. I have changed the input from a text field to a dropdown, as per @fabclmnt's suggestion.

Here's how it looks and works now:

https://user-images.githubusercontent.com/57868024/194428807-a7642deb-6ba5-4404-95ef-3e9605ba10cd.mp4

The dropdown isn't visible due to restrictions on the screen-recorder, here's an image of it in action for reference.

P.S. I'm sorry for the hassle in the previous PR, I haven't worked with git very much. Thank you for your patience.

opened by g-kabra 16

Potential incompatibility with Pandas 1.4.0

Describe the bug

Pandas version 1.4.0 was released a few days ago and some tests started failing. I was able to reproduce this with a minimal example that fails with Pandas 1.4.0 and works with Pandas 1.3.5.

To Reproduce

import pandas as pd
import pandas_profiling

data = {"col1": [1, 2], "col2": [3, 4]}
dataframe = pd.DataFrame(data=data)

profile = pandas_profiling.ProfileReport(dataframe, minimal=False)
profile.to_html()

When running with Pandas 1.4.0, I get the following traceback:

Traceback (most recent call last):
  File "/tmp/bug.py", line 8, in <module>
    profile.to_html()
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 368, in to_html
    return self.html
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 185, in html
    self._html = self._render_html()
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 287, in _render_html
    report = self.report
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 179, in report
    self._report = get_report_structure(self.config, self.description_set)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 161, in description_set
    self._description_set = describe_df(
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/describe.py", line 71, in describe
    series_description = get_series_descriptions(
  File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
    return func(*args, **kwargs)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 92, in pandas_get_series_descriptions
    for i, (column, description) in enumerate(
  File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 72, in multiprocess_1d
    return column, describe_1d(config, series, summarizer, typeset)
  File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
    return func(*args, **kwargs)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 50, in pandas_describe_1d
    return summarizer.summarize(config, series, dtype=vtype)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summarizer.py", line 37, in summarize
    _, _, summary = self.handle(str(dtype), config, series, {"type": str(dtype)})
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 62, in handle
    return op(*args)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
    return f(*res)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
    return f(*res)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
    return f(*res)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 17, in func2
    res = g(*x)
  File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
    return func(*args, **kwargs)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 65, in inner
    return fn(config, series, summary)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 82, in inner
    return fn(config, series, summary)
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 205, in pandas_describe_categorical_1d
    summary.update(length_summary_vc(value_counts))
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 162, in length_summary_vc
    "median_length": weighted_median(
  File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/utils_pandas.py", line 13, in weighted_median
    w_median = (data[weights == np.max(weights)])[0]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

If I change minimal from False to True, the script passes.

Version information:

Failing environment

Python version: Python 3.9.1
Pip version: pip 21.3.1
Pandas and pandas-profiling versions: 1.4.0 | 3.1.0
Full pip list:

Package               Version
--------------------- ---------
attrs                 21.4.0
certifi               2021.10.8
charset-normalizer    2.0.10
cycler                0.11.0
fonttools             4.28.5
htmlmin               0.1.12
idna                  3.3
ImageHash             4.2.1
Jinja2                3.0.3
joblib                1.0.1
kiwisolver            1.3.2
MarkupSafe            2.0.1
matplotlib            3.5.1
missingno             0.5.0
multimethod           1.6
networkx              2.6.3
numpy                 1.22.1
packaging             21.3
pandas                1.4.0
pandas-profiling      3.1.0
phik                  0.12.0
Pillow                9.0.0
pip                   21.3.1
pydantic              1.9.0
pyparsing             3.0.7
python-dateutil       2.8.2
pytz                  2021.3
PyWavelets            1.2.0
PyYAML                6.0
requests              2.27.1
scipy                 1.7.3
seaborn               0.11.2
setuptools            60.0.5
six                   1.16.0
tangled-up-in-unicode 0.1.0
tqdm                  4.62.3
typing_extensions     4.0.1
urllib3               1.26.8
visions               0.7.4
wheel                 0.37.1

Working environment

Python version: Python 3.9.1
Pip version: pip 21.3.1
Pandas and pandas-profiling versions: 1.3.5 | 3.1.0
Full pip list:

Package               Version
--------------------- ---------
attrs                 21.4.0
certifi               2021.10.8
charset-normalizer    2.0.10
cycler                0.11.0
fonttools             4.28.5
htmlmin               0.1.12
idna                  3.3
ImageHash             4.2.1
Jinja2                3.0.3
joblib                1.0.1
kiwisolver            1.3.2
MarkupSafe            2.0.1
matplotlib            3.5.1
missingno             0.5.0
multimethod           1.6
networkx              2.6.3
numpy                 1.22.1
packaging             21.3
pandas                1.3.5
pandas-profiling      3.1.0
phik                  0.12.0
Pillow                9.0.0
pip                   21.3.1
pydantic              1.9.0
pyparsing             3.0.7
python-dateutil       2.8.2
pytz                  2021.3
PyWavelets            1.2.0
PyYAML                6.0
requests              2.27.1
scipy                 1.7.3
seaborn               0.11.2
setuptools            60.0.5
six                   1.16.0
tangled-up-in-unicode 0.1.0
tqdm                  4.62.3
typing_extensions     4.0.1
urllib3               1.26.8
visions               0.7.4
wheel                 0.37.1

Let me know if I can provide more details and thank you for your good work!

bug 🐛

opened by Lothiraldan 15

TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
stats['range'] = stats['max'] - stats['min']
TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

I got this error
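
The message comes from NumPy itself, which rejects `-` between boolean values. Below is a minimal sketch of the behaviour and one possible workaround, casting to integers before subtracting. The variable names mirror the traceback line, but the data is made up, and this is not necessarily the fix adopted in pandas-profiling.

```python
import numpy as np

column = np.array([True, False, True])
stats = {"max": column.max(), "min": column.min()}  # both values are NumPy booleans

error = None
try:
    stats["range"] = stats["max"] - stats["min"]
except TypeError as exc:
    error = str(exc)
    # Workaround: subtract as integers instead of booleans.
    stats["range"] = int(stats["max"]) - int(stats["min"])
```

Whether a "range" is meaningful for a boolean column is a separate design question; skipping the statistic for that type would be another option.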
bug 🐛 information requested ❔ help wanted 🙋
opened by eyadsibai 15

2.10.0 - TraitError: The 'value' trait of a HTML instance must be a unicode string...

Describe the bug

Hi there - Looks like the latest release (2.10.0) has broken the to_widgets functionality as outlined in the Getting started section of the docs. I confirmed that rolling back to 2.9.0 does not produce the issue.

To Reproduce

# pandas_profiling==2.10.0
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

profile.to_widgets()

Returns:

TraitError: The 'value' trait of a HTML instance must be a unicode string, but a value of Numeric <class 'visions.types.type.VisionsBaseTypeMeta'> was specified.

Version information: 2.10.0


opened by rynmccrmck 14

ZeroDivisionError when using version 1.4.1

There was a change in behavior between versions 1.4.0 and 1.4.1 where some calls to ProfileReport that previously succeeded will now raise a ZeroDivisionError.

An example reproduction is to take the following code and run it in a Jupyter notebook cell:

import pandas
import pandas_profiling

import IPython

df = pandas.DataFrame({'c': 'v'}, index=['c'])
report = pandas_profiling.ProfileReport(df)
IPython.core.display.HTML(report.html)

With version 1.4.0 this produced an HTML report, but with version 1.4.1 it produces the following stack trace:

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-2-ffb5392b4284> in <module>()
      5 
      6 df = pandas.DataFrame({'c': 'v'}, index=['c'])
----> 7 report = pandas_profiling.ProfileReport(df)
      8 IPython.core.display.HTML(report.html)

/usr/local/lib/python2.7/dist-packages/pandas_profiling/__init__.pyc in __init__(self, df, **kwargs)
     67 
     68         self.html = to_html(sample,
---> 69                             description_set)
     70 
     71         self.description_set = description_set

/usr/local/lib/python2.7/dist-packages/pandas_profiling/report.pyc in to_html(sample, stats_object)
    192 
    193     # Add plot of matrix correlation
--> 194     pearson_matrix = plot.correlation_matrix(stats_object['correlations']['pearson'], 'Pearson')
    195     spearman_matrix = plot.correlation_matrix(stats_object['correlations']['spearman'], 'Spearman')
    196     correlations_html = templates.template('correlations').render(

/usr/local/lib/python2.7/dist-packages/pandas_profiling/plot.pyc in correlation_matrix(corrdf, title, **kwargs)
    134     plt.title(title, size=18)
    135     plt.colorbar(matrix_image)
--> 136     axes_cor.set_xticks(np.arange(0, corrdf.shape[0], corrdf.shape[0] * 1.0 / len(labels)))
    137     axes_cor.set_yticks(np.arange(0, corrdf.shape[1], corrdf.shape[1] * 1.0 / len(labels)))
    138     axes_cor.set_xticklabels(labels, rotation=90)

ZeroDivisionError: float division by zero
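
The failing expression divides by `len(labels)`, so the error suggests that the correlation matrix had no labels, which would be expected for a DataFrame without numeric columns like the one above. Here is a hedged sketch of a guard, written as a hypothetical helper in plain Python rather than the library's actual plotting code:

```python
def tick_positions(size, labels):
    """Evenly spaced tick positions, or an empty list when there is nothing to plot."""
    if not labels:
        # Without this guard, size * 1.0 / len(labels) raises ZeroDivisionError.
        return []
    step = size * 1.0 / len(labels)
    return [i * step for i in range(len(labels))]


empty = tick_positions(0, [])
spaced = tick_positions(4, ["a", "b"])
```

An alternative design would be to skip the correlation plots entirely when no numeric columns are present.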

opened by ojarjur 14

pandas_profiling.utils.cache

ModuleNotFoundError: No module named 'pandas_profiling.utils'

information requested ❔

opened by ajaimes07 13
This call to matplotlib.use() has no effect because the backend has already

/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/pandas_profiling/base.py:20: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.

The backend was originally set to 'module://ipykernel.pylab.backend_inline' by the following code:
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 8, in <module>
    import matplotlib.pyplot as plt
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 69, in <module>
    from matplotlib.backends import pylab_setup
  File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 14, in <module>
    line for line in traceback.format_stack()

matplotlib.use('Agg')

opened by iweey 13
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Describe the bug

Running the example below gives this error: IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

latest version on conda-forge

To Reproduce

wine.csv

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("wine.csv")
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("tmp.html")

Version information:

Python 3.9

pandas-profiling 3.1.0 pyhd8ed1ab_ 0 conda-forge

pandas 1.4.2 py39h1832856_1 conda-forge

bug 🐛
opened by darenr 12
Interaction plots for time series data

Missing functionality

Interaction plots for numeric time series variables.

Proposed feature

Calculate interaction plots for both numeric and numeric time series variables. Is there a setting to enable this?

Alternatives considered

I considered setting tsmode=False, but then I lose the autocorrelation plots.
needs-triage

opened by MauritsDescamps 0
Bug Report: KeyError: 'max_length' when comparing two profile reports generated with `minimal=True`
Current Behaviour

There is an error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/60/6qphmx_d7x7_11vpj8524vf40000gn/T/ipykernel_17862/709405443.py in
      7
      8 # Save report to file
----> 9 comparison_report.to_file("comparison.html")

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
    307                 create_html_assets(self.config, output_file)
    308
--> 309             data = self.to_html()
    310
    311             if output_file.suffix != ".html":

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_html(self)
    418
    419         """
--> 420         return self.html
    421
    422     def to_json(self) -> str:

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in html(self)
    229     def html(self) -> str:
    230         if self._html is None:
--> 231             self._html = self._render_html()
    232         return self._html
    233

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in _render_html(self)
    337         from pandas_profiling.report.presentation.flavours import HTMLReport
    338
--> 339         report = self.report
    340
    341         with tqdm(

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in report(self)
    223     def report(self) -> Root:
    224         if self._report is None:
--> 225             self._report = get_report_structure(self.config, self.description_set)
    226         return self._report
    227

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in get_report_structure(config, summary)
    376                 items=list(summary["variables"]),
    377                 item=Container(
--> 378                     render_variables_section(config, summary),
    379                     sequence_type="accordion",
    380                     name="Variables",

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in render_variables_section(config, dataframe_summary)
    157         variable_type = summary["type"]
    158         render_map_type = render_map.get(variable_type, render_map["Unsupported"])
--> 159         template_variables.update(render_map_type(config, template_variables))
    160
    161         # Ignore these

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical(config, summary)
    405
    406     if length:
--> 407         length_table, length_histo = render_categorical_length(config, summary, varid)
    408         overview_items.append(length_table)
    409

~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical_length(config, summary, varid)
     61             {
     62                 "name": "Max length",
---> 63                 "value": fmt_number(summary["max_length"]),
     64                 "alert": False,
     65             },

KeyError: 'max_length'

Expected Behaviour

Run without error

Data Description

I'm running the code for dataset comparison. The original code in that link works well. But when I set minimal=True to create the reports and then compare them, an error is raised.

Code that reproduces the bug

import pandas as pd
from pandas_profiling import ProfileReport

train_df = pd.read_csv("train.csv")
train_report = ProfileReport(train_df, title="Train", minimal=True)

test_df = pd.read_csv("test.csv")
test_report = ProfileReport(test_df, title="Test", minimal=True)

comparison_report = train_report.compare(test_report)
comparison_report.to_file("comparison.html")

pandas-profiling version

v3.5.0

Dependencies

pandas 1.4.2 pandas-profiling 3.5.0

OS

Mac

Checklist

[X] There is not yet another bug report for this issue in the issue tracker

[X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.

[X] The issue has not been resolved by the entries listed under Common Issues.

needs-triage
opened by xiaoye-hua 0

Does pandas-profiling work in Jupyter Notebooks on AWS?

Does pandas-profiling work in Jupyter Notebooks on AWS? I understand there are a lot of configuration differences that can lead to issues but whenever I try to produce a profiling report, I get the following errors when I run:

profile = ProfileReport(df, title='myreport')
profile.to_file('s3://myfolder/myreport.html')

Summarize dataset:  97%|█████████▋| 427/438 [01:14<00:01,  8.03it/s, Calculate auto correlation]
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py:315: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
  return func(*args, **kwargs)
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:112: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
  warnings.warn("The input array could not be properly "
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:4881: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(stats.ConstantInputWarning(warn_msg))
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/correlations.py:67: UserWarning: There was an attempt to calculate the auto correlation, but this failed.
To hide this warning, disable the calculation
(using `df.profile_report(correlations={"auto": {"calculate": False}})`
If this is problematic for your use case, please report this as an issue:
https://github.com/ydataai/pandas-profiling/issues
(include the error message: 'No data; `observed` has size 0.')
  warnings.warn(
Summarize dataset:  98%|█████████▊| 428/438 [28:20<32:48, 196.80s/it, Calculate spearman correlation]
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py:315: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  return func(*args, **kwargs)
Summarize dataset:  98%|█████████▊| 430/438 [30:55<21:07, 158.47s/it, Calculate kendall correlation]
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:5218: RuntimeWarning: overflow encountered in long_scalars
  (2 * xtie * ytie) / m + x0 * y0 / (9 * m * (size - 2)))
/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/scipy/stats/_stats_py.py:5219: RuntimeWarning: invalid value encountered in sqrt
  z = con_minus_dis / np.sqrt(var)
Summarize dataset:  99%|█████████▊| 432/438 [45:40<00:38,  6.34s/it, Calculate phi_k correlation]   
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/queues.py", line 125, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 211, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 204, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/_memmapping_reducer.py", line 446, in __call__
    for dumped_filename in dump(a, filename):
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 553, in dump
    NumpyPickler(f, protocol=protocol).dump(value)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/pickle.py", line 487, in dump
    self.save(obj)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 352, in save
    wrapper.write_array(obj, self)
  File "/home/ec2-user/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/numpy_pickle.py", line 134, in write_array
    pickler.file_handle.write(chunk.tobytes('C'))
OSError: [Errno 28] No space left on device
"""

The above exception was the direct cause of the following exception:

PicklingError                             Traceback (most recent call last)
<ipython-input-9-34649000e9e9> in <module>
      1 profile = ProfileReport(df_perf_18, title="MyReport")
----> 2 profile.to_file(f"s3://sf-puas-prod-use1-pc/fire/research/home_telematics/adt/analysis/MyReport.html")

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
    307                 create_html_assets(self.config, output_file)
    308 
--> 309             data = self.to_html()
    310 
    311             if output_file.suffix != ".html":

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in to_html(self)
    418 
    419         """
--> 420         return self.html
    421 
    422     def to_json(self) -> str:

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in html(self)
    229     def html(self) -> str:
    230         if self._html is None:
--> 231             self._html = self._render_html()
    232         return self._html
    233 

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in _render_html(self)
    337         from pandas_profiling.report.presentation.flavours import HTMLReport
    338 
--> 339         report = self.report
    340 
    341         with tqdm(

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in report(self)
    223     def report(self) -> Root:
    224         if self._report is None:
--> 225             self._report = get_report_structure(self.config, self.description_set)
    226         return self._report
    227 

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/profile_report.py in description_set(self)
    205     def description_set(self) -> Dict[str, Any]:
    206         if self._description_set is None:
--> 207             self._description_set = describe_df(
    208                 self.config,
    209                 self.df,

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/describe.py in describe(config, df, summarizer, typeset, sample)
     93         pbar.total += len(correlation_names)
     94 
---> 95         correlations = {
     96             correlation_name: progress(
     97                 calculate_correlation, pbar, f"Calculate {correlation_name} correlation"

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/describe.py in <dictcomp>(.0)
     94 
     95         correlations = {
---> 96             correlation_name: progress(
     97                 calculate_correlation, pbar, f"Calculate {correlation_name} correlation"
     98             )(config, df, correlation_name, series_description)

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/utils/progress_bar.py in inner(*args, **kwargs)
      9     def inner(*args, **kwargs) -> Any:
     10         bar.set_postfix_str(message)
---> 11         ret = fn(*args, **kwargs)
     12         bar.update()
     13         return ret

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/correlations.py in calculate_correlation(config, df, correlation_name, summary)
    105     correlation = None
    106     try:
--> 107         correlation = correlation_measures[correlation_name].compute(
    108             config, df, summary
    109         )

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/multimethod/__init__.py in __call__(self, *args, **kwargs)
    313         func = self[tuple(func(arg) for func, arg in zip(self.type_checkers, args))]
    314         try:
--> 315             return func(*args, **kwargs)
    316         except TypeError as ex:
    317             raise DispatchError(f"Function {func.__code__}") from ex

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/pandas_profiling/model/pandas/correlations_pandas.py in pandas_phik_compute(config, df, summary)
    152         from phik import phik_matrix
    153 
--> 154         correlation = phik_matrix(df[selected_cols], interval_cols=list(intcols))
    155 
    156     return correlation

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/phik/phik.py in phik_matrix(df, interval_cols, bins, quantile, noise_correction, dropna, drop_underflow, drop_overflow, verbose, njobs)
    254         verbose=verbose,
    255     )
--> 256     return phik_from_rebinned_df(
    257         data_binned,
    258         noise_correction,

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/phik/phik.py in phik_from_rebinned_df(data_binned, noise_correction, dropna, drop_underflow, drop_overflow, njobs)
    164         ]
    165     else:
--> 166         phik_list = Parallel(n_jobs=njobs)(
    167             delayed(_calc_phik)(co, data_binned[list(co)], noise_correction)
    168             for co in itertools.combinations_with_replacement(

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/parallel.py in __call__(self, iterable)
   1096 
   1097             with self._backend.retrieval_context():
-> 1098                 self.retrieve()
   1099             # Make sure that we get a last message telling us we are done
   1100             elapsed_time = time.time() - self._start_time

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/parallel.py in retrieve(self)
    973             try:
    974                 if getattr(self._backend, 'supports_timeout', False):
--> 975                     self._output.extend(job.get(timeout=self.timeout))
    976                 else:
    977                     self._output.extend(job.get())

~/SageMaker/.envs/mykernel/lib/python3.9/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    565         AsyncResults.get from multiprocessing."""
    566         try:
--> 567             return future.result(timeout=timeout)
    568         except CfTimeoutError as e:
    569             raise TimeoutError from e

~/SageMaker/.envs/mykernel/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
    436                     raise CancelledError()
    437                 elif self._state == FINISHED:
--> 438                     return self.__get_result()
    439 
    440                 self._condition.wait(timeout)

~/SageMaker/.envs/mykernel/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
    388         if self._exception:
    389             try:
--> 390                 raise self._exception
    391             finally:
    392                 # Break a reference cycle with the exception in self._exception

PicklingError: Could not pickle the task to send it to the workers.

I'm on the latest version of pandas-profiling (just installed it today).
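The innermost error in the traceback above is `OSError: [Errno 28] No space left on device`, raised while joblib was dumping an array to disk during the phi_k step. The following is a minimal sketch of two things one might try, not a confirmed fix: pointing joblib's scratch folder at a volume with free space, and skipping the phi_k correlation. The helper `pick_temp_dir`, the candidate directory list and the size threshold are hypothetical; the `correlations` override mirrors the shape of the hint printed in the warning, and passing it with a `"phi_k"` key is an assumption to check against the installed version.

```python
import os
import shutil
import tempfile


def pick_temp_dir(candidates, min_free_bytes):
    """Return the first existing directory with at least min_free_bytes free, else None."""
    for path in candidates:
        if os.path.isdir(path) and shutil.disk_usage(path).free >= min_free_bytes:
            return path
    return None


# Hypothetical candidates; replace with the volumes available on your instance.
candidates = ["/home/ec2-user/SageMaker", tempfile.gettempdir()]
scratch = pick_temp_dir(candidates, min_free_bytes=1024 ** 2)  # placeholder threshold (1 MiB)
if scratch is not None:
    # joblib consults JOBLIB_TEMP_FOLDER when choosing where to memory-map large arrays.
    os.environ["JOBLIB_TEMP_FOLDER"] = scratch

# Same shape as the `correlations={"auto": {"calculate": False}}` hint in the warning above.
# Assumption: "phi_k" is accepted as a key in the same way.
correlation_overrides = {"phi_k": {"calculate": False}}

# Usage (commented out so the sketch runs without pandas-profiling installed):
# profile = ProfileReport(df, title="MyReport", correlations=correlation_overrides)
# profile.to_file("MyReport.html")  # write to local disk first, then copy to S3 separately
```

Writing the report to a local path first also separates the profiling failure from any question of whether `to_file` accepts an `s3://` destination, which the traceback does not settle.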

question/discussion ❓ information requested ❔

opened by JohnTravolski 3

bug: variables list is causing a misconfiguration in the UI variables section
Current Behaviour

Expected Behaviour

It would be easier on the eyes if the variables list were rendered as pill buttons instead, just like the ones in "Overview".

Example:

Data Description

https://pandas-profiling.ydata.ai/examples/master/features/united_report.html

pandas-profiling version

vdev

Checklist

[X] There is not yet another bug report for this issue in the issue tracker

[X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.

[X] The issue has not been resolved by the entries listed under Common Issues.

bug 🐛
opened by rivanfebrian123 7
Dependency Dashboard
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

dockerfile

.devcontainer/Dockerfile

mcr.microsoft.com/vscode/devcontainers/python 0-3.10-bullseye

github-actions

.github/workflows/merge-dev.yml

actions/checkout v3

oprypin/find-latest-tag v1.1.1

simpleactions/create-tag v1.0.0

.github/workflows/merge-master.yml

actions/checkout v3

cycjimmy/semantic-release-action v3

actions/checkout v3

ad-m/github-push-action v0.6.0

simpleactions/create-tag v1.0.0

hugo19941994/delete-draft-releases v1.0.0

actions/create-release v1

.github/workflows/pull-request.yml

actions/checkout v3

wagoid/commitlint-github-action v5

actions/checkout v3

actions/setup-python v4

actions/cache v3

ad-m/github-push-action v0.6.0

.github/workflows/release.yml

actions/checkout v3

actions/setup-python v4

actions/cache v3

actions/upload-artifact v3

actions/download-artifact v3

AButler/upload-release-assets v2.0

actions/download-artifact v3

pypa/gh-action-pypi-publish v1.6.4

actions/checkout v3

actions/setup-python v4

.github/workflows/tests.yml

actions/checkout v3

actions/setup-python v4

actions/cache v3

actions/cache v3

actions/cache v3

actions/checkout v3

actions/setup-python v4

actions/cache v3

actions/cache v3

actions/cache v3

.github/workflows/triage.yml

pip_requirements

requirements.txt

scipy >=1.4.1, <1.10

pandas >1.1, <1.6, !=1.4.0

matplotlib >=3.2, <3.7

pydantic >=1.8.1, <1.11

PyYAML >=5.0.0, <6.1

jinja2 >=2.11.1, <3.2

visions ==0.7.5

numpy >=1.16.0,<1.24

htmlmin ==0.1.12

phik >=0.11.1,<0.13

requests >=2.24.0, <2.29

tqdm >=4.48.2, <4.65

seaborn >=0.10.1, <0.13

multimethod >=1.4, <1.10

statsmodels >=0.13.2, <0.14

typeguard >=2.13.2, <2.14

pip_setup

setup.py

jupyter-client >=5.3.4

jupyter-core >=4.6.3

ipywidgets >=7.5.1

tangled-up-in-unicode ==0.2.0

[ ] Check this box to trigger a request for Renovate to run again on this repository
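Several of the reports on this page list installed versions (for example pandas 1.4.2), so it can help to compare them with the ranges shown above. The following is a simplified sketch of such a check, assuming plain dotted numeric versions and only the operators that appear in this list; `satisfies` is a hypothetical helper, not part of any library, and it does not implement the full PEP 440 rules (pre-releases, wildcards and so on).

```python
import re

# Comparison operators that occur in the requirement ranges listed above.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def _as_tuple(version):
    """Turn '1.4.0' into (1, 4); trailing zeros are dropped so 1.4 == 1.4.0."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    installed = _as_tuple(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](installed, _as_tuple(bound)):
            return False
    return True


print(satisfies("1.4.2", ">1.1, <1.6, !=1.4.0"))  # pandas range from requirements.txt
```

For anything beyond a quick sanity check, a dedicated version-parsing library is the safer choice.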

dependencies 🔗
opened by renovate[bot] 0

Releases(v3.6.2)

v3.6.2(Jan 2, 2023)
3.6.2 (2023-01-02)

Bug Fixes

comparison alerts (#1229) (bbca61b)

comparison histogram (#1228) (0081581)

comparison report style issues (a465cdd)

update the link for the people-example.csv (2bb5043)

v3.6.1(Dec 23, 2022)
3.6.1 (2022-12-23)

Bug Fixes

categorical var frequency plot (6cb391f)

remove ipywidgets import (1b8b117)

v3.6.0(Dec 21, 2022)
3.6.0 (2022-12-21)

Bug Fixes

add css to cope with large tables (7f42f87)

adjust categoricals layout (f0bb45a)

categorical data not being obscured in the common values plot (40236bc)

compare report ignoring config parameter (3d60556)

compare report warnings always showing the last alert type (6b3c13d)

comparison fails when duplicates are disable (#1208) (6d19620)

do no raise exception for percentage formatter (3ea626d)

enforce recomputation of description sets (a9fd1c8)

error comparing only one precomputed profile (00646cd)

html: sensible cloud-platform notebook html rendering (b22ece2)

ignoring config of precomputed reports (6478c40)

only compute auto correlation when no config is specified (d5d4f58)

remove malfunctioning hook (e2593f5)

remove unused test (2170338)

return the proper type for widgets (4c0b358)

set compute default to false (c70e491)

solve mypy error (9c4266e)

solve mypy issue (e3e7788)

uses colors from the specified config (c0c556d)

utils: use 'urllib.request' instead of 'requests' (#1177) (e4d020b), closes #1168

Features

add heatmap values as a table under correlations (fc5da9e)

allow to specify the configuration for the comparison report (ad725b0)

design improvements on the correlations section (e5cd8cf)

implement imbalanced warning (ce84c81)

update variables layout (#1207) (cf0e0a7)

v3.5.0(Nov 22, 2022)
3.5.0 (2022-11-22)

Bug Fixes

change context managed backend (#1149) (11e1a8a)

dataset names on comparison report (#1159) (3c14d43)

duplicate key in test dict (#1126) (d19affe)

improve description and correct plot for ‘auto’ correlation (#1119) (2617b92)

remove correlation calculation for constants (#1152) (1ed2bc0)

time series render format (#1157) (39ca8ce)

update config files to only calculate 'auto' correlation (#1158) (34cf73d)

update repository links (#1141) (c742c5d)

Features

add typechecking to profile report (#1139) (ec8ece0)

report comparison example (#1160) (5e75fd2)

report comparisons (#1069) (70ee5c7), closes #1137 #1136 #1143 #1148 #1150

v3.4.0(Oct 20, 2022)
3.4.0 (2022-10-20)

Bug Fixes

correlation auto passing extra parameters (#1114) (21f4fe6)

cramer's correlation fails with missing values (#1109) (8e7f8b2)

drop joblib dependency (#1090) (586cef3), closes #1056

fix linter errors (#1117) (5f17cfd)

make tangled-up-in-unicode an optional dependency (#1070) (e6b2a00)

remove unused imports (56beed4)

remove unused imports (66864c1)

Remove unused imports. (985fbd1)

Features

add support for Pandas 1.5 (#1076) (5c5a710)

added filter to locate columns (#1115) (c2f817d)

introduce auto parameter for correlations (#1095) (4d2e415)

v3.3.0(Sep 7, 2022)

The full changelog is available here: https://pandas-profiling.ydata.ai/docs/master/pages/reference/changelog.html?highlight=change+log
v3.2.0(May 2, 2022)

The full changelog is available here: https://pandas-profiling.ydata.ai/docs/master/pages/reference/changelog.html?highlight=change+log
v3.1.0(Sep 27, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v3.0.0(May 11, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.13.0(May 8, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.12.0(May 5, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.11.0(Feb 20, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.10.1(Feb 7, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.10.0rc1(Jan 5, 2021)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.9.0(Sep 3, 2020)

The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
v2.9.0rc1(Jul 12, 2020)

This release candidate improves the handling of sensitive data and reduces technical debt with various fixes. The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.

A warm thank you to everyone who has contributed to this release: @gauravkumar37 @Jooong @smaranjitghose @XavierBanos Tam Nguyen @andycraig @mgorsk1 @mbh86 @MHUNCHO @GaelVaroquaux @AmauryLepicard @baluyotraf @pvojnisek @abegong
v2.8.0(May 12, 2020)

pandas-profiling now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions, and scanning for truncated images or those containing EXIF information. Moreover, the text analysis features have been reworked to provide more informative statistics.

Read the changelog v2.8.0 for more details.

Contributors: @loopyme @Bradley-Butcher @willemhendriks, @IscaAy, @frellnick, @dataverz @ieaves
v2.7.1(May 11, 2020)

Fix #468 by pinning visions to 0.4.1
v2.7.0(May 7, 2020)

Announcement and changelog are available in the documentation.

We are grateful to @loopyme and @kyleYang for creating parts of the features in this release.

Thanks to all contributors who made this release possible: @1313e @dataprofessor @neomatrix369 @jiangfangfangxm @WesleyTheGeolien @NickYi1990 @ricgu8086.
v2.6.0(Apr 13, 2020)

Dependency policy

The current dependency policy is suboptimal. Pinning dependencies is great for reproducibility (a strong guarantee that the package works), but it requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and will instead specify a minimum version.
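
The difference between the two policies can be illustrated with a small sketch. The helper names (`parse`, `satisfies`) and the version numbers are hypothetical, and the comparison only handles plain dotted numeric versions, not the full specifier grammar that pip implements.

```python
def parse(version: str, width: int = 4) -> tuple:
    """Turn '1.0.3' into (1, 0, 3, 0) so versions compare component-wise."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed: str, spec: str) -> bool:
    """Check an installed version against an '==X' pin or a '>=X' minimum."""
    if spec.startswith("=="):
        return parse(installed) == parse(spec[2:])
    if spec.startswith(">="):
        return parse(installed) >= parse(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")


# Hypothetical versions, chosen only for illustration.
installed = "1.0.3"
pinned_ok = satisfies(installed, "==0.25.3")   # an exact pin rejects the newer release
minimum_ok = satisfies(installed, ">=0.25.3")  # a minimum version accepts it
```

With a pin, every new upstream release requires a change to the requirement; with a minimum version, newer releases are accepted at the cost of a weaker reproducibility guarantee.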

Pandas v1

Early releases of pandas v1 introduced many regressions that broke functionality (as acknowledged by the authors). At this point, pandas is more stable and we notice high demand for compatibility, so we now support the latest pandas versions. To ensure compatibility with both major versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

Python 3.6+ features

Python 3.6 introduces ordered dicts and f-strings, which we now rely on. This means that pandas-profiling 2.6 and later require at least Python 3.6. Users who cannot upgrade can stay on pandas-profiling 2.5.0, but will unfortunately not benefit from updates or maintenance.
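
As a brief illustration of the two language features mentioned, the sketch below formats a dictionary of statistics with an f-string and relies on the dictionary keeping its insertion order. The function name `describe` and the values are made up for this example. Note that insertion order is an implementation detail of CPython 3.6 and only became a language guarantee in Python 3.7.

```python
def describe(stats: dict) -> str:
    """Format statistics in the order they were inserted, using f-strings."""
    return ", ".join(f"{name}={value:.2f}" for name, value in stats.items())


# Illustrative values only.
stats = {"mean": 3.5, "std": 1.25, "max": 9.0}
summary = describe(stats)
print(summary)  # mean=3.50, std=1.25, max=9.00
```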

Extended continuous integration

Starting from this release, we use GitHub Actions and Travis CI together to increase maintainability. Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.
v2.5.0(Feb 14, 2020)
Progress bar added (#224)

Character analysis for Text/NLP (#278)

Themes: configuration and demos (Orange, Dark)

Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.

Toggle descriptions at correlations.

Deprecation:

This is the last version to support Python 3.5.

Stability:

The order of columns changed when sort="None" (#377, fixed).

Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)

Improved mixed type detection (#351)

Refactor of report structures.

Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).

Distinct counts exclude NaNs.

Fixed alerts in notebooks.
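
To make the "Distinct counts exclude NaNs" item above concrete, here is a minimal pure-Python sketch of the idea. It is not the library's implementation; `is_missing` and `distinct_count` are hypothetical helpers, and treating `None` as missing is an assumption of this example.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def distinct_count(values) -> int:
    """Count distinct values, ignoring missing entries."""
    return len({v for v in values if not is_missing(v)})


values = [1.0, 2.0, float("nan"), 2.0, None, float("nan")]
n_distinct = distinct_count(values)  # only 1.0 and 2.0 are counted
```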

Other improvements:

Warnings are now sorted.

Links to Binder and Google Colab are added for notebooks (#349)

The overview section is tabbed.

v2.4.0(Jan 8, 2020)
The v2.4.0 release decouples the data structure of reports from the actual rendering. It is now much simpler to change the user interface, whether the user is in a Jupyter notebook, a webpage or a native application, or just wants a JSON view of the data.
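
The general idea of separating a report's structure from its rendering can be sketched as follows. This is an illustration of the design pattern only, not pandas-profiling's internal API: `Section`, `render_html` and `render_json` are hypothetical names, and the statistics are made up.

```python
import json
from html import escape
from typing import Dict, List, NamedTuple


class Section(NamedTuple):
    """Renderer-agnostic description of one report section."""
    title: str
    stats: Dict[str, float]


def render_html(sections: List[Section]) -> str:
    """Render the structure as HTML headings and tables."""
    parts = []
    for section in sections:
        rows = "".join(
            f"<tr><th>{escape(name)}</th><td>{value}</td></tr>"
            for name, value in section.stats.items()
        )
        parts.append(f"<h2>{escape(section.title)}</h2><table>{rows}</table>")
    return "\n".join(parts)


def render_json(sections: List[Section]) -> str:
    """Render the same structure as a JSON document."""
    return json.dumps({section.title: section.stats for section in sections})


# One structure, two views.
report = [Section("age", {"mean": 38.5, "missing": 0})]
html_view = render_html(report)
json_view = render_json(report)
```

Because both renderers consume the same `Section` objects, adding another output format means writing one more function rather than changing how the statistics are computed.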

We are also proud to announce that we have been accepted into the GitHub Sponsors programme. If you want to see work on this project continue, you are cordially invited to support me through this programme. To boost community funding, GitHub will match your contribution!

Other improvements:

extended configuration with better defaults, including minimal mode for big data (#258, #310)

more example datasets

rejection of highly correlated variables is generalized (#284, #299)

many structural and stability improvements (#254, #274, #239)

Special thanks to @marco-cardoso @ajupton @lvwerra @gliptak @neomatrix369 for their contributions.
v2.3.0(Jul 27, 2019)
(Experimental) Support for "path" type

Fix numeric precision (#225)

Force labels in missing values diagram for large number of columns (#222)

Add pull request template

Add Census Dataset from the UCI ML Repository

Thanks @bensdm and @huaiweicheng for your valuable contributions to this version!
v2.2.0(Jul 22, 2019)
New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.

Added variable bin sizing via Bayesian Blocks (feature request [#216])

PyCharm integration, console attempts to detect file type.

Fixed bug [#215].

Updated the missingno package to 0.4.2, fixing the font size in the bar diagram.

Various optimizations

Thanks to: @Utsav37 @mansenfranzen @jakevdp
v2.1.2(Jul 11, 2019)

Fix [#211] and README
v2.1.1(Jul 11, 2019)
Fix of [#206]

Improve code maintainability of the view (HTML templates, notebook)

Fix bug in dendrogram sizing

v2.1.0(Jul 6, 2019)
The pandas-profiling release version 2.1.0 includes:

Correlations: correlation calculations are now more fault tolerant ([#51] and [#197]), correlation names in the report are clarified.

Jupyter Notebook: rendering a profiling report is done inside the srcdoc attribute (which fixes [#199]), a full-width option is added and the column layout is improved.

User experience: the table styling and the sample section formatting are improved.

Warnings: detection added for categorical variables that are suspected to be of the datetime type.

Documentation and community:

The Contribution page helps users that want to contribute.

Typos fixed [#195]. Thank you @abhilashshakti.

Added more examples.

Other bugfixes and improvements:

Add version information to console interface.

Fix: Remove one-time used logger [#202]

Fix: Dealing with string indices [#200]

Contributors: @abhilashshakti @adamrossnelson @manycoding @InsciteAnalytics
v2.0.3(Jun 23, 2019)

Bugfix on version structure for 2.0.2.
v2.0.2(Jun 22, 2019)

Revised version structure, fixed recursion preventing installation of dependencies ([#184]).

The setup.py file used to import utils from the package prior to installation. This caused errors when the dependencies were not yet present.
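
A common way to avoid this class of problem is to read the version string from a file as text instead of importing the package. The sketch below shows that approach under stated assumptions: the file name `version.py`, the `__version__` variable and the `read_version` helper are hypothetical and are not taken from the project's actual setup.py.

```python
import re
import tempfile
from pathlib import Path

# Matches a line such as: __version__ = "2.0.2"
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(version_file: Path) -> str:
    """Extract the version string without importing the package."""
    match = VERSION_PATTERN.search(version_file.read_text(encoding="utf-8"))
    if match is None:
        raise RuntimeError(f"no __version__ found in {version_file}")
    return match.group(1)


# Demonstration with a temporary file standing in for the package's version module.
with tempfile.TemporaryDirectory() as tmp:
    version_file = Path(tmp) / "version.py"
    version_file.write_text('__version__ = "2.0.2"\n', encoding="utf-8")
    version = read_version(version_file)
```

Since nothing from the package is imported, the setup script does not need the package's dependencies to be installed at the time it runs.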
v2.0.1(Jun 21, 2019)
Add offline support [#177], [#179] and [#180]

