
Overview

pandas-log


The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common methods, such as .query, .apply, .merge, .groupby and more.

Why pandas-log?

Pandas-log is a Python implementation of the R package tidylog, and provides feedback about basic pandas operations.

Pandas has been invaluable for the data science ecosystem, and a typical pandas workflow consists of a series of steps that transform raw data into an understandable/usable format. These steps need to run in a certain sequence, and when the result is unexpected it is hard to understand what happened. Pandas-log logs metadata on each operation, which allows you to pinpoint the issue.
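To illustrate the idea, here is a minimal sketch (not pandas-log's actual implementation) of what logging metadata around a pandas operation looks like; `log_step` is a hypothetical helper:

```python
import functools

import pandas as pd

def log_step(fn):
    """Hypothetical helper: wrap a DataFrame method and report how the
    row count changed, roughly what pandas-log does per operation."""
    @functools.wraps(fn)
    def wrapped(df, *args, **kwargs):
        out = fn(df, *args, **kwargs)
        print(f"{fn.__name__}: {len(df)} -> {len(out)} rows")
        return out
    return wrapped

logged_query = log_step(pd.DataFrame.query)
df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"]})
res = logged_query(df, "name != 'Batman'")
# prints "query: 3 -> 2 rows"
```

pandas-log patches the pandas methods themselves so you keep writing ordinary `df.query(...)` calls; the sketch above only shows the wrapping principle.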

Let's look at an example. First we import pandas-log after pandas and create a DataFrame:

import numpy as np
import pandas as pd
import pandas_log

with pandas_log.enable():
    df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                       "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                       "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

pandas-log will give you feedback, for instance when filtering a data frame or adding a new variable:

(df.assign(toy=lambda x: x.toy.str.lower())  # .str.lower() handles the NaN toy
   .query("name != 'Batman'"))

pandas-log can be especially helpful in longer pipes:

(df.assign(toy=lambda x: x.toy.str.lower())
   .query("name != 'Batman'")
   .dropna()
   .assign(lower_name=lambda x: x.name.str.lower())
   .reset_index())

For the Medium article, go here

For a full walkthrough, go here

Installation

pandas-log is currently installable from PyPI:

pip install pandas-log

Contributing

Follow contribution docs for a full description of the process of contributing to pandas-log.

Comments
  • [ENH] General enhancements and bugfixes

    [ENH] General enhancements and bugfixes

    PR Description

    • Numerous small bug fixes
    • Generalized logging to pd.Series in addition to pd.DataFrame
    • Added an optional extra data quality check method to DataFrames
    • Added logging for additional methods
    • Improved log messages for some methods
    • Made the memory calculation component optional based on a flag to enable/auto_enable, as it slows things down dramatically for large dataframes
    • Removed some broken code
    • Fixed some formatting issues

    **This PR resolves issue #25**

    PR Checklist

    Please ensure that you have done the following:

    1. [x] PR in from a fork off your branch. Do not PR from <your_username>:master, but rather from <your_username>:<branch_name>.
    1. [x] If you're not on the contributors list, add yourself to AUTHORS.rst.

    Quick Check

    To do a very quick check that everything is correct, follow these steps below:

    • [x] Run the command make format from pandas-log's top-level directory. This will automatically run:
      • black formatting
      • fix imports with isort
    opened by cmdavis4 8
  • No module humanize or pandas-flavor

    No module humanize or pandas-flavor

    Brief Description

    Are the required imports for this package up-to-date? I ran pip install pandas-log, then got module import errors when importing pandas-log in my notebook:

    --> 114             import pandas_log
        115 
        116             with pandas_logs.enable():
    
    ~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/__init__.py in <module>
          2 
          3 """Top-level package for pandas-log."""
    ----> 4 from .pandas_log import *
          5 
          6 __author__ = """Eyal Trabelsi"""
    
    ~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/pandas_log.py in <module>
         15     restore_pandas_func_copy,
         16 )
    ---> 17 from pandas_log.pandas_execution_stats import StepStats, get_execution_stats
         18 
         19 __all__ = ["auto_enable", "auto_disable", "enable"]
    
    ~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in <module>
         22 with warnings.catch_warnings():
         23     warnings.simplefilter("ignore")
    ---> 24     import humanize
         25 
         26 
    
    ModuleNotFoundError: No module named 'humanize'
    
    opened by JoshZastrow 3
  • fix link to open new issue

    fix link to open new issue

    Brief Description of Fix

    Currently, the link to submit an issue refers to ericmjl, probably left over from a cookiecutter template.

    I would like to propose a change, such that now the docs...

    Relevant Context


    readthedocs contributing page

    documentation good first issue 
    opened by DeanLa 2
  • A way to enable globally?

    A way to enable globally?

    Brief Description

    I'm looking for a way to enable pandas-log globally without the use of context manager. Is it possible right now? If not, how do you think about this feature? Thanks.
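    The module's public API, visible in a traceback quoted elsewhere on this page (`__all__ = ["auto_enable", "auto_disable", "enable"]`), suggests `auto_enable`/`auto_disable` exist for exactly this. A hedged sketch, guarded so it degrades to plain pandas when pandas-log isn't installed:

```python
import pandas as pd

try:
    import pandas_log
    pandas_log.auto_enable()      # global logging, no context manager
except ImportError:               # pandas-log not installed: plain pandas
    pandas_log = None

df = pd.DataFrame({"x": [1, 2, 3]})
out = df.query("x > 1")           # logged while auto_enable is active

if pandas_log is not None:
    pandas_log.auto_disable()     # restore the original pandas methods
```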

    opened by liebkne 1
  • [ENH]

    [ENH]

    PR Description

    Please describe the changes proposed in the pull request:

    • Numerous small bug fixes
    • Generalized logging to pd.Series in addition to pd.DataFrame
    • Added an optional extra data quality check method to DataFrames
    • Added logging for additional methods
    • Improved log messages for some methods
    • Made the memory calculation component optional based on a flag to enable/auto_enable, as it slows things down dramatically for large dataframes
    • Removed some broken code
    • Fixed some formatting issues

    **This PR resolves issue #25**

    PR Checklist

    Please ensure that you have done the following:

    1. [ ] PR in from a fork off your branch. Do not PR from <your_username>:master, but rather from <your_username>:<branch_name>.
    1. [ ] If you're not on the contributors list, add yourself to AUTHORS.rst.

    Quick Check

    To do a very quick check that everything is correct, follow these steps below:

    • [ ] Run the command make format from pandas-log's top-level directory. This will automatically run:
      • black formatting
      • fix imports with isort
    opened by cmdavis4 1
  • pd.merge nonetype object has no attribute 'memory_usage'

    pd.merge nonetype object has no attribute 'memory_usage'

    Brief Description

    System Information

    • Operating system: Windows
    • OS details (optional):
    • Python version (required): Python 3.6

    installed via pip

    Minimally Reproducible Code

    import pandas as pd
    import pandas_log

    df_a = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
    df_b = pd.DataFrame({'c': [11, 12, 13], 'b': ['a', 'b', 'c']})

    with pandas_log.enable():
        df = pd.merge(df_a, df_b, on='b')

    Error Messages

    --------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-4-c5cf824082a8> in <module>
          1 with pandas_log.enable():
          2     df = (
    ----> 3         pd.merge(df_a,df_b,on='b')
          4     )
    
    ~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
         79         copy=copy,
         80         indicator=indicator,
    ---> 81         validate=validate,
         82     )
         83     return op.get_result()
    
    ~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
        624             self.right_join_keys,
        625             self.join_names,
    --> 626         ) = self._get_merge_keys()
        627 
        628         # validate the merge keys dtypes. We may need to coerce
    
    ~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
       1031 
       1032         if right_drop:
    -> 1033             self.right = self.right._drop_labels_or_levels(right_drop)
       1034 
       1035         return left_keys, right_keys, join_names
    
    ~\.conda\envs\empl\lib\site-packages\pandas\core\generic.py in _drop_labels_or_levels(self, keys, axis)
       1860             # Handle dropping columns labels
       1861             if labels_to_drop:
    -> 1862                 dropped.drop(labels_to_drop, axis=1, inplace=True)
       1863         else:
       1864             # Handle dropping column levels
    
    ~\.conda\envs\empl\lib\site-packages\pandas_flavor\register.py in __call__(self, *args, **kwargs)
         27             @wraps(method)
         28             def __call__(self, *args, **kwargs):
    ---> 29                 return method(self._obj, *args, **kwargs)
         30 
         31         register_dataframe_accessor(method.__name__)(AccessorMethod)
    
    ~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_log.py in wrapped(*args, **fn_kwargs)
        134                 full_signature,
        135                 silent,
    --> 136                 verbose,
        137             )
        138             return output_df
    
    ~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_log.py in _run_method_and_calc_stats(fn, fn_args, fn_kwargs, input_df, full_signature, silent, verbose)
        104 
        105         output_df, execution_stats = get_execution_stats(
    --> 106             fn, input_df, fn_args, fn_kwargs
        107         )
        108 
    
    ~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_execution_stats.py in get_execution_stats(fn, input_df, fn_args, fn_kwargs)
         35 
         36     input_memory_size = StepStats.calc_df_series_memory(input_df)
    ---> 37     output_memory_size = StepStats.calc_df_series_memory(output_df)
         38 
         39     ExecutionStats = namedtuple(
    
    ~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_execution_stats.py in calc_df_series_memory(df_or_series)
         78     @staticmethod
         79     def calc_df_series_memory(df_or_series):
    ---> 80         memory_size = df_or_series.memory_usage(index=True, deep=True)
         81         return (
         82             humanize.naturalsize(memory_size.sum())
    
    AttributeError: 'NoneType' object has no attribute 'memory_usage'
    
    opened by SuryaMudimi 1
  • Adding tips section

    Adding tips section

    PR Description

    Please describe the changes proposed in the pull request:

    • Tips on whether an execution step can be optimized
    • Tips on whether an execution step is required or not

    This PR resolves #(put issue number here, and remove parentheses).

    PR Checklist

    Please ensure that you have done the following:

    1. [x] PR in from a fork off your branch. Do not PR from <your_username>:master, but rather from <your_username>:<branch_name>.
    1. [x] If you're not on the contributors list, add yourself to AUTHORS.rst.

    Quick Check

    To do a very quick check that everything is correct, follow these steps below:

    • [x] Run the command make format from pandas-log's top-level directory. This will automatically run:
      • black formatting
      • fix imports with isort
    opened by eyaltrabelsi 1
  • Add memory usage

    Add memory usage

    PR Description

    Please describe the changes proposed in the pull request:

    • add Memory usage to execution stats
    • update usage
    • clean PR template

    This PR resolves #(put issue number here, and remove parentheses).

    PR Checklist

    Please ensure that you have done the following:

    1. [ ] PR in from a fork off your branch. Do not PR from <your_username>:master, but rather from <your_username>:<branch_name>.
    1. [ ] If you're not on the contributors list, add yourself to AUTHORS.rst.

    Quick Check

    To do a very quick check that everything is correct, follow these steps below:

    • [ ] Run the command make check from pandas-log's top-level directory. This will automatically run:
      • black formatting
      • fix imports with isort
      • running the test suite
      • docs build

    Relevant Reviewers

    Please tag maintainers to review.

    • @eyaltrabelsi
    opened by eyaltrabelsi 1
  • Update pip to 19.3

    Update pip to 19.3

    This PR updates pip from 19.2.3 to 19.3.

    Changelog

    19.3

    =================
    
    Deprecations and Removals
    -------------------------
    
    - Remove undocumented support for un-prefixed URL requirements pointing
    to SVN repositories. Users relying on this can get the original behavior
    by prefixing their URL with ``svn+`` (which is backwards-compatible). (`7037 <https://github.com/pypa/pip/issues/7037>`_)
    - Remove the deprecated ``--venv`` option from ``pip config``. (`7163 <https://github.com/pypa/pip/issues/7163>`_)
    
    Features
    --------
    
    - Print a better error message when ``--no-binary`` or ``--only-binary`` is given
    an argument starting with ``-``. (`3191 <https://github.com/pypa/pip/issues/3191>`_)
    - Make ``pip show`` warn about packages not found. (`6858 <https://github.com/pypa/pip/issues/6858>`_)
    - Support including a port number in ``--trusted-host`` for both HTTP and HTTPS. (`6886 <https://github.com/pypa/pip/issues/6886>`_)
    - Redact single-part login credentials from URLs in log messages. (`6891 <https://github.com/pypa/pip/issues/6891>`_)
    - Implement manylinux2014 platform tag support.  manylinux2014 is the successor
    to manylinux2010.  It allows carefully compiled binary wheels to be installed
    on compatible Linux platforms.  The manylinux2014 platform tag definition can
    be found in `PEP599 <https://www.python.org/dev/peps/pep-0599/>`_. (`7102 <https://github.com/pypa/pip/issues/7102>`_)
    
    Bug Fixes
    ---------
    
    - Abort installation if any archive contains a file which would be placed
    outside the extraction location. (`3907 <https://github.com/pypa/pip/issues/3907>`_)
    - pip's CLI completion code no longer prints a Traceback if it is interrupted. (`3942 <https://github.com/pypa/pip/issues/3942>`_)
    - Correct inconsistency related to the ``hg+file`` scheme. (`4358 <https://github.com/pypa/pip/issues/4358>`_)
    - Fix ``rmtree_errorhandler`` to skip non-existing directories. (`4910 <https://github.com/pypa/pip/issues/4910>`_)
    - Ignore errors copying socket files for local source installs (in Python 3). (`5306 <https://github.com/pypa/pip/issues/5306>`_)
    - Fix requirement line parser to correctly handle PEP 440 requirements with a URL
    pointing to an archive file. (`6202 <https://github.com/pypa/pip/issues/6202>`_)
    - The ``pip-wheel-metadata`` directory does not need to persist between invocations of pip, use a temporary directory instead of the current ``setup.py`` directory. (`6213 <https://github.com/pypa/pip/issues/6213>`_)
    - Fix ``--trusted-host`` processing under HTTPS to trust any port number used
    with the host. (`6705 <https://github.com/pypa/pip/issues/6705>`_)
    - Switch to new ``distlib`` wheel script template. This should be functionally
    equivalent for end users. (`6763 <https://github.com/pypa/pip/issues/6763>`_)
    - Skip copying .tox and .nox directories to temporary build directories (`6770 <https://github.com/pypa/pip/issues/6770>`_)
    - Fix handling of tokens (single part credentials) in URLs. (`6795 <https://github.com/pypa/pip/issues/6795>`_)
    - Fix a regression that caused ``~`` expansion not to occur in ``--find-links``
    paths. (`6804 <https://github.com/pypa/pip/issues/6804>`_)
    - Fix bypassed pip upgrade warning on Windows. (`6841 <https://github.com/pypa/pip/issues/6841>`_)
    - Fix 'm' flag erroneously being appended to ABI tag in Python 3.8 on platforms that do not provide SOABI (`6885 <https://github.com/pypa/pip/issues/6885>`_)
    - Hide security-sensitive strings like passwords in log messages related to
    version control system (aka VCS) command invocations. (`6890 <https://github.com/pypa/pip/issues/6890>`_)
    - Correctly uninstall symlinks that were installed in a virtualenv,
    by tools such as ``flit install --symlink``. (`6892 <https://github.com/pypa/pip/issues/6892>`_)
    - Don't fail installation using pip.exe on Windows when pip wouldn't be upgraded. (`6924 <https://github.com/pypa/pip/issues/6924>`_)
    - Use canonical distribution names when computing ``Required-By`` in ``pip show``. (`6947 <https://github.com/pypa/pip/issues/6947>`_)
    - Don't use hardlinks for locking selfcheck state file. (`6954 <https://github.com/pypa/pip/issues/6954>`_)
    - Ignore "require_virtualenv" in ``pip config`` (`6991 <https://github.com/pypa/pip/issues/6991>`_)
    - Fix ``pip freeze`` not showing correct entry for mercurial packages that use subdirectories. (`7071 <https://github.com/pypa/pip/issues/7071>`_)
    - Fix a crash when ``sys.stdin`` is set to ``None``, such as on AWS Lambda. (`7118 <https://github.com/pypa/pip/issues/7118>`_, `7119 <https://github.com/pypa/pip/issues/7119>`_)
    
    Vendored Libraries
    ------------------
    
    - Upgrade certifi to 2019.9.11
    - Add contextlib2 0.6.0 as a vendored dependency.
    - Remove Lockfile as a vendored dependency.
    - Upgrade msgpack to 0.6.2
    - Upgrade packaging to 19.2
    - Upgrade pep517 to 0.7.0
    - Upgrade pyparsing to 2.4.2
    - Upgrade pytoml to 0.1.21
    - Upgrade setuptools to 41.4.0
    - Upgrade urllib3 to 1.25.6
    
    Improved Documentation
    ----------------------
    
    - Document caveats for UNC paths in uninstall and add .pth unit tests. (`6516 <https://github.com/pypa/pip/issues/6516>`_)
    - Add architectural overview documentation. (`6637 <https://github.com/pypa/pip/issues/6637>`_)
    - Document that ``--ignore-installed`` is dangerous. (`6794 <https://github.com/pypa/pip/issues/6794>`_)
    
    Links
    • PyPI: https://pypi.org/project/pip
    • Changelog: https://pyup.io/changelogs/pip/
    • Homepage: https://pip.pypa.io/
    opened by pyup-bot 1
  • All logs should be suppressed after disable being called

    All logs should be suppressed after disable being called

    Brief Description

    Currently, after the disable method is called, some methods still produce logs although they shouldn't, because the reference to the pandas method is different from the instance method.

    Minimally Reproducible Code

    with enable():
        df = pd.read_csv("../examples/pokemon.csv")
        (
            df.query("legendary==0")
            .query("type_1=='fire' or type_2=='fire'")
        )
    df.query("legendary==1")
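    A plausible minimal reproduction of the underlying mechanism (a sketch, not pandas-log's actual code): a bound method captured while the class was patched keeps pointing at the patched function even after the original is restored.

```python
import pandas as pd

calls = []
original = pd.DataFrame.query

def patched(self, *args, **kwargs):
    calls.append("logged")                # stand-in for the log output
    return original(self, *args, **kwargs)

pd.DataFrame.query = patched
df = pd.DataFrame({"x": [1, 2, 3]})
bound = df.query                # bound method created while patched
pd.DataFrame.query = original   # "disable": restore the original

bound("x > 1")                  # still logs: the old reference survives
df.query("x > 1")               # fresh lookup: no longer logs
```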
    
    bug 
    opened by eyaltrabelsi 1
  • Accessible Medium post

    Accessible Medium post

    Description

    Writing a Medium post in addition to the docs may allow a more comprehensive understanding of both the need and the usage.

    Relevant Context

    documentation 
    opened by eyaltrabelsi 1
  • TypeError: data type not understood

    TypeError: data type not understood

    Brief Description

    I'm trying to run pandas-log on my chain and it fails with the error:

    TypeError: data type not understood
    

    System Information

    • Python version (required): Python 3.8.5
    • Pandas version: 1.3.2

    Minimally Reproducible Code

    import pandas as pd
    autos = pd.read_csv('https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip')
    def to_tz(df_, time_col, tz_offset, tz_name):
        return (df_
                 .groupby(tz_offset)
                 [time_col]
                 .transform(lambda s: pd.to_datetime(s)
                     .dt.tz_localize(s.name, ambiguous=True)
                     .dt.tz_convert(tz_name))
                )
    
    
    def tweak_autos(autos):
        cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', 
            'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']
        return (autos
         [cols]
         .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),
                 displ=autos.displ.fillna(0).astype('float16'),
                 drive=autos.drive.fillna('Other').astype('category'),
                 automatic=autos.trany.str.contains('Auto'),
                 speeds=autos.trany.str.extract(r'(\d)+').fillna('20').astype('int8'),
                 tz=autos.createdOn.str.extract(r'\d\d:\d\d ([A-Z]{3}?)').replace('EDT', 'EST5EDT'),
                 str_date=(autos.createdOn.str.slice(4,19) + ' ' + autos.createdOn.str.slice(-4)),
                 createdOn=lambda df_: to_tz(df_, 'str_date', 'tz', 'US/Eastern'),
                 ffs=autos.eng_dscr.str.contains('FFS')
                )
         .pipe(show, rows=2, title='New Cols')            
         .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16',
                  'range': 'int16',  'year': 'int16', 'make': 'category'})
         .drop(columns=['trany', 'eng_dscr'])
        )
    import pandas_log
    with pandas_log.enable():
        tweak_autos(autos)
    

    Error Messages

    1) fillna(value: 'object | ArrayLike | None' ="20", method: 'FillnaOptions | None' = None, axis: 'Axis | None' = None, inplace: 'bool' = False, limit=None, downcast=None):
    	Metadata:
    	* Filled 837 with 20.
    	Execution Stats:
    	* Execution time: Step Took 0.001512 seconds.
    
    1) replace(to_replace="EDT", value="EST5EDT", inplace: 'bool' = False, limit=None, regex: 'bool' = False, method: 'str' = 'pad'):
    	Execution Stats:
    	* Execution time: Step Took 0.001215 seconds.
    
    1) groupby(by="tz", axis: 'Axis' = 0, level: 'Level | None' = None, as_index: 'bool' = True, sort: 'bool' = True, group_keys: 'bool' = True, squeeze: 'bool | lib.NoDefault' = <no_default>, observed: 'bool' = False, dropna: 'bool' = True):
    	Metadata:
    	* Grouping by tz resulted in 2 groups like 
    		EST,
    		EST5EDT,
    	  and more.
    	Execution Stats:
    	* Execution time: Step Took 0.006409 seconds.
    /home/matt/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py:249: UserWarning: Some pandas logging may involve copying dataframes, which can be time-/memory-intensive. Consider passing copy_ok=False to the enable/auto_enable functions in pandas_log if issues arise.
      warnings.warn(COPY_WARNING_MSG)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-1-f6bfc55c635b> in <module>
         33 import pandas_log
         34 with pandas_log.enable():
    ---> 35     tweak_autos(autos)
    
    <ipython-input-1-f6bfc55c635b> in tweak_autos(autos)
         14     cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', 
         15         'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']
    ---> 16     return (autos
         17      [cols]
         18      .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_flavor/register.py in __call__(self, *args, **kwargs)
         27             @wraps(method)
         28             def __call__(self, *args, **kwargs):
    ---> 29                 return method(self._obj, *args, **kwargs)
         30 
         31         register_dataframe_accessor(method.__name__)(AccessorMethod)
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_log.py in wrapped(*args, **fn_kwargs)
        184 
        185             input_df, fn_args = args[0], args[1:]
    --> 186             output_df = _run_method_and_calc_stats(
        187                 fn,
        188                 fn_args,
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_log.py in _run_method_and_calc_stats(fn, fn_args, fn_kwargs, input_df, full_signature, silent, verbose, copy_ok, calculate_memory)
        168             output_df,
        169         )
    --> 170         step_stats.log_stats_if_needed(silent, verbose, copy_ok)
        171         if isinstance(output_df, pd.DataFrame) or isinstance(output_df, pd.Series):
        172             step_stats.persist_execution_stats()
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in log_stats_if_needed(self, silent, verbose, copy_ok)
        106 
        107         if verbose or self.fn.__name__ not in DATAFRAME_ADDITIONAL_METHODS_TO_OVERIDE:
    --> 108             s = self.__repr__(verbose, copy_ok)
        109             if s:
        110                 # If this method isn't patched and verbose is False, __repr__ will give an empty string, which
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in __repr__(self, verbose, copy_ok)
        147 
        148         # Step Metadata stats
    --> 149         logs, tips = self.get_logs_for_specifc_method(verbose, copy_ok)
        150         metadata_stats = f"\033[4mMetadata\033[0m:\n{logs}" if logs else ""
        151         metadata_tips = f"\033[4mTips\033[0m:\n{tips}" if tips else ""
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in get_logs_for_specifc_method(self, verbose, copy_ok)
        128 
        129         log_method = partial(log_method, self.output_df, self.input_df)
    --> 130         logs, tips = log_method(*self.fn_args, **self.fn_kwargs)
        131         return logs, tips
        132 
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py in log_assign(output_df, input_df, **kwargs)
        250             # If copying is ok, we can check how many values actually changed
        251             for col in changed_cols:
    --> 252                 values_changed, values_unchanged = num_values_changed(
        253                     input_df[col], output_df[col]
        254                 )
    
    ~/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py in num_values_changed(input_obj, output_obj)
        127         isinstance(input_obj, pd.Series)
        128         and isinstance(output_obj, pd.Series)
    --> 129         and input_obj.dtype != output_obj.dtype
        130     ):
        131         # Comparing values for equality across dtypes wouldn't be well-defined so we just say they all changed
    
    TypeError: Cannot interpret 'datetime64[ns, US/Eastern]' as a data type
    
    opened by mattharrison 2
  • Sourcery Starbot ⭐ refactored eyaltrabelsi/pandas-log

    Sourcery Starbot ⭐ refactored eyaltrabelsi/pandas-log

    Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

    Here's your pull request refactoring your most popular Python repo.

    If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.

    Review changes via command line

    To manually merge these changes, make sure you're on the master branch, then run:

    git fetch https://github.com/sourcery-ai-bot/pandas-log master
    git merge --ff-only FETCH_HEAD
    git reset HEAD^
    
    opened by sourcery-ai-bot 0
  • Integrate with Python logging module

    Integrate with Python logging module

    Integrate with Python logging

    I would love a way to integrate this with the standard Python logging module, or a drop-in replacement thereof, such as loguru.

    Such an integration would make it more useful when running production data-science code, and would ease adoption of this library, which I think is a really interesting idea!
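    Until such an integration exists, one workaround (a sketch, assuming pandas-log writes to stdout as the tracebacks on this page suggest) is to capture stdout and re-emit it through `logging`:

```python
import contextlib
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pandas_log")

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # a pandas-log-enabled pipeline would run here; this print stands in
    # for the output it would produce
    print("1) query: removed 1 row")

for line in buf.getvalue().splitlines():
    log.info(line)              # forward captured lines to logging
```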

    opened by AllanLRH 1
  • Add tips/watch out section

    Add tips/watch out section

    Brief Description

    I would like to propose a section with various tips, like:

    • warn if using iterrows
    • use resample for group by on timestamp

    Can use dovpanda once it gets stabilized.

    enhancement 
    opened by eyaltrabelsi 0
Owner
Eyal Trabelsi
Enthusiastic Software Engineer 👷 with big passion for Python, ML and Performance Optimisations🐍🤖🦸🏼.