The easy way to write your own flavor of Pandas

Overview

Pandas Flavor

Pandas 0.23 added a (simple) API for registering accessors with Pandas objects.

Pandas-flavor extends Pandas' extension API by:

  1. adding support for registering methods as well.
  2. making each of these functions backwards compatible with older versions of Pandas.

What does this mean?

It is now simpler to add custom functionality to Pandas DataFrames and Series.

Import this package. Write a simple Python function. Register the function using one of the decorators listed under Available Methods.

Why?

Pandas is super handy. Its general purpose is to be a "flexible and powerful data analysis/manipulation library".

Pandas Flavor allows you to add functionality that tailors Pandas to specific fields or use cases.

Maybe you want to add new write methods to the Pandas DataFrame? Maybe you want custom plot functionality? Maybe something else?

Register accessors

Accessors (in pandas) are objects attached to an attribute on the Pandas DataFrame/Series that provide extra, specific functionality. For example, pandas.DataFrame.plot is an accessor that provides plotting functionality.

Add an accessor by decorating a class with the following decorator and passing the decorator an accessor name.

# my_flavor.py

import pandas as pd
import pandas_flavor as pf

@pf.register_dataframe_accessor('my_flavor')
class MyFlavor(object):

    def __init__(self, data):
        # Keep a reference to the DataFrame this accessor is attached to.
        self._data = data

    def row_by_value(self, col, value):
        """Slice out row from DataFrame by a value."""
        return self._data[self._data[col] == value].squeeze()

Every dataframe now has this accessor as an attribute.

import pandas as pd
import my_flavor

# DataFrame.
df = pd.DataFrame(data={
  "x": [10, 20, 25],
  "y": [0, 2, 5]
})

# Print DataFrame
print(df)

#     x  y
# 0  10  0
# 1  20  2
# 2  25  5

# Access this functionality
df.my_flavor.row_by_value('x', 10)

# x    10
# y     0
# Name: 0, dtype: int64

To see this in action, check out pdvega, PhyloPandas, and pyjanitor!

Register methods

Using this package, you can attach functions directly to Pandas objects. No intermediate accessor is needed.

# my_flavor.py

import pandas as pd
import pandas_flavor as pf

@pf.register_dataframe_method
def row_by_value(df, col, value):
    """Slice out row from DataFrame by a value."""
    return df[df[col] == value].squeeze()

Importing the module registers the method on every DataFrame.

import pandas as pd
import my_flavor

# DataFrame.
df = pd.DataFrame(data={
  "x": [10, 20, 25],
  "y": [0, 2, 5]
})

# Print DataFrame
print(df)

#     x  y
# 0  10  0
# 1  20  2
# 2  25  5

# Access this functionality
df.row_by_value('x', 10)

# x    10
# y     0
# Name: 0, dtype: int64

Available Methods

  • register_dataframe_method: register a method directly with a pandas DataFrame.
  • register_dataframe_accessor: register an accessor (and its methods) with a pandas DataFrame.
  • register_series_method: register a method directly with a pandas Series.
  • register_series_accessor: register an accessor (and its methods) with a pandas Series.

Installation

You can install using pip:

pip install pandas_flavor

or conda (thanks @ericmjl)!

conda install -c conda-forge pandas-flavor

Contributing

Pull requests are always welcome! If you find a bug, don't hesitate to open an issue or submit a PR. If you're not sure how to do that, check out this simple guide.

If you have a feature request, please open an issue or submit a PR!

TL;DR

Pandas 0.23 introduced a simpler API for extending Pandas. This API provided two key decorators, register_dataframe_accessor and register_series_accessor, that enable users to register accessors with Pandas DataFrames and Series.

Pandas Flavor originated as a library to backport these decorators to older versions of Pandas (<0.23). While doing the backporting, it became clear that registering methods directly to Pandas objects might be a desired feature as well.*

*It is likely that Pandas deliberately chose not to implement this feature. If everyone starts monkeypatching DataFrames with their custom methods, it could lead to confusion in the Pandas community. The preferred Pandas approach is to namespace your methods by registering an accessor that contains your custom methods.

So how does method registration work?

When you register a method, Pandas Flavor actually creates and registers a custom accessor class (this is subtle, but important) that mimics the behavior of a method by:

  1. inheriting the docstring of your function
  2. overriding the __call__ method to call your function.
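
The two steps above can be sketched using only pandas' own accessor API. This is an illustration of the idea rather than the library's actual source: register_dataframe_method_sketch and AccessorMethod are hypothetical names, and only pd.api.extensions.register_dataframe_accessor is a real pandas call.

```python
import pandas as pd


def register_dataframe_method_sketch(method):
    """Hypothetical stand-in for pf.register_dataframe_method."""

    class AccessorMethod:
        # 1. Inherit the docstring of the wrapped function.
        __doc__ = method.__doc__

        def __init__(self, pandas_obj):
            # pandas passes in the DataFrame the attribute was accessed on.
            self._obj = pandas_obj

        # 2. Calling the accessor forwards to the wrapped function.
        def __call__(self, *args, **kwargs):
            return method(self._obj, *args, **kwargs)

    # Register the callable accessor under the function's own name.
    pd.api.extensions.register_dataframe_accessor(method.__name__)(AccessorMethod)
    return method


@register_dataframe_method_sketch
def row_by_value(df, col, value):
    """Slice out row from DataFrame by a value."""
    return df[df[col] == value].squeeze()


df = pd.DataFrame({"x": [10, 20, 25], "y": [0, 2, 5]})
row = df.row_by_value("x", 10)
```

Because the accessor instance is callable, df.row_by_value(...) reads like an ordinary method call even though pandas only ever sees an accessor.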
Comments
  • different scikit-bio dependencies

    Hi, I ran into the issue:

    phylotoast 1.4.0rc2 has requirement scikit-bio<=0.4.2, but you'll have scikit-bio 0.5.1 which is incompatible. gneiss 0.4.2 has requirement scikit-bio==0.5.1, but you'll have scikit-bio 0.4.2 which is incompatible.

    Either way I can't install both versions, because pip overwrites one with the other. Even a separate environment would not solve that, I guess. How can you deal with that?

    Cheers, Robert

    opened by AmbrosiaFungus 5
  • How to use

    Hey @Zsailer, great to meet you at SciPy 2018!

    I think pandas_flavor is what I'd like to switch over to in pyjanitor, where I simply register functions as a pandas accessor rather than subclass the entire dataframe outright.

    There is something a bit magical about how pandas_flavor works though. With subclassing, everything is quite transparent - I subclass pandas DataFrames, then have the users wrap their existing dataframe inside a Janitor dataframe, following which, all of the data cleaning methods are available:

    import pandas as pd
    import janitor as jn
    
    df = pd.DataFrame(...)
    df = jn.DataFrame(df).clean_names()...
    

    Say I decorated the Janitor functions as pandas accessors. What would things look like for an end user? Would it be like the following?

    import pandas as pd
    
    df = pd.DataFrame(...).clean_names().remove_empty()...
    

    I guess I'm just wondering, where and when does a decorated function get exposed up to pandas?

    Thanks again for putting this out!

    opened by ericmjl 4
  • New release

    Hey @Zsailer, please let me know if you'd like help doing a release to PyPI -- I've got tests failing on the latest pyjanitor PR (here) but only because I'm relying on the latest code in PR #4!

    opened by ericmjl 2
  • support xarray

    Support xarray in the same way done in pyjanitor PR

    Example of a small run for validating Regression + New stuff:

    import pandas as pd
    import pandas_flavor as pf

    @pf.register_dataframe_method
    @pf.register_xarray_dataarray_method
    @pf.register_xarray_dataset_method
    def print_df(df):
        print(df)
        return df

    df = pd.DataFrame(data={"x": [10, 20, 25], "y": [0, 2, 5]})
    df.print_df()
    df.to_xarray().print_df()

    opened by eyaltra 1
  • Add xarray registrations

    Support xarray in the same way done in pyjanitor PR

    Example of a small run for validating Regression + New stuff:

    import pandas as pd
    import pandas_flavor as pf

    @pf.register_dataframe_method
    @pf.register_xarray_dataarray_method
    @pf.register_xarray_dataset_method
    def print_df(df):
        print(df)
        return df

    df = pd.DataFrame(data={"x": [10, 20, 25], "y": [0, 2, 5]})
    df.print_df()
    df.to_xarray().print_df()

    opened by eyaltrabelsi 1
  • Add "return method"

    @Zsailer let me know if you'd like more changes, happy to make them.

    I've also allowed edits from maintainers so you can make quickie changes without needing to wait for me.

    Closes #3.

    opened by ericmjl 1
  • Minimum pandas version; infra; lazy loading

    Major changes here:

    1. Remove code that supported older versions of pandas.
    2. Add dev container for easy development.
    3. Use lazy loading to improve import times in downstream packages that depend on pandas-flavor (e.g. pyjanitor).
    4. Add minimal test suite to check for breaking changes.
    5. Add code quality checks via code linters.
    6. Add docstrings to satisfy code linters.
    7. Add a GitHub action to automatically run tests on pull requests.
    opened by ericmjl 0
  • Documentation: Making a "flavor" python package

    Reflecting back on this #3, it's not clear from the README how to make a pandas "flavor" that's importable.

    The README shows how to register functions to DataFrames, but I think it needs documentation on how to write these functions in a module/package that you can import.

    Something like:

    1. Add "flavoring" functions in a module or package file my_flavor.py
      import pandas_flavor as pf
      
      @pf.register_series_method
      @pf.register_dataframe_method
      def cool_func(df, arg):
          print(arg)
          return df 
      
    2. Import Pandas and that "flavor" module/package in a Python session.
      import pandas as pd
      import my_flavor
      
      df = pd.DataFrame(...).cool_func()
      
    opened by Zsailer 0
  • lazy-loader breaks compatibility with PyInstaller

    The new lazy submodule loader breaks PyInstaller's dependency analysis. Normally, we'd manually patch this up on PyInstaller's side. But pandas_flavor has only two submodules (besides the one that holds the package's version): one imports only pandas and functools, the other imports the first and functools, and neither does any processing at all on initialisation. So what actually is the point of using lazy loading here? Aside from breaking PyInstaller's analysis (and just looking ugly), it screws with IDE completion and static analysis tools.

    opened by bwoodsend 0
  • EHN: Cut duplicate codes via the factory design pattern

    I noticed a lot of duplicated code, all with the same structure. Maybe we could factor it out using the factory design pattern?

    Duplicated code:

    • https://github.com/Zsailer/pandas_flavor/blob/f9308140d559ef1cb80587dedf6a7e32ca1f0b67/pandas_flavor/register.py#L19-L26
    • https://github.com/Zsailer/pandas_flavor/blob/f9308140d559ef1cb80587dedf6a7e32ca1f0b67/pandas_flavor/register.py#L38-L47
    • https://github.com/Zsailer/pandas_flavor/blob/f9308140d559ef1cb80587dedf6a7e32ca1f0b67/pandas_flavor/xarray.py#L13-L21

    Here is a prototype of the idea. It could work for pandas-like objects such as pandas.DataFrame, pandas.Series, pandas.Index, geopandas.GeoDataFrame, and geopandas.GeoSeries.

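    from functools import wraps

    from pandas.api.extensions import register_dataframe_accessor
    from xarray import register_dataarray_accessor


    def register_method_factory(register_accessor):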
        @wraps(register_accessor)
        def decorator(method):
            def method_accessor(pd_obj):
                @wraps(method)
                def wrapper(*args, **kwargs):
                    return method(pd_obj, *args, **kwargs)
    
                return wrapper
    
            # Register method as pandas object inner method.
            register_accessor(method.__name__)(method_accessor)
    
            # Must return method itself, otherwise would get None.
            return method
    
        return decorator
    
    
    # or register_dataframe_method = register_method_factory(register_dataframe_accessor)
    @register_method_factory
    def register_dataframe_method(method):
        """Docstring"""
    
        return register_dataframe_accessor(method)
    
    
    @register_method_factory
    def register_dataarray_method(method):
        """Docstring"""
    
        return register_dataarray_accessor(method)
    
    opened by Zeroto521 3
  • lazy-loader causes ERROR when installing version 0.3.0

    When I run pip install pandas-flavor==0.3.0, I get the following error

    ERROR: Could not find a version that satisfies the requirement lazy-loader==0.1rc2 (from pandas-flavor)

    opened by solita-vilmaja 0
  • How can I distinguish between inplace and copy operations?

    Problem description: It's difficult to determine how/when to use inplace or create a copy and the behavior is inconsistent. Does pandas_flavor always require a copy/modifying the original dataframe? Is there a way to avoid this in order to save memory?

    import numpy
    import pandas
    import pandas_flavor
    
    # Does not replace original dataframe
    @pandas_flavor.register_dataframe_method
    def drop_empty_rows(dataframe):
        return dataframe.dropna(axis=0, how='all')
     
    # Should replace original dataframe
    @pandas_flavor.register_dataframe_method
    def drop_empty_rows(dataframe):
        dataframe = dataframe.dropna(axis=0, how='all')
        return dataframe
    
    # Should replace original dataframe
    @pandas_flavor.register_dataframe_method
    def drop_empty_rows(dataframe):
        dataframe_processed = dataframe.copy()
        dataframe_processed = dataframe.dropna(axis=0, how='all')
        return dataframe_processed
    

    If I call the first function on a dataframe, it returns the dataframe with dropped rows but does not change the original dataframe.

    dict_rows = {}
    dict_rows['A']  = [20,numpy.nan,40,10,50]
    dict_rows['B'] = [50,numpy.nan,10,40,50]
    dict_rows['C'] = [30,numpy.nan,50,40,50]
    
    dataframe = pandas.DataFrame(dict_rows)
    

    This function returns the reduced dataframe, but doesn't affect the original dataframe.

    >>> dataframe.drop_empty_rows()
          A     B     C
    0  20.0  50.0  30.0
    2  40.0  10.0  50.0
    3  10.0  40.0  40.0
    4  50.0  50.0  50.0
    
    opened by DOH-Manada 0
  • Autocomplete in IDE

    First of all thank you for making pandas_flavor. It is very useful.

    One issue: Is it possible to make it so that registering the method also enables tab/autocomplete for that method in some common IDEs (spyder, pycharm, vs)?

    Relevant link: https://intellij-support.jetbrains.com/hc/en-us/community/posts/115000665110-auto-completion-for-dynamic-module-attributes-in-python

    Registering the method in the dir may also work?

    Again, thank you for pandas_flavor.

    opened by hmelberg 4
Releases(0.1.2)
Owner
Zachary Sailer
@jupyter core developer, @jupyter Distinguished Contributor