Overview

Pipetools

Complete documentation

pipetools enables function composition similar to using Unix pipes.

It allows forward-composition and piping of arbitrary functions - no need to decorate them or do anything extra.

It also packs a bunch of utils that make common operations more convenient and readable.

Source is on github.

Why?

Piping and function composition are some of the most natural operations there are for plenty of programming tasks. Yet Python doesn't have a built-in way of performing them. That forces you either into deeply nested function calls or into writing extra glue code.

Example

Say you want to create a list of python files in a given directory, ordered by filename length, as a string, each file on one line and also with line numbers:

>>> print(pyfiles_by_length('../pipetools'))
1. ds_builder.py
2. __init__.py
3. compat.py
4. utils.py
5. main.py

All the ingredients are already there, you just have to glue them together. You might write it like this:

def pyfiles_by_length(directory):
    all_files = os.listdir(directory)
    py_files = [f for f in all_files if f.endswith('.py')]
    sorted_files = sorted(py_files, key=len, reverse=True)
    numbered = enumerate(py_files, 1)
    rows = ("{0}. {1}".format(i, f) for i, f in numbered)
    return '\n'.join(rows)

Or perhaps like this:

def pyfiles_by_length(directory):
    return '\n'.join('{0}. {1}'.format(*x) for x in enumerate(reversed(sorted(
        [f for f in os.listdir(directory) if f.endswith('.py')], key=len)), 1))

Or, if you're a mad scientist, you would probably do it like this:

pyfiles_by_length = lambda d: (reduce('{0}\n{1}'.format,
    map(lambda x: '%d. %s' % x, enumerate(reversed(sorted(
        filter(lambda f: f.endswith('.py'), os.listdir(d)), key=len))))))

But there should be one -- and preferably only one -- obvious way to do it.

So which one is it? Well, to redeem the situation, pipetools gives you yet another possibility!

pyfiles_by_length = (pipe
    | os.listdir
    | where(X.endswith('.py'))
    | sort_by(len).descending
    | (enumerate, X, 1)
    | foreach("{0}. {1}")
    | '\n'.join)

Why would I do that, you ask? Compared to the native Python code, it's

  • Easier to read -- minimal extra clutter
  • Easier to understand -- one-way data flow from one step to the next, nothing else to keep track of
  • Easier to change -- want more processing? just add a step to the pipeline
  • Removes some bug opportunities -- did you spot the bug in the first example?

Of course it won't solve all your problems, but a great deal of code can be expressed as a pipeline, giving you the above benefits. Read on to see how it works!

Installation

$ pip install pipetools

Uh, what's that?

Usage

The pipe

The pipe object can be used to pipe functions together to form new functions, and it works like this:

from pipetools import pipe

f = pipe | a | b | c

# is the same as:
def f(x):
    return c(b(a(x)))

A real example, sum of odd numbers from 0 to x:

from functools import partial
from pipetools import pipe

odd_sum = pipe | range | partial(filter, lambda x: x % 2) | sum

odd_sum(10)  # -> 25

Note that the chain up to the sum is lazy.
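
A quick way to see that (a minimal sketch; in Python 3 both range and filter return lazy iterables, so nothing is evaluated until sum consumes the values):

from functools import partial
from pipetools import pipe

odd_only = pipe | range | partial(filter, lambda x: x % 2)

lazy = odd_only(10)  # just a filter object -- no work done yet
sum(lazy)            # -> 25, evaluated on consumption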

Automatic partial application in the pipe

As partial application is often useful when piping things together, it is done automatically when the pipe encounters a tuple, so this produces the same result as the previous example:

odd_sum = pipe | range | (filter, lambda x: x % 2) | sum

As of 0.1.9, this is even more powerful, see X-partial.
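
For instance, the X placeholder from the opening example marks where the piped-in value should go when it isn't the last argument (a sketch reusing utils shown elsewhere in this README):

from pipetools import pipe, X, foreach

# enumerate(input, 1) -- X marks the spot for the piped-in iterable:
numbered = pipe | (enumerate, X, 1) | foreach('{0}. {1}') | list

numbered(['a', 'b'])  # -> ['1. a', '2. b']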

Built-in tools

Pipetools contains a set of pipe-utils that solve some common tasks. For example, there is a shortcut for the filter class from our example, called where():

from pipetools import pipe, where

odd_sum = pipe | range | where(lambda x: x % 2) | sum

Well, that might be a bit more readable, but it's not really a huge improvement. But wait!

If a pipe-util is used as first or second item in the pipe (which happens quite often) the pipe at the beginning can be omitted:

odd_sum = range | where(lambda x: x % 2) | sum

See pipe-utils' documentation.

OK, but what about the ugly lambda?

where(), but also foreach(), sort_by() and other pipe-utils can be quite useful, but they require a function as an argument. That can be a named function -- which is OK if it does something complicated -- but often it's something simple, so a lambda is appropriate. Except Python's lambdas are quite verbose for simple tasks, and the code gets cluttered...

X object to the rescue!

from pipetools import where, X

odd_sum = range | where(X % 2) | sum

How 'bout that.
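
A couple more uses in the same spirit (a sketch; each X expression builds a function of one argument):

from pipetools import pipe, where, foreach, X

py_only = where(X.endswith('.py'))   # like: lambda name: name.endswith('.py')
doubled = foreach(X * 2)             # like: lambda n: n * 2

(pipe | doubled | list)([1, 2, 3])   # -> [2, 4, 6]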

Read more about the X object and its limitations.

Automatic string formatting

Since it doesn't make sense to compose functions with strings, when a pipe (or a pipe-util) encounters a string, it attempts to use it for (advanced) formatting:

>>> countdown = pipe | (range, 1) | reversed | foreach('{}...') | ' '.join | '{} boom'
>>> countdown(5)
'4... 3... 2... 1... boom'

Feeding the pipe

Sometimes it's useful to create a one-off pipe and immediately run some input through it. And since this is somewhat awkward (and not very readable, especially when the pipe spans multiple lines):

result = (pipe | foo | bar | boo)(some_input)

It can also be done using the > operator:

result = some_input > pipe | foo | bar | boo

Note

Note that the above method of input won't work if the input object defines __gt__ for any object - including the pipe. This can be the case for example with some objects from math libraries such as NumPy. If you experience strange results try falling back to the standard way of passing input into a pipe.
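
For example with NumPy (a sketch; ndarray defines __gt__ element-wise, so the > feeding never reaches the pipe):

import numpy as np
from pipetools import pipe, foreach, X

data = np.arange(3)

# data > pipe | ... would trigger ndarray's element-wise __gt__,
# so call the pipe explicitly instead:
result = (pipe | foreach(X + 1) | list)(data)  # -> [1, 2, 3]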

But wait, there is more

Check out the Maybe pipe, partial application on steroids or automatic data structure creation in the full documentation.

Comments
  • New Feature: Add the ability to pipe args and kwargs.

    PR 'Add the ability to pipe args and kwargs.' -> #23

    I added the functionality to pipe *args and **kwargs to a function, so you no longer need to pipe a tuple with the function as the first element and its parameter as the second. Now you can pass *args and **kwargs to a function using the pipe '|'. prepare_function_for_pipe now knows how to handle keyword-only arguments, and __or__ knows how to pass next_func as *args and **kwargs to self.func.

        # Automatic partial with *args
        range_args: tuple[int, int, int] = (1, 20, 2)
        # Using pipe
        my_range: Callable = pipe | range | range_args
        # Using tuple
        my_range: Callable = pipe | (range, range_args)
        # list(my_range()) == [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    
        # Automatic partial with **kwargs
        dataclass_kwargs: dict[str, bool] = {'frozen': True, 'kw_only': True, 'slots': True}
        # Using pipe
        my_dataclass: Callable = pipe | dataclass | dataclass_kwargs
        # Using tuple
        my_dataclass: Callable = pipe | (dataclass, dataclass_kwargs)
        @my_dataclass
        class Bla:
            foo: int
            bar: str
    
        # Bla(5, 'bbb') -> Raises TypeError: takes 1 positional argument but 3 were given
        # Bla(foo=5, bar='bbb').foo == 5
    
    opened by tallerasaf 6
  • added reflected version of magic methods

    This patch added support for:

    2 ** (64 - X)   # __rpow__, __rsub__, and all other __rFOOBAR__ methods
    X >> 8          # __rshift__ and __rrshift__ (There were already X > 8 and X < 8, so I guess X >> 8 does not indicate a pipe redirection either)
    1 << (64 - X)   # __lshift__ and __rlshift__
    X @ np.eye(10)  # matrix multiplication since Python 3.5
    X & 1
    X ^ 1
    +X              # I wonder if anyone would ever use this unary operator...
    

    This patch also fixed a tiny repr problem:

    >>> X // 42
    X / 42          # should have been X // 42, even in Python 2.7
    

    This patch did not add support for:

    index in X      # the opposite of X._in_(index); __contains__ did not work and always returned True?
    len(X)          # it would be confusing and inconsistent if built-in functions worked, but custom functions did not...
    

    Magic method references: Python 2, Python 3
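
    With the patch applied, these compose inside a pipe as you'd expect (a sketch; it mirrors the 2 ** (64 - X) example above):

    from pipetools import pipe, foreach, X

    # __rsub__ and __rpow__ let X sit on the right-hand side:
    weights = range(4) > pipe | foreach(2 ** (3 - X)) | list
    # -> [8, 4, 2, 1]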

    opened by Arnie97 5
  • Streams

    Hi everyone, just wanted to throw an idea out here - would pipetools be a good fit for stream-based/reactive programming? Specifically, what I'm thinking of is piping streams of data together, rather than piping functions, where a stream won't necessarily emit a datum every time it consumes one, in the way that a piped function always passes its result to the next function in the sequence.

    Would it be enough to override the "compose" method of the Pipe class to adapt to such a use-case?

    opened by RedKhan 4
  • fix UnicodeDecodeError during installation when trying to read README.rst

    issue

    I encountered a UnicodeDecodeError when trying to install pipetools on Windows 7 (pip 10, Python 3.6.5). I figured out that this is caused by a file encoding issue when trying to open README.rst, something like:

    UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 205: illegal multibyte sequence
    

    what this fix does

    Using io.open for Python 2/3 compatibility and setting its encoding parameter to utf-8 fixes the issue.
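
    A minimal sketch of the fix (assuming setup.py reads README.rst for its long description):

    import io

    # Explicit utf-8 instead of the platform default (gbk here);
    # io.open works the same on Python 2 and 3:
    with io.open('README.rst', encoding='utf-8') as f:
        long_description = f.read()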

    opened by mpwang 3
  • Using `x > pipe | foo | bar` gives different results than `(pipe | foo | bar)(x)`

    Not totally sure about anything here (this is my first time filing a bug report) but it seems like I've found a situation where these two seemingly-identical statements do different things. I've included a minimal example:

    import numpy as np
    from pipetools import pipe
    
    def scaleDiagonal(matrix):
        m = np.copy(matrix)
        size = matrix.shape[0]

        smallest = min(m[i, i] for i in range(size))

        for i in range(size):
            m[i, i] = m[i, i] / smallest

        return m
    
    x = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3)
    
    y = (pipe | scaleDiagonal)(x)         # Works
    z = x > pipe | scaleDiagonal          # Errors
    
    opened by pickledish 3
  • Add support for Python 3.9

    According to the DeprecationWarning alerts, we have two issues:

    pipetools/main.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
        from collections import Iterable
    

    and

    pipetools/utils.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
        from collections import Mapping
    

    Docs: https://docs.pytest.org/en/stable/warnings.html
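
    A sketch of the corresponding fix, with a fallback so older Pythons keep working:

    try:
        from collections.abc import Iterable, Mapping  # Python 3.3+
    except ImportError:
        from collections import Iterable, Mapping      # Python 2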

    opened by CrazyLionHeart 2
  • >> instead of > for input

    Is it possible to change > to >> for the input method? IMO >> is less used, especially in numerical / science-related code. I looked around the source code but didn't find where to make the change (I don't see a __gt__ method in the Pipe class).

    Thank you.
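
    For what it's worth, feeding presumably works through the reflected comparison: x > some_pipe falls back to the pipe's __lt__ once x's own __gt__ returns NotImplemented, which is why there's no __gt__ on Pipe to find. A generic sketch of the mechanism (not pipetools' actual code), including what >> support would look like:

    class Pipeline:
        """Toy stand-in for a pipe, just to show the operator hooks."""

        def __init__(self, func):
            self.func = func

        def __call__(self, x):
            return self.func(x)

        def __lt__(self, other):       # enables `x > pipeline`
            return self.func(other)

        def __rrshift__(self, other):  # would enable `x >> pipeline`
            return self.func(other)

    double = Pipeline(lambda x: x * 2)
    assert (5 > double) == 10
    assert (5 >> double) == 10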

    opened by goingtosleep 1
  • Does pipetools support Pandas Dataframes?

    I am trying to run this example

    additives_df.loc[:, ["pallets"]] > pipe | (lambda df: df.head())
    

    but I get the exception:

    TypeError: '>' not supported between instances of 'float' and 'function'
    

    Is there a proper way to pipe pandas DataFrames?
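
    Per the Note in the README above: DataFrame defines __gt__ element-wise, so the > feeding can't work here, but the explicit call form should (a sketch with stand-in data):

    import pandas as pd
    from pipetools import pipe

    additives_df = pd.DataFrame({"pallets": [1.0, 2.0, 3.0]})  # stand-in

    # Call the pipe directly instead of feeding it with `>`:
    head = pipe | (lambda df: df.head())
    result = head(additives_df.loc[:, ["pallets"]])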

    opened by OdinTech3 1
  • Provide support for Python 3

    Work not included in this PR, but that you'd probably want:

    • [x] Run publish_docs.py to update README.rst
    • [x] Bump the version number and update changelog.rst
    • [x] Upload to PyPI
    opened by Arnie97 0
  • Add the ability to pipe args and kwargs.

    Signed-off-by: Asaf Taller [email protected] https://github.com/0101/pipetools/issues/24

    I added the functionality to pipe *args and **kwargs to a function, so you no longer need to pipe a tuple with the function as the first element and its parameter as the second. Now you can pass *args and **kwargs to a function using the pipe '|'. prepare_function_for_pipe now knows how to handle keyword-only arguments, and __or__ knows how to pass next_func as *args and **kwargs to self.func.

        # Automatic partial with *args
        range_args: tuple[int, int, int] = (1, 20, 2)
        # Using pipe
        my_range: Callable = pipe | range | range_args
        # Using tuple
        my_range: Callable = pipe | (range, range_args)
        # list(my_range()) == [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    
        # Automatic partial with **kwargs
        dataclass_kwargs: dict[str, bool] = {'frozen': True, 'kw_only': True, 'slots': True}
        # Using pipe
        my_dataclass: Callable = pipe | dataclass | dataclass_kwargs
        # Using tuple
        my_dataclass: Callable = pipe | (dataclass, dataclass_kwargs)
        @my_dataclass
        class Bla:
            foo: int
            bar: str
    
        # Bla(5, 'bbb') -> Raises TypeError: takes 1 positional argument but 3 were given
        # Bla(foo=5, bar='bbb').foo == 5
    

    Let me know if you have any questions about this PR. @0101

    opened by tallerasaf 0
  • foreach_i

    I'd like to propose a new util: foreach_i.

    Motivation

    You know how JS's Array.map also passes the element index as a 2nd parameter to the function?

    > ['a', 'b', 'c'].map((x, i) => `Element ${i} is ${x}`)
    [ 'Element 0 is a', 'Element 1 is b', 'Element 2 is c' ]
    

    That's exactly what I'm trying to do here. The only difference is that the index would be passed as the 1st param:

    def test_foreach_i():
        
        r = ['a', 'b', 'c'] > (pipe
                              | foreach_i(lambda i, x: f'Element {i} is {x}')
                              | list
                              )
        
        assert r == [ 'Element 0 is a'
                    , 'Element 1 is b'
                    , 'Element 2 is c'    
                    ]
    

    (Naïve) Implementation

    from pipetools.utils import foreach, as_args
    from typing import Callable, TypeVar
    
    
    A = TypeVar('A')
    B = TypeVar('B')
    
    def foreach_i(f: Callable[[int, A], B]):
        
        return enumerate | foreach(as_args(f))
    

    The same could be done for foreach_do.

    opened by tfga 6
  • Docstrings

    What is a good practice or recommendation for writing docstrings that are simple and clean to write and work well with documentation tools?

    Related Stack Overflow question

    opened by hamzamohdzubair 2
  • Enhancement possibility? --> Pipe cache

    There is some overhead to creating pipes. For some use cases it may be advantageous to cache pipes, or even partial pipes. Would it be possible to cache pipes automatically? ... or via some switch, etc.?

    Here you can see the "penalty" associated with creating pipes.

    $ ipython3
    Python 3.7.5 (default, Oct 17 2019, 12:21:00) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: from pipetools import pipe,X,foreach
    
    In [2]: def my_func(count=10000, predef = False):
       ...:     if predef == False:
       ...:         for k in range(count):
       ...:             a = range(10) > pipe | foreach(X**2) | sum
       ...:     else:
       ...:         my_pipe = pipe | foreach(X**2) | sum
       ...:         for k in range(count):
       ...:             a = range(10) > my_pipe
       ...:     return a
       ...: 
    
    In [3]: %timeit my_func()
    202 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [4]: %timeit my_func(predef=True)
    59.5 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    In [5]: %timeit for k in range(10000): a=sum([x**2 for x in range(10)])
    29.9 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
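
    Until something like that exists, one workaround (a sketch) is to build the pipe once behind a cached factory and reuse it:

    from functools import lru_cache
    from pipetools import pipe, X, foreach

    @lru_cache(maxsize=None)
    def squares_pipe():
        # Built on first call, then served from the cache.
        return pipe | foreach(X ** 2) | sum

    a = range(10) > squares_pipe()  # -> 285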
    
    opened by rickhg12hs 3
  • `join` util

    I'd like to propose a new join util.

    The idea is simple: it calls foreach(str) before calling str.join. So, whenever you have something like

    | foreach(str)
    | ', '.join
    

    you can replace it with

    | join(', ')
    

    You can customize the string conversion by passing either a function or a format string as the second parameter, e.g.:

    | join(', ', lambda x: '-{}-'.format(x))        # function
    
    | join(', ', '-{}-')                            # fmt string
    

    Implementation

    def join(delim, formatter=str):
        '''
        join(' ')
        join(' ', fmtFn)
        join(' ', fmtString)
        '''
        
        return foreach(formatter) | delim.join
    

    Tests

    def test_join(self):
        
        r = [1, 2, 3] > (pipe
                        | join(', ')
                        )
        
        self.assertEquals(r, '1, 2, 3')
        
        
    def test_join_with_formatter(self):
        
        r = [1, 2, 3] > (pipe
                        | join(', ', lambda x: '-{}-'.format(x))
                        )
        
        self.assertEquals(r, '-1-, -2-, -3-')
        
        
    def test_join_with_fmtString(self):
        
        r = [1, 2, 3] > (pipe
                        | join(', ', '-{}-')
                        )
        
        self.assertEquals(r, '-1-, -2-, -3-')
    
    opened by tfga 4