Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C.

Overview


As one example, to check whether a NumPy array contains any NaNs, one must ordinarily call ``np.any(np.isnan(array))``, which materializes a temporary boolean array before reducing it. The :meth:`bottleneck.anynan` function interleaves the :meth:`np.isnan` check with the short-circuiting of :meth:`np.any`, returning as soon as the first NaN is found and enabling up to an O(N) speedup relative to NumPy.
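The strategy can be sketched in pure Python (``anynan_sketch`` is a hypothetical helper illustrating the idea, not Bottleneck's actual C code):

```python
import numpy as np

def anynan_sketch(a):
    # Illustrative only: scan element by element and stop at the first
    # NaN, instead of materializing np.isnan(a) and then reducing it.
    for x in a.ravel():
        if x != x:  # NaN is the only value not equal to itself
            return True
    return False

a = np.array([1.0, 2.0, np.nan, 4.0])
assert anynan_sketch(a) == bool(np.any(np.isnan(a)))
```

An early NaN lets the scan return immediately, which is where the speedup over the two-pass NumPy expression comes from.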

Bottleneck strives to be a drop-in accelerator for NumPy functions. Libraries such as pandas and xarray detect Bottleneck automatically and use it, when installed, to speed up their NaN-aware and moving-window operations.

Details on the performance benefits can be found in :ref:`benchmarking`.

Example

Let's give it a try. Create a NumPy array:

>>> import numpy as np
>>> a = np.array([1, 2, np.nan, 4, 5])

Find the nanmean:

>>> import bottleneck as bn
>>> bn.nanmean(a)
3.0

Moving window mean:

>>> bn.move_mean(a, window=2, min_count=1)
array([ 1. ,  1.5,  2. ,  4. ,  4.5])
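The ``min_count`` semantics can be stated precisely with a small pure-NumPy reference (``move_mean_sketch`` is a hypothetical helper matching the output above, not Bottleneck's O(1)-per-step C implementation):

```python
import numpy as np

def move_mean_sketch(a, window, min_count):
    # Trailing-window mean: at position i, average the non-NaN values in
    # a[i-window+1 : i+1]; emit NaN when fewer than min_count are valid.
    out = np.empty(len(a))
    for i in range(len(a)):
        win = a[max(0, i - window + 1): i + 1]
        valid = win[~np.isnan(win)]
        out[i] = valid.mean() if len(valid) >= min_count else np.nan
    return out

a = np.array([1, 2, np.nan, 4, 5.0])
print(move_mean_sketch(a, window=2, min_count=1))
# matches bn.move_mean(a, window=2, min_count=1) above
```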

Benchmark

Bottleneck comes with a benchmark suite:

>>> bn.bench()
Bottleneck performance benchmark
    Bottleneck 1.3.0.dev0+122.gb1615d7; Numpy 1.16.4
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-fifth NaNs; float64 used

              no NaN     no NaN      NaN       no NaN      NaN
               (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
               axis=0     axis=0     axis=0     axis=1     axis=1
nansum         29.7        1.4        1.6        2.0        2.1
nanmean        99.0        2.0        1.8        3.2        2.5
nanstd        145.6        1.8        1.8        2.7        2.5
nanvar        138.4        1.8        1.8        2.8        2.5
nanmin         27.6        0.5        1.7        0.7        2.4
nanmax         26.6        0.6        1.6        0.7        2.5
median        120.6        1.3        4.9        1.1        5.7
nanmedian     117.8        5.0        5.7        4.8        5.5
ss             13.2        1.2        1.3        1.5        1.5
nanargmin      66.8        5.5        4.8        3.5        7.1
nanargmax      57.6        2.9        5.1        2.5        5.3
anynan         10.2        0.3       52.3        0.8       41.6
allnan         15.1      196.0      156.3      135.8      111.2
rankdata       45.9        1.2        1.2        2.1        2.1
nanrankdata    50.5        1.4        1.3        2.4        2.3
partition       3.3        1.1        1.6        1.0        1.5
argpartition    3.4        1.2        1.5        1.1        1.6
replace         9.0        1.5        1.5        1.5        1.5
push         1565.6        5.9        7.0       13.0       10.9
move_sum     2159.3       31.1       83.6      186.9      182.5
move_mean    6264.3       66.2      111.9      361.1      246.5
move_std     8653.6       86.5      163.7      232.0      317.7
move_var     8856.0       96.3      171.6      267.9      332.9
move_min     1186.6       13.4       30.9       23.5       45.0
move_max     1188.0       14.6       29.9       23.5       46.0
move_argmin  2568.3       33.3       61.0       49.2       86.8
move_argmax  2475.8       30.9       58.6       45.0       82.8
move_median  2236.9      153.9      151.4      171.3      166.9
move_rank     847.1        1.2        1.4        2.3        2.6

You can also run a detailed benchmark for a single function using, for example, the command:

>>> bn.bench_detailed("move_median", fraction_nan=0.3)

Only arrays with data type (dtype) int32, int64, float32, or float64 are accelerated. All other dtypes result in calls to slower, unaccelerated functions. In the rare case of a byte-swapped input array (e.g. a big-endian array on a little-endian machine) the function will not be accelerated regardless of dtype.
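For example, both fallback triggers are easy to construct (a minimal sketch; ``swapped`` exists only to illustrate a non-native byte order):

```python
import numpy as np

fast = np.array([1.0, 2.0])               # float64: eligible for the fast C path

slow_dtype = fast.astype(np.float16)      # unsupported dtype: slow path

# Same values, opposite (non-native) byte order: slow path regardless of dtype
swapped = fast.byteswap().view(fast.dtype.newbyteorder())

assert swapped.dtype.byteorder != fast.dtype.byteorder
assert float(swapped.sum()) == 3.0        # the values themselves are unchanged
```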

Where

download      https://pypi.python.org/pypi/Bottleneck
docs          https://bottleneck.readthedocs.io
code          https://github.com/pydata/bottleneck
mailing list  https://groups.google.com/group/bottle-neck

License

Bottleneck is distributed under a Simplified BSD license. See the LICENSE file and LICENSES directory for details.

Install

Requirements:

Bottleneck     Python 3.6, 3.7, 3.8; NumPy 1.15.0+ (follows NEP 29)
Compile        gcc, clang, MinGW or MSVC
Unit tests     pytest, hypothesis
Documentation  sphinx, numpydoc

Detailed installation instructions can be found at :ref:`installing`.

To install Bottleneck on Linux, Mac OS X, et al.:

$ pip install .

To install Bottleneck on Windows, first install MinGW and add it to your system path. Then install Bottleneck with the command:

python setup.py install --compiler=mingw32

Alternatively, you can use the Windows binaries created by Christoph Gohlke: http://www.lfd.uci.edu/~gohlke/pythonlibs/#bottleneck

Unit tests

To keep the install dependencies light, test dependencies are made available via a setuptools "extra":

$ pip install bottleneck[test]

Or, if working locally:

$ pip install .[test]

After you have installed Bottleneck, run the suite of unit tests:

In [1]: import bottleneck as bn

In [2]: bn.test()
============================= test session starts =============================
platform linux -- Python 3.7.4, pytest-4.3.1, py-1.8.0, pluggy-0.12.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/chris/code/bottleneck/.hypothesis/examples')
rootdir: /home/chris/code/bottleneck, inifile: setup.cfg
plugins: openfiles-0.3.2, remotedata-0.3.2, doctestplus-0.3.0, mock-1.10.4, forked-1.0.2, cov-2.7.1, hypothesis-4.32.2, xdist-1.26.1, arraydiff-0.3
collected 190 items

bottleneck/tests/input_modification_test.py ........................... [ 14%]
..                                                                      [ 15%]
bottleneck/tests/list_input_test.py .............................       [ 30%]
bottleneck/tests/move_test.py .................................         [ 47%]
bottleneck/tests/nonreduce_axis_test.py ....................            [ 58%]
bottleneck/tests/nonreduce_test.py ..........                           [ 63%]
bottleneck/tests/reduce_test.py ....................................... [ 84%]
............                                                            [ 90%]
bottleneck/tests/scalar_input_test.py ..................                [100%]

========================= 190 passed in 46.42 seconds =========================
Out[2]: True

If developing in the git repo, simply run ``py.test``.

Comments
  • 'ERROR: Could not build wheels for bottleneck which use PEP 517 and cannot be installed directly'

    Hi, I'm installing Bottleneck as a fastai dependency on a SUSE 12 SP4 server, but I get this error when building wheels for it. It says that it uses PEP 517 and cannot be installed directly. The funny thing is that I had it installed in my (base) environment, but when I move to my app env it just jams on this error. Is there a way to solve it? Thanks in advance!

    (pycamlar) (base) [email protected]:~/pycamlar> pip install --no-cache-dir Bottleneck
    Collecting Bottleneck
      Downloading https://files.pythonhosted.org/packages/62/d0/55bbb49f4fade3497de2399af70ec0a06e432c786b8623c878b11e90d456/Bottleneck-1.3.1.tar.gz (88kB)
         |████████████████████████████████| 92kB 1.1MB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
        Preparing wheel metadata ... done
    Requirement already satisfied: numpy in ./pycamlar/lib/python3.7/site-packages (from Bottleneck) (1.17.4)
    Building wheels for collected packages: Bottleneck
      Building wheel for Bottleneck (PEP 517) ... error
      ERROR: Command errored out with exit status 1:
       command: /home/filholf/pycamlar/pycamlar/bin/python /home/filholf/pycamlar/pycamlar/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpzdoiga7s
           cwd: /tmp/pip-install-gx8dza5k/Bottleneck
      Complete output (122 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.7
      creating build/lib.linux-x86_64-3.7/bottleneck
      copying bottleneck/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck
      copying bottleneck/_pytesttester.py -> build/lib.linux-x86_64-3.7/bottleneck
      copying bottleneck/_version.py -> build/lib.linux-x86_64-3.7/bottleneck
      creating build/lib.linux-x86_64-3.7/bottleneck/benchmark
      copying bottleneck/benchmark/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark
      copying bottleneck/benchmark/autotimeit.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark
      copying bottleneck/benchmark/bench.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark
      copying bottleneck/benchmark/bench_detailed.py -> build/lib.linux-x86_64-3.7/bottleneck/benchmark
      creating build/lib.linux-x86_64-3.7/bottleneck/slow
      copying bottleneck/slow/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/slow
      copying bottleneck/slow/move.py -> build/lib.linux-x86_64-3.7/bottleneck/slow
      copying bottleneck/slow/nonreduce.py -> build/lib.linux-x86_64-3.7/bottleneck/slow
      copying bottleneck/slow/nonreduce_axis.py -> build/lib.linux-x86_64-3.7/bottleneck/slow
      copying bottleneck/slow/reduce.py -> build/lib.linux-x86_64-3.7/bottleneck/slow
      creating build/lib.linux-x86_64-3.7/bottleneck/src
      copying bottleneck/src/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/src
      copying bottleneck/src/bn_config.py -> build/lib.linux-x86_64-3.7/bottleneck/src
      copying bottleneck/src/bn_template.py -> build/lib.linux-x86_64-3.7/bottleneck/src
      creating build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/__init__.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/input_modification_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/list_input_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/memory_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/move_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/nonreduce_axis_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/nonreduce_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/reduce_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/scalar_input_test.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      copying bottleneck/tests/util.py -> build/lib.linux-x86_64-3.7/bottleneck/tests
      UPDATING build/lib.linux-x86_64-3.7/bottleneck/_version.py
      set build/lib.linux-x86_64-3.7/bottleneck/_version.py to '1.3.1'
      running build_ext
      running config
      compiling '_configtest.c':
    
      #pragma GCC diagnostic error "-Wattributes"
    
      int __attribute__((optimize("O3"))) have_attribute_optimize_opt_3(void*);
    
      int main(void)
      {
          return 0;
      }
    
      gcc -pthread -B /home/filholf/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
      failure.
      removing: _configtest.c _configtest.o
      compiling '_configtest.c':
    
      #ifndef __cplusplus
      static inline int static_func (void)
      {
          return 0;
      }
      inline int nostatic_func (void)
      {
          return 0;
      }
      #endif
      int main(void) {
          int r1 = static_func();
          int r2 = nostatic_func();
          return r1 + r2;
      }
    
      gcc -pthread -B /home/filholf/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
      failure.
      removing: _configtest.c _configtest.o
      compiling '_configtest.c':
    
      #ifndef __cplusplus
      static __inline__ int static_func (void)
      {
          return 0;
      }
      __inline__ int nostatic_func (void)
      {
          return 0;
      }
      #endif
      int main(void) {
          int r1 = static_func();
          int r2 = nostatic_func();
          return r1 + r2;
      }
    
      gcc -pthread -B /home/filholf/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
      failure.
      removing: _configtest.c _configtest.o
      compiling '_configtest.c':
    
      #ifndef __cplusplus
      static __inline int static_func (void)
      {
          return 0;
      }
      __inline int nostatic_func (void)
      {
          return 0;
      }
      #endif
      int main(void) {
          int r1 = static_func();
          int r2 = nostatic_func();
          return r1 + r2;
      }
    
      gcc -pthread -B /home/filholf/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -c _configtest.c -o _configtest.o
      failure.
      removing: _configtest.c _configtest.o
      building 'bottleneck.reduce' extension
      creating build/temp.linux-x86_64-3.7
      creating build/temp.linux-x86_64-3.7/bottleneck
      creating build/temp.linux-x86_64-3.7/bottleneck/src
      gcc -pthread -B /home/filholf/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-build-env-i9jvt975/overlay/lib/python3.7/site-packages/numpy/core/include -I/home/filholf/miniconda3/include/python3.7m -Ibottleneck/src -c bottleneck/src/reduce.c -o build/temp.linux-x86_64-3.7/bottleneck/src/reduce.o -O2
      error: command 'gcc' failed with exit status 1
      ----------------------------------------
      ERROR: Failed building wheel for Bottleneck
      Running setup.py clean for Bottleneck
    Failed to build Bottleneck
    ERROR: Could not build wheels for Bottleneck which use PEP 517 and cannot be installed directly
    
    opened by lcflorindo 56
  • Preparing to release bottleneck 1.3.0

    I am getting ready to release bottleneck 1.2.2. The only thing left to do is testing.

    The following people gave test reports on the pre-release of bottleneck 1.2.1, so I'm pinging you again in case you have time to test this release (the master branch): @cgohlke @itdaniher @toobaz @shoyer and anyone else.

    opened by kwgoodman 40
  • Loop unrolling and pairwise summation

    I'm playing around with loop unrolling in the unroll branch. I did some unrolling of bn.nansum. Here's the performance on my computer (nansum2 is the unrolled one):

    With float64:

    In [1]: bn.bench(functions=['nansum', 'nansum2'])
    Bottleneck performance benchmark
        Bottleneck 1.1.0dev; Numpy 1.11.0
        Speed is NumPy time divided by Bottleneck time
        NaN means approx one-third NaNs; float64 and axis=-1 are used
    
                     no NaN     no NaN      NaN        NaN    
                       (10,)   (1000,1000)   (10,)   (1000,1000)
        nansum         32.1        2.6       32.3        7.4
        nansum2        41.9        3.6       42.1        9.2
    

    With int64:

    In [3]: bn.bench(functions=['nansum', 'nansum2'], dtype='int64')
    Bottleneck performance benchmark
        Bottleneck 1.1.0dev; Numpy 1.11.0
        Speed is NumPy time divided by Bottleneck time
        NaN means approx one-third NaNs; int64 and axis=-1 are used
    
                     no NaN     no NaN      NaN        NaN    
                       (10,)   (1000,1000)   (10,)   (1000,1000)
        nansum         13.1        1.1       12.7        1.1
        nansum2        19.5        1.4       19.8        1.4
    

    And here are some more timings:

    In [4]: a = np.random.rand(100000000)
    In [5]: timeit bn.nansum(a)
    10 loops, best of 3: 78.5 ms per loop
    In [6]: timeit bn.nansum2(a)
    10 loops, best of 3: 59.2 ms per loop
    

    Neat.

    opened by kwgoodman 36
  • Appveyor

    Bottleneck seems to only get tested on windows right before a release---long after a commit that breaks bottleneck on windows might have been made. See, for example, #129.

    It would be nice to test bottleneck on windows with Appveyor.

    Anyone up for the challenge? @Midnighter?

    We could skip the flake8 and sdist tests if that is helpful (we already do those tests on Travis). Maybe windows has a 32 bit option. Travis only has 64-bit OSes.

    opened by kwgoodman 31
  • Should we port bottleneck to C?

    I'm trying to port nansum to C (without using cython) to get a feel for how bottleneck would look written in pure C.

    What I have so far (in the c_rewrite branch) is a nansum with reduced features that compiles but does not work at all. I have not yet tried to deal with reference counting because I don't yet know how. Any help, comments, appreciated.

    Here's a demo on how to compile:

    /bottleneck/bottleneck/template (c_rewrite)$ python setup.py build_ext --inplace
    running build_ext
    building 'nansum' extension
    <snip>
    In [1]: from nansum import nansum
    In [2]: a=np.random.rand(4)
    In [3]: nansum(a)
    ---------------------------------------------------------------------------
    SystemError                               Traceback (most recent call last)
    <ipython-input-3-1d96a990bfd9> in <module>()
    ----> 1 nansum(a)
    
    SystemError: error return without exception set
    
    opened by kwgoodman 30
  • Rewrite of bottleneck internals

    Here's a proposal for a rewrite of the bottleneck internals. The main goals:

    • Easier to add functions to bottleneck
    • Easier to maintain
    • Less C code

    The easiest way to explain what I have in mind is with an example written in pure python:

    import numpy as np
    
    
    def nanmean_2d_float64_axis0(a):
        algo = Nanmean()
        return reduce_2d_float64_axis0(a, algo)
    
    
    def reduce_2d_float64_axis0(a, algo):
        n0, n1 = a.shape
        y = np.empty(n1)
        for j in range(n1):
            algo.clear()
            for i in range(n0):
                ai = a[i, j]
                algo.append(ai)
            y[j] = algo.calc()
        return y
    
    
    class Nanmean(object):
    
        def __init__(self):
            self.count = 0
            self.asum = 0
    
        def append(self, ai):
            if ai == ai:
                self.asum += ai
                self.count += 1
    
        def calc(self):
            if self.count > 0:
                return self.asum / self.count
            else:
                return np.nan
    
        def clear(self):
            self.asum = 0
            self.count = 0
    

    Note that:

    • nanmean_2d_float64_axis0 keeps the same name and signature as in the current code.
    • reduce_2d_float64_axis0 can be (re)used with any reduce function such as nansum or median.
    • The algo, here Nanmean, only has to be written for the 1d case.

    To add a new function, the user only has to think about the 1d case and then write the algo class (I've never written a cython class but I assume it can be done). Bottleneck should take care of the rest.

    A similar setup would be used for moving window functions.

    I can't picture what all this would look like in practice, but just wanted to record the thought.
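    A condensed, runnable version of the sketch above, checked against np.nanmean (same class and reducer shape as the proposal, with the reducer taking the accumulator class rather than a fixed Nanmean):

```python
import numpy as np

class Nanmean:
    # The 1d "algo" from the proposal: skip NaNs, average the rest.
    def __init__(self):
        self.asum = 0.0
        self.count = 0

    def append(self, ai):
        if ai == ai:  # NaN is the only value not equal to itself
            self.asum += ai
            self.count += 1

    def calc(self):
        return self.asum / self.count if self.count else np.nan

def reduce_2d_axis0(a, algo_cls):
    # The generic reducer: reusable with any 1d accumulator class.
    n0, n1 = a.shape
    y = np.empty(n1)
    for j in range(n1):
        algo = algo_cls()
        for i in range(n0):
            algo.append(a[i, j])
        y[j] = algo.calc()
    return y

a = np.array([[1.0, np.nan], [3.0, 4.0]])
assert np.allclose(reduce_2d_axis0(a, Nanmean), np.nanmean(a, axis=0))
```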

    opened by kwgoodman 26
  • Cannot build/install on Ubuntu 12.10 64Bit in Virtual Machine

    Hello,

    I'm not pretending that somebody will track down the cause but I thought I'd share the situation.

    I'm trying to install bottleneck on a VirtualBox 64bits Ubuntu 12.10 guest.

    python setup.py install

    ends with the following lines:

    bottleneck/src/func/64bit/func.c: At top level:
    /home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
    gcc: internal compiler error: Killed (program cc1)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
    error: command 'gcc' failed with exit status 4
    

    where of course the gcc: internal compiler error: Killed (program cc1) is the ugly bit.

    Also, trying a make all from the repository gives exactly the same error:

    /home/scrosta/virtualenv/local/lib/python2.7/site-packages/numpy/core/include/numpy/__ufunc_api.h:226:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
    gcc: internal compiler error: Killed (program cc1)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
    error: command 'gcc' failed with exit status 4
    make: *** [funcs] Error 1
    
    opened by stefanocrosta 23
  • Numpy 1.12.0 bug causes unit test failure in bottleneck

    I ask forgiveness in advance for having no idea of whether this is an upstream problem or a packaging problem... I don't have time now to investigate (and actually tests run fine in my - partially out of date - debian testing). But maybe the output of those tests sounds familiar to you...

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=851613

    opened by toobaz 22
  • Please help: osx10.11-xcode7.3 test failures?

    median and nanmedian are now failing on travis in bottleneck master. Travis passed and then after one commit (which only touched the AppVeyor files) it failed.

    I don't think the failures are related to the appveyor commit. Maybe related to a change in the osx builds? Help!

    opened by kwgoodman 20
  • Symbol not found: _get_largest_child

    After installing on OS X 10.8 Mountain Lion, I get this error:

    >>> from bottleneck.move import *
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: dlopen(/Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so, 2): Symbol not found: _get_largest_child
      Referenced from: /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
      Expected in: flat namespace
     in /Library/Python/2.7/site-packages/Bottleneck-0.6.0-py2.7-macosx-10.8-intel.egg/bottleneck/move.so
    

    Any ideas?

    >>> import numpy
    >>> numpy.__version__
    '1.6.2'
    
    opened by mrjbq7 20
  • New release?

    It looks like there was an attempt to release bottleneck 1.3.3 https://github.com/pydata/bottleneck/pull/347 but it didn't reach PyPI. Since #306 hasn't been released yet, users are finding all sorts of installation problems in Python 3.9, see #376. This will continue happening with the release of Python 3.10.

    Even if the package is now unmaintained or dormant, if a new release is made with #306, at least some common installation issues will be addressed.

    cc @qwhelan

    opened by astrojuanlu 17
  • [BUG] NumPy 1.24: test regression `ValueError: cannot convert float NaN to integer`

    Describe the bug: When NumPy is upgraded to 1.24, the following test fails:

    ________________________________________________________ test_move[move_rank] _________________________________________________________
    
    func = <built-in function move_rank>
    
        @pytest.mark.parametrize("func", bn.get_functions("move"), ids=lambda x: x.__name__)
        def test_move(func):
            """Test that bn.xxx gives the same output as a reference function."""
            fmt = (
                "\nfunc %s | window %d | min_count %s | input %s (%s) | shape %s | "
                "axis %s | order %s\n"
            )
            fmt += "\nInput array:\n%s\n"
            aaae = assert_array_almost_equal
            func_name = func.__name__
            func0 = eval("bn.slow.%s" % func_name)
            if func_name == "move_var":
                decimal = 3
            else:
                decimal = 5
            for i, a in enumerate(arrays(func_name)):
                axes = range(-1, a.ndim)
                for axis in axes:
                    windows = range(1, a.shape[axis])
                    for window in windows:
                        min_counts = list(range(1, window + 1)) + [None]
                        for min_count in min_counts:
                            actual = func(a, window, min_count, axis=axis)
    >                       desired = func0(a, window, min_count, axis=axis)
    
    a          = array([], shape=(2, 0), dtype=int64)
    aaae       = <function assert_array_almost_equal at 0x7faf70006ca0>
    actual     = array([], shape=(2, 0), dtype=float64)
    axes       = range(-1, 2)
    axis       = 0
    da         = dtype('float32')
    dd         = dtype('float32')
    decimal    = 5
    desired    = array([], shape=(1, 0, 2), dtype=float32)
    err_msg    = '\nfunc move_rank | window 1 | min_count None | input a53 (float32) | shape (1, 0, 2) | axis 2 | order C,F\n\nInput array:\n[]\n\n dtype mismatch %s %s'
    fmt        = '\nfunc %s | window %d | min_count %s | input %s (%s) | shape %s | axis %s | order %s\n\nInput array:\n%s\n'
    func       = <built-in function move_rank>
    func0      = <function move_rank at 0x7faf70ab9d00>
    func_name  = 'move_rank'
    i          = 57
    min_count  = 1
    min_counts = [1, None]
    tup        = ('move_rank', 1, 'None', 'a53', 'float32', '(1, 0, 2)', ...)
    window     = 1
    windows    = range(1, 2)
    
    bottleneck/.venv/lib/python3.11/site-packages/bottleneck/tests/move_test.py:33: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:110: in move_rank
        return move_func(lastrank, a, window, min_count, axis=axis)
            a          = array([], shape=(2, 0), dtype=int64)
            axis       = 0
            min_count  = 1
            window     = 1
    bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:148: in move_func
        y[tuple(idx2)] = func(a[tuple(idx1)], axis=axis, **kwargs)
            a          = array([], shape=(2, 0), dtype=int64)
            axis       = 0
            func       = <function lastrank at 0x7faf70ab9ee0>
            i          = 0
            idx1       = [slice(0, 1, None), slice(None, None, None)]
            idx2       = [0, slice(None, None, None)]
            kwargs     = {}
            mc         = 1
            min_count  = 1
            win        = 1
            window     = 1
            y          = array([], shape=(2, 0), dtype=float64)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    a = array([], shape=(1, 0), dtype=int64), axis = 0
    
        def lastrank(a, axis=-1):
            """
            The ranking of the last element along the axis, ignoring NaNs.
        
            The ranking is normalized to be between -1 and 1 instead of the more
            common 1 and N. The results are adjusted for ties.
        
            Parameters
            ----------
            a : ndarray
                Input array. If `a` is not an array, a conversion is attempted.
            axis : int, optional
                The axis over which to rank. By default (axis=-1) the ranking
                (and reducing) is performed over the last axis.
        
            Returns
            -------
            d : array
                In the case of, for example, a 2d array of shape (n, m) and
                axis=1, the output will contain the rank (normalized to be between
                -1 and 1 and adjusted for ties) of the the last element of each row.
                The output in this example will have shape (n,).
        
            Examples
            --------
            Create an array:
        
            >>> y1 = larry([1, 2, 3])
        
            What is the rank of the last element (the value 3 in this example)?
            It is the largest element so the rank is 1.0:
        
            >>> import numpy as np
            >>> from la.afunc import lastrank
            >>> x1 = np.array([1, 2, 3])
            >>> lastrank(x1)
            1.0
        
            Now let's try an example where the last element has the smallest
            value:
        
            >>> x2 = np.array([3, 2, 1])
            >>> lastrank(x2)
            -1.0
        
            Here's an example where the last element is not the minimum or maximum
            value:
        
            >>> x3 = np.array([1, 3, 4, 5, 2])
            >>> lastrank(x3)
            -0.5
        
            """
            a = np.array(a, copy=False)
            ndim = a.ndim
            if a.size == 0:
                # At least one dimension has length 0
                shape = list(a.shape)
                shape.pop(axis)
                r = np.empty(shape, dtype=a.dtype)
    >           r.fill(np.nan)
    E           ValueError: cannot convert float NaN to integer
    
    a          = array([], shape=(1, 0), dtype=int64)
    axis       = 0
    ndim       = 2
    r          = array([], dtype=int64)
    shape      = [0]
    
    bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:236: ValueError
    

    Downgrading to 1.23.5 makes tests pass.
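    The failure can be reproduced without Bottleneck at all (a minimal sketch; the float allocation shown is an assumption about a possible fix, not the project's actual patch):

```python
import numpy as np

# Minimal reproduction, independent of bottleneck: on NumPy >= 1.24,
# filling an integer array with NaN raises instead of deprecating.
r_int = np.empty(3, dtype=np.int64)
try:
    r_int.fill(np.nan)                 # ValueError on NumPy >= 1.24
    print("silently cast (older NumPy)")
except ValueError as exc:
    print(exc)

# A plausible fix direction: allocate a float result before NaN-padding,
# instead of reusing the integer input dtype.
r = np.empty(3, dtype=np.float64)
r.fill(np.nan)
assert np.isnan(r).all()
```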

    To Reproduce:

    1. git clone https://github.com/pydata/bottleneck
    2. pip install bottleneck pytest
    3. python -c "import bottleneck;bottleneck.test()"

    Expected behavior: Tests passing ;-).

    Additional context: Tested on a825e20999c8a03892ccb464707fdc4bf128e2a4.

    bug 
    opened by mgorny 0
  • Add move_quantile function

    This PR adds move_quantile to the list of supported move functions.

    Why?

    Quantiles (and moving quantiles) are often useful statistics to look at, and having a fast move version of quantile would be great.

    How?

    Moving/rolling quantile is implemented in almost exactly the same way as moving median: via two heaps (max-heap and min-heap). The only difference is in sizes of the heaps -- for move_median they should have the same size (modulo parity nuances), while for the move_quantile sizes of the heaps should be rebalanced differently.

    The changes to transform move_median into move_quantile are very minor, and were implemented in the first commit 524afbf7c91e81a85422f74db21cbc4b2c1694a2 (36++, 13--). This commit fully implemented move_quantile with fixed q=0.25 out of move_median.

    • The initial approach was to substitute move_median with move_quantile completely. Then, on move_median call, just call move_quantile(q=0.5). This is implemented and tested in commits until de181daf28d0c1617058c62846fd330b6f772e97 , where fully working move_quantile (and move_median via move_quantile) was implemented.

      At this point, the new move_median bench was compared to the old move_median bench. It was observed that the new move_median became slower by 1-3%. Even though the changes were minor, apparently the new arithmetic operations introduced were enough to cause this overhead. For a performance-oriented package, such a decrease in speed is not justifiable.

    • It was therefore decided to implement move_quantile in parallel with move_median (cd49b4f9630a5d60115c8847509be8d38305e5a8). This causes a lot of code repetition, but it was needed in order not to sacrifice move_median performance (and also to avoid abusing macros). Many of the functions in move_median.c were almost duplicated, hence the large diff. As of this commit, both move_quantile and move_median were fully implemented and almost fully tested.

    • When move_quantile is called with q=0., move_min is called instead, which is much faster. Similarly, q=1. dispatches to move_max, and q=0.5 to move_median.

    • Only the "midpoint" interpolation method is implemented for now.

    Other changes

    • Function parse_args in move_template.c was heavily refactored for better clarity

    Technicalities

    • np.nanquantile behaves oddly when there are np.inf's in the data; see, for instance, https://github.com/numpy/numpy/issues/21932 and https://github.com/numpy/numpy/issues/21091. In particular, np.nanquantile(q=0.5) does not give the same result as np.nanmedian on such data, because of how arithmetic operations work on np.inf. Our move_quantile behaves as expected and agrees with move_median when q=0.5. To test properly (and to have a numpy-based slow version of move_quantile), note that np.nanmedian's behaviour can be recovered as (np.nanquantile(a, q=0.5, method="lower") + np.nanquantile(a, q=0.5, method="higher")) / 2. This is what the slow function uses when there are np.inf's in the data. That this matches np.nanmedian is tested in move_test.py, where the issue is also discussed in comments (which I used pretty liberally).
    • When there are no infs in a, bn.slow.move_quantile calls the usual np.nanquantile, so the benchmarks are "fair", since infinite values are not used during benching.
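    A minimal illustration of the discrepancy and of the lower/higher trick (assumes NumPy >= 1.22 for the method keyword; the exact value returned by plain np.nanquantile depends on the NumPy version, per the issues linked above):

```python
import math
import numpy as np

a = np.array([np.inf, np.inf, np.nan])

# nanmedian averages the two middle values directly: (inf + inf) / 2 = inf
print(np.nanmedian(a))  # inf

# Linear interpolation computes lo + (hi - lo) * t; with lo = hi = inf,
# hi - lo is nan, so affected NumPy versions return nan here instead of inf
# (see numpy/numpy#21932)
print(np.nanquantile(a, 0.5))

# Averaging the "lower" and "higher" order statistics recovers the
# nanmedian behaviour; this is what the slow move_quantile falls back to
# when the data contains infs
lo = np.nanquantile(a, 0.5, method="lower")
hi = np.nanquantile(a, 0.5, method="higher")
print((lo + hi) / 2)  # inf
```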

    Tests

    • Extensive tests were added for move_quantile. With the constant REPEAT_FULL_QUANTILE set to 1 in test_move_quantile_with_infs_and_nans, the test covers 200k instances and takes ~7 minutes to run. The tests were also run with more repetitions and a larger range of parameters; the current values are chosen so that the GitHub Actions runs take a reasonable time.

    Benches

    • bn.move_quantile is significantly faster than bn.slow.move_quantile:
        Bottleneck 1.3.5.post0.dev24; Numpy 1.23.1
        Speed is NumPy time divided by Bottleneck time
        None of the array elements are NaN
    
       Speed  Call                                  Array
       269.9  move_quantile(a, 1, q=0.25)           rand(1)
      2502.7  move_quantile(a, 2, q=0.25)           rand(10)
      6718.9  move_quantile(a, 20, q=0.25)          rand(100)
      5283.4  move_quantile(a, 200, q=0.25)         rand(1000)
      5747.2  move_quantile(a, 2, q=0.25)           rand(10, 10)
      3197.3  move_quantile(a, 20, q=0.25)          rand(100, 100)
      3051.9  move_quantile(a, 20, axis=0, q=0.25)  rand(100, 100, 100)
      3135.6  move_quantile(a, 20, axis=1, q=0.25)  rand(100, 100, 100)
      3232.8  move_quantile(a, 20, axis=2, q=0.25)  rand(100, 100, 100)
    

    The increase in speed was tested and confirmed separately (outside of bn.bench) as a sanity check. q = 0.25 is used for all move_quantile benchmarks.

    • A slight complication is that these benchmarks now take a very long time to run, because of how slow np.nanquantile is. bn.bench(functions=["move_quantile"]) runs for about 20 minutes:
    Bottleneck performance benchmark
        Bottleneck 1.3.5.post0.dev24; Numpy 1.23.1
        Speed is NumPy time divided by Bottleneck time
        NaN means approx one-fifth NaNs; float64 used
    
                  no NaN     no NaN      NaN       no NaN      NaN    
                   (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
                   axis=0     axis=0     axis=0     axis=1     axis=1  
    move_quantile 6276.9     1961.2     1781.8     2294.7     2255.1
    

    Further changes

    Several things that can be improved with move_quantile going further:

    • Implement more interpolation methods. The refactoring of parse_args makes it much easier to pass additional arguments to functions in move, and changing the behavior of mq_get_quantile should not be a problem either.
    • np.quantile supports a list (iterable) of quantiles to compute. This could be added here as well; it is quite easy to do if implemented as a first step at the Python level.
    • I attempted to make the argument q a required argument of move_quantile (as it should be), but ran into some complications and left it as is. If a Python wrapper is created to parse an iterable q input anyway, a non-keyword q can be added at that Python layer.
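    The iterable-q idea could indeed live entirely at the Python layer. A hedged sketch: the names below are hypothetical, and the scalar kernel is stubbed with a sliding np.nanquantile (slow, and only correct for data without infs) standing in for the C function.

```python
import numpy as np

def _move_quantile_scalar(a, window, q):
    # Stand-in for the C kernel: a naive sliding np.nanquantile
    out = np.full(len(a), np.nan)
    for i in range(window - 1, len(a)):
        out[i] = np.nanquantile(a[i - window + 1 : i + 1], q, method="midpoint")
    return out

def move_quantile(a, window, q):
    # Python-level handling of an iterable q: one output row per quantile,
    # mirroring np.quantile's stacking behaviour
    if np.iterable(q):
        return np.stack([_move_quantile_scalar(a, window, qi) for qi in q])
    return _move_quantile_scalar(a, window, q)

print(move_quantile(np.array([1.0, 2.0, 3.0, 4.0]), window=2, q=[0.0, 1.0]))
```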

    Wrap-up

    Thanks for considering, and sorry for the large diff. About 50% of it is duplicated code in move_median.c, and another 20% is new tests. You can see in de181daf28d0c1617058c62846fd330b6f772e97 how few changes were actually needed for move_quantile to work; unfortunately, that approach slowed down move_median a bit.

    opened by andrii-riazanov 2
  • Continuous fuzzing by way of OSS-Fuzz


    Hi,

    I was wondering if you would like to integrate continuous fuzzing by way of OSS-Fuzz? Fuzzing is a way to automate test-case generation and in this PR https://github.com/google/oss-fuzz/pull/8303 I did an initial integration into OSS-Fuzz where the current fuzzer targets the scalar functions and move_median using an array generated from fuzzer data. The fuzzing engine used by OSS-Fuzz is Atheris.

    The native code is compiled with various sanitizers in order to detect memory-corruption issues. Additionally, the code has various asserts that trigger in the event of any inconsistency between the scalar functions and their .slow counterparts.

    If you would like to integrate, the only thing I need is a list of email(s) that will get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and more stats. Notice the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.

    opened by DavidKorczynski 0
  • [Enhancement] Add move_skew and move_kurt


    I tried to implement moving-window skewness and kurtosis in this project. However, simply applying the technique used in move_std/move_var is not numerically stable, due to the third/fourth powers needed for skewness/kurtosis. Are there any methods to reduce the floating-point error?
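    To illustrate the instability in question, here is a small self-contained demo (my own illustration, not from the issue): the running-sums technique behind move_var already loses all precision on data with a large offset, and extending it to sum(x**3)/sum(x**4) for skew/kurt amplifies the cancellation further.

```python
def naive_var(xs):
    # Running-sums approach (accumulate sum(x) and sum(x**2)), i.e. the
    # move_std/move_var technique; skew/kurt would extend this to sum(x**3)
    # and sum(x**4), making the cancellation below even worse
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return s2 / n - (s1 / n) ** 2

def two_pass_var(xs):
    # Numerically stable two-pass computation of the central moment
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [1e8 - 1.0, 1e8, 1e8 + 1.0]  # true variance is 2/3
print(naive_var(data))     # 0.0 -- catastrophic cancellation
print(two_pass_var(data))  # 0.6666666666666666
```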

    opened by jkadbear 0
  • Python crashed in fuzzing test of 7 APIs


    Describe the bug: API list: bn.nanmedian, bn.nanmean, bn.nanstd, bn.median, bn.ss, bn.nanmin, bn.nanmax. Python crashed in our fuzzing test of these 7 APIs. All tests were run on the latest development branch.

    To Reproduce: To assist in reproducing the bug, please include the following:

    1. Script example: python random_shape.py input1 bn.nanmedian
    2. Python3.8 & Ubuntu18.04
    3. pip 21.2.4
    4. pip list

    Expected behavior: Python should not crash.

    Additional context: input1 input2 input3 input4 input5 input6 input7

    Crash information: median, nanmean, nanmedian, nanmin, nanstd, ss. nanmin and nanmax crashed at the same position.

    bug 
    opened by baltsers 0
Releases (v1.4.0.dev0)
Owner: Python for Data