Python bindings for MPI

MPI for Python

Last update: Dec 29, 2022

Overview

Welcome to MPI for Python. This package provides Python bindings for the Message Passing Interface (MPI) standard. It is implemented on top of the MPI-1/2/3 specification and exposes an API based on the standard MPI-2 C++ bindings.

Dependencies

Python 2.7, 3.5 or above, or PyPy 2.0 or above.
A functional MPI 1.x/2.x/3.x implementation like MPICH or Open MPI built with shared/dynamic libraries.
To work with the in-development version, you need to install Cython.

Testsuite

The testsuite is run periodically on

Comments

Importing the C API fails on Windows
I am trying to build a C++ Python extension (with pybind11) that uses MPI and mpi4py on Windows. I am working in a conda environment and I installed mpi4py as follows:

conda install mpi4py -c conda-forge --yes

The following:

// import the mpi4py API
if (import_mpi4py() < 0) {
  throw std::runtime_error("Could not load mpi4py API.");
}

throws the exception with traceback:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\IEUser\VeloxChemMP\build_dbg_14.27\lib\python3.8\site-packages\veloxchem\__init__.py", line 2, in <module>
    from .veloxchemlib import AtomBasis
ImportError: Could not load mpi4py API.

Running mpiexec -n 4 python -c "from mpi4py import MPI; comm = MPI.COMM_WORLD; print(comm.Get_rank())" works as expected. I am not sure whether this is a bug or some silly mistake I am making.
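A possible way to narrow this down is to run a small check from the same interpreter that loads the extension. The sketch below assumes (this is not confirmed in the report) that import_mpi4py() works by importing the module mpi4py.MPI and reading the C API table that Cython-generated modules expose as __pyx_capi__. The helper name check_capi is hypothetical and only illustrates the idea.

```python
import importlib
import traceback


def check_capi(module_name="mpi4py.MPI"):
    """Return (status, detail) for importing a module and finding its C API table.

    Assumption: a Cython module that exports a C API carries a __pyx_capi__
    attribute. check_capi is an illustrative helper, not part of mpi4py.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Report any import-time failure together with its full traceback.
        return "import-failed", traceback.format_exc()
    capi = getattr(module, "__pyx_capi__", None)
    if capi is None:
        return "no-capi", f"{module_name} imported but has no __pyx_capi__"
    return "ok", f"{module_name} exports {len(capi)} C API entries"


if __name__ == "__main__":
    status, detail = check_capi()
    print(status)
    print(detail)
```

If the status is "import-failed", the traceback in the detail string may point at the underlying cause (for example a DLL that cannot be found) that the generic "Could not load mpi4py API." message hides.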

This issue was migrated from BitBucket #177
opened by robertodr 60

cuda tests fail when CUDA is available but not configured

I'm testing the build of the new release 3.1.1.

All tests accessing cuda are failing. This is not entirely surprising in itself. My system has nvidia drivers available and has a switchable nvidia card accessible via bumblebee (primusrun). But I have not specifically configured my system to execute CUDA. So it's not surprising that CUDA_ERROR_NO_DEVICE is found. For me the nvidia card that I have at hand is for experimentation, not for routine operation. The main video card is intel.

What's the best way to handle this situation? How can a non-CUDA build be enforced when CUDA is otherwise "available"?

An example test log is:

ERROR: testAllgather (test_cco_buf.TestCCOBufInplaceSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/python/build/mpi4py/test/test_cco_buf.py", line 382, in testAllgather
    buf = array(-1, typecode, (size, count))
  File "/projects/python/build/mpi4py/test/arrayimpl.py", line 459, in __init__
    self.array = numba.cuda.device_array(shape, typecode)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
    with driver.get_active_context():
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
    self.initialize()
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
-------------------- >> begin captured logging << --------------------
numba.cuda.cudadrv.driver: INFO: init
numba.cuda.cudadrv.driver: DEBUG: call driver api: cuInit
numba.cuda.cudadrv.driver: ERROR: Call to cuInit results in CUDA_ERROR_NO_DEVICE
--------------------- >> end captured logging << ---------------------
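One way to avoid the failure in the log above is to probe whether CUDA can actually be initialised before choosing a GPU array backend. The sketch below is an illustration under assumptions, not the test suite's own logic: it relies on numba.cuda.is_available(), and the helper names cuda_is_usable and pick_array_backend are hypothetical. Note also that the runtests.py invocation quoted in a later issue on this page passes --no-numba; whether that option exists in 3.1.1 is not confirmed here.

```python
def cuda_is_usable():
    """Return True only if Numba is importable and reports a usable CUDA setup."""
    try:
        from numba import cuda
    except Exception:
        # Numba is missing or cannot be imported: treat CUDA as unusable.
        return False
    try:
        return bool(cuda.is_available())
    except Exception:
        # Defensive: any driver initialisation error also means "not usable".
        return False


def pick_array_backend():
    """Choose a backend label; the names are illustrative only."""
    return "numba-cuda" if cuda_is_usable() else "host"


if __name__ == "__main__":
    print(pick_array_backend())
```

On a machine where the driver is installed but no device is reachable, this should select the host backend instead of raising CudaSupportError at array creation time.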

opened by drew-parsons 43

test_io.TestIOSelf failures on Fedora Rawhide i686

Seeing this on Fedora Rawhide i686:

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir
/builddir/build/BUILD/mpi4py-3.1.1/test/runtests.py:76: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.util import get_platform
[[email protected]] Python 3.10 (/usr/bin/python3)
[[email protected]] MPI 3.1 (Open MPI 4.1.1)
[[email protected]] mpi4py 3.1.1 (/builddir/build/BUILDROOT/mpi4py-3.1.1-1.fc36.i386/usr/lib/python3.10/site-packages/openmpi/mpi4py)
--------------------------------------------------------------------------
The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
Workarounds are to run on a single node, or to use a system with an RDMA
capable network such as Infiniband.
--------------------------------------------------------------------------
...
======================================================================
ERROR: testIReadIWrite (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 124, in testIReadIWrite
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAll (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 302, in testIReadIWriteAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAt (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 75, in testIReadIWriteAt
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAtAll (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 227, in testIReadIWriteAtAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteShared (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 176, in testIReadIWriteShared
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWrite (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 97, in testReadWrite
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAll (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 276, in testReadWriteAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAllBeginEnd (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 329, in testReadWriteAllBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAt (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 53, in testReadWriteAt
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAtAll (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 203, in testReadWriteAtAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAtAllBeginEnd (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 252, in testReadWriteAtAllBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteOrdered (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 355, in testReadWriteOrdered
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteOrderedBeginEnd (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 379, in testReadWriteOrderedBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteShared (test_io.TestIOSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 151, in testReadWriteShared
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWrite (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 124, in testIReadIWrite
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAll (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 302, in testIReadIWriteAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAt (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 75, in testIReadIWriteAt
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteAtAll (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 227, in testIReadIWriteAtAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testIReadIWriteShared (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 176, in testIReadIWriteShared
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWrite (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 97, in testReadWrite
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAll (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 276, in testReadWriteAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAllBeginEnd (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 329, in testReadWriteAllBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAt (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 53, in testReadWriteAt
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAtAll (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 203, in testReadWriteAtAll
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteAtAllBeginEnd (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 252, in testReadWriteAtAllBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteOrdered (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 355, in testReadWriteOrdered
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteOrderedBeginEnd (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 379, in testReadWriteOrderedBeginEnd
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
ERROR: testReadWriteShared (test_io.TestIOWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_io.py", line 151, in testReadWriteShared
    fh.Set_view(0, etype)
  File "mpi4py/MPI/File.pyx", line 215, in mpi4py.MPI.File.Set_view
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='q')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='Q')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='d')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='g')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 28 != 36
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='D')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 36 != 40
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='G')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 52 != 60
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='i8')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='u8')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
======================================================================
FAIL: testStruct4 (test_util_dtlib.TestUtilDTLib) (typecode='f8')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/mpi4py-3.1.1/test/test_util_dtlib.py", line 165, in testStruct4
    self.assertEqual(mt.extent, n*ex1)
AssertionError: 20 != 24
----------------------------------------------------------------------
Ran 1385 tests in 41.098s
FAILED (failures=9, errors=28, skipped=191)

opened by opoplawski 31

Warn users if mpi4py running under different MPI implementation

Inexperienced users sometimes run packages that use mpi4py under an MPI implementation different from the one that mpi4py was built with.

MPI.jl, the analogue of mpi4py in Julia land, tries to warn users if this happens by detecting at MPI_INIT time if MPI_COMM_WORLD reports only 1 rank but the known environment variables (MPI_LOCALNRANKS, OMPI_COMM_WORLD_SIZE) suggest that there should be more than 1 rank. This is exactly what happens when mpi4py is built with MPICH but run under Open MPI.

If this condition is detected it prints a warning.

What do you think about doing this? Is it something that you would merge into mpi4py?
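The check described above could be sketched roughly as follows. This is an illustration only, not MPI.jl's or mpi4py's actual code. MPI_LOCALNRANKS and OMPI_COMM_WORLD_SIZE come from the description above; PMI_SIZE is an extra variable added here as an assumption, and the function names are hypothetical.

```python
import os

# Environment variables a launcher may set to describe the job size.
# PMI_SIZE is an assumption added for illustration.
SIZE_HINTS = ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE", "MPI_LOCALNRANKS")


def launcher_size_hint(environ=None):
    """Return the largest rank count suggested by the environment, or None."""
    environ = os.environ if environ is None else environ
    sizes = []
    for name in SIZE_HINTS:
        try:
            sizes.append(int(environ[name]))
        except (KeyError, ValueError):
            # Variable not set, or not an integer: ignore it.
            continue
    return max(sizes, default=None)


def looks_mismatched(world_size, environ=None):
    """True if MPI reports one rank while the launcher suggests several."""
    hint = launcher_size_hint(environ)
    return world_size == 1 and hint is not None and hint > 1


# Intended use (requires mpi4py, so it is left as a comment here):
#   from mpi4py import MPI
#   import warnings
#   if looks_mismatched(MPI.COMM_WORLD.Get_size()):
#       warnings.warn("mpi4py may be running under a different MPI implementation")
```

Keeping the decision in a pure function that takes the world size and an environment mapping makes it easy to test without launching an MPI job.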

opened by PhilipVinc 19

[WIP] Add auto-generated "API Reference" to the RTD docs for future cross referencing

This is a pure exploration and I have no idea if I have time to complete it. Just for fun...

Notes to self:

I temporarily changed the theme to pydata's so that I can make a direct comparison with NumPy & CuPy's websites. I will revert it back if I ever have the chance to finish.

_templates/autosummary/class.rst is copied from CuPy

Need to compare with the old API ref https://mpi4py.github.io/apiref/index.html

I am still confused by how the new website http://mpi4py.readthedocs.org/ is generated. My local test via running make html in docs/source/usrman/ would lead to the old site https://mpi4py.github.io/ instead...
opened by leofang 15

mpi4py error when getting results (in combination with SLURM)

ERROR:
Traceback (most recent call last):
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 72, in <module>
    main()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/__main__.py", line 60, in main
    run_command_line()
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/opt/software/anaconda/3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "cali_send_2.py", line 137, in <module>
    globals()[sys.argv[1]](sys.argv[2], sys.argv[3])
  File "cali_send_2.py", line 94, in solve_on_cali
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
  File "/home/vasko/.local/lib/python3.6/site-packages/mpi4py/futures/pool.py", line 207, in result_iterator
    yield futures.pop().result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/software/anaconda/3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 5: invalid start byte

ENV
CentOS release 6.5 (Final)
Python 3.6 anaconda
mpiexec (OpenRTE) 1.8.2
mpi4py 3.0.3

Piece of Code:

inputs = [der_mats, ref_ind_yee_grid, n_xy_sq, param_sweep_on, i_m, inv_eps, sol_params]
with MPIPoolExecutor(max_workers=int(nodes)) as executor:
    sols = list(executor.map(solve_matrix, repeat(inputs), range(len(wls)), wls))
    executor.shutdown(wait=True)  # wait for all complete
zipobj = ZipFile(zp_fl_nm, 'w')
for sol in sols:
    w, v, solnum, vq = sol
    print(w[0], solnum)  # this line will shows if data have duplicates.
    w.tofile(f"w_sol_{solnum}.npy")
    v.tofile(f"v_sol_{solnum}.npy")
    vq.tofile(f"vq_sol_{solnum}.npy")
    zipobj.write(f"w_sol_{solnum}.npy")
    zipobj.write(f"v_sol_{solnum}.npy")
    zipobj.write(f"vq_sol_{solnum}.npy")
    os.remove(f"w_sol_{solnum}.npy")
    os.remove(f"v_sol_{solnum}.npy")
    os.remove(f"vq_sol_{solnum}.npy")

I call the method by sending a command like this: f'srun --mpi=pmi2 -n ${{SLURM_NTASKS}} python -m mpi4py.futures cali_send_2.py solve_on_cali \"\"{name}\"\" {num_nodes}'

The error does not always appear; it depends on the range used for wls. With wls = np.arange(0.4e-6, 1.8e-6, 0.01e-6) it crashes with this error, or returns duplicates of some solutions if the step is 0.1e-6. With wls = np.arange(0.55e-6, 1.55e-6, 0.01e-6), or with a step of 0.1e-6 or 0.001e-6 over that same range, it does NOT crash with the mentioned error and returns good results without duplicates.

Could someone please explain to me what the origin of this error is? My suspicion points to float numbers like 1.699999999999999999999e-6
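Whether float rounding has anything to do with the UnicodeDecodeError is not established here. Values such as 1.6999999999999999e-6 are, however, a normal consequence of stepping through a range with a floating-point increment. The sketch below shows the effect and one way to rule it out by building the grid from integer nanometre counts. The function name wavelength_grid and the choice of nanometres are assumptions made for illustration.

```python
def wavelength_grid(start_nm, stop_nm, step_nm):
    """Return wavelengths in metres for integer nanometre bounds (stop excluded)."""
    # Each value is computed from an exact integer, so rounding errors do not
    # accumulate and the number of points is known in advance.
    return [n * 1e-9 for n in range(start_nm, stop_nm, step_nm)]


# Repeatedly adding a float step accumulates rounding error.
accumulated = 0.4e-6
for _ in range(130):
    accumulated += 0.01e-6
print(repr(accumulated))  # near 1.7e-6, but not necessarily exactly equal to it

grid = wavelength_grid(400, 1800, 10)
print(len(grid), repr(grid[130]))
```

If the crash or the duplicated solutions persist with a grid built this way, float rounding in the wavelength list can probably be excluded as the cause.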
opened by byquip 15

Add GPU tests for DLPack support

DO NOT MERGE. To be continued tomorrow...

With Open MPI 4.1.1, I am seeing MPI_ERR_TRUNCATE in testAllgatherv3, so I skip them for now:

$ mpirun --mca opal_cuda_support 1 -n 1 python test/runtests.py --no-numba --cupy -e "test_cco_nb_vec" -e "test_cco_vec"

Tomorrow I will continue investigating the only errors in test_msgspec.py.
opened by leofang 14

mpi4py.futures.MPIPoolExecutor hangs at comm.Disconnect() inside client_close(comm)

Background information

mpi4py 3.0.3
PyPy 7.3.3 (Python 2.7.18)
Intel MPI Version 2019 Update 8 Build 20200624
GNU/Linux 3.10.0-1160.11.1.el7.x86_64
slurm 20.02.5

Details of the problem

Sample code

from mpi4py import MPI
import mpi4py.futures as mp
import sys

def write(x):
    sys.stdout.write(x)
    sys.stdout.flush()

def fun(args):
    rank = MPI.COMM_WORLD.Get_rank()
    return rank

if __name__ == '__main__':
    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    for i in range(size):
        MPI.COMM_WORLD.Barrier()
        if rank == i:
            write('Parent %d\n' % rank)
            with mp.MPIPoolExecutor(2) as pool:
                write(', '.join(map(str, pool.map(fun, range(2)))) + '\n')

shell$ srun -N 16 --ntasks-per-node 2 --pty /bin/bash -l
shell$ mpirun -ppn 1 ./test_mpi.py
Parent 0
0, 1

It hangs here. When I traced the function calls, I found the problem in futures/_lib.py -> def client_close(comm): -> comm.Disconnect(). Similarly, when the with statement is replaced with an assignment statement and pool.shutdown() is called after the for loop, all the processes print their messages correctly but the program still hangs at comm.Disconnect(). A similar C version of this code does not produce this problem.
opened by KyuzoR 14

3.1.3: pytest is failing

I'm trying to package your module as an rpm package, so I'm using the typical PEP 517-based build, install and test cycle used when building packages from a non-root account.

python3 -sBm build -w --no-isolation
because I'm calling build with --no-isolation, only locally installed modules are used throughout the process
install .whl file in </install/prefix>
run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-mpi4py-3.1.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-mpi4py-3.1.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/mpi4py-3.1.3, configfile: setup.cfg, testpaths: test
collected 1402 items

test/test_address.py .....                                                                                                                                           [  0%]
test/test_attributes.py ........................................ssssssss                                                                                             [  3%]
test/test_cco_buf.py ........................................................................                                                                        [  8%]
test/test_cco_nb_buf.py ......................................................................                                                                       [ 13%]
test/test_cco_nb_vec.py ..........................................................                                                                                   [ 18%]
test/test_cco_ngh_buf.py ................                                                                                                                            [ 19%]
test/test_cco_ngh_obj.py ........                                                                                                                                    [ 19%]
test/test_cco_obj.py ........................................                                                                                                        [ 22%]
test/test_cco_obj_inter.py ssssssssssssssssssssssss                                                                                                                  [ 24%]
test/test_cco_vec.py ..............................................................                                                                                  [ 28%]
test/test_cffi.py ss                                                                                                                                                 [ 28%]
test/test_comm.py ...................................................................                                                                                [ 33%]
test/test_comm_inter.py ssssssssssss                                                                                                                                 [ 34%]
test/test_comm_topo.py ............................                                                                                                                  [ 36%]
test/test_ctypes.py ..                                                                                                                                               [ 36%]
test/test_datatype.py .........................                                                                                                                      [ 38%]
test/test_dl.py ....                                                                                                                                                 [ 38%]
test/test_doc.py .                                                                                                                                                   [ 38%]
test/test_dynproc.py sss.                                                                                                                                            [ 39%]
test/test_environ.py ..............                                                                                                                                  [ 40%]
test/test_errhandler.py .....s                                                                                                                                       [ 40%]
test/test_errorcode.py .....                                                                                                                                         [ 40%]
test/test_exceptions.py ....................................sssss...                                                                                                 [ 44%]
test/test_file.py ..............                                                                                                                                     [ 45%]
test/test_fortran.py ...........                                                                                                                                     [ 45%]
test/test_grequest.py ...                                                                                                                                            [ 46%]
test/test_group.py ................................................                                                                                                  [ 49%]
test/test_info.py ..........                                                                                                                                         [ 50%]
test/test_io.py ............................                                                                                                                         [ 52%]
test/test_memory.py .............                                                                                                                                    [ 53%]
test/test_mpimem.py ..                                                                                                                                               [ 53%]
test/test_msgspec.py ..s....ss............FF.........ssssss......ssssssssssssssssss......................................................ssssssssssss......s.s...... [ 63%]
s..ss                                                                                                                                                                [ 63%]
test/test_msgzero.py ...s...s                                                                                                                                        [ 64%]
test/test_objmodel.py .........                                                                                                                                      [ 64%]
test/test_op.py .........                                                                                                                                            [ 65%]
test/test_p2p_buf.py ....................................                                                                                                            [ 68%]
test/test_p2p_buf_matched.py ..........                                                                                                                              [ 68%]
test/test_p2p_obj.py ................................................................................                                                                [ 74%]
test/test_p2p_obj_matched.py ..........                                                                                                                              [ 75%]
test/test_pack.py ......                                                                                                                                             [ 75%]
test/test_pickle.py ..s...s                                                                                                                                          [ 76%]
test/test_rc.py ...                                                                                                                                                  [ 76%]
test/test_request.py .........                                                                                                                                       [ 77%]
test/test_rma.py ssssssssssssssssssssssssssssssssssss                                                                                                                [ 79%]
test/test_rma_nb.py ssssssssssssss                                                                                                                                   [ 80%]
test/test_spawn.py ssssssssssssssssssssssssssssssssssssssss                                                                                                          [ 83%]
test/test_status.py .........                                                                                                                                        [ 84%]
test/test_subclass.py ..............................ss..                                                                                                             [ 86%]
test/test_threads.py ...                                                                                                                                             [ 86%]
test/test_util_dtlib.py ...................                                                                                                                          [ 88%]
test/test_util_pkl5.py ..........................................................................................                                                    [ 94%]
test/test_win.py ssssssssssssssssss......................................ssssssssssssssssssss                                                                        [100%]

================================================================================= FAILURES =================================================================================
_________________________________________________________________ TestMessageSimpleNumPy.testNotContiguous _________________________________________________________________

>   ???
E   ValueError: ndarray is not contiguous

mpi4py/MPI/asbuffer.pxi:140: ValueError

During handling of the above exception, another exception occurred:

self = <test_msgspec.TestMessageSimpleNumPy testMethod=testNotContiguous>

    def testNotContiguous(self):
        sbuf = numpy.ones([3,2])[:,0]
        rbuf = numpy.zeros([3])
        sbuf.flags.writeable = False
>       self.assertRaises((BufferError, ValueError),
                          Sendrecv, sbuf, rbuf)

test/test_msgspec.py:457:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_msgspec.py:158: in Sendrecv
    MPI.COMM_SELF.Sendrecv(sendbuf=smsg, dest=0,   sendtag=0,
mpi4py/MPI/Comm.pyx:327: in mpi4py.MPI.Comm.Sendrecv
    ???
mpi4py/MPI/msgbuffer.pxi:455: in mpi4py.MPI.message_p2p_send
    ???
mpi4py/MPI/msgbuffer.pxi:438: in mpi4py.MPI._p_msg_p2p.for_send
    ???
mpi4py/MPI/msgbuffer.pxi:203: in mpi4py.MPI.message_simple
    ???
mpi4py/MPI/msgbuffer.pxi:138: in mpi4py.MPI.message_basic
    ???
mpi4py/MPI/asbuffer.pxi:365: in mpi4py.MPI.getbuffer
    ???
mpi4py/MPI/asbuffer.pxi:144: in mpi4py.MPI.PyMPI_GetBuffer
    ???
mpi4py/MPI/commimpl.pxi:142: in mpi4py.MPI.PyMPI_Lock
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: NumPy currently only supports dlpack for writeable arrays

mpi4py/MPI/asdlpack.pxi:193: TypeError
_________________________________________________________________ TestMessageSimpleNumPy.testNotWriteable __________________________________________________________________

>   ???
E   ValueError: buffer source array is read-only

mpi4py/MPI/asbuffer.pxi:140: ValueError

During handling of the above exception, another exception occurred:

self = <test_msgspec.TestMessageSimpleNumPy testMethod=testNotWriteable>

    def testNotWriteable(self):
        sbuf = numpy.ones([3])
        rbuf = numpy.zeros([3])
        rbuf.flags.writeable = False
>       self.assertRaises((BufferError, ValueError),
                          Sendrecv, sbuf, rbuf)

test/test_msgspec.py:450:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_msgspec.py:158: in Sendrecv
    MPI.COMM_SELF.Sendrecv(sendbuf=smsg, dest=0,   sendtag=0,
mpi4py/MPI/Comm.pyx:328: in mpi4py.MPI.Comm.Sendrecv
    ???
mpi4py/MPI/msgbuffer.pxi:460: in mpi4py.MPI.message_p2p_recv
    ???
mpi4py/MPI/msgbuffer.pxi:446: in mpi4py.MPI._p_msg_p2p.for_recv
    ???
mpi4py/MPI/msgbuffer.pxi:203: in mpi4py.MPI.message_simple
    ???
mpi4py/MPI/msgbuffer.pxi:138: in mpi4py.MPI.message_basic
    ???
mpi4py/MPI/asbuffer.pxi:365: in mpi4py.MPI.getbuffer
    ???
mpi4py/MPI/asbuffer.pxi:144: in mpi4py.MPI.PyMPI_GetBuffer
    ???
mpi4py/MPI/commimpl.pxi:142: in mpi4py.MPI.PyMPI_Lock
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: NumPy currently only supports dlpack for writeable arrays

mpi4py/MPI/asdlpack.pxi:193: TypeError
========================================================================= short test summary info ==========================================================================
SKIPPED [1] test/test_attributes.py:20: mpi-win-attr
SKIPPED [1] test/test_attributes.py:190: mpi-win-attr
SKIPPED [1] test/test_attributes.py:42: mpi-win-attr
SKIPPED [1] test/test_attributes.py:45: mpi-win-attr
SKIPPED [1] test/test_attributes.py:48: mpi-win-attr
SKIPPED [1] test/test_attributes.py:71: mpi-win-attr
SKIPPED [1] test/test_attributes.py:100: mpi-win-attr
SKIPPED [1] test/test_attributes.py:96: mpi-win-attr
SKIPPED [3] test/test_cco_obj_inter.py:115: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:158: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:124: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:55: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:59: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:77: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:132: mpi-world-size<2
SKIPPED [3] test/test_cco_obj_inter.py:96: mpi-world-size<2
SKIPPED [1] test/test_cffi.py:49: cffi
SKIPPED [1] test/test_cffi.py:70: cffi
SKIPPED [3] test/test_comm_inter.py:34: mpi-world-size<2
SKIPPED [3] test/test_comm_inter.py:41: mpi-world-size<2
SKIPPED [3] test/test_comm_inter.py:57: mpi-world-size<2
SKIPPED [3] test/test_comm_inter.py:50: mpi-world-size<2
SKIPPED [1] test/test_dynproc.py:63: mpi-world-size<2
SKIPPED [1] test/test_dynproc.py:112: mpi-world-size<2
SKIPPED [1] test/test_dynproc.py:161: mpi-world-size<2
SKIPPED [1] test/test_errhandler.py:45: mpi-win
SKIPPED [1] test/test_exceptions.py:314: mpi-win
SKIPPED [1] test/test_exceptions.py:301: mpi-win
SKIPPED [1] test/test_exceptions.py:305: mpi-win
SKIPPED [1] test/test_exceptions.py:309: mpi-win
SKIPPED [1] test/test_exceptions.py:331: mpi-win
SKIPPED [1] test/test_msgspec.py:239: python3
SKIPPED [1] test/test_msgspec.py:231: python3
SKIPPED [1] test/test_msgspec.py:263: mpi-world-size<2
SKIPPED [2] test/test_msgspec.py:393: cupy
SKIPPED [2] test/test_msgspec.py:396: cupy
SKIPPED [2] test/test_msgspec.py:399: cupy
SKIPPED [2] test/test_msgspec.py:402: cupy
SKIPPED [2] test/test_msgspec.py:405: cupy
SKIPPED [2] test/test_msgspec.py:408: cupy
SKIPPED [1] test/test_msgspec.py:506: cupy
SKIPPED [1] test/test_msgspec.py:493: cupy
SKIPPED [1] test/test_msgspec.py:499: cupy
SKIPPED [1] test/test_msgspec.py:393: numba
SKIPPED [1] test/test_msgspec.py:396: numba
SKIPPED [1] test/test_msgspec.py:399: numba
SKIPPED [1] test/test_msgspec.py:402: numba
SKIPPED [1] test/test_msgspec.py:405: numba
SKIPPED [1] test/test_msgspec.py:408: numba
SKIPPED [1] test/test_msgspec.py:550: numba
SKIPPED [1] test/test_msgspec.py:524: numba
SKIPPED [1] test/test_msgspec.py:537: numba
SKIPPED [1] test/test_msgspec.py:1040: cupy
SKIPPED [1] test/test_msgspec.py:1043: cupy
SKIPPED [1] test/test_msgspec.py:1046: cupy
SKIPPED [1] test/test_msgspec.py:1049: cupy
SKIPPED [1] test/test_msgspec.py:1052: cupy
SKIPPED [1] test/test_msgspec.py:1055: cupy
SKIPPED [1] test/test_msgspec.py:1040: numba
SKIPPED [1] test/test_msgspec.py:1043: numba
SKIPPED [1] test/test_msgspec.py:1046: numba
SKIPPED [1] test/test_msgspec.py:1049: numba
SKIPPED [1] test/test_msgspec.py:1052: numba
SKIPPED [1] test/test_msgspec.py:1055: numba
SKIPPED [1] test/test_msgspec.py:1199: cupy
SKIPPED [1] test/test_msgspec.py:1208: numba
SKIPPED [1] test/test_msgspec.py:1332: cupy
SKIPPED [1] test/test_msgspec.py:1341: numba
SKIPPED [1] test/test_msgspec.py:1300: python3
SKIPPED [2] test/test_msgzero.py:33: openmpi
SKIPPED [1] test/test_pickle.py:126: dill
SKIPPED [1] test/test_pickle.py:168: yaml
SKIPPED [2] test/test_rma.py:91: mpi-rma
SKIPPED [2] test/test_rma.py:261: mpi-rma
SKIPPED [2] test/test_rma.py:270: mpi-rma
SKIPPED [2] test/test_rma.py:207: mpi-rma
SKIPPED [2] test/test_rma.py:307: mpi-rma
SKIPPED [2] test/test_rma.py:319: mpi-rma
SKIPPED [2] test/test_rma.py:163: mpi-rma
SKIPPED [2] test/test_rma.py:407: mpi-rma
SKIPPED [2] test/test_rma.py:114: mpi-rma
SKIPPED [2] test/test_rma.py:279: mpi-rma
SKIPPED [2] test/test_rma.py:256: mpi-rma
SKIPPED [2] test/test_rma.py:340: mpi-rma
SKIPPED [2] test/test_rma.py:42: mpi-rma
SKIPPED [2] test/test_rma.py:251: mpi-rma
SKIPPED [2] test/test_rma.py:335: mpi-rma
SKIPPED [2] test/test_rma.py:369: mpi-rma
SKIPPED [2] test/test_rma.py:345: mpi-rma
SKIPPED [2] test/test_rma.py:397: mpi-rma
SKIPPED [2] test/test_rma_nb.py:67: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:151: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:161: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:100: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:144: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:44: mpi-rma-nb
SKIPPED [2] test/test_rma_nb.py:137: mpi-rma-nb
SKIPPED [4] test/test_spawn.py:120: using CUDA
SKIPPED [4] test/test_spawn.py:219: using CUDA
SKIPPED [4] test/test_spawn.py:94: using CUDA
SKIPPED [4] test/test_spawn.py:151: using CUDA
SKIPPED [4] test/test_spawn.py:169: using CUDA
SKIPPED [4] test/test_spawn.py:183: using CUDA
SKIPPED [4] test/test_spawn.py:106: using CUDA
SKIPPED [4] test/test_spawn.py:197: using CUDA
SKIPPED [4] test/test_spawn.py:134: using CUDA
SKIPPED [4] test/test_spawn.py:239: using CUDA
SKIPPED [1] test/test_subclass.py:234: mpi-win
SKIPPED [1] test/test_subclass.py:229: mpi-win
SKIPPED [2] test/test_win.py:47: mpi-win-create
SKIPPED [2] test/test_win.py:95: mpi-win-create
SKIPPED [2] test/test_win.py:30: mpi-win-create
SKIPPED [2] test/test_win.py:53: mpi-win-create
SKIPPED [2] test/test_win.py:71: mpi-win-create
SKIPPED [2] test/test_win.py:61: mpi-win-create
SKIPPED [2] test/test_win.py:85: mpi-win-create
SKIPPED [2] test/test_win.py:38: mpi-win-create
SKIPPED [2] test/test_win.py:106: mpi-win-create
SKIPPED [2] test/test_win.py:194: mpi-win-dynamic
SKIPPED [2] test/test_win.py:189: mpi-win-dynamic
SKIPPED [2] test/test_win.py:95: mpi-win-dynamic
SKIPPED [2] test/test_win.py:176: mpi-win-dynamic
SKIPPED [2] test/test_win.py:53: mpi-win-dynamic
SKIPPED [2] test/test_win.py:71: mpi-win-dynamic
SKIPPED [2] test/test_win.py:61: mpi-win-dynamic
SKIPPED [2] test/test_win.py:85: mpi-win-dynamic
SKIPPED [2] test/test_win.py:182: mpi-win-dynamic
SKIPPED [2] test/test_win.py:106: mpi-win-dynamic
FAILED test/test_msgspec.py::TestMessageSimpleNumPy::testNotContiguous - TypeError: NumPy currently only supports dlpack for writeable arrays
FAILED test/test_msgspec.py::TestMessageSimpleNumPy::testNotWriteable - TypeError: NumPy currently only supports dlpack for writeable arrays
=============================================================== 2 failed, 1167 passed, 233 skipped in 24.75s ===============================================================

In my build procedure I've temporarily added those failing units to the --deselect list.
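Both failures in the log above follow the same pattern: the test accepts BufferError or ValueError, but the exception that finally propagates is a TypeError raised while the original ValueError was being handled. The sketch below reproduces only that pattern, with no MPI or NumPy involved. `fake_sendrecv` and `outcome_of_assert_raises` are hypothetical names used for illustration; they are not part of mpi4py or its test suite.

```python
import unittest


def fake_sendrecv():
    # Hypothetical stand-in for the real call chain: the first attempt fails
    # with ValueError, and the fallback path then fails with TypeError.
    try:
        raise ValueError("buffer source array is read-only")
    except ValueError:
        raise TypeError("NumPy currently only supports dlpack for writeable arrays")


def outcome_of_assert_raises(expected):
    # assertRaises only swallows the exception types listed in `expected`;
    # any other exception type propagates to the caller.
    case = unittest.TestCase()
    try:
        case.assertRaises(expected, fake_sendrecv)
    except TypeError as exc:
        return "escaped: " + type(exc).__name__
    return "caught"


if __name__ == "__main__":
    print(outcome_of_assert_raises((BufferError, ValueError)))
    print(outcome_of_assert_raises((BufferError, ValueError, TypeError)))
```

The second call is there only to show how assertRaises treats the tuple of expected types. Whether the tests should accept TypeError, or the fallback should not be attempted at all, is a question for the maintainers.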

build

opened by kloczek 12

Problem with corrupted data using allgather

I'm having a problem with data corruption when using allgather, but only on one of the HPC systems we use. I think the problem is very likely somewhere in the InfiniBand stack, but I'm at a complete loss to track it down.

We can trigger the error with the following simple test script:

from mpi4py import MPI
import numpy as np

NCOORD = 3600


class State:
    def __init__(self, coords):
        self.coords = coords


def main():
    world = MPI.COMM_WORLD
    rank = world.Get_rank()
    N = world.Get_size()

    with open(f"output_{rank}.txt", "w") as outfile:
        if rank == 0:
            # generate N random states
            states = []
            for i in range(N):
                coords = np.random.randn(NCOORD, 3)
                state = State(coords)
                states.append(state)

            # broadcast all states to each node
            world.bcast(states, root=0)

            # scatter a single state to each node
            state = world.scatter(states, root=0)

        else:
            states = world.bcast(None, root=0)
            state = world.scatter(None, root=0)

        while True:
            results = world.allgather(state)
            for i, (s1, s2) in enumerate(zip(states, results)):
                if (s1.coords != s2.coords).any():
                    print(f"Coordinates do not match on rank {rank}", file=outfile)
                    print(f"position {i}", file=outfile)
                    print("expected:", file=outfile)
                    print(s1.coords, file=outfile)
                    print("got:", file=outfile)
                    print(s2.coords, file=outfile)

                    mask = (s1.coords != s2.coords)
                    x1_diff = s1.coords[mask]
                    x2_diff = s2.coords[mask]
                    print("diff expected", file=outfile)
                    print(x1_diff, file=outfile)
                    print("diff got", file=outfile)
                    print(x2_diff, file=outfile)
                    raise ValueError()
            print("success")


if __name__ == "__main__":
    main()

This runs when ranks are confined to a single node, but fails whenever ranks span multiple nodes. The errors are either the ValueError triggered when the data does not match what is expected, an unknown pickle protocol error, or a pickle data truncated error. In each case, the data is apparently being corrupted.

Typically, one of the ranks on a node will complete successfully, whereas the remaining ranks will receive garbage data. This makes me think this is some kind of data race.
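One way to tell silent corruption of the payload apart from a failure in unpickling is to attach a digest to the bytes before they are sent and verify it on receipt. The sketch below uses only the standard library. `seal` and `unseal` are hypothetical helper names, and the 32-byte SHA-256 prefix is an arbitrary layout chosen for illustration.

```python
import hashlib
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(obj):
    """Pickle obj and prepend a SHA-256 digest of the pickled bytes."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(payload).digest() + payload


def unseal(blob):
    """Verify the digest, then unpickle; raise ValueError on a mismatch."""
    digest, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload digest mismatch: bytes changed in transit")
    return pickle.loads(payload)


if __name__ == "__main__":
    blob = seal([0.1, 0.2, 0.3])
    print(unseal(blob))  # round trip on intact bytes
    damaged = blob[:-1] + bytes([blob[-1] ^ 0xFF])  # flip bits in the last byte
    try:
        unseal(damaged)
    except ValueError as exc:
        print("detected:", exc)
```

In the test script this could wrap the gathered data, for example `world.allgather(seal(state.coords))` followed by `unseal` on each element. That usage is an assumption about how one might apply the helpers; the outer pickle that allgather adds can still fail on its own if the bytes are damaged.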

The same script works fine on the other HPC system that we use, which makes me think it is some kind of problem with the network stack rather than with mpi4py. However, I have been getting pushback from our system administrators because the following equivalent C program runs without issue:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <sys/time.h>

double **alloc_darray2d(int, int);
void free_darray2d(double **, int, int);
double ***alloc_darray3d(int, int, int);
void free_darray3d(double ***, int, int, int);
double ***alloc_parray2d(int, int);
void free_parray2d(double ***, int, int);
void gauss_rng(double *, int, double);

int main(int argc, char *argv[]) {
int i, j, p,  myid, numprocs, n, ntot, iter, found;
double *sbuf, *rbuf;
int ncoord = 3600;
int seed = -1, maxiter = 1000000;
unsigned int lseed;
double ***states, **state, ***results;
double sigma = 1.0;
const double randmax = (double)RAND_MAX + 1.0;
struct timeval curtime;
char fname[25];
FILE *fd;

   MPI_Init(NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

   /* allocate 2d array of doubles state[ncoord][3] contiguously in memory */
   state = alloc_darray2d(ncoord, 3);
   /* allocate 3d array of doubles states[numprocs][ncoord][3] contiguously in memory */
   states = alloc_darray3d(numprocs, ncoord, 3);
   n = ncoord*3;
   ntot = n*numprocs;
   if (states == NULL) { fprintf(stderr, "error: failed to allocate states array\n"); }

   if (myid == 0) {
      /* configure the random number generator */
      if (argc >= 2) {
         seed = atoi(argv[1]);
         if (argc == 3) {
            maxiter = atoi(argv[2]);
         }
      }
      if (seed >= 0) {
         lseed = seed;
      } else {
         /* randomly seed rng using the current microseconds within the current second */
         gettimeofday(&curtime, NULL);
         lseed = curtime.tv_usec;
      }
      srandom(lseed);

      /* fill the states array with gaussian distributed random numbers */
      for (p = 0; p < numprocs; p++) {
         for (i = 0; i < ncoord; i++) {
            for (j = 0; j < 3; j++) {
               /* random returns integer between 0 and RAND_MAX, divide
                  by randmax gives number within [0,1) */
               states[p][i][j] = random()/randmax;
            }
         }
      }
      gauss_rng(&states[0][0][0], ntot, sigma);
   }
   /* scatter states array into state array at each process */
   MPI_Bcast(&states[0][0][0], ntot, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Scatter(&states[0][0][0], n, MPI_DOUBLE, &state[0][0], n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   /* gather state array from each process into results array */
   results = alloc_darray3d(numprocs, ncoord, 3);
   found = 0;
   /* open the per-rank log once; reopening it on every iteration leaks a FILE handle */
   snprintf(fname, sizeof(fname), "scattergather.%d", myid);
   fd = fopen(fname, "w");
   for (iter = 1; iter <= maxiter; iter++) {
      MPI_Allgather(&state[0][0], n, MPI_DOUBLE, &results[0][0][0], n, MPI_DOUBLE, MPI_COMM_WORLD);
      /* compare states and results arrays */
      for (p = 0; p < numprocs; p++) {
         for (i = 0; i < ncoord; i++) {
            if (states[p][i][0] != results[p][i][0]
                || states[p][i][1] != results[p][i][1]
                || states[p][i][2] != results[p][i][2]) {
               fprintf(fd, "states and results differ on process %d at position %d:\n", myid, p);
               fprintf(fd, "states[%d][%d][0] = %g\nresults[%d][%d][0] = %g\n", p, i, p, i, states[p][i][0], results[p][i][0]);
               fprintf(fd, "states[%d][%d][1] = %g\nresults[%d][%d][1] = %g\n", p, i, p, i, states[p][i][1], results[p][i][1]);
               fprintf(fd, "states[%d][%d][2] = %g\nresults[%d][%d][2] = %g\n", p, i, p, i, states[p][i][2], results[p][i][2]);
               found = 1;
            }
         }
      }
      if (found == 1) { break; }
   }
   fclose(fd);
   MPI_Finalize();
}

#include <math.h>
void gauss_rng(double *z, int n, double sigma) {
/*
   gauss_rng converts an array z of length n filled with uniformly
   distributed random numbers in [0,1) into gaussian distributed
   random numbers with width sigma.
*/
double sq;
const double pi2 = 8.0*atan(1.0);
int i;

   if ((n/2)*2 != n) {
      fprintf(stderr, "error: n in gauss_rng must be even\n");
      return;
   }
   for (i = 0; i < n; i += 2) {
      sq = sigma*sqrt(-2.0*log(1.0 - z[i]));
      z[i] = sq*sin(pi2*z[i + 1]);
      z[i + 1] = sq*cos(pi2*z[i + 1]);
   }
}

/* allocate a double 2d array with subscript range
   dm[0,...,nx-1][0,...,ny-1]                      */
double **alloc_darray2d(int nx, int ny) {
  int i;
  double **dm;

  dm = (double **)malloc((size_t) nx*sizeof(double*));
  if (dm == NULL) { return NULL; }
  dm[0] = (double *)malloc((size_t) nx*ny*sizeof(double));
  if (dm[0] == NULL) {
     free((void *) dm);
     return NULL;
  }
  for (i = 1; i < nx; i++) dm[i] = dm[i - 1] + ny;
/* return pointer to array of pointers to rows */
  return dm;
}

void free_darray2d(double **dm, int nx, int ny) {
  free((void *) dm[0]);
  free((void *) dm);
}

/* allocate a 2d array of pointers to doubles with subscript range
   pm[0,...,nx-1][0,...,ny-1]           */
double ***alloc_parray2d(int nx, int ny) {
  int i;
  double ***pm;

  pm = (double ***)malloc((size_t) nx*sizeof(double**));
  if (pm == NULL) { return NULL; }
  pm[0]=(double **)malloc((size_t) nx*ny*sizeof(double*));
  if (pm[0] == NULL) {
     free((void *) pm);
     return NULL;
  }
  for (i = 1; i < nx; i++) pm[i] = pm[i - 1] + ny;
/* return pointer to array of pointers to rows */
  return pm;
}

void free_parray2d(double ***pm,int nx, int ny) {
  free((void *) pm[0]);
  free((void *) pm);
}

/* allocate 3d array of doubles with subscript range
   dm[0,...,nx-1][0,...,ny-1][0,...,nz-1]           */
double ***alloc_darray3d(int nx, int ny, int nz) {
  int i, j;
  double ***dm;

/* first, allocate 2d array of pointers to doubles */
  dm = alloc_parray2d(nx, ny);
  if (dm == NULL) return NULL;

/* allocate memory for the whole thing */
  dm[0][0] = (double *)malloc((size_t) nx*ny*nz*sizeof(double));
  if (dm[0][0] == NULL) {
    free_parray2d(dm, nx, ny);
    return NULL;
  }

/* set the pointers inside the matrix */
  for (i = 0; i < nx; i++) {
    for (j = 1; j < ny; j++) {
      dm[i][j] = dm[i][j - 1] + nz;
    }
    if (i < nx - 1) dm[i + 1][0] = dm[i][ny - 1] + nz;
  }

/* return pointer to array of matrix of pointers */
  return dm;
}

void free_darray3d(double ***dm, int nx, int ny, int nz) {
  free((void *) dm[0][0]);
  free_parray2d(dm, nx, ny);
}

opened by jlmaccal 12

CUDA-aware Ireduce and Iallreduce operations for PyTorch GPU tensors segfault

When calling either Ireduce or Iallreduce on PyTorch GPU tensors, a segfault occurs. I haven't exhaustively tested all of the ops, but I don't have problems with Reduce, Allreduce, Isend / Irecv, and Ibcast when tested the same way. I haven't tested CuPy tensors, but it might be worthwhile.

It might just be something I'm doing wrong when using these functions, so here is a minimal script that can be used to demonstrate this behavior. The errors are only present when running on GPU:

# mpirun -np 2 python repro.py gpu Ireduce
from mpi4py import MPI
import torch
import sys

if len(sys.argv) < 3:
    print('Usage: python repro.py [cpu|gpu] [MPI function to test]')
    sys.exit(1)

use_gpu = sys.argv[1] == 'gpu'
func_name = sys.argv[2]

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
if use_gpu:
    device = torch.device('cuda:' + str(rank % torch.cuda.device_count()))
else:
    device = torch.device('cpu')

def test_Iallreduce():
    sendbuf = torch.ones(1, device=device)
    recvbuf = torch.empty_like(sendbuf)
    torch.cuda.synchronize()
    req = comm.Iallreduce(sendbuf, recvbuf, op=MPI.SUM)  # also fails with MPI.MAX
    req.wait()
    assert recvbuf[0] == size

def test_Ireduce():
    buf = torch.ones(1, device=device)
    if rank == 0:
        sendbuf = MPI.IN_PLACE
        recvbuf = buf
    else:
        sendbuf = buf
        recvbuf = None
    torch.cuda.synchronize()
    req = comm.Ireduce(sendbuf, recvbuf, root=0, op=MPI.SUM)  # also fails with MPI.MAX
    req.wait()
    if rank == 0:
        assert buf[0] == size

# look up the requested test function by name instead of eval()-ing a built string
globals()['test_' + func_name]()

Software/Hardware Versions:

OpenMPI 4.1.2, 4.1.1, 4.1.0, and 4.0.7 (built w/ --with-cuda flag)
mpi4py 3.1.3 (built against above MPI version)
CUDA 11.0
Python 3.6 (also tested under 3.8)
Nvidia K80 GPU (also tested with V100)
OS Ubuntu 18.04 (also tested in containerized environment)
torch 1.10.1 (w/ GPU support)

You can reproduce my environment setup with the following commands:

wget https://www.open-mpi.org//software/ompi/v3.0/downloads/openmpi-4.1.2.tar.gz
tar xvf openmpi-4.1.2.tar.gz
cd openmpi-4.1.2
./configure --with-cuda --prefix=/opt/openmpi-4.1.2
sudo make -j4 all install
export PATH=/opt/openmpi-4.1.2/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-4.1.2/lib:$LD_LIBRARY_PATH
env MPICC=/opt/openmpi-4.1.2/bin/mpicc pip install mpi4py
pip install torch numpy

The error message for Ireduce is the following:

[<host>:25864] *** Process received signal ***
[<host>:25864] Signal: Segmentation fault (11)
[<host>:25864] Signal code: Invalid permissions (2)
[<host>:25864] Failing at address: 0x1201220000
[<host>:25864] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f00efcf3040]
[<host>:25864] [ 1] /opt/openmpi-4.1.2/lib/openmpi/mca_op_avx.so(+0xc079)[0x7f00e41c0079]
[<host>:25864] [ 2] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(+0x7385)[0x7f00d3330385]
[<host>:25864] [ 3] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(NBC_Progress+0x1f3)[0x7f00d3330033]
[<host>:25864] [ 4] /opt/openmpi-4.1.2/lib/openmpi/mca_coll_libnbc.so(ompi_coll_libnbc_progress+0x8e)[0x7f00d332e84e]
[<host>:25864] [ 5] /opt/openmpi-4.1.2/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f00edefba3c]
[<host>:25864] [ 6] /opt/openmpi-4.1.2/lib/libopen-pal.so.40(ompi_sync_wait_mt+0xc5)[0x7f00edf025a5]
[<host>:25864] [ 7] /opt/openmpi-4.1.2/lib/libmpi.so.40(ompi_request_default_wait+0x1f9)[0x7f00ee4eafa9]
[<host>:25864] [ 8] /opt/openmpi-4.1.2/lib/libmpi.so.40(PMPI_Wait+0x52)[0x7f00ee532e02]
[<host>:25864] [ 9] /home/ubuntu/venv/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-x86_64-linux-gnu.so(+0xa81e2)[0x7f00ee8911e2]
[<host>:25864] [10] python[0x50a865]
[<host>:25864] [11] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
[<host>:25864] [12] python[0x509989]
[<host>:25864] [13] python[0x50a6bd]
[<host>:25864] [14] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
[<host>:25864] [15] python[0x507f94]
[<host>:25864] [16] python(PyRun_StringFlags+0xaf)[0x63500f]
[<host>:25864] [17] python[0x600911]
[<host>:25864] [18] python[0x50a4ef]
[<host>:25864] [19] python(_PyEval_EvalFrameDefault+0x444)[0x50c274]
[<host>:25864] [20] python[0x507f94]
[<host>:25864] [21] python(PyEval_EvalCode+0x23)[0x50b0d3]
[<host>:25864] [22] python[0x634dc2]
[<host>:25864] [23] python(PyRun_FileExFlags+0x97)[0x634e77]
[<host>:25864] [24] python(PyRun_SimpleFileExFlags+0x17f)[0x63862f]
[<host>:25864] [25] python(Py_Main+0x591)[0x6391d1]
[<host>:25864] [26] python(main+0xe0)[0x4b0d30]
[<host>:25864] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f00efcd5bf7]
[<host>:25864] [28] python(_start+0x2a)[0x5b2a5a]
[<host>:25864] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node <host> exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I appreciate any guidance!

opened by jmerizia 12

The kernel appears to have died. It will restart automatically.
Hi there,

I would really like to know how to fix or work around the problem I encountered when running a simple code snippet from the docs in a Jupyter notebook.

from mpi4py import MPI
from mpi4py.futures import MPICommExecutor

with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
    if executor is not None:
        future = executor.submit(abs, -42)
        assert future.result() == 42
        answer = set(executor.map(abs, [-42, 42]))
        assert answer == {42}

When I run this code with

MPI4PY_FUTURES_MAX_WORKERS=8 mpiexec -n 1 jupyter notebook

in a Jupyter notebook, I get the error "The kernel appears to have died. It will restart automatically." The problem happens on Linux; on Windows it works fine.

Thanks in advance.
opened by YarShev 6

3.1.3: sphinx fails

I have problems generating the documentation. Any hint as to what the cause could be?

+ /usr/bin/sphinx-build -j48 -n -T -b man docs/source/usrman build/sphinx/man
Running Sphinx v5.3.0
/usr/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (None)/charset_normalizer (3.0.0) doesn't match a supported version!
  warnings.warn(
making output directory... done
[pers-jacek:3312327] mca_base_component_repository_open: unable to open mca_pmix_ext3x: /usr/lib64/openmpi/mca_pmix_ext3x.so: undefined symbol: pmix_value_load (ignored)
[pers-jacek:3312327] [[17983,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[pers-jacek:3312325] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 716
[pers-jacek:3312325] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 172
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[pers-jacek:3312325] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

opened by kloczek 12

MPICommExecutor & MPIPoolExecutor Freeze Indefinitely

Architecture: Power9 (Summit Super Computer)

MPI Version:
Package: IBM Spectrum MPI
Spectrum MPI: 10.4.0.03rtm4
Spectrum MPI repo revision: IBM_SPECTRUM_MPI_10.04.00.03_2021.01.12_RTM4
Spectrum MPI release date: Unreleased developer copy

MPI4py Version: 3.1.1

Reproduce Script:

from mpi4py.futures import MPICommExecutor
from mpi4py import MPI
import time
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# square the numbers
def apply_fun(i):
    print('running apply!', flush=True)
    return i**2*rank

print('pid:',os.getpid(), flush=True)
print('rank:', rank, flush=True)

# this *does* implement map-reduce and supposedly works on legacy systems without dynamic process management
# (I've gotten it working with `jsrun -n 1` but so far no luck with multiple processes)
# see the docs: https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html?highlight=MPICommExecutor#mpicommexecutor
with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
    if executor is not None:
        print('Executor started from root!', flush=True)
        answer = list(executor.map(apply_fun, range(41)))
        print('pid: ',os.getpid(),'rank:',rank, answer, flush=True)

jsrun python mpi_test.py output:

Warning: OMP_NUM_THREADS=16 is greater than available PU's
Warning: OMP_NUM_THREADS=16 is greater than available PU's
pid: 1448946
rank: 1
pid: 1448945
rank: 0
Executor started from root!
running apply!

After that, the program freezes indefinitely. Note that jsrun is Summit's custom version of mpirun/mpiexec, and it generally works really well (in contrast to mpirun and mpiexec). With this exact same setup I had no problem using MPI.gather() and MPI.scatter(); only the Executors fail, which is troublesome because I really like the map-based API.

opened by profPlum 16

Wrap MPIX_Query_cuda_support

Our application, PyFR, makes very successful use of mpi4py and has support for CUDA-aware MPI implementations. Here, however, our biggest issue is knowing if the MPI distribution we are running under is CUDA aware or not. Although there does not appear to be a perfect solution, for OpenMPI derivatives, the following is reasonable:

https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-aware-support

with the API of interest being MPIX_Query_cuda_support. It would therefore be nice if mpi4py could expose this attribute (None/False/True).
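Until such an attribute exists, one possible workaround is to call the extension directly through ctypes. The sketch below is only an illustration under several assumptions: the helper name query_cuda_support is hypothetical, the library located by ctypes.util.find_library("mpi") may not be the one mpi4py was built against, and MPIX_Query_cuda_support is an Open MPI extension that other MPI implementations may not provide.

```python
import ctypes
import ctypes.util


def query_cuda_support():
    """Return True/False if the MPI library answers, or None if it cannot be asked.

    Hypothetical helper, not part of mpi4py.
    """
    path = ctypes.util.find_library("mpi")
    if path is None:
        return None  # no MPI shared library was found on this system
    try:
        libmpi = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        query = libmpi.MPIX_Query_cuda_support
    except (OSError, AttributeError):
        return None  # library not loadable, or the extension symbol is absent
    query.restype = ctypes.c_int
    query.argtypes = []
    return bool(query())


print(query_cuda_support())
```

The three-valued result mirrors the None/False/True attribute requested above: None means "unknown", which is different from a definite False.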

opened by FreddieWitherden 6

testPackUnpackExternal alignment error on sparc64

sparc64 is not the most common architecture around, but for what it's worth, 3.1.2 has started giving a Bus Error (Invalid address alignment) in testPackUnpackExternal (test_pack.TestPackExternal):

testProbeRecv (test_p2p_obj_matched.TestP2PMatchedWorldDup) ... ok
testPackSize (test_pack.TestPackExternal) ... ok
testPackUnpackExternal (test_pack.TestPackExternal) ... [sompek:142729] *** Process received signal ***
[sompek:142729] Signal: Bus error (10)
[sompek:142729] Signal code: Invalid address alignment (1)
[sompek:142729] Failing at address: 0xffff800100ea2821
[sompek:142729] *** End of error message ***
Bus error
make[1]: *** [debian/rules:91: override_dh_auto_test] Error 1

Full log at https://buildd.debian.org/status/fetch.php?pkg=mpi4py&arch=sparc64&ver=3.1.2-1&stamp=1636215944&raw=0

It previously passed with 3.1.1.
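As a quick sanity check on the log, the reported failing address is odd, so it cannot satisfy the natural alignment of any multi-byte type. SPARC raises a bus error on such accesses, whereas x86 usually tolerates them. The snippet below only inspects the address from the log; it does not identify which access in the test triggered the fault.

```python
failing_address = 0xffff800100ea2821  # copied from the log above

# An address is N-byte aligned when it is an exact multiple of N.
for alignment in (2, 4, 8):
    remainder = failing_address % alignment
    print(f"{alignment}-byte aligned: {remainder == 0} (remainder {remainder})")
```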

Ongoing sparc64 build logs at https://buildd.debian.org/status/logs.php?pkg=mpi4py&arch=sparc64
opened by drew-parsons 11

Releases(3.1.4)

3.1.4(Nov 2, 2022)
WARNING: This is the last release supporting Python 2.

Rebuild C sources with Cython 0.29.32 to support Python 3.11.

Fix contiguity check for DLPack and CAI buffers.

Workaround build failures with setuptools v60.
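To illustrate what a contiguity check for strided buffers involves, here is a minimal pure-Python sketch. It is not mpi4py's implementation: the function name is hypothetical, and it assumes strides are given in bytes, as in __cuda_array_interface__ (DLPack expresses strides in elements instead).

```python
def is_c_contiguous(shape, strides, itemsize):
    """Check whether (shape, strides) describes a C-contiguous buffer.

    Hypothetical helper; strides are assumed to be in bytes.
    """
    if strides is None:
        return True  # CAI uses None to mean "C-contiguous"
    if any(extent == 0 for extent in shape):
        return True  # an empty buffer is trivially contiguous
    expected = itemsize
    # Walk from the fastest-varying (last) dimension outwards.
    for extent, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 dimension is never used, so skip it.
        if extent != 1 and stride != expected:
            return False
        expected *= extent
    return True


print(is_c_contiguous((2, 3), (24, 8), 8))  # row-major layout of 8-byte items
print(is_c_contiguous((2, 3), (8, 16), 8))  # column-major layout
```

The two edge cases (empty buffers and length-1 dimensions) are the ones a naive stride comparison tends to get wrong.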

mpi4py-3.1.4.tar.gz(2.37 MB)
3.1.3(Nov 25, 2021)
WARNING: This is the last release supporting Python 2.

Add missing support for MPI.BOTTOM to generalized all-to-all collectives.

mpi4py-3.1.3.tar.gz(2.34 MB)
3.1.2(Nov 4, 2021)
WARNING: This is the last release supporting Python 2.

mpi4py.futures: Add _max_workers property to MPIPoolExecutor.

mpi4py.util.dtlib: Fix computation of alignment for predefined datatypes.

mpi4py.util.pkl5: Fix deadlock when using ssend() + mprobe().

mpi4py.util.pkl5: Add environment variable MPI4PY_PICKLE_THRESHOLD.

mpi4py.rc: Interpret "y" and "n" strings as boolean values.

Fix/add typemap/typestr for MPI.WCHAR/MPI.COUNT datatypes.

Minor fixes and additions to documentation.

Minor fixes to typing support.

Support for local version identifier (PEP-440).

mpi4py-3.1.2.tar.gz(2.34 MB)
3.1.1(Aug 14, 2021)
WARNING: This is the last release supporting Python 2.

Fix typo in Requires-Python package metadata.

Regenerate C sources with Cython 0.29.24.

mpi4py-3.1.1.tar.gz(2.33 MB)
3.1.0(Aug 12, 2021)
WARNING: This is the last release supporting Python 2.

New features:

mpi4py.util: New package collecting miscellaneous utilities.

Enhancements:

Add pickle-based Request.waitsome() and Request.testsome().

Add lowercase methods Request.get_status() and Request.cancel().

Support for passing Python GPU arrays compliant with the DLPack data interchange mechanism and the __cuda_array_interface__ (CAI) standard to uppercase methods. This support requires that mpi4py is built against CUDA-aware MPI implementations. This feature is currently experimental and subject to future changes.

mpi4py.futures: Add support for initializers and canceling futures at shutdown. Environment variable names now follow the pattern MPI4PY_FUTURES_*; the previous MPI4PY_* names are deprecated.

Add type annotations to Cython code. The first line of the docstring of functions and methods displays a signature including type annotations.

Add companion stub files to support type checkers.

Support for weak references.
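The initializer support added to mpi4py.futures follows the concurrent.futures convention of initializer and initargs arguments. The sketch below uses the standard library's ThreadPoolExecutor as a stand-in so that it runs without an MPI installation; the worker functions are hypothetical, and with mpi4py the same arguments would be passed to MPIPoolExecutor, where each worker is a separate process rather than a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-worker state; with process-based workers a module global would do.
_state = threading.local()


def init_worker(scale):
    """Runs once in every worker before it accepts any task."""
    _state.scale = scale


def scaled_abs(x):
    """Task that relies on the state prepared by init_worker."""
    return _state.scale * abs(x)


with ThreadPoolExecutor(max_workers=2,
                        initializer=init_worker,
                        initargs=(10,)) as executor:
    results = list(executor.map(scaled_abs, [-1, 2, -3]))

print(results)
```

An initializer is a convenient place for one-time setup such as loading a model or opening a file, so that the cost is not paid on every task.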

Miscellaneous:

Add a new mpi4py publication to the citation listing.

mpi4py-3.1.0.tar.gz(2.33 MB)
3.0.3(Jul 27, 2020)
Regenerate Cython wrappers to support Python 3.8.

mpi4py-3.0.3.tar.gz(1.36 MB)
3.0.2(Jul 27, 2020)
Bug fixes:

Fix handling of readonly buffers in support for Python 2 legacy buffer interface. The issue triggers only when using a buffer-like object that is readonly and does not export the new Python 3 buffer interface.

Fix build issues with Open MPI 4.0.x series related to removal of many MPI-1 symbols deprecated in MPI-2 and removed in MPI-3.

Minor documentation fixes.

mpi4py-3.0.2.tar.gz(1.36 MB)
3.0.1(Jul 27, 2020)
Bug fixes:

Fix Comm.scatter() and other collectives corrupting input send list. Add safety measures to prevent related issues in global reduction operations.

Fix error-checking code for counts in Op.Reduce_local().

Enhancements:

Map size-specific Python/NumPy typecodes to MPI datatypes.

Allow partial specification of target list/tuple arguments in the various Win RMA methods.

Workaround for removal of MPI_{LB|UB} in Open MPI 4.0.

Support for Microsoft MPI v10.0.

mpi4py-3.0.1.tar.gz(1.36 MB)
3.0.0(Jul 27, 2020)
New features:

mpi4py.futures: Execute computations asynchronously using a pool of MPI processes. This package is based on concurrent.futures from the Python standard library.

mpi4py.run: Run Python code and abort execution in case of unhandled exceptions to prevent deadlocks.

mpi4py.bench: Run basic MPI benchmarks and tests.

Enhancements:

Lowercase, pickle-based collective communication calls are now thread-safe through the use of fine-grained locking.

The MPI module now exposes a memory type which is a lightweight variant of the builtin memoryview type, but exposes both the legacy Python 2 and the modern Python 3 buffer interface under a Python 2 runtime.

The MPI.Comm.Alltoallw() method now uses count=1 and displ=0 as defaults, assuming that messages are specified through user-defined datatypes.

The Request.Wait[all]() methods now return True to match the interface of Request.Test[all]().

The Win class now implements the Python buffer interface.

Backward-incompatible changes:

The buf argument of the MPI.Comm.recv() method is deprecated, passing anything but None emits a warning.

The MPI.Win.memory property was removed, use the MPI.Win.tomemory() method instead.

Executing python -m mpi4py in the command line is now equivalent to python -m mpi4py.run. For the former behavior, use python -m mpi4py.bench.

Python 2.6 and 3.2 are no longer supported. The mpi4py.MPI module may still build and partially work, but other pure-Python modules under the mpi4py namespace will not.

Windows: Remove support for legacy MPICH2, Open MPI, and DeinoMPI.

mpi4py-3.0.0.tar.gz(1.36 MB)
2.0.0(Jul 27, 2020)
Support for MPI-3 features.

Matched probes and receives.

Nonblocking collectives.

Neighborhood collectives.

New communicator constructors.

Request-based RMA operations.

New RMA communication and synchronisation calls.

New window constructors.

New datatype constructor.

New C++ boolean and floating complex datatypes.

Support for MPI-2 features not included in previous releases.

Generalized All-to-All collective (Comm.Alltoallw())

User-defined data representations (Register_datarep())

New scalable implementation of reduction operations for Python objects. This code is based on binomial tree algorithms using point-to-point communication and duplicated communicator contexts. To disable this feature, use mpi4py.rc.fast_reduce = False.
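To show why a binomial tree scales well, the following sketch simulates a reduce-to-rank-0 in plain Python. It is a sequential illustration only, not mpi4py's code: the function name is hypothetical, and each list slot stands in for the partial result held by one rank, with the inner loop standing in for the point-to-point messages of one round.

```python
import operator


def binomial_tree_reduce(values, op):
    """Simulate a binomial-tree reduction to rank 0.

    Hypothetical helper; values[i] is the contribution of rank i.
    Returns the reduced value and the number of communication rounds.
    """
    if not values:
        raise ValueError("at least one rank is required")
    partial = list(values)
    size = len(partial)
    rounds = 0
    mask = 1
    while mask < size:
        # In this round, rank r receives from rank r + mask and combines.
        for rank in range(0, size, 2 * mask):
            partner = rank + mask
            if partner < size:
                partial[rank] = op(partial[rank], partial[partner])
        mask <<= 1
        rounds += 1
    return partial[0], rounds


total, rounds = binomial_tree_reduce([1, 2, 3, 4, 5], operator.add)
print(total, rounds)
```

With five ranks the result is available after three rounds instead of the four sequential steps a linear chain would need; in general the round count grows as the base-2 logarithm of the number of ranks. Operands are always combined in rank order, so the scheme also works for associative but non-commutative operations.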

Backward-incompatible changes:

Python 2.4, 2.5, 3.0 and 3.1 are no longer supported.

Default MPI error handling policies are overridden. After import, mpi4py sets the ERRORS_RETURN error handler in COMM_SELF and COMM_WORLD, as well as any new Comm, Win, or File instance created through mpi4py, thus effectively ignoring the MPI rules about error handler inheritance. This way, MPI errors translate to Python exceptions. To disable this behavior and use the standard MPI error handling rules, use mpi4py.rc.errors = 'default'.

Change signature of all send methods, dest is a required argument.

Change signature of all receive and probe methods, source defaults to ANY_SOURCE, tag defaults to ANY_TAG.

Change signature of send lowercase-spelling methods, obj arguments are not mandatory.

Change signature of recv lowercase-spelling methods, renamed 'obj' arguments to 'buf'.

Change Request.Waitsome() and Request.Testsome() to return None or list.

Change signature of all lowercase-spelling collectives, sendobj arguments are now mandatory, recvobj arguments were removed.

Reduction operations MAXLOC and MINLOC are no longer special-cased in lowercase-spelling methods Comm.[all]reduce() and Comm.[ex]scan(), the input object must be specified as a tuple (obj, location).

Change signature of name publishing functions. The new signatures are Publish_name(service_name, port_name, info=INFO_NULL) and Unpublish_name(service_name, port_name, info=INFO_NULL).

Win instances now cache Python objects exposing memory by keeping references instead of using MPI attribute caching.

Change signature of Win.Lock(). The new signature is Win.Lock(rank, lock_type=LOCK_EXCLUSIVE, assertion=0).

Move Cartcomm.Map() to Intracomm.Cart_map().

Move Graphcomm.Map() to Intracomm.Graph_map().

Remove the mpi4py.MPE module.

Rename the Cython definition file for use with cimport statement from mpi_c.pxd to libmpi.pxd.
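As an illustration of the MAXLOC/MINLOC change listed above, the sketch below reproduces in plain Python what a MAXLOC reduction computes over (obj, location) tuples. It is a sequential stand-in, not mpi4py code: the maxloc helper and the sample data are hypothetical, and in a real program each rank would pass its own tuple to a call such as comm.allreduce((value, rank), op=MPI.MAXLOC).

```python
from functools import reduce


def maxloc(a, b):
    """Combine two (value, location) pairs with MAXLOC semantics.

    Hypothetical helper: keep the larger value; on a tie, keep the
    smaller location.
    """
    (value_a, loc_a), (value_b, loc_b) = a, b
    if value_a > value_b:
        return a
    if value_b > value_a:
        return b
    return (value_a, min(loc_a, loc_b))


# One (value, rank) tuple per simulated rank.
contributions = [(3.5, 0), (7.25, 1), (7.25, 2), (1.0, 3)]
winner = reduce(maxloc, contributions)
print(winner)
```

Passing the location explicitly in the tuple is what the release note means by the operations no longer being special-cased: the caller decides what the location is, typically the rank.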

mpi4py-2.0.0.tar.gz(1.22 MB)
1.3.1(Jul 27, 2020)
Regenerate C wrappers with Cython 0.19.1 to support Python 3.3.

Install *.pxd files in <site-packages>/mpi4py to ease support for Cython's cimport statement in code that needs to access mpi4py internals.

As a side-effect of using Cython 0.19.1, ancient Python 2.3 is no longer supported. If you really need it, you can install an older Cython and run python setup.py build_src --force.

mpi4py-1.3.1.tar.gz(1022.05 KB)