Interactive Parallel Computing in Python

Overview

Interactive Parallel Computing with IPython

ipyparallel is the new home of IPython.parallel. ipyparallel is a Python package and collection of CLI scripts for controlling clusters for Jupyter.

ipyparallel contains the following CLI scripts:

  • ipcluster - start/stop a cluster
  • ipcontroller - start a scheduler
  • ipengine - start an engine

Install

Install ipyparallel:

pip install ipyparallel

To enable the IPython Clusters tab in Jupyter Notebook:

ipcluster nbextension enable

To disable it again:

ipcluster nbextension disable

See the documentation on configuring the notebook server to find your config or set up your initial jupyter_notebook_config.py.

JupyterHub Install

To install for all users on JupyterHub, as root:

jupyter nbextension install --sys-prefix --py ipyparallel
jupyter nbextension enable --sys-prefix --py ipyparallel
jupyter serverextension enable --sys-prefix --py ipyparallel

Run

Start a cluster:

ipcluster start

Use it from Python:

import os
import ipyparallel as ipp

rc = ipp.Client()
ar = rc[:].apply_async(os.getpid)
pid_map = ar.get_dict()

See the docs for more info.

Comments
  • Cannot enable debug in mpi engines `WARNING | debugpy_stream undefined, debugging will not be enabled`

    When I start the ipcluster by

    ipcluster start --engines=mpi -n 2 --debug
    

    The engines report the following warning:

    2021-11-27 13:31:41.056 [IPEngine.0] WARNING | debugpy_stream undefined, debugging will not be enabled
    2021-11-27 13:31:41.061 [IPEngine.1] WARNING | debugpy_stream undefined, debugging will not be enabled
    

    How can I enable the debugging in engines? Thanks!

    The output of the cluster is attached: cluster.log

    opened by lrtfm 23
  • install from git fails with FIPS-compliant nodejs

    I'm trying to package your module as an rpm package, so I'm using the typical PEP 517 based build, install, and test cycle for building packages from a non-root account:

    • python3 -sBm build -w
    • install .whl file in </install/prefix>
    • run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    In this case the above procedure fails at the first step:

    + /usr/bin/python3 -sBm build -w
    * Creating venv isolated environment...
    * Installing packages in isolated environment... (jupyterlab>=3.0.0,==3.*, packaging, setuptools>=40.8.0, wheel)
    * Getting dependencies for wheel...
    running egg_info
    creating ipyparallel.egg-info
    writing manifest file 'ipyparallel.egg-info/SOURCES.txt'
    no previously-included directories found matching 'lab/lib'
    warning: no directories found matching 'ipyparallel/labextension'
    no previously-included directories found matching 'docs/_build'
    warning: no previously-included files matching '*~' found anywhere in distribution
    warning: no previously-included files matching '*.pyc' found anywhere in distribution
    warning: no previously-included files matching '*.pyo' found anywhere in distribution
    warning: no previously-included files matching '.git' found anywhere in distribution
    warning: no previously-included files matching '.ipynb_checkpoints' found anywhere in distribution
    warning: no previously-included files matching '.DS_Store' found anywhere in distribution
    writing manifest file 'ipyparallel.egg-info/SOURCES.txt'
    * Installing packages in isolated environment... (wheel)
    * Building wheel...
    running bdist_wheel
    running pre_dist
    yarn not found, ignoring yarn.lock file
    yarn install v1.21.1
    [1/4] Resolving packages...
    [2/4] Fetching packages...
    [3/4] Linking dependencies...
    warning " > @lumino/[email protected]" has unmet peer dependency "[email protected]".
    warning "@jupyterlab/builder > @jupyterlab/buildutils > verdaccio > [email protected]" has unmet peer dependency "typanion@*".
    warning Workspaces can only be enabled in private projects.
    warning Workspaces can only be enabled in private projects.
    [4/4] Building fresh packages...
    warning Your current version of Yarn is out of date. The latest version is "1.22.17", while you're on "1.21.1".
    Done in 6.96s.
    yarn run v1.21.1
    $ jlpm run build:lib && jlpm run build:labextension
    $ tsc
    $ jupyter labextension build .
    Building extension in .
    
    Compilation starting…
    
    node:internal/crypto/hash:67
      this[kHandle] = new _Hash(algorithm, xofLen);
                      ^
    
    Error: error:0308010C:digital envelope routines::unsupported
        at new Hash (node:internal/crypto/hash:67:19)
        at Object.createHash (node:crypto:130:10)
        at BulkUpdateDecorator.hashFactory (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/util/createHash.js:145:18)
        at BulkUpdateDecorator.update (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/util/createHash.js:46:50)
        at RawSource.updateHash (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/node_modules/webpack-sources/lib/RawSource.js:70:8)
        at NormalModule._initBuildHash (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/NormalModule.js:880:17)
        at handleParseResult (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/NormalModule.js:946:10)
        at /home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/NormalModule.js:1040:4
        at processResult (/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/NormalModule.js:755:11)
        at /home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/webpack/lib/NormalModule.js:819:5 {
      opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
      library: 'digital envelope routines',
      reason: 'unsupported',
      code: 'ERR_OSSL_EVP_UNSUPPORTED'
    }
    An error occurred.
    subprocess.CalledProcessError: Command '['node', '/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/node_modules/@jupyterlab/builder/lib/build-labextension.js', '--core-path', '/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/jupyterlab/staging', '/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0']' returned non-zero exit status 1.
    See the log file for details:  /tmp/jupyterlab-debug-o3qqk3av.log
    error Command failed with exit code 1.
    info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
    error Command failed with exit code 1.
    info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
    Traceback (most recent call last):
      File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 363, in <module>
        main()
      File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 345, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 261, in build_wheel
        return _build_backend().build_wheel(wheel_directory, config_settings,
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/build_meta.py", line 230, in build_wheel
        return self._build_with_temp_dir(['bdist_wheel'], '.whl',
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
        self.run_setup()
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup
        exec(compile(code, __file__, 'exec'), locals())
      File "setup.py", line 173, in <module>
        setuptools.setup(**setup_args)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/core.py", line 148, in setup
        return run_commands(dist)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
        dist.run_commands()
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
        self.run_command(cmd)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
        cmd_obj.run()
      File "/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/setupbase.py", line 131, in run
        self.run_command(pre_build.__name__)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/tmp/build-env-lghtzk_q/lib64/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
        cmd_obj.run()
      File "/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/setupbase.py", line 112, in run
        func()
      File "/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/setupbase.py", line 214, in builder
        run(npm_cmd + ['run', build_cmd], cwd=node_package)
      File "/home/tkloczko/rpmbuild/BUILD/ipyparallel-8.1.0/setupbase.py", line 275, in run
        return subprocess.check_call(cmd, **kwargs)
      File "/usr/lib64/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/tmp/build-env-lghtzk_q/bin/jlpm', 'run', 'build:prod']' returned non-zero exit status 1.
    
    ERROR Backend subproccess exited when trying to invoke build_wheel
    
    opened by kloczek 19
  • AttributeError: module 'numpy' has no attribute '__version__'

    I am getting the following error message when I attempt to import pandas:

    import pandas
    AttributeError: module 'numpy' has no attribute '__version__'

    Following is the full error message:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    in <module>
          1 import numpy
    ----> 2 import pandas
          3
          4
          5 print("hello")

    C:\ProgramData\Anaconda3\lib\site-packages\pandas\__init__.py in <module>
         21
         22 # numpy compat
    ---> 23 from pandas.compat.numpy import *
         24
         25 try:

    C:\ProgramData\Anaconda3\lib\site-packages\pandas\compat\numpy\__init__.py in <module>
          8
          9 # numpy versioning
    ---> 10 _np_version = np.__version__
         11 _nlv = LooseVersion(_np_version)
         12 _np_version_under1p10 = _nlv < LooseVersion('1.10')

    AttributeError: module 'numpy' has no attribute '__version__'
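The report does not identify a cause. Two frequent ones (an assumption here, not something the report confirms) are a local file or directory named `numpy` shadowing the real package, and a partially removed installation. A minimal diagnostic sketch follows; `describe_module` is a hypothetical helper, and a bare module object stands in for the broken `numpy` so the snippet runs without NumPy installed.

```python
import os
import types


def describe_module(mod: types.ModuleType) -> dict:
    """Summarize where a module was loaded from and whether it looks complete."""
    path = getattr(mod, "__file__", None)
    return {
        "name": mod.__name__,
        "file": path,
        "version": getattr(mod, "__version__", None),
        # A module loaded from the current directory may be shadowing the real package.
        "in_cwd": path is not None
        and os.path.dirname(os.path.abspath(path)) == os.getcwd(),
    }


# Stand-in for a broken or shadowed numpy: a module object with no __version__.
fake_numpy = types.ModuleType("numpy")
info = describe_module(fake_numpy)
print(info)
```

Against the real package one would call `describe_module(numpy)` after `import numpy`; a missing `file` or a `file` inside the working directory would point toward shadowing or a damaged install.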

    opened by mehdiHadji 16
  • ipcluster creates one icon per engine in Dock.app

    Since I upgraded to 10.11.x (currently 10.11.6), every launch of engines with ipcluster creates an icon per engine in my macOS Dock.app. Can this be avoided? Strangely, I don't find any tips on that with Google (the google-fu is strong in this one), so I'm afraid I have to bother you guys.

    1. The command:

    ipcluster start -n 6

    2. The env:
    • ipcluster --version: 5.2.0
    • python 3.5
    • running via conda, using conda_forge channel for everything
    3. The symptom:
    screenshot 2016-11-08 14 02 21
    opened by michaelaye 15
  • Basic Streaming Output Implementation

    A very basic implementation to enable streaming stdout/stderr/display-outputs.

    Closes #434 (I think).

    Enabled by default for blocking execution when using the %%px magic.

    opened by sahil1105 14
  • Cannot read large buffer/file from engine

    I'm trying to read a file/large string buffer from an engine, but it returns <memory at 0x036F3B70> instead of the file's data.

    To reproduce:

    1. Start an ipcluster
    2. Create a client
    3. Create a file > 1MB
    4. Read the file using apply function
    
    In [2]: from ipyparallel import Client
    
    In [3]: c = Client()
    
    In [4]: d = c[0]
    
    In [5]: def read(path):
       ...:     with open(path,'rb') as f:
       ...:         return f.read(1024000)
       ...:
    
    In [6]: r = d.apply_async(read,p)
    
    In [7]: r.get()
    Out[7]: '<memory at 0x0319E5D0>'
    
    In [8]: def read():
       ...:     return 'a'*1024000
       ...:
    
    In [9]: r = d.apply_async(read)
    
    In [10]: r.get()
    Out[10]: '<memory at 0x0319E580>'
    
    

    This worked fine on 3.1.0.
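The string `'<memory at 0x...>'` is what plain Python produces when a `memoryview` is converted with `str()`. Whether that conversion is what happens inside ipyparallel for large results is an assumption; the sketch below only reproduces the symptom with the standard library and shows how the underlying bytes can be recovered from a view.

```python
payload = b"a" * 1024000      # same size as the buffer in the report
view = memoryview(payload)    # a zero-copy view onto the bytes

as_text = str(view)           # the object's repr, not its contents
recovered = view.tobytes()    # copies the underlying data back out

print(as_text.startswith("<memory at 0x"))
print(recovered == payload)
```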

    opened by frmdstryr 13
  • Boost python exceptions crashing engines

    I am trying to use ipyparallel to parallelize the workload in a Python project largely consisting of a C++ codebase exposed to Python via Boost.Python. I am using a local ipcluster for testing:

    ipcluster start -n 4 --debug
    

    I am using the task based interface:

    >>> import ipyparallel
    >>> rc = ipyparallel.Client()
    >>> lview = rc.load_balanced_view()
    >>> ar = lview.apply(...)
    >>> ar.get()
    

    Everything seems to work properly in my initial testing, until I try to distribute a task in which C++ code throws an exception (FYI, Boost.Python has a mechanism in which C++ exceptions are caught at the boundary between C++ and Python, and re-thrown in Python as Python exceptions -- in many years of using Boost Python, this mechanism has always been working flawlessly for my use cases).

    When I try to distribute a task that throws from C++, I can see the following happening in the debug output of ipcluster:

    2017-03-05 22:07:17.940 [IPClusterStart] Process '/usr/bin/python3.5' stopped: {'pid': 18580, 'exit_code': -6}
    2017-03-05 22:07:21.520 [IPClusterStart] b"2017-03-05 22:07:21.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 1"
    2017-03-05 22:07:24.520 [IPClusterStart] b"2017-03-05 22:07:24.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 2"
    2017-03-05 22:07:27.520 [IPClusterStart] b"2017-03-05 22:07:27.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 3"
    2017-03-05 22:07:30.520 [IPClusterStart] b"2017-03-05 22:07:30.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 4"
    2017-03-05 22:07:33.520 [IPClusterStart] b"2017-03-05 22:07:33.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 5"
    2017-03-05 22:07:36.520 [IPClusterStart] b"2017-03-05 22:07:36.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 6"
    2017-03-05 22:07:39.520 [IPClusterStart] b"2017-03-05 22:07:39.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 7"
    2017-03-05 22:07:42.520 [IPClusterStart] b"2017-03-05 22:07:42.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 8"
    2017-03-05 22:07:45.520 [IPClusterStart] b"2017-03-05 22:07:45.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 9"
    2017-03-05 22:07:48.520 [IPClusterStart] b"2017-03-05 22:07:48.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 10"
    2017-03-05 22:07:51.520 [IPClusterStart] b"2017-03-05 22:07:51.520 [IPControllerApp] heartbeat::missed b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' : 11"
    2017-03-05 22:07:51.521 [IPClusterStart] b'2017-03-05 22:07:51.520 [IPControllerApp] registration::unregister_engine(3)'
    2017-03-05 22:07:56.531 [IPClusterStart] b'ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3ce0377598>)'
    2017-03-05 22:07:56.535 [IPClusterStart] b'Traceback (most recent call last):'
    2017-03-05 22:07:56.535 [IPClusterStart] b'  File "/usr/lib64/python3.5/site-packages/tornado/ioloop.py", line 604, in _run_callback'
    2017-03-05 22:07:56.535 [IPClusterStart] b'    ret = callback()'
    2017-03-05 22:07:56.536 [IPClusterStart] b'  File "/usr/lib64/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper'
    2017-03-05 22:07:56.536 [IPClusterStart] b'    return fn(*args, **kwargs)'
    2017-03-05 22:07:56.536 [IPClusterStart] b'  File "/usr/lib64/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 325, in <lambda>'
    2017-03-05 22:07:56.536 [IPClusterStart] b'    lambda : self.handle_stranded_tasks(uid),'
    2017-03-05 22:07:56.536 [IPClusterStart] b'  File "/usr/lib64/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 335, in handle_stranded_tasks'
    2017-03-05 22:07:56.536 [IPClusterStart] b'    for msg_id in lost.keys():'
    2017-03-05 22:07:56.537 [IPClusterStart] b'RuntimeError: dictionary changed size during iteration'
    2017-03-05 22:07:56.537 [IPClusterStart] b"2017-03-05 22:07:56.534 [IPControllerApp] task::task '61317ff8-c759-4faa-aebb-2fd57595c586' finished on 3"
    

    On the ipython side, when I try to .get() the result from the future, the interpreter waits for a bit and then says:

    In [14]: ar.get()
    Traceback (most recent call last):
      File "/usr/lib64/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 347, in handle_stranded_tasks
        raise error.EngineError("Engine %r died while running task %r"%(engine, msg_id))
    ipyparallel.error.EngineError: Engine b'af9a4de7-e140-4a91-b4bb-c7089b04adc9' died while running task '61317ff8-c759-4faa-aebb-2fd57595c586'
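The `RuntimeError: dictionary changed size during iteration` in the controller log above is a general Python failure mode: a dict is modified while a loop walks its live keys view. The sketch below is not the scheduler's code; `lost` and the task names are hypothetical stand-ins that show the error and one common remedy, iterating over a snapshot of the keys.

```python
def make_lost():
    """Build a small stand-in for a mapping of stranded task ids to engines."""
    return {"task-1": "engine-3", "task-2": "engine-3", "task-3": "engine-3"}


# Removing entries while iterating the live keys view raises RuntimeError.
lost = make_lost()
error_message = None
try:
    for msg_id in lost.keys():
        lost.pop(msg_id)
except RuntimeError as exc:
    error_message = str(exc)

# Iterating over a snapshot of the keys lets the loop body mutate the dict safely.
lost = make_lost()
handled = []
for msg_id in list(lost):
    handled.append(msg_id)
    lost.pop(msg_id)

print(error_message)
print(handled, lost)
```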
    

    I am able to reproduce the problem with a minimal Boost.Python module consisting of a single class with a single method which throws an std::invalid_argument exception, which normally is translated into a ValueError Python exception. I can make this minimal example available in a GitHub repo if it can help to debug the issue. The exception translation works fine from the ipython prompt:

    In [1]: import ipy_testmod
    
    In [2]: s = ipy_testmod.my_struct()
    
    In [3]: s.my_method(123)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-ba5412712f47> in <module>()
    ----> 1 s.my_method(123)
    
    ValueError: error!
    
    In [4]:
    

    It just breaks down when thrown from an ipyparallel engine:

    In [7]: def func(x, n):
       ...:     return x.my_method(n)
       ...: 
    
    In [8]: ar = lview.apply(func, s, 123)
    
    In [9]: ar.get()
    Traceback (most recent call last):
      File "/usr/lib64/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 347, in handle_stranded_tasks
        raise error.EngineError("Engine %r died while running task %r"%(engine, msg_id))
    ipyparallel.error.EngineError: Engine b'03245294-8978-4b6c-a3c6-abf4e0cb7710' died while running task '936c8b82-bbf9-4723-955b-657c048d93e3'
    

    Wrapping the throwing code into a try/except block does not seem to help:

    In [10]: def func(x, n):
        ...:     try:
        ...:         return x.my_method(n)
        ...:     except:
        ...:         return 23
        ...: 
    
    In [11]: ar = lview.apply(func, s, 123)
    
    In [12]: ar.get()
    Traceback (most recent call last):
      File "/usr/lib64/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 347, in handle_stranded_tasks
        raise error.EngineError("Engine %r died while running task %r"%(engine, msg_id))
    ipyparallel.error.EngineError: Engine b'91e8abe8-2371-469c-af4b-f6d376ed3e5e' died while running task '3b1e0902-a321-4d06-affa-d9417c18b2d4'
    
    In [13]:
    

    It seems like just the act of throwing from C++ brings down the engine.
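The launcher log reports `'exit_code': -6` for the dead engine. Assuming the launcher follows the convention of Python's `subprocess` module on POSIX, a negative code `-N` means the process was terminated by signal `N`, and signal 6 on Linux is `SIGABRT`, the signal raised by `abort()` when a C++ exception goes uncaught. The helper below is a hypothetical sketch for decoding such codes, not part of ipyparallel.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a subprocess-style exit code into a readable description."""
    if code >= 0:
        return f"exited with status {code}"
    signum = -code
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The number does not map to a signal known on this platform.
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


print(describe_exit_code(-6))
print(describe_exit_code(0))
```

If the decoded signal is indeed `SIGABRT`, that would suggest the exception is escaping the C++ side before Boost.Python can translate it, which would also explain why the Python-level try/except has no effect.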

    opened by bluescarni 12
  • nbextension does not enable clusters tab

    I have a conda environment, in which I installed ipyparallel using pip install ipyparallel.

    I tried every iteration of nbextension enable I found online, but still the "Clusters" tab does not work on my jupyter notebook. It appears that the extension is enabled:

    [conda_environment] λ jupyter nbextension list
    Known nbextensions:
      config dir: C:\anaconda3\envs\conda_environment\etc\jupyter\nbconfig
        notebook section
          jupyter-js-widgets/extension enabled
          - Validating: ok
        tree section
          ipyparallel/main enabled
          - Validating: ok
      config dir: C:\ProgramData\jupyter\nbconfig
        tree section
          ipyparallel/main enabled
          - Validating: ok
    

    But when I start up my notebook, I see:

    image

    Here is the notebook terminal output, which makes it look like the extension is being loaded:

    [conda_environment] λ jupyter notebook
    [I 12:11:25.173 NotebookApp] Loading IPython parallel extension
    [I 12:11:25.424 NotebookApp] Serving notebooks from local directory: C:\Users\username
    [I 12:11:25.425 NotebookApp] 0 active kernels
    [I 12:11:25.425 NotebookApp] The Jupyter Notebook is running at:
    [I 12:11:25.426 NotebookApp] http://localhost:8888/?token=XXXXXXXXXXXXXX
    [I 12:11:25.426 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 12:11:25.430 NotebookApp]
    
        Copy/paste this URL into your browser when you connect for the first time,
        to login with a token:
            http://localhost:8888/?token=XXXXXXXXXXXXXXXXXXXXXX
    [I 12:11:25.817 NotebookApp] Accepting one-time-token-authenticated connection from ::1
    

    There is some error in the console, but it doesn't mean much (to me, at least):

    Error: Syntax error, unrecognized expression: [href=#clusters]
        at Function.fa.error (jquery.min.js:2)
        at fa.tokenize (jquery.min.js:2)
        at Function.fa [as find] (jquery.min.js:2)
        at n.fn.init.find (jquery.min.js:2)
        at Object.load [as load_ipython_extension] (main.js?v=20180426121124:36)
        at utils.js:39
        at Object.execCb (require.js?v=6da8be361b9ee26c5e721e76c6d4afce:1690)
        at Module.check (require.js?v=6da8be361b9ee26c5e721e76c6d4afce:865)
        at Module.<anonymous> (require.js?v=6da8be361b9ee26c5e721e76c6d4afce:1140)
        at require.js?v=6da8be361b9ee26c5e721e76c6d4afce:131
    
    opened by jat255 11
  • concurrent.futures compatibility (IPEP 19)

    @aarchiba opened ipython/ipython#8893

    Since concurrent.futures is standard in Python >= 3.4 and backported to Python 2.7, it is a good way to write portable parallel code. Algorithms that support parallelism can take a pool argument and work with whatever form of parallelism the user chooses, except IPython parallelism right now. It would be valuable to add an Executor/Future compatibility layer.
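The "take a pool argument" pattern described here can be sketched with the standard library alone. `sum_of_squares` and `square` are hypothetical names; a `ThreadPoolExecutor` stands in for whichever executor the caller prefers.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def square(x: int) -> int:
    """The unit of work handed to the pool."""
    return x * x


def sum_of_squares(values, pool: Executor) -> int:
    """Square each value on the given pool and add up the results."""
    # Only the Executor interface is used, so any compatible pool works.
    return sum(pool.map(square, values))


with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum_of_squares(range(10), pool)

print(total)
```

Later ipyparallel releases appear to expose an Executor-compatible object (for example through a view's `executor` attribute) that could be passed in place of the thread pool; check the documentation of the installed version before relying on it.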

    opened by minrk 11
  • man page files moved from ipython

    I don't know if these are worth keeping - they basically just say 'see --help'. But I'm moving them out of IPython, and this is the obvious place for them to go if we do keep them.

    opened by takluyver 11
  • Feature proposal: Launch ipcluster from python

    For simple parallelization requirements on a single host it is often an extra burden (especially when designing APIs) to require the user to launch ipcluster. If instead I could also launch these from within my Python library it could be a very nice replacement for multiprocessing which supports this mode. Specifically, if my library is using multiprocessing the user doesn't have to care about anything and get single-machine parallelization for free. The problem is that multiprocessing sucks and I'd much rather use ipyparallel for this as well.

    I suppose there might be an unsupported way to do this already by importing ipcluster somehow.

    Anyway, I think that would be useful for certain problems.
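Later ipyparallel releases (7.0 onward) added a `Cluster` class that covers this use case. The sketch below assumes such a version is installed. `engine_pids` is a hypothetical wrapper, and the import is done lazily so the function can be defined even where ipyparallel is missing.

```python
import os


def engine_pids(n: int = 2) -> list:
    """Start n local engines, ask each for its PID, then shut the cluster down."""
    # Imported lazily so this module can be loaded without ipyparallel installed.
    import ipyparallel as ipp

    # Entering the context starts a controller and n engines and returns a
    # connected client; leaving it stops the cluster again.
    with ipp.Cluster(n=n) as rc:
        return rc[:].apply_sync(os.getpid)
```

Calling `engine_pids(4)` from a library would give the caller single-machine parallelism without a separate `ipcluster start` step, much as multiprocessing does.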

    opened by twiecki 11
  • Run again with 480 engines, 20 tasks/engine (top is main, 3093386f , bottom is this pr, a98a4fa). Workload is load-balanced submission of random 0-1s tasks (same seed), 20 tasks/engine for a total of 9600 tasks.

        Run again with 480 engines, 20 tasks/engine (top is main, 3093386f , bottom is this pr, a98a4fa). Workload is load-balanced submission of random 0-1s tasks (same seed), 20 tasks/engine for a total of 9600 tasks.
    
    Screen Shot 2021-07-21 at 14 25 20

    Can see that while the client is working to produce the tasks, there is still contention between serializing in the main thread and actually sending in the io thread until the main thread is done (purple line). This completes 1s faster in this PR (7.6s vs 8.4s). The first result doesn't arrive for 2 more seconds, which is really around when the last real send completes and receives start being processed.

    The bubble can be seen around 11s in main, which is where sends and receives are both being processed, and this is gone after this PR.

    Originally posted by @minrk in https://github.com/ipython/ipyparallel/issues/534#issuecomment-884152890

    opened by Cathy131415 2
  • Trying to start cluster trait error

    I keep trying to launch a cluster through both Jupyter notebook and the terminal interface. Below is the output of the ipcluster start command.

    ipcluster start -n 1 --debug
    2022-08-04 17:43:25.240 [IPClusterStart] IPYTHONDIR set to: /home/dylan/.ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Using existing profile dir: '/home/dylan/.ipython/profile_default'
    2022-08-04 17:43:25.241 [IPClusterStart] Searching path ['/home/dylan', '/home/dylan/.ipython/profile_default', '/usr/etc/ipython', '/usr/local/etc/ipython', '/etc/ipython'] for config files
    2022-08-04 17:43:25.241 [IPClusterStart] Attempting to load config file: ipython_config.py
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipython_config in /etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipython_config in /usr/local/etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipython_config in /usr/etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipython_config in /home/dylan/.ipython/profile_default
    2022-08-04 17:43:25.241 [IPClusterStart] Loaded config file: /home/dylan/.ipython/profile_default/ipython_config.py
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipython_config in /home/dylan
    2022-08-04 17:43:25.241 [IPClusterStart] Attempting to load config file: ipcluster_config.py
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipcluster_config in /etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipcluster_config in /usr/local/etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipcluster_config in /usr/etc/ipython
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipcluster_config in /home/dylan/.ipython/profile_default
    2022-08-04 17:43:25.241 [IPClusterStart] Looking for ipcluster_config in /home/dylan
    2022-08-04 17:43:25.242 [IPClusterStart] Forwarding SIGUSR1 to engines
    2022-08-04 17:43:25.242 [IPClusterStart] Forwarding SIGUSR2 to engines
    2022-08-04 17:43:25.242 [IPClusterStart] Not forwarding SIGINFO
    2022-08-04 17:43:25.243 [IPClusterStart] Starting ipcluster with [daemonize=False]
    2022-08-04 17:43:25.244 [IPClusterStart] Starting LocalControllerLauncher: ['/usr/bin/python3', '-m', 'ipyparallel.controller']
    2022-08-04 17:43:25.244 [IPClusterStart] Sending output for ipcontroller-8096 to /home/dylan/.ipython/profile_default/log/ipcontroller-8096.log
    2022-08-04 17:43:25.244 [IPClusterStart] Setting environment: IPP_CLUSTER_ID,IPP_PROFILE_DIR
    2022-08-04 17:43:25.247 [IPClusterStart] LocalControllerLauncher /usr/bin/python3 started: 8099
    2022-08-04 17:43:25.247 [IPClusterStart] Updating /home/dylan/.ipython/profile_default/security/cluster-.json
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/controller/__main__.py", line 4, in <module>
        main()
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/controller/app.py", line 1275, in main
        return IPController.launch_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/config/application.py", line 662, in launch_instance
        app = cls.instance(**kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/config/configurable.py", line 412, in instance
        inst = cls(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 958, in __new__
        inst.setup_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 986, in setup_instance
        super(HasTraits, self).setup_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 977, in setup_instance
        value.instance_init(self)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 2266, in instance_init
        self._trait.instance_init(obj)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 522, in instance_init
        v = self._validate(obj, self.default_value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 591, in _validate
        value = self.validate(obj, value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 1871, in validate
        return _validate_bounds(self, obj, value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 1840, in _validate_bounds
        raise TraitError(
    traitlets.traitlets.TraitError: The value of the 'None' trait of an IPController instance should not be less than 1, but a value of 0 was specified
    2022-08-04 17:43:25.620 [IPClusterStart] LocalControllerLauncher /usr/bin/python3 stopped: {'exit_code': 1, 'pid': 8099, 'identifier': 'ipcontroller-8096'}
    2022-08-04 17:43:25.620 [IPClusterStart] Removing /home/dylan/.ipython/profile_default/log/ipcontroller-8096.log
    2022-08-04 17:43:25.620 [IPClusterStart] Output for ipcontroller-8096:
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/controller/__main__.py", line 4, in <module>
        main()
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/controller/app.py", line 1275, in main
        return IPController.launch_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/config/application.py", line 662, in launch_instance
        app = cls.instance(**kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/config/configurable.py", line 412, in instance
        inst = cls(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 958, in __new__
        inst.setup_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 986, in setup_instance
        super(HasTraits, self).setup_instance(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 977, in setup_instance
        value.instance_init(self)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 2266, in instance_init
        self._trait.instance_init(obj)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 522, in instance_init
        v = self._validate(obj, self.default_value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 591, in _validate
        value = self.validate(obj, value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 1871, in validate
        return _validate_bounds(self, obj, value)
      File "/usr/lib/python3/dist-packages/traitlets/traitlets.py", line 1840, in _validate_bounds
        raise TraitError(
    traitlets.traitlets.TraitError: The value of the 'None' trait of an IPController instance should not be less than 1, but a value of 0 was specified
    
    2022-08-04 17:43:25.620 [IPClusterStart] WARNING | Controller stopped: {'exit_code': 1, 'pid': 8099, 'identifier': 'ipcontroller-8096'}
    2022-08-04 17:43:25.620 [IPClusterStart] Removed cluster file: /home/dylan/.ipython/profile_default/security/cluster-.json
    2022-08-04 17:43:26.274 [IPClusterStart] Setting $IPP_CONNECTION_INFO environment
    2022-08-04 17:43:26.274 [IPClusterStart] Waiting for ['/home/dylan/.ipython/profile_default/security/ipcontroller-client.json', '/home/dylan/.ipython/profile_default/security/ipcontroller-engine.json']
    2022-08-04 17:43:26.375 [IPClusterStart] Already notified stop (data)
    ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7faa38be9ca0>, <Task finished name='Task-1' coro=<IPClusterStart.start_cluster() done, defined at /home/dylan/.local/lib/python3.8/site-packages/ipyparallel/cluster/app.py:568> exception=RuntimeError("Controller stopped with unknown while waiting for ['/home/dylan/.ipython/profile_default/security/ipcontroller-client.json', '/home/dylan/.ipython/profile_default/security/ipcontroller-engine.json']")>)
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 758, in _run_callback
        ret = callback()
      File "/usr/lib/python3/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
        return fn(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 779, in _discard_future_result
        future.result()
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/cluster/app.py", line 569, in start_cluster
        await self.cluster.start_cluster()
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/cluster/cluster.py", line 780, in start_cluster
        await self.start_engines(n)
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/cluster/cluster.py", line 698, in start_engines
        connection_info = await self.controller.get_connection_info()
      File "/home/dylan/.local/lib/python3.8/site-packages/ipyparallel/cluster/launcher.py", line 377, in get_connection_info
        raise RuntimeError(
    RuntimeError: Controller stopped with unknown while waiting for ['/home/dylan/.ipython/profile_default/security/ipcontroller-client.json', '/home/dylan/.ipython/profile_default/security/ipcontroller-engine.json']
    ^C2022-08-04 17:43:36.605 [IPClusterStart] Received signal 2 received, stopping launchers...
    2022-08-04 17:43:36.606 [IPClusterStart] ERROR | IPython cluster: stopping
    2022-08-04 17:43:36.606 [IPClusterStart] Updating /home/dylan/.ipython/profile_default/security/cluster-.json
    2022-08-04 17:43:36.607 [IPClusterStart] Stopping engine(s): 1659656606
    2022-08-04 17:43:36.607 [IPClusterStart] Removed cluster file: /home/dylan/.ipython/profile_default/security/cluster-.json
    
    raise TraitError(
    traitlets.traitlets.TraitError: The value of the 'None' trait of an IPController instance should not be less than 1, but a value of 0 was specified
    
    opened by dylan-alfi 1
  • 8.3.0: sphinx warnings `reference target not found`

    First of all, it is currently not possible to use sphinx-build directly:

    [tkloczko@devel-g2v ipyparallel-8.3.0]$ /usr/bin/sphinx-build -n -T -b html docs/source build/sphinx/html
    Running Sphinx v4.5.0
    WARNING: The config value `today' has type `date', defaults to `str'.
    loading pickled environment... done
    [autosummary] generating autosummary for: api/ipyparallel.rst, changelog.md, examples/Cluster API.ipynb, examples/Data Publication API.ipynb, examples/Futures.ipynb, examples/Index.ipynb, examples/Monitoring an MPI Simulation - 1.ipynb, examples/Monitoring an MPI Simulation - 2.ipynb, examples/Monte Carlo Options.ipynb, examples/Parallel Decorator and map.ipynb, ..., reference/mpi.md, reference/security.md, tutorial/asyncresult.md, tutorial/demos.md, tutorial/direct.md, tutorial/index.md, tutorial/intro.md, tutorial/magics.md, tutorial/process.md, tutorial/task.md
    Failed to import ipyparallel.cluster.launcher.
    Possible hints:
    * KeyError: 'ipyparallel'
    * ModuleNotFoundError: No module named 'ipyparallel'
    myst v0.17.2: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions=['colon_fence', 'deflist'], linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', disable_syntax=[], all_links_external=False, url_schemes=('http', 'https', 'mailto', 'ftp'), ref_domains=None, highlight_code_blocks=True, number_code_blocks=[], title_to_header=False, heading_anchors=None, heading_slug_func=None, footnote_transition=True, sub_delimiters=('{', '}'), words_per_minute=200)
    building [mo]: targets for 0 po files that are out of date
    building [html]: targets for 0 source files that are out of date
    updating environment: 0 added, 0 changed, 0 removed
    looking for now-outdated files... none found
    no targets are out of date.
    build succeeded, 2 warnings.
    

    This can be fixed with a patch like the one below:

    --- a/docs/source/conf.py~      2022-05-14 17:50:07.000000000 +0000
    +++ b/docs/source/conf.py       2022-05-14 17:52:22.860215430 +0000
    @@ -16,7 +16,10 @@
     # If extensions (or modules to document with autodoc) are in another directory,
     # add these directories to sys.path here. If the directory is relative to the
     # documentation root, use os.path.abspath to make it absolute, like shown here.
    -# sys.path.insert(0, os.path.abspath('.'))
    +import sys
    +import os
    +sys.path.insert(0, os.path.abspath('../..'))
    +
     # We load the ipython release info into a dict by explicit execution
     iprelease = {}
     exec(
    

    Then, when building my packages, I use the sphinx-build command with the -n switch, which shows warnings about missing references. These are not critical issues.
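The conf.py change in the patch above relies on the current working directory when it resolves '../..'. A minimal sketch of a variant that resolves the path from the location of conf.py instead is shown here. The helper name `add_repo_root_to_path` and its `levels_up` parameter are hypothetical and are not part of ipyparallel or Sphinx; the sketch assumes conf.py lives in docs/source, two levels below the repository root.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(conf_file, levels_up=2):
    """Put the directory `levels_up` levels above conf_file's folder on sys.path.

    Hypothetical helper: for docs/source/conf.py, levels_up=2 points at the
    repository root, so autodoc can import the package without installing it.
    """
    repo_root = Path(conf_file).resolve().parent
    for _ in range(levels_up):
        repo_root = repo_root.parent
    root = str(repo_root)
    # Avoid adding the same entry twice when conf.py is executed repeatedly.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # Inside conf.py this would typically be called with __file__.
    print(add_repo_root_to_path("docs/source/conf.py"))
```

Anchoring on the file location rather than the working directory should make the result independent of where sphinx-build is invoked from.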

    opened by kloczek 1
  • Add option to restart engines

    As discussed in https://github.com/ipython/ipyparallel/issues/695, it would be good to have an option in IPyParallel clusters to automatically restart engines (or start a new engine set) when engines die for any reason, such as OOM or a segfault.
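To illustrate the requested behavior, here is a minimal sketch of a supervisor loop that relaunches a process each time it exits abnormally. It uses only the standard library and plain subprocesses; it is not ipyparallel's launcher API, and the function name `run_with_restarts` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=3, poll_interval=0.05):
    """Run cmd, relaunching it after every non-zero exit.

    Returns the list of exit codes observed, one per launch. Stops as soon
    as the process exits with 0 or the restart budget is used up.
    """
    exit_codes = []
    for _attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        # Poll instead of blocking so a real supervisor could do other work here.
        while proc.poll() is None:
            time.sleep(poll_interval)
        exit_codes.append(proc.returncode)
        if proc.returncode == 0:
            break
    return exit_codes


if __name__ == "__main__":
    # Stand-in for an engine that always crashes with status 3.
    crashing_engine = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_restarts(crashing_engine, max_restarts=2))
```

A bounded restart budget is used so that an engine which crashes immediately on startup does not loop forever.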

    enhancement 
    opened by sahil1105 0
  • More user-friendly errors and automatic restarts in case of engines crashing due to OOM

    The errors we report for OOM and segmentation faults are now much better, but I was wondering whether there is a way to make them more "user-friendly".

    1. Currently, at least for the MPI case, we report the mpiexec output, which is great, but could we also report a cleaner error that clearly identifies this as an OOM error (or a segfault, if possible)?
    2. Is there something that packages (like Bodo) could do to make this experience better or easier?
    3. What's the best way to automate restarting engines in this case? Ideally, if enabled, when the engines crash we would clean up the processes, display a message (e.g. "engines crashed due to OOM, restarting engines..."), and then restart the engines.
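As a rough illustration of the "cleaner error" idea in the list above, the following sketch maps a process return code to a short message. It assumes POSIX semantics, where Python's subprocess module reports death by signal as a negative return code. The function `describe_exit` is hypothetical and not part of ipyparallel. SIGKILL is only a hint of an OOM kill, not proof, and with MPI the exit status of mpiexec itself may not carry this information.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "engine exited cleanly"
    if returncode > 0:
        return f"engine exited with status {returncode}"
    # On POSIX, subprocess reports death-by-signal as the negated signal number.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        return f"engine killed by unknown signal {-returncode}"
    if name == "SIGKILL":
        # The kernel OOM killer uses SIGKILL, but so can users and schedulers.
        return "engine killed by SIGKILL (possibly the kernel OOM killer)"
    if name == "SIGSEGV":
        return "engine crashed with a segmentation fault (SIGSEGV)"
    return f"engine killed by {name}"


if __name__ == "__main__":
    for code in (0, 1, -signal.SIGKILL, -signal.SIGSEGV):
        print(code, "->", describe_exit(code))
```

Confirming an actual OOM kill would need extra evidence, such as kernel logs or cgroup memory events, which this sketch does not attempt to read.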
    opened by sahil1105 5
Owner
IPython
interactive computing in Python
IPython
Distributed Computing for AI Made Simple

Fiber Distributed Computing for AI Made Simp

Uber Open Source 997 Dec 30, 2022
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.

XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.

null 92 Dec 14, 2022
A Software Framework for Neuromorphic Computing

A Software Framework for Neuromorphic Computing

Lava 338 Dec 26, 2022
Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing

Peter Wittek 239 Nov 10, 2022
monolish: MONOlithic Linear equation Solvers for Highly-parallel architecture

monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.

RICOS Co. Ltd. 179 Dec 21, 2022
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

Aaron Zuspan 76 Dec 15, 2022
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

Machine Learning Tooling 2.8k Jan 2, 2023
A collection of interactive machine-learning experiments: 🏋️models training + 🎨models demo

Interactive Machine Learning experiments: 🏋️models training + 🎨models demo

Oleksii Trekhleb 1.4k Jan 6, 2023
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,

Samrat Mitra 2 Nov 18, 2021
Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

null 6 Jun 30, 2022
Educational python for Neural Networks, written in pure Python/NumPy.

Educational python for Neural Networks, written in pure Python/NumPy.

null 127 Oct 27, 2022
Learn Python in 100 days: simple steps to follow from beginner to master of every aspect of Python programming. It also includes side projects that you can use as demo projects for your personal portfolio.

Learn Python in 100 days: simple steps to follow from beginner to master of every aspect of Python programming. It also includes side projects that you can use as demo projects for your personal portfolio.

BDFD 6 Nov 5, 2022
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

Chao Ma 3k Jan 8, 2023
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sebastian Raschka 4.2k Dec 29, 2022
Sequence learning toolkit for Python

seqlearn seqlearn is a sequence classification toolkit for Python. It is designed to extend scikit-learn and offer as similar as possible an API. Comp

Lars 653 Dec 27, 2022
Simple structured learning framework for python

PyStruct PyStruct aims at being an easy-to-use structured learning and prediction library. Currently it implements only max-margin methods and a perce

pystruct 666 Jan 3, 2023
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 2, 2023
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

null 1.3k Dec 28, 2022