Fiber

Distributed Computing for AI Made Simple

Project Home   Blog   Documents   Paper   Media Coverage

Join the Fiber users email list: [email protected]

This project is experimental and the APIs are not considered stable.

Fiber is a Python distributed computing library for modern computer clusters.

  • It is easy to use. Fiber allows you to write programs that run on a computer cluster without needing to dive into the details of the cluster itself.
  • It is easy to learn. Fiber provides the same API as Python's standard multiprocessing library, which you are already familiar with. If you know how to use multiprocessing, you can program a computer cluster with Fiber.
  • It is fast. Fiber's communication backbone is built on top of Nanomsg, a high-performance asynchronous messaging library, which allows fast and reliable communication.
  • It doesn't need deployment. You run it the same way as a normal application on a computer cluster, and Fiber handles the rest for you.
  • It is reliable. Fiber has built-in error handling when you are running a pool of workers, so users can focus on writing the actual application code instead of dealing with crashed workers.

Originally developed to power large-scale parallel scientific computation projects like POET, Fiber has since been used to power similar projects within Uber.

Installation

pip install fiber

Check here for details.

Quick Start

Hello Fiber

To use Fiber, simply import it in your code, and it works very similarly to multiprocessing.

import fiber

if __name__ == '__main__':
    fiber.Process(target=print, args=('Hello, Fiber!',)).start()

Note that the if __name__ == '__main__': guard is necessary because Fiber uses the spawn method to start new processes. Check here for details.
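
Because spawn starts each child process by importing the main module afresh, any top-level code runs again in every child; the guard keeps process creation out of that re-import so children don't recursively spawn more processes. A minimal sketch of the pitfall:

import fiber

# This line runs in the parent AND again in every spawned child,
# because spawn re-imports the module in each new process.
print("module imported")

if __name__ == '__main__':
    # Only the parent executes this block, so the child created below
    # does not spawn another child when it re-imports the module.
    fiber.Process(target=print, args=('Hello, Fiber!',)).start()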

Let's take a look at a more complex example:

Estimating Pi

This example estimates π by Monte Carlo sampling: the fraction of random points in the unit square that fall inside the quarter unit circle approaches π/4.

import fiber
import random

@fiber.meta(cpu=1)
def inside(p):
    # Draw a random point in the unit square; return True if it falls
    # inside the quarter unit circle (x^2 + y^2 < 1).
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def main():
    NUM_SAMPLES = int(1e6)
    pool = fiber.Pool(processes=4)
    count = sum(pool.map(inside, range(0, NUM_SAMPLES)))
    print("Pi is roughly {}".format(4.0 * count / NUM_SAMPLES))

if __name__ == '__main__':
    main()
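
The @fiber.meta decorator used above is one of Fiber's extensions to the multiprocessing API: it attaches per-process resource requirements that Fiber uses when scheduling workers on a cluster. As a minimal sketch, assuming the gpu and memory keyword arguments described in Fiber's documentation (exact names may vary by version; only cpu is demonstrated in this README itself), a heavier worker could be declared like this:

import fiber

# Assumption: fiber.meta accepts gpu and memory keywords in addition
# to the cpu keyword shown above; check the API guide for your version.
@fiber.meta(cpu=2, memory=1000)
def heavy_task(x):
    # Each worker running this function is scheduled on the cluster
    # with the resources requested above.
    return x * x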

Fiber implements most of multiprocessing's API, including Process, SimpleQueue, Pool, Pipe, and Manager, and it adds its own extensions to the multiprocessing API to make it easy to compose large-scale distributed applications. For the detailed API guide, check here.
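
As a quick illustration of the multiprocessing-compatible API, here is a minimal sketch that passes a message between processes, assuming fiber.SimpleQueue and fiber.Process behave like their multiprocessing counterparts as described above:

import fiber

def produce(queue):
    # Runs in a separate Fiber process and sends one message back.
    queue.put('Hello from a Fiber worker!')

if __name__ == '__main__':
    queue = fiber.SimpleQueue()
    p = fiber.Process(target=produce, args=(queue,))
    p.start()
    print(queue.get())  # Blocks until the worker's message arrives.
    p.join()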

Running on a Kubernetes cluster

Fiber also has native support for computer clusters. To run the above example on Kubernetes, Fiber provides a convenient command-line tool to manage the workflow.

Assume you have a working Docker environment locally and have finished configuring the Google Cloud SDK, so that both gcloud and kubectl are available. Then you can start by writing a Dockerfile that describes the running environment. An example Dockerfile looks like this:

# example.docker
FROM python:3.6-buster
ADD examples/pi_estimation.py /root/pi_estimation.py
RUN pip install fiber

Build an image and launch your job:

fiber run -a python3 /root/pi_estimation.py

This command looks for a local Dockerfile, builds a Docker image, and pushes it to your Google Container Registry. It then launches the main job, which contains your code, and runs the command python3 /root/pi_estimation.py inside that job. Once the main job is running, it starts 4 subsequent jobs on the cluster, each of which is a Pool worker.

Supported platforms

  • Operating system: Linux
  • Python: 3.6+
  • Supported cluster management systems:
    • Kubernetes (Tested with Google Kubernetes Engine on Google cloud)

We are interested in supporting other cluster management systems like Slurm; if you want to contribute, please let us know.

Check here for details.

Documentation

The documentation, including method/API references, can be found here.

Testing

Install the test dependencies. You'll also need to make sure Docker is available on the testing machine.

$ pip install -e .[test]

Run tests

$ make test

Contributing

Please read our code of conduct before you contribute! You can find details for submitting pull requests in the CONTRIBUTING.md file. An issue template is also provided.

Versioning

We document versions and changes in our changelog - see the CHANGELOG.md file for details.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Cite Fiber

@misc{zhi2020fiber,
    title={Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods},
    author={Jiale Zhi and Rui Wang and Jeff Clune and Kenneth O. Stanley},
    year={2020},
    eprint={2003.11164},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Acknowledgments

  • Special thanks to Piero Molino for designing the logo for Fiber
Comments
  • Is there any plan to extend fiber beyond aws/gcp Kubernetes clusters?

    So far it seems fiber only supports aws and gcp clusters, which might not be available to users for various reasons.

    I believe kubernetes_backend.py is generic enough to launch a Fiber 'Process' on customized Kubernetes clusters. However, it seems that the fiber cli.py does not include options other than aws and gcp. I suggest extending the fiber CLI to generic Kubernetes clusters.

    enhancement 
    opened by zw0610 4
  • pip install fiber error with Python 3.8 on macOS

    I get this error with pip install fiber:

    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

    To get this far I first did the following:

    - pip install wheel
    - brew install nanomsg (installed nanomsg 1.1.5)
    - pip install nnpy

    I appreciate any help getting past this. I can provide more of the error log if needed.

    good first issue 
    opened by statmike 4
  • Passing multiple arguments to a method is failing

    I am using the starmap method in ZPool for passing multiple arguments, which is the same method we would use in the multiprocessing library, but here when I pass the values as mentioned, it fails and runs indefinitely!

    A simple example code I am using is,

    import fiber
    from fiber.pool import Pool
    
    
    @fiber.meta(cpu=1)
    def add(a, b):
        return a + b
    
    
    if __name__ == "__main__":
        pool = Pool(processes=4)
        ans = pool.starmap(add, [(1, 2), (4, 5)])
    

    The exception I am getting is,

    Exception in <Process(PoolWorker-4, 24266)>:
    Traceback (most recent call last):
      File "/Users/arjun/environments/fibertest/lib/python3.6/site-packages/fiber/process.py", line 289, in _bootstrap
        self.run()
      File "/Users/arjun/environments/fibertest/lib/python3.6/site-packages/fiber/process.py", line 185, in run
        return super().run()
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/Users/arjun/environments/fibertest/lib/python3.6/site-packages/fiber/pool.py", line 858, in zpool_worker
        wrap_exception, 0, req=req)
      File "/Users/arjun/environments/fibertest/lib/python3.6/site-packages/fiber/pool.py", line 801, in zpool_worker_core
        res = func(*args, **kwds)
    TypeError: add() argument after ** must be a mapping, not tuple

    And it is being printed indefinitely!

    Am I doing something wrong here? Or is this an issue?
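
    Until starmap is fixed (see the "Fix argument preparation for Pool.starmap" PR below), one workaround sketch is to fall back to Pool.map with a small helper that unpacks each argument tuple (the add_star helper is hypothetical, not part of Fiber):

    import fiber

    @fiber.meta(cpu=1)
    def add(a, b):
        return a + b

    def add_star(args):
        # Hypothetical helper: unpack the (a, b) tuple so plain
        # Pool.map can be used instead of the broken starmap.
        return add(*args)

    if __name__ == "__main__":
        pool = fiber.Pool(processes=4)
        ans = pool.map(add_star, [(1, 2), (4, 5)])
        print(ans)  # Expected: [3, 9]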

    opened by Arjunsankarlal 3
  • Having problem with ZPool.map_async() as it runs function sequentially.

    Hello, I'm trying to use Fiber in my distributed deep RL project and wanted to understand the basics of Fiber (and Python multiprocessing). I am not sure why, but the code below does not run in parallel. Can anyone explain what I am doing wrong? I'm using Anaconda Python 3.7.7 on Ubuntu Linux 18.04 LTS.

    import fiber
    import time
    
    
    def do_something(sleep_time):
        time.sleep(sleep_time)
        return 'func id: {}, sleep time: {}'.format(sleep_time, sleep_time)
    
    
    def main():
        s = time.perf_counter()
        pool = fiber.Pool(processes=4)
        times_list = [3, 1, 2, 4]
    
        handle = pool.map_async(do_something, times_list)
        results = handle.get()
        for i in results:
            print(i)
    
        dt = time.perf_counter() - s
        print(dt)
    
    
    if __name__ == '__main__':
        main()
    
    

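    One thing worth checking, assuming Fiber's Pool follows multiprocessing's map semantics as advertised: map-style calls split the iterable into chunks, and items that land in the same chunk run back to back on one worker, which can look sequential. Passing an explicit chunksize is a quick diagnostic (the keyword comes from multiprocessing; whether Fiber honors it is an assumption):

    # Force one task per chunk so the four sleeps can overlap across
    # the four workers (chunksize support is assumed here).
    handle = pool.map_async(do_something, times_list, chunksize=1)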

    good first issue 
    opened by bakhsanzh 3
  • urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused',))

    Getting this error when executing any Fiber function call. Tried giving all permissions and double-checked the network configuration, but everything seems right. As this is a broad error and this is a relatively new library, I couldn't find a solution anywhere else. I'm new to this, so please point me in the right direction.

    Logs look like this:

    Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc710>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
    Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc860>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
    Feb 22 18:48:03 test1-797b9fdffb-jz225: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bc940>: Failed to establish a new connection: [Errno 111] Connection refused',)': /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225
    something started
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn
        (self._dns_host, self.port), self.timeout, **extra_kw
      File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
        raise err
      File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
        chunked=chunked,
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
        super(HTTPConnection, self).request(method, url, body=body, headers=headers)
      File "/usr/local/lib/python3.6/http/client.py", line 1287, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.6/http/client.py", line 1333, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.6/http/client.py", line 1282, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
     File "/usr/local/lib/python3.6/http/client.py", line 1042, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.6/http/client.py", line 980, in send
        self.connect()
      File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
        self, "Failed to establish a new connection: %s" % e
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/root/main.py", line 55, in <module>
        sharedQue = SimpleQueue()
      File "/usr/local/lib/python3.6/site-packages/fiber/queues.py", line 295, in __init__
        backend = get_backend()
      File "/usr/local/lib/python3.6/site-packages/fiber/backend.py", line 74, in get_backend
        name)).Backend(**kwargs)
      File "/usr/local/lib/python3.6/site-packages/fiber/kubernetes_backend.py", line 64, in __init__
        self.default_namespace)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 22785, in read_namespaced_pod
        return self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)  # noqa: E501
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 22894, in read_namespaced_pod_with_http_info
        collection_formats=collection_formats)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 353, in call_api
        _preload_content, _request_timeout, _host)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
        _request_timeout=_request_timeout)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
        headers=headers)
    File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 243, in GET
        query_params=query_params)
      File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 216, in request
        headers=headers)
      File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 75, in request
        method, url, fields=fields, headers=headers, **urlopen_kw
      File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 96, in request_encode_url
        return self.urlopen(method, url, **extra_kw)
      File "/usr/local/lib/python3.6/site-packages/urllib3/poolmanager.py", line 375, in urlopen
        response = conn.urlopen(method, u.request_uri, **kw)
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
        **response_kw
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
        **response_kw
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 796, in urlopen
        **response_kw
      File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
      File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 573, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /api/v1/namespaces/default/pods/test1-797b9fdffb-jz225 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb3be7bca58>: Failed to establish a new connection: [Errno 111] Connection refused',))
    

    Also, I was getting a "POST method not supported" (501) error when trying to run with the fiber CLI; it also fails. I don't know whether it's a bug.
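
    The host='localhost', port=80 in the error suggests the kubernetes client fell back to its empty default configuration instead of loading in-cluster credentials. A quick check to run inside the pod, using standard kubernetes client calls (a hedged diagnostic, not an official fix):

    from kubernetes import config

    # Raises kubernetes.config.ConfigException if no service-account
    # credentials are mounted into the pod.
    config.load_incluster_config()
    print("in-cluster config loaded")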

    opened by kiran-italiya 2
  • Installation issue with pip install fiber on Windows 10, Python 3.8, via conda

    Hi all,

    I am getting an error when running pip install fiber in my conda (v4.8.3) environment. It seems to be caused by a make -j8 command failing, with a message at the top saying that no makefile was found. Was it perhaps not generated via nnpy-bundle? Or should I first install a previous version of nanomsg?

    Any help would be greatly appreciated!

    Make installed and in path:

    $ make -version
    GNU Make 4.3
    Built for Windows32
    Copyright (C) 1988-2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    

    Input in command prompt:

    $ pip install fiber
    

    Output: (I've replaced my user directory with ~)

    Collecting fiber
      Using cached fiber-0.2.1.tar.gz (61 kB)
    Collecting nnpy-bundle
      Using cached nnpy-bundle-1.4.2.post1.tar.gz (6.3 kB)
        ERROR: Command errored out with exit status 1:
         command: '~\Anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'~\\AppData\\Local\\Temp\\pip-install-4ws65jt4\\nnpy-bundle\\setup.py'"'"'; __file__='"'"'~\\AppData\\Local\\Temp\\pip-install-4ws65jt4\\nnpy-bundle\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base '~\AppData\Local\Temp\pip-install-4ws65jt4\nnpy-bundle\pip-egg-info'
             cwd: ~\AppData\Local\Temp\pip-install-4ws65jt4\nnpy-bundle\
        Complete output (77 lines):
        Cloning into 'nanomsg'...
        Note: switching to '1.1.5'.
    
        You are in 'detached HEAD' state. You can look around, make experimental
        changes and commit them, and you can discard any commits you make in this
        state without impacting any branches by switching back to a branch.
    
        If you want to create a new branch to retain commits you create, you may
        do so (now or later) by using -c with the switch command. Example:
    
          git switch -c <new-branch-name>
    
        Or undo this operation with:
    
          git switch -
    
        Turn off this advice by setting config variable advice.detachedHead to false
    
        HEAD is now at 1749fd7b Bump version to 1.1.5.
        -- Building for: Visual Studio 16 2019
        -- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.19041.
        -- The C compiler identification is MSVC 19.24.28314.0
        -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
        -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe - works
        -- Detecting C compiler ABI info
        -- Detecting C compiler ABI info - done
        -- Detecting C compile features
        -- Detecting C compile features - done
        -- Detected nanomsg ABI v5 (v5.1.0)
        -- Looking for pthread.h
        -- Looking for pthread.h - not found
        -- Found Threads: TRUE
        -- OS System is Windows
        -- OS Version is 10.0.19041
        -- Looking for InitializeConditionVariable
        -- Looking for InitializeConditionVariable - found
        -- Performing Test NN_HAVE_GCC_ATOMIC_BUILTINS
        -- Performing Test NN_HAVE_GCC_ATOMIC_BUILTINS - Failed
        CMake Warning at CMakeLists.txt:294 (message):
          Could not find asciidoctor: skipping docs
    
    
        -- Configuring done
        -- Generating done
        -- Build files have been written to: ~/AppData/Local/Temp/pip-install-4ws65jt4/nnpy-bundle/nanomsg/build
        make: *** No targets specified and no makefile found.  Stop.
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "~\AppData\Local\Temp\pip-install-4ws65jt4\nnpy-bundle\setup.py", line 47, in <module>
            install_requires=['cffi'],
          File "~\Anaconda3\lib\site-packages\setuptools\__init__.py", line 144, in setup
            return distutils.core.setup(**attrs)
          File "~\Anaconda3\lib\distutils\core.py", line 108, in setup
            _setup_distribution = dist = klass(attrs)
          File "~\Anaconda3\lib\site-packages\setuptools\dist.py", line 425, in __init__
            k: v for k, v in attrs.items()
          File "~\Anaconda3\lib\distutils\dist.py", line 292, in __init__
            self.finalize_options()
          File "~\Anaconda3\lib\site-packages\setuptools\dist.py", line 706, in finalize_options
            ep.load()(self)
          File "~\Anaconda3\lib\site-packages\setuptools\dist.py", line 713, in _finalize_setup_keywords
            ep.load()(self, ep.name, value)
          File "~\Anaconda3\lib\site-packages\cffi\setuptools_ext.py", line 217, in cffi_modules
            add_cffi_module(dist, cffi_module)
          File "~\Anaconda3\lib\site-packages\cffi\setuptools_ext.py", line 49, in add_cffi_module
            execfile(build_file_name, mod_vars)
          File "~\Anaconda3\lib\site-packages\cffi\setuptools_ext.py", line 25, in execfile
            exec(code, glob, glob)
          File "generate.py", line 199, in <module>
            ffi = create_module()
          File "generate.py", line 142, in create_module
            build_nanomsg_static_lib(cwd)
          File "generate.py", line 67, in build_nanomsg_static_lib
            check_call("make -j8", shell=True, cwd=build_dir)
          File "~\Anaconda3\lib\subprocess.py", line 347, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command 'make -j8' returned non-zero exit status 2.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    
    opened by maxenceryan 2
  • Set requests and limits for k8s

    It appears that only limits is set. Our admission controller requires both requests and limits.

    Could requests be set to the same values as limits? https://github.com/uber/fiber/blob/master/fiber/kubernetes_backend.py#L78
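
    For reference, a sketch of what this would look like with the kubernetes Python client that Fiber's backend builds on (illustrative only, not Fiber's actual code):

    from kubernetes import client

    # Set requests equal to limits so admission controllers that
    # require both fields are satisfied.
    resources = client.V1ResourceRequirements(
        limits={"cpu": "1", "memory": "1000Mi"},
        requests={"cpu": "1", "memory": "1000Mi"},
    )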

    enhancement 
    opened by ahutterTA 2
  • Bump rsa from 4.0 to 4.1

    Bumps rsa from 4.0 to 4.1.

    Changelog

    Sourced from rsa's changelog.

    Version 4.1 - released 2020-06-10

    • Added support for Python 3.8.
    • Dropped support for Python 2 and 3.4.
    • Added type annotations to the source code. This will make Python-RSA easier to use in your IDE, and allows better type checking.
    • Added static type checking via MyPy.
    • Fix #129 Installing from source gives UnicodeDecodeError.
    • Switched to using Poetry for package management.
    • Added support for SHA3 hashing: SHA3-256, SHA3-384, SHA3-512. This is natively supported by Python 3.6+ and supported via a third-party library on Python 3.5.
    • Choose blinding factor relatively prime to N. Thanks Christian Heimes for pointing this out.
    • Reject cyphertexts (when decrypting) and signatures (when verifying) that have been modified by prepending zero bytes. This resolves CVE-2020-13757. Thanks Adelapie for pointing this out.
    Commits
    • c6731b1 Bumped version to 4.1
    • 80f0e9d Marked version 4.1 as released
    • 65ab5b5 Add support for Python 3.8
    • 9ecf340 Fixed credit for report
    • 93af6f2 Fix CVE-2020-13757: detect cyphertext modifications by prepending zero bytes
    • ae1a906 Add more type hints
    • 1473cb8 Drop character encoding markers for Python 2.x
    • 8ed5071 Choose blinding factor relatively prime to N
    • 1659432 Updated Code Climate badge in README.md
    • 96e13dd Configured CodeClimate
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 1
  • [Feature] Explicitly add AWS/GCP/Azure support in `fiber` command

    Previously, only GCP was supported by the fiber command. This PR adds support for Kubernetes on AWS, Azure, and GCP.

    Command-line usage:

    # Explicitly specify a cloud provider
    $ fiber run --aws python my.py
    $ fiber run --gcp python my.py
    $ fiber run --azure python my.py
    
    # Automatically detect cloud provider (based on if any of `aws`, `gcloud` exists)
    $ fiber run python my.py
    
    opened by calio 1
  • Fix argument preparation for Pool.starmap

    Both Pool.starmap and Pool.apply_async set starmap=True when creating tasks. Previously, Pool.starmap didn't build the correct argument list. The correct format should be:

    [(args, kwargs), ...] (used by Pool.apply_async) or
    [(args,), ...] (used by Pool.starmap)
    
    opened by calio 0
  • How should an image in fiberconfig be specified for an ECR in AWS?

    Is this the way: image=xxxxxxxx.dkx.ecr.ap-southeast-1.amazonaws.com/backend:fiberpool? And will there be any issues integrating Fiber in a FastAPI endpoint?

    opened by sakmalh 0
  • Bump certifi from 2020.12.5 to 2022.12.7

    Bumps certifi from 2020.12.5 to 2022.12.7.


    dependencies 
    opened by dependabot[bot] 0
  • Package still maintained? Alternatives?

    Hi Uber / Fiber team,

    We really like your minimalist approach to distribute Python workloads using K8s / Multiprocessing API. However the package's last commit is from 2021. I assume fiber is not maintained / used in production anymore?

    Curious if anybody in the community is still using this and if not, what alternatives they found.

    Thank you!

    opened by ynouri 1
  • Bump joblib from 1.0.1 to 1.2.0

    Bumps joblib from 1.0.1 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliability spawn the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • connections error issue

    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
        conn = connection.create_connection(
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
        raise err
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
        httplib_response = self._make_request(
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 239, in request
        super(HTTPConnection, self).request(method, url, body=body, headers=headers)
      File "/usr/local/lib/python3.8/http/client.py", line 1256, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1302, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1251, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1011, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.8/http/client.py", line 951, in send
        self.connect()
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 205, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
        raise NewConnectionError(
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f547978c6d0>: Failed to establish a new connection: [Errno 111] Connection refused

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/test_harness_flows.py", line 121, in <module>
        sys.exit(main())
      File "/test_harness_flows.py", line 113, in main
        test_case.setup_method()
      File "/test_harness_flows.py", line 26, in setup_method
        self.driver = webdriver.Remote(
      File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
        self.start_session(capabilities, browser_profile)
      File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
        response = self.execute(Command.NEW_SESSION, parameters)
      File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 433, in execute
        response = self.command_executor.execute(driver_command, params)
      File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 344, in execute
        return self._request(command_info[0], url, body=data)
      File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/remote_connection.py", line 366, in _request
        response = self._conn.request(method, url, body=body, headers=headers)
      File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
        return self.request_encode_body(
      File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
        return self.urlopen(method, url, **extra_kw)
      File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
        response = conn.urlopen(method, u.request_uri, **kw)
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 815, in urlopen
        return self.urlopen(
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 815, in urlopen
        return self.urlopen(
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 815, in urlopen
        return self.urlopen(
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
        retries = retries.increment(
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=4444): Max retries exceeded with url: /wd/hub/session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f547978c6d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

    opened by avinashkcofomo 1
  • Bump numpy from 1.19.5 to 1.22.0

    Bumps numpy from 1.19.5 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0