Extended pickling support for Python objects

cloudpickle

cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.

Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the __main__ module (for instance in a script, a shell or a Jupyter notebook).
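As a minimal illustration of the limitation described above, the sketch below uses only the standard library (cloudpickle is not needed to run it) and shows the default pickle module refusing a lambda. The exact exception type may vary between Python versions, so both likely types are caught.

```python
import pickle

squared = lambda x: x ** 2

try:
    pickle.dumps(squared)
    stdlib_pickle_failed = False
except (pickle.PicklingError, AttributeError) as exc:
    # pickle stores functions by reference (module + qualified name);
    # a lambda has no importable name, so the lookup fails.
    stdlib_pickle_failed = True
    print(type(exc).__name__, exc)
```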

Cloudpickle can only be used to send objects between the exact same version of Python.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Security notice: one should only load pickle data from trusted sources as otherwise pickle.load can lead to arbitrary code execution resulting in a critical security vulnerability.
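To make the risk concrete, here is a deliberately harmless sketch using only the standard library. The `Payload` class is hypothetical; it stands in for data crafted by an attacker and shows that unpickling can call an arbitrary callable with arbitrary arguments.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). An attacker could put any
        # callable and any arguments here instead.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())
result = pickle.loads(untrusted_bytes)  # runs eval during loading
print(result)
```

Loading never returns a `Payload` instance: the value produced is whatever the embedded call returns.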

Installation

The latest release of cloudpickle is available from PyPI:

pip install cloudpickle

Examples

Pickling a lambda expression:

>>> import cloudpickle
>>> squared = lambda x: x ** 2
>>> pickled_lambda = cloudpickle.dumps(squared)

>>> import pickle
>>> new_squared = pickle.loads(pickled_lambda)
>>> new_squared(2)
4

Pickling a function interactively defined in a Python shell session (in the __main__ module):

>>> CONSTANT = 42
>>> def my_function(data: int) -> int:
...     return data + CONSTANT
...
>>> pickled_function = cloudpickle.dumps(my_function)
>>> depickled_function = pickle.loads(pickled_function)
>>> depickled_function
<function __main__.my_function(data:int) -> int>
>>> depickled_function(43)
85

Running the tests

  • With tox, to run the tests for all the supported versions of Python and PyPy:

    pip install tox
    tox
    

    or alternatively for a specific environment:

    tox -e py37
    
  • With py.test to only run the tests for your current version of Python:

    pip install -r dev-requirements.txt
    PYTHONPATH='.:tests' py.test
    

History

cloudpickle was initially developed by picloud.com and shipped as part of the client SDK.

A copy of cloudpickle.py was included as part of PySpark, the Python interface to Apache Spark. Davies Liu, Josh Rosen, Thom Neale and other Apache Spark developers improved it significantly, most notably to add support for PyPy and Python 3.

The aim of the cloudpickle project is to make that work available to a wider audience outside of the Spark ecosystem and to make it easier to improve further, notably with the help of a dedicated non-regression test suite.

Issues
  • Add ability to register modules to be deeply serialized

    This PR is based on the work done by @kinghuang in PR391, but takes on the feedback provided by @ogrisel and adds testing.

    Fixes #206

    Issue Summary

    To summarise the issue, in many cases cloudpickle is used to send code for remote execution. This is used in dask, prefect, mlflow and many other libraries. For local functions, this works perfectly fine. But for any non-local function or class, cloudpickle assumes that external modules and packages are available at the location of deserialization. This may not be the case, or the version of the package available at the end point may be different.
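The by-reference behaviour described in the summary can be observed with the standard library alone. The sketch below pickles `json.dumps` (chosen only as a convenient importable function) and shows that the payload holds just a module name and an attribute name, so the receiving side must be able to import that module itself.

```python
import json
import pickle
import pickletools

# A module-level function is pickled by reference, not by value.
payload = pickle.dumps(json.dumps)
print(payload)

# Disassemble the stream: only the strings 'json' and 'dumps' are stored.
pickletools.dis(payload)

# Loading performs an import and an attribute lookup on the receiving side.
restored = pickle.loads(payload)
```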

    This PR adds the option to register modules for deep serialization by providing a register_deep_serialization function which takes either a name or a module. This is the original register_dynamic_module by @kinghuang.

    import cloudpickle
    from tests import external
    
    cloudpickle.register_deep_serialization("tests.external")  # string name works
    cloudpickle.register_deep_serialization(external)          # You can pass the module itself
    cloudpickle.register_deep_serialization("tests")           # or the parent string/module
    
    output = cloudpickle.dumps(external.an_external_function)
    

    Original dumps:

    b'\x80\x05\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x0etests.external\x94\x8c\x14an_external_function\x94\x93\x94.'
    

    dumps after registering tests.external for deep serialization:

    b'\x80\x05\x95<\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(h\x02\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94N\x8c\x11this is something\x94\x86\x94))\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94\x8c\x14an_external_function\x94K\x04C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x05tests\x94\x8c\x08__name__\x94\x8c\x0etests.external\x94\x8c\x08__file__\x94\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94uNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x19}\x94}\x94(h\x14h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x15\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0.'
    

    Modules can be unregistered via unregister_deep_serialization

    Tests

    One of the example tests with an explicit use case is shown above. On top of this, tests have been added to _lookup_module_and_qualname using the _cloudpickle_testpkg package, and also to the new _is_explicitly_serialized_module function.

    opened by Samreay 58
  • ENH: derive from C-pickler for fast serialization

    Summary:

    This PR proposes a new CloudPickler class that inherits from the C _pickle.Pickler instead of the Python pickle._Pickler, allowing 10x+ speedups for the serialization of large builtin objects such as dicts and lists.

    Disclaimer: a new start

    Moving from the Python to the C Pickler requires a fair amount of changes. For this reason, instead of simply adapting the current code to respect the new constraints, I started again from scratch. This allows a new, clean API and structure that will hopefully be easier for everyone to understand.

    I made a lot of comments (sometimes overly verbose) to ease the review process of this PR. Eventually, I hope the information they contain can be transferred to proper project documentation.

    Implementation:

    Changes to Python

    As opposed to the Python pickler, the C Pickler does not expose the save_* family of functions, nor low-level instructions such as write. These methods can neither be patched nor called, and the only customization option we had initially was the dispatch table, which is consulted for all types but a few special cases, including classes and functions, the two principal use cases of cloudpickle.
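The difference in exposed hooks can be checked from the standard library. This is a small sketch, assuming a CPython build where the C accelerator module is available; on interpreters without it, `pickle.Pickler` is simply the pure-Python class and both lists are the same.

```python
import pickle

# Pure-Python pickler: exposes one save_* method per supported type.
python_pickler_hooks = [
    name for name in dir(pickle._Pickler) if name.startswith("save_")
]

# Default pickler (the C implementation when available): no save_* methods.
c_pickler_hooks = [
    name for name in dir(pickle.Pickler) if name.startswith("save_")
]

print(len(python_pickler_hooks), "save_* hooks on pickle._Pickler")
print(len(c_pickler_hooks), "save_* hooks on pickle.Pickler")
```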

    As this makes it simply impossible to modify pickling behavior for such types, we patched the C pickler for it to allow a user defined reduction callback for functions and classes. This idea was suggested by @pitrou.

    The direct consequence is that functions and classes now have to follow the save_reduce-load_build pickling/depickling process. Unfortunately, this API is not well suited for custom builtin-type saving: in particular, the state setting part of load_build (the function that reconstructs an object from a reduce value) assumes all attributes of an object are writeable, which is not the case for C types (especially function.__globals__ and function.__closure__).

    For this reason, we also changed the API of save_reduce, allowing a custom state_setter to be added, which will be called at unpickling time.

    You can view the total changes in this diff.

    Individual PRs to CPython:

    • https://github.com/python/cpython/pull/12499 (reducer_override)
    • https://github.com/python/cpython/pull/12588 (state_setter in save_reduce)
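For readers unfamiliar with the reducer_override hook, the sketch below shows how it can be used with the standard library pickler on Python 3.8 or later. It is not the code of this PR: `MyClass` and `MyPickler` are hypothetical names, and the class is rebuilt with a plain `type(...)` call purely for illustration.

```python
import io
import pickle


class MyClass:
    my_attribute = 1


class MyPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, type) and obj.__name__ == "MyClass":
            # Pickle the class by value: on load, type(name, bases, namespace)
            # is called to build a fresh class.
            namespace = {"my_attribute": obj.my_attribute}
            return type, (obj.__name__, obj.__bases__, namespace)
        # Fall back to the regular pickling machinery for everything else.
        return NotImplemented


buffer = io.BytesIO()
MyPickler(buffer).dump(MyClass)

rebuilt_class = pickle.loads(buffer.getvalue())
print(rebuilt_class.my_attribute)
```

Because the class travels by value, the loaded object is a new class rather than a reference to the original one.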

    Changes to cloudpickle

    Functions and classes are the two main types affected by this PR. The main challenge was to make the saving process fit into the save_reduce API.

    Outside of these types, the actual reduction process remains intact.

    However, now that any customization must return a tuple, I decided to adopt a new, hopefully clearer naming style for functions. You will see for yourselves.

    How to build this version locally

    Until the final release of Python 3.8, you need to build python from upstream's master branch

    git clone git@github.com:python/cpython.git
    cd cpython
    ./configure
    make
    

    To be able to use external modules you need a virtual environment, using for example the venv module:

    ./python -m venv /path/to/local/virtualenv
    

    Clone and install cloudpickle and its dependencies

    git clone git@github.com:cloudpipe/cloudpickle.git /path/to/cloudpickle
    cd /path/to/cloudpickle
    git fetch origin pull/ID/head:fast-cloudpickle
    git checkout fast-cloudpickle
    /path/to/local/virtualenv/bin/python -mpip install -rdev-requirements.txt
    /path/to/local/virtualenv/bin/python -mpip install .
    

    Finally, run the tests:

    /path/to/local/virtualenv/bin/python -mpytest tests/
    

    Benchmarks:

    • Benchmarks of a "concrete", end-to-end use-case using loky can be found here. To run the benchmarks, you also need the master version of loky.
    opened by pierreglaser 41
  • Pickling of generic annotations/types in 3.5+

    This PR adds support for pickling annotations on 3.5+, and fixes some problems with generic annotations on 3.7+.

    TODO

    • [x] Backport for 3.5
    • [x] Test that fails with TypeError: type() doesn't support MRO entry resolution; use types.new_class() if not using types.new_class for reconstructing classes
    • [x] Remove typing_extensions dependency
    • [x] Prefix privates with _
    • [x] Add test for pickle_depickle'ing annotated functions/classes

    Details

    The types.new_class change (in _make_skeleton_class) is because of a TypeError: type() doesn't support MRO entry resolution; use types.new_class() error on 3.7+, similar to this issue. Also see https://github.com/python/cpython/pull/6319.
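The error mentioned above can be reproduced with the standard library on Python 3.7 or later. This is a minimal sketch with hypothetical class names, not the code of `_make_skeleton_class`.

```python
import types
from typing import Generic, TypeVar

T = TypeVar("T")

# Calling type() directly with a parameterized generic base fails, because
# Generic[T] is not a class and needs MRO entry resolution.
try:
    type("ViaType", (Generic[T],), {})
    type_call_failed = False
except TypeError as exc:
    type_call_failed = True
    print(exc)

# types.new_class resolves the bases first and records the originals.
ViaNewClass = types.new_class("ViaNewClass", (Generic[T],))
print(ViaNewClass.__mro__)
print(ViaNewClass.__orig_bases__)
```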

    I'm not sure if there are any downsides to TypeVars being __reduce__'d now. Previously, they were only supported as globals (so always imported, I think).

    The functions try_decompose_generic and get_bases are brittle the way they are written, because they check for attributes. There might be a better way.

    Tests

    Passing, and added some new ones.

    ci downstream ci ray ci joblib ci distributed ci python-nightly ci loky 
    opened by valtron 39
  • deduplicate cloudpickle reducers.

    closes #284 related to #364

    About backward compatibility:

    • This PR removes make_skel_func and fill_function, i.e. the functions cloudpickle previously used to reconstruct functions, as they were equivalent to the new function_setstate/function_new (modulo some Python 3.8 compatibility). These functions are needed to load pickles created by previous cloudpickle versions. A simple fix is to keep them inside cloudpickle.py for a few releases and add a FutureWarning inside them saying that an old pickle is being read, and that reading such pickles will break in 2 releases.
    • By removing the previous Python < 3.8 CloudPickler class, this PR also removes semi-public functions (all the CloudPickler.save_*). These functions are not necessary to read old pickles, but they could be used inside third-party code. To address this, we could keep exposing the previous CloudPickler for the next few releases, but make cloudpickle.dump(s) use the new CloudPickler. This way, we can add a FutureWarning into the previous CloudPickler.__init__, while cloudpickle.dump(s) remains silent.
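The compatibility shim proposed in the first bullet could look roughly like the sketch below. Both function names are hypothetical stand-ins, and the reconstruction logic is reduced to a dict copy so the example stays self-contained.

```python
import warnings


def rebuild_function(state):
    # Hypothetical stand-in for the new reconstruction helper.
    return dict(state)


def legacy_rebuild_function(state):
    # Hypothetical stand-in for a removed helper that is kept around only so
    # that pickles written by older releases can still be loaded.
    warnings.warn(
        "This pickle was created by an older cloudpickle version; "
        "support for loading it will be removed in a future release.",
        FutureWarning,
        stacklevel=2,
    )
    return rebuild_function(state)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_rebuild_function({"name": "f"})

print(result, [str(w.message) for w in caught])
```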

    Also, the module names don't make much sense now. In the future we should rename cloudpickle_fast.py to cloudpickle.py, and merge it with the previous cloudpickle.py.

    @jakirkham if you want to give #364 another shot, but rebasing on this PR first, I suspect its implementation should be much easier :)

    ci downstream 
    opened by pierreglaser 25
  • Add ability to pickle dynamically create modules

    The old logic treated all modules the same, which would fail when unpickling. save_module now detects whether the module has been dynamically created by following the chain of imports. Note that imp.find_module doesn't work with submodules (for example sckit.tree), so we actually have to split the module name and iterate over each piece.

    Dynamic modules are saved as dictionaries and reconstituted by the dynamic_subimport function. While working on the test cases I discovered that NotImplemented and Ellipsis also don't work properly (they are introduced into the test dynamic module by exec). I've also addressed that.
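The idea can be sketched with the standard library as follows. This is not the PR's code: `is_dynamic_module` and `rebuild_module` are hypothetical helpers, and `importlib.util.find_spec` is swapped in for `imp.find_module`, which no longer exists in recent Python versions.

```python
import importlib.util
import pickle
import types


def is_dynamic_module(module):
    # Assumption: a module that the import system cannot locate by name
    # was created at runtime rather than imported from disk.
    try:
        return importlib.util.find_spec(module.__name__) is None
    except (ImportError, ValueError):
        return True


def rebuild_module(name, attributes):
    # Recreate an empty module and restore its saved dictionary.
    module = types.ModuleType(name)
    module.__dict__.update(attributes)
    return module


dynamic = types.ModuleType("hypothetical_dynamic_module")
exec("x = 1\nmarker = NotImplemented", dynamic.__dict__)

# Save the module as (name, dict), leaving out the injected builtins.
state = {k: v for k, v in vars(dynamic).items() if k != "__builtins__"}
payload = pickle.dumps((dynamic.__name__, state))

rebuilt = rebuild_module(*pickle.loads(payload))
print(is_dynamic_module(dynamic), is_dynamic_module(pickle), rebuilt.x)
```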

    opened by rodrigofarnhamsc 21
  • Optionally use pickle5 (Redux)

    Fixes https://github.com/cloudpipe/cloudpickle/issues/179

    Thanks to @pierreglaser's work in PR ( https://github.com/cloudpipe/cloudpickle/pull/368 ), this is a rebased/simplified version of PR ( https://github.com/cloudpipe/cloudpickle/pull/364 ). Otherwise it is the same, in that it tries to use pickle5 on older Python versions to support out-of-band buffers.
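As background on out-of-band buffers, the sketch below uses the protocol 5 support built into the standard library pickle module on Python 3.8 or later (the pickle5 backport mentioned above is assumed to offer the same interface on older versions, but that is not shown here).

```python
import pickle

data = bytearray(b"x" * 1_000_000)
out_of_band = []

# With a buffer_callback, the large buffer is handed to the callback instead
# of being copied into the pickle stream.
payload = pickle.dumps(
    pickle.PickleBuffer(data),
    protocol=5,
    buffer_callback=out_of_band.append,
)

# The same buffers must be supplied again, in order, when loading.
restored = pickle.loads(payload, buffers=out_of_band)

print(len(payload), "bytes in the stream")
print(memoryview(restored).nbytes, "bytes carried out of band")
```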

    ci downstream 
    opened by jakirkham 20
  • Making cloudpickle produce "consistent/deterministic" results.

    This question arose in the following context. I have multiple Python processes, and some classes are defined in each process. Sometimes class definitions are shipped from one process to another (using cloudpickle). Sometimes classes may be shipped multiple times or in multiple ways to a given process, and I'd like to deduplicate them based on the output of cloudpickle (that is, if cloudpickle.dumps(class1) == cloudpickle.dumps(class2), then the classes are the "same" and I can throw away one of them). This works, but there are way too many false negatives (that is, two classes really are the same in some sense, but cloudpickle.dumps gives different results on the two classes).

    Here's one example that sort of illustrates the issue (although there are a number of ways this kind of thing can arise).

    Suppose I do the following.

    import cloudpickle
    
    class Foo1(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo1)
    Foo2 = cloudpickle.loads(serialized1)
    serialized2 = cloudpickle.dumps(Foo2)
    
    assert serialized1 == serialized2  # This assertion fails.
    

    I'd love for this kind of assertion to succeed. Does anyone know if this is achievable or what the main obstacles are?

    Interestingly, if I iterate this a third time,

    Foo3 = cloudpickle.loads(serialized2)
    serialized3 = cloudpickle.dumps(Foo3)
    
    assert serialized2 == serialized3  # This succeeds.
    

    then the assert succeeds, so maybe it suffices to use cloudpickle.dumps(cloudpickle.loads(cloudpickle.dumps(cls))) to deduplicate classes (though this seems kind of insane, and I haven't tested this extensively). Would you expect this to work?

    One thing that may be related/revealing is the outputs I get if I do something similar in an IPython interpreter (instead of a regular Python interpreter).

    First copy and paste this block into IPython.

    import cloudpickle
    
    class Foo(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo)
    

    Then copy and paste this block into IPython.

    class Foo(object):
        def __init__(self):
            pass
    
    serialized2 = cloudpickle.dumps(Foo)
    

    Comparing serialized1 and serialized2 next to each other, they are

    serialized1  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    serialized2  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    

    They seem to be the same everywhere except that the first includes the string <ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h and the second includes the string <ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h. Any idea where these strings come from or if it is possible to remove them?
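A plausible explanation, sketched below with the standard library only, is that these strings are the co_filename of the method's code object, and that the neighbouring K\x04 / K\x02 bytes are its co_firstlineno. The cell names and the `compile_init` helper are hypothetical; the two sources mimic the two IPython cells above.

```python
CLASS_SOURCE = "class Foo(object):\n    def __init__(self):\n        pass\n"


def compile_init(source, filename):
    # Compile and run the source under the given filename, then return the
    # code object of Foo.__init__.
    namespace = {"__name__": "demo"}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["Foo"].__init__.__code__


# First "cell": an import and a blank line precede the class definition.
first = compile_init("import os\n\n" + CLASS_SOURCE, "<cell-1>")
# Second "cell": the class definition alone.
second = compile_init(CLASS_SOURCE, "<cell-2>")

print(first.co_filename, first.co_firstlineno)
print(second.co_filename, second.co_firstlineno)
print(first.co_code == second.co_code)
```

The bytecode is identical, but any serializer that stores the whole code object will also store the differing filename and first line number.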

    cc @Wapaul1 @mehrdadn

    opened by robertnishihara 19
  • NumPy arrays serialize more slowly with cloudpickle than pickle

    I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

    In [1]: import numpy as np
    
    In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)
    
    In [3]: import cloudpickle, pickle
    
    In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
    Wall time: 185 ms
    Out[4]: 100000161
    
    In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 125 ms, sys: 280 ms, total: 404 ms
    Wall time: 405 ms
    Out[5]: 100000161
    
    opened by mrocklin 19
  • Implement dynamic class provenance tracking to fix isinstance semantics and add support for dynamically defined enums

    This is a fix for #244 (and #101) to add support for dynamically defined Enum subclasses.

    Properly adding support for dynamic Enums required fixing the isinstance semantics more broadly, as initially requested in #195.

    The proposed solution involves tracking the provenance of pickled dynamic class definitions with a pair of weakref.WeakKeyDictionary / weakref.WeakValueDictionary protected by a threading.Lock.
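A rough sketch of such a tracker, using only the standard library, is shown below. The names and the use of a random UUID as tracker id are assumptions made for illustration, not the PR's actual implementation.

```python
import threading
import uuid
import weakref

# Hypothetical registries: class -> id and id -> class, both weak so that
# tracked classes can still be garbage collected.
_ID_BY_CLASS = weakref.WeakKeyDictionary()
_CLASS_BY_ID = weakref.WeakValueDictionary()
_LOCK = threading.Lock()


def get_or_create_tracker_id(cls):
    # Called at pickling time: give each dynamic class a stable identifier.
    with _LOCK:
        tracker_id = _ID_BY_CLASS.get(cls)
        if tracker_id is None:
            tracker_id = uuid.uuid4().hex
            _ID_BY_CLASS[cls] = tracker_id
            _CLASS_BY_ID[tracker_id] = cls
        return tracker_id


def lookup_class_or_track(tracker_id, cls):
    # Called at unpickling time: reuse the class already known under this id,
    # otherwise register the freshly rebuilt one.
    with _LOCK:
        existing = _CLASS_BY_ID.setdefault(tracker_id, cls)
        _ID_BY_CLASS[existing] = tracker_id
        return existing


class A:
    pass


tracker_id = get_or_create_tracker_id(A)
rebuilt_copy = type("A", (), {})  # stands in for a class rebuilt by unpickling
resolved = lookup_class_or_track(tracker_id, rebuilt_copy)
print(resolved is A, isinstance(A(), resolved))
```

Resolving to the original class is what keeps isinstance checks working after a round trip within the same process.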

    enhancement 
    opened by ogrisel 19
  • Use protocol=pickle.HIGHEST_PROTOCOL by default

    This is a fix for #123.
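For context, the difference between the default and the highest protocol can be inspected with the standard library. The sketch below only illustrates the two pickle constants; it does not show cloudpickle's own default.

```python
import pickle

obj = {"values": list(range(5))}

default_payload = pickle.dumps(obj)
highest_payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

# For protocol 2 and later, the stream starts with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
print(pickle.DEFAULT_PROTOCOL, default_payload[1])
print(pickle.HIGHEST_PROTOCOL, highest_payload[1])
```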

    opened by ogrisel 19
  • Add support for `abc.abstract*` methods

    This PR adds support for abc.abstractproperty, abc.abstractclassmethod, and abc.abstractstaticmethod. The changes here are mostly a duplicate of what was proposed in https://github.com/cloudpipe/cloudpickle/pull/369 by @KristianHolsheimer plus a copy of the tests added in https://github.com/cloudpipe/cloudpickle/pull/371, but using the abc.abstract* methods. Additionally, this PR extends test_abc a bit to include coverage for abstract @propertys.

    As pointed out in https://github.com/cloudpipe/cloudpickle/issues/367#issuecomment-628643963, the abc.abstract* methods added here are now deprecated, but it seems reasonable to add support for them anyways as users will still run into them from time to time (xref https://github.com/cloudpipe/cloudpickle/issues/394)

    Closes https://github.com/cloudpipe/cloudpickle/issues/367

    opened by jrbourbeau 1
  • Fix #440: Incorrect pickles for subclasses of generic classes

    This PR fixes #440: Incorrect pickles for subclasses of generic classes.

    As pointed out in the issue comment, the root of the issue lies in _get_bases, which returned incorrect values for subclasses of generic classes. For example, the GLeafAny class in the issue was determined to have original bases of (__main__.Generic[~T],), although its actual base is GDerivedAny, and since that is not an _GenericAlias, GLeafAny shouldn't have __orig_bases__ at all.

    However, in _get_bases, we were using hasattr(typ, '__orig_bases__') to check if the class has __orig_bases__, which would go through to its base classes, in this case, to GDerivedAny.__orig_bases__. This is wrong; we should only use __orig_bases__ if it's defined on the current class. Thus my change.
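The difference between the two checks can be seen with the standard library alone. The sketch below reuses the class names from the issue and compares hasattr with a lookup in the class's own `__dict__`.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class GBase(Generic[T]):
    pass


class GDerivedAny(GBase):
    pass


class GLeafAny(GDerivedAny):
    pass


# hasattr follows inheritance, so it finds the attribute set on GBase.
inherited = hasattr(GLeafAny, "__orig_bases__")

# Looking in the class's own namespace only reports what it defines itself.
own = "__orig_bases__" in GLeafAny.__dict__

print(inherited, own, GLeafAny.__orig_bases__)
```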

    opened by huzecong 1
  • Cannot ignore locks with cloudpickle

    I have an object that I would like to pickle. Unfortunately, the object has a lock within it.

    Is there a way to set up cloudpickle to ignore locks so that it doesn't crash?

    import threading
    import cloudpickle

    class ThirdPartyClass:
        def __init__(self):
            self.internal_lock = threading.RLock()
            self.something_I_want_pickled = 'important string'

    cloudpickle.dumps(ThirdPartyClass())

    See also #81
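One possible direction, sketched here with the standard library pickler only, is a per-pickler dispatch table that replaces every RLock with a fresh, unlocked one on load. Whether cloudpickle.CloudPickler honours such a table is not confirmed by this thread, so treat that part as an assumption; a plain dict stands in for the third-party object, and `reduce_rlock` is a hypothetical helper.

```python
import io
import pickle
import threading

RLockType = type(threading.RLock())


def reduce_rlock(lock):
    # Hypothetical reducer: drop the lock's state and ask the loader to call
    # threading.RLock() to build a new, unlocked lock.
    return threading.RLock, ()


state = {
    "internal_lock": threading.RLock(),
    "something_I_want_pickled": "important string",
}

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# The table is set on this pickler instance only, not globally.
pickler.dispatch_table = {RLockType: reduce_rlock}
pickler.dump(state)

restored = pickle.loads(buffer.getvalue())
print(restored["something_I_want_pickled"], type(restored["internal_lock"]))
```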

    opened by ryanthompson591 0
  • Update `distributed` CI test dependencies

    This PR adds pytest-asyncio and pytest-rerunfailures as dependencies to distributed downstream CI build.

    pytest-asyncio is needed to run some distributed tests, otherwise they will be skipped. For example, looking at the CI builds in https://github.com/cloudpipe/cloudpickle/pull/432, we see this warning about being skipped for lots of tests:

    ...
    distributed/tests/test_utils.py: 1 warning
    distributed/tests/test_utils_test.py: 4 warnings
    distributed/tests/test_worker.py: 7 warnings
      /opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/_pytest/python.py:172: PytestUnhandledCoroutineWarning: async def functions are not natively supported and have been skipped.
      You need to install a suitable plugin for your async framework, for example:
        - anyio
        - pytest-asyncio
        - pytest-tornasync
        - pytest-trio
        - pytest-twisted
        warnings.warn(PytestUnhandledCoroutineWarning(msg.format(nodeid)))
    

    pytest-rerunfailures was recently added as a test dependency for distributed to automatically re-run some known flaky tests.

    cc @jakirkham

    ci distributed 
    opened by jrbourbeau 3
  • Remove and deprecate unused functions

    From what I can tell it appears these functions aren't used anywhere and the test suite has coverage for Tornado coroutines, Ellipsis, and NotImplemented.

    https://github.com/cloudpipe/cloudpickle/blob/343da119685f622da2d1658ef7b3e2516a01817f/tests/cloudpickle_test.py#L798-L804

    https://github.com/cloudpipe/cloudpickle/blob/343da119685f622da2d1658ef7b3e2516a01817f/tests/cloudpickle_test.py#L810-L816

    https://github.com/cloudpipe/cloudpickle/blob/343da119685f622da2d1658ef7b3e2516a01817f/tests/cloudpickle_test.py#L964

    This PR proposes we remove these unused functions, though do let me know if I'm missing something and these utilities are in fact needed for some external code paths.

    opened by jrbourbeau 4
  • Incorrect pickles for subclasses of generic classes

    Environment:

    • Python 3.8.7
    • cloudpickle==1.6.0

    I've run into a very curious error that only manifests when a couple of conditions are satisfied:

    • There's a base class GBase that is a generic class (inherits from Generic[T]).
    • There's a derived class that inherits from the non-parameterized base class (inherits from GBase, not GBase[T] or GBase[int]).
    • The classes are defined in a notebook or IPython repl (so they're in __main__, and their code must be pickled).
    • I pickle the derived class, and unpickle it in another process where the class isn't defined.

    What I observe is then:

    • The unpickled class's __module__ is types, instead of __main__.
    • The non-parameterized bases are missing from the MRO of the unpickled class.

    Here's a script to reproduce the issue:

    from typing import Generic, TypeVar
    import multiprocessing as mp
    
    import cloudpickle
    
    T = TypeVar("T")
    
    class Base: pass
    class Derived(Base): pass
    class Leaf(Derived): pass
    
    class GBase(Generic[T]): pass
    class GDerivedAny(GBase): pass
    class GLeafAny(GDerivedAny): pass
    
    class GDerivedInt(GBase[int]): pass
    class GLeafInt(GDerivedInt): pass
    
    class GDerivedT(GBase[T]): pass
    class GLeafT(GDerivedT[T]): pass
    
    klasses = [
        Base, Derived, Leaf, GBase, GDerivedAny, GLeafAny, GDerivedInt, GLeafInt, GDerivedT, GLeafT
    ]
    
    def get_mro(klass):
        return [f"{base.__module__}.{base.__qualname__}" for base in klass.mro()]
    
    def test_klass(klass_pickle):
        # This has to be run in a separate process, where the classes aren't defined.
        return get_mro(cloudpickle.loads(klass_pickle))
    
    with mp.Pool(len(klasses)) as pool:
        mros = pool.map(test_klass, [cloudpickle.dumps(klass) for klass in klasses])
        for klass, mro in zip(klasses, mros):
            expected_mro = [base.__name__ for base in klass.mro()]
            mro_without_module = [name.split(".")[1] for name in mro]
            output = [klass, set(expected_mro) - set(mro_without_module), mro]
            print("".join(repr(x).ljust(35) for x in output))
    

    The output is (each row is class, set of missing base classes, and the MRO of the unpickled class):

    <class '__main__.Base'>            set()                              ['__main__.Base', 'builtins.object']
    <class '__main__.Derived'>         set()                              ['types.Derived', '__main__.Base', 'builtins.object']
    <class '__main__.Leaf'>            set()                              ['types.Leaf', 'types.Derived', '__main__.Base', 'builtins.object']
    <class '__main__.GBase'>           set()                              ['__main__.GBase', 'typing.Generic', 'builtins.object']
    <class '__main__.GDerivedAny'>     {'GBase'}                          ['types.GDerivedAny', 'typing.Generic', 'builtins.object']
    <class '__main__.GLeafAny'>        {'GBase', 'GDerivedAny'}           ['types.GLeafAny', 'typing.Generic', 'builtins.object']
    <class '__main__.GDerivedInt'>     set()                              ['types.GDerivedInt', '__main__.GBase', 'typing.Generic', 'builtins.object']
    <class '__main__.GLeafInt'>        {'GDerivedInt'}                    ['types.GLeafInt', '__main__.GBase', 'typing.Generic', 'builtins.object']
    <class '__main__.GDerivedT'>       set()                              ['types.GDerivedT', '__main__.GBase', 'typing.Generic', 'builtins.object']
    <class '__main__.GLeafT'>          set()                              ['types.GLeafT', 'types.GDerivedT', '__main__.GBase', 'typing.Generic', 'builtins.object']
    

    You can see that all non-parameterized bases are missing from the MRO, and the module for all but the base classes is types.

    bug help wanted 
    opened by huzecong 1
  • Dynamic class reset state on every deserialization

    Reproducer:

    # Tested with cloudpickle 1.6.0
    from cloudpickle import dumps, loads
    
    
    class Klass:
        classvar = None
    
    def mutator():
        Klass.classvar = 100
    
    def check():
        print("checking....")
        print(f"   Klass.classvar [{hex(id(Klass))}] = {Klass.classvar}")
    
    
    def failing_case():
        print("Klass", hex(id(Klass)))
        saved = dumps(Klass)
        mutator()
        check()
        loads(saved)
        check()
        loads(saved)
        check()
    
    
    
    if __name__ == '__main__':
        failing_case()
    

    Prints:

    Klass 0x7fc698719980
    checking....
       Klass.classvar [0x7fc698719980] = 100
    checking....
       Klass.classvar [0x7fc698719980] = None
    checking....
       Klass.classvar [0x7fc698719980] = None
    

    After each loads(saved), the state in Klass is being reset unexpectedly.

    This problem can look like a tricky race condition in a distributed, multi-threaded framework such as Dask. See the example at https://gist.github.com/sklam/98e7c98ce909e76a3fa7904754db7bd9.

    I created a patch for this in the vendored cloudpickle in Numba: https://github.com/numba/numba/pull/7388. Please let me know if there will be problems with the way I am fixing it. If it is okay, I can submit the PR here.

    opened by sklam 7
  • Classes with `cached_property` can't be cloudpickled

    I believe this is a bug.

    Here is an example:

    from functools import cached_property
    import pickle
    import cloudpickle
    
    
    class MyExample:
        def __init__(self, foo):
            self.foo = foo
    
        @cached_property
        def bar(self) -> int:
            return self.foo * 3
    
    
    example = MyExample(2)
    pickle.dumps(example)      # works
    cloudpickle.dumps(example) # crashes
    

    This fails with TypeError: cannot pickle '_thread.RLock' object.

    enhancement help wanted 
    opened by durumu 1
  • Private dispatch table not checked

    I'm attempting to register a custom reducer with cloudpickle. In my use case, I need multiple CloudPickler instances to serialize an object with different behavior, so I can't override the class-level global dispatch table.

    To demonstrate, assume I have the following class:

    class CustomClass:
        def __init__(self, *args):
            self.x = len(args)
    
    c = CustomClass()
    

    I can define a custom reducer and register it with a private dispatch table:

    import io
    import pickle

    def custom_reducer(obj):
        return CustomClass, (0,)
    
    io_buffer = io.BytesIO()
    custom_pickler = pickle.Pickler(io_buffer)
    custom_pickler.dispatch_table = {CustomClass: custom_reducer}
    custom_pickler.dump(c)
    
    io_buffer.seek(0)
    c1 = pickle.load(io_buffer)
    assert c1.x == 1
    

    I can also prove that this doesn't affect the global dispatch table:

    c2 = pickle.loads(pickle.dumps(c))
    assert c2.x == 0
    

    Unfortunately, when I try the same behavior with cloudpickle, it doesn't respect the private dispatch table:

    import cloudpickle

    io_buffer = io.BytesIO()
    custom_pickler = cloudpickle.CloudPickler(io_buffer)
    custom_pickler.dispatch_table = {CustomClass: custom_reducer}
    custom_pickler.dump(c)
    
    io_buffer.seek(0)
    c3 = cloudpickle.load(io_buffer)
    assert c3.x == 1 # assertion fails because c3.x == 0
    

    (reference to how this was implemented in pickle)

    enhancement 
    opened by wuisawesome 3
  • 1.6.0: pytest is failing

    I'm trying to package your module as an rpm package, so I'm using the typical build, install and test cycle used when building a package from a non-root account:

    • "setup.py build"
    • "setup.py install --root </install/prefix>"
    • "pytest" with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    May I ask for help, because a few units are failing:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-1.6.0-6.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-1.6.0-6.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra --import-mode=importlib
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
    benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
    Using --randomly-seed=739258526
    rootdir: /home/tkloczko/rpmbuild/BUILD/cloudpickle-1.6.0, configfile: tox.ini
    plugins: forked-1.3.0, shutil-1.7.0, virtualenv-1.7.0, expect-1.1.0, flake8-1.0.7, timeout-1.4.2, betamax-0.8.1, freezegun-0.4.2, aspectlib-1.5.2, toolbox-0.5, rerunfailures-9.1.1, requests-mock-1.9.3, cov-2.12.1, pyfakefs-4.5.0, flaky-3.7.0, benchmark-3.4.1, xdist-2.3.0, pylama-7.7.1, datadir-1.3.1, regressions-2.2.0, cases-3.6.3, xprocess-0.18.1, black-0.3.12, anyio-3.3.0, asyncio-0.15.1, trio-0.7.0, httpbin-1.0.0, subtests-0.5.0, isort-2.0.0, hypothesis-6.14.6, mock-3.6.1, profiling-1.7.0, randomly-3.8.0, Faker-8.12.1
    collected 7 items / 2 errors / 5 selected
    
    ================================================================================== ERRORS ==================================================================================
    ________________________________________________________________ ERROR collecting tests/cloudpickle_test.py ________________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/cloudpickle-1.6.0/tests/cloudpickle_test.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/cloudpickle_test.py:52: in <module>
        from .testutils import subprocess_pickle_echo
    E   ImportError: attempted relative import with no known parent package
    ______________________________________________________________ ERROR collecting tests/test_backward_compat.py ______________________________________________________________
    ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/cloudpickle-1.6.0/tests/test_backward_compat.py'.
    Hint: make sure your test modules/packages have valid Python names.
    Traceback:
    tests/test_backward_compat.py:17: in <module>
        from .generate_old_pickles import PICKLE_DIRECTORY
    E   ImportError: attempted relative import with no known parent package
    ========================================================================= short test summary info ==========================================================================
    ERROR tests/cloudpickle_test.py
    ERROR tests/test_backward_compat.py
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ============================================================================ 2 errors in 0.49s =============================================================================
    pytest-xprocess reminder::Be sure to terminate the started process by running 'pytest --xkill' if you have not explicitly done so in your fixture with 'xprocess.getinfo(<process_name>).terminate()'.
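
The `ImportError` in the log above is what Python raises when a file containing a relative import is loaded as a standalone top-level module. A minimal sketch reproducing that message with the standard library follows. It assumes, without confirmation from the report, that `--import-mode=importlib` loads each test file without a parent package. The file names and the `import_standalone` helper are hypothetical.

```python
import importlib.util
import pathlib
import tempfile


def import_standalone(path):
    """Import a file as a top-level module named after its stem (no parent package)."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring a tests/ package with a helper module.
    tests_dir = pathlib.Path(tmp) / "tests"
    tests_dir.mkdir()
    (tests_dir / "__init__.py").write_text("")
    (tests_dir / "testutils.py").write_text("VALUE = 1\n")
    test_file = tests_dir / "example_test.py"
    test_file.write_text("from .testutils import VALUE\n")

    try:
        import_standalone(test_file)
        error_message = None
    except ImportError as exc:
        # The module has no parent package, so the relative import cannot resolve.
        error_message = str(exc)

print(error_message)
```

If that assumption holds, running the suite without `--import-mode=importlib`, so that `tests` is imported as a package, would be one thing to try.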
    
    help wanted 
    opened by kloczek 8
Releases(v2.0.0)
  • v0.4.4(May 14, 2018)

  • v0.5.3(May 14, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.5.2

    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).

    • itertools objects can also be pickled (PR #156).

    • logging.RootLogger can also be pickled (PR #160).

  • v0.4.3(Feb 13, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.2

    • Fixed a regression: AttributeError when loading pickles that hold a reference to a dynamically defined class from the __main__ module (issue #131).
    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).
  • v0.4.1(Oct 26, 2017)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.0

    • Fixed a crash when pickling dynamic classes whose __dict__ attribute was defined as a property. Most notably, this affected dynamic namedtuples in Python 2. (https://github.com/cloudpipe/cloudpickle/pull/113)
    • Cloudpickle now preserves the __module__ attribute of functions (https://github.com/cloudpipe/cloudpickle/pull/118/).
    • Fixed a crash when pickling modules that don't have a __package__ attribute (https://github.com/cloudpipe/cloudpickle/pull/116).
  • v0.4.0(Aug 9, 2017)

    Get it while it's briny with

    pip install cloudpickle
    

    Ch-ch-ch-changes

    • Fix functions with empty cells (https://github.com/cloudpipe/cloudpickle/pull/91)
    • Allow pickling Logger objects (https://github.com/cloudpipe/cloudpickle/pull/96)
    • Fix crash when pickling dynamic class cycles (https://github.com/cloudpipe/cloudpickle/pull/102)
    • Support WeakSets and ABCMeta instances (https://github.com/cloudpipe/cloudpickle/pull/104)
    • Ignore "None" modules added to sys.modules (https://github.com/cloudpipe/cloudpickle/pull/107)
    • Remove non-standard __transient__ support (https://github.com/cloudpipe/cloudpickle/pull/110)
    • Catch exception from pickle.whichmodule() (https://github.com/cloudpipe/cloudpickle/pull/112)
  • v0.3.1(May 31, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes since v0.2.2

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
  • v0.3.0(May 30, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
  • v0.1.1(Sep 5, 2015)

    cloudpickle bug fix release v0.1.1

    • fixed save_classmethod (#41)
    • now allows users to import cloudpickle to dump and load pickled data (#37)
    • no more pickling of closed files, was broken on Python 3 (#32)
    • more tests!
  • 0.1.0(Apr 16, 2015)

Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

orjson orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard j

null 2.4k Sep 22, 2021
MessagePack serializer implementation for Python msgpack.org[Python]

MessagePack for Python What's this MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JS

MessagePack 1.5k Sep 22, 2021
A lightweight library for converting complex objects to and from simple Python datatypes.

marshmallow: simplified object serialization marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, t

marshmallow-code 5.7k Sep 24, 2021
Python bindings for the simdjson project.

pysimdjson Python bindings for the simdjson project, a SIMD-accelerated JSON parser. If SIMD instructions are unavailable a fallback parser is used, m

Tyler Kennedy 474 Sep 24, 2021
Python library for serializing any arbitrary object graph into JSON. It can take almost any Python object and turn the object into JSON. Additionally, it can reconstitute the object back into Python.

jsonpickle jsonpickle is a library for the two-way conversion of complex Python objects and JSON. jsonpickle builds upon the existing JSON encoders, s

null 962 Sep 16, 2021
Generic ASN.1 library for Python

ASN.1 library for Python This is a free and open source implementation of ASN.1 types and codecs as a Python package. It has been first written to sup

Ilya Etingof 187 Sep 20, 2021
simplejson is a simple, fast, extensible JSON encoder/decoder for Python

simplejson simplejson is a simple, fast, complete, correct and extensible JSON <http://json.org> encoder and decoder for Python 3.3+ with legacy suppo

null 1.4k Sep 24, 2021
Ultra fast JSON decoder and encoder written in C with Python bindings

UltraJSON UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 3.6+. Install with pip: $ python -m pip insta

null 3.4k Sep 24, 2021
🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)

srsly: Modern high-performance serialization utilities for Python This package bundles some of the best Python serialization libraries into one standa

Explosion 230 Sep 9, 2021
Python wrapper around rapidjson

python-rapidjson Python wrapper around RapidJSON Authors: Ken Robbins <[email protected]> Lele Gaifax <[email protected]> License: MIT License Sta

null 429 Sep 17, 2021
Protocol Buffers - Google's data interchange format

Protocol Buffers - Google's data interchange format Copyright 2008 Google Inc. https://developers.google.com/protocol-buffers/ Overview Protocol Buffe

Protocol Buffers 50.8k Sep 23, 2021
Crappy tool to convert .scw files to .json and vice versa.

SCW-JSON-TOOL Crappy tool to convert .scw files to .json and vice versa. How to use Run main.py file with two arguments: python main.py <scw2json or j

Fred31 5 May 14, 2021
FlatBuffers: Memory Efficient Serialization Library

FlatBuffers FlatBuffers is a cross platform serialization library architected for maximum memory efficiency. It allows you to directly access serializ

Google 16.8k Sep 23, 2021