Extended pickling support for Python objects

Last update: Jan 5, 2023

Related tags

Data Serialization cloudpickle

Overview

cloudpickle

cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.

Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the __main__ module (for instance in a script, a shell or a Jupyter notebook).

Cloudpickle can only be used to send objects between the exact same version of Python.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Security notice: one should only load pickle data from trusted sources as otherwise pickle.load can lead to arbitrary code execution resulting in a critical security vulnerability.
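
To make the risk concrete, here is a deliberately harmless sketch that uses only the standard pickle module. The `Payload` class is hypothetical and exists purely for illustration. It shows that the byte stream, not the loading program, decides which callable runs during loading.

```python
import pickle


class Payload:
    """Hypothetical class whose pickle names the callable to run on load."""

    def __reduce__(self):
        # The unpickler will call eval("6 * 7"). A hostile payload could
        # name any other importable callable and arguments instead.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it calls the function
# recorded in the byte stream and returns whatever that call returns.
result = pickle.loads(data)
print(result)
```

The same mechanism applies to data produced by cloudpickle, since it is loaded with the regular pickle machinery.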

Installation

The latest release of cloudpickle is available from PyPI:

pip install cloudpickle

Examples

Pickling a lambda expression:

>>> import cloudpickle
>>> squared = lambda x: x ** 2
>>> pickled_lambda = cloudpickle.dumps(squared)

>>> import pickle
>>> new_squared = pickle.loads(pickled_lambda)
>>> new_squared(2)
4

Pickling a function interactively defined in a Python shell session (in the __main__ module):

>>> CONSTANT = 42
>>> def my_function(data: int) -> int:
...     return data + CONSTANT
...
>>> pickled_function = cloudpickle.dumps(my_function)
>>> depickled_function = pickle.loads(pickled_function)
>>> depickled_function
<function __main__.my_function(data:int) -> int>
>>> depickled_function(43)
85

Running the tests

With tox, to run the tests for all the supported versions of Python and PyPy:
```
pip install tox
tox
```
or alternatively for a specific environment:
```
tox -e py37
```
With py.test to only run the tests for your current version of Python:
```
pip install -r dev-requirements.txt
PYTHONPATH='.:tests' py.test
```

History

cloudpickle was initially developed by picloud.com and shipped as part of the client SDK.

A copy of cloudpickle.py was included as part of PySpark, the Python interface to Apache Spark. Davies Liu, Josh Rosen, Thom Neale and other Apache Spark developers improved it significantly, most notably to add support for PyPy and Python 3.

The aim of the cloudpickle project is to make that work available to a wider audience outside of the Spark ecosystem and to make it easier to improve it further notably with the help of a dedicated non-regression test suite.

Comments

Add ability to register modules to be deeply serialized

This PR is based on the work done by @kinghuang in PR391, but takes on the feedback provided by @ogrisel and adds testing.

Fixes #206

Issue Summary

To summarise the issue, in many cases cloudpickle is used to send code for remote execution. This is used in dask, prefect, mlflow and many libraries. For local functions, this works perfectly fine. But for any non-local function or class, cloudpickle assumes that external modules and packages are available at the location of deserialization. This may either not be the case, or the version of the package available at the end point may be different.

This PR adds the option to register modules for deep serialization by providing a register_deep_serialization function which takes either a name or a module. This is the original register_dynamic_module by @kinghuang.

import cloudpickle
from tests import external

cloudpickle.register_deep_serialization("tests.external")  # string name works
cloudpickle.register_deep_serialization(external)          # You can pass the module itself
cloudpickle.register_deep_serialization("tests")           # or the parent string/module

output = cloudpickle.dumps(external.an_external_function)

Original dumps:

b'\x80\x05\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x0etests.external\x94\x8c\x14an_external_function\x94\x93\x94.'

dumps after registering tests.external for deep serialization:

b'\x80\x05\x95<\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(h\x02\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94N\x8c\x11this is something\x94\x86\x94))\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94\x8c\x14an_external_function\x94K\x04C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x05tests\x94\x8c\x08__name__\x94\x8c\x0etests.external\x94\x8c\x08__file__\x94\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94uNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x19}\x94}\x94(h\x14h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x15\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0.'

Modules can be unregistered via unregister_deep_serialization

Tests

One of the example tests with an explicit use case is shown above. On top of this, tests have been added to _lookup_module_and_qualname using the _cloudpickle_testpkg package, and also to the new _is_explicitly_serialized_module function.

opened by Samreay 58

ENH: derive from C-pickler for fast serialization
Summary:

This PR proposes a new CloudPickler class that inherits from the C _pickle.Pickler instead of the Python pickle._Pickler, allowing 10x+ speedups for the serialization of large builtin objects such as dicts and lists.
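
As a rough way to see the gap between the two base classes, the sketch below times the standard library's C-backed `pickle.Pickler` against the pure-Python `pickle._Pickler` on a large dict. It does not involve cloudpickle itself, the data set is arbitrary, and the measured ratio will vary by machine and Python version.

```python
import io
import pickle
import time

# Arbitrary large builtin container used only for this comparison.
data = {i: [i, str(i)] for i in range(50_000)}


def dump_with(pickler_class):
    """Serialize `data` with the given pickler class and time the call."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    pickler_class(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(data)
    return buffer.getvalue(), time.perf_counter() - start


c_bytes, c_time = dump_with(pickle.Pickler)     # C implementation when available
py_bytes, py_time = dump_with(pickle._Pickler)  # pure-Python implementation

print(f"C pickler:      {c_time:.4f}s")
print(f"Python pickler: {py_time:.4f}s")
```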

Disclaimer: a new start

Moving from the Python to the C Pickler requires a fair amount of changes. For this reason, instead of simply adapting the current code to respect the new constraints, I started again from scratch. This allows a new, clean API and structure that will hopefully be easier for everyone to understand.

I made a lot of comments (sometimes overly verbose) to ease the review process of this PR. Eventually, I hope the information they contain can be transferred to proper project documentation.

Implementation:

Changes to python

As opposed to the Python pickler, the C Pickler does not expose the save_* family of functions, nor low-level instructions such as write. These methods can be neither patched nor called, and the only customization option we had initially was the dispatch table, which is consulted for all types BUT a few special cases, including classes and functions, the two principal use cases of cloudpickle.

As this makes it simply impossible to modify pickling behavior for such types, we patched the C pickler for it to allow a user defined reduction callback for functions and classes. This idea was suggested by @pitrou.
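
The callback landed in Python 3.8 as `Pickler.reducer_override`. The sketch below is not cloudpickle's implementation: `LambdaPickler` and `rebuild_function` are hypothetical names, and the example ignores globals, closures and defaults. It only shows how a subclass of the C pickler can intercept functions.

```python
import io
import marshal
import pickle
import types


def rebuild_function(code_bytes, name):
    """Hypothetical helper: recreate a function from marshalled bytecode."""
    return types.FunctionType(marshal.loads(code_bytes), {}, name)


class LambdaPickler(pickle.Pickler):
    """Illustrative pickler that serializes lambdas by value."""

    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType) and obj.__name__ == "<lambda>":
            # The returned tuple follows the __reduce__ protocol:
            # (callable, arguments for that callable).
            return rebuild_function, (marshal.dumps(obj.__code__), obj.__name__)
        # NotImplemented hands the object back to the regular machinery.
        return NotImplemented


buffer = io.BytesIO()
LambdaPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(lambda x: x ** 2)

squared = pickle.loads(buffer.getvalue())
print(squared(3))
```

Note that marshalled bytecode is specific to a Python version, which is consistent with the restriction that pickles should only travel between identical Python versions.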

The direct consequence is that functions and classes now have to follow the save_reduce-load_build pickling/depickling process. Unfortunately, this API is not well suited for custom builtin-type saving: in particular, the state-setting part of load_build (the function that reconstructs an object from a reduce value) assumes all attributes of an object are writable, which is not the case for C types (especially function.__globals__ and function.__closure__).

For this reason, we also changed the API of save_reduce so that it accepts a custom state_setter, which is called at unpickling time.
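
In Python 3.8 and later, the state setter is exposed as an optional sixth item of the tuple returned by `__reduce__`. The sketch below uses a hypothetical `Frozen` class and `set_frozen_state` function to show the shape of that tuple. It is an illustration of the mechanism, not code from this PR.

```python
import pickle


class Frozen:
    """Hypothetical class that rejects ordinary attribute assignment."""

    __slots__ = ("value",)

    def __init__(self, value=None):
        object.__setattr__(self, "value", value)

    def __setattr__(self, name, val):
        raise AttributeError("Frozen instances are read-only")

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems, state_setter)
        return (Frozen, (), {"value": self.value}, None, None, set_frozen_state)


def set_frozen_state(obj, state):
    """Called at unpickling time as state_setter(obj, state)."""
    for name, val in state.items():
        object.__setattr__(obj, name, val)


restored = pickle.loads(pickle.dumps(Frozen(42)))
print(restored.value)
```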

You can view the total changes in this diff.

Individual PRs to CPython:

https://github.com/python/cpython/pull/12499 (reducer_override)

https://github.com/python/cpython/pull/12588 (state_setter in save_reduce)

Changes to cloudpickle

Functions and classes are the two main types affected by this PR. The main challenge was to make the saving process fit into the save_reduce API.

Outside of these types, the actual reduction process remains intact.

However, now that any customization must return a tuple, I decided to adopt a new, hopefully clearer, naming style for functions. You will see for yourselves.

How to build this version locally

Until the final release of Python 3.8, you need to build python from upstream's master branch

git clone git@github.com:python/cpython.git
cd cpython
./configure
make

To be able to use external modules you need a virtual environment, using for example the venv module:

./python -m venv /path/to/local/virtualenv

Clone and install cloudpickle and its dependencies

cd /path/to/cloudpickle
git clone git@github.com:cloudpipe/cloudpickle.git
git fetch origin pull/ID/head:fast-cloudpickle
/path/to/local/virtualenv/bin/python -m pip install -r dev-requirements.txt
/path/to/local/virtualenv/bin/python -m pip install .

Finally, run the tests:

/path/to/local/virtualenv/bin/python -mpytest tests/

Benchmarks:

Benchmarks of a "concrete", end-to-end use-case using loky can be found here. To run the benchmarks, you also need the master version of loky.
opened by pierreglaser 41
Pickling of generic annotations/types in 3.5+
This PR adds support for pickling annotations on 3.5+, and fixes some problems with generic annotations on 3.7+.

TODO

[x] Backport for 3.5

[x] Test that fails with TypeError: type() doesn't support MRO entry resolution; use types.new_class() if not using types.new_class for reconstructing classes

[x] Remove typing_extensions dependency

[x] Prefix privates with _

[x] Add test for pickle_depickle'ing annotated functions/classes

Details

The types.new_class change (in _make_skeleton_class) is because of a TypeError: type() doesn't support MRO entry resolution; use types.new_class() error on 3.7+, similar to this issue. Also see https://github.com/python/cpython/pull/6319.

I'm not sure if there are any downsides to TypeVars being __reduce__'d now. Previously, they were only supported as globals (so always imported, I think).

The functions try_decompose_generic and get_bases are brittle the way they are written, because they check for attributes. There might be a better way.

Tests

Passing, and added some new ones.
ci downstream ci ray ci joblib ci distributed ci python-nightly ci loky
opened by valtron 39
deduplicate cloudpickle reducers.
closes #284 related to #364

About backward compatibility:

This PR removes make_skel_func and fill_function, i.e. the functions cloudpickle previously used to reconstruct functions, as they were equivalent to the new function_setstate/function_new (modulo some Python 3.8 compatibility). These functions are needed to load pickles created by previous cloudpickle versions. A simple fix is to keep them inside cloudpickle.py for a few releases and add a FutureWarning inside them saying that an old pickle is being read, and that reading such pickles will break in 2 releases.
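
A shim of that kind could look roughly like the following. Both function names are hypothetical stand-ins and the bodies are placeholders. Only the warning pattern is the point.

```python
import warnings


def _new_function_reconstructor(*args):
    """Stand-in for the current reconstruction helper (hypothetical)."""
    return args


def _legacy_function_reconstructor(*args):
    """Hypothetical shim kept only so that old pickles still load."""
    warnings.warn(
        "This pickle was created by an older cloudpickle release; "
        "support for reading it will be removed in a future version.",
        FutureWarning,
        stacklevel=2,
    )
    return _new_function_reconstructor(*args)


# Old pickles reference the legacy name, so loading them triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rebuilt = _legacy_function_reconstructor("code", "globals")

print(caught[0].category.__name__)
```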

By removing the previous Python < 3.8 CloudPickler class this PR also removes semi-public functions (all the CloudPickler.save_*). These functions are not necessary to read old pickles, but they could be used inside third-party code. To address this, we could keep exposing the previous CloudPickler for the next few releases, but make cloudpickle.dump(s) use the new CloudPickler. This way, we can add a FutureWarning into the previous CloudPickler.__init__, while cloudpickle.dump(s) remains silent.

Also, the module names don't make much sense now. In the future we should rename cloudpickle_fast.py to cloudpickle.py, and merge it with the previous cloudpickle.py.

@jakirkham if you want to give #364 another shot, but rebasing on this PR first, I suspect its implementation should be much easier :)
ci downstream
opened by pierreglaser 25
Add ability to pickle dynamically created modules

The old logic treated all modules the same, which would fail when unpickling. In save_module detect whether the module has been dynamically created by following the chain of imports. Noteworthy is that imp.find_module doesn't work with submodules (example sckit.tree), so we actually have to split the module name and iterate over each piece.

Dynamic modules are saved as dictionaries and reconstituted by dynamic_subimport function. While working on the test cases I discovered NotImplemented and Ellipsis also don't work properly (they are introduced into the test dynamic module by exec). I've also addressed that.
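
The idea can be sketched with current standard-library tools. This is not the PR's code: it swaps imp.find_module (the imp module was removed in Python 3.12) for `importlib.util.find_spec`, which resolves dotted names by importing parent packages, and the helper names `is_dynamic_module` and `rebuild_module` are made up for the example.

```python
import importlib.util
import json
import types


def is_dynamic_module(module):
    """Return True when the import system cannot locate the module by name."""
    try:
        return importlib.util.find_spec(module.__name__) is None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or __spec__ is unset.
        return True


def rebuild_module(name, namespace):
    """Recreate a module from a plain dictionary of its attributes."""
    module = types.ModuleType(name)
    module.__dict__.update(namespace)
    return module


# A module created at runtime, with content introduced by exec().
dynamic = types.ModuleType("hypothetical_dynamic_module")
exec("answer = 42", dynamic.__dict__)

# Keep only user-defined names; exec() injects __builtins__ into the dict.
namespace = {k: v for k, v in vars(dynamic).items() if not k.startswith("__")}
clone = rebuild_module(dynamic.__name__, namespace)

print(is_dynamic_module(dynamic), is_dynamic_module(json), clone.answer)
```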

opened by rodrigofarnhamsc 21
Optionally use pickle5 (Redux)

Fixes https://github.com/cloudpipe/cloudpickle/issues/179

Thanks to @pierreglaser's work in PR ( https://github.com/cloudpipe/cloudpickle/pull/368 ), this is a rebased/simplified version of PR ( https://github.com/cloudpipe/cloudpickle/pull/364 ). Otherwise it is the same in that it tries to use pickle5 on older Python versions to support out-of-band buffers.
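
For readers unfamiliar with out-of-band buffers, the sketch below shows the mechanism with the standard pickle module on Python 3.8 or later (the pickle5 backport exposes the same API on older versions). The `ZeroCopyByteArray` class is adapted from the pattern shown in the Python documentation and is not part of cloudpickle.

```python
import pickle


class ZeroCopyByteArray(bytearray):
    """bytearray subclass that hands its memory to protocol 5 out-of-band."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # PickleBuffer lets the pickler pass the memory to buffer_callback
            # instead of copying it into the pickle stream.
            return type(self)._reconstruct, (pickle.PickleBuffer(self),), None
        # Older protocols fall back to an in-band copy.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as view:
            original = view.obj
            if type(original) is cls:
                return original  # same process: reuse the existing object
            return cls(original)


data = ZeroCopyByteArray(b"abc" * 1000)
buffers = []

# The large payload is collected in `buffers`; the stream stays small.
payload = pickle.dumps(data, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)

print(len(payload), len(buffers), restored == data)
```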
ci downstream

opened by jakirkham 20
Implement dynamic class provenance tracking to fix isinstance semantics and add support for dynamically defined enums

This is a fix for #244 (and #101) to add support for dynamically defined Enum subclasses.

Properly adding support for dynamic Enums required fixing the isinstance semantics more broadly, as initially requested in #195.

The proposed solution involves tracking the provenance of pickled dynamic class definitions with a pair of weakref.WeakKeyDictionary / weakref.WeakValueDictionary protected by a threading.Lock.
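
A minimal sketch of such a registry is shown below. The function and variable names are illustrative assumptions rather than the names used in the PR, and the real implementation has to handle more cases. The weak references ensure the registry never keeps a class alive on its own.

```python
import threading
import uuid
import weakref

_CLASS_TO_ID = weakref.WeakKeyDictionary()    # class -> tracker id
_ID_TO_CLASS = weakref.WeakValueDictionary()  # tracker id -> class
_LOCK = threading.Lock()


def get_or_create_tracker_id(cls):
    """Pickling side: attach a stable identifier to a dynamic class."""
    with _LOCK:
        tracker_id = _CLASS_TO_ID.get(cls)
        if tracker_id is None:
            tracker_id = uuid.uuid4().hex
            _CLASS_TO_ID[cls] = tracker_id
            _ID_TO_CLASS[tracker_id] = cls
        return tracker_id


def lookup_or_register(tracker_id, cls):
    """Unpickling side: reuse the class already known under this id."""
    with _LOCK:
        existing = _ID_TO_CLASS.setdefault(tracker_id, cls)
        _CLASS_TO_ID[existing] = tracker_id
        return existing


class Original:
    pass


tracker_id = get_or_create_tracker_id(Original)

# Simulate a second reconstruction of the same definition in this process.
Rebuilt = type("Original", (), {})
resolved = lookup_or_register(tracker_id, Rebuilt)

# Instances created before the round trip still pass isinstance checks.
print(isinstance(Original(), resolved))
```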
enhancement

opened by ogrisel 19

Making cloudpickle produce "consistent/deterministic" results.

This question arose in the following context. I have multiple Python processes, and some classes are defined in each process. Sometimes class definitions are shipped from one process to another (using cloudpickle). Sometimes classes may be shipped multiple times or in multiple ways to a given process, and I'd like to deduplicate them based on the output of cloudpickle (that is, if cloudpickle.dumps(class1) == cloudpickle.dumps(class2), then the classes are the "same" and I can throw away one of them). This works, but there are way too many false negatives (that is, two classes really are the same in some sense, but cloudpickle.dumps gives different results on the two classes).

Here's one example that sort of illustrates the issue (although there are a number of ways this kind of thing can arise).

Suppose I do the following.

import cloudpickle

class Foo1(object):
    def __init__(self):
        pass

serialized1 = cloudpickle.dumps(Foo1)
Foo2 = cloudpickle.loads(serialized1)
serialized2 = cloudpickle.dumps(Foo2)

assert serialized1 == serialized2  # This assertion fails.

I'd love for this kind of assertion to succeed. Does anyone know if this is achievable or what the main obstacles are?

Interestingly, if I iterate this a third time,

Foo3 = cloudpickle.loads(serialized2)
serialized3 = cloudpickle.dumps(Foo3)

assert serialized2 == serialized3  # This succeeds.

then the assert succeeds, so maybe it suffices to use cloudpickle.dumps(cloudpickle.loads(cloudpickle.dumps(cls))) to deduplicate classes (though this seems kind of insane, and I haven't tested this extensively). Would you expect this to work?

One thing that may be related/revealing is the outputs I get if I do something similar in an IPython interpreter (instead of a regular Python interpreter).

First copy and paste this block into IPython.

import cloudpickle

class Foo(object):
    def __init__(self):
        pass

serialized1 = cloudpickle.dumps(Foo)

Then copy and paste this block into IPython.

class Foo(object):
    def __init__(self):
        pass

serialized2 = cloudpickle.dumps(Foo)

Comparing serialized1 and serialized2 next to each other, they are

serialized1  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
serialized2  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'

They seem to be the same everywhere except that the first includes the string <ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h and the second includes the string <ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h. Any idea where these strings come from or if it is possible to remove them?
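
The differing strings look like the `co_filename` of the methods' code objects, and the `K\x04` / `K\x02` bytes that follow match their `co_firstlineno`. Both are recorded by the compiler and are serialized when a function is pickled by value. The sketch below reproduces this with the built-in `compile`. The cell names are shortened stand-ins for the IPython ones, and `find_code` is a small helper written for this example.

```python
source = "class Foo(object):\n    def __init__(self):\n        pass\n"


def find_code(code, name):
    """Depth-first search for a nested code object with the given name."""
    for const in code.co_consts:
        if isinstance(const, type(code)):
            if const.co_name == name:
                return const
            found = find_code(const, name)
            if found is not None:
                return found
    return None


# First cell: an import and a blank line precede the class definition.
cell1 = compile("import cloudpickle\n\n" + source, "<ipython-input-1>", "exec")
# Second cell: the class definition alone.
cell2 = compile(source, "<ipython-input-2>", "exec")

init1 = find_code(cell1, "__init__")
init2 = find_code(cell2, "__init__")

print(init1.co_code == init2.co_code)                # same bytecode
print(init1.co_filename, init2.co_filename)          # different "file" names
print(init1.co_firstlineno, init2.co_firstlineno)    # different line numbers
```

Two definitions with identical bytecode can therefore still serialize to different bytes whenever they were compiled from different cells or at different positions within a cell.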

cc @Wapaul1 @mehrdadn

opened by robertnishihara 19

NumPy arrays serialize more slowly with cloudpickle than pickle

I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

In [1]: import numpy as np

In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [3]: import cloudpickle, pickle

In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161

In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161

opened by mrocklin 19

Remove non-standard __transient__ support
The __transient__ dunder attribute is not the standard way to prevent attributes from being pickled. Instead, the standard approach is to use the __getstate__ and __setstate__ magic methods.
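
For reference, the standard approach looks like this. The `Connection` class is a hypothetical example in which a lock stands in for any attribute that cannot or should not be pickled.

```python
import pickle
import threading


class Connection:
    """Hypothetical class with one attribute that must not be pickled."""

    def __init__(self, host):
        self.host = host
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the transient attribute.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes and recreate the transient one.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(Connection("example.org")))
print(restored.host)
```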

Considering that:

This is an old fix that was implemented maybe for unsupported Python versions (i.e.: Python 2.6).

Nobody knows what it is doing there exactly or why it was added.

Having this special non-standard case implemented makes the code more complex and may result in unexpected behavior (see #108).

This code was not even covered by tests, so removing it should increase code coverage and make the module more robust.

I propose to remove any support for the non-standard approach.

As mentioned in #108, some other projects might be using this attribute. But when looking at those projects:

Most simply have copies of the cloudpickle.py file (hence the match when searching for __transient__).

Others simply seem to be using __transient__ but without depending on cloudpickle as an external module dependency.

Most seem to be not very relevant (i.e.: fewer than 2 stars).

I think that even though this change may break others' code, it is an unlikely scenario. Anyway, if that were the case, I think they should fix their code rather than have cloudpickle carry ugly fixes. Also, they can always choose to use an older cloudpickle version from PyPI.

Fixes #108.
opened by Peque 18
Fix cloudpickle incompatibilities on early Python 3.5 versions

Closes #360 . cloudpickle 1.4.0 is not compatible with early Python 3.5 versions. This should fix it.

Note that I did not set up any CI for Python 3.5.[0-2], I simply tested it on my machine using fresh conda envs.

@vedran If you have some time, could you tell me if this branch fixes the problems that made you create #360?

I would be tempted to release a bugfix version by tonight since this bug completely breaks cloudpickle on Python 3.5.

opened by pierreglaser 17

Fix NamedTuple issues on Python 3.9

This PR fixes issue #460. Two changes were required. First, if __module__ was present in obj.__dict__, we need to pass it along to type_kwargs. See error message below.

cls = <class 'typing.NamedTupleMeta'>, typename = 'MyTuple', bases = (<class 'typing.NamedTuple'>,), ns = {'__orig_bases__': (<function NamedTuple at 0x7fc0780f9310>,), '__slots__': ()}

    def __new__(cls, typename, bases, ns):
        assert bases[0] is _NamedTuple
        types = ns.get('__annotations__', {})
        default_names = []
        for field_name in types:
            if field_name in ns:
                default_names.append(field_name)
            elif default_names:
                raise TypeError(f"Non-default namedtuple field {field_name} "
                                f"cannot follow default field"
                                f"{'s' if len(default_names) > 1 else ''} "
                                f"{', '.join(default_names)}")
        nm_tpl = _make_nmtuple(typename, types.items(),
                               defaults=[ns[n] for n in default_names],
>                              module=ns['__module__'])
E       KeyError: '__module__'

Second, if we pass __slots__ and __module__ to type_kwargs then we get the following error:

cls = <class 'typing.NamedTupleMeta'>, typename = 'MyTuple', bases = (<class 'typing.NamedTuple'>,)
ns = {'__module__': 'tests.cloudpickle_test', '__orig_bases__': (<function NamedTuple at 0x7fa300131310>,), '__slots__': ()}

    def __new__(cls, typename, bases, ns):
        assert bases[0] is _NamedTuple
        types = ns.get('__annotations__', {})
        default_names = []
        for field_name in types:
            if field_name in ns:
                default_names.append(field_name)
            elif default_names:
                raise TypeError(f"Non-default namedtuple field {field_name} "
                                f"cannot follow default field"
                                f"{'s' if len(default_names) > 1 else ''} "
                                f"{', '.join(default_names)}")
        nm_tpl = _make_nmtuple(typename, types.items(),
                               defaults=[ns[n] for n in default_names],
                               module=ns['__module__'])
        # update from user namespace without overriding special namedtuple attributes
        for key in ns:
            if key in _prohibited:
>               raise AttributeError("Cannot overwrite NamedTuple attribute " + key)
E               AttributeError: Cannot overwrite NamedTuple attribute __slots__

/Users/ryanc/opt/anaconda3/lib/python3.9/typing.py:1884: AttributeError

To resolve this, I deleted the lines passing __slots__ to type_kwargs. Our unit test test_instance_with_slots still passes with this change. The deleted __slots__ lines were written 4 years ago and are possibly no longer useful. If there is reason to believe removing it could cause a regression, we should at least add a unit test that properly tests the functionality provided by these lines.

I've run all unit tests locally with Python 3.6, 3.7, 3.8, 3.9, and 3.10 and verified non-regression. The new NamedTuple test fails on develop with Python 3.9 and 3.10 but passes on this branch. I'm happy to iterate here if there are changes needed.

opened by RyanClark2k 1

2.2.0: pytest (7.2) is failing in two units

I'm packaging your module as an RPM package, so I'm using the typical PEP 517 based build, install and test cycle used when building packages from a non-root account:

python3 -sBm build -w --no-isolation
because I'm calling build with --no-isolation, only locally installed modules are used throughout the process
install the .whl file in </install/prefix>
run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

Looks like cloudpickle test suite is failing with pytest 7.2. Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-2.2.0-4.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-2.2.0-4.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/cloudpickle-2.2.0, configfile: tox.ini
collected 256 items

tests/cloudpickle_file_test.py .......
tests/cloudpickle_test.py ...................................F.....................................................................................................................F......................................................s.................................
tests/test_backward_compat.py .......

================================================================================= FAILURES =================================================================================
________________________________________________________________ CloudPickleTest.test_dynamic_pytest_module ________________________________________________________________

self = <tests.cloudpickle_test.CloudPickleTest testMethod=test_dynamic_pytest_module>

    def test_dynamic_pytest_module(self):
        # Test case for pull request https://github.com/cloudpipe/cloudpickle/pull/116
        import py

        def f():
            s = py.builtin.set([1])
            return s.pop()

        # some setup is required to allow pytest apimodules to be correctly
        # serializable.
        from cloudpickle import CloudPickler
        from cloudpickle import cloudpickle_fast as cp_fast
>       CloudPickler.dispatch_table[type(py.builtin)] = cp_fast._module_reduce
E       AttributeError: module 'py' has no attribute 'builtin'

tests/cloudpickle_test.py:1482: AttributeError
___________________________________________________________ Protocol2CloudPickleTest.test_dynamic_pytest_module ____________________________________________________________

self = <tests.cloudpickle_test.Protocol2CloudPickleTest testMethod=test_dynamic_pytest_module>

    def test_dynamic_pytest_module(self):
        # Test case for pull request https://github.com/cloudpipe/cloudpickle/pull/116
        import py

        def f():
            s = py.builtin.set([1])
            return s.pop()

        # some setup is required to allow pytest apimodules to be correctly
        # serializable.
        from cloudpickle import CloudPickler
        from cloudpickle import cloudpickle_fast as cp_fast
>       CloudPickler.dispatch_table[type(py.builtin)] = cp_fast._module_reduce
E       AttributeError: module 'py' has no attribute 'builtin'

tests/cloudpickle_test.py:1482: AttributeError
========================================================================= short test summary info ==========================================================================
SKIPPED [1] tests/cloudpickle_test.py:2261: Need Pickle Protocol 5 or later
FAILED tests/cloudpickle_test.py::CloudPickleTest::test_dynamic_pytest_module - AttributeError: module 'py' has no attribute 'builtin'
FAILED tests/cloudpickle_test.py::Protocol2CloudPickleTest::test_dynamic_pytest_module - AttributeError: module 'py' has no attribute 'builtin'
================================================================ 2 failed, 253 passed, 1 skipped in 14.34s =================================================================

Here is the list of modules installed in the build env:

Package           Version
----------------- --------------
appdirs           1.4.4
asn1crypto        1.5.1
attrs             22.1.0
bcrypt            3.2.2
Brlapi            0.8.3
build             0.9.0
cffi              1.15.1
contourpy         1.0.6
cryptography      38.0.1
cssselect         1.1.0
cycler            0.11.0
distro            1.8.0
dnspython         2.2.1
exceptiongroup    1.0.0
extras            1.0.0
fixtures          4.0.0
fonttools         4.38.0
gpg               1.17.1-unknown
iniconfig         1.1.1
kiwisolver        1.4.4
libcomps          0.1.19
louis             3.23.0
lxml              4.9.1
matplotlib        3.6.2
mock              4.0.3
numpy             1.23.1
olefile           0.46
packaging         21.3
pbr               5.9.0
pep517            0.13.0
Pillow            9.3.0
pip               22.3.1
pluggy            1.0.0
ply               3.11
psutil            5.9.2
pyasn1            0.4.8
pyasn1-modules    0.2.8
pycparser         2.21
PyGObject         3.42.2
pyparsing         3.0.9
pytest            7.2.0
python-dateutil   2.8.2
PyYAML            6.0
rpm               4.17.0
scour             0.38.2
setuptools        65.6.3
six               1.16.0
testtools         2.5.0
tomli             2.0.1
tornado           6.2
tpm2-pkcs11-tools 1.33.7
tpm2-pytss        1.1.0
typing_extensions 4.4.0
wheel             0.38.4

opened by kloczek 1

Exception line numbering is wrong in Python 3.10.8

Hi 👋

Behaviour in 3.8:

Python 3.8.9 (default, Apr 13 2022, 08:48:06) 
Type "help", "copyright", "credits" or "license" for more information.
>>> def add(x, y):
...     if x == 2:
...         raise Exception(f'Kapput: problem with x={x} and y={y}')
...     else:
...         return x + y
... 
>>> add(2, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in add
Exception: Kapput: problem with x=2 and y=2
>>> import cloudpickle
>>> cloudpickle.loads(cloudpickle.dumps(add))(2, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in add
Exception: Kapput: problem with x=2 and y=2

Behaviour in 3.10:

Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def add(x, y):
...     if x == 2:
...         raise Exception(f'Kapput: problem with x={x} and y={y}')
...     else:
...         return x + y
... 
>>> import cloudpickle
>>> add(2, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in add
Exception: Kapput: problem with x=2 and y=2
>>> cloudpickle.loads(cloudpickle.dumps(add))(2, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in add
Exception: Kapput: problem with x=2 and y=2

Note the difference in the second line from the bottom: In 3.10, the line number is wrong.

This works fine with plain Python pickle.

opened by henrifroese 0

cloudpickle cannot pickle '_jpype._JField' objects

So, I've been working on a project which involves implementing reinforcement learning in a server-client app. The server is written in Java and the client is in Python, which is why I use JPype to import some server classes.

After importing the necessary packages and creating the environment using PettingZoo, it is time to create the model and train it using Stable-Baselines3, but the problem is that when I use Supersuit, it needs to pickle and unpickle the environment, and because the environment contains many Java objects, an error is thrown: TypeError: cannot pickle '_jpype._JField' object.

The standard pickle module does not support JField objects, but the JPype library provides a JPickle variant that does. I tried to modify cloudpickle_fast.py to use the JPickle classes, but I ended up with a problem in cloudpickle.loads().

Here is what I modified in cloudpickle_fast.py:

from jpype.pickle import JPickler, JUnpickler

    def dump(self, obj):
        try:
            return Pickler.dump(self, obj)
        except RuntimeError as e:
            if "recursion" in e.args[0]:
                msg = (
                    "Could not pickle object as excessively deep recursion "
                    "required."
                )
                raise pickle.PicklingError(msg) from e
            else:
                raise
        except TypeError as e:
            return JPickler.dump(self, obj)

And here is the full stacktrace I get:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
Input In [11], in <cell line: 6>()
      1 env = MARL_Env_Parallel(4)
      5 env = ss.pettingzoo_env_to_vec_env_v1(env)
----> 6 env = ss.concat_vec_envs_v1(env, 1, num_cpus=1, base_class='stable_baselines3')

File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\vector_constructors.py:61, in concat_vec_envs_v1(vec_env, num_vec_envs, num_cpus, base_class)
     59 def concat_vec_envs_v1(vec_env, num_vec_envs, num_cpus=0, base_class="gymnasium"):
     60     num_cpus = min(num_cpus, num_vec_envs)
---> 61     vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
     63     if base_class == "gymnasium":
     64         return vec_env

File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\concat_vec_env.py:22, in ConcatVecEnv.__init__(self, vec_env_fns, obs_space, act_space)
     21 def __init__(self, vec_env_fns, obs_space=None, act_space=None):
---> 22     self.vec_envs = vec_envs = [vec_env_fn() for vec_env_fn in vec_env_fns]
     23     for i in range(len(vec_envs)):
     24         if not hasattr(vec_envs[i], "num_envs"):

File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\concat_vec_env.py:22, in <listcomp>(.0)
     21 def __init__(self, vec_env_fns, obs_space=None, act_space=None):
---> 22     self.vec_envs = vec_envs = [vec_env_fn() for vec_env_fn in vec_env_fns]
     23     for i in range(len(vec_envs)):
     24         if not hasattr(vec_envs[i], "num_envs"):

File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\vector_constructors.py:11, in vec_env_args.<locals>.env_fn()
     10 def env_fn():
---> 11     env_copy = cloudpickle.loads(cloudpickle.dumps(env))
     12     return env_copy

UnpicklingError: Memo value not found at index 3

I don't have a lot of experience with Pickle, so any advice would be welcome, thanks.
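A common way to make an object picklable when it holds a handle that pickle rejects is to leave that handle out of the pickled state and recreate it after loading. The sketch below is not JPype-specific and has not been tried with PettingZoo or Supersuit: `Env`, `_connect`, and `_handle` are hypothetical names, and a `threading.Lock` stands in for the Java object because pickling it raises a similar `TypeError`.

```python
import pickle
import threading


class Env:
    """Toy environment; `_handle` stands in for an unpicklable foreign object."""

    def __init__(self, n_agents):
        self.n_agents = n_agents
        self._handle = self._connect()

    def _connect(self):
        # Hypothetical stand-in: a lock cannot be pickled, much like a JField.
        return threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the member that pickle rejects.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then recreate the handle locally.
        self.__dict__.update(state)
        self._handle = self._connect()


env_copy = pickle.loads(pickle.dumps(Env(4)))
print(env_copy.n_agents)
```

Because cloudpickle builds on the standard pickle protocol, the same `__getstate__` and `__setstate__` hooks should also apply to `cloudpickle.dumps` and `cloudpickle.loads`. Whether the Java side can be reconnected this way depends on the application.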

opened by framepixel 0

pytest no longer bundles py

py is kinda deprecated and pytest now bundles only a subset of it. It'd be best to stop depending on it. If it's not possible, dependency on py should be explicitly specified so the original package is installed.

https://github.com/cloudpipe/cloudpickle/blob/f5472e1a2eb4235e61b632b58367dede93dfb30c/tests/cloudpickle_test.py#L1472

opened by frenzymadness 2

Releases(v2.0.0)

v2.0.0(Sep 14, 2021)

v0.5.3(May 14, 2018)
Installation

pip install cloudpickle

Changes Since v0.5.2

Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).

itertools objects can also be pickled (PR #156).

logging.RootLogger can also be pickled (PR #160).

v0.4.4(May 14, 2018)
Installation

pip install cloudpickle==0.4.4

Changes Since v0.4.3

logging.RootLogger can also be pickled (PR #160).

v0.4.3(Feb 13, 2018)
Installation

pip install cloudpickle

Changes Since v0.4.2

Fixed a regression: AttributeError when loading pickles that hold a reference to a dynamically defined class from the __main__ module (issue #131).

Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).

v0.4.1(Oct 26, 2017)
Installation

pip install cloudpickle

Changes Since v0.4.0

Fixed a crash when pickling dynamic classes whose __dict__ attribute was defined as a property. Most notably, this affected dynamic namedtuples in Python 2. (https://github.com/cloudpipe/cloudpickle/pull/113)

Cloudpickle now preserves the __module__ attribute of functions (https://github.com/cloudpipe/cloudpickle/pull/118/).

Fixed a crash when pickling modules that don't have a __package__ attribute (https://github.com/cloudpipe/cloudpickle/pull/116).

v0.4.0(Aug 9, 2017)
Get it while it's briny with

pip install cloudpickle

Ch-ch-ch-changes

Fix functions with empty cells (https://github.com/cloudpipe/cloudpickle/pull/91)

Allow pickling Logger objects (https://github.com/cloudpipe/cloudpickle/pull/96)

Fix crash when pickling dynamic class cycles (https://github.com/cloudpipe/cloudpickle/pull/102)

Support WeakSets and ABCMeta instances (https://github.com/cloudpipe/cloudpickle/pull/104)

Ignore "None" modules added to sys.modules (https://github.com/cloudpipe/cloudpickle/pull/107)

Remove non-standard __transient__ support (https://github.com/cloudpipe/cloudpickle/pull/110)

Catch exception from pickle.whichmodule() (https://github.com/cloudpipe/cloudpickle/pull/112)
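For readers unfamiliar with the term, an "empty cell" is a closure cell whose variable has not been assigned yet when the inner function is created. The following is a minimal standard-library sketch of that situation; `make_closure`, `inner`, and `was_empty` are hypothetical names, and the snippet only illustrates the concept, not how cloudpickle handles it internally.

```python
def make_closure():
    def inner():
        return value  # free variable, resolved through a closure cell

    # At this point `value` is not assigned yet, so the cell is empty and
    # reading its contents raises ValueError.
    cell = inner.__closure__[0]
    try:
        cell.cell_contents
        was_empty = False
    except ValueError:
        was_empty = True

    value = 42  # fills the cell after `inner` already exists
    return inner, was_empty


inner, was_empty = make_closure()
print(was_empty, inner())
```

A serializer that reads `cell_contents` unconditionally would fail on such a function if it were pickled before the assignment.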

v0.3.1(May 31, 2017)
Get it while it's hot with

pip install cloudpickle

Changes since v0.2.2

Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)

Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)

Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)

Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
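A "recursive function inside a closure" refers to an inner function that calls itself through a closure cell, which makes the function object part of a reference cycle. The sketch below shows that cycle with the standard library only; `make_factorial`, `fact`, and `is_cycle` are hypothetical names chosen for illustration.

```python
def make_factorial():
    def fact(n):
        # `fact` is looked up through a closure cell of the enclosing scope.
        return 1 if n <= 1 else n * fact(n - 1)

    return fact


fact = make_factorial()

# The function's own closure cell points back at the function itself,
# so the object graph a pickler has to walk contains a cycle.
is_cycle = fact.__closure__[0].cell_contents is fact
print(is_cycle, fact(5))
```

Pickling such a function by value requires the serializer to cope with this self-reference instead of recursing indefinitely.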

v0.3.0(May 30, 2017)
Get it while it's hot with

pip install cloudpickle

Changes

Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)

Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)

Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)

Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)

v0.1.1(Sep 5, 2015)
cloudpickle bug fix release v0.1.1

fixed save_classmethod (#41)

now allows users to import cloudpickle to dump and load pickled data (#37)

no more pickling of closed files, was broken on Python 3 (#32)

more tests!

0.1.0(Apr 16, 2015)
Pickle arbitrary functions, classes

Python 3 support!

Many thanks to the originators of PiCloud and the maintainers of PySpark for their stewardship of cloudpickle over the years!

Note: This was released on an :airplane:, in the literal :cloud:s.


Python library for serializing any arbitrary object graph into JSON. It can take almost any Python object and turn the object into JSON. Additionally, it can reconstitute the object back into Python.

jsonpickle jsonpickle is a library for the two-way conversion of complex Python objects and JSON. jsonpickle builds upon the existing JSON encoders, s

1.1k Jan 2, 2023

MessagePack serializer implementation for Python msgpack.org[Python]

MessagePack for Python What's this MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JS

1.7k Dec 29, 2022

Ultra fast JSON decoder and encoder written in C with Python bindings

UltraJSON UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 3.6+. Install with pip: $ python -m pip insta

3.9k Jan 2, 2023

simplejson is a simple, fast, extensible JSON encoder/decoder for Python

simplejson simplejson is a simple, fast, complete, correct and extensible JSON <http://json.org> encoder and decoder for Python 3.3+ with legacy suppo

1.5k Dec 31, 2022

Generic ASN.1 library for Python

ASN.1 library for Python This is a free and open source implementation of ASN.1 types and codecs as a Python package. It has been first written to sup

223 Dec 11, 2022

serialize all of python

dill serialize all of python About Dill dill extends python's pickle module for serializing and de-serializing python objects to the majority of the b

1.8k Jan 7, 2023

Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

orjson orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard j

4.1k Dec 30, 2022

🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)

srsly: Modern high-performance serialization utilities for Python This package bundles some of the best Python serialization libraries into one standa

329 Dec 28, 2022

Python wrapper around rapidjson

python-rapidjson Python wrapper around RapidJSON Authors: Ken Robbins <[email protected]> Lele Gaifax <[email protected]> License: MIT License Sta

469 Jan 4, 2023

Python bindings for the simdjson project.

pysimdjson Python bindings for the simdjson project, a SIMD-accelerated JSON parser. If SIMD instructions are unavailable a fallback parser is used, m

562 Jan 8, 2023

A Python pickling decompiler and static analyzer

Fickling Fickling is a decompiler, static analyzer, and bytecode rewriter for Python pickle object serializations. Pickled Python objects are in fact

162 Dec 13, 2022

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.

FilterPy - Kalman filters and other optimal and non-optimal estimation filters in Python. NOTE: Imminent drop of support of Python 2.7, 3.4. See secti

2.5k Dec 30, 2022


Buildout is a deployment automation tool written in and extended with Python

Extended refactoring capabilities for Python LSP Server using Rope.

Python AVL Protocols Server for Codec 8 and Codec 8 Extended Protocols

On Generating Extended Summaries of Long Documents

SSRF search vulnerabilities exploitation extended.

Demo for Real-time RGBD-based Extended Body Pose Estimation paper

An extended version of the hotkeys demo code using action classes

Generalized hybrid model for mode-locked laser diodes with an extended passive cavity