Support customizing how built-in types are pickled #563
Open
AdrS wants to merge 2 commits into cloudpipe:master
Cloudpickle's Pickler class inherits from pickle.Pickler. pickle.Pickler is either the C implementation of the CPython pickler or a pure-Python pickler. Only the pure-Python pickler supports customizing how built-in types are pickled. This change introduces a PurePythonPickler class which inherits from pickle._Pickler and supports customizing how built-in types are pickled. The Pickler class continues to inherit from the faster C implementation when it is available. Providing a means of customizing how built-in types are pickled enables users to implement deterministic pickling for set and frozenset. See: cloudpipe#453
Author
@ogrisel, is this a reasonable change?
tvalentyn approved these changes Mar 31, 2025
Contributor
Thanks @AdrS, the changes look reasonable to me and will make it easier to use cloudpickle in Apache Beam. Hi @ogrisel! Would you be able to help us find a reviewer for this change or help take a look at this contribution? Thank you so much! Please let us know if you have any questions or concerns.
Contributor
@ogrisel, just a friendly reminder that we are waiting for your feedback on the course of action here. Thanks!
AdrS added a commit to AdrS/beam that referenced this pull request Apr 21, 2025
This is to enable customizing how sets are serialized to increase the pickling determinism. I'm modifying the vendored cloudpickle as a stop-gap measure until the cloudpickle maintainers review cloudpipe/cloudpickle#563. Issue: apache#34410
Contributor
It looks like @ogrisel might not be available for review right now.
Cloudpickle's Pickler class inherits from pickle.Pickler. pickle.Pickler is either the C implementation of the CPython pickler or a pure-Python pickler. Only the pure-Python pickler supports customizing how built-in types are pickled. This change introduces a PurePythonPickler class which inherits from pickle._Pickler and supports customizing how built-in types are pickled. The Pickler class continues to inherit from the faster C implementation when it is available. The implementation uses multiple inheritance and delegates calls to the proxy object for the second superclass in MRO order. This preserves most of the behavior of the stock pickler while minimizing changes to cloudpickle.
Providing a means of customizing how built-in types are pickled will enable Apache Beam to implement (mostly) deterministic pickling for set and frozenset and increase the cache hit rate for workflows. See: #453
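For illustration, here is a minimal sketch of the kind of customization the pure-Python pickler allows, using only the standard library. It is not the PurePythonPickler API from this change: the `SortedSetPickler` class and the `dumps` helper are hypothetical names, and the sketch assumes set elements are mutually orderable so that `sorted()` can fix their order.

```python
import io
import pickle


class SortedSetPickler(pickle._Pickler):
    """Hypothetical example: pickle sets and frozensets in sorted order."""

    # Copy the class-level dispatch table so the stock pure-Python
    # pickler is left untouched.
    dispatch = pickle._Pickler.dispatch.copy()

    def save_set(self, obj):
        # Assumption: elements are mutually orderable.
        self.save_reduce(set, (sorted(obj),), obj=obj)

    def save_frozenset(self, obj):
        self.save_reduce(frozenset, (sorted(obj),), obj=obj)

    dispatch[set] = save_set
    dispatch[frozenset] = save_frozenset


def dumps(obj, protocol=pickle.DEFAULT_PROTOCOL):
    # Hypothetical helper mirroring pickle.dumps for the custom pickler.
    buf = io.BytesIO()
    SortedSetPickler(buf, protocol).dump(obj)
    return buf.getvalue()


# Two equal sets built in different insertion orders.
a = set([0, 8])
b = set([8, 0])
same_bytes = dumps(a) == dumps(b)
round_trip = pickle.loads(dumps(a))
```

The same override has no effect on the C pickler, which serializes exact built-in types such as set on a fast path before consulting any user-supplied hook.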