Rehash: Resumable Hashlib
Rehash is a resumable interface to the OpenSSL-based hashers (message digest objects) in the
hashlib standard library. Rehash provides hashers that
can be pickled, persisted and reconstituted from their repr(),
and otherwise serialized. The rest of the Rehash API is identical to hashlib.
Rehash hashers can be used to checkpoint and restore progress when hashing large byte streams:
    import pickle, rehash

    hasher = rehash.sha256(b"foo")
    state = pickle.dumps(hasher)
    hasher2 = pickle.loads(state)
    hasher2.update(b"bar")
    assert hasher2.hexdigest() == rehash.sha256(b"foobar").hexdigest()
To install Rehash, run:

    pip install rehash
Rehash is useful whenever your VM is short-lived or preemptible and the object you’re hashing is huge. For example, Rehash can be used to hand off the hashing state of large objects between AWS Lambda functions or Google Cloud Functions, which have runtime limits of 15 and 9 minutes, respectively.
The blake2 hash algorithms in Python 3.6 are not OpenSSL-based and are not supported by rehash.
PyPy uses its own hasher implementations, which are not serializable using rehash.
By default, rehash objects present themselves with a repr() that exposes their internal state. This allows one to
resume the hashing from the point where it stopped. If that repr() is exposed through an untrusted channel, then under specific conditions
this could allow an attacker to mount a length-extension attack. If you are unsure about the implications of this,
set rehash.opaque_repr = True after importing rehash.