Chest
A dictionary that spills to disk.
Chest acts like a dictionary, but it can write its contents to disk. This is useful in two situations:
- Chest can hold datasets that are larger than memory
- Chest persists and so can be saved and loaded for later use
Related Projects
The standard library shelve is an alternative out-of-core dictionary. Chest offers the following benefits over shelve:
- Chest supports any hashable key (not just strings)
- Chest supports pluggable serialization and file saving schemes
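The first point can be checked against shelve directly. The following is a minimal sketch using only the Python 3 standard library (it does not touch Chest itself); the file name demo and the variable names are made up for illustration.

```python
import os
import shelve
import tempfile

# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    with shelve.open(os.path.join(tmpdir, "demo")) as db:
        # String keys are accepted.
        db["x"] = [1, 2, 3]
        string_key_ok = db["x"] == [1, 2, 3]

        # A tuple is hashable, but shelve tries to encode the key as a
        # string and raises an exception.
        try:
            db[(1, 2)] = "value"
            tuple_key_ok = True
        except (AttributeError, TypeError):
            tuple_key_ok = False

print(string_key_ok, tuple_key_ok)
```

With Chest, the same tuple would be usable as a key, per the list above.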
Alternatively one might consider a traditional key-value store database like Redis.
Shove is another excellent alternative with support for a variety of stores.
How it works
Chest stores data in two locations:
- An in-memory dictionary
- On the filesystem in a directory owned by the chest
As a user adds contents to the chest, the in-memory dictionary fills up. When a chest stores more data in memory than desired (see the available_memory= keyword argument), it writes the larger contents of the chest to disk as pickle files (the choice of pickle is configurable). When a user asks for a value, chest checks the in-memory store, then checks on disk and loads the value into memory if necessary, pushing other values to disk.
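The spill-and-reload behaviour described above can be sketched with a toy class. This is not Chest's actual implementation: SpillDict, its attributes, the size estimate (length of the pickled bytes), and the file naming are all assumptions made for illustration, and the sketch assumes Python 3 with filesystem-safe string keys.

```python
import os
import pickle
import tempfile


class SpillDict:
    """Toy mapping that moves its largest values to disk when memory is tight."""

    def __init__(self, path, available_memory):
        self.path = path
        self.available_memory = available_memory  # budget in bytes
        self.inmem = {}       # key -> value currently held in memory
        self.on_disk = set()  # keys whose value currently lives in a file
        os.makedirs(path, exist_ok=True)

    def _filename(self, key):
        # Hypothetical naming scheme: one file per key, named after the key.
        return os.path.join(self.path, str(key))

    def _size(self, value):
        # Crude size estimate: the length of the pickled value.
        return len(pickle.dumps(value))

    def _shrink(self, keep=None):
        # Spill the largest values first until the budget is met,
        # never spilling the key the caller just asked for.
        sizes = {k: self._size(v) for k, v in self.inmem.items()}
        total = sum(sizes.values())
        for key in sorted(sizes, key=sizes.get, reverse=True):
            if total <= self.available_memory:
                break
            if key == keep:
                continue
            with open(self._filename(key), "wb") as f:
                pickle.dump(self.inmem.pop(key), f)
            self.on_disk.add(key)
            total -= sizes[key]

    def __setitem__(self, key, value):
        self.on_disk.discard(key)  # any file for this key is now stale
        self.inmem[key] = value
        self._shrink()

    def __getitem__(self, key):
        if key not in self.inmem:
            if key not in self.on_disk:
                raise KeyError(key)
            # Load from disk, then push other values out if needed.
            with open(self._filename(key), "rb") as f:
                self.inmem[key] = pickle.load(f)
            self.on_disk.discard(key)
            self._shrink(keep=key)
        return self.inmem[key]


with tempfile.TemporaryDirectory() as tmpdir:
    d = SpillDict(tmpdir, available_memory=2000)
    d["big"] = list(range(1000))  # pickles to more than 2000 bytes
    d["small"] = [1, 2, 3]
    spilled = set(d.on_disk)      # "big" was written to disk
    in_memory = set(d.inmem)      # "small" stayed in memory
    big_roundtrip = d["big"] == list(range(1000))  # reloads "big", spills "small"
```

Reading d["big"] back brings it into memory and, because the budget is still exceeded, pushes "small" to disk instead.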
Chest is a simple project. It was intended to provide a simple interface to assist in the storage and retrieval of numpy arrays. However, its design and implementation are agnostic to this case, so it could be used in a variety of other situations.
With minimal work chest could be extended to serve as a communication point between multiple processes.
Known Failings
Chest was designed to hold a moderate number of largish numpy arrays. It doesn't handle the use case of very many small key-value pairs (though it could with small effort). In particular, chest has the following deficiencies:
- Chest is not multi-process safe. We should institute a file lock at least around the .keys file.
- Chest does not support mutation of variables on disk.
LICENSE
New BSD. See License
Install
chest is available through conda:
conda install chest
chest is on the Python Package Index (PyPI):
pip install chest
Example
>>> from chest import Chest
>>> c = Chest()
>>> # Acts like a normal dictionary
>>> c['x'] = [1, 2, 3]
>>> c['x']
[1, 2, 3]
>>> # Data persists to local files
>>> c.flush()
>>> import os
>>> os.listdir(c.path)
['.keys', 'x']
>>> # These files hold pickled results
>>> import pickle
>>> pickle.load(open(c.key_to_filename('x'), 'rb'))
[1, 2, 3]
>>> # Though one normally accesses these files with chest itself
>>> c2 = Chest(path=c.path)
>>> list(c2.keys())
['x']
>>> c2['x']
[1, 2, 3]
>>> # Chest is configurable, so one can use json, etc. instead of pickle
>>> import json
>>> c = Chest(path='my-chest', dump=json.dump, load=json.load)
>>> c['x'] = [1, 2, 3]
>>> c.flush()
>>> json.load(open(c.key_to_filename('x')))
[1, 2, 3]
Dependencies
Chest supports Python 2.6+ and Python 3.2+ with a common codebase.
It currently depends on the heapdict library, which is a lightweight dependency.