Saving a pd.Series, pd.DataFrame, or pd.Panel to HDF5 with deepdish raises an AttributeError, and the file cannot be saved. I've tracked the issue down to a change in Pandas 0.24.0.
Here is how I reproduce the error, with Pandas 0.24.2, NumPy 1.15.4, deepdish 0.3.6, and PyTables 3.5.1 installed:
import pandas as pd
import numpy as np
import deepdish as dd
# Saving any pandas object nested in a dict triggers the error
dd.io.save("test.h5", {"test": pd.Series(data=np.random.rand(1))})
The error returned is:
---------------------------------------------------------------------------
NoSuchNodeError Traceback (most recent call last)
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
1159 key = '/' + key
-> 1160 return self._handle.get_node(self.root, key)
1161 except _table_mod.exceptions.NoSuchNodeError:
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, where, name, classname)
1643 nodepath = join_path(basepath, name or '') or '/'
-> 1644 node = where._v_file._get_node(nodepath)
1645 elif isinstance(where, (six.string_types, numpy.str_)):
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in _get_node(self, nodepath)
1598
-> 1599 node = self._node_manager.get_node(nodepath)
1600 assert node is not None, "unable to instantiate node ``%s``" % nodepath
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, key)
436 if self.node_factory:
--> 437 node = self.node_factory(key)
438 self.cache_node(node, key)
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_load_child(self, childname)
1180 # Is the node a group or a leaf?
-> 1181 node_type = self._g_check_has_child(childname)
1182
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_check_has_child(self, name)
397 "group ``%s`` does not have a child named ``%s``"
--> 398 % (self._v_pathname, name))
399 return node_type
NoSuchNodeError: group ``/`` does not have a child named ``//test``
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-2-60c97adec230> in <module>
----> 1 dd.io.save("test4.h5", {"test" : pd.Series(data=np.random.rand(1))}, )
~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in save(path, data, compression)
587 for key, value in data.items():
588 _save_level(h5file, group, value, name=key,
--> 589 filters=filters, idtable=idtable)
590
591 elif (_sns and isinstance(data, SimpleNamespace) and
~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
256 store = _HDFStoreWithHandle(handler)
257 # print(store.get_node(group._v_pathname))
--> 258 store.append(group._v_pathname + '/' + name, level)
259
260 elif isinstance(level, (sparse.dok_matrix,
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
984 kwargs = self._validate_format(format, kwargs)
985 self._write_to_group(key, value, append=append, dropna=dropna,
--> 986 **kwargs)
987
988 def append_to_multiple(self, d, value, selector, data_columns=None,
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1365 def _write_to_group(self, key, value, format, index=True, append=False,
1366 complib=None, encoding=None, **kwargs):
-> 1367 group = self.get_node(key)
1368
1369 # remove the node if we are not appending
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
1159 key = '/' + key
1160 return self._handle.get_node(self.root, key)
-> 1161 except _table_mod.exceptions.NoSuchNodeError:
1162 return None
1163
AttributeError: 'NoneType' object has no attribute 'exceptions'
From the above, we see that the _table_mod variable is None, which is what raises the error. This became an error because of https://github.com/pandas-dev/pandas/pull/22919, where the handler in HDFStore.get_node was changed from a bare except to one that catches a specific exception.
Before: https://github.com/pandas-dev/pandas/blob/2d0c96119391c85bd4f7ffbb847759ee3777162a/pandas/io/pytables.py#L1157-L1165
After: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1141-L1149
So HDFStore.get_node now uses _table_mod to return None only when the exception is a NoSuchNodeError, rather than on any error. However, _table_mod is only set by running the function pandas.io.pytables._tables, which imports PyTables into the namespace as _table_mod. If this function is never run, _table_mod is left as None, and the AttributeError above occurs.
The problem is that deepdish uses pandas.io.pytables.HDFStore through a wrapper class called _HDFStoreWithHandle, and in that usage none of the methods that call the _tables function are ever called. As a result, _table_mod is left as None, which gives us the AttributeError.
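The sketch below shows one way a wrapper can end up skipping a lazy import. It assumes the wrapper reuses an already-open handle and therefore never runs the parent's constructor or open method; the Store and StoreWithHandle classes and the _load_backend helper are hypothetical stand-ins, not the deepdish or pandas code.

```python
_backend = None  # toy stand-in for pandas' module-level `_table_mod`


def _load_backend():
    """Toy lazy import: set the module-level backend on first call."""
    global _backend
    if _backend is None:
        _backend = "loaded"  # pandas would import PyTables here
    return _backend


class Store:
    """Toy stand-in for HDFStore: opening a file triggers the lazy import."""

    def __init__(self, path):
        self._path = path
        self.open()

    def open(self):
        _load_backend()
        self._handle = f"handle for {self._path}"

    def backend_ready(self):
        return _backend is not None


class StoreWithHandle(Store):
    """Toy stand-in for a wrapper that reuses an existing file handle."""

    def __init__(self, handle):
        # Deliberately skips Store.__init__ and open(), so the lazy
        # import is never triggered through this code path.
        self._path = None
        self._handle = handle
```

Constructing StoreWithHandle alone leaves the backend unset, so any method that later assumes it is loaded will fail. Constructing a plain Store first, or calling the loader explicitly at import time, avoids that.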
My proposed solution is to add one line near the top of the hdf5io.py file in deepdish that calls pandas.io.pytables._tables().
Before:
https://github.com/uchicago-cs/deepdish/blob/01af93621fe082a3972fe53ba7375388c02b0085/deepdish/io/hdf5io.py#L1-L12
After:
from __future__ import division, print_function, absolute_import
import numpy as np
import tables
import warnings
from scipy import sparse
from deepdish import conf
try:
    import pandas as pd
    # Run the lazy PyTables import so that _table_mod is not left as None
    pd.io.pytables._tables()
    _pandas = True
except ImportError:
    _pandas = False
After making this change, I no longer get the AttributeError, and saving Pandas data types works seamlessly.