Flexible HDF5 saving/loading and other data science tools from the University of Chicago

Overview

deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also hosts a Deep Learning blog: http://deepdish.io/

Installation

pip install deepdish

Alternatively (if you have conda with the conda-forge channel):

conda install -c conda-forge deepdish

Main feature

The primary feature of deepdish is its ability to save and load all kinds of data as HDF5. It can save any Python data structure, offering the same ease of use as pickling or numpy.save, while improving on them by also offering:

  • Interoperability between languages (HDF5 is a popular standard)
  • Easy inspection of file contents from the command line (using h5ls or our specialized tool ddls)
  • Highly compressed storage (thanks to a PyTables backend)
  • Native support for scipy sparse matrices and pandas DataFrame, Series and Panel
  • Ability to partially read files, even slices of arrays (see the sketch below the example)

An example:

import numpy as np
import deepdish as dd

d = {
    'foo': np.ones((10, 20)),
    'sub': {
        'bar': 'a string',
        'baz': 1.23,
    },
}
dd.io.save('test.h5', d)

This can be reconstructed using dd.io.load('test.h5'), or inspected from the command line using either a standard tool:

$ h5ls test.h5
foo                      Dataset {10, 20}
sub                      Group

Or, better yet, our custom tool ddls (or python -m deepdish.io.ls):

$ ddls test.h5
/foo                       array (10, 20) [float64]
/sub                       dict
/sub/bar                   'a string' (8) [unicode]
/sub/baz                   1.23 [float64]
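
The last bullet above, partial reading, works by pointing load at a group or dataset path inside the file. A minimal sketch, assuming the test.h5 file from the example and the group/sel arguments described in the deepdish documentation:

import deepdish as dd

# Load only one branch of the file instead of the whole dictionary.
sub = dd.io.load('test.h5', '/sub')      # {'bar': 'a string', 'baz': 1.23}

# Load just a slice of an array; dd.aslice builds the slice passed as sel,
# so only those rows are read from disk.
rows = dd.io.load('test.h5', '/foo', sel=dd.aslice[:2])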

Read more at Saving and loading data.

Documentation

Comments
  • Add simplenamespace io

    This PR adds native save/load of Python SimpleNamespace objects to deepdish. The implementation is such that older versions of Python (which do not have the SimpleNamespace type) will seamlessly load them as dictionaries. The current version of deepdish supports SimpleNamespaces only via pickling, which has obvious downsides.

    Our group uses dictionaries and SimpleNamespaces very often in our work: dictionaries when the keys are variables, SimpleNamespaces when the keys are fixed.
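
    A minimal sketch of the round trip this PR enables (the file name here is illustrative):

    from types import SimpleNamespace
    import deepdish as dd

    ns = SimpleNamespace(foo=1, bar='two')
    dd.io.save('ns.h5', ns)
    # With this PR, the object comes back as a SimpleNamespace; on older
    # Pythons without SimpleNamespace it loads as a plain dict instead.
    ns2 = dd.io.load('ns.h5')
    assert ns2.foo == 1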

    From "The Zen of Python": Namespaces are one honking great idea -- let's do more of those!

    opened by twmacro 15
  • Remove file handle support and type check in save and load

    Currently, if a unicode string is passed to deepdish.io.load or save, it fails because isinstance(u'file.h5', str) is False on Python 2. This PR allows all six.string_types to be passed.

    More generally, I'd suggest something like this, assuming it covers your intended use-cases:

    if isinstance(path, file):
        path = path.name
    elif not isinstance(path, six.string_types):
        raise ValueError('path type {} not supported.'.format(type(path)))
    
    opened by craffel 11
  • A Mistake?

    http://deepdish.io/2014/10/28/hintons-dark-knowledge/

    I was wondering if this was a mistake. Hinton, in his lecture slides, says that raising the temperature gives

    y_k / T

    whereas you have it as

    y_k^(1/T)

    which is a square root if I use T = 2?

    Can you explain? This occurs for the denominator as well.
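
    For what it's worth, the two forms agree after renormalization, assuming y_k denotes the softmax of logits z_k, so the blog's y_k^(1/T), once normalized, equals the exp(z_k/T) form:

    p_k(T) = \frac{e^{z_k/T}}{\sum_j e^{z_j/T}}
           = \frac{y_k^{1/T}}{\sum_j y_j^{1/T}},
    \qquad\text{where } y_k = \frac{e^{z_k}}{\sum_j e^{z_j}}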

    blog 
    opened by ArEnSc 7
  • add soft links to save shared objects just once and support recursion

    This PR adds soft links to deepdish. Primarily, this means that a shared object is written to disk just once, no matter how many names or other objects refer to it. These relationships are maintained upon load. It also allows for recursive data structures.

    A common use case for us is saving time vectors for accelerometer data or analysis data. Often the time vectors are all the same, but sometimes they are all different; similarly for frequency vectors in frequency response analysis. Also, sometimes it's handy to have a shortcut link (not unlike a file-system link) from within one dictionary to another "common data" dictionary in a large data structure. The SoftLink capability in HDF5 makes adding this feature to deepdish almost trivial. :smile:

    For example:

    import numpy as np
    import deepdish as dd
    A = np.random.randn(3, 3)
    d = dict(A=A, B=A)   # two objects point to same matrix
    d['C'] = d           # add a recursive member
    dd.io.save('test.h5', d)
    d2 = dd.io.load('test.h5')
    

    From within ipython:

    In [2]: d2['B'] is d2['A']
    Out[2]: True
    
    In [3]: d2['C'] is d2
    Out[3]: True
    

    Here is a ddls view of the file:

    $ ddls test.h5
    /A                         array (3, 3) [float64]
    /B                         link -> /A [SoftLink]
    /C                         link -> / [SoftLink]
    
    opened by twmacro 6
  • Incompatibility with pandas v1.2.0

    Hi guys,

    Thanks for the incredibly useful piece of software!

    The recent release of v1.2.0 of pandas seems to have broken compatibility with deepdish. Things work perfectly with pandas v1.1.5 and deepdish v0.3.6, however upgrading to pandas v1.2.0 produces the following error:

    File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 583, in save _save_level(h5file, group, value, name=key, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 211, in _save_level _save_level(handler, new_group, v, name=k, filters=filters, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 211, in _save_level _save_level(handler, new_group, v, name=k, filters=filters, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 251, in _save_level elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)): File "/Users/adam/anaconda3/lib/python3.8/site-packages/pandas/init.py", line 244, in getattr raise AttributeError(f"module 'pandas' has no attribute '{name}'") AttributeError: module 'pandas' has no attribute 'Panel'

    It looks like pandas.Panel has been removed from the latest release. Presumably all that needs to be done is removing references to pd.Panel from hdf5io.py? I'd be happy to submit this as a pull request if you agree this is the right course of action.
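
    A minimal sketch of that change, with the type tuple built once at import time (the name _pandas_types is invented here for illustration):

    import pandas as pd

    # Probe for pd.Panel instead of referencing it unconditionally;
    # it only exists on older pandas versions.
    _pandas_types = (pd.DataFrame, pd.Series)
    if hasattr(pd, 'Panel'):
        _pandas_types += (pd.Panel,)

    # ...and later, in _save_level:
    # elif _pandas and isinstance(level, _pandas_types):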

    Cheers, Adam

    opened by ACCarnall 5
  • AttributeError when saving a Pandas object with Pandas 0.24.0 or greater

    When saving a pd.Series, pd.DataFrame, or pd.Panel to HDF5 using deepdish, an AttributeError is raised, and I cannot save the file. I've tracked down the issue, and it's due to a change in Pandas version 0.24.0.

    Here is how I've been able to reproduce the error, where I have installed Pandas 0.24.2, Numpy 1.15.4, deepdish 0.3.6, and PyTables 3.5.1.

    import pandas as pd
    import numpy as np
    import deepdish as dd
    
    dd.io.save("test.h5", {"test" : pd.Series(data=np.random.rand(1))}, )
    

    The error returned is:

    ---------------------------------------------------------------------------
    NoSuchNodeError                           Traceback (most recent call last)
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
       1159                 key = '/' + key
    -> 1160             return self._handle.get_node(self.root, key)
       1161         except _table_mod.exceptions.NoSuchNodeError:
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, where, name, classname)
       1643             nodepath = join_path(basepath, name or '') or '/'
    -> 1644             node = where._v_file._get_node(nodepath)
       1645         elif isinstance(where, (six.string_types, numpy.str_)):
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in _get_node(self, nodepath)
       1598 
    -> 1599         node = self._node_manager.get_node(nodepath)
       1600         assert node is not None, "unable to instantiate node ``%s``" % nodepath
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, key)
        436         if self.node_factory:
    --> 437             node = self.node_factory(key)
        438             self.cache_node(node, key)
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_load_child(self, childname)
       1180         # Is the node a group or a leaf?
    -> 1181         node_type = self._g_check_has_child(childname)
       1182 
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_check_has_child(self, name)
        397                 "group ``%s`` does not have a child named ``%s``"
    --> 398                 % (self._v_pathname, name))
        399         return node_type
    
    NoSuchNodeError: group ``/`` does not have a child named ``//test``
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-60c97adec230> in <module>
    ----> 1 dd.io.save("test4.h5", {"test" : pd.Series(data=np.random.rand(1))}, )
    
    ~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in save(path, data, compression)
        587             for key, value in data.items():
        588                 _save_level(h5file, group, value, name=key,
    --> 589                             filters=filters, idtable=idtable)
        590 
        591         elif (_sns and isinstance(data, SimpleNamespace) and
    
    ~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
        256         store = _HDFStoreWithHandle(handler)
        257 #         print(store.get_node(group._v_pathname))
    --> 258         store.append(group._v_pathname + '/' + name, level)
        259 
        260     elif isinstance(level, (sparse.dok_matrix,
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
        984         kwargs = self._validate_format(format, kwargs)
        985         self._write_to_group(key, value, append=append, dropna=dropna,
    --> 986                              **kwargs)
        987 
        988     def append_to_multiple(self, d, value, selector, data_columns=None,
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
       1365     def _write_to_group(self, key, value, format, index=True, append=False,
       1366                         complib=None, encoding=None, **kwargs):
    -> 1367         group = self.get_node(key)
       1368 
       1369         # remove the node if we are not appending
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
       1159                 key = '/' + key
       1160             return self._handle.get_node(self.root, key)
    -> 1161         except _table_mod.exceptions.NoSuchNodeError:
       1162             return None
       1163 
    
    AttributeError: 'NoneType' object has no attribute 'exceptions'
    

    From the above, we see that the _table_mod variable is None, which is throwing the error. The reason that this is now an error is related to https://github.com/pandas-dev/pandas/pull/22919, where the exception in HDFStore.get_node was changed from a bare exception to a specific exception.

    Before: https://github.com/pandas-dev/pandas/blob/2d0c96119391c85bd4f7ffbb847759ee3777162a/pandas/io/pytables.py#L1157-L1165

    After: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1141-L1149

    So, now the _table_mod variable is used to only return None in the case that the exception is a NoSuchNodeError, rather than any error. However, _table_mod is only set by running the function pandas.io.pytables._tables, which imports PyTables into the namespace as _table_mod. If this function is never run, _table_mod is left as None, and the above AttributeError occurs.

    The problem is that in deepdish's use of pandas.io.pytables.HDFStore, where there's a wrapper of the function called _HDFStoreWithHandle, none of the methods that call the _tables function are called, and _table_mod is left as None, which gives us the AttributeError.

    My proposed solution is to add one line to the beginning of the hdf5io.py file in deepdish, where we call pandas.io.pytables._tables().

    Before:

    https://github.com/uchicago-cs/deepdish/blob/01af93621fe082a3972fe53ba7375388c02b0085/deepdish/io/hdf5io.py#L1-L12

    After:

    from __future__ import division, print_function, absolute_import
    
    import numpy as np
    import tables
    import warnings
    from scipy import sparse
    from deepdish import conf
    try:
        import pandas as pd
        pd.io.pytables._tables()
        _pandas = True
    except ImportError:
        _pandas = False
    

    After making this change, I no longer get the AttributeError and the saving of Pandas data types works seamlessly.

    opened by slwatkins 4
  • ValueError when trying to save numpy scalar arrays (ndim = 0)

    import numpy as np
    import deepdish
    deepdish.io.save('test.h5', np.array(0.))
    

    results in

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    
    deepdish/io/hdf5io.pyc in save(path, data, compression)
        458
        459         else:
    --> 460             _save_level(h5file, group, data, name='data', filters=filters)
        461             # Mark this to automatically unpack when loaded
        462             group._v_attrs[DEEPDISH_IO_UNPACK] = True
    
    deepdish/io/hdf5io.pyc in _save_level(handler, group, level, name, filters)
        184
        185     elif isinstance(level, np.ndarray):
    --> 186         _save_ndarray(handler, group, name, level, filters=filters)
        187
        188     elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    
    deepdish/io/hdf5io.pyc in _save_ndarray(handler, group, name, x, filters)
        112         strtype = None
        113         itemsize = None
    --> 114     assert np.min(x.shape) > 0, ("deepdish.io.save does not support saving "
        115                                  "numpy arrays with a zero-length axis")
        116     # For small arrays, compression actually leads to larger files, so we are
    
    numpy/core/fromnumeric.pyc in amin(a, axis, out, keepdims)
       2217         except AttributeError:
       2218             return _methods._amin(a, axis=axis,
    -> 2219                                 out=out, keepdims=keepdims)
       2220         # NOTE: Dropping the keepdims parameter
       2221         return amin(axis=axis, out=out)
    
    numpy/core/_methods.pyc in _amin(a, axis, out, keepdims)
         27
         28 def _amin(a, axis=None, out=None, keepdims=False):
    ---> 29     return umr_minimum(a, axis, None, out, keepdims)
         30
         31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
    
    ValueError: zero-size array to reduction operation minimum which has no identity
    

    Is saving numpy scalar arrays (ndim = 0) supported? If not, I think you could change the test to assert x.ndim > 0 and np.min(x.shape) > 0; the first condition would fail for numpy scalar arrays, so the second wouldn't be evaluated. If numpy scalar arrays aren't supported, I'm curious why, and whether that's functionality that could be added. Finally, and separately, you should arguably be doing

    if np.min(x.shape) == 0:
        raise ValueError(...)
    

    instead of an assert, see e.g. here, but that's a separate discussion! Thank you again for the excellent library!
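
    A sketch of that pre-check, with the 0-d case handled by unwrapping to a Python scalar (the unwrapping is a suggestion, not current deepdish behaviour):

    import numpy as np

    def check_array(x):
        if x.ndim == 0:
            # np.min(x.shape) has nothing to reduce over for 0-d arrays;
            # one option is to unwrap them and save the plain scalar.
            return x[()]
        if np.min(x.shape) == 0:
            raise ValueError('deepdish.io.save does not support saving '
                             'numpy arrays with a zero-length axis')
        return x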

    opened by craffel 4
  • dd.io.save crashes while saving np.array of objects

    dd.io.save crashes when you try to save np.array with dtype=object

    Ubuntu 14.04 x64 Python 2.7.11 (default, Dec 15 2015, 16:46:19) [GCC 4.8.4]

    In [1]: np.__version__
    Out[2]: '1.11.2'
    In [2]: dd.__version__
    Out[2]: '0.3.4'
    In [3]: tables.__version__
    Out[3]: '3.2.2'
    
    
    In [10]: x = np.array(['123', 567, 'hjjjk'], dtype='O')
    
    In [11]: x
    Out[11]: array(['123', 567, 'hjjjk'], dtype=object)
    
    In [12]: dd.io.save('t.h5', x)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-12-dc094ba588e4> in <module>()
    ----> 1 dd.io.save('t.h5', x)
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in save(path, data, compression)
        579         else:
        580             _save_level(h5file, group, data, name='data',
    --> 581                         filters=filters, idtable=idtable)
        582             # Mark this to automatically unpack when loaded
        583             group._v_attrs[DEEPDISH_IO_UNPACK] = True
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in _save_level(handler, group, level, name, filters, idtable)
        242 
        243     elif isinstance(level, np.ndarray):
    --> 244         _save_ndarray(handler, group, name, level, filters=filters)
        245 
        246     elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in _save_ndarray(handler, group, name, x, filters)
        123         atom = tables.StringAtom(itemsize)
        124     else:
    --> 125         atom = tables.Atom.from_dtype(x.dtype)
        126         strtype = None
        127         itemsize = None
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/tables/atom.pyc in from_dtype(class_, dtype, dflt)
        377             return class_.from_kind('string', itemsize, dtype.shape, dflt)
        378         # Most NumPy types have direct correspondence with PyTables types.
    --> 379         return class_.from_type(basedtype.name, dtype.shape, dflt)
        380 
        381     @classmethod
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/tables/atom.pyc in from_type(class_, type, shape, dflt)
        402 
        403         if type not in all_types:
    --> 404             raise ValueError("unknown type: %r" % (type,))
        405         kind, itemsize = split_type(type)
        406         return class_.from_kind(kind, itemsize, shape, dflt)
    
    ValueError: unknown type: 'object'
    
    bug 
    opened by asanakoy 3
  • `save` function with mode 'a'

    Currently dd.io.save overwrites the target file if it exists. Would it be a good idea to add a mode argument so that the target file can be incrementally updated as new data arrives?
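
    Until such a mode exists, a read-modify-write workaround along these lines is possible (a sketch; log.h5 and new_data are placeholders):

    import os
    import deepdish as dd

    # Emulate append mode by rewriting the whole file with the merged dict.
    d = dd.io.load('log.h5') if os.path.exists('log.h5') else {}
    d['latest'] = new_data
    dd.io.save('log.h5', d)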

    opened by gaow 3
  • I followed example in http://deepdish.io/ but 'numpy.ndarray' object has no attribute 'tobytes'

    I ran the example code from http://deepdish.io/. However, at datum.data = X[i].tobytes() an error occurred. I know the error comes from deepdish, but I don't know the reason, and I want to run the example code. Do you know what the problem is?

    My numpy version is 1.8.2

    blog 
    opened by stray-leone 3
  • Error while installing deepdish

    I'm trying to create an LMDB database file to be used with Caffe according to this tutorial on an Ubuntu 14.04 machine using Anaconda Python 2.7.9. However, when I do pip install deepdish, I'm getting the following error:

    Collecting deepdish
      Using cached deepdish-0.1.4.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 20, in <module>
          File "/tmp/pip-build-qKwOBx/deepdish/setup.py", line 12, in <module>
            with open('requirements.txt') as f:
        IOError: [Errno 2] No such file or directory: 'requirements.txt'
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-qKwOBx/deepdish
    

    Any ideas why this error might be occurring and how to go about correcting it? Any help is much appreciated. Thank you.

    bug 
    opened by prasannadate 3
  • numpy >= 1.24 enforces deprecation of `np.object`

    np.object is used here: https://github.com/uchicago-cs/deepdish/blob/master/deepdish/io/hdf5io.py#L125 The code therefore crashes when it reaches this line on numpy >= 1.24.
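
    Since np.object was just an alias for the builtin, the minimal fix is to drop the np. prefix — a sketch, assuming the line compares an array dtype against the alias:

    import numpy as np

    x = np.array(['123', 567], dtype=object)
    # np.object was a deprecated alias for the builtin `object`; on
    # numpy >= 1.24 the comparison must use the builtin directly:
    if x.dtype == object:
        print('object-dtype array')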

    opened by fgoudreault 0
  • Fixing deprecated np.object for numpy >= 1.24

    np.object has been deprecated since numpy 1.20, and since 1.24 the deprecation is enforced (see the numpy release notes). More details: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations Fixes #50

    opened by fgoudreault 0
  • Saving Class instances not working when registry is not called

    Hi, thanks for this cool library. I took the following example from your readthedocs (https://deepdish.readthedocs.io/en/latest/io.html#class-instances):

    import deepdish as dd
    
    class Foo(dd.util.SaveableRegistry):
        def __init__(self, x):
            self.x = x
    
        @classmethod
        def load_from_dict(self, d):
            obj = Foo(d['x'])
            return obj
    
        def save_to_dict(self):
            return {'x': self.x}
    
    
    if __name__ == '__main__':
        f = Foo(10)
        f.save('foo.h5')
        f = Foo.load('foo.h5')
    

    This is a more minimal example because there is no class 'Bar' that inherits from Foo, and therefore the @Foo.register('bar') decorator is never called. This leads to the following traceback:

    /Users/jorenretel/bin/miniconda3/envs/abstract_classifier/bin/python /Users/jorenretel/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py
    /Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/io/hdf5io.py:246: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
      elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    Traceback (most recent call last):
      File "/Users/jorenretel/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 20, in <module>
        f = Foo.load('foo.h5')
      File "/Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/util/saveable.py", line 162, in load
        return cls.getclass(class_name).load_from_dict(d)
      File "/Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/util/saveable.py", line 121, in getclass
        return cls.REGISTRY[name]
    KeyError: 'noname'
    
    Process finished with exit code 1
    

    The problem is that this property in deepdish/util/saveable.py never gets overloaded when register is not called:

        @property
        def name(self):
            """Returns the name of the registry entry."""
            # Automatically overloaded by 'register'
            return "noname"
    

    Possible solution: return None instead of 'noname'? I am not sure whether this has some side effect that I am not aware of.
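
    One hypothetical guard (invented here, not a tested patch) would be to treat the default name as "no registry entry" in getclass and fall back to the class itself, mirroring the lookup shown in the traceback:

    class SaveableRegistry(object):
        REGISTRY = {}

        @classmethod
        def getclass(cls, name):
            # Fall back to the calling class when nothing was registered,
            # instead of raising KeyError('noname') from the REGISTRY lookup.
            if name in (None, 'noname'):
                return cls
            return cls.REGISTRY[name]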

    opened by jorenretel 0
  • Update conda-forge version to match pypi

    Thanks for the great package! Not sure if this is the best place to make this request, but is there any chance you can update the version of deepdish on conda-forge to match the latest version on pypi? It looks like the latest version there is 0.3.4, which sometimes breaks when saving objects; this is resolved in 0.3.6.

    There's a user submitted one on Anaconda cloud in case that's helpful for the update: https://anaconda.org/turbach/deepdish, but it would be nice for it to be updated on conda-forge for ease of installation of packages that depend on deepdish

    opened by ejolly 0
  • Overflow Error When Attempting to Save Large Amounts of Data

    I have been using deepdish to save dictionaries with large amounts of data. I ran into the following issue when attempting to save a particularly large file. I have tried saving the data with and without compression, if that helps. Can you help me out with it please?

    File "C:/Users/xxxxxxxx/Documents/Python_Scripts/Data_Scripts/Finalized_Data_Review_Presentations/data_save_cc_test.py", line 513, in dd.io.save('%s/Data/%s_%s_cc_data.h5'%(directory,m_list[m],list_type),cc_data,('blosc', 9))

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 596, in save filters=filters, idtable=idtable)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 304, in _save_level _save_pickled(handler, group, level, name=name)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 172, in _save_pickled node.append(level)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\tables\vlarray.py", line 547, in append self._append(nparr, nobjects)

    File "tables/hdf5extension.pyx", line 2032, in tables.hdf5extension.VLArray._append

    OverflowError: Python int too large to convert to C long
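
    The limit being hit is likely the C long that PyTables' VLArray uses when appending a single pickled blob; C long is 32 bits on Windows builds, so a node over 2 GB overflows it. Until that is addressed, one workaround is to split the dictionary so no single node is that large — a sketch, with cc_data standing in for the large dictionary from the traceback:

    import deepdish as dd

    # Save each top-level key to its own file so that no single
    # pickled node exceeds the 2 GB limit.
    for key, value in cc_data.items():
        dd.io.save('cc_data_{}.h5'.format(key), value)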

    opened by 0Maximus0 0
Owner
UChicago - Department of Computer Science