Flexible HDF5 saving/loading and other data science tools from the University of Chicago

Overview

deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also hosts a Deep Learning blog: http://deepdish.io/

Installation

pip install deepdish

Alternatively (if you have conda with the conda-forge channel):

conda install -c conda-forge deepdish

Main feature

The primary feature of deepdish is its ability to save and load all kinds of data as HDF5. It can save any Python data structure, offering the same ease of use as pickling or numpy.save, while improving on them by also offering:

  • Interoperability between languages (HDF5 is a popular standard)
  • Easy inspection of file contents from the command line (using h5ls or our specialized tool ddls)
  • Highly compressed storage (thanks to a PyTables backend)
  • Native support for scipy sparse matrices and pandas DataFrame, Series and Panel
  • Ability to partially read files, even slices of arrays (see the sketch below the example)

An example:

import numpy as np
import deepdish as dd

d = {
    'foo': np.ones((10, 20)),
    'sub': {
        'bar': 'a string',
        'baz': 1.23,
    },
}
dd.io.save('test.h5', d)

This can be reconstructed using dd.io.load('test.h5'), or inspected from the command line using either a standard tool:

$ h5ls test.h5
foo                      Dataset {10, 20}
sub                      Group

Or, better yet, our custom tool ddls (or python -m deepdish.io.ls):

$ ddls test.h5
/foo                       array (10, 20) [float64]
/sub                       dict
/sub/bar                   'a string' (8) [unicode]
/sub/baz                   1.23 [float64]
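
The last bullet above, partial reading, works by pointing load at a group or dataset path inside the file. A minimal sketch, assuming the test.h5 file from the example and the group/sel arguments described in the deepdish documentation:

import deepdish as dd

# Load only one branch of the file instead of the whole dictionary.
sub = dd.io.load('test.h5', '/sub')      # {'bar': 'a string', 'baz': 1.23}

# Load just a slice of an array; dd.aslice builds the slice passed as sel,
# so only those rows are read from disk.
rows = dd.io.load('test.h5', '/foo', sel=dd.aslice[:2])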

Read more at Saving and loading data.

Documentation

Comments
  • Add simplenamespace io

    This PR adds native save/load of Python SimpleNamespace objects to deepdish. The implementation is such that older versions of Python (which do not have the SimpleNamespace type) will seamlessly load them as dictionaries. The current version of deepdish supports SimpleNamespaces only via pickling, which has obvious downsides.

    Our group uses dictionaries and SimpleNamespaces very often in our work: dictionaries when the keys are variables, SimpleNamespaces when the keys are fixed.
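
    A minimal sketch of the round trip this PR enables (the file name here is illustrative):

    from types import SimpleNamespace
    import deepdish as dd

    ns = SimpleNamespace(foo=1, bar='two')
    dd.io.save('ns.h5', ns)
    # With this PR, the object comes back as a SimpleNamespace; on older
    # Pythons without SimpleNamespace it loads as a plain dict instead.
    ns2 = dd.io.load('ns.h5')
    assert ns2.foo == 1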

    From "The Zen of Python": Namespaces are one honking great idea -- let's do more of those!

    opened by twmacro 15
  • Remove file handle support and type check in save and load

    Currently, if a unicode string is passed to deepdish.io.load or save, it fails because isinstance(u'file.h5', str) is False on Python 2. This PR allows all six.string_types to be passed.

    More generally, I'd suggest something like this, assuming it covers your intended use-cases:

    if isinstance(path, file):
        path = path.name
    elif not isinstance(path, six.string_types):
        raise ValueError('path type {} not supported.'.format(type(path)))
    
    opened by craffel 11
  • A Mistake?

    http://deepdish.io/2014/10/28/hintons-dark-knowledge/

    I was wondering if this was a mistake. Hinton, in his lecture slides, says that raising the temperature gives

    y_k / T

    whereas you have it as

    y_k^(1/T)

    which is a square root if I use T = 2?

    Can you explain? This occurs for the denominator as well.
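
    For what it's worth, the two forms agree after renormalization, assuming y_k denotes the softmax of logits z_k, so the blog's y_k^(1/T), once normalized, equals the exp(z_k/T) form:

    p_k(T) = \frac{e^{z_k/T}}{\sum_j e^{z_j/T}}
           = \frac{y_k^{1/T}}{\sum_j y_j^{1/T}},
    \qquad\text{where } y_k = \frac{e^{z_k}}{\sum_j e^{z_j}}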

    blog 
    opened by ArEnSc 7
  • add soft links to save shared objects just once and support recursion

    This PR adds soft links to deepdish. Primarily, this means that a shared object is written to disk just once, no matter how many names or other objects refer to it. These relationships are maintained upon load. It also allows for recursive data structures.

    A common use case for us is saving time vectors for accelerometer data or analysis data. Often the time vectors are all the same, but sometimes they are all different; similarly for frequency vectors in frequency response analysis. Also, sometimes it's handy to have a shortcut link (not unlike a file-system link) from within one dictionary to another "common data" dictionary in a large data structure. The SoftLink capability in HDF5 makes adding this feature to deepdish almost trivial. :smile:

    For example:

    import numpy as np
    import deepdish as dd
    A = np.random.randn(3, 3)
    d = dict(A=A, B=A)   # two objects point to same matrix
    d['C'] = d           # add a recursive member
    dd.io.save('test.h5', d)
    d2 = dd.io.load('test.h5')
    

    From within ipython:

    In [2]: d2['B'] is d2['A']
    Out[2]: True
    
    In [3]: d2['C'] is d2
    Out[3]: True
    

    Here is a ddls view of the file:

    $ ddls test.h5
    /A                         array (3, 3) [float64]
    /B                         link -> /A [SoftLink]
    /C                         link -> / [SoftLink]
    
    opened by twmacro 6
  • Incompatibility with pandas v1.2.0

    Hi guys,

    Thanks for the incredibly useful piece of software!

    The recent release of v1.2.0 of pandas seems to have broken compatibility with deepdish. Things work perfectly with pandas v1.1.5 and deepdish v0.3.6, however upgrading to pandas v1.2.0 produces the following error:

    File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 583, in save _save_level(h5file, group, value, name=key, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 211, in _save_level _save_level(handler, new_group, v, name=k, filters=filters, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 211, in _save_level _save_level(handler, new_group, v, name=k, filters=filters, File "/Users/adam/anaconda3/lib/python3.8/site-packages/deepdish/io/hdf5io.py", line 251, in _save_level elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)): File "/Users/adam/anaconda3/lib/python3.8/site-packages/pandas/init.py", line 244, in getattr raise AttributeError(f"module 'pandas' has no attribute '{name}'") AttributeError: module 'pandas' has no attribute 'Panel'

    It looks like pandas.Panel has been removed from the latest release. Presumably all that needs to be done is removing references to pd.Panel from hdf5io.py? I'd be happy to submit this as a pull request if you agree this is the right course of action.
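
    A minimal sketch of that change, with the type tuple built once at import time (the name _pandas_types is invented here for illustration):

    import pandas as pd

    # Probe for pd.Panel instead of referencing it unconditionally;
    # it only exists on older pandas versions.
    _pandas_types = (pd.DataFrame, pd.Series)
    if hasattr(pd, 'Panel'):
        _pandas_types += (pd.Panel,)

    # ...and later, in _save_level:
    # elif _pandas and isinstance(level, _pandas_types):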

    Cheers, Adam

    opened by ACCarnall 5
  • AttributeError when saving a Pandas object with Pandas 0.24.0 or greater

    When saving a pd.Series, pd.DataFrame, or pd.Panel to HDF5 using deepdish, an AttributeError is raised, and I cannot save the file. I've tracked down the issue, and it's due to a change in Pandas version 0.24.0.

    Here is how I've been able to reproduce the error, where I have installed Pandas 0.24.2, Numpy 1.15.4, deepdish 0.3.6, and PyTables 3.5.1.

    import pandas as pd
    import numpy as np
    import deepdish as dd
    
    dd.io.save("test.h5", {"test" : pd.Series(data=np.random.rand(1))}, )
    

    The error returned is:

    ---------------------------------------------------------------------------
    NoSuchNodeError                           Traceback (most recent call last)
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
       1159                 key = '/' + key
    -> 1160             return self._handle.get_node(self.root, key)
       1161         except _table_mod.exceptions.NoSuchNodeError:
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, where, name, classname)
       1643             nodepath = join_path(basepath, name or '') or '/'
    -> 1644             node = where._v_file._get_node(nodepath)
       1645         elif isinstance(where, (six.string_types, numpy.str_)):
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in _get_node(self, nodepath)
       1598 
    -> 1599         node = self._node_manager.get_node(nodepath)
       1600         assert node is not None, "unable to instantiate node ``%s``" % nodepath
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, key)
        436         if self.node_factory:
    --> 437             node = self.node_factory(key)
        438             self.cache_node(node, key)
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_load_child(self, childname)
       1180         # Is the node a group or a leaf?
    -> 1181         node_type = self._g_check_has_child(childname)
       1182 
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_check_has_child(self, name)
        397                 "group ``%s`` does not have a child named ``%s``"
    --> 398                 % (self._v_pathname, name))
        399         return node_type
    
    NoSuchNodeError: group ``/`` does not have a child named ``//test``
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-60c97adec230> in <module>
    ----> 1 dd.io.save("test4.h5", {"test" : pd.Series(data=np.random.rand(1))}, )
    
    ~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in save(path, data, compression)
        587             for key, value in data.items():
        588                 _save_level(h5file, group, value, name=key,
    --> 589                             filters=filters, idtable=idtable)
        590 
        591         elif (_sns and isinstance(data, SimpleNamespace) and
    
    ~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
        256         store = _HDFStoreWithHandle(handler)
        257 #         print(store.get_node(group._v_pathname))
    --> 258         store.append(group._v_pathname + '/' + name, level)
        259 
        260     elif isinstance(level, (sparse.dok_matrix,
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
        984         kwargs = self._validate_format(format, kwargs)
        985         self._write_to_group(key, value, append=append, dropna=dropna,
    --> 986                              **kwargs)
        987 
        988     def append_to_multiple(self, d, value, selector, data_columns=None,
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
       1365     def _write_to_group(self, key, value, format, index=True, append=False,
       1366                         complib=None, encoding=None, **kwargs):
    -> 1367         group = self.get_node(key)
       1368 
       1369         # remove the node if we are not appending
    
    /galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
       1159                 key = '/' + key
       1160             return self._handle.get_node(self.root, key)
    -> 1161         except _table_mod.exceptions.NoSuchNodeError:
       1162             return None
       1163 
    
    AttributeError: 'NoneType' object has no attribute 'exceptions'
    

    From the above, we see that the _table_mod variable is None, which is throwing the error. The reason that this is now an error is related to https://github.com/pandas-dev/pandas/pull/22919, where the exception in HDFStore.get_node was changed from a bare exception to a specific exception.

    Before: https://github.com/pandas-dev/pandas/blob/2d0c96119391c85bd4f7ffbb847759ee3777162a/pandas/io/pytables.py#L1157-L1165

    After: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1141-L1149

    So, now the _table_mod variable is used to only return None in the case that the exception is a NoSuchNodeError, rather than any error. However, _table_mod is only set by running the function pandas.io.pytables._tables, which imports PyTables into the namespace as _table_mod. If this function is never run, _table_mod is left as None, and the above AttributeError occurs.

    The problem is that in deepdish's use of pandas.io.pytables.HDFStore, where there's a wrapper of the function called _HDFStoreWithHandle, none of the methods that call the _tables function are called, and _table_mod is left as None, which gives us the AttributeError.

    My proposed solution is to add one line to the beginning of the hdf5io.py file in deepdish, where we call pandas.io.pytables._tables().

    Before:

    https://github.com/uchicago-cs/deepdish/blob/01af93621fe082a3972fe53ba7375388c02b0085/deepdish/io/hdf5io.py#L1-L12

    After:

    from __future__ import division, print_function, absolute_import
    
    import numpy as np
    import tables
    import warnings
    from scipy import sparse
    from deepdish import conf
    try:
        import pandas as pd
        pd.io.pytables._tables()
        _pandas = True
    except ImportError:
        _pandas = False
    

    After making this change, I no longer get the AttributeError and the saving of Pandas data types works seamlessly.

    opened by slwatkins 4
  • ValueError when trying to save numpy scalar arrays (ndim = 0)

    import numpy as np
    import deepdish
    deepdish.io.save('test.h5', np.array(0.))
    

    results in

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    
    deepdish/io/hdf5io.pyc in save(path, data, compression)
        458
        459         else:
    --> 460             _save_level(h5file, group, data, name='data', filters=filters)
        461             # Mark this to automatically unpack when loaded
        462             group._v_attrs[DEEPDISH_IO_UNPACK] = True
    
    deepdish/io/hdf5io.pyc in _save_level(handler, group, level, name, filters)
        184
        185     elif isinstance(level, np.ndarray):
    --> 186         _save_ndarray(handler, group, name, level, filters=filters)
        187
        188     elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    
    deepdish/io/hdf5io.pyc in _save_ndarray(handler, group, name, x, filters)
        112         strtype = None
        113         itemsize = None
    --> 114     assert np.min(x.shape) > 0, ("deepdish.io.save does not support saving "
        115                                  "numpy arrays with a zero-length axis")
        116     # For small arrays, compression actually leads to larger files, so we are
    
    numpy/core/fromnumeric.pyc in amin(a, axis, out, keepdims)
       2217         except AttributeError:
       2218             return _methods._amin(a, axis=axis,
    -> 2219                                 out=out, keepdims=keepdims)
       2220         # NOTE: Dropping the keepdims parameter
       2221         return amin(axis=axis, out=out)
    
    numpy/core/_methods.pyc in _amin(a, axis, out, keepdims)
         27
         28 def _amin(a, axis=None, out=None, keepdims=False):
    ---> 29     return umr_minimum(a, axis, None, out, keepdims)
         30
         31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
    
    ValueError: zero-size array to reduction operation minimum which has no identity
    

    Is saving numpy scalar arrays (ndim = 0) supported? If not, I think you could change the test to assert x.ndim > 0 and np.min(x.shape) > 0; the first condition would fail for numpy scalar arrays, so the second wouldn't be evaluated. If numpy scalar arrays aren't supported, I'm curious why, and whether that's functionality that could be added. Finally, and separately, you should arguably be doing

    if np.min(x.shape) == 0:
        raise ValueError(...)
    

    instead of an assert, see e.g. here, but that's a separate discussion! Thank you again for the excellent library!
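
    A sketch of that pre-check, with the 0-d case handled by unwrapping to a Python scalar (the unwrapping is a suggestion, not current deepdish behaviour):

    import numpy as np

    def check_array(x):
        if x.ndim == 0:
            # np.min(x.shape) has nothing to reduce over for 0-d arrays;
            # one option is to unwrap them and save the plain scalar.
            return x[()]
        if np.min(x.shape) == 0:
            raise ValueError('deepdish.io.save does not support saving '
                             'numpy arrays with a zero-length axis')
        return x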

    opened by craffel 4
  • dd.io.save crashes while saving np.array of objects

    dd.io.save crashes when you try to save np.array with dtype=object

    Ubuntu 14.04 x64 Python 2.7.11 (default, Dec 15 2015, 16:46:19) [GCC 4.8.4]

    In [1]: np.__version__
    Out[2]: '1.11.2'
    In [2]: dd.__version__
    Out[2]: '0.3.4'
    In [3]: tables.__version__
    Out[3]: '3.2.2'
    
    
    In [10]: x = np.array(['123', 567, 'hjjjk'], dtype='O')
    
    In [11]: x
    Out[11]: array(['123', 567, 'hjjjk'], dtype=object)
    
    In [12]: dd.io.save('t.h5', x)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-12-dc094ba588e4> in <module>()
    ----> 1 dd.io.save('t.h5', x)
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in save(path, data, compression)
        579         else:
        580             _save_level(h5file, group, data, name='data',
    --> 581                         filters=filters, idtable=idtable)
        582             # Mark this to automatically unpack when loaded
        583             group._v_attrs[DEEPDISH_IO_UNPACK] = True
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in _save_level(handler, group, level, name, filters, idtable)
        242 
        243     elif isinstance(level, np.ndarray):
    --> 244         _save_ndarray(handler, group, name, level, filters=filters)
        245 
        246     elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/deepdish/io/hdf5io.pyc in _save_ndarray(handler, group, name, x, filters)
        123         atom = tables.StringAtom(itemsize)
        124     else:
    --> 125         atom = tables.Atom.from_dtype(x.dtype)
        126         strtype = None
        127         itemsize = None
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/tables/atom.pyc in from_dtype(class_, dtype, dflt)
        377             return class_.from_kind('string', itemsize, dtype.shape, dflt)
        378         # Most NumPy types have direct correspondence with PyTables types.
    --> 379         return class_.from_type(basedtype.name, dtype.shape, dflt)
        380 
        381     @classmethod
    
    /export/home/asanakoy/.local/lib/python2.7/site-packages/tables/atom.pyc in from_type(class_, type, shape, dflt)
        402 
        403         if type not in all_types:
    --> 404             raise ValueError("unknown type: %r" % (type,))
        405         kind, itemsize = split_type(type)
        406         return class_.from_kind(kind, itemsize, shape, dflt)
    
    ValueError: unknown type: 'object'
    
    bug 
    opened by asanakoy 3
  • `save` function with mode 'a'

    Currently dd.io.save overwrites the target file if it exists. Would it be a good idea to add a mode argument so that the target file can be incrementally updated as new data arrives?
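
    Until such a mode exists, a read-modify-write workaround along these lines is possible (a sketch; log.h5 and new_data are placeholders):

    import os
    import deepdish as dd

    # Emulate append mode by rewriting the whole file with the merged dict.
    d = dd.io.load('log.h5') if os.path.exists('log.h5') else {}
    d['latest'] = new_data
    dd.io.save('log.h5', d)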

    opened by gaow 3
  • I followed example in http://deepdish.io/ but 'numpy.ndarray' object has no attribute 'tobytes'

    I ran the example code from http://deepdish.io/. However, at datum.data = X[i].tobytes() an error occurred. I know the error comes from deepdish, but I don't know the reason, and I want to run the example code. Do you know what the problem is?

    My numpy version is 1.8.2

    blog 
    opened by stray-leone 3
  • Error while installing deepdish

    I'm trying to create an LMDB database file to be used with Caffe according to this tutorial on an Ubuntu 14.04 machine using Anaconda Python 2.7.9. However, when I do pip install deepdish, I'm getting the following error:

    Collecting deepdish
      Using cached deepdish-0.1.4.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 20, in <module>
          File "/tmp/pip-build-qKwOBx/deepdish/setup.py", line 12, in <module>
            with open('requirements.txt') as f:
        IOError: [Errno 2] No such file or directory: 'requirements.txt'
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-qKwOBx/deepdish
    

    Any ideas why this error might be occurring and how to go about correcting it? Any help is much appreciated. Thank you.

    bug 
    opened by prasannadate 3
  • numpy >= 1.24 enforces deprecation of `np.object`

    np.object is used here: https://github.com/uchicago-cs/deepdish/blob/master/deepdish/io/hdf5io.py#L125 The code therefore crashes when it reaches this line on numpy >= 1.24.
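
    Since np.object was just an alias for the builtin, the minimal fix is to drop the np. prefix — a sketch, assuming the line compares an array dtype against the alias:

    import numpy as np

    x = np.array(['123', 567], dtype=object)
    # np.object was a deprecated alias for the builtin `object`; on
    # numpy >= 1.24 the comparison must use the builtin directly:
    if x.dtype == object:
        print('object-dtype array')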

    opened by fgoudreault 0
  • Fixing deprecated np.object for numpy >= 1.24

    np.object has been deprecated since numpy 1.20, and since 1.24 the deprecation is enforced (see the numpy release notes). More details: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations Fixes #50

    opened by fgoudreault 0
  • Saving Class instances not working when registry is not called

    Hi, thanks for this cool library. I took the following example from your readthedocs (https://deepdish.readthedocs.io/en/latest/io.html#class-instances):

    import deepdish as dd
    
    class Foo(dd.util.SaveableRegistry):
        def __init__(self, x):
            self.x = x
    
        @classmethod
        def load_from_dict(self, d):
            obj = Foo(d['x'])
            return obj
    
        def save_to_dict(self):
            return {'x': self.x}
    
    
    if __name__ == '__main__':
        f = Foo(10)
        f.save('foo.h5')
        f = Foo.load('foo.h5')
    

    This is a more minimal example because there is no class 'Bar' that inherits from Foo, and therefore the @Foo.register('bar') decorator is never called. This leads to the following traceback:

    /Users/jorenretel/bin/miniconda3/envs/abstract_classifier/bin/python /Users/jorenretel/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py
    /Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/io/hdf5io.py:246: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
      elif _pandas and isinstance(level, (pd.DataFrame, pd.Series, pd.Panel)):
    Traceback (most recent call last):
      File "/Users/jorenretel/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 20, in <module>
        f = Foo.load('foo.h5')
      File "/Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/util/saveable.py", line 162, in load
        return cls.getclass(class_name).load_from_dict(d)
      File "/Users/jorenretel/bin/miniconda3/envs/abstract_classifier/lib/python3.8/site-packages/deepdish/util/saveable.py", line 121, in getclass
        return cls.REGISTRY[name]
    KeyError: 'noname'
    
    Process finished with exit code 1
    

    The problem is that this property in deepdish/util/saveable.py never gets overloaded when register is not called:

        @property
        def name(self):
            """Returns the name of the registry entry."""
            # Automatically overloaded by 'register'
            return "noname"
    

    Possible solution: return None instead of 'noname'? I am not sure whether this has some side effect that I am not aware of.
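
    One hypothetical guard (invented here, not a tested patch) would be to treat the default name as "no registry entry" in getclass and fall back to the class itself, mirroring the lookup shown in the traceback:

    class SaveableRegistry(object):
        REGISTRY = {}

        @classmethod
        def getclass(cls, name):
            # Fall back to the calling class when nothing was registered,
            # instead of raising KeyError('noname') from the REGISTRY lookup.
            if name in (None, 'noname'):
                return cls
            return cls.REGISTRY[name]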

    opened by jorenretel 0
  • Update conda-forge version to match pypi

    Thanks for the great package! Not sure if this is the best place to make this request, but is there any chance you can update the version of deepdish on conda-forge to match the latest version on pypi? It looks like the latest version there is 0.3.4, which sometimes breaks when saving objects; this is resolved in 0.3.6.

    There's a user submitted one on Anaconda cloud in case that's helpful for the update: https://anaconda.org/turbach/deepdish, but it would be nice for it to be updated on conda-forge for ease of installation of packages that depend on deepdish

    opened by ejolly 0
  • Overflow Error When Attempting to Save Large Amounts of Data

    I have been using deepdish to save dictionaries with large amounts of data. I ran into the following issue when attempting to save a particularly large file. I have tried saving the data with and without compression, if that helps. Can you help me out with it please?

    File "C:/Users/xxxxxxxx/Documents/Python_Scripts/Data_Scripts/Finalized_Data_Review_Presentations/data_save_cc_test.py", line 513, in dd.io.save('%s/Data/%s_%s_cc_data.h5'%(directory,m_list[m],list_type),cc_data,('blosc', 9))

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 596, in save filters=filters, idtable=idtable)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 304, in _save_level _save_pickled(handler, group, level, name=name)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\deepdish\io\hdf5io.py", line 172, in _save_pickled node.append(level)

    File "C:\Users\xxxxxxxx\AppData\Local\Continuum\anaconda2\lib\site-packages\tables\vlarray.py", line 547, in append self._append(nparr, nobjects)

    File "tables/hdf5extension.pyx", line 2032, in tables.hdf5extension.VLArray._append

    OverflowError: Python int too large to convert to C long
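
    The limit being hit is likely the C long that PyTables' VLArray uses when appending a single pickled blob; C long is 32 bits on Windows builds, so a node over 2 GB overflows it. Until that is addressed, one workaround is to split the dictionary so no single node is that large — a sketch, with cc_data standing in for the large dictionary from the traceback:

    import deepdish as dd

    # Save each top-level key to its own file so that no single
    # pickled node exceeds the 2 GB limit.
    for key, value in cc_data.items():
        dd.io.save('cc_data_{}.h5'.format(key), value)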

    opened by 0Maximus0 0
Owner
UChicago - Department of Computer Science