Intake is a lightweight package for finding, investigating, loading and disseminating data.

Overview

Intake: A general interface for loading data

Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps you:

  • Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
  • Convert boilerplate data loading code into reusable Intake plugins.
  • Describe data sets in catalog files for easy reuse and sharing between projects and with others.
  • Share catalog information (and data sets) over the network with the Intake server.

Documentation is available at Read the Docs.

Status of intake and related packages is available at Status Dashboard

Weekly news about this repo and other related projects can be found on the wiki

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install with pip, in which case you choose how many of the optional dependencies to install. The simplest option has the fewest requirements:

pip install intake

You can add the optional extras [server], [plot] and [dataframe], or include everything:

pip install intake[complete]

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.

Development

  • Create a development Python environment with the required dependencies, ideally with conda. The requirements can be found in the yml files in the scripts/ci/ directory of this repo.
    • e.g. conda env create -f scripts/ci/environment-py38.yml and then conda activate test_env
  • Install intake using pip install -e .[complete]
  • Use pytest to run tests.
  • Create a fork on GitHub to be able to submit PRs.
  • We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.
Comments
  • Monthly dev meeting

    I mentioned this on Gitter: the idea arose at the Dask dev meeting that it would be generally useful to hold a monthly dev meeting where we can discuss maintenance, the roadmap and so on.

    Perhaps 10:30am eastern on the first Thursday of the month (that would put it right before the dask monthly meeting, so maybe it'd be easier to remember?).

    @danielballan @martindurant

    opened by jsignell 40
  • Use entrypoints to manage drivers. Add subcommand.

    I would like it to be possible to:

    • Provide drivers that are discoverable by intake without necessarily packaging them in a package named intake*
    • Have the option to disable a specific intake driver from getting autodiscovered without uninstalling the package that provides it or disabling other drivers in that package

    I think the Jupyter notebook serverextension system has settled on a nice way to manage this kind of configuration (after many iterations and pivots over the years). This PR imitates that system. It's just a first pass to evaluate interest and would need more careful thought before being merged.

    Demo:

    An intake drivers subcommand can list the drivers that are added to intake.registry at import time.

    $ intake drivers list
    netcdf                        intake_xarray.netcdf.NetCDFSource
    opendap                       intake_xarray.opendap.OpenDapSource
    rasterio                      intake_xarray.raster.RasterIOSource
    remote-xarray                 intake_xarray.xarray_container.RemoteXarray
    zarr                          intake_xarray.xzarr.ZarrSource
    

    A verbose option includes __file__ locations, potentially useful for untangling issues with environments.

    $ intake drivers list -v
    netcdf                        intake_xarray.netcdf.NetCDFSource @ /home/dallan/Repos/bnl/intake-xarray/intake_xarray/netcdf.py
    opendap                       intake_xarray.opendap.OpenDapSource @ /home/dallan/Repos/bnl/intake-xarray/intake_xarray/opendap.py
    rasterio                      intake_xarray.raster.RasterIOSource @ /home/dallan/Repos/bnl/intake-xarray/intake_xarray/raster.py
    remote-xarray                 intake_xarray.xarray_container.RemoteXarray @ /home/dallan/Repos/bnl/intake-xarray/intake_xarray/xarray_container.py
    zarr                          intake_xarray.xzarr.ZarrSource @ /home/dallan/Repos/bnl/intake-xarray/intake_xarray/xzarr.py
    

    Now suppose I want to disable the 'zarr' driver provided by intake_xarray. Perhaps I have a different implementation that I want to use with 'zarr' and I need to avoid the name collision.

    $ intake drivers disable intake_xarray.xzarr.ZarrSource
    $ intake drivers list
    netcdf                        intake_xarray.netcdf.NetCDFSource
    opendap                       intake_xarray.opendap.OpenDapSource
    rasterio                      intake_xarray.raster.RasterIOSource
    remote-xarray                 intake_xarray.xarray_container.RemoteXarray
    $ python -c "import intake; print('zarr' in intake.registry)"
    False
    

    I can later re-enable it:

    $ intake drivers enable intake_xarray.xzarr.ZarrSource
    $ python -c "import intake; print('zarr' in intake.registry)"
    True
    

    The enable/disable state is stored in a separate YAML file for each driver in ~/.intake/drivers.d, imitating the system used by Jupyter. For backward compatibility, drivers in packages that begin with intake* are included in the registry unless they are explicitly disabled. (That is, they need not have any configuration in ~/.intake/drivers.d.) Drivers in packages with other names can be explicitly enabled:

    $ intake drivers enable offbrand_catalog.MongoMetadataStoreCatalog
    $ intake drivers list
    netcdf                        intake_xarray.netcdf.NetCDFSource
    opendap                       intake_xarray.opendap.OpenDapSource
    rasterio                      intake_xarray.raster.RasterIOSource
    remote-xarray                 intake_xarray.xarray_container.RemoteXarray
    zarr                          intake_xarray.xzarr.ZarrSource
    mongo_metadatastore           offbrand_catalog.MongoMetadataStoreCatalog
    

    The enable command created the following file at ~/.intake/drivers.d/offbrand_catalog.MongoMetadataStoreCatalog.yml:

    offbrand_catalog.MongoMetadataStoreCatalog:
      enabled: true
    

    As documented by Jupyter, packages can automatically enable their drivers at install time by using data_files in setup.py to place the corresponding files in ~/.intake/drivers.d/.
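
    The enable/disable rule described above could be sketched roughly as follows. This is not the code from the PR: the function name resolve_registry and the dict shapes are hypothetical, and the per-driver YAML files in ~/.intake/drivers.d are assumed to have been read and merged into a single dict already.

```python
def resolve_registry(discovered, config):
    """Return {driver_name: class_path} for the drivers that should be registered.

    discovered: {driver_name: "package.module.ClassName"} (hypothetical shape)
    config:     {"package.module.ClassName": {"enabled": bool}}, assumed to be
                the merged content of the per-driver YAML files
    """
    registry = {}
    for name, class_path in discovered.items():
        setting = config.get(class_path, {}).get("enabled")
        if setting is None:
            # No explicit setting: fall back to the backward-compatible rule
            # that drivers in packages named intake* are enabled by default.
            setting = class_path.split(".")[0].startswith("intake")
        if setting:
            registry[name] = class_path
    return registry


discovered = {
    "zarr": "intake_xarray.xzarr.ZarrSource",
    "netcdf": "intake_xarray.netcdf.NetCDFSource",
    "mongo_metadatastore": "offbrand_catalog.MongoMetadataStoreCatalog",
}
config = {
    "intake_xarray.xzarr.ZarrSource": {"enabled": False},
    "offbrand_catalog.MongoMetadataStoreCatalog": {"enabled": True},
}
registry = resolve_registry(discovered, config)
print(sorted(registry))
```

    With an empty config, only the two intake_xarray drivers would be registered, which mirrors the backward-compatibility behaviour described above.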

    in progress 
    opened by danielballan 35
  • New Panel GUI

    Try it locally

    At the moment this work depends on development versions of panel and bokeh:

    conda install -c conda-forge panel==0.5.1 hvplot==0.4.0 bokeh==1.1.0
    jupyter labextension install @pyviz/jupyterlab_pyviz  # if using jupyterlab
    

    Then take a look at examples/GUI.ipynb or run panel serve intake/gui/server.py

    Towards #225

    Feature parity with ipywidgets GUI:

    • [x] local files
    • [x] remote files
    • [x] catalog selection
    • [x] data source selection
    • [x] data source description
    • [x] catalog search
    • [x] initialization as intake.gui

    Additional functionality:

    • [x] windows paths
    • [x] make sure that control buttons reflect visible status
    • [x] add home button to local file browser
    • [x] unit tests
    • [x] make path editable
    • [x] disable buttons when they don't make sense
    • [x] use properties over methods
    • [x] try to abstract widget from functionality
    • [x] choose from defined plots
    opened by jsignell 33
  • nesting of catalogs in a deep directory structure

    Hi folks, especially @martindurant to whom I spoke briefly today. I am working with a large collection of data (CMIP6) containing mostly netcdf and zarr objects - consisting of thousands of files in a directory tree. Obviously I cannot write flat YAML catalogs in this case. What I am thinking is putting a 'config.yaml' file in EVERY directory and making the sources section point to config.yaml files in each of its subdirectories and so on ... The nesting would be about 6-7 levels deep.

    Here is my example.

    This works great, except for two (hopefully) small issues (for more details, see above link)

    1. <tab> completion will work to get from the parent to child catalog (sub-directory), but then will not get to the grandchild catalog (sub-sub-directory)

    2. The path in each YAML file is relative to the <dir> in the initial intake.open_catalog('<dir>/config.yaml'). This means that if I call intake.open_catalog('config.yaml') in a subdirectory, the paths need to be relative to that subdirectory, but I have had to hard-wire them to an arbitrary initial parent directory.

    If I am re-inventing the wheel and you could point me to an existing solution, that would be very much appreciated. Or maybe there is a parent/child setting or equivalent to make this work?
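
    One way to avoid hand-writing a catalog in every directory is to generate them by walking the tree. The sketch below only builds the dicts that would be dumped to each config.yaml. It assumes that nested catalogs are referenced with the yaml_file_cat driver and a path argument, that files are referenced relative to {{ CATALOG_DIR }} without a trailing slash, and that only .nc files matter; the function name build_catalog_specs is hypothetical.

```python
import os
import tempfile


def build_catalog_specs(root, catalog_name="config.yaml"):
    """Map each directory under root to the catalog dict that would be written there."""
    specs = {}
    for dirpath, dirnames, filenames in os.walk(root):
        sources = {}
        # One entry per subdirectory, pointing at that subdirectory's catalog.
        for sub in sorted(dirnames):
            sources[sub] = {
                "driver": "yaml_file_cat",  # assumed driver name for nested catalogs
                "args": {"path": "{{ CATALOG_DIR }}/" + sub + "/" + catalog_name},
            }
        # One entry per NetCDF file found directly in this directory.
        for fname in sorted(filenames):
            stem, ext = os.path.splitext(fname)
            if ext == ".nc":
                sources[stem] = {
                    "driver": "netcdf",
                    "args": {"urlpath": "{{ CATALOG_DIR }}/" + fname},
                }
        specs[os.path.relpath(dirpath, root)] = {"sources": sources}
    return specs


# Small throwaway tree to exercise the function: <root>/model/run/tas.nc
with tempfile.TemporaryDirectory() as root:
    leaf = os.path.join(root, "model", "run")
    os.makedirs(leaf)
    open(os.path.join(leaf, "tas.nc"), "w").close()
    specs = build_catalog_specs(root)

print(sorted(specs))
```

    Because every path is expressed relative to the directory of the catalog that contains it, each generated catalog could in principle be opened on its own from any level of the tree.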

    opened by naomi-henderson 27
  • Slim down requirements

    Currently the requirements file looks like the following:

    jinja2
    msgpack-numpy
    msgpack-python
    numpy
    pandas
    pytest
    python-snappy
    ruamel.yaml >= 0.15.0
    requests
    appdirs
    six
    tornado >= 4.5.1
    dask[complete]
    holoviews
    

    This makes it difficult for downstream deployments to easily depend on it. I wonder if perhaps some of these requirements could be made optional for minimal deployments.

    opened by mrocklin 21
  • Let users never see Entry objects

    This is "unification light" - still have Entry objects, but the user doesn't see them. The catalog just gives you the default source directly, but you can override them with call or clone (maybe should be get??) to get a new version of the source. Note that .describe() will give you the original catalog definition (with user parameters), not your overridden version, but repr of the source now also gives you that YAML view with the current set of arguments.

    cc @danielballan @tacaswell

    opened by martindurant 18
  • problem under pip

    I don't understand why, but this import doesn't work even though ruamel.yaml is installed (Python 3.7; same error on Python 3.6):

    from ruamel_yaml.constructor import DuplicateKeyError
    

    I get a

    C:\WinP\bd37\bu\winp64-3.7.x.1\python-3.7.0b5.amd64\lib\site-packages\intake\catalog\local.py in <module>()
          7 import yaml
          8 
    ----> 9 from ruamel_yaml.constructor import DuplicateKeyError
         10 
         11 from jinja2 import Template
    
    ModuleNotFoundError: No module named 'ruamel_yaml'
    

    Any clue?
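
    The likely cause is that the same library is importable as ruamel_yaml in conda-based installs and as ruamel.yaml when installed from PyPI. A minimal sketch of a fallback import, assuming both module paths expose the same names; the helper import_first is hypothetical and is demonstrated here with a standard-library module so that it runs anywhere:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # package simply moves us on to the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )


# Intended use in this context (not executed here):
#   constructor = import_first("ruamel_yaml.constructor", "ruamel.yaml.constructor")
#   DuplicateKeyError = constructor.DuplicateKeyError

# Stand-in demonstration: a missing module followed by a stdlib one.
module = import_first("module_that_does_not_exist_xyz", "json")
print(module.__name__)
```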

    opened by stonebig 18
  • (Compressed) Excel driver

    Hi,

    My main data sources at the moment are remote zipped Excel files, so I was starting to build a data package and a driver package, but I need your feedback. Should it be an Excel driver with a compression type option? Is anyone already working on it? I would use the pandas Excel reader as the reader.

    If all lights are green I will start this intake-excel driver.

    Regards, Guillaume

    opened by gansanay 17
  • Metadata fields used for plotting

    Catalogues allow defining arbitrary metadata to be associated with a dataset, which could be very useful to provide hints to the plotting system. Specifically there are two types of options that could be useful in this regard:

    • Plot options: Options passed directly to the plotting API, e.g. datashade, width, height, colorbar, logx, logy etc.
    • Field annotations: Additional metadata to associate with dataset fields (i.e. columns in a dataframe), including labels (used for axis labels), units (also for axis labels), ranges (to set axis and color range limits)

    It would be good to decide on the syntax to express these options. I'm currently imagining something like this:

    sources:
      nyc_taxi:
        description: NYC Taxi dataset
        driver: parquet
        args:
          urlpath: 's3://datashader-data/nyc_taxi_wide.parq'
        metadata:
          plot:
            datashade: true
          fields:
            dropoff_x:
              label: 'Longitude'
            dropoff_y:
              label: 'Latitude'
            fare_amount:
              label: 'Fare'
              unit: '$'
              range: [0, 100]
    

    Other suggestions (perhaps with less nesting) would be welcome though.
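
    To make the proposal concrete, here is a rough sketch of how a plotting layer might turn such metadata into keyword arguments. The function plot_hints and the output keys xlabel, ylabel and ylim are assumptions made for illustration, not an existing Intake API.

```python
def plot_hints(metadata, x, y):
    """Build plotting keyword arguments from the 'plot' and 'fields' metadata."""
    fields = metadata.get("fields", {})

    def axis_label(field):
        # Use the annotated label if present, and append the unit when given.
        info = fields.get(field, {})
        text = info.get("label", field)
        unit = info.get("unit")
        return "{} ({})".format(text, unit) if unit else text

    kwargs = dict(metadata.get("plot", {}))  # plot options pass straight through
    kwargs.update(x=x, y=y, xlabel=axis_label(x), ylabel=axis_label(y))
    value_range = fields.get(y, {}).get("range")
    if value_range is not None:
        kwargs["ylim"] = tuple(value_range)
    return kwargs


# The metadata section of the catalog entry above, as a Python dict.
metadata = {
    "plot": {"datashade": True},
    "fields": {
        "dropoff_x": {"label": "Longitude"},
        "dropoff_y": {"label": "Latitude"},
        "fare_amount": {"label": "Fare", "unit": "$", "range": [0, 100]},
    },
}
hints = plot_hints(metadata, "dropoff_x", "fare_amount")
print(hints)
```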

    opened by philippjfr 16
  • Incompatible with pandas 1.0.0

    Pandas chose to remove support for msgpack in order to guide users toward Arrow instead. This breaks the following code path in intake:

    https://github.com/intake/intake/blob/33a096721765fc7fe79e958d3aa5f050a4c60937/intake/container/serializer.py#L56-L57

    which now raises an AttributeError because DataFrame.to_msgpack no longer exists.

    I'm not immediately sure what the right fix is here, but maybe we should take up the suggestion in the TODO comment in this same function and transition to relying on distributed.

    https://github.com/intake/intake/blob/33a096721765fc7fe79e958d3aa5f050a4c60937/intake/container/serializer.py#L48-L49

    opened by danielballan 15
  • Windows support

    Adds Windows support and AppVeyor to make sure it doesn't break again.

    Some things I'm worried about not being tested properly:

    • does serialization work properly on Windows?
    • should CATALOG_DIR have a trailing /?
    • does catalog flattening work properly on Windows?
    opened by jsignell 15
  • Question: what is the expected type for `direct_access` ?

    Looking at the occurrences of direct_access, the expected types don't align, which leads me to wonder what the expected type should be.

    https://github.com/intake/intake/search?q=direct_access

    When loading a catalog from a YAML file, the value is expected to be a string. However, the default value assumed by a LocalCatalogEntry appears to be a boolean.

    If I ignore the type specified by type hints and the docstrings, I can properly serialize the catalog and read from a YAML file to instantiate the entries.
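
    Until the expected type is settled, a caller could normalize the value before building entries. The sketch below assumes the accepted string values are 'forbid', 'allow' and 'force', and the mapping of booleans onto them is purely an illustrative choice; normalize_direct_access is a hypothetical helper, not part of Intake.

```python
VALID_DIRECT_ACCESS = ("forbid", "allow", "force")  # assumed set of string values


def normalize_direct_access(value):
    """Coerce a bool or string direct_access value to one of the assumed strings."""
    if isinstance(value, bool):
        # Illustrative mapping only: True -> 'allow', False -> 'forbid'.
        return "allow" if value else "forbid"
    if isinstance(value, str) and value.lower() in VALID_DIRECT_ACCESS:
        return value.lower()
    raise ValueError(
        "direct_access must be a bool or one of {}, got {!r}".format(
            VALID_DIRECT_ACCESS, value
        )
    )


print(normalize_direct_access(True))
print(normalize_direct_access("Force"))
```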

    opened by lukecampbell 1
  • Programmatically add a catalog to Intake

    On https://intake.readthedocs.io/en/latest/quickstart.html#adding-data-source-packages-using-the-intake-path I read: "Adding Data Source Packages using the Intake path: Intake checks the Intake config file for catalog_path or the environment variable "INTAKE_PATH" for a colon separated list of paths (semicolon on windows) to search for catalog files. When you import intake we will see all entries from all of the catalogues referenced as part of a global catalog called intake.cat"

    1. Should the title be Adding Catalog Packages... instead of Adding Data Source Packages?
    2. Is there an API to add a catalog to intake? (*)

    (*) e.g.

    cat = intake.open_catalog('us_states.yml')
    intake.add_catalog(cat) # ???
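
    Going by the documentation quoted above, one workaround that needs no new API is to set INTAKE_PATH before importing intake. A minimal sketch, in which the helper build_intake_path and the directory names are hypothetical:

```python
import os


def build_intake_path(directories):
    """Join catalog directories with the platform separator (':' or ';' on Windows)."""
    return os.pathsep.join(os.path.abspath(d) for d in directories)


# Must be set before `import intake` for the global catalog to pick it up.
os.environ["INTAKE_PATH"] = build_intake_path(["catalogs", "shared/catalogs"])
print(os.environ["INTAKE_PATH"])
```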
    
    opened by echarles 4
  • Add CSV output to intake get CLI

    This small change adds the ability to get CSV output from the intake get CLI. Previously it printed the string representation of a DataFrame. It also adds a --output option for writing to a file instead of stdout.
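
    The behaviour described can be sketched as below. This is not the code from the PR: it uses plain dict records and the standard csv module as a stand-in, whereas with a pandas DataFrame the same role would be played by DataFrame.to_csv, which accepts either a path or a file object. The names write_csv and emit are hypothetical.

```python
import csv
import io
import sys


def write_csv(rows, fieldnames, stream):
    """Write dict records as CSV, header first, to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def emit(rows, fieldnames, output=None):
    """Write to the file named by output, or to stdout when output is None."""
    if output is None:
        write_csv(rows, fieldnames, sys.stdout)
    else:
        with open(output, "w", newline="") as handle:
            write_csv(rows, fieldnames, handle)


rows = [{"id": 1, "contents": "first"}, {"id": 2, "contents": "second"}]
buffer = io.StringIO()
write_csv(rows, ["id", "contents"], buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```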

    Closes #684

    opened by edsu 4
  • intake get output CSV

    The docs say this for intake get:

    Given the name of a catalog entry, this subcommand outputs the entire data source to standard output.

    But when I run it I see the string representation of the Pandas DataFrame, not the entire dataset as CSV:

    $ intake get catalogs/bodleian.yaml turkish
                                                      id  ...                                           contents
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/a...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/2...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/a...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/9...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/2...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/e...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/6...  ...                                                NaN
    0  https://iiif.bodleian.ox.ac.uk/iiif/manifest/c...  ...  [Persian ghazals (ff. 1b-16b). Nesimi, Dīvān-i...
    
    [8 rows x 30 columns]
    

    Would a PR to implement CSV output to STDOUT or an optional file be a welcome addition?

    PS. Thank you for a beautiful and extremely useful data tool!

    opened by edsu 0
  • Usage/design question: why specifying `driver` is compulsory?

    For some use cases, I would like to store a path to a folder. I thought that this could be done by removing driver from the catalog entry, but this triggers an error. For example:

    ---------------------------------------------------------------------------
    ValidationError                           Traceback (most recent call last)
    Input In [6], in <cell line: 3>()
          1 from intake import open_catalog
    ----> 3 cat = open_catalog("../cat.yml")
          5 cat.f
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/__init__.py:167, in open_catalog(uri, **kwargs)
        164 if driver not in registry:
        165     raise ValueError('Unknown catalog driver (%s), supply one of: %s'
        166                      % (driver, list(sorted(registry))))
    --> 167 return registry[driver](uri, **kwargs)
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/catalog/local.py:573, in YAMLFileCatalog.__init__(self, path, autoreload, **kwargs)
        571 self.filesystem = kwargs.pop('fs', None)
        572 self.access = "name" not in kwargs
    --> 573 super(YAMLFileCatalog, self).__init__(**kwargs)
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/catalog/base.py:110, in Catalog.__init__(self, entries, name, description, metadata, ttl, getenv, getshell, persist_mode, storage_options, user_parameters)
        108 self.updated = time.time()
        109 self._entries = entries if entries is not None else self._make_entries_container()
    --> 110 self.force_reload()
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/catalog/base.py:168, in Catalog.force_reload(self)
        166 """Imperative reload data now"""
        167 self.updated = time.time()
    --> 168 self._load()
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/catalog/local.py:608, in YAMLFileCatalog._load(self, reload)
        606     logger.warning("Use of '!template' deprecated - fixing")
        607     text = text.replace('!template ', '')
    --> 608 self.parse(text)
    
    File ~/miniconda3/envs/intake/lib/python3.10/site-packages/intake/catalog/local.py:687, in YAMLFileCatalog.parse(self, text)
        684 result = CatalogParser(data, context=context, getenv=self.getenv,
        685                        getshell=self.getshell)
        686 if result.errors:
    --> 687     raise exceptions.ValidationError(
        688         "Catalog '{}' has validation errors:\n\n{}"
        689         "".format(self.path, "\n".join(result.errors)), result.errors)
        691 cfg = result.data
        693 self._entries = {}
    
    ValidationError: Catalog '../cat.yml' has validation errors:
    
    ("missing required key 'driver'", {'args': {'urlpath': '{{ CATALOG_DIR }}/data'}})
    

    My question is: why is specifying driver compulsory? I understand that without it, a user could load a catalog and only face an error at a later stage. Is that the main reason, or are there other considerations? (Apologies if this was covered in previous issues or documentation; I did a quick scan but couldn't find anything relevant.)
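
    For illustration, the kind of check that produces this error can be sketched as follows. It is a simplified stand-in, not Intake's actual parser: validate_sources and REQUIRED_KEYS are hypothetical names, and the only rule modelled is the one reported in the traceback above.

```python
REQUIRED_KEYS = ("driver",)  # the only requirement modelled in this sketch


def validate_sources(sources):
    """Return a list of error messages for entries that lack a required key."""
    errors = []
    for name, entry in sources.items():
        for key in REQUIRED_KEYS:
            if key not in entry:
                errors.append(
                    "source '{}': missing required key '{}'".format(name, key)
                )
    return errors


sources = {
    "f": {"args": {"urlpath": "{{ CATALOG_DIR }}/data"}},
    "g": {"driver": "csv", "args": {"urlpath": "{{ CATALOG_DIR }}/g.csv"}},
}
errors = validate_sources(sources)
print(errors)
```

    Collecting every error before raising lets the catalog author see all problems at load time, rather than discovering them one entry at a time when the data is first accessed.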

    opened by SultanOrazbayev 8
  • not getting optional dependencies for dataframe using poetry

    I'm not sure whether this behavior indicates a problem with poetry or with a setup.py file in intake or dask. Had to bug one of you first, and you drew the short straw (sorry!).

    I am working on a new Intake driver. I have a pyproject.toml file like this:

    [tool.poetry]
    name = "intake-xyz"
    version = "0.1.0"
    description = "XYZ plugin for Intake"
    authors = ["Ian Carroll <ian.t.carroll@nasa.gov>"]
    exclude = ['**/tests']
    
    [tool.poetry.plugins."intake.drivers"]
    "xyz" = "intake_xyz.source:XYZDataFrameSource"
    
    [tool.poetry.dependencies]
    python = "^3.8"
    intake = {extras = ["dataframe"], version = "^0.6.5"}
    
    [tool.poetry.dev-dependencies]
    pytest = "^7.1.2"
    
    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"
    

    With the above, poetry install will not install all the dependencies. It does this:

    % poetry install
    Creating virtualenv intake-xyz-1PWxoOBD-py3.10 in ...
    Updating dependencies
    Resolving dependencies... (0.3s)
    
    Writing lock file
    
    Package operations: 24 installs, 0 updates, 0 removals
    
      • Installing locket (1.0.0)
      • Installing pyparsing (3.0.9)
      • Installing toolz (0.12.0)
      • Installing cloudpickle (2.1.0)
      • Installing fsspec (2022.5.0)
      • Installing markupsafe (2.1.1)
      • Installing msgpack (1.0.4)
      • Installing numpy (1.23.1)
      • Installing packaging (21.3)
      • Installing partd (1.2.0)
      • Installing pyyaml (6.0)
      • Installing appdirs (1.4.4)
      • Installing attrs (21.4.0)
      • Installing dask (2022.7.0)
      • Installing entrypoints (0.4)
      • Installing iniconfig (1.1.1)
      • Installing jinja2 (3.1.2)
      • Installing msgpack-numpy (0.4.8)
      • Installing pluggy (1.0.0)
      • Installing py (1.11.0)
      • Installing pyarrow (8.0.0)
      • Installing tomli (2.0.1)
      • Installing intake (0.6.5)
      • Installing pytest (7.1.2)
    
    Installing the current project: intake-xyz (0.1.0)
    

    Notice that pandas is missing. Only by installing "dask[dataframe]" explicitly do I get pandas.

    % poetry run pip install "dask[dataframe]"
    Requirement already satisfied: dask[dataframe] in path/to/site-packages (2022.7.0)
    Requirement already satisfied: packaging>=20.0 in path/to/site-packages (from dask[dataframe]) (21.3)
    Requirement already satisfied: cloudpickle>=1.1.1 in path/to/site-packages (from dask[dataframe]) (2.1.0)
    Requirement already satisfied: fsspec>=0.6.0 in path/to/site-packages (from dask[dataframe]) (2022.5.0)
    Requirement already satisfied: toolz>=0.8.2 in path/to/site-packages (from dask[dataframe]) (0.12.0)
    Requirement already satisfied: partd>=0.3.10 in path/to/site-packages (from dask[dataframe]) (1.2.0)
    Requirement already satisfied: pyyaml>=5.3.1 in path/to/site-packages (from dask[dataframe]) (6.0)
    Collecting pandas>=1.0
      Using cached pandas-1.4.3-cp310-cp310-macosx_10_9_x86_64.whl (11.5 MB)
    Requirement already satisfied: numpy>=1.18 in path/to/site-packages (from dask[dataframe]) (1.23.1)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in path/to/site-packages (from packaging>=20.0->dask[dataframe]) (3.0.9)
    Collecting python-dateutil>=2.8.1
      Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
    Collecting pytz>=2020.1
      Using cached pytz-2022.1-py2.py3-none-any.whl (503 kB)
    Requirement already satisfied: locket in path/to/site-packages (from partd>=0.3.10->dask[dataframe]) (1.0.0)
    Collecting six>=1.5
      Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
    Installing collected packages: pytz, six, python-dateutil, pandas
    Successfully installed pandas-1.4.3 python-dateutil-2.8.2 pytz-2022.1 six-1.16.0
    

    I have no trouble working around this, by adding the "dask[dataframe]" dependency explicitly, but that shouldn't be necessary. Care to place blame?

    opened by itcarroll 0