Deep Learning Pipelines for Apache Spark

Overview

The repo only contains the HorovodRunner code used for local CI and the API docs. To use HorovodRunner for distributed training, please use Databricks Runtime for Machine Learning; see the Databricks documentation HorovodRunner: distributed deep learning with Horovod for details.

To use the previous release that contains the Spark Deep Learning Pipelines API, please go to the Spark Packages page.

API Documentation

class sparkdl.HorovodRunner(*, np, driver_log_verbosity='all')

Bases: object

HorovodRunner runs distributed deep learning training jobs using Horovod.

On Databricks Runtime 5.0 ML and above, it launches the Horovod job as a distributed Spark job. It makes running Horovod easy on Databricks by managing the cluster setup and integrating with Spark. Check out Databricks documentation to view end-to-end examples and performance tuning tips.

The open-source version only runs the job locally inside the same Python process, which is for local development only.

NOTE: Horovod is a distributed training framework developed by Uber.

  • Parameters

    • np - number of parallel processes to use for the Horovod job. This argument only takes effect on Databricks Runtime 5.0 ML and above. It is ignored in the open-source version. On Databricks, each process will take an available task slot, which maps to a GPU on a GPU cluster or a CPU core on a CPU cluster. Accepted values are:

      • If <0, this will spawn -np subprocesses on the driver node to run Horovod locally. Training stdout and stderr messages go to the notebook cell output, and are also available in driver logs in case the cell output is truncated. This is useful for debugging and we recommend testing your code under this mode first. However, be careful of heavy use of the Spark driver on a shared Databricks cluster. Note that np < -1 is only supported on Databricks Runtime 5.5 ML and above.
      • If >0, this will launch a Spark job with np tasks starting all together and run the Horovod job on the task nodes. It will wait until np task slots are available to launch the job. If np is greater than the total number of task slots on the cluster, the job will fail. As of Databricks Runtime 5.4 ML, training stdout and stderr messages go to the notebook cell output. In the event that the cell output is truncated, full logs are available in the stderr stream of task 0 under the second Spark job started by HorovodRunner, which you can find in the Spark UI.
      • If 0, this will use all task slots on the cluster to launch the job. Warning: setting np=0 is deprecated and will be removed in the next major Databricks Runtime release. Choosing np based on the total task slots at runtime is unreliable due to dynamic executor registration; please set the number of parallel processes you need explicitly. (A short construction sketch follows this list.)
    • driver_log_verbosity – verbosity of driver-side logging (default 'all'). This argument is only available on Databricks Runtime.
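
A minimal sketch of the three np modes described above (an illustration, not from the original docs; the distributed behavior only applies on Databricks Runtime ML, and np is ignored in the open-source version):

    from sparkdl import HorovodRunner

    hr_debug = HorovodRunner(np=-1)  # negative np: run |np| subprocesses on the driver, useful for debugging
    hr_dist = HorovodRunner(np=2)    # positive np: launch a Spark job with 2 Horovod tasks on the cluster
    # np=0 (use all task slots) is deprecated; prefer an explicit positive np.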

run(main, **kwargs)

Runs a Horovod training job invoking main(**kwargs).

The open-source version only invokes main(**kwargs) inside the same Python process. On Databricks Runtime 5.0 ML and above, it will launch the Horovod job based on the documented behavior of np. Both the main function and the keyword arguments are serialized using cloudpickle and distributed to cluster workers.

  • Parameters

    • main – a Python function that contains the Horovod training code. The expected signature is def main(**kwargs) or compatible forms. Because the function gets pickled and distributed to workers, please make changes to global state inside the function, e.g., setting the logging level, and be aware of pickling limitations. Avoid referencing large objects in the function, which might result in large pickled data and make the job slow to start. (A usage sketch follows the Returns section below.)

    • kwargs – keyword arguments passed to the main function at invocation time.

  • Returns

    return value of the main function. With np>=0, this returns the value from the rank 0 process. Note that the returned value should be serializable using cloudpickle.
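
For orientation, here is a hedged usage sketch of run() based on the behavior described above; the training body is a placeholder and the Horovod calls follow standard Horovod conventions rather than anything specific to this repo:

    from sparkdl import HorovodRunner

    def train(learning_rate=0.1):
        # Standard Horovod setup inside the pickled function; the model code is omitted.
        import horovod.tensorflow.keras as hvd
        hvd.init()
        effective_lr = learning_rate * hvd.size()  # common practice: scale the LR by the number of workers
        # ... build, compile, and fit a tf.keras model here ...
        return effective_lr

    # np=-1 runs one subprocess on the driver for debugging; use a positive np on a cluster.
    hr = HorovodRunner(np=-1)
    result = hr.run(train, learning_rate=0.1)  # with np >= 0, this is the rank-0 return value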

Releases

Visit the GitHub Release Page to check the release notes.

License

  • The source code is released under the Apache License 2.0 (see the LICENSE file).
Comments
  • Can't import sparkdl with spark-deep-learning-assembly-0.1.0-spark2.1.jar

    First of all, thank you for a great library!

    I tried to use sparkdl in PySpark, but couldn't import sparkdl. Detailed procedure is as follows:

    # make sparkdl jar
    build/sbt assembly
    
    # run pyspark with sparkdl
    pyspark --master local[4] --jars target/scala-2.11/spark-deep-learning-assembly-0.1.0-spark2.1.jar
    
    # import sparkdl
    import sparkdl
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named sparkdl
    

    After digging around in a few places, I found that it works if I unpack the jar file as follows.

    cd target/scala-2.11
    mkdir tmp
    cp spark-deep-learning-assembly-0.1.0-spark2.1.jar tmp/
    cd tmp
    jar xf spark-deep-learning-assembly-0.1.0-spark2.1.jar
    
    pyspark --jars spark-deep-learning-assembly-0.1.0-spark2.1.jar
    
    import sparkdl
    Using TensorFlow backend.
    

    Edited-1: The second method works only in the directory where the jar file was unpacked.

    Best wishes, HanCheol

    opened by priancho 14
  • Porting Keras Estimator API and Reference Implementation

    What changes are proposed in this pull request?

    Creating a Spark MLlib Estimator API for Keras models, with a reference implementation. It gives a taste of how to ingest images from URIs in a DataFrame and use them to train a Keras model.

    The changes consist of these components.

    1. Extracted a few Params types for Keras Transformers/Estimators.
    2. Keras utilities
      • Serialization: model <=> hdf5 <=> bytes (for broadcast); a rough sketch of this idea follows the list
      • Check available Keras options (optimizers, loss functions, etc.)
    3. Keras Estimator.
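
    The model <=> hdf5 <=> bytes step can be pictured with a rough, hypothetical sketch like the following (not the PR's actual helpers); the resulting bytes can then be broadcast to Spark workers:

    import os
    import tempfile
    from tensorflow import keras

    def model_to_bytes(model):
        # Serialize a Keras model to HDF5 bytes, e.g. for sc.broadcast().
        fd, path = tempfile.mkstemp(suffix=".h5")
        os.close(fd)
        try:
            model.save(path)  # the .h5 suffix selects the HDF5 format
            with open(path, "rb") as f:
                return f.read()
        finally:
            os.remove(path)

    def model_from_bytes(data):
        # Inverse of model_to_bytes: write the bytes to a temp .h5 file and load it.
        fd, path = tempfile.mkstemp(suffix=".h5")
        os.close(fd)
        try:
            with open(path, "wb") as f:
                f.write(data)
            return keras.models.load_model(path)
        finally:
            os.remove(path)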

    How is this patch tested?

    • [x] Unit tests
    • [x] Manual tests
    opened by phi-dbq 11
  • Not able to import sparkdl in jupyter notebook

    Hi,

    I am trying to use this library in a Jupyter notebook, but I am getting a "no module found" error.

    When I run the command pyspark --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11, I am able to import sparkdl in the PySpark shell.

    How can I use it in jupyter notebook?
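
    One commonly used workaround (an assumption on my part, not confirmed in this thread) is to pass the same --packages flag through the PYSPARK_SUBMIT_ARGS environment variable before the SparkSession is created in the notebook:

    import os
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11 pyspark-shell"
    )

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master("local[4]").getOrCreate()
    import sparkdl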

    opened by yashwanthmadaka24 7
  • Support and build against Keras 2.2.2 and TF 1.10.0

    • bump spark version to 2.3.1
    • bump tensorframes version to 0.4.0
    • bump keras==2.2.2 and tensorflow==1.10.0 to fix travis issues
    • TF_C_API_GRAPH_CONSTRUCTION added as a temp fix
    • Drop support for Spark <2.3 and hence Scala 2.10
    • add python3 friendly print
    • add pooling='avg' to the resnet50 testing model because the Keras API changed
    • assert arrays almost equal with precision 5 in NamedImageTransformerBaseTestCase, test_bare_keras_module, and keras_load_and_preproc
    • make keras model smaller in test_simple_keras_udf

    This is a continued work from https://github.com/databricks/spark-deep-learning/pull/149.

    opened by lu-wang-dl 6
  • Fix KerasImageFileEstimator for model tuning

    • add test for KerasImageFileEstimator used with CrossValidator
    • fix pickling issue
    • use new fitMultiple api and ensure thread safety
    • fix tests to reflect new api
    • bugfix setDefault
    • bugfix HasOutputMode
    • remove _validateParams
    • avoid testing KIFT functionality in KIFEst tests
    opened by yogeshg 6
  • Replace sparkdl's ImageSchema with  Spark2.3's version

    Use Spark 2.3's ImageSchema as image interface.

    • the biggest change is using the opposite ordering of color channels (BGR instead of RGB), which requires extra reordering in a couple of places; a small channel-flip sketch follows below
    • preserved the ability to read and resize images in Python using PIL to match Keras (resize gives a different result, and reading JPEGs also produced images that were off by 1 on some green pixels)
    • needed a few tweaks to run with Spark 2.3; notably, UDFs are now referenced by SQL identifier and cannot have a dash as part of the name

    [TODO] - In order to run on spark < 2.3, the image schema files have been copied here and need to be removed in the future.
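
    To make the reordering concrete, here is a small hypothetical helper (not code from this PR) that flips the channel axis between the BGR layout used by Spark 2.3's ImageSchema and the RGB layout expected by PIL/Keras:

    import numpy as np

    def flip_channels(image_data, height, width, n_channels=3):
        # Interpret the raw image bytes as an H x W x C array and reverse the
        # channel axis, converting BGR <-> RGB.
        arr = np.frombuffer(image_data, dtype=np.uint8).reshape(height, width, n_channels)
        return arr[..., ::-1]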

    opened by tomasatdatabricks 6
  • TensorFlow Graph Transformer Part-1: Params and Converters

    This is the first part of PRs from the TF Transformer API design POC PR.

    It introduces parameters needed for the TFTransformer (minus those for TFInputGraph) and corresponding type converters.

    • Introducing MLlib model Params for TFTransformer.
    • Type conversion utilities for the introduced MLlib Params used in Spark Deep Learning Pipelines.
      • We follow the convention of MLlib to name these utilities "converters", but most of them act as type checkers that return the argument if it is the desired type and raise TypeError otherwise (a small illustration follows this list).
      • We follow the practice that a type converter only returns a single type.
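
    As a hypothetical illustration of that convention (not the actual sparkdl converters), a "converter" attached to an MLlib Param might look like this:

    from pyspark.ml.param import Param, Params

    def toTensorName(value):
        # MLlib-style "converter": return the value if it already has the desired
        # type, otherwise raise TypeError.
        if isinstance(value, str):
            return value
        raise TypeError("Expected a tensor name as str, got %s" % type(value))

    class HasInputTensor(Params):
        inputTensor = Param(Params._dummy(), "inputTensor",
                            "name of the input tensor in the TensorFlow graph",
                            typeConverter=toTensorName)
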
    opened by phi-dbq 6
  • Add style checks and refactor suggestions

    In this PR, we

    • add python/.pylint/suggested.rc adapted from the default configuration generated by pylint
    • allow both camelCase and snake_case using regexes lifted from pylint source code
    • increase thresholds for the number of arguments and local variables
    • disable checks that are used often in this project: unused-argument, too-many-arguments, no-member, missing-docstring, no-init, protected-access, misplaced-comparison-constant, no-else-return, fixme
    • escape some code with # pylint: disable=... because it was hard to refactor without thorough testing

    Some style decisions that were discussed are:

    • disables are acceptable if there is no other way to do this, in which case a comment must be left explaining that (a small example follows this list)
    • other disables should be removed and should be considered similar to todos
    • we allow todo marks in code because these are acceptable for this project and should be taken care of in future
    • there are currently 50 TODOs, FIXMEs, or pylint disables; we should aim to bring this number down. They can be listed with: find python/sparkdl | grep ".*\.py$" | xargs egrep -ino --color=auto "(TODO|FIXME|# pylint).*"
    • function calls and function definitions that span more than one line are left to the committer's and reviewer's discretion
      • pep8 style:
      long_function_name(
          long_argument_one = "one",
          long_argument_two = "two",
          long_argument_three = "three",
          long_argument_four = "four",
          long_argument_five = "five")
      
      • MLlib style:
      long_function_name(
          long_argument_one = "one", long_argument_two = "two", long_argument_three = "three",
          long_argument_four = "four", long_argument_five = "five")
      
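    As a hypothetical example of the first decision above (a disable kept only with an explanatory comment), not code from this PR:

    class _StubEstimator:
        def _fit(self, dataset):  # pylint: disable=unused-argument
            # `dataset` is required by the pyspark.ml Estimator interface even
            # though this stub does not use it yet; hence the targeted disable.
            return None
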
    opened by yogeshg 5
  • Fix bug in conversions from row image to/from BufferedImage

    Fix bug in conversions to/from BufferedImage in which we copied raw byte data to BufferedImage rasters using the wrong channel ordering. The fix in this PR is to use BufferedImage.setRGB, BufferedImage.getRGB APIs instead of accessing image raster data directly for three and four-channel images.

    Also enhanced an existing unit test to verify that we correctly convert from row image to BufferedImage for one and four-channel images.

    opened by smurching 5
  • Update ImageUtils to support resizing one, three, or four channel images

    This PR:

    • Updates conversions from row image to/from Buffered image (spImageToBufferedImage and spImageFromBufferedImage) to support one, three, and four channel images
    • Updates resizeImage to use the tgtChannels parameter to determine the number of channels in the output image instead of defaulting to three output channels
    • Updates existing tests to verify that resizing, conversions to/from BufferedImage work for one, three, and four-channel images
    opened by smurching 5
  • Make python DeepImageFeaturizer use Scala version.

    • Based on the Image schema PR; do not merge until the Image schema PR is merged.
    • Otherwise mostly straightforward, except that results will not match Keras in general due to different image libraries (a typical usage sketch follows below)
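
    For context, the typical transfer-learning use of DeepImageFeaturizer looks roughly like this (a sketch based on the project's documented usage; exact parameters may differ between versions, and train_images_df is an assumed input DataFrame):

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from sparkdl import DeepImageFeaturizer

    # Featurize images with a pretrained network, then train a simple classifier on top.
    featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                     modelName="InceptionV3")
    lr = LogisticRegression(maxIter=20, regParam=0.05, labelCol="label")
    pipeline = Pipeline(stages=[featurizer, lr])
    model = pipeline.fit(train_images_df)  # a DataFrame with "image" and "label" columns
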
    opened by tomasatdatabricks 5
  • sparkdl.xgboost getting stuck trying to map partitions

    I am running the following code to try to fit a model

    from sparkdl.xgboost import XgboostClassifier
    param = {
        'num_workers': 4,   # number of workers on the cluster, adjust as needed
        'missing': 0,
        'objective': 'binary:logistic',
        'eval_metric': 'logloss',
        'featuresCol': 'features',
        'labelCol': 'objective',
        'nthread': 32       # equal to the number of cpus on each worker machine
    }

    train, test = data.randomSplit([0.001, 0.001])
    xgb_classifier = XgboostClassifier(**param)
    xgb_clf_model = xgb_classifier.fit(train)
    

    When I run the model training on my Databricks cluster, it seems to get stuck while trying to map partitions. It is using almost zero CPU on each worker, but the memory usage is slowly increasing.

    Is there anything I can do to get around this issue?

    opened by timpiperseek 0
  • Need to modify sparkdl/transformers/keras_applications to be able to use resnet50

    Hi,

    Per this Stack Overflow question, one needs to modify /home/user/.local/lib/python3.8/site-packages/sparkdl/transformers/keras_applications.py. This happened on Databricks using runtime v10.3.

    You have to change

    from keras.applications import inception_v3, xception, resnet50

    to

    from keras.applications import inception_v3, xception
    from tensorflow.keras.applications import resnet50

    opened by yobdoy 1
  • Plugin Help with Spark framework

    https://github.com/hongzimao/decima-sim Would you like to help me integrate this deep learning model into your pipeline? How can I integrate or plug it into your framework?

    opened by jahidhasanlinix 0
  • Necessary imports not included in setup.py

    Hi,

    I'm developing a neural network using PyTorch on a non-Databricks cluster to ensure its functionality prior to migrating to a Databricks cluster.

    Since I'm using PyTorch, I don't need Keras or TensorFlow. I successfully installed Horovod and sparkdl; however, when I try to run the Spark process I have found (so far) three consecutive exceptions related to missing dependencies:

        from sparkdl import HorovodRunner
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
        from sparkdl.transformers.keras_image import KerasImageFileTransformer
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
        import keras.backend as K
      File "/opt/conda/default/lib/python3.8/site-packages/keras/__init__.py", line 21, in <module>
        from tensorflow.python import tf2
    ModuleNotFoundError: No module named 'tensorflow'
    
        from sparkdl import HorovodRunner
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
        from sparkdl.transformers.keras_image import KerasImageFileTransformer
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
        import keras.backend as K
    ModuleNotFoundError: No module named 'keras'
    

    This one is DEPRECATED!!:

        from sparkdl import HorovodRunner
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
        from sparkdl.transformers.keras_image import KerasImageFileTransformer
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 27, in <module>
        from sparkdl.transformers.tf_image import TFImageTransformer
      File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/tf_image.py", line 18, in <module>
        import tensorframes as tfs
    ModuleNotFoundError: No module named 'tensorframes'
    

    On one hand, I don't understand why I should need these dependencies if I'm not going to use them... Shouldn't they be checked and disabled instead of being forced to be installed?

    On the other hand, if those dependencies are unavoidable, they should be included in the setup.py script to avoid these errors and the lost time, since installing the Horovod packages on an ephemeral cluster takes a long time just to discover that you cannot run the program...

    I'm sure I won't have this problem on a Databricks cluster, but I cannot use one yet, and that shouldn't be an obstacle to testing HorovodRunner functionality, as stated in the warning message shown when running a program on a non-Databricks cluster...

    Kind regards

    opened by carlosfrutos 0
  • I found so many 'spark-deep-learning' projects

    I have found many 'spark-deep-learning' projects, such as:

    • elephas: https://github.com/maxpumperla/elephas
    • dist-keras: https://github.com/cerndb/dist-keras
    • sparknet: https://github.com/amplab/sparknet
    • dl4j: https://github.com/deeplearning4j/dl4j-spark-ml
    • TensorFlowOnSpark: https://github.com/yahoo/TensorFlowOnSpark
    • spark-deep-learning: https://github.com/databricks/spark-deep-learning
    • H2O: https://github.com/h2oai/sparkling-water/tree/master/
    • BigDL: https://github.com/intel-analytics/BigDL
    • analytics-zoo: https://github.com/intel-analytics/analytics-zoo

    It looks like BigDL is the most active one. I want to start my deep learning on Spark using spark-deep-learning, but I'm afraid the others will become more popular than databricks/spark-deep-learning, so I'm still hesitating over which one to choose.

    opened by shuDaoNan9 1
Releases (v1.6.0)
  • v1.6.0(Jan 8, 2020)

  • v1.5.0(Jan 25, 2019)

  • v1.4.0(Nov 18, 2018)

  • v1.3.0(Nov 13, 2018)

    • Added HorovodRunner API.
    • Simplified test and doc build w/ Docker and conda.
    • Updated public Python API docs.
    • Removed persistence from DeepImageFeaturizer.
  • v1.2.0(Aug 28, 2018)

    • ignore nullable in DeepImageFeaturizer.validateSchema
    • upgrade TensorFrames version to 0.5.0
    • upgrade Tensorflow version to 1.10.0 and Keras version to 2.2.2
  • v1.1.0(Jun 18, 2018)

    • keras_image_file_estimator support both sparse and dense vectors
    • upgrade TensorFrames version to 0.4.0
    • add style checks to Travis CI
    • doc fixes
  • v1.0.0(May 1, 2018)

    This is the 1.0.0 release. It brings compatibility with newer versions of Spark (2.3) and Tensorflow (1.6+). The custom image schema formerly defined in this package has been replaced with Spark's ImageSchema so there may be some breaking changes when updating to this version.

    Notable changes:

    • (breaking change) Using the definition of images from Spark 2.3.0. The new definition uses the BGR channel ordering for 3-channel images instead of the RGB ordering used in this project before the change.
    • Persistence for DeepImageFeaturizer (both Python and Scala).
  • v0.3.0(Jan 30, 2018)

    This is the final release of dl-pipelines prior to migrating to the new ImageSchema.

    Notable changes:

    • Added vgg16, vgg19 models to DeepImageFeaturizer/DeepImagePredictor (Python).
    • Added a Scala API for DeepImageFeaturizer (for transfer learning for images).
    • Added TFTransformer and KerasTransformer for applying TensorFlow graphs or TensorFlow-backed Keras models to a column of arrays in a Spark DataFrame.
  • v0.2.0(Oct 31, 2017)

    This is the final release for Deep Learning Pipelines 0.2.0

    Notable additions since 0.1.0:

    • KerasImageFileEstimator API (train a Keras model on image files)
    • SQL UDF support for Keras models
    • Added Xception, Resnet50 models to DeepImageFeaturizer/DeepImagePredictor.