292 Python DistilBert-offline-pipeline Libraries

Multi-Branch CI/CD Pipeline using CDK Pipelines.

Using AWS CDK Pipelines and AWS Lambda for multi-branch pipeline management and infrastructure deployment. This project shows how to use the AWS CDK P

36 Dec 23, 2022

Data pipelines, 2021.

This repo illustrates a simple process for automating data transformation and modeling through a pipeline built with Luigi. Main stack

8 May 19, 2022

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza

898 Jan 9, 2023
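
Impact analysis over a lineage graph comes down to a downstream traversal: everything reachable from a changed table may be affected. The following is a minimal sketch that assumes lineage is available as a plain mapping from each table to the tables built from it. The table names and the `downstream_impact` helper are hypothetical and are not part of Elementary's API.

```python
from collections import deque


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every node reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against cycles and diamond shapes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: raw_orders feeds stg_orders, which feeds two models.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_revenue", "dim_customers"],
    "fct_revenue": ["exec_dashboard"],
}

print(downstream_impact(lineage, "stg_orders"))
# ['fct_revenue', 'dim_customers', 'exec_dashboard']
```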

Shotgrid Toolkit Engine for Gaffer

Shotgun toolkit engine for Gaffer Contact : Diego Garcia Huerta Overview Implementation of a shotgun engine for Gaffer. It supports the classic bootst

12 May 21, 2022

A bot which provides online/offline and player status for Thicc SMP, using Replit.

AlynaaStatus A bot which provides online/offline and player status for Thicc SMP. Currently being hosted on Replit. How to use? Create a repl on Repli

8 Dec 15, 2022

Know-your-customer pipeline in Apache Airflow

KYC_pipline Know-your-customer pipeline in Apache Airflow. For a successful pipeline run, take these steps: Run your Airflow server Admin - connection

4 Aug 1, 2022

An adaptable Snakemake workflow which uses GATK's best-practice recommendations to perform germline mutation calling starting with BAM files

Germline Mutation Calling This Snakemake workflow follows the GATK best-practice recommendations to call small germline variants. The pipeline require

12 Dec 24, 2022

A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

74 Oct 7, 2022
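
Before LDA runs, each document is usually reduced to a bag of tokens. The sketch below covers only the stop-word removal and n-gram steps named above and makes two assumptions: a tiny hand-written English stop list and regex tokenization on Unicode letters. The project itself is multilingual, and its actual stop lists and tokenizer are not shown here. The resulting unigram and bigram counts are the kind of input an LDA implementation would then consume.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a multilingual pipeline needs one list per language.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "on", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep runs of Unicode letters, and drop stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def with_ngrams(tokens: list[str], n: int = 2) -> list[str]:
    """Return the unigrams followed by every contiguous n-gram joined with '_'."""
    grams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens + grams


doc = "The topic model is trained on a corpus of news articles"
bag = Counter(with_ngrams(tokenize(doc)))
print(bag.most_common(3))
```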

A production-ready pipeline for text mining and subject indexing

12 Nov 6, 2022

Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.

Airflow on Docker in EC2 + GitLab's CI/CD Personal project for a simple data pipeline using Airflow. Airflow will be installed inside a Docker container,

13 Nov 29, 2022

Pyan3 - Offline call graph generator for Python 3

Pyan takes one or more Python source files, performs a (rather superficial) static analysis, and constructs a directed graph of the objects in the combined source, and how they define or use each other. The graph can be output for rendering by GraphViz or yEd.

235 Jan 2, 2023
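
The core idea of a static call graph can be sketched with the standard-library `ast` module. Pyan's own analysis tracks both "defines" and "uses" relations across multiple files. The sketch below is much narrower and is not Pyan's implementation: it only records plain `name(...)` calls inside top-level functions, then emits GraphViz DOT text.

```python
import ast


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the simple names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls: set[str] = set()
            for sub in ast.walk(node):
                # Only plain `name(...)` calls; attribute calls like obj.m() are skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = calls
    return graph


def to_dot(graph: dict[str, set[str]]) -> str:
    """Render the graph as GraphViz DOT source."""
    lines = ["digraph calls {"]
    for caller in sorted(graph):
        for callee in sorted(graph[caller]):
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    words = parse(load("data.txt"))
    print(len(words))
'''

print(to_dot(call_graph(SOURCE)))
```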

📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

Django REST Pandas Django REST Framework + pandas = A Model-driven Visualization API Django REST Pandas (DRP) provides a simple way to generate and se

1.2k Jan 1, 2023

Pipeline is an asset packaging library for Django.

Pipeline Pipeline is an asset packaging library for Django, providing both CSS and JavaScript concatenation and compression, built-in JavaScript templ

1.4k Aug 29, 2021

A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

80 Jan 3, 2023

Pokemon catch events project to demonstrate data pipeline on AWS

Pokemon Catches Data Pipeline This is a sample project to practice end-to-end data project; Terraform is used to deploy infrastructure; Kafka is the t

4 Sep 3, 2021

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code

A Python framework for creating reproducible, maintainable and modular data science code.

7.9k Jan 1, 2023

Conda package for artifact creation that enables offline environments. Ideal for air-gapped deployments.

Conda-Vendor Conda Vendor is a tool to create local conda channels and manifests for vendored deployments Installation To install with pip, run: pip i

13 Nov 17, 2022

Manga downloader (for offline reading) aimed at Brazilian sites and scanlation groups.

yonde! yonde! (読んで!) is a manga downloader (for offline reading) aimed at Brazilian sites and scanlation groups. It also lets you convert the chapter

8 Nov 28, 2021

LVGL helper script to batch-convert images with the LVGL offline image converter

Script to batch-convert images with the LVGL offline image converter

1 Oct 5, 2022

A scalable implementation of WobblyStitcher for 3D microscopy images

WobblyStitcher Introduction A scalable implementation of WobblyStitcher Dependencies $ python -m pip install numpy scikit-image Visualization ImageJ

7 Jul 25, 2022

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm

86 Dec 7, 2022

Pipeline for quickly building TF-IDF + LogReg text classification baselines.

Text Classification Baseline Pipeline for quickly building TF-IDF + LogReg text classification baselines. Usage Instead of writing custom code for specif

57 Dec 7, 2022
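
The TF-IDF half of such a baseline can be sketched in plain Python. This assumes raw term counts, the smoothed IDF variant idf(t) = ln((1 + N) / (1 + df(t))) + 1, and L2-normalized rows, which match scikit-learn's `TfidfVectorizer` defaults. The repository's own configuration may differ. In practice the resulting vectors are then fed to a logistic-regression classifier such as scikit-learn's `LogisticRegression`.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (as a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "at", "noon"],
    ["cheap", "meeting"],
]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```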

CPOST is a CLI tool to assist with the proper sizing of Clara Deploy pipelines

CPOST (Clara Pipeline Operator Sizing Tool) Tool to measure resource usage of Clara Platform pipeline operators Cpost is a tool that will help you run

5 Sep 27, 2021

MINERVA: An out-of-the-box GUI tool for offline deep reinforcement learning

MINERVA is an out-of-the-box GUI tool for offline deep reinforcement learning, designed for everyone including non-programmers to do reinforcement learning as a tool.

80 Nov 6, 2022

This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, reading data from PubSub.

Sample streaming Dataflow pipeline written in Python This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, readin

9 Mar 18, 2022

A robust pointcloud registration pipeline based on correlation.

PHASER: A Robust and Correspondence-Free Global Pointcloud Registration Ubuntu 18.04+ROS Melodic: Overview Pointcloud registration using correspondenc

101 Dec 1, 2022

Procedural 3D data generation pipeline for architecture

Synthetic Dataset Generator Authors: Stanislava Fedorova Alberto Tono Meher Shashwat Nigam Jiayao Zhang Amirhossein Ahmadnia Cecilia Bolognesi Dominik

49 Nov 25, 2022

Finetuning Pipeline

KLUE Baseline Korean(한국어) KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper fo

74 Dec 13, 2022

A TensorFlow implementation of SOFA, the Simulator for OFfline LeArning and evaluation.

SOFA This repository is the implementation of SOFA, the Simulator for OFfline leArning and evaluation. Keeping Dataset Biases out of the Simulation: A

22 Nov 23, 2022

This demo showcases the use of onnxruntime-rs with a GPU on CUDA 11 to run BERT in a data pipeline with Rust.

Demo BERT ONNX pipeline written in Rust This demo showcases the use of onnxruntime-rs with a GPU on CUDA 11 to run BERT in a data pipeline with Rust. R

14 Dec 17, 2022

Distributed DataLoader For Pytorch Based On Ray

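Dpex: a distributed data preprocessing component that is transparent to the user. 1. Preface: As the compute gap between GPUs and CPUs keeps widening and the preprocessing pipeline used in model training grows ever more complex, CPU-side data preprocessing has gradually become the bottleneck of model training, so upgrading the GPUs of a single machine does not deliver the expected linear speedup. The essence of the preprocessing performance bottleneck lies in the C available to each GPU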

23 Nov 2, 2022

Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

simple_diarizer Simplified diarization pipeline using some pretrained models. Made to be as simple as possible to go from an input audio file to diariz

65 Dec 30, 2022

TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline.

TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline

193 Dec 22, 2022

This repository contains the code for our fast polygonal building extraction from overhead images pipeline.

Polygonal Building Segmentation by Frame Field Learning We add a frame field output to an image segmentation neural network to improve segmentation qu

186 Jan 4, 2023

This repository contains all the source code needed for the project: An Efficient Pipeline For Bloom’s Taxonomy Using Natural Language Processing and Deep Learning

Pipeline For NLP with Bloom's Taxonomy Using Improved Question Classification and Question Generation using Deep Learning This repository contains all

9 Jul 17, 2021

Pipeline for chemical image-to-text competition

BMS-Molecular-Translation Introduction This is a pipeline for Bristol-Myers Squibb – Molecular Translation by Vadim Timakin and Maksim Zhdanov. We got

7 Sep 20, 2022

Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

A Minimalist Approach to Offline Reinforcement Learning TD3+BC is a simple approach to offline RL where only two changes are made to TD3: (1) a weight

193 Dec 23, 2022

Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Journey Towards Tiny Perceptual Super-Resolution Test code for our ECCV2020 paper: https://arxiv.org/abs/2007.04356 Our x4 upscaling pre-trained model

6 Mar 30, 2022

Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Clairvoyance: A Pipeline Toolkit for Medical Time Series Authors: van der Schaar Lab This repository contains implementations of Clairvoyance: A Pipel

van der Schaar Lab 89 Dec 7, 2022

Guide & Examples to create deeplearning gstreamer plugins and use them in your pipeline

upai-gst-dl-plugins Guide & Examples to create deeplearning gstreamer plugins and use them in your pipeline Introduction Thanks to the work done by @j

11 Dec 11, 2022

A declarative Kubeflow Management Tool inspired by Terraform

🍭 KRSH is an alpha version, so you may run into many bugs. If you find one, please open an issue and help grow the project! A declarative Kubeflow

128 Oct 18, 2022

Simple, hackable offline speech to text - using the VOSK-API.

Nerd Dictation Offline Speech to Text for Desktop Linux. This is a utility that provides simple access to speech to text for use on Linux without being

844 Jan 7, 2023

voice2json is a collection of command-line tools for offline speech/intent recognition on Linux

Command-line tools for speech and intent recognition on Linux

988 Jan 4, 2023

[CVPR-2021] UnrealPerson: An adaptive pipeline for costless person re-identification

UnrealPerson: An Adaptive Pipeline for Costless Person Re-identification In our paper (arxiv), we propose a novel pipeline, UnrealPerson, that decreas

70 Oct 10, 2022

This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For this project, I recreated the pipeline below.

AWS Data Engineering Pipeline This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For

15 Jul 28, 2021

Focus on Algorithm Design, Not on Data Wrangling

The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.

37 Nov 25, 2022

Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.

ETL Pipeline with Airflow, Spark, S3, MongoDB and Amazon Redshift

214 Jan 2, 2023

💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!

LocalStack - A fully functional local AWS cloud stack LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Cur

45.3k Jan 2, 2023

Socorro is the Mozilla crash ingestion pipeline. It accepts and processes Breakpad-style crash reports. It provides analysis tools.

Socorro Socorro is a Mozilla-centric ingestion pipeline and analysis tools for crash reports using the Breakpad libraries. Support This is a Mozilla-s

552 Dec 19, 2022

Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser (2021)

Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser Abstract The success of deep denoisers on real-world colo

51 Nov 22, 2022

Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. I

2.4k Jan 2, 2023

(AAAI '20) A Python Toolbox for Machine Learning Model Combination

combo: A Python Toolbox for Machine Learning Model Combination

606 Dec 21, 2022

PyTorch extensions for fast R&D prototyping and Kaggle farming

Pytorch-toolbelt Pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming: What

1.3k Jan 5, 2023

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

NVIDIA DALI The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provi

4.2k Jan 8, 2023

Let your friends know when you are online and offline xD

Twitter Last Seen Activity Let your friends know when you are online and offline Laser-light eyes when online Last seen is mentioned in user bio Also

12 Aug 16, 2021

We have implemented shaDow-GNN as a general and powerful pipeline for graph representation learning. For more details, see our paper titled Deep Graph Neural Networks with Shallow Subgraph Samplers, available on arXiv (https://arxiv.org/abs/2012.01380).

Deep GNN, Shallow Sampling Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kannan, Viktor Prasanna, Long Jin, R

117 Dec 20, 2022

An expandable and scalable OCR pipeline

Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable

81 Jan 4, 2023

This is a C++ project deploying a deep scene text reading pipeline with TensorFlow. It reads text from natural scene images. It uses frozen TensorFlow graphs. The detector detects scene text locations. The recognizer reads words from each detected bounding box.

DeepSceneTextReader This is a C++ project deploying a deep scene text reading pipeline. It reads text from natural scene images. Prerequisites The proj

49 Sep 10, 2022

End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

89 Aug 4, 2022

Toy example of an applied ML pipeline for me to experiment with MLOps tools.

Toy Machine Learning Pipeline Table of Contents About Getting Started ML task description and evaluation procedure Dataset description Repository stru

190 Dec 21, 2022

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

BatchFlow BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflo

185 Dec 20, 2022
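
Drawing random or sequential batches from a dataset that does not fit into memory usually means batching over indices and loading items lazily. Below is a minimal sketch of that idea, assuming a hypothetical `load_item` callback that stands in for reading a file or a database row. It does not use BatchFlow's own classes.

```python
import random
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def batches(
    n_items: int,
    batch_size: int,
    load_item: Callable[[int], T],
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield lists of loaded items; only the current batch is materialized."""
    indices = list(range(n_items))  # only integer indices are held in memory
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_items, batch_size):
        yield [load_item(i) for i in indices[start:start + batch_size]]


# Hypothetical loader standing in for real I/O.
def load_item(i: int) -> str:
    return f"sample-{i}"


for batch in batches(5, 2, load_item):
    print(batch)
```

Passing `shuffle=True` gives random batches over the same items; a fixed `seed` makes the order reproducible.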

Easy pipelines for pandas DataFrames.

pdpipe Easy pipelines for pandas DataFrames (learn how!). Website: https://pdpipe.github.io/pdpipe/ Documentation: https://pdpipe.github.io/pdpipe/d

694 Jan 5, 2023

MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

1.4k Jan 6, 2023

DaCy: The state-of-the-art Danish NLP pipeline using spaCy

DaCy: A spaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in spaCy. At the time of writing it has achieved State-of-the-Ar

71 Jan 6, 2023

Data intensive science for everyone.

The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy

1k Jan 8, 2023

Kolibri: the offline app for universal education

Kolibri This repository is for software developers wishing to contribute to Kolibri. If you are looking for help installing, configuring and using Kol

564 Jan 2, 2023

Easy-to-use and powerful offline translation tool

Introduction Virtaal is a graphical program for doing translation. It is meant to be easy to use and powerful at the same time. Although the initial f

271 Nov 22, 2022

Read/sync your IMAP mailboxes (python2)

Upstream status (master branch): Upstream status (next branch): Financial contributors: Links: Official github code repository: offlineimap Website: w

1.7k Dec 29, 2022

RedNotebook is a cross-platform journal

RedNotebook RedNotebook is a modern desktop journal. It lets you format, tag and search your entries. You can also add pictures, links and customizabl

417 Dec 28, 2022

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.1k Feb 13, 2021

A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

831 Feb 17, 2021

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

903 Feb 17, 2021

✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

2.2k Feb 18, 2021

Speech recognition module for Python, supporting several engines and APIs, online and offline.

SpeechRecognition Library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/

6.7k Jan 8, 2023

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee

20.8k Jan 3, 2023

An audio digital processing toolbox based on a workflow/pipeline principle

AudioTK Audio ToolKit is a set of audio filters. It helps assemble workflows for specific audio processing workloads. The audio workflow is split in

238 Oct 18, 2022

PipeLayer is a lightweight Python pipeline framework

PipeLayer is a lightweight Python pipeline framework. Define a series of steps, and chain them together to create modular applications

64 Jul 21, 2022
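
Chaining steps amounts to function composition over a payload passed from one step to the next. A minimal sketch under that assumption follows; the `Pipeline` class and the step functions are hypothetical and are not PipeLayer's API.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


class Pipeline:
    """Run a fixed sequence of steps, feeding each step's output to the next."""

    def __init__(self, *steps: Step) -> None:
        self.steps = steps

    def __call__(self, data: Any) -> Any:
        return reduce(lambda acc, step: step(acc), self.steps, data)


def strip(text: str) -> str:
    return text.strip()


def words(text: str) -> list[str]:
    return text.split()


def count(tokens: list[str]) -> int:
    return len(tokens)


word_count = Pipeline(strip, words, count)
print(word_count("  chain small steps together  "))  # 4
```

Because a `Pipeline` is itself callable, one pipeline can be used as a step inside another, which keeps each piece small and independently testable.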

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.7k Jan 8, 2023

A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

1.3k Jan 3, 2023

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

1.2k Jan 8, 2023

✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

2.6k Jan 4, 2023

🔍 Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework for Question Answering & Neural search that enables you to ... ... ask questions in natural language and find gran

6.4k Jan 9, 2023

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl

1.9k Jan 4, 2023

Pipeline is an asset packaging library for Django.

Pipeline Pipeline is an asset packaging library for Django, providing both CSS and JavaScript concatenation and compression, built-in JavaScript templ

1.4k Jan 3, 2023

Soda SQL: data testing, monitoring and profiling for SQL-accessible data.

Soda SQL: Data testing, monitoring and profiling for SQL-accessible data. What does Soda SQL do? Soda SQL allows you to stop your pipeline when bad dat

51 Jan 1, 2023
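
The idea of stopping a pipeline when bad data shows up can be illustrated with a check that counts violating rows and raises. This is a hypothetical sketch using the standard-library `sqlite3` module, not Soda SQL's scan configuration or API. The `orders` table, its columns, and the `count_nulls` / `check_no_nulls` helpers are assumptions.

```python
import sqlite3


class DataCheckError(Exception):
    """Raised when a data quality check fails, so the pipeline stops."""


def count_nulls(conn: sqlite3.Connection, table: str, column: str) -> int:
    # Identifiers cannot be bound as SQL parameters; they must come from trusted config.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    (missing,) = conn.execute(sql).fetchone()
    return missing


def check_no_nulls(conn: sqlite3.Connection, table: str, column: str) -> None:
    missing = count_nulls(conn, table, column)
    if missing:
        raise DataCheckError(f"{table}.{column}: {missing} NULL value(s)")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, None)])

try:
    check_no_nulls(conn, "orders", "customer_id")
    print("checks passed, continuing pipeline")
except DataCheckError as err:
    print(f"stopping pipeline: {err}")
```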

Pipeline is an asset packaging library for Django.

Pipeline Pipeline is an asset packaging library for Django, providing both CSS and JavaScript concatenation and compression, built-in JavaScript templ

1.4k Dec 26, 2022

Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀

MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents

2.4k Dec 31, 2022

A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.

Neuraxle Pipelines Code Machine Learning Pipelines - The Right Way. Neuraxle is a Machine Learning (ML) library for building machine learning pipeline

555 Dec 24, 2022
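
Checkpointing a step so that duplicate calculations are skipped can be sketched as caching the step's output under a key derived from the step and its input. The version below is hypothetical and in-memory only. It assumes inputs are JSON-serializable and is not Neuraxle's checkpoint mechanism.

```python
import hashlib
import json
from typing import Any, Callable


class CheckpointedStep:
    """Wrap a step so repeated calls with the same input reuse the cached output."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self.fn = fn
        self.cache: dict[str, Any] = {}
        self.calls = 0  # how many times fn actually ran

    def _key(self, data: Any) -> str:
        # Hash the step name together with its input so different steps never collide.
        payload = json.dumps([self.name, data], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def __call__(self, data: Any) -> Any:
        key = self._key(data)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(data)
        return self.cache[key]


expensive = CheckpointedStep("square_all", lambda xs: [x * x for x in xs])
expensive([1, 2, 3])
expensive([1, 2, 3])  # served from the cache
print(expensive.calls)  # 1
```

In a real pipeline the key would also have to cover the step's hyperparameters, so that changing them invalidates the checkpoint instead of returning a stale result.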

Lightweight Python library for fast and reproducible experimentation 🔬

Steppy What is Steppy? Steppy is a lightweight, open-source, Python 3 library for fast and reproducible experimentation. Steppy lets data scientist fo

134 Jul 10, 2022
