186 Repositories
Python kafka-to-spark-streaming Libraries
Real-time Object Detection for Streaming Perception, CVPR 2022
StreamYOLO Real-time Object Detection for Streaming Perception Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Sun Jian Real-time Object Detection
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Streamify A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! Description Objective The project will stre
🎥 Stream your favorite movie from the terminal!
Stream-Cli stream-cli is a Python scraping CLI that combines scrapy and webtorrent in one command for streaming movies from your terminal. Installatio
Free Data Engineering course!
Data Engineering Zoomcamp Register in DataTalks.Club's Slack Join the #course-data-engineering channel The videos are published to DataTalks.Club's Yo
Data stream analytics: Implement online learning methods to address concept drift in data streams using the River library. Code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams" accepted in IEEE GlobeCom 2021.
PWPAE-Concept-Drift-Detection-and-Adaptation This is the code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT
A CLI tool to transfer, sync, and backup playlists on music streaming services
unitunes A command-line interface tool to manage playlists across music streaming services. Introduction unitunes manages playlists across streaming s
Shared, streaming Python dict
UltraDict Synchronized, streaming Python dictionary that uses shared memory as a backend Warning: This is an early hack. There are only few unit tests
Databricks Certified Associate Spark Developer preparation toolkit to set up a single-node standalone Spark cluster, along with material in the form of Jupyter Notebooks.
Databricks Certification Spark Databricks Certified Associate Spark Developer preparation toolkit to set up a single-node standalone Spark cluster along
Containerized Demo of Apache Spark MLlib on a Data Lakehouse (2022)
Spark-DeltaLake-Demo Reliable, Scalable Machine Learning (2022) This project was completed in an attempt to become better acquainted with the latest b
This is my first repo covering, end to end, a pipeline of Brazilian government open data on contract purchases and annual schedules, built with Spark, in PySpark and SQL!
Hello! This is my first repo covering, end to end, a pipeline of Brazilian government open data on contract purchases and annual sched
Telegram Group Calls Streaming bot with some useful features, written in Python with Pyrogram and Py-Tgcalls. Supports platforms like YouTube, Spotify, Resso, AppleMusic, Soundcloud and M3u8 links.
Yukki Music Bot Yukki Music Bot is a Powerful Telegram Music+Video Bot written in Python using Pyrogram and Py-Tgcalls by which you can stream songs,
An open source video/music streamer based on MPV and Piped.
🎶 Harmony Music An easy way to stream videos or music from YouTube from the command line while regaining your privacy. 📖 Table Of Contents ❔ What's
Securetar - A streaming wrapper around Python's tarfile that allows secure handling of files and supports encryption
Secure Tar Secure Tarfile library It's a streaming wrapper around python tarfile
Python-kafka-reset-consumergroup-offset-example - Python Kafka reset consumergroup offset example
Python Kafka reset consumergroup offset example This is a simple example of how
Search & download music from a certain streaming service
Watch Indonesian-subtitled anime without ads. Comes with a PyQt5-based GUI and very unstructured spaghetti code
A framework for GPU based high-performance medical image processing and visualization
FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both multi-core CPUs and GPUs. To achieve this, FAST uses modern C++, OpenCL and OpenGL.
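🏅 Top 5% in the 2nd Research and Development Special Zone AI Competition, AI SPARK Challenge
AI_SPARK_CHALLENG_Object_Detection 2nd Research and Development Special Zone AI Competition, AI SPARK Challenge 🏅 Top 5% in mAP(0.75) (13th of 443 participants, mAP: 0.98116) Competition description In an edge environment, livestock Object Dete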
An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify.
An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify. The ETL process flows from AWS's S3 into staging tables in AWS Redshift.
Stream-Cli application that allows you to play your favorite movies from the terminal
TG-Streaming-bot - TG Simple Streaming bot
TG Simple Streaming bot Telegram video streaming bot 🎚️ Features Play youtube li
Yts-cli-streamer - A CLI movie streaming client, written in Python, that works with the yts.mx API
YTSP It is a CLI movie streaming client which works on yts.mx API written in pyt
LabGraph is a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.
A data structure that extends pyspark.sql.DataFrame with metadata information.
MetaFrame A data structure that extends pyspark.sql.DataFrame with metadata info
PySpark Structured Streaming ROS Kafka ApacheSpark Cassandra
PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra The purpose of this project is to demonstrate a structured streaming pipeline with Apache
Django-fast-export - Utilities for quickly streaming CSV responses to the client
django-fast-export Utilities for quickly streaming CSV responses to the client T
Scikit-event-correlation - Event Correlation and Forecasting over High Dimensional Streaming Sensor Data algorithms
scikit-event-correlation Event Correlation and Changing Detection Algorithm Theo
Minecraft - Online Players Overlay Generator
Minecraft - Online Players Overlay Generator Contents About Quick Start Download Pre-Built Binary Run from Source Configuration Command-Line Options F
Event-driven-model-serving - Unified API of Apache Kafka and Google PubSub
event-driven-model-serving Unified API of Apache Kafka and Google PubSub 1. Proj
A simple healthcheck wrapper to monitor Kafka.
kafka-healthcheck A simple healthcheck wrapper to monitor Kafka. Kafka Healthcheck is a simple server that provides a singular API endpoint to determi
DeepFaceLive - Live deepfake in Python: real-time face swap for PC streaming or video calls
Elkeid HUB - A rule/event processing engine maintained by the Elkeid Team that supports streaming/offline data processing
SynapseML - an open source library to simplify the creation of scalable machine learning pipelines
Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy
An anime themed telegram bot that can convert telegram media.
ShoukoKomiRobot • Written In Python3 • Library Used Pyrogram • Software Used Ebook-convert Deploy Fork this r
This is a demo app for use in video streaming applications
MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks MoViDNN is an Android application that can be used to ev
macOS development environment setup: Setting up a new developer machine can be an ad-hoc, manual, and time-consuming process.
dev-setup Motivation Setting up a new developer machine can be an ad-hoc, manual, and time-consuming process. dev-setup aims to simplify the process w
A Time Series Library for Apache Spark
Flint: A Time Series Library for Apache Spark The ability to analyze time series data at scale is critical for the success of finance and IoT applicat
A streaming animation of all the edits to a given Wikipedia page.
WikiFilms! What is it? A streaming animation of all the edits to a given Wikipedia page. How it works. It works by creating a "virtual camera," which
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Spark Python Notebooks This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, fro
Distributed deep learning on Hadoop and Spark clusters.
Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version
BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems
Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.
Kafka Connect JDBC Docker Image.
kafka-connect-jdbc This is a dockerized version of the Confluent JDBC database connector. Usage This image is running the connect-standalone command w
Paddlespeech Streaming ASR GUI
Paddlespeech-Streaming-ASR-GUI Introduction A paddlespeech Streaming ASR GUI. Us
Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.
Streaming Data Pipeline - Kafka + ELK Stack Streaming weather data using Apache Kafka and Elastic Stack. Data source: https://openweathermap.org/api O
Streaming Finance Data with AWS Lambda
A data pipeline consisting of an AWS Lambda function that reads data from the yfinance API, an AWS Kinesis stream that receives the data and stores it in S3 buckets, and an AWS Glue crawler plus Athena to run SQL queries.
DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
(简体中文|English) Quick Start | Documents | Models List PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks i
Telegram Music/Video Streaming Bot Using Pytgcalls
Video Player 🔥 Zaid VC Player is a Telegram project based on Pyrogram for playing music in VC chats... Repo Stats Requirements 📝 FFmpeg NodeJ
Video-Player - Telegram Music/Video Streaming Bot Using Pytgcalls
Video Player 🔥 Zaid VC Player is a Telegram project based on Pyrogram for play
Spark-movie-lens - An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
A scalable on-line movie recommender using Spark and Flask This Apache Spark tutorial will guide you step-by-step into how to use the MovieLens datase
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Recommendation engines are one of the most well known, widely used and highest value use cases for applying machine learning. Despite this, while there are many resources available for the basics of training a recommendation model, there are relatively few that explain how to actually deploy these models to create a large-scale recommender system.
X-news - A data pipeline using Scrapy, Kafka, Spark Streaming, Spark ML, Elasticsearch, and Kibana
Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster
Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed
TwitterDataStreaming - Twitter data streaming using APIs
Twitter_Data_Streaming Twitter data streaming using APIs Use Case 1: Streaming r
City-seeds - A random generator of cultural characteristics intended to spark ideas and help draw threads
City Seeds This is a random generator of cultural characteristics intended to sp
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster
GDB plugin for streaming defmt messages over RTT from e.g. JLinkGDBServer
Defmt RTT plugin from GDB This small plugin runs defmt-print on the RTT stream produced by JLinkGDBServer, so that you can see the defmt logs in the G
Music library streaming app written in Flask & VueJS
djtaytay This is a little toy app made to explore Vue, brush up on my Python, and make a remote music collection accessible through a web interface. I
This mini project showcases how to build and debug an Apache Spark application using Python
A Spark app can't be debugged using the normal procedure. This mini project showcases how to build and debug an Apache Spark application using the Python programming language. There are also options to run the Spark application in a Spark container
Pandas and Spark DataFrame comparison for humans
DataComPy DataComPy is a package to compare two Pandas DataFrames. Originally started to be something of a replacement for SAS's PROC COMPARE for Pand
A fast streaming JSON parser for Python that generates SAX-like events using yajl
json-streamer jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class which emits t
Fetching tweets and integrating them with Kafka and PySpark
KafkaPySpark Zookeeper bin/zookeeper-server-start.sh config/zookeeper.properties Kafka Server bin/kafka-server-start.sh config/server.properties Kafka
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Redash is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users levera
Monitor the stability of a pandas or spark dataframe ⚙︎
Population Shift Monitoring popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets.
Streaming over lightweight data transformations
Description Data augmentation library for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast a
Implementation of K-Nearest Neighbors Algorithm Using PySpark
KNN With Spark Implementation of KNN using PySpark. The KNN was used on two separate datasets (https://archive.ics.uci.edu/ml/datasets/iris and https:
Control YouTube, streaming sites, media players on your computer using your phone as a remote.
Media Control Control Youtube, streaming sites, media players on your computer using your phone as a remote. Installation pip install -r requirements.
A self-hosted streaming platform with Discord authentication, auto-recording and more!
Apache (Py)Spark type annotations (stub files).
PySpark Stubs A collection of the Apache Spark stub files. These files were generated by stubgen and manually edited to include accurate type hints. T
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.
Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is
A very fast file streaming bot used for streaming and downloading movies
FileStreamBot GIVE A STAR AND FORK ELSE NO MORE OPENSOURCE A Telegram bot to turn all media and documents files to web link . Report a Bug | Request F
A PySpark project that performs joins on Spark data frames.
SPARK JOINS This project performs inner joins, all outer joins, and semi joins. create_df.py: load_data.py : helps to put data into Spark data frames. d
A virtual calculator overlaid on the live stream from your camera
The virtual calculator is overlaid on the live stream from my USB camera. The program first detects my hand and, in each frame, calculates the distance between two fingers; if the distance is below a specific length, it is registered as a click. I can write any arithmetic operation, and when I click the equals sign the result appears in the display section. I can clear the display section by pressing the C key on the keyboard.
Delta Sharing: An Open Protocol for Secure Data Sharing
Delta Sharing: An Open Protocol for Secure Data Sharing Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enabl
Streaming parser for multipart/form-data written in Python
Streaming multipart/form-data parser streaming_form_data provides a Python parser for parsing multipart/form-data input chunks (the encoding used when
Utils for streaming large files (S3, HDFS, gzip, bz2...)
smart_open — utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor
SAMO: Streaming Architecture Mapping Optimisation
SAMO: Streaming Architecture Mapping Optimiser The SAMO framework provides a method of optimising the mapping of a Convolutional Neural Network model
The Spark Challenge Student Check-In/Out Tracking Script
The Spark Challenge Student Check-In/Out Tracking Script This Python Script uses the Student ID Database to match the entries with the ID Card Swipe a
Read streams of Twitter data, save them to Kafka, then process them with the Kafka Stream API and Spark Streaming
Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to a Kafka topic, processing messages using Kafka Stream
A live streaming chatroom involving multiple modalities, such as voice, gesture, and facial expression
HiLive A live streaming chatroom involving multiple modalities, such as voice, gesture, and facial expression. Introduction We focus on demonstrating
CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system
Simple and Distributed Machine Learning
Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy
Find out where all films you want to watch are streaming
Just Watch Letterboxd Find out where all films you want to watch are streaming Ever wonder what films you want to watch are already on the streaming p
Deep Learning Pipelines for Apache Spark
Deep Learning Pipelines for Apache Spark The repo only contains HorovodRunner code for local CI and API docs. To use HorovodRunner for distributed tra
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
Telegram Video Chat Video Streaming bot 🇱🇰
🧪 Get SESSION_NAME from below: Pyrogram 🎭 Preview ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Re
Code base of KU AIRS: SPARK Autonomous Vehicle Team
KU AIRS: SPARK Autonomous Vehicle Project Check this link for the blog post describing this project and the video of SPARK in simulation and on parkou
Building house price data pipelines with Apache Beam and Spark on GCP
This project covers the full process, from building a web crawler that extracts raw house price data to creating ETL pipelines with Google Cloud Platform services.
This is a Client-Server-System which can send audio from a microphone from the server to client and in the other direction.
Audio-Streaming-Python This is a Client-Server-System which can send audio from a microphone from the server to client and in the other direction. You
ZipFly is a zip archive generator based on zipfile.py
ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without memory inflation.
This is a Client-Server-System which can share the screen from the server to client and in the other direction.
Screenshare-Streaming-Python This is a Client-Server-System which can share the screen from the server to client and in the other direction. You have
PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.
This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming"
Coresets via Bilevel Optimization This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming" ht
🌊 River is a Python library for online machine learning.
River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
List of Data Science Cheatsheets to rule the world
Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat
Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python
Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python This project is a good starting point for those who have little
Music Streaming Platform based on full implementation of DBSM
Symphony Music Streaming Platform based on full implementation of DBSM List of Commands Insert User (INSERT) Function to implement input in USER Get a
Microservice example with Python, Faust-Streaming and Kafka (Redpanda)
Microservices Orchestration with Python, Faust-Streaming and Kafka (Redpanda) Example project for PythonBenin meetup. It demonstrates how to use Faust
Streamz helps you build pipelines to manage continuous streams of data
Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.