2925 Repositories
Python youtube-data-scraper Libraries
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array
shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.
ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.
A Telegram bot for Download songs in mp3 format from YouTube and Extract lyrics from Genius.com ❤️
MeudsaMusic A Telegram bot for Download songs in mp3 format from YouTube and Extract lyrics from Genius.com ❤️ Commands Reach @MedusaMusic on Telegram
A program to convert YouTube channel registration information into Json files for ThirdTube.
ThirdTubeImporter A program to convert YouTube channel registration information into Json files for ThirdTube. Usage Japanese https://takeout.google.c
A compact version of EDI-Vetter, which uses the TLS output to quickly vet transit signals.
A compact version of EDI-Vetter, which uses the TLS output to quickly vet transit signals. All your favorite hits in a simplified format.
This tool parses log data and allows to define analysis pipelines for anomaly detection.
logdata-anomaly-miner This tool parses log data and allows to define analysis pipelines for anomaly detection. It was designed to run the analysis wit
Tools for analyzing data collected with a custom unity-based VR for insects.
unityvr Tools for analyzing data collected with a custom unity-based VR for insects. Organization: The unityvr package contains the following submodul
A collection of tools for biomedical research assay analysis in Python.
waltlabtools A collection of tools for biomedical research assay analysis in Python. Key Features Analysis for assays such as digital ELISA, including
A learning-based data collection tool for human segmentation
FullBodyFilter A Learning-Based Data Collection Tool For Human Segmentation Contents Documentation Source Code and Scripts Overview of Project Usage O
A collection of robust and fast processing tools for parsing and analyzing web archive data.
ChatNoir Resiliparse A collection of robust and fast processing tools for parsing and analyzing web archive data. Resiliparse is part of the ChatNoir
Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.
Documentation OWPHydroTools GitHub pages documentation Motivation We developed OWPHydroTools with data scientists in mind. We attempted to ensure the
A tool and a library for SVG path data transformations.
SVG path data transformation toolkit A tool and a library for SVG path data transformations. Currently it supports a translation and a scaling. Usage
Categorizing comments on YouTube into different categories.
Youtube Comments Categorization This repo is for categorizing comments on a youtube video into different categories. negative (grievances, complaints,
Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.
causal-bald | Abstract | Installation | Example | Citation | Reproducing Results DUE An implementation of the methods presented in Causal-BALD: Deep B
A simplistic scraper made to download tons of random screenshots made by people.
printStealer 1.1 What is this tool? This tool is developed to show the insecurity of the screenshot utility called prnt sc. It is a site that stores s
python's memory-saving dictionary data structure
ConstDict python代替的Dict数据结构 若字典不会增加字段,只读/原字段修改 使用ConstDict可节省内存 Dict()内存主要消耗的地方: 1、Dict扩容机制,预留内存空间 2、Dict也是一个对象,内部会动态维护__dict__,增加slot类属性可以节省内容 节省内存大小
A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!
🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine
This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here
uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain
Scrape plants scientific name information from Agroforestry Species Switchboard 2.0.
Agroforestry Species Switchboard 2.0 Scraper Scrape plants scientific name information from Species Switchboard 2.0. Requirements python = 3.10 (you
Streamz helps you build pipelines to manage continuous streams of data
Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.
MongoDB utility to inflate the contents of small collection to a new larger collection
MongoDB Data Inflater ("data-inflater") The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using
A rofi-blocks script that searches youtube and plays the selected audio on mpv.
rofi-ytm A rofi-blocks script that searches youtube and plays the selected audio on mpv. To use the script, run the following command rofi -modi block
Visualization Data Drug in thailand during 2014 to 2020
Visualization Data Drug in thailand during 2014 to 2020 Data sorce from ข้อมูลเปิดภาครัฐ สำนักงาน ป.ป.ส Inttroducing program Using tkinter module for
Official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021).
Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas
nrgpy is the Python package for processing NRG Data Files
nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github
The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based metabolomics.
MINT (Metabolomics Integrator) The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based m
Data science, Data manipulation and Machine learning package.
duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l
Extract the temperature data of each wire from the thermal imager raw data.
Wire-Tempurature-Detection Extract the temperature data of each wire from the thermal imager raw data. The motivation of this computer vision project
VevestaX is an open source Python package for ML Engineers and Data Scientists.
VevestaX Track failed and successful experiments as well as features. VevestaX is an open source Python package for ML Engineers and Data Scientists.
A Bot that adds YouTube views to your video of choice
YoutubeViews Free Youtube viewer bot A Bot that adds YouTube views to your video of choice Installation git clone https://github.com/davdtheemonk/Yout
Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.
sensecap_api_checker_HELIUM Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.
scikit-multimodallearn is a Python package implementing algorithms multimodal data.
scikit-multimodallearn is a Python package implementing algorithms multimodal data. It is compatible with scikit-learn, a popul
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
DistMIS Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation. DistriMIS Distributing Deep Learning Hyperparameter Tuning
Generalized Data Weighting via Class-level Gradient Manipulation
Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you
Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec
Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec This repo
SW components and demos for visual kinship recognition. An emphasis is put on the FIW dataset-- data loaders, benchmarks, results in summary.
FIW Data Development Kit Table of Contents Introduction Families In the Wild Database Publications Organization To Do License Getting Involved Introdu
[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”
Improving Contrastive Learning on Imbalanced Data via Open-World Sampling Introduction Contrastive learning approaches have achieved great success in
Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.
Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available f
Code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectrograms, using the PyTorch Lightning.
stereoEEG2speech We provide code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectro
The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualizing NFT data from OpenSea, using PostgreSQL and TimescaleDB.
Timescale NFT Starter Kit The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualiz
This is the accompanying repository for the Bloomberg Global Coal Countdown website.
This is the accompanying repository for the Bloomberg Global Coal Countdown (BGCC) website. Data Sources Dashboard Data Schema and Validation License
Youtube Channel Website
Videos-By-Sanjeevi Youtube Channel Website YouTube Channel Website Features: Free Hosting using GitHub Pages and open-source code base in GitHub. It c
Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.
Meet Fidesops: Privacy as Code for DSAR Orchestration A part of the greater Fides ecosystem. ⚡ Overview Fidesops (fee-dez-äps, combination of the Lati
Google Drive, OneDrive and Youtube as covert-channels - Control systems remotely by uploading files to Google Drive, OneDrive, Youtube or Telegram
covert-control Control systems remotely by uploading files to Google Drive, OneDrive, Youtube or Telegram using Python to create the files and the lis
A tool for hiding data inside of images
Stegenography-tool a tool for hiding data inside of images Quick test: do python steg-encode.py test/message.txt test/covid19.png to generate the test
Planning Algorithms in AI and Robotics. MSc course at Skoltech Data Science program
Planning Algorithms in AI and Robotics course T2 2021-22 The Planning Algorithms in AI and Robotics course at Skoltech, MS in Data Science, during T2,
EOD Historical Data Python Library (Unofficial)
EOD Historical Data Python Library (Unofficial) https://eodhistoricaldata.com Installation python3 -m pip install eodhistoricaldata Note Demo API key
A selection of SQLite3 databases to practice querying from.
Dummy SQL Databases This is a collection of dummy SQLite3 databases, for learning and practicing SQL querying, generated with the VS Code extension Ge
This is an auto-ML tool specialized in detecting of outliers
Auto-ML tool specialized in detecting of outliers Description This tool will allows you, with a Dash visualization, to compare 10 models of machine le
Python Youtube Video-Playlist Downloader
Youtube-Video-Playlist-Downloader-PyQt5 You can download videos and playlists on YouTube with this script. Script has GUI. Enjoy. Setup git clone http
Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.
Official repository of the paper (UAI 2021) "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE. Panel dat
Data Poisoning based on Adversarial Attacks using Non-Robust Features
Data Poisoning based on Adversarial Attacks using Non-Robust Features Usage python main.py [-h] [--gpu | -g GPU] [--eps |-e EPSILON] [--pert | -p PER
Proxy scraper. Format: IP | PORT | COUNTRY | TYPE
proxy scraper 🔎 Installation: git clone https://github.com/ebankoff/proxy_scraper Required pip libraries (pip install library name): lxml beautifulso
this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here
uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain
A terminal spreadsheet multitool for discovering and arranging data
VisiData v2.6.1 A terminal interface for exploring and arranging tabular data. VisiData supports tsv, csv, sqlite, json, xlsx (Excel), hdf5, and many
A large-scale database for graph representation learning
A large-scale database for graph representation learning
In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift.
ETL Pipeline for AWS Project Description In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift. The data is loaded from S3 t
Download Youtube videos in mp4 format in a fast, easy, convenient way made with Python!
yt_downloader Download Youtube videos in mp4 format in a fast, easy, convenient way made with Python! Required Modules pytube os time colorama Errors
Data on Free Food at MIT
MIT Free Food Timing Procrastinating research by plotting data on how long it takes emails on the free-food at mit edu mailing list to go through. Dat
Minimal working example of data acquisition with nidaqmx python API
Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It
A toolkit for geo ML data processing and model evaluation (fork of solaris)
An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues
Data cleaning tools for Business analysis
Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. 数据清洗 该数据清洗工具是为了商业分析 这个程序是为了Vicky的工作而
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021
Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini
This repo represents all we learned and are learning in Data Structure course.
DataStructure Journey This repo represents all we learned and are learning in Data Structure course which is based on CLRS book and is being taught by
Enhancing Knowledge Tracing via Adversarial Training
Enhancing Knowledge Tracing via Adversarial Training This repository contains source code for the paper "Enhancing Knowledge Tracing via Adversarial T
Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions
Aquarius Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions NOTE: We are currently going through the open-source process requir
Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation
deeptime Releases: Installation via conda recommended. conda install -c conda-forge deeptime pip install deeptime Documentation: deeptime-ml.github.io
How to Leverage Multimodal EHR Data for Better Medical Predictions?
How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat
A command line tool for visualizing CSV/spreadsheet-like data
PerfPlotter Read data from CSV files using pandas and generate interactive plots using bokeh, which can then be embedded into HTML pages and served by
Scrapy-based cyber security news finder
Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of
Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly
Table of contents Introduction Dataset Model & Metrics How to Run Quickstart Install Training Evaluation Detection DATA COMPETITION The COVID-19 pande
Scraping web pages to get data
Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4
An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post
Autoscraper-n-blogger An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post and notifies via Telegram bot
[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences
NYU-VPR This repository provides the experiment code for the paper Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymiza
Python package with library and CLI tool for analyzing SeaFlow data
Seaflowpy A Python package for SeaFlow flow cytometer data. Table of Contents Install Read EVT/OPP/VCT Files Command-line Interface Configuration Inte
Python and data science snippets on the command line
Python Snippet Tool A tool to get Python and data science snippets at Data Science Simplified on the command line. You can read my article to learn ho
Splitgraph command line client and python library
Splitgraph Overview Splitgraph is a tool for building, versioning and querying reproducible datasets. It's inspired by Docker and Git, so it feels fam
Command-line script to upload videos to Youtube using theYoutube APIv3.
Introduction Command-line script to upload videos to Youtube using theYoutube APIv3. It should work on any platform (GNU/Linux, BSD, OS X, Windows, ..
TAug :: Time Series Data Augmentation using Deep Generative Models
TAug :: Time Series Data Augmentation using Deep Generative Models Note!!! The package is under development so be careful for using in production! Fea
A Bot Upload file|video To Telegram using given Links.
A Bot Upload file|video To Telegram using given Links.
Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide range of illumination variants of a single image.
Deep Illuminator Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently generate a wide
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au
A general, feasible, and extensible framework for classification tasks.
Pytorch Classification A general, feasible and extensible framework for 2D image classification. Features Easy to configure (model, hyperparameters) T
This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"
USSBriefs2021 This provides the R code and data to replicate results in "The USS Trustee’s risky strategy" by Neil M Davies, Jackie Grant and Chin Yan
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l
Codes and Data Processing Files for our paper.
Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE
Simple data balancing baselines for worst-group-accuracy benchmarks.
BalancingGroups Code to replicate the experimental results from Simple data balancing baselines achieve competitive worst-group-accuracy. Replicating
Official project repository for 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination'
NCAE_UAD Official project repository of 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination' Abstract In this p
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst
Data visualization electromagnetic spectrum
Datenvisualisierung-Elektromagnetischen-Spektrum Anhand des Moduls matplotlib sollen die Daten des elektromagnetischen Spektrums dargestellt werden. D
A framework for feature exploration in Data Science
Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no
A simple telegram Bot, Upload Media File| video To telegram using the direct download link. (youtube, Mediafire, google drive, mega drive, etc)
URL-Uploader (Bot) A Bot Upload file|video To Telegram using given Links. Features: 👉 Only Auth Users (AUTH_USERS) Can Use The Bot 👉 Upload YTDL Sup
Experimental proxy for dumping the unencrypted packet data from Brawl Stars (WIP)
Brawl Stars Proxy Experimental proxy for version 39.99 of Brawl Stars. It allows you to capture the packets being sent between the Brawl Stars client
DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data.
DWIPrep: A Robust Preprocessing Pipeline for dMRI Data DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transp
A tutorial for people to run synthetic data replica's from source healthcare datasets
Synthetic-Data-Replica-for-Healthcare Description What is this? A tailored hands-on tutorial showing how to use Python to create synthetic data replic
Web scrapping
Project Setup Table of Contents Project Setup Table of Contents Run project locally Install Requirements Run script Run project locally Install Requir
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l
Rover is a command line interface application that allows through browse through mission data, images, metadata from the NASA Official Website
🤖 rover Rover is a command line interface application that allows through browse through mission data, images, metadata from the NASA Official Websit