3123 Repositories
Python how-to-move-data-to-any-cloud Libraries
Benchmark datasets, data loaders, and evaluators for graph machine learning
Overview The Open Graph Benchmark (OGB) is a collection of benchmark datasets, data loaders, and evaluators for graph machine learning. Datasets cover
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
DeCLIP Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. Our paper is available on arXiv. Updates ** Ou
AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using Mozilla DeepSpeech.
AutoSub About Motivation Installation Docker How-to example How it works TO-DO Contributing References About AutoSub is a CLI application to generate
DataPrep — The easiest way to prepare data in Python
A data visualization curriculum of interactive notebooks.
A data visualization curriculum of interactive notebooks, using Vega-Lite and Altair. This repository contains a series of Python-based Jupyter notebooks.
Python Library for Signal/Image Data Analysis with Transport Methods
PyTransKit Python Transport Based Signal Processing Toolkit Website and documentation: https://pytranskit.readthedocs.io/ Installation The library cou
"Domain Adaptive Semantic Segmentation without Source Data" (ACM MM 2021)
LDBE PyTorch implementation for two papers (the paper will be released soon): "Domain Adaptive Semantic Segmentation without Source Data", ACM MM2021.
A cross-lingual COVID-19 fake news dataset
CrossFake An English-Chinese COVID-19 fake&real news dataset from the ICDMW 2021 paper below: Cross-lingual COVID-19 Fake News Detection. Jiangshu Du,
Ground truth data for the Optical Character Recognition of Historical Classical Commentaries.
OCR Ground Truth for Historical Commentaries The dataset OCR ground truth for historical commentaries (GT4HistComment) was created from the public dom
Opasium AI was designed specifically for the Opasium Games Discord server. It is a bot that covers the basic functions of any other bot.
OpasiumAI Opasium AI was designed specifically for the Opasium Games Discord server. It is a bot that covers the basic functions of any other bot. Insta
Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info
SpaceX Software I developed software to scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info. To use the software you need Python a
Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022
Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models
A Python package that scrapes Google News article data while remaining undetected by Google.
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
Using Selenium with Python to Web Scrape Popular YouTube Tech Channels.
Web Scraping Popular YouTube Tech Channels with Selenium Data Mining, Data Wrangling, and Exploratory Data Analysis About the Data Web scrapi
👁️ Tool for Data Extraction and Web Requests.
httpmapper 👁️ Project • Technologies • Installation • How it works • License Project 🚧 For educational purposes. This is a project that I developed,
The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.
The open-source web scrapers that feed the Los Angeles Times' California coronavirus tracker. Processed data ready for analysis is available at datade
A tool for scraping and organizing data from NewsBank API searches
nbscraper Overview This simple tool automates the process of copying, pasting, and organizing data from NewsBank API searches. Currently, nbscrape onl
A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.
New to Streaming Scraper An in-progress web scraping project built with Python, R, and SQL. The scraped data are movie and TV show information. The go
A tool to easily scrape YouTube data using the Google API
YouTube data scraper To easily scrape any data from the YouTube homepage, a YouTube channel/user, search results, playlists, and a single video itself
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Recommendations from Cramer: On the show Mad-Money (CNBC) Jim Cramer picks stocks which he recommends to buy. We will use this data to build a portfolio
Backtesting the "Cramer Effect" & Recommendations from Cramer Recommendations from Cramer: On the show Mad-Money (CNBC) Jim Cramer picks stocks which
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis.
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis. The main goal of the package is to accelerate the process of computing estimates of forward reachable sets for nonlinear dynamical systems.
Statistical Analysis 📈 focused on statistical analysis and exploration used on various data sets for personal and professional projects.
Statistical Analysis 📈 This repository focuses on statistical analysis and the exploration used on various data sets for personal and professional pr
pyETT: Python library for Eleven VR Table Tennis data
pyETT: Python library for Eleven VR Table Tennis data Documentation Documentation for pyETT is located at https://pyett.readthedocs.io/. Installation
A collection of learning outcomes data analysis using Python and SQL, from DQLab.
Data Analyst with PYTHON A Data Analyst is responsible for producing data analyses and presenting insights to support decision-making
Python Package for DataHerb: create, search, and load datasets.
The Python Package for DataHerb A DataHerb Core Service to Create and Load Datasets.
A powerful data analysis package based on mathematical step functions. Strongly aligned with pandas.
The leading use-case for the staircase package is for the creation and analysis of step functions. Pretty exciting huh. But don't hit the close button
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o
Additional tools for particle accelerator data analysis and machine information
PyLHC Tools This package is a collection of useful scripts and tools for the Optics Measurements and Corrections group (OMC) at CERN. Documentation Au
GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors
GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors. GWpy provides a user-f
Open Data Cube analyses continental scale Earth Observation data through time
Open Data Cube Core Overview The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis ready earth obse
Toolchest provides APIs for scientific and bioinformatic data analysis.
Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni
A data analysis using python and pandas to showcase trends in school performance.
A data analysis using python and pandas to showcase trends in school performance. A data analysis to showcase trends in school performance using Panda
An end-to-end regression problem of predicting the price of properties in Bangalore.
Bangalore-House-Price-Prediction An end-to-end regression problem of predicting the price of properties in Bangalore. Deployed in Heroku using Flask.
Repository for storing materials from the STMKGxHMGI Long Course on Geophysical Python for Seismic Data Analysis
Long Course "Geophysical Python for Seismic Data Analysis" Instructor: Dr.rer.nat. Wiwit Suryanto, M.Si Prepared by: Anang Sahroni Time: Session 1
The OHDSI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.
Universal data analysis tools for atmospheric sciences
U_analysis Universal data analysis tools for atmospheric sciences Script written in python 3. This file defines multiple functions that can be used fo
Django3 web app that renders OpenWeather API data ☁️☁️
nz-weather For a live build, visit - https://brandonru.pythonanywhere.com/ NZ Openweather API data rendered using Django3 and requests ☀️ Local Run In
Trained on Simulated Data, Tested in the Real World
Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.
Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove
A synthetic texture-invariant dataset for object detection of UAVs
A synthetic dataset for object detection of UAVs This repository contains a synthetic dataset accompanying the paper Sim2Air - Synthetic aerial datas
Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification
Less is More: Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification Suncheng Xiang Shanghai Jiao Tong University Over
Official Implementation (PyTorch) of "Point Cloud Augmentation with Weighted Local Transformations", ICCV 2021
PointWOLF: Point Cloud Augmentation with Weighted Local Transformations This repository is the implementation of PointWOLF(To appear). Sihyeon Kim1*,
Perform Linear Classification with Multi-way Data
MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (
TCube generates rich and fluent narratives that describe the characteristics, trends, and anomalies of any time-series data (domain-agnostic) using the transfer learning capabilities of PLMs.
TCube: Domain-Agnostic Neural Time series Narration This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time series Narrat
This repository is for EMNLP 2021 paper: It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data
InterpretationData This repository is for our EMNLP 2021 paper: It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpr
This repository contains all data used for writing the research paper Multiple Object Trackers in OpenCV: A Benchmark, presented at the ISIE 2021 conference in Kyoto, Japan.
OpenCV-Multiple-Object-Tracking Python version is 3.6.7. To install OpenCV: pip uninstall opencv-python pip uninstall opencv-contrib-python pip install
Biomarker identification for COVID-19 Severity in BALF cells Single-cell RNA-seq data
scBALF Covid-19 dataset Analysis Here is the Github page that has the codes for the bioinformatics pipeline described in the paper COVID-Datathon: Bio
Code repository for the paper "Doubly-Trained Adversarial Data Augmentation for Neural Machine Translation" with instructions to reproduce the results.
Doubly Trained Neural Machine Translation System for Adversarial Attack and Data Augmentation Languages Experimented: Data Overview: Source Target Tra
Continuous Conditional Random Field Convolution for Point Cloud Segmentation
CRFConv This repository is the implementation of "Continuous Conditional Random Field Convolution for Point Cloud Segmentation" 1. Setup 1) Building c
Nonnegative spatial factorization for multivariate count data
Nonnegative spatial factorization for multivariate count data This repository contains supporting code to facilitate reproducible analysis. For detail
This is a simple bot that can be used to upload images to a third-party cloud (image hosting). Currently, the bot supports only the imgbb.com website. Future updates are planned.
TGImageHosting This is a simple bot that can be used to upload images to a third party cloud (image hosting). Currently, only the imgbb.com website su
Download India Stocks Historical Data
Kite Helper - Download Stock Market Data 🌎 Website Simple Application to Download any stock market data in .csv format using Kite 🏃♂️ Running Serve
Stitch together Nanopore tiled amplicon data without polishing a reference
Stitch together Nanopore tiled amplicon data using a reference guided approach Tiled amplicon data, like those produced from primers designed with pri
A bot to display per user data from the Twitch Leak
twitch-leak-bot-discord A bot to display per user data from the Twitch Leak by username Where's the data? I can't and don't want to supply the .csv's
Tooling for converting STAC metadata to ODC data model
Tooling for converting STAC metadata to ODC data model.
Helpers to extend Django Admin with data from external service with minimal hacks
django-admin-data-from-external-service Helpers to extend Django Admin with data from external service with minimal hacks Live demo with sources on He
A lightweight, hub-and-spoke dashboard for multi-account Data Science projects
A lightweight, hub-and-spoke dashboard for cross-account Data Science Projects Introduction Modern Data Science environments often involve many indepe
This repository will be a draft of a package about the latest total marine fish production in Indonesia. Data will be collected from PIPP (Pusat Informasi Pelabuhan Perikanan).
indomarinefish This package will give us information about the latest total marine fish production in Indonesia. The Name of the fish is written in In
Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021)
Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021) Alexey Nekrasov*, Jonas Schult*, Or Litany, Bastian Leibe, Francis Engelmann Mix3D is
A Python script to update Spotify Playlist data every 5 minutes.
Spotify Playlist Updater A Python script to update Spotify Playlist data every 5 minutes. Description An automatic playlist updater using Spotify API
The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning"
SubTab: Author: Talip Ucar The official implementation of the paper, SubTab: Subsetting Features of Tabular Data for Self-Supervis
This Python script extracts all the video URLs from any YouTube channel. Then it extracts information such as the channel name, published date, likes, dislikes, comments, views, etc. for every video in that channel.
youtube-channel-video-url-extractor This Python script extracts all the video URLs from any YouTube channel. Then it extracts all the information like
A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line!
Social Media Scraper A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line! Go to the website » Vie
FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Data Augmentation with Variational Autoencoders
Documentation Pyraug This library provides a way to perform Data Augmentation using Variational Autoencoders in a reliable way even in challenging con
[ICCV 2021 Oral] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
This repository contains the source code for the paper SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer (ICCV 2021 Oral). The project page is here.
A simple tool to move and rename Nvidia Share recordings to a more sensible format.
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.
A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search
A project based example of Data pipelines, ML workflow management, API endpoints and Monitoring.
MLOps template with examples for Data pipelines, ML workflow management, API development and Monitoring.
A simple command for converting and processing data from your clipboard.
A generic text conversion/processing tool
Script that organizes the Google Takeout archive into one big chronological folder
This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease.
LeasePlan - Scraper This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease. It has
Shows twitch pay for any streamer from Twitch leaked CSV files.
twitch_leak_csv_reader Shows twitch pay for any streamer from Twitch leaked CSV files. Requirements: You need python3 (you can install python 3 from o
PyTorch implementation of DUL (Data Uncertainty Learning in Face Recognition, CVPR2020)
Projects that implement various aspects of Data Engineering.
DATAWAREHOUSE ON AWS The purpose of this project is to build a data warehouse to accommodate data of active user activity for music streaming applicatio
Simple PoC script that allows you to exploit telegram's "send with timer" feature by saving any media sent with this functionality.
Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features
MediumVC MediumVC is an utterance-level method towards any-to-any VC. Before that, we propose SingleVC to perform A2O tasks(Xi → Ŷi) , Xi means utter
A game theoretic approach to explain the output of any machine learning model.
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo
MAASTA is a wrapper to create an Ansible inventory for MAAS instances that are provisioned by Terraform.
Creates folders into a directory to categorize files in that directory by file extensions and move all things from sub-directories to current directory.
Categorize and Uncategorize Your Folders Table of Content TL;DR just take me to how to install. What are Extension Categorizer and Folder Dumper Insta
GndNet: Fast ground plane estimation and point cloud segmentation for autonomous vehicles using deep neural networks.
GndNet: Fast Ground plane Estimation and Point Cloud Segmentation for Autonomous Vehicles. Authors: Anshul Paigwar, Ozgur Erkent, David Sierra Gonzale
Persian Kaldi profile for Rhasspy built from open speech data
Persian Kaldi Profile A Rhasspy profile for Persian (fa). Installation Get started by first installing Vosk: # Create virtual environment python3 -m v
Simple python code to fix your combo list by removing any text after a separator or removing duplicate combos
Combo List Fixer A simple python code to fix your combo list by removing any text after a separator or removing duplicate combos Removing any text aft
A 2D physics sim for orbits. Made using pygame and tkinter. High degree of interactivity, allowing you to create celestial bodies of a custom mass and velocity within the simulation, select what specifically is displayed, and move the camera.
Python-Orbit-Sim A 2D physics sim for orbits. Made using pygame and tkinter. High degree of interactivity, allowing you to create celestial bodies of
An architecture that makes any doodle realistic, in any specified style, using VQGAN, CLIP and some basic embedding arithmetics.
Sketch Simulator An architecture that makes any doodle realistic, in any specified style, using VQGAN, CLIP and some basic embedding arithmetics. See
A CLI tool to get you started on any project in any language
Any Template A faster, easier way to quick-start any programming project. Installation pip3 install any-template Features No third party dependencies. Tem
Fancy data functions that will make your life as a data scientist easier.
WhiteBox Utilities Toolkit: Tools to make your life easier Fancy data functions that will make your life as a data scientist easier. Installing To ins
Exploratory Data Analysis for Employee Retention Dataset
Exploratory Data Analysis for Employee Retention Dataset Employee turn-over is a very costly problem for companies. The cost of replacing an employee
PyTorch implementation for View-Guided Point Cloud Completion
Graphsignal is a machine learning model monitoring platform.
Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model performance and availability.
This is a public repo where code samples are stored for the book Practical MLOps.
[Book-2021] Practical MLOps O'Reilly Book
Repository for learning Python programming in Indonesian
Python This repository contains a collection of various examples of data structures, algorithms, and mathematical computation implemented using
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.
Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Angora is a mutation-based fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution.
Angora Angora is a mutation-based coverage guided fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without s
a short visualisation script for pyvideo data
PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins
Work with the AWS IP address ranges in native Python.
Amazon Web Services (AWS) publishes its current IP address ranges in JSON format. Python v3 provides an ipaddress module in the standard library that allows you to create, manipulate, and perform operations on IPv4 and IPv6 addresses and networks. Wouldn't it be nice if you could work with the AWS IP address ranges like native Python objects?
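As a rough illustration of that idea, the sketch below uses only the standard library (`json` and `ipaddress`). It does not use or reproduce this repository's own API. The `SAMPLE` document, the `load_networks` and `find_matches` helpers, and the prefixes themselves are hypothetical; the prefixes are reserved documentation ranges, not real AWS ranges. The key names (`prefixes`, `ip_prefix`, `ipv6_prefixes`, `ipv6_prefix`, `region`, `service`) are assumed to follow the layout of the published JSON file.

```python
import ipaddress
import json

# Hypothetical sample shaped like the published JSON document.
# These are reserved documentation prefixes, not real AWS ranges.
SAMPLE = """
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "S3"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "EC2"}
  ]
}
"""


def load_networks(document):
    """Parse the JSON text into (network object, raw entry) pairs."""
    data = json.loads(document)
    networks = []
    for entry in data.get("prefixes", []):
        networks.append((ipaddress.ip_network(entry["ip_prefix"]), entry))
    for entry in data.get("ipv6_prefixes", []):
        networks.append((ipaddress.ip_network(entry["ipv6_prefix"]), entry))
    return networks


def find_matches(address, networks):
    """Return the raw entries whose prefix contains the given address."""
    ip = ipaddress.ip_address(address)
    return [
        entry
        for net, entry in networks
        if net.version == ip.version and ip in net
    ]


if __name__ == "__main__":
    nets = load_networks(SAMPLE)
    for match in find_matches("192.0.2.10", nets):
        print(match["service"], match["region"])
```

Once the prefixes are `ipaddress` network objects, membership tests, comparisons, and subnet checks work with ordinary Python operators, which is the convenience the description above alludes to.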