3149 Repositories
Python fast-data-pipeline Libraries
Tools for calculating and visualizing Elo-like ratings of MLB teams using Retosheet data
Overview This project uses historical baseball games data to calculate an Elo-like rating for MLB teams based on regular season match ups. The Elo rat
Converts a Bangla numeric string to literal words.
Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage
Kaggle Competition using 15 numerical predictors to predict a continuous outcome.
Kaggle-Comp.-Data-Mining Kaggle Competition using 15 numerical predictors to predict a continuous outcome as part of a final project for a stats data
Allele-specific pipeline for unbiased read mapping(WIP), QTL discovery(WIP), and allelic-imbalance analysis
WASP2 (Currently in pre-development): Allele-specific pipeline for unbiased read mapping(WIP), QTL discovery(WIP), and allelic-imbalance analysis Requ
A public data repository for datasets created from TransLink GTFS data.
TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio
Webtesting for course Data Structures & Algorithms
Selenium job to automate queries to check last posts of Module Data Structures & Algorithms Web-testing for course Data Structures & Algorithms Struct
Robbing the FED: Directly Obtaining Private Data in Federated Learning with Modified Models
Robbing the FED: Directly Obtaining Private Data in Federated Learning with Modified Models This repo contains a barebones implementation for the atta
The python SDK for Eto, the AI focused data platform for teams bringing AI models to production
Eto Labs Python SDK This is the python SDK for Eto, the AI focused data platform for teams bringing AI models to production. The python SDK makes it e
A script that publishes power usage data of iDrac enabled servers to an MQTT broker for integration into automation and power monitoring systems
iDracPowerMonitorMQTT This script publishes iDrac power draw data for iDrac 6 enabled servers to an MQTT broker. This can be used to integrate the pow
SPEAR: Semi suPErvised dAta progRamming
Semi-Supervised Data Programming for Data Efficient Machine Learning SPEAR is a library for data programming with semi-supervision. The package implem
Official code base for the poster "On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation" published in NeurIPS 2021 Workshop (SVRHM)
Self-Supervised Learning (SimCLR) with Biological Plausible Image Augmentations Official code base for the poster "On the use of Cortical Magnificatio
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
HETU Documentation | Examples Hetu is a high-performance distributed deep learning system targeting trillions of parameters DL model training, develop
A computational block to solve entity alignment over textual attributes in a knowledge graph creation pipeline.
How to apply? Create your config.ini file following the example provided in config.ini Choose one of the options below to run: Run with Python3 pip in
pandas: powerful Python data analysis toolkit
pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
A mutable set that remembers the order of its entries. One of Python's missing data types.
An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.
Estoult - a Python toolkit for data mapping with an integrated query builder for SQL databases
Estoult Estoult is a Python toolkit for data mapping with an integrated query builder for SQL databases. It currently supports MySQL, PostgreSQL, and
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.
Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is
A very fast file streaming bot used for streaming and downloading movies
FileStreamBot GIVE A STAR AND FORK ELSE NO MORE OPENSOURCE A Telegram bot to turn all media and documents files to web link . Report a Bug | Request F
A Fast Command Analyser based on Dict and Pydantic
Alconna Alconna 隶属于ArcletProject, 在Cesloi内有内置 Alconna 是 Cesloi-CommandAnalysis 的高级版,支持解析消息链 一般情况下请当作简易的消息链解析器/命令解析器 文档 暂时的文档 Example from arclet.alcon
Pyspark project that able to do joins on the spark data frames.
SPARK JOINS This project is to perform inner, all outer joins and semi joins. create_df.py: load_data.py : helps to put data into Spark data frames. d
yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data.
The yt Project yt is an open-source, permissively-licensed Python library for analyzing and visualizing volumetric data. yt supports structured, varia
MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.
The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine
GebPy is a Python-based, open source tool for the generation of geological data of minerals, rocks and complete lithological sequences.
GebPy is a Python-based, open source tool for the generation of geological data of minerals, rocks and complete lithological sequences. The data can be generated randomly or with respect to user-defined constraints, for example a specific element concentration within minerals and rocks or the order of units within a complete lithological profile.
Predicting Baseball Metric Clusters: Clustering Application in Python Using scikit-learn
Clustering Clustering Application in Python Using scikit-learn This repository contains the prediction of baseball metric clusters using MLB Statcast
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts
DataSelection-NMT Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts Quick update: The paper got accepted o
FinRL-Meta: A Universe for Data-Driven Financial Reinforcement Learning. 🔥
FinRL-Meta: A Universe of Market Environments. FinRL-Meta is a universe of market environments for data-driven financial reinforcement learning. Users
Centroid-UNet is deep neural network model to detect centroids from satellite images.
Centroid UNet - Locating Object Centroids in Aerial/Serial Images Introduction Centroid-UNet is deep neural network model to detect centroids from Aer
This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".
BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor
Pydantic model generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
datamodel-code-generator This code generator creates pydantic model from an openapi file and others. Help See documentation for more details. Supporte
A Django app that creates automatic web UIs for Python scripts.
Wooey is a simple web interface to run command line Python scripts. Think of it as an easy way to get your scripts up on the web for routine data anal
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
RDFLib RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including: parsers and serializers
WebRTC and ORTC implementation for Python using asyncio
aiortc What is aiortc? aiortc is a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python. It is built o
SSLyze is a fast and powerful SSL/TLS scanning tool and Python library.
SSLyze SSLyze is a fast and powerful SSL/TLS scanning tool and Python library. SSLyze can analyze the SSL/TLS configuration of a server by connecting
Titanic data analysis for python
Titanic-data-analysis This Repo is an analysis on Titanic_mod.csv This csv file contains some assumed data of the Titanic ship after sinking This full
The Python-Weather-App is a service that provides weather data
The Python-Weather-App is a service that provides weather data, including current weather data to the developers of web services and mobile applications.
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.
Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into
Big data on k8s
# microsoft azure # https://docs.microsoft.com/en-us/cli/azure/install-azure-cli az account set --subscription [] az aks get-credentials --resource-g
A super lightweight Lagrangian model for calculating millions of trajectories using ERA5 data
Easy-ERA5-Trck Easy-ERA5-Trck Galleries Install Usage Repository Structure Module Files Version iteration Easy-ERA5-Trck is a super lightweight Lagran
Simple and fast histogramming in Python accelerated with OpenMP.
pygram11 Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11. pygram11 provides functions for very fast histogram
Jiminy, fast and portable Python/C++ simulator of poly-articulated systems with OpenAI Gym interface for reinforcement learning.
Jiminy is a fast and portable cross-platform open-source simulator for poly-articulated systems. It was built with two ideas in mind: provide a fast y
The geospatial toolkit for redistricting data.
maup maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts
Delta Sharing: An Open Protocol for Secure Data Sharing
Delta Sharing: An Open Protocol for Secure Data Sharing Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enabl
PyIOmica (pyiomica) is a Python package for omics analyses.
PyIOmica (pyiomica) This repository contains PyIOmica, a Python package that provides bioinformatics utilities for analyzing (dynamic) omics datasets.
A part of HyRiver software stack for accessing hydrology data through web services
Package Description Status PyNHD Navigate and subset NHDPlus (MR and HR) using web services Py3DEP Access topographic data through National Map's 3DEP
Random dataframe and database table generator
Random database/dataframe generator Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, USA Introduction Often, beginners in SQL or data scien
A 3D Slicer Extension to view data from the flywheel heirarchy
flywheel-connect A 3D Slicer Extension to view, select, and download images from a Flywheel instance to 3D Slicer and storing Slicer outputs back to F
Ballcone is a fast and lightweight server-side Web analytics solution.
Ballcone Ballcone is a fast and lightweight server-side Web analytics solution. It requires no JavaScript on your website. Screenshots Design Goals Si
Streaming parser for multipart/form-data written in Python
Streaming multipart/form-data parser streaming_form_data provides a Python parser for parsing multipart/form-data input chunks (the encoding used when
A python module for retrieving and parsing WHOIS data
pythonwhois A WHOIS retrieval and parsing library for Python. Dependencies None! All you need is the Python standard library. Instructions The manual
Libextract: extract data from websites
Libextract is a statistics-enabled data extraction library that works on HTML and XML documents and written in Python
Persistent/Immutable/Functional data structures for Python
Pyrsistent Pyrsistent is a number of persistent collections (by some referred to as functional data structures). Persistent in the sense that they are
Persistent dict, backed by sqlite3 and pickle, multithread-safe.
sqlitedict -- persistent dict, backed-up by SQLite and pickle A lightweight wrapper around Python's sqlite3 database with a simple, Pythonic dict-like
A mutable set that remembers the order of its entries. One of Python's missing data types.
An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index nu
Python tree data library
Links Documentation PyPI GitHub Changelog Issues Contributors If you enjoy anytree Getting started Usage is simple. Construction from anytree impo
A JSON-friendly data structure which allows both object attributes and dictionary keys and values to be used simultaneously and interchangeably.
A JSON-friendly data structure which allows both object attributes and dictionary keys and values to be used simultaneously and interchangeably.
This is an implementation of PEP 557, Data Classes.
This is an implementation of PEP 557, Data Classes. It is a backport for Python 3.6. Because dataclasses will be included in Python 3.7, any discussio
D2LV: A Data-Driven and Local-Verification Approach for Image Copy Detection
Facebook AI Image Similarity Challenge: Matching Track —— Team: imgFp This is the source code of our 3rd place solution to matching track of Image Sim
Microsoft Azure provides a wide number of services for managing and storing data
Microsoft Azure provides a wide number of services for managing and storing data. One product is Microsoft Azure SQL. Which gives us the capability to create and manage instances of SQL Servers hosted in the cloud. This project, demonstrates how to use these services to manage data we collect from different sources.
Upload an Excel/CSV file ( 200 MB) and produce a short summary of the data.
Data-Analysis-Report Deployed App 1. What is this app? Upload an excel/csv file and produce a summary report of the data. 2. Where to upload? How to p
TensorFlow CNN for fast style transfer
Fast Style Transfer in TensorFlow Add styles from famous paintings to any photo in a fraction of a second! It takes 100ms on a 2015 Titan X to style t
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.
easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.
Search for files under the specified directory. Extract the file name and file path and import them as data.
Search for files under the specified directory. Extract the file name and file path and import them as data. Based on that, search for the file, select it and open it.
pandas-gbq is a package providing an interface to the Google BigQuery API from pandas
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
Utils for streaming large files (S3, HDFS, gzip, bz2...)
smart_open — utils for streaming large files in Python What? smart_open is a Python 3 library for efficient streaming of very large files from/to stor
A system for quickly generating training data with weak supervision
Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat
Extract data from a wide range of Internet sources into a pandas DataFrame.
pandas-datareader Up to date remote data access for pandas, works for multiple versions of pandas. Installation Install using pip pip install pandas-d
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of.
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of. Th
Synthetic Data Generation for tabular, relational and time series data.
An Open Source Project from the Data to AI Lab, at MIT Website: https://sdv.dev Documentation: https://sdv.dev/SDV User Guides Developer Guides Github
Flexible HDF5 saving/loading and other data science tools from the University of Chicago
deepdish Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also host a Deep Learning blog: htt
Python library for reading and writing tabular data via streams.
tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame
A common, beautiful interface to tabular data, no matter the format
rows No matter in which format your tabular data is: rows will import it, automatically detect types and give you high-level Python objects so you can
Tools for test driven data-wrangling and data validation.
datatest: Test driven data-wrangling and data validation Datatest helps to speed up and formalize data-wrangling and data validation tasks. It impleme
Tools for parsing messy tabular data.
Parsing for messy tables A library for dealing with messy tabular data in several formats, guessing types and detecting headers. See the documentation
A wrapper library to read, manipulate and write data in xlsx and xlsm format using openpyxl
pyexcel-xlsx - Let you focus on data, instead of xlsx format pyexcel-xlsx is a tiny wrapper library to read, manipulate and write data in xlsx and xls
Excalibur: A web interface to extract tabular data from PDFs
Excalibur: A web interface to extract tabular data from PDFs Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It i
Deep Difference and search of any Python object/data.
DeepDiff v 5.6.0 DeepDiff Overview DeepDiff: Deep Difference of dictionaries, iterables, strings and other objects. It will recursively look for all t
Python supercharged for the fastai library
Welcome to fastcore Python goodies to make your coding faster, easier, and more maintainable Python is a powerful, dynamic language. Rather than bake
Analyze, visualize and process sound field data recorded by spherical microphone arrays.
Sound Field Analysis toolbox for Python The sound_field_analysis toolbox (short: sfa) is a Python port of the Sound Field Analysis Toolbox (SOFiA) too
`charts.css.py` brings `charts.css` to Python. Online documentation and samples is available at the link below.
charts.css.py charts.css.py provides a python API to convert your 2-dimension data lists into html snippet, which will be rendered into charts by CSS,
A dashboard built using Plotly-Dash for interactive visualization of Dex-connected individuals across the country.
Dashboard For The DexConnect Platform of Dexterity Global Working prototype submission for internship at Dexterity Global Group. Dashboard for real ti
Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging
Larch: Data Analysis Tools for X-ray Spectroscopy and More Documentation: http://xraypy.github.io/xraylarch Code: http://github.com/xraypy/xraylarch L
A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece.
COVID-19-Greece A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece. Data sources Data provided by Johns Hopki
Self-Supervised Image Denoising via Iterative Data Refinement
Self-Supervised Image Denoising via Iterative Data Refinement Yi Zhang1, Dasong Li1, Ka Lung Law2, Xiaogang Wang1, Hongwei Qin2, Hongsheng Li1 1CUHK-S
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Environments Effi
Python3 library that can retrieve Chrome-based browser's saved login info.
Passax EDUCATIONAL PURPOSES ONLY Python3 library that can retrieve Chrome-based browser's saved login info. Requirements secretstorage~=3.3.1 pywin32=
Python script that extract data via YouTube Api and manipulates it.
UNLIMITED README for the Unlimited game [Mining game] Explore the docs » View Demo · Report Bug · Request Feature Table of Contents About The Project
This script has been created in order to find what are the most common demanded technologies in Data Engineering field.
This is a Python script that given a whole corpus of job descriptions and a file with keywords it extracts the number of number of ocurrences of these keywords and write it to a file. This script it is easy to extend to accept more functionalities
Kodi script for proper Australian weather data
Kodi Oz Weather weather.ozweather Script for Kodi for high quality Australian weather data sourced directly from the BOM. Available from the Kodi offi
A Python package for Misty II development
Misty2py Misty2py is a Python 3 package for Misty II development using Misty's REST API. Read the full documentation here! Installation Poetry To inst
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification
Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P
Evaluating saliency methods on artificial data with different background types
Evaluating saliency methods on artificial data with different background types This repository contains the relevant code for the MedNeurips 2021 subm
ScisorWiz: Differential Isoform Visualizer for Long-Read RNA Sequencing Data
ScisorWiz: Vizualizer for Differential Isoform Expression README ScisorWiz is a linux-based R-package for visualizing differential isoform expression
Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.
Deep Learning Dataset Maker Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data. How to use Down
The test data, code and detailed description of the AW t-SNE algorithm
AW-t-SNE The test data, code and result of the AW t-SNE algorithm Structure of the folder Datasets: This folder contains two datasets, the MNIST datas
The sarge package provides a wrapper for subprocess which provides command pipeline functionality.
Overview The sarge package provides a wrapper for subprocess which provides command pipeline functionality. This package leverages subprocess to provi
Validation and inference over LinkML instance data using souffle
Translates LinkML schemas into Datalog programs and executes them using Souffle, enabling advanced validation and inference over instance data
Python Data Structures and Algorithms
No non-sense and no BS repo for how data structure code should be in Python - simple and elegant.
Haphazard scripts for scraping bitcoin/bitcoin data from GitHub
This is a quick-and-dirty tool used to scrape bitcoin/bitcoin pull request and commentary data. Each output/pr number folder contains comments.json:
An implementation of the methods presented in Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.
An implementation of the methods presented in Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.