3246 Repositories
Python data-app-performance Libraries
Simple macOS StatusBar app to remind you to unplug your laptop when sufficiently charged
ChargeMon Simple macOS StatusBar app to monitor battery charge status and remind you to unplug your Mac when the battery is sufficiently charged Overv
An Android app that runs Elm in a webview. And a Python script to build the app or install it on the device.
Requirements You need to have installed: the Android SDK Elm Python git Starting a project Clone this repo and cd into it: $ git clone https://github.
GitHub action for AppSweep Mobile Application Security Testing
GitHub action for AppSweep can be used to continuously integrate app scanning using AppSweep into your Android app build process
In the case of your data having only 1 channel while want to use timm models
timm_custom Description In the case of your data having only 1 channel while want to use timm models (with or without pretrained weights), run the fol
Minimum Bounding Box of Geospatial data
BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
PandasVault — Advanced Pandas Functions and Code Snippets The only Pandas utility package you would ever need. It has no exotic external dependencies
PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)
PandaPy "I came across PandaPy last week and have already used it in my current project. It is a fascinating Python library with a lot of potential to
Automatically visualize your pandas dataframe via a single print! 📊 💡
A Python API for Intelligent Visual Discovery Lux is a Python library that facilitate fast and easy data exploration by automating the visualization a
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Intel(R) Extension for Scikit-learn* Installation | Documentation | Examples | Support | FAQ With Intel(R) Extension for Scikit-learn you can accelera
Finding project directories in Python (data science) projects, just like there R rprojroot and here packages
Find relative paths from a project root directory Finding project directories in Python (data science) projects, just like there R here and rprojroot
Intake is a lightweight package for finding, investigating, loading and disseminating data.
Intake: A general interface for loading data Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries. Documenta
A columnar data container that can be compressed.
Unmaintained Package Notice Unfortunately, and due to lack of resources, the Blosc Development Team is unable to maintain this package anymore. During
RangDev Notepad App With Python
RangDev Notepad-App-With-Python Take down quick and speedy notes! This is a small project of a notepad app built with Tkinter and SQLite3. Database cr
Metrics to evaluate quality and efficacy of synthetic datasets.
An Open Source Project from the Data to AI Lab, at MIT Metrics for Synthetic Data Generation Projects Website: https://sdv.dev Documentation: https://
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
A DSL for data-driven computational pipelines
"Dataflow variables are spectacularly expressive in concurrent programming" Henri E. Bal , Jennifer G. Steiner , Andrew S. Tanenbaum Quick overview Ne
Integrate bus data from a variety of sources (batch processing and real time processing).
Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and r
Python script for diving image data to train test and val
dataset-division-to-train-val-test-python python script for dividing image data to train test and val If you have an image dataset in the following st
Developing a python based app prototype with KivyMD framework for a competition :))
Developing a python based app prototype with KivyMD framework for a competition :))
Data repo for one-among.us
Our Data Data repo for one-among.us File Structure Directory /people/userid/: Data for a specific person info.json5: Profile information page.md: Pr
Pydantic based mock data generation
This library offers powerful mock data generation capabilities for pydantic based models. It can also be used with other libraries that use pydantic as a foundation, for example SQLModel, Beanie and ormar.
Maze generator and solver with python
Procedural-Maze-Generator-Algorithms Check out my youtube channel : Auctux Ressources Thanks to Jamis Buck Book : Mazes for programmers Requirements P
Helping you manage your data science projects sanely.
PyDS CLI Helping you manage your data science projects sanely. Requirements Anaconda/Miniconda/Miniforge/Mambaforge (Mambaforge recommended!) git on y
Source files for the data lake demo video using the AWS TICKIT database
Data Lake Demo Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a
An encryption format offering better security, performance and ease of use than PGP.
An encryption format offering better security, performance and ease of use than PGP. File a bug if you found anything where we are worse than our competition, and we will fix it.
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"
MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi
This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.
Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
Jittor: a Just-in-time(JIT) deep learning framework Quickstart | Install | Tutorial | Chinese Jittor is a high-performance deep learning framework bas
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Command line utilities for tabular data files This is a set of command line utilities for manipulating large tabular data files. Files of numeric and
Desafio proposto pela IGTI em seu bootcamp de Cloud Data Engineer
Desafio Modulo 4 - Cloud Data Engineer Bootcamp - IGTI Objetivos Criar infraestrutura como código Utuilizando um cluster Kubernetes na Azure Ingestão
Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data.
Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data. Then used Yahoo Finance to get the related stock data and displayed them in the form of charts.
Flask-app scaffold, generate flask restful backend
Flask-app scaffold, generate flask restful backend
Auto locust load test config and worker distribution with Docker and GitHub Action
Auto locust load test config and worker distribution with Docker and GitHub Action Install Fork the repo and change the visibility option to private S
The fastai book, published as Jupyter Notebooks
English / Spanish / Korean / Chinese / Bengali / Indonesian The fastai book These notebooks cover an introduction to deep learning, fastai, and PyTorc
Extension to fastai for volumetric medical data
FAIMED 3D use fastai to quickly train fully three-dimensional models on radiological data Classification from faimed3d.all import * Load data in vari
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)
Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF
Project5 Data processing system
Project5-Data-processing-system User just needed to copy both these file to a folder and open Project5.py using cmd or using any python ide. It is to
Epidemiology analysis package
zEpid zEpid is an epidemiology analysis package, providing easy to use tools for epidemiologists coding in Python 3.5+. The purpose of this library is
Explorative Data Analysis Guidelines
Explorative Data Analysis Get data into a usable format! Find out if the following predictive modeling phase will be successful! Combine everything in
cleanlab is the data-centric ML ops package for machine learning with noisy labels.
cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and lear
Data imputations library to preprocess datasets with missing data
Impyute is a library of missing data imputation algorithms. This library was designed to be super lightweight, here's a sneak peak at what impyute can do.
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis.
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis. It is distributed under the MIT License.
Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas.
Skoot is a lightweight python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to ex
dirty_cat is a Python module for machine-learning on dirty categorical variables.
dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.
Pypeln is a simple yet powerful Python library for creating concurrent data pipelines.
Pypeln Pypeln (pronounced as "pypeline") is a simple yet powerful Python library for creating concurrent data pipelines. Main Features Simple: Pypeln
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Feature Engineering & Feature Selection A comprehensive guide [pdf] [markdown] for Feature Engineering and Feature Selection, with implementations and
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly.
Please consider citing the manuscript if you use apricot in your academic work! You can find more thorough documentation here. apricot implements subm
MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data
MCML is a toolkit for semi-supervised dimensionality reduction and quantitative analysis of Multi-Class, Multi-Label data. We demonstrate its use
OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages
OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a
Dump Data from FTDI Serial Port to Binary File on MacOS
Dump Data from FTDI Serial Port to Binary File on MacOS
Crypto Stats and Tweets Data Pipeline using Airflow
Crypto Stats and Tweets Data Pipeline using Airflow Introduction Project Overview This project was brought upon through Udacity's nanodegree program.
A package to predict protein inter-residue geometries from sequence data
trRosetta This package is a part of trRosetta protein structure prediction protocol developed in: Improved protein structure prediction using predicte
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models. Hyperactive: is very easy to lear
68 keypoint annotations for COFW test data
68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of CFOW da
Centralized whale instance using github actions, sourcing metadata from bigquery-public-data.
Whale Demo Instance: Bigquery Public Data This is a fully-functioning demo instance of the whale data catalog, actively scraping data from Bigquery's
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
FFT-accelerated Interpolation-based t-SNE (FIt-SNE) Introduction t-Stochastic Neighborhood Embedding (t-SNE) is a highly successful method for dimensi
An interactive UMAP visualization of the MNIST data set.
Code for an interactive UMAP visualization of the MNIST data set. Demo at https://grantcuster.github.io/umap-explorer/. You can read more about the de
A high-performance topological machine learning toolbox in Python
giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G
Single-Cell Analysis in Python. Scales to 1M cells.
Scanpy – Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc
3D rendered visualization of the austrian monuments registry
Visualization of the Austrian Monuments Visualization of the monument landscape of the austrian monuments registry (Bundesdenkmalamt Denkmalverzeichni
Fast 1D and 2D histogram functions in Python
About Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. Numpy's histogram functions are versatile, a
Falcon: Interactive Visual Analysis for Big Data
Falcon: Interactive Visual Analysis for Big Data Crossfilter millions of records without latencies. This project is work in progress and not documente
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.
Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To
Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns.
Make Complex Heatmaps Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. H
A set of useful perceptually uniform colormaps for plotting scientific data
Colorcet: Collection of perceptually uniform colormaps Build Status Coverage Latest dev release Latest release Docs What is it? Colorcet is a collecti
Streamlit — The fastest way to build data apps in Python
Welcome to Streamlit 👋 The fastest way to build and share data apps. Streamlit lets you turn data scripts into sharable web apps in minutes, not week
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.
Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To
Select, weight and analyze complex sample data
Sample Analytics In large-scale surveys, often complex random mechanisms are used to select samples. Estimates derived from such samples must reflect
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.
An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu
Visualization ideas for data science
Nuance I use Nuance to curate varied visualization thoughts during my data scientist career. It is not yet a package but a list of small ideas. Welcom
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
Approximate Nearest Neighbor Search for Sparse Data in Python!
Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).
Tesla App Update Differences Extractor
Tesla App Update Differences Extractor Python program that finds the differences between two versions of the Tesla App. When Tesla updates the app a l
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.
A program that analyzes data from inertia measurement units installeed in aircraft and generates g-exceedance curves
A program that analyzes data from inertia measurement units installeed in aircraft and generates g-exceedance curves
What if home automation was homoiconic? Just transformations of data? No more YAML!
radiale what if home-automation was also homoiconic? The upper or proximal row contains three bones, to which Gegenbaur has applied the terms radiale,
A streamlit app for exploring image search results from HuggingPics
title emoji colorFrom colorTo sdk app_file pinned huggingpics-explorer 🤗 blue red streamlit app.py false huggingpics-explorer A streamlit app for exp
Steganography Image/Data Injector.
Byte Steganography Image/Data Injector. For artists or people to inject their own print/data into their images. TODO Add more file formats to support.
Python module for data science and machine learning users.
dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu
Use Flask API to wrap Facebook data. Grab the wapper of Facebook public pages without an API key.
Facebook Scraper Use Flask API to wrap Facebook data. Grab the wapper of Facebook public pages without an API key. (Currently working 2021) Setup Befo
Python beta calculator that retrieves stock and market data and provides linear regressions.
Stock and Index Beta Calculator Python script that calculates the beta (β) of a stock against the chosen index. The script retrieves the data and resa
Lightweight library for accessing data and configuration
accsr This lightweight library contains utilities for managing, loading, uploading, opening and generally wrangling data and configurations. It was ba
BErt-like Neurophysiological Data Representation
BENDR BErt-like Neurophysiological Data Representation This repository contains the source code for reproducing, or extending the BERT-like self-super
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021) PyTorch implementation of SnapMix | paper Method Overview Cite
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈
Rainbow 🌈 An implementation of Rainbow DQN which outperforms the paper's (Hessel et al. 2017) results on 40% of tested games while using 20x less dat
Deep Learning with PyTorch made easy 🚀 !
Deep Learning with PyTorch made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a c
Python package for missing-data imputation with deep learning
MIDASpy Overview MIDASpy is a Python package for multiply imputing missing data using deep learning methods. The MIDASpy algorithm offers significant
Create a database, insert data and easily select it with Sqlite
sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo
A Python package to process & model ChEMBL data.
insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper
An open source utility for creating publication quality LaTex figures generated from OpenFOAM data files.
foamTEX An open source utility for creating publication quality LaTex figures generated from OpenFOAM data files. Explore the docs » Report Bug · Requ
Data Applications Project
DBMS project- Hotel Franchise Data and application project By TEAM Kurukunda Bhargavi Pamulapati Pallavi Greeshma Amaraneni What is this project about
PVE with tcaledger app for payments and simulation of payment requests
tcaledger PVE with tcaledger app for payments and simulation of payment requests. The purpose of this API is to empower users to accept cryptocurrenci
Sheet Data Image/PDF-to-CSV Converter
Sheet Data Image/PDF-to-CSV Converter
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.
Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,