2784 Repositories
Python data-quality-monitoring Libraries
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Quickly and accurately render even the largest data.
Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da
With Holoviews, your data visualizes itself.
HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Uniform Manifold Approximation and Projection
UMAP Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, bu
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Interactive Data Visualization in the browser, from Python
Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
A toolkit for making real world machine learning and data analysis applications in C++
dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl
fklearn: Functional Machine Learning
fklearn: Functional Machine Learning fklearn uses functional programming principles to make it easier to solve real problems with Machine Learning. Th
Shōgun
The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
Overview | Tutorials | Examples | Installation | FAQ | How to Cite Welcome to ktrain News and Announcements 2020-11-08: ktrain v0.25.x is released and
Deep learning library featuring a higher-level API for TensorFlow.
TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Deep Learning for humans
Keras: Deep Learning for Python Under Construction In the near future, this repository will be used once again for developing the Keras codebase. For
Statsmodels: statistical modeling and econometrics in Python
About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op
scikit-learn: machine learning in Python
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started
Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.
starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat
Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
Glances - An eye on your system Summary Glances is a cross-platform monitoring tool which aims to present a large amount of monitoring information thr
Cross-platform lib for process and system monitoring in Python
Home Install Documentation Download Forum Blog Funding What's new Summary psutil (process and system utilities) is a cross-platform library for retrie
pytest plugin for manipulating test data directories and files
pytest-datadir pytest plugin for manipulating test data directories and files. Usage pytest-datadir will look up for a directory with the name of your
Data-Driven Tests for Python Unittest
DDT (Data-Driven Tests) allows you to multiply one test case by running it with different test data, and make it appear as multiple test cases. Instal
Inject code into running Python processes
pyrasite Tools for injecting arbitrary code into running Python processes. homepage: http://pyrasite.com documentation: http://pyrasite.rtfd.org downl
Explain yourself! Interrogate a codebase for docstring coverage.
interrogate: explain yourself Interrogate a codebase for docstring coverage. Why Do I Need This? interrogate checks your code base for missing docstri
Display tabular data in a visually appealing ASCII table format
PrettyTable Installation Install via pip: python -m pip install -U prettytable Install latest development version: python -m pip install -U git+https
A new kind of Progress Bar, with real time throughput, eta and very cool animations!
alive-progress :) A new kind of Progress Bar, with real-time throughput, eta and very cool animations! Ever found yourself in a remote ssh session, do
Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
python-tabulate Pretty-print tabular data in Python, a library and a command-line utility. The main use cases of the library are: printing small table
A tool for measuring Python class cohesion.
Cohesion Cohesion is a tool for measuring Python class cohesion. In computer programming, cohesion refers to the degree to which the elements of a mod
Monitoring tool based on radon
xenon Xenon is a monitoring tool based on Radon. It monitors your code's complexity. Ideally, Xenon is run every time you commit code. Through command
Various code metrics for Python code
Radon Radon is a Python tool that computes various metrics from the source code. Radon can compute: McCabe's complexity, i.e. cyclomatic complexity ra
Flake8 wrapper to make it nice, legacy-friendly, configurable.
THE PROJECT IS ARCHIVED Forks: https://github.com/orsinium/forks It's a Flake8 wrapper to make it cool. Lint md, rst, ipynb, and more. Shareable and r
Flake8 plugin to find commented out or dead code
flake8-eradicate flake8 plugin to find commented out (or so called "dead") code. This is quite important for the project in a long run. Based on eradi
The strictest and most opinionated python linter ever!
wemake-python-styleguide Welcome to the strictest and most opinionated python linter ever. wemake-python-styleguide is actually a flake8 plugin with s
Performant type-checking for python.
Pyre is a performant type checker for Python compliant with PEP 484. Pyre can analyze codebases with millions of lines of code incrementally – providi
A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycodestyle.
flake8-bugbear A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycode
It's not just a linter that annoys you!
README for Pylint - https://pylint.pycqa.org/ Professional support for pylint is available as part of the Tidelift Subscription. Tidelift gives softwa
Lazydata: Scalable data dependencies for Python projects
lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data
MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
PyPika - Python Query Builder Abstract What is PyPika? PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a s
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
dataset: databases for lazy people In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files. Read the
Pandas Google BigQuery
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If
Lightweight data validation and adaptation Python library.
Valideer Lightweight data validation and adaptation library for Python. At a Glance: Supports both validation (check if a value is valid) and adaptati
Python Data Structures for Humans™.
Schematics Python Data Structures for Humans™. About Project documentation: https://schematics.readthedocs.io/en/latest/ Schematics is a Python librar
Typical: Fast, simple, & correct data-validation using Python 3 typing.
typical: Python's Typing Toolkit Introduction Typical is a library devoted to runtime analysis, inference, validation, and enforcement of Python types
A simple, fast, extensible python library for data validation.
Validr A simple, fast, extensible python library for data validation. Simple and readable schema 10X faster than jsonschema, 40X faster than schematic
Python Data Validation for Humans™.
validators Python data validation for Humans. Python has all kinds of data validation tools, but every one of them seems to require defining a schema
CONTRIBUTIONS ONLY: Voluptuous, despite the name, is a Python data validation library.
CONTRIBUTIONS ONLY What does this mean? I do not have time to fix issues myself. The only way fixes or new features will be added is by people submitt
Lightweight, extensible data validation library for Python
Cerberus Cerberus is a lightweight and extensible data validation library for Python. v = Validator({'name': {'type': 'string'}}) v.validate({
Data parsing and validation using Python type hints
pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De
Protocol Buffers - Google's data interchange format
Protocol Buffers - Google's data interchange format Copyright 2008 Google Inc. https://developers.google.com/protocol-buffers/ Overview Protocol Buffe
Easily share data across your company via SQL queries. From Grove Collab.
SQL Explorer SQL Explorer aims to make the flow of data between people fast, simple, and confusion-free. It is a Django-based application that you can
A reusable Django model field for storing ad-hoc JSON data
jsonfield jsonfield is a reusable model field that allows you to store validated JSON, automatically handling serialization to and from the database.
Django application and library for importing and exporting data with admin integration.
django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur
a flask profiler which watches endpoint calls and tries to make some analysis.
Flask-profiler version: 1.8 Flask-profiler measures endpoints defined in your flask application; and provides you fine-grained report through a web in
A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.
Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu
Django Smuggler is a pluggable application for Django Web Framework that helps you to import/export fixtures via the automatically-generated administration interface.
Django Smuggler Django Smuggler is a pluggable application for Django Web Framework to easily dump/load fixtures via the automatically-generated admin
Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.
starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
GoAccess What is it? GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through y
Real-time metrics for nginx server
ngxtop - real-time metrics for nginx server (and others) ngxtop parses your nginx access log and outputs useful, top-like, metrics of your nginx serve
Automatically monitor the evolving performance of Flask/Python web services.
Flask Monitoring Dashboard A dashboard for automatic monitoring of Flask web-services. Key Features • How to use • Live Demo • Feedback • Documentatio
Sentry is cross-platform application monitoring, with a focus on error reporting.
Users and logs provide clues. Sentry provides answers. What's Sentry? Sentry is a service that helps you monitor and fix crashes in realtime. The serv
Library to scrape and clean web pages to create massive datasets.
lazynlp A straightforward library that allows you to crawl, clean up, and deduplicate webpages to create massive monolingual datasets. Using this libr
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re
IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies
IMDbPY is a Python package for retrieving and managing the data of the IMDb movie database about movies, people and companies. Revamp notice Starting
TuShare is a utility for crawling historical data of China stocks
TuShare Tushare Pro版已发布,请访问新的官网了解和查询数据接口! https://tushare.pro TuShare是实现对股票/期货等金融数据从数据采集、清洗加工 到 数据存储过程的工具,满足金融量化分析师和学习数据分析的人在数据获取方面的需求,它的特点是数据覆盖范围广,接口
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Best-of Machine Learning with Python 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly. This curated list contains 840 awe
Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source.
Mockoon Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source. It has been built wi
Script used to download data for stocks.
This script is useful for downloading stock market data for a wide range of companies specified by their respective tickers. The script reads in the d
🏆 A ranked list of awesome Python open-source libraries and tools. Updated weekly.
Best-of Python 🏆 A ranked list of awesome Python open-source libraries & tools. Updated weekly. This curated list contains 230 awesome open-source pr
DeiT: Data-efficient Image Transformers
DeiT: Data-efficient Image Transformers This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient
Deploy a ML inference service on a budget in less than 10 lines of code.
BudgetML is perfect for practitioners who would like to quickly deploy their models to an endpoint, but not waste a lot of time, money, and effort trying to figure out how to do this end-to-end.
This is a database of 180.000+ symbols containing Equities, ETFs, Funds, Indices, Futures, Options, Currencies, Cryptocurrencies and Money Markets.
Finance Database As a private investor, the sheer amount of information that can be found on the internet is rather daunting.
Python AsyncIO data API to manage billions of resources
Introduction Please read the detailed docs This is the working project of the next generation Guillotina server based on asyncio. Dependencies Python
⚾🤖⚾ Automatic baseball pitching overlay in realtime
⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.
WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.
Graphing communities on Twitch.tv in a visually intuitive way
VisualizingTwitchCommunities This project maps communities of streamers on Twitch.tv based on shared viewership. The data is collected from the Twitch
Converts Betaflight blackbox gyro to MP4 GoPro Meta data so it can be used with ReelSteady GO
Here are a bunch of scripts that I created some time ago as a proof of concept that Betaflight blackbox gyro data can be converted to GoPro Metadata F
Export your data from Xiami
Xiami Exporter 导出虾米音乐的个人数据,功能: 导出歌曲为 json 收藏歌曲 收藏专辑 播放列表 导出收藏艺人为 json 导出收藏专辑为 json 导出播放列表为 json (个人和收藏) 将导出的数据整理至 sqlite 数据库 收藏歌曲 收藏艺人 收藏专辑 播放列表 下载已导出
An API serving data on all creatures, monsters, materials, equipment, and treasure in The Legend of Zelda: Breath of the Wild
Hyrule Compendium API An API serving data on all creatures, monsters, materials, equipment, and treasure in The Legend of Zelda: Breath of the Wild. B
Soda SQL Data testing, monitoring and profiling for SQL accessible data.
Soda SQL Data testing, monitoring and profiling for SQL accessible data. What does Soda SQL do? Soda SQL allows you to Stop your pipeline when bad dat
HTTP(s) "monitoring" webpage via FastAPI+Jinja2. Inspired by https://github.com/RaymiiOrg/bash-http-monitoring
python-http-monitoring HTTP(s) "monitoring" powered by FastAPI+Jinja2+aiohttp. Inspired by bash-http-monitoring. Installation can be done with pipenv
A framework for joint super-resolution and image synthesis, without requiring real training data
SynthSR This repository contains code to train a Convolutional Neural Network (CNN) for Super-resolution (SR), or joint SR and data synthesis. The met
data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer"
C2F-FWN data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer" (https://arxiv.org/abs/
This repository is related to an Arabic tutorial, within the tutorial we discuss the common data structure and algorithms and their worst and best case for each, then implement the code using Python.
Data Structure and Algorithms with Python This repository is related to the Arabic tutorial here, within the tutorial we discuss the common data struc
A Pythonic client for the official https://data.gov.gr API.
pydatagovgr An unofficial Pythonic client for the official data.gov.gr API. Aims to be an easy, intuitive and out-of-the-box way to: find data publish
A minimalistic library designed to provide native access to YNAB data from Python
pYNAB A minimalistic library designed to provide native access to YNAB data from Python. Install The simplest way is to install the latest version fro
Python wrapper for the Sportradar APIs ⚽️🏈
Sportradar APIs This is a Python wrapper for the sports APIs provided by Sportradar. You'll need to sign up for an API key to use the service. Sportra
Python client for the Socrata Open Data API
sodapy sodapy is a python client for the Socrata Open Data API. Installation You can install with pip install sodapy. If you want to install from sour
A Python API to retrieve and read MLB GameDay data
mlbgame mlbgame is a Python API to retrieve and read MLB GameDay data. mlbgame works with real time data, getting information as games are being playe
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
:snake: A simple library to fetch data from the iTunes Store API made for Python = 3.5
itunespy itunespy is a simple library to fetch data from the iTunes Store API made for Python 3.5 and beyond. Important: Since version 1.6 itunespy no
Backtest 1000s of minute-by-minute trading algorithms for training AI with automated pricing data from: IEX, Tradier and FinViz. Datasets and trading performance automatically published to S3 for building AI training datasets for teaching DNNs how to trade. Runs on Kubernetes and docker-compose. 150 million trading history rows generated from +5000 algorithms. Heads up: Yahoo's Finance API was disabled on 2019-01-03 https://developer.yahoo.com/yql/
Stock Analysis Engine Build and tune investment algorithms for use with artificial intelligence (deep neural networks) with a distributed stack for ru
Python SDK for IEX Cloud
iexfinance Python SDK for IEX Cloud. Architecture mirrors that of the IEX Cloud API (and its documentation). An easy-to-use toolkit to obtain data for