3196 Repositories
Python data-processing Libraries
A fast and reliable background task processing library for Python 3.
dramatiq A fast and reliable distributed task processing library for Python 3. Changelog: https://dramatiq.io/changelog.html Community: https://groups
Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge
Statsmodels: statistical modeling and econometrics in Python
About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an
Rich Python data types for Redis
Created by Stephen McDonald Introduction HOT Redis is a wrapper library for the redis-py client. Rather than calling the Redis commands directly from
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
dataset: databases for lazy people In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files. Read the
Safely pass trusted data to untrusted environments and back.
ItsDangerous ... so better sign this Various helpers to pass data to untrusted environments and to get it back safe and sound. Data is cryptographical
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Welcome to TensorWatch TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Micr
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t
Multi-class confusion matrix library in Python
Table of contents Overview Installation Usage Document Try PyCM in Your Browser Issues & Bug Reports Todo Outputs Dependencies Contribution References
:bowtie: Create a dashboard with python!
Installation | Documentation | Gitter Chat | Google Group Bowtie Introduction Bowtie is a library for writing dashboards in Python. No need to know we
An intuitive library to add plotting functionality to scikit-learn objects.
Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i
Tools for exploratory data analysis in Python
Dora Exploratory data analysis toolkit for Python. Contents Summary Setup Usage Reading Data & Configuration Cleaning Feature Selection & Extraction V
Apache Superset is a Data Visualization and Data Exploration Platform
Superset A modern, enterprise-ready business intelligence web application. Why Superset? | Supported Databases | Installation and Configuration | Rele
The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction
windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr
The Python ensemble sampling toolkit for affine-invariant MCMC
emcee The Python ensemble sampling toolkit for affine-invariant MCMC emcee is a stable, well tested Python implementation of the affine-invariant ense
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Official Stanford NLP Python Library for Many Human Languages
Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac
Basic Utilities for PyTorch Natural Language Processing (NLP)
Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor
Topic Modelling for Humans
gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ
🗣️ NALP is a library that covers Natural Adversarial Language Processing.
NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!
Multilingual text (NLP) processing toolkit
polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p
Yet Another Sequence Encoder - Encode sequences to vector of vector in python !
Yase Yet Another Sequence Encoder - encode sequences to vector of vectors in python ! Why Yase ? Yase enable you to encode any sequence which can be r
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual
NLP, before and after spaCy
textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig
💫 Industrial-strength Natural Language Processing (NLP) in Python
spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.
Colibri Core by Maarten van Gompel, [email protected], Radboud University Nijmegen Licensed under GPLv3 (See http://www.gnu.org/licenses/gpl-3.0.html
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Ucto for Python This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task,
A Python package implementing a new model for text classification with visualization tools for Explainable AI :octocat:
A Python package implementing a new model for text classification with visualization tools for Explainable AI 🍣 Online live demos: http://tworld.io/s
Tools, wrappers, etc... for data science with a concentration on text processing
Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess
Python library for processing Chinese text
SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par
Create UIs for prototyping your machine learning model in 3 minutes
Note: We just launched Hosted, where anyone can upload their interface for permanent hosting. Check it out! Welcome to Gradio Quickly create customiza
A unified framework for machine learning with time series
Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible
A Peer-to-peer Platform for Secure, Privacy-preserving, Decentralized Data Science
PyGrid is a peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft. PyGrid is also the central serv
A library for answering questions using data you cannot see
A library for computing on data you do not own and cannot see PySyft is a Python library for secure and private Deep Learning. PySyft decouples privat
Automates Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning :rocket:
MLJAR Automated Machine Learning Documentation: https://supervised.mljar.com/ Source Code: https://github.com/mljar/mljar-supervised Table of Contents
A machine learning package for streaming data in Python. The other ancestor of River.
scikit-multiflow is a machine learning package for streaming data in Python. creme and scikit-multiflow are merging into a new project called River. W
Accelerated deep learning R&D
Accelerated deep learning R&D PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and
🌊 Online machine learning in Python
In a nutshell River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition
MiraiML: asynchronous, autonomous and continuous Machine Learning in Python
MiraiML Mirai: future in japanese. MiraiML is an asynchronous engine for continuous & autonomous machine learning, built for real-time usage. Usage In
StellarGraph - Machine Learning on Graphs
StellarGraph Machine Learning Library StellarGraph is a Python library for machine learning on graphs and networks. Table of Contents Introduction Get
Best Practices on Recommendation Systems
Recommenders What's New (February 4, 2021) We have a new relase Recommenders 2021.2! It comes with lots of bug fixes, optimizations and 3 new algorith
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Machine Learning From Scratch About Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose
Toolbox of models, callbacks, and datasets for AI/ML researchers.
Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch Website • Installation • Main
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
Xcessiv Xcessiv is a tool to help you create the biggest, craziest, and most excessive stacked ensembles you can think of. Stacked ensembles are simpl
Deep learning library featuring a higher-level API for TensorFlow.
TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2020 Links Doc
Shōgun
The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat
Topic Modelling for Humans
gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par
Serverless proxy for Spark cluster
Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Bayesian Methods for Hackers Using Python and PyMC The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chap
[UNMAINTAINED] Automated machine learning for analytics & production
auto_ml Automated machine learning for production and analytics Installation pip install auto_ml Getting started from auto_ml import Predictor from au
Lightweight, Python library for fast and reproducible experimentation :microscope:
Steppy What is Steppy? Steppy is a lightweight, open-source, Python 3 library for fast and reproducible experimentation. Steppy lets data scientist fo
(JMLR'19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Python Outlier Detection (PyOD) Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License PyOD is a comprehensive and sca
A Temporal Extension Library for PyTorch Geometric
Documentation | External Resources | Datasets PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library
a delightful machine learning tool that allows you to train, test and use models without writing code
igel A delightful machine learning tool that allows you to train/fit, test and use models without writing code Note I'm also working on a GUI desktop
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.
Documentation | External Resources | Research Paper Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble. The
A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.
imutils A series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displ
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125
Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc
Image processing in Python
scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl
The easiest way to automate your data
Hello, world! 👋 We've rebuilt data engineering for the data science era. Prefect is a new workflow management system, designed for modern infrastruct
python binding for libvips using cffi
README PyPI package: https://pypi.python.org/pypi/pyvips conda package: https://anaconda.org/conda-forge/pyvips We have formatted docs online here: ht
The friendly PIL fork (Python Imaging Library)
Pillow Python Imaging Library (Fork) Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python Imaging Library by Fredrik Lund
AkShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库
Overview AkShare requires Python(64 bit) 3.7 or greater, aims to make fetch financial data as convenient as possible. Write less, get more! Documentat
Python Stream Processing
Python Stream Processing Version: 1.10.4 Web: http://faust.readthedocs.io/ Download: http://pypi.org/project/faust Source: http://github.com/robinhood
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear
Deep Learning for humans
Keras: Deep Learning for Python Under Construction In the near future, this repository will be used once again for developing the Keras codebase. For
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
A grammar of graphics for Python
plotnine Latest Release License DOI Build Status Coverage Documentation plotnine is an implementation of a grammar of graphics in Python, it is based
Interactive Data Visualization in the browser, from Python
Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi
:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof
Business Intelligence (BI) in Python, OLAP
Open Mining Business Intelligence (BI) Application Server written in Python Requirements Python 2.7 (Backend) Lua 5.2 or LuaJIT 5.1 (OML backend) Mong
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
The Open Source Framework for Machine Vision
SimpleCV Quick Links: About Installation [Docker] (#docker) Ubuntu Virtual Environment Arch Linux Fedora MacOS Windows Raspberry Pi SimpleCV Shell Vid
Open Source Differentiable Computer Vision Library for PyTorch
Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
EasyOCR Ready-to-use OCR with 80+ languages supported including Chinese, Japanese, Korean and Thai. What's new 1 February 2021 - Version 1.2.3 Add set
Typical: Fast, simple, & correct data-validation using Python 3 typing.
typical: Python's Typing Toolkit Introduction Typical is a library devoted to runtime analysis, inference, validation, and enforcement of Python types
Data parsing and validation using Python type hints
pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De
PyArmadillo: an alternative approach to linear algebra in Python
PyArmadillo is a linear algebra library for the Python language, with an emphasis on ease of use.
Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python 2 or 3
tinytag tinytag is a library for reading music meta data of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python Install pip install tinytag
eyeD3 is a Python module and command line program for processing ID3 tags. Information about mp3 files (i.e bit rate, sample frequency, play time, etc.) is also provided. The formats supported are ID3v1 (1.0/1.1) and ID3v2 (2.3/2.4).
Status About eyeD3 is a Python tool for working with audio files, specifically MP3 files containing ID3 metadata (i.e. song info). It provides a comma
Scalable audio processing framework written in Python with a RESTful API
TimeSide : scalable audio processing framework and server written in Python TimeSide is a python framework enabling low and high level audio analysis,
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A Python library for audio feature extraction, classification, segmentation and applications This doc contains general info. Click here for the comple
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
Python Sorted Containers Sorted Containers is an Apache2 licensed sorted collections library, written in pure-Python, and fast as C-extensions. Python
Repository for data structure and algorithms in Python for coding interviews
Python Data Structures and Algorithms This repository contains questions requiring implementation of data structures and algorithms concepts. It is us
Minimal examples of data structures and algorithms in Python
Pythonic Data Structures and Algorithms Minimal and clean example implementations of data structures and algorithms in Python 3. Contributing Thanks f
A Django app that creates automatic web UIs for Python scripts.
Wooey is a simple web interface to run command line Python scripts. Think of it as an easy way to get your scripts up on the web for routine data anal