786 Python Computational-social-science Libraries

A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

102 Nov 10, 2022

Algorithmic trading using machine learning.

Algorithmic Trading This machine learning algorithm was built using Python 3 and scikit-learn with a Decision Tree Classifier. The program gathers sto

101 Nov 10, 2022

Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o

637 Jan 4, 2023

Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

1.3k Jan 7, 2023

Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

1.3k Dec 29, 2022

Satellite imagery for dummies.

felicette Satellite imagery for dummies. What can you do with this tool? TL;DR: Generate JPEG earth imagery from coordinates/location name with public

1.8k Jan 3, 2023

Processing and interpolating spatial data with a twist of machine learning

Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da

468 Dec 20, 2022

Apache Superset is a Data Visualization and Data Exploration Platform

49.9k Jan 2, 2023

IPython: Productive Interactive Computing

IPython: Productive Interactive Computing Overview Welcome to IPython. Our full documentation is available on ipython.readthedocs.io and contains info

15.6k Dec 31, 2022

FlaskBB is a Forum Software written in Python using the micro framework Flask.

2.3k Dec 30, 2022

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

6k Jan 4, 2023

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Viewflow Viewflow is a framework built on the top of Airflow that enables data scientists to create materialized views. It allows data scientists to f

114 Oct 12, 2022

Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. I

2.4k Jan 2, 2023

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis

4.7k Jan 9, 2023

Probabilistic reasoning and statistical analysis in TensorFlow

TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFl

3.8k Jan 5, 2023

(AAAI' 20) A Python Toolbox for Machine Learning Model Combination

combo: A Python Toolbox for Machine Learning Model Combination Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License

606 Dec 21, 2022

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 28, 2022

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

6.2k Jan 1, 2023

A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

534 Dec 17, 2022

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.

A lightweight, GPU accelerated, SQL engine built on the RAPIDS.ai ecosystem. Get Started on app.blazingsql.com Getting Started | Documentation | Examp

1.8k Jan 2, 2023

Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

282 Dec 9, 2022

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

696 Dec 26, 2022

Python module for machine learning time series:

seglearn Seglearn is a python package for machine learning time series or sequences. It provides an integrated pipeline for segmentation, feature extr

536 Dec 29, 2022

STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

2.5k Jan 6, 2023

A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

6k Jan 6, 2023

Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

7k Jan 6, 2023

Documentation and samples for ArcGIS API for Python

ArcGIS API for Python ArcGIS API for Python is a Python library for working with maps and geospatial data, powered by web GIS. It provides simple and

1.4k Dec 30, 2022

"Very simple but works well" Computer Vision based ID verification solution provided by LibraX.

ID Verification by LibraX.ai This is the first free Identity verification in the market. LibraX.ai is an identity verification platform for developers

46 Dec 6, 2022

A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

41 Dec 27, 2022

A collection of full-stack resources for programmers.

22.3k Dec 30, 2022

Baselines for TrajNet++

TrajNet++ : The Trajectory Forecasting Framework PyTorch implementation of Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TrajNet

183 Jan 5, 2023

TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics.

Machine learning metrics for distributed, scalable PyTorch applications.

1.2k Jan 6, 2023

Consolidating and extending hosts files from several well-curated sources. You can optionally pick extensions to block pornography, social media, and other categories.

Take Note! With the exception of issues and PRs regarding changes to hosts/data/StevenBlack/hosts, all other issues regarding the content of the produ

22.1k Jan 2, 2023

🦉Data Version Control | Git for Data & Models

Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List Data Version Control or DVC is an open-source tool for data sci

10.9k Jan 9, 2023

Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

97 Jan 3, 2023

Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 1, 2023

An extension to pandas dataframes describe function.

pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie

450 Dec 30, 2022

Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

112 Oct 23, 2022

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Stable Baselines Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. You can read a

3.7k Jan 1, 2023

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

Master status: Development status: Package information: scikit-rebate This package includes a scikit-learn-compatible Python implementation of ReBATE,

374 Dec 15, 2022

A fast xgboost feature selection algorithm

BoostARoota A Fast XGBoost Feature Selection Algorithm (plus other sklearn tree-based classifiers) Why Create Another Algorithm? Automated processes l

187 Dec 22, 2022

Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

7k Jan 3, 2023

An open source python library for automated feature engineering

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to

6.4k Jan 5, 2023

Build, test, deploy, iterate - Dev and prod tool for data science pipelines

Prodmodel is a build system for data science pipelines. Users, testers, contributors are welcome! Motivation · Concepts · Installation · Usage · Contr

53 Nov 29, 2022

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

BatchFlow BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflo

185 Dec 20, 2022

Easy pipelines for pandas DataFrames.

pdpipe ˨ Easy pipelines for pandas DataFrames (learn how!). Website: https://pdpipe.github.io/pdpipe/ Documentation: https://pdpipe.github.io/pdpipe/d

694 Jan 5, 2023

Koalas: pandas API on Apache Spark

pandas API on Apache Spark Explore Koalas docs » Live notebook · Issues · Mailing list Help Thirsty Koalas Devastated by Recent Fires The Koalas proje

3.2k Jan 4, 2023

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

pysparkling Pysparkling provides a faster, more responsive way to develop programs for PySpark. It enables code intended for Spark applications to exe

254 Dec 6, 2022

Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 1, 2023

A library that implements fairness-aware machine learning algorithms

Themis ML themis-ml is a Python library built on top of pandas and sklearnthat implements fairness-aware machine learning algorithms. Fairness-aware M

105 Dec 30, 2022

A library for debugging/inspecting machine learning classifiers and explaining their predictions

ELI5 ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. It provides support for the following m

2.6k Dec 30, 2022

An intuitive library to add plotting functionality to scikit-learn objects.

Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i

2.3k Dec 31, 2022

Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

3.2k Jan 1, 2023

Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

1.6k Dec 21, 2022

Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

3.2k Dec 23, 2022

A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

534 Dec 17, 2022

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

6.9k Jan 5, 2023

Python-based implementations of algorithms for learning on imbalanced data.

ND DIAL: Imbalanced Algorithms Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learn

220 Dec 13, 2022

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

6.2k Jan 1, 2023

MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

1.4k Jan 6, 2023

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

8.9k Jan 9, 2023

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Little Ball of Fur is a graph sampling extension library for Python. Please look at the Documentation, relevant Paper, Promo video and External Resour

619 Dec 14, 2022

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

1.8k Jan 3, 2023

[HELP REQUESTED] Generalized Additive Models in Python

pyGAM Generalized Additive Models in Python. Documentation Official pyGAM Documentation: Read the Docs Building interpretable models with Generalized

747 Jan 5, 2023

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm

1.4k Jan 1, 2023

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 29, 2022

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

3k Jan 8, 2023

Qwerkey is a social media platform for connecting and learning more about mechanical keyboards built on React and Redux in the frontend and Flask in the backend on top of a PostgreSQL database.

Flask React Project This is the backend for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

22 Dec 20, 2022

sangha, pronounced "suhng-guh", is a social networking, booking platform where students and teachers can share their practice.

Flask React Project This is the backend for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

17 Sep 29, 2021

Abilian Social Business Engine - an enterprise social networking / collaboration platform.

About Abilian SBE (Social Business Engine) is a platform for social business applications, and more specifically collaborative / enterprise 2.0 busine

63 Dec 29, 2022

ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm

267 Jan 1, 2023

🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

3.9k Jan 5, 2023

Data intensive science for everyone.

The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy

1k Jan 8, 2023

A Python library that helps data scientists to infer causation rather than observing correlation.

1.7k Jan 4, 2023

Abilian Social Business Engine - an enterprise social networking / collaboration platform.

About Abilian SBE (Social Business Engine) is a platform for social business applications, and more specifically collaborative / enterprise 2.0 busine

63 Dec 29, 2022

🦉Data Version Control | Git for Data & Models

Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List Data Version Control or DVC is an open-source tool for data sci

10.9k Jan 5, 2023

Social Distancing Detector using deep learning and capable to run on edge AI devices such as NVIDIA Jetson, Google Coral, and more.

Smart Social Distancing Smart Social Distancing Introduction Getting Started Prerequisites Usage Processor Optional Parameters Configuring AWS credent

129 Dec 12, 2022

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ranging from simulation engineering up to agent development, training and deployment.

222 Dec 24, 2022

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

5.4k Feb 20, 2021

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf 2021). Abstract In this work we propose Pathfind

49 Dec 1, 2022

The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.

SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application. We do this through concise algorithms that do the job in the least jargon possible and examples to guide you through every step of the way.

324 Dec 27, 2022

Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

11.7k Feb 18, 2021

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

718 Feb 18, 2021

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

1.5k Feb 17, 2021

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

VADER-Sentiment-Analysis VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifica

2.8k Feb 18, 2021

An open-source NLP research library, built on PyTorch.

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic

9.7k Feb 18, 2021

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

19.6k Feb 18, 2021

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

pivottablejs: the Python module Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js Installation pip install pivot

419 Feb 11, 2021

An open-source plotting library for statistical data.

Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le

509 Feb 17, 2021

Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

1.2k Feb 18, 2021

Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

2.8k Feb 18, 2021

3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

692 Feb 18, 2021

Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

6.8k Feb 18, 2021

Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t

14k Feb 18, 2021

Statistical data visualization using matplotlib

seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing

8.1k Feb 18, 2021

Inner ear models for Python

cochlea cochlea is a collection of inner ear models. All models are easily accessible as Python functions. They take sound signal as input and return

98 Jan 5, 2023

Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as a REST API Endpoint.

Jupter Notebook REST API Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as

54 Nov 4, 2022

Download courses from khanacademy.org

khan-dl A python script to download courses from Khan Academy using youtube-dl and beautifulsoup4.

806 Jan 3, 2023

CJK computer science terms comparison / 中日韓電腦科學術語對照 / 日中韓のコンピュータ科学の用語対照 / 한·중·일 전산학 용어 대조

CJK computer science terms comparison This repository contains the source code of the website. You can see the website from the following link: Englis

88 Dec 23, 2022

Python Computational-social-science Resources

Python computational-social-science Libraries

A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

Algorithmic trading using machine learning.

Introducing neural networks to predict stock prices

Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Using python and scikit-learn to make stock predictions

Satellite imagery for dummies.

Processing and interpolating spatial data with a twist of machine learning

Apache Superset is a Data Visualization and Data Exploration Platform

IPython: Productive Interactive Computing

FlaskBB is a Forum Software written in Python using the micro framework Flask.

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

Probabilistic reasoning and statistical analysis in TensorFlow

(AAAI' 20) A Python Toolbox for Machine Learning Model Combination

A library of extension and helper modules for Python's data analysis and machine learning libraries.

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

A simplified framework and utilities for PyTorch

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.

Distributed scikit-learn meta-estimators in PySpark

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

Python module for machine learning time series:

STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

A unified framework for machine learning with time series

Automatic extraction of relevant features from time series:

Documentation and samples for ArcGIS API for Python

"Very simple but works well" Computer Vision based ID verification solution provided by LibraX.

A set of workflows for corpus building through OCR, post-correction and normalisation

A collection of full-stack resources for programmers.

Baselines for TrajNet++

TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics.

Consolidating and extending hosts files from several well-curated sources. You can optionally pick extensions to block pornography, social media, and other categories.

🦉Data Version Control | Git for Data & Models

Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Create HTML profiling reports from pandas DataFrame objects

An extension to pandas dataframes describe function.

Supervised domain-agnostic prediction framework for probabilistic modelling

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

A fast xgboost feature selection algorithm

Automatic extraction of relevant features from time series:

An open source python library for automated feature engineering

Build, test, deploy, iterate - Dev and prod tool for data science pipelines

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

Easy pipelines for pandas DataFrames.

Koalas: pandas API on Apache Spark

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Create HTML profiling reports from pandas DataFrame objects

A library that implements fairness-aware machine learning algorithms

A library for debugging/inspecting machine learning classifiers and explaining their predictions

An intuitive library to add plotting functionality to scikit-learn objects.

Python library that makes it easy for data scientists to create charts.

Keras community contributions

Machine Learning Platform for Kubernetes

A simplified framework and utilities for PyTorch

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Python-based implementations of algorithms for learning on imbalanced data.

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

MLBox is a powerful Automated Machine Learning python library.

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

[HELP REQUESTED] Generalized Additive Models in Python

50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

A library of extension and helper modules for Python's data analysis and machine learning libraries.

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

Qwerkey is a social media platform for connecting and learning more about mechanical keyboards built on React and Redux in the frontend and Flask in the backend on top of a PostgreSQL database.

sangha, pronounced "suhng-guh", is a social networking, booking platform where students and teachers can share their practice.

Abilian Social Business Engine - an enterprise social networking / collaboration platform.

ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Data intensive science for everyone.

A Python library that helps data scientists to infer causation rather than observing correlation.

Abilian Social Business Engine - an enterprise social networking / collaboration platform.

🦉Data Version Control | Git for Data & Models

Social Distancing Detector using deep learning and capable to run on edge AI devices such as NVIDIA Jetson, Google Coral, and more.