3052 Repositories
Python research-data-management Libraries
:computer: Data Structures and Algorithms in Python
Algorithms in Python Implementations of a few algorithms and datastructures for fun and profit! Completed Karatsuba Multiplication Basic Sorting Rabin
Python library that makes it easy for data scientists to create charts.
Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l
Keras community contributions
keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens
Machine Learning Platform for Kubernetes
Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica
A simplified framework and utilities for PyTorch
Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne
Data loaders and abstractions for text and NLP
torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
TensorFlow implementation of an arbitrary order Factorization Machine
This is a TensorFlow implementation of an arbitrary order (=2) Factorization Machine based on paper Factorization Machines with libFM. It supports: d
Python-based implementations of algorithms for learning on imbalanced data.
ND DIAL: Imbalanced Algorithms Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla
MLBox is a powerful Automated Machine Learning python library.
MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar
A machine learning toolkit dedicated to time-series data
tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti
[HELP REQUESTED] Generalized Additive Models in Python
pyGAM Generalized Additive Models in Python. Documentation Official pyGAM Documentation: Read the Docs Building interpretable models with Generalized
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc
A toolkit for making real world machine learning and data analysis applications in C++
dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.
NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,
daily report of @arkinvest ETF activity + data collection
ark_invest daily weekday report of @arkinvest ETF activity + data collection This script was created to: Extract and save daily csv's from ARKInvest's
test
Lidar-data-decode In this project, you can decode your lidar data frame(pcap file) and make your own datasets(test dataset) in Windows without any hug
A shopping list and kitchen inventory management app.
Flask React Project This is the backend for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github
An automated tool that fetches information about your crypto stake and generates historical data in time.
Introduction Yield explorer is a WIP! I needed a tool that would show me historical data and performance of my staked crypto but was unable to find a
Deep Illuminator is a data augmentation tool designed for image relighting.
Deep Illuminator Deep Illuminator is a data augmentation tool designed for image relighting. It can be used to easily and efficiently genera
Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly
Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly Code for this paper Ultra-Data-Efficient GAN Tra
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm
🍊 :bar_chart: :bulb: Orange: Interactive data analysis
Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program
3D visualization of scientific data in Python
Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav
Data intensive science for everyone.
The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work
An interactive explorer for single-cell transcriptomics data
an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra
RELATE is an Environment for Learning And TEaching
RELATE Relate is an Environment for Learning And TEaching RELATE is a web-based courseware package. It is set apart by the following features: Focus o
A Python library that helps data scientists to infer causation rather than observing correlation.
A Python library that helps data scientists to infer causation rather than observing correlation.
Conference planning tool: CfP, scheduling, speaker management
pretalx is a conference planning tool focused on providing the best experience for organisers, speakers, reviewers, and attendees alike. It handles th
Indico - A feature-rich event management system, made @ CERN, the place where the Web was born.
Indico Indico is: 🗓 a general-purpose event management tool; 🌍 fully web-based; 🧩 feature-rich but also extensible through the use of plugins; ⚖️ O
ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s
Agile project management platform. Built on top of Django and AngularJS
Taiga Backend Documentation Currently, we have authored three main documentation hubs: API: Our API documentation and reference for developing from Ta
Conference planning tool: CfP, scheduling, speaker management
pretalx is a conference planning tool focused on providing the best experience for organisers, speakers, reviewers, and attendees alike. It handles th
Open source platform for the machine learning lifecycle
MLflow: A Machine Learning Lifecycle Platform MLflow is a platform to streamline machine learning development, including tracking experiments, packagi
Indico - A feature-rich event management system, made @ CERN, the place where the Web was born.
Indico Indico is: 🗓 a general-purpose event management tool; 🌍 fully web-based; 🧩 feature-rich but also extensible through the use of plugins; ⚖️ O
🦉Data Version Control | Git for Data & Models
Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List Data Version Control or DVC is an open-source tool for data sci
ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s
Admin Panel for GinoORM - ready to up & run (just add your models)
Gino-Admin Docs (state: in process): Gino-Admin docs Play with Demo (current master 0.2.3) Gino-Admin demo (login: admin, pass: 1234) Admin
Ready-to-use and customizable users management for FastAPI
FastAPI Users Ready-to-use and customizable users management for FastAPI Documentation: https://frankie567.github.io/fastapi-users/ Source Code: https
Backend, modern REST API for obtaining match and odds data crawled from multiple sites. Using FastAPI, MongoDB as database, Motor as async MongoDB client, Scrapy as crawler and Docker.
Introduction Apiestas is a project composed of a backend powered by the awesome framework FastAPI and a crawler powered by Scrapy. This project has fo
API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.
This open source project serves two purposes. Collection and evaluation of a Question Answering dataset to improve existing QA/search methods - COVID-
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ranging from simulation engineering up to agent development, training and deployment.
Data Visualizer for Super Mario Kart (SNES)
Data Visualizer for Super Mario Kart (SNES)
💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your DSA journey.📰🔥 This repository contains the REST APIs of the application.✨
Algo-Phantom-Backend 💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your D
Dynamic image server for web and print
Quru Image Server - dynamic imaging for web and print QIS is a high performance web server for creating and delivering dynamic images. It is ideal for
This is a new web-based photo management application. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, location awareness, color analysis and other ML algorithms.
Photonix Photo Manager This is a photo management application based on web technologies. Run it on your home server and it will let you find what you
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.
Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis
Generate a roam research like Network Graph view from your Notion pages.
Notion Graph View Export Notion pages to a Roam Research like graph view.
A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"
A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf 2021). Abstract In this work we propose Pathfind
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
Rebiber: A tool for normalizing bibtex with official info. We often cite papers using their arXiv versions without noting that they are already PUBLIS
The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.
SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application. We do this through concise algorithms that do the job in the least jargon possible and examples to guide you through every step of the way.
Knowledge Management for Humans using Machine Learning & Tags
HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.
Contrastive Learning Inverts the Data Generating Process
Official code to reproduce the results and data presented in the paper Contrastive Learning Inverts the Data Generating Process.
ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.
ForecastGA is a tool that combines a couple of popular libraries, Atspy and googleanalytics, with a few enhancements.
Aggregating gridded data (xarray) to polygons
A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample code run!
A simple worker for OpenClubhouse to sync data.
OpenClubhouse-Worker This is a simple worker for OpenClubhouse to sync CH channel data.
Knowledge Management for Humans using Machine Learning & Tags
HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your text documents (yes, even PDF's) and images.
Topic Modelling for Humans
gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ
NLP library designed for reproducible experimentation management
Welcome to the Transfer NLP library, a framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP You can
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides
Beautiful visualizations of how language differs among document types.
Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t
Basic Utilities for PyTorch Natural Language Processing (NLP)
Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor
Extract Keywords from sentence or Replace keywords in sentences.
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install
:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!
Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok
An open-source NLP research library, built on PyTorch.
An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic
Data loaders and abstractions for text and NLP
torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual
💫 Industrial-strength Natural Language Processing (NLP) in Python
spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc
Dimensionality reduction in very large datasets using Siamese Networks
ivis Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets. Ivis
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
pivottablejs: the Python module Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js Installation pip install pivot
A grammar of graphics for Python
plotnine Latest Release License DOI Build Status Coverage Documentation plotnine is an implementation of a grammar of graphics in Python, it is based
🧇 Make Waffle Charts in Python.
PyWaffle PyWaffle is an open source, MIT-licensed Python package for plotting waffle charts. It provides a Figure constructor class Waffle, which coul
A python package for animating plots build on matplotlib.
animatplot A python package for making interactive as well as animated plots with matplotlib. Requires Python = 3.5 Matplotlib = 2.2 (because slider
An open-source plotting library for statistical data.
Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le
Visualize and compare datasets, target values and associations, with one line of code.
In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat
HiPlot makes understanding high dimensional data easy
HiPlot - High dimensional Interactive Plotting HiPlot is a lightweight interactive visualization tool to help AI researchers discover correlations and
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:
JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f
Visualizations for machine learning datasets
Introduction The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive
Python library that makes it easy for data scientists to create charts.
Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l
A Python toolbox for gaining geometric insights into high-dimensional data
"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)
PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization
Streaming pivot visualization via WebAssembly
Perspective is an interactive visualization component for large, real-time datasets. Originally developed for J.P. Morgan's trading business, Perspect
Library for exploring and validating machine learning data
TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Quickly and accurately render even the largest data.
Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da
With Holoviews, your data visualizes itself.
HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Uniform Manifold Approximation and Projection
UMAP Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, bu
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Interactive Data Visualization in the browser, from Python
Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t