3863 Repositories
Python Fake-news-detection-Data-Mining Libraries
Visualizations for machine learning datasets
Introduction The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive
Python library that makes it easy for data scientists to create charts.
Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l
A Python toolbox for gaining geometric insights into high-dimensional data
"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview
Streaming pivot visualization via WebAssembly
Perspective is an interactive visualization component for large, real-time datasets. Originally developed for J.P. Morgan's trading business, Perspect
Library for exploring and validating machine learning data
TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Quickly and accurately render even the largest data.
Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da
With Holoviews, your data visualizes itself.
HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Uniform Manifold Approximation and Projection
UMAP Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, bu
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Interactive Data Visualization in the browser, from Python
Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A Python library for audio feature extraction, classification, segmentation and applications This doc contains general info. Click here for the comple
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Audiomentations A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio a
Data manipulation and transformation for audio signal processing, powered by PyTorch
torchaudio: an audio library for PyTorch The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the
Python library for handling audio datasets.
AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener
Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python 2 or 3
tinytag tinytag is a library for reading music meta data of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python Install pip install tinytag
Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as a REST API Endpoint.
Jupter Notebook REST API Run your jupyter notebooks as a REST API endpoint. This isn't a jupyter server but rather just a way to run your notebooks as
Python scripts to detect faces in Python with the BlazeFace Tensorflow Lite models
Python scripts to detect faces using Python with the BlazeFace Tensorflow Lite models. Tested on Windows 10, Tensorflow 2.4.0 (Python 3.8).
Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.
starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat
a spacial-temporal pattern detection system for home automation
Argos a spacial-temporal pattern detection system for home automation. Based on OpenCV and Tensorflow, can run on raspberry pi and notify HomeAssistan
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language This repository contains UA-GEC data and an accompanying Python lib
Implementation of TabTransformer, attention network for tabular data, in Pytorch
Tab Transformer Implementation of Tab Transformer, attention network for tabular data, in Pytorch. This simple architecture came within a hair's bread
Exploring Cross-Image Pixel Contrast for Semantic Segmentation
Exploring Cross-Image Pixel Contrast for Semantic Segmentation Exploring Cross-Image Pixel Contrast for Semantic Segmentation, Wenguan Wang, Tianfei Z
HTTP Request Smuggling Detection Tool
HTTP Request Smuggling Detection Tool HTTP request smuggling is a high severity vulnerability which is a technique where an attacker smuggles an ambig
A vision library for performing sliced inference on large images/small objects
SAHI: Slicing Aided Hyper Inference A vision library for performing sliced inference on large images/small objects Overview Object detection and insta
Find exposed data in Azure with this public blob scanner
BlobHunter A tool for scanning Azure blob storage accounts for publicly opened blobs. BlobHunter is a part of "Hunting Azure Blobs Exposes Millions of
A standard framework for modelling Deep Learning Models for tabular data
PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike.
🌍💉 Global COVID-19 vaccination data at the regional level.
COVID-19 vaccination data at subnational level. To ensure its officiality, the source data is carefully verified.
Learning from Synthetic Shadows for Shadow Detection and Removal [Inoue+, IEEE TCSVT 2020].
Learning from Synthetic Shadows for Shadow Detection and Removal (IEEE TCSVT 2020) Overview This repo is for the paper "Learning from Synthetic Shadow
This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.
This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.
CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images
CFC-Net This project hosts the official implementation for the paper: CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Dete
Topic Modelling for Humans
gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ
Text preprocessing, representation and visualization from zero to hero.
Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides
Beautiful visualizations of how language differs among document types.
Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t
Basic Utilities for PyTorch Natural Language Processing (NLP)
Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor
🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
pySBD: Python Sentence Boundary Disambiguation (SBD) pySBD - python Sentence Boundary Disambiguation (SBD) - is a rule-based sentence boundary detecti
Extract Keywords from sentence or Replace keywords in sentences.
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install
State of the Art Natural Language Processing
Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide
An open source library for deep learning end-to-end dialog systems and chatbots.
DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re
An open-source NLP research library, built on PyTorch.
An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic
Data loaders and abstractions for text and NLP
torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto
💫 Industrial-strength Natural Language Processing (NLP) in Python
spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc
Dimensionality reduction in very large datasets using Siamese Networks
ivis Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets. Ivis
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
pivottablejs: the Python module Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js Installation pip install pivot
A grammar of graphics for Python
plotnine Latest Release License DOI Build Status Coverage Documentation plotnine is an implementation of a grammar of graphics in Python, it is based
🧇 Make Waffle Charts in Python.
PyWaffle PyWaffle is an open source, MIT-licensed Python package for plotting waffle charts. It provides a Figure constructor class Waffle, which coul
A python package for animating plots build on matplotlib.
animatplot A python package for making interactive as well as animated plots with matplotlib. Requires Python = 3.5 Matplotlib = 2.2 (because slider
An open-source plotting library for statistical data.
Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le
Visualize and compare datasets, target values and associations, with one line of code.
In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat
HiPlot makes understanding high dimensional data easy
HiPlot - High dimensional Interactive Plotting HiPlot is a lightweight interactive visualization tool to help AI researchers discover correlations and
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:
JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f
Visualizations for machine learning datasets
Introduction The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive
Python library that makes it easy for data scientists to create charts.
Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l
A Python toolbox for gaining geometric insights into high-dimensional data
"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview
Streaming pivot visualization via WebAssembly
Perspective is an interactive visualization component for large, real-time datasets. Originally developed for J.P. Morgan's trading business, Perspect
Library for exploring and validating machine learning data
TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig
Missing data visualization module for Python.
missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha
Quickly and accurately render even the largest data.
Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da
With Holoviews, your data visualizes itself.
HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a
Fast data visualization and GUI tools for scientific / engineering applications
PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h
Uniform Manifold Approximation and Projection
UMAP Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, bu
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
Interactive Data Visualization in the browser, from Python
Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t
Statistical data visualization using matplotlib
seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing
A toolkit for making real world machine learning and data analysis applications in C++
dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl
fklearn: Functional Machine Learning
fklearn: Functional Machine Learning fklearn uses functional programming principles to make it easier to solve real problems with Machine Learning. Th
Shōgun
The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
Overview | Tutorials | Examples | Installation | FAQ | How to Cite Welcome to ktrain News and Announcements 2020-11-08: ktrain v0.25.x is released and
Deep learning library featuring a higher-level API for TensorFlow.
TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Deep Learning for humans
Keras: Deep Learning for Python Under Construction In the near future, this repository will be used once again for developing the Keras codebase. For
Statsmodels: statistical modeling and econometrics in Python
About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op
scikit-learn: machine learning in Python
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started
Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.
starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat
pytest plugin for manipulating test data directories and files
pytest-datadir pytest plugin for manipulating test data directories and files. Usage pytest-datadir will look up for a directory with the name of your
Data-Driven Tests for Python Unittest
DDT (Data-Driven Tests) allows you to multiply one test case by running it with different test data, and make it appear as multiple test cases. Instal
Display tabular data in a visually appealing ASCII table format
PrettyTable Installation Install via pip: python -m pip install -U prettytable Install latest development version: python -m pip install -U git+https
Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
python-tabulate Pretty-print tabular data in Python, a library and a command-line utility. The main use cases of the library are: printing small table
Automatic SQL injection and database takeover tool
sqlmap sqlmap is an open source penetration testing tool that automates the process of detecting and exploiting SQL injection flaws and taking over of
Safety checks your installed dependencies for known security vulnerabilities
Safety checks your installed dependencies for known security vulnerabilities. By default it uses the open Python vulnerability database Safety DB, but
Automated security testing using bandit and flake8.
flake8-bandit Automated security testing built right into your workflow! You already use flake8 to lint all your code for errors, ensure docstrings ar
Lazydata: Scalable data dependencies for Python projects
lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data
MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
PyPika - Python Query Builder Abstract What is PyPika? PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a s
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
dataset: databases for lazy people In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files. Read the
Pandas Google BigQuery
pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If
Lightweight data validation and adaptation Python library.
Valideer Lightweight data validation and adaptation library for Python. At a Glance: Supports both validation (check if a value is valid) and adaptati
Python Data Structures for Humans™.
Schematics Python Data Structures for Humans™. About Project documentation: https://schematics.readthedocs.io/en/latest/ Schematics is a Python librar