3252 Python Crime-Data-Batch-Processing Libraries

Find exposed data in Azure with this public blob scanner

BlobHunter A tool for scanning Azure blob storage accounts for publicly opened blobs. BlobHunter is a part of "Hunting Azure Blobs Exposes Millions of

250 Jan 3, 2023

A standard framework for modelling Deep Learning Models for tabular data

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike.

801 Jan 8, 2023

🌍💉 Global COVID-19 vaccination data at the regional level.

COVID-19 vaccination data at subnational level. To ensure its officiality, the source data is carefully verified.

61 Sep 21, 2022

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

212 Dec 25, 2022

Facilitating the design, comparison and sharing of deep text matching models.

MatchZoo Facilitating the design, comparison and sharing of deep text matching models. MatchZoo 是一个通用的文本匹配工具包，它旨在方便大家快速的实现、比较、以及分享最新的深度文本匹配模型。 🔥 News

3.7k Jan 2, 2023

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

1.5k Dec 5, 2022

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

1.8k Feb 10, 2021

Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

11.7k Feb 12, 2021

Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

220 Dec 11, 2022

Textpipe: clean and extract metadata from text

textpipe: clean and extract metadata from text textpipe is a Python package for converting raw text in to clean, readable text and extracting metadata

298 Nov 21, 2022

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

847 Dec 19, 2022

DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

1.5k Dec 26, 2022

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

2.3k Jan 7, 2023

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

2.9k Jan 2, 2023

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

2k Dec 27, 2022

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

1.9k Feb 3, 2021

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

1.2k Jan 8, 2023

🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

1.5k Dec 25, 2022

Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

5.3k Jan 1, 2023

⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

Translations 🇩🇪 DE 🇫🇷 FR 🇭🇺 HU 🇮🇩 ID 🇮🇹 IT 🇳🇱 NL 🇧🇷 PT-BR 🇷🇺 RU 🇨🇳 ZH ➡️ Documentation | Discord | Installation Guide ⬅️ Fully autom

11.2k Jan 5, 2023

Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

1.9k Jan 6, 2023

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP是一款轻量级的自然语言处理（NLP）工具包，目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性：统一的Tabular式数据容器，简化数据预处理过程；内置多种数据集的Loader和Pipe，省去预处理代码; 各种方便的NLP工具，例如Embedd

2.8k Jan 1, 2023

Official Stanford NLP Python Library for Many Human Languages

Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac

5.2k Feb 12, 2021

State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

3k Jan 5, 2023

NLP, before and after spaCy

textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig

1.6k Feb 10, 2021

NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community

2.5k Jan 4, 2023

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

6.2k Dec 31, 2022

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

8.4k Dec 26, 2022

An open-source NLP research library, built on PyTorch.

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic

11.4k Jan 1, 2023

Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto

3.2k Dec 30, 2022

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

12.3k Dec 31, 2022

Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

6.4k Jan 1, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Jan 3, 2023

NLTK Source

Natural Language Toolkit (NLTK) NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets, and tutorials supporting

11.4k Jan 4, 2023

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

77.3k Jan 3, 2023

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

19.5k Feb 13, 2021

Dimensionality reduction in very large datasets using Siamese Networks

ivis Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets. Ivis

284 Jan 1, 2023

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

pivottablejs: the Python module Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js Installation pip install pivot

512 Dec 26, 2022

A grammar of graphics for Python

plotnine Latest Release License DOI Build Status Coverage Documentation plotnine is an implementation of a grammar of graphics in Python, it is based

2.6k Feb 12, 2021

🧇 Make Waffle Charts in Python.

PyWaffle PyWaffle is an open source, MIT-licensed Python package for plotting waffle charts. It provides a Figure constructor class Waffle, which coul

528 Jan 2, 2023

A python package for animating plots build on matplotlib.

animatplot A python package for making interactive as well as animated plots with matplotlib. Requires Python = 3.5 Matplotlib = 2.2 (because slider

394 Dec 18, 2022

An open-source plotting library for statistical data.

Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le

820 Jan 6, 2023

Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

2.3k Jan 5, 2023

HiPlot makes understanding high dimensional data easy

HiPlot - High dimensional Interactive Plotting HiPlot is a lightweight interactive visualization tool to help AI researchers discover correlations and

2.4k Jan 4, 2023

Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:

JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f

462 Jan 2, 2023

Visualizations for machine learning datasets

Introduction The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive

7.1k Jan 7, 2023

Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

3.2k Jan 4, 2023

A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

1.8k Dec 29, 2022

3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

1.6k Jan 8, 2023

Streaming pivot visualization via WebAssembly

Perspective is an interactive visualization component for large, real-time datasets. Originally developed for J.P. Morgan's trading business, Perspect

The Fintech Open Source Foundation (www.finos.org)

5.1k Dec 27, 2022

Library for exploring and validating machine learning data

TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

688 Jan 3, 2023

Missing data visualization module for Python.

missingno Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities tha

3.4k Dec 29, 2022

Quickly and accurately render even the largest data.

Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da

2.9k Dec 28, 2022

With Holoviews, your data visualizes itself.

HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a

2.3k Jan 2, 2023

Fast data visualization and GUI tools for scientific / engineering applications

2.3k Feb 13, 2021

Uniform Manifold Approximation and Projection

UMAP Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, bu

6k Jan 8, 2023

Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 1, 2023

Interactive Data Visualization in the browser, from Python

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi

14.7k Feb 13, 2021

Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t

13.9k Feb 13, 2021

Statistical data visualization using matplotlib

seaborn: statistical data visualization Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing

8.1k Feb 13, 2021

A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

11.6k Jan 1, 2023

fklearn: Functional Machine Learning

fklearn: Functional Machine Learning fklearn uses functional programming principles to make it easier to solve real problems with Machine Learning. Th

1.4k Dec 7, 2022

Shōgun

The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat

2.8k Feb 13, 2021

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

2.8k Feb 12, 2021

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

1.1k Jan 2, 2023

Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

Translated in 🇰🇷 Korean/ Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on

8.7k Jan 5, 2023

Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

9.5k Feb 12, 2021

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

5.7k Feb 12, 2021

🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

Thinc: A refreshing functional take on deep learning, compatible with your favorite libraries From the makers of spaCy, Prodigy and FastAPI Thinc is a

2.6k Dec 30, 2022

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •

11.9k Feb 13, 2021

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 8, 2023

Deep Learning for humans

Keras: Deep Learning for Python Under Construction In the near future, this repository will be used once again for developing the Keras codebase. For

50.7k Feb 12, 2021

Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

8.1k Jan 2, 2023

Apache Spark - A unified analytics engine for large-scale data processing

Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op

34.7k Jan 4, 2023

scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

52.5k Jan 8, 2023

Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.

starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat

300 Dec 26, 2022

pytest plugin for manipulating test data directories and files

pytest-datadir pytest plugin for manipulating test data directories and files. Usage pytest-datadir will look up for a directory with the name of your

191 Dec 21, 2022

Data-Driven Tests for Python Unittest

DDT (Data-Driven Tests) allows you to multiply one test case by running it with different test data, and make it appear as multiple test cases. Instal

424 Nov 28, 2022

Display tabular data in a visually appealing ASCII table format

PrettyTable Installation Install via pip: python -m pip install -U prettytable Install latest development version: python -m pip install -U git+https

924 Jan 5, 2023

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.

python-tabulate Pretty-print tabular data in Python, a library and a command-line utility. The main use cases of the library are: printing small table

1.5k Jan 6, 2023

Lazydata: Scalable data dependencies for Python projects

lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data

629 Nov 21, 2022

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl

1.9k Jan 4, 2023

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

3.3k Dec 31, 2022

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

PyPika - Python Query Builder Abstract What is PyPika? PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a s