1749 Python Full-text-search Libraries

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

2.1k Feb 17, 2021

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

1.5k Feb 17, 2021

Snips Python library to extract meaning from text

Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur

3.5k Feb 17, 2021

A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

831 Feb 17, 2021

Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

4.7k Feb 17, 2021

Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

1.4k Feb 17, 2021

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP是一款轻量级的自然语言处理（NLP）工具包，目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性：统一的Tabular式数据容器，简化数据预处理过程；内置多种数据集的Loader和Pipe，省去预处理代码; 各种方便的NLP工具，例如Embedd

2k Feb 14, 2021

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

2.5k Feb 17, 2021

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

3.2k Feb 17, 2021

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

textgenrnn Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly tr

4.3k Feb 18, 2021

:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

1.4k Feb 18, 2021

🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk [email protected] and Michael

1.4k Feb 17, 2021

Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(à¸‡'âŒ£')à¸‡")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

2.9k Feb 17, 2021

Making text a first-class citizen in TensorFlow.

TensorFlow Text - Text processing in Tensorflow IMPORTANT: When installing TF Text with pip install, please note the version of TensorFlow you are run

692 Feb 16, 2021

An easier way to build neural search on the cloud

An easier way to build neural search on the cloud Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g

2.3k Feb 18, 2021

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

7.5k Feb 17, 2021

Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto

2.6k Feb 18, 2021

Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

22.2k Feb 18, 2021

Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

4.8k Feb 18, 2021

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

10.8k Feb 18, 2021

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

19.6k Feb 18, 2021

A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

1.6k Feb 17, 2021

Speech recognition module for Python, supporting several engines and APIs, online and offline.

SpeechRecognition Library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/

6.7k Jan 8, 2023

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee

20.8k Jan 3, 2023

text_recognition_toolbox: The reimplementation of a series of classical scene text recognition papers with Pytorch in a uniform way.

text recognition toolbox 1. 项目介绍该项目是基于pytorch深度学习框架，以统一的改写方式实现了以下6篇经典的文字识别论文，论文的详情如下。该项目会持续进行更新，欢迎大家提出问题以及对代码进行贡献。模型论文标题发表年份模型方法划分 CRNN 《An End-t

168 Dec 24, 2022

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

37 Nov 21, 2022

Darkdump - Search The Deep Web Straight From Your Terminal

Darkdump - Search The Deep Web Straight From Your Terminal About Darkdump Darkdump is a simple script written in Python3.9 in which it allows users to

264 Dec 30, 2022

Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX

31 Oct 31, 2022

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

83 Jan 9, 2023

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

211 Dec 28, 2022

Get list of common stop words in various languages in Python

Python Stop Words Table of contents Overview Available languages Installation Basic usage Python compatibility Overview Get list of common stop words

142 Dec 21, 2022

Facilitating the design, comparison and sharing of deep text matching models.

MatchZoo Facilitating the design, comparison and sharing of deep text matching models. MatchZoo 是一个通用的文本匹配工具包，它旨在方便大家快速的实现、比较、以及分享最新的深度文本匹配模型。 🔥 News

3.7k Jan 2, 2023

Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

1.8k Feb 10, 2021

Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

220 Dec 11, 2022

Textpipe: clean and extract metadata from text

textpipe: clean and extract metadata from text textpipe is a Python package for converting raw text in to clean, readable text and extracting metadata

298 Nov 21, 2022

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

847 Dec 19, 2022

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

2.3k Dec 29, 2022

DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

1.5k Dec 26, 2022

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.7k Jan 8, 2023

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

3.1k Jan 7, 2023

Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

461 Dec 28, 2022

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

2.3k Jan 7, 2023

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

2k Dec 27, 2022

Snips Python library to extract meaning from text

Snips NLU Snips NLU (Natural Language Understanding) is a Python library that allows to extract structured information from sentences written in natur

3.5k Feb 12, 2021

A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

1.3k Jan 3, 2023

Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

5.3k Jan 1, 2023

Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

1.9k Jan 6, 2023

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP是一款轻量级的自然语言处理（NLP）工具包，目标是快速实现NLP任务以及构建复杂模型。 fastNLP具有如下的特性：统一的Tabular式数据容器，简化数据预处理过程；内置多种数据集的Loader和Pipe，省去预处理代码; 各种方便的NLP工具，例如Embedd

2.8k Jan 1, 2023

Module for automatic summarization of text documents and HTML pages.

Automatic text summarizer Simple library and command line utility for extracting summary from HTML pages or plain texts. The package also contains sim

3k Jan 8, 2023

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

4.6k Jan 1, 2023

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

textgenrnn Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly tr

4.8k Dec 30, 2022

:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework for Question Answering & Neural search that enables you to ... ... ask questions in natural language and find gran

6.4k Jan 9, 2023

🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk [email protected] and Michael

1.4k Feb 12, 2021

Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(à¸‡'âŒ£')à¸‡")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

3.4k Dec 29, 2022

Making text a first-class citizen in TensorFlow.

TensorFlow Text - Text processing in Tensorflow IMPORTANT: When installing TF Text with pip install, please note the version of TensorFlow you are run

1k Dec 26, 2022

An easier way to build neural search on the cloud

An easier way to build neural search on the cloud Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g

17.1k Jan 9, 2023

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

8.4k Dec 26, 2022

Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto

3.2k Dec 30, 2022

Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

24.1k Jan 5, 2023

Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

6.4k Jan 1, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Jan 3, 2023

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

19.5k Feb 13, 2021

A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

1.8k Dec 29, 2022

mlpack: a scalable C++ machine learning library --

4.2k Jan 9, 2023

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

This is the Vowpal Wabbit fast online learning code. Why Vowpal Wabbit? Vowpal Wabbit is a machine learning system which pushes the frontier of machin

8.1k Jan 6, 2023

Rich is a Python library for rich text and beautiful formatting in the terminal.

Rich 中文 readme • lengua española readme • Läs på svenska Rich is a Python library for rich text and beautiful formatting in the terminal. The Rich API

41.5k Jan 7, 2023

Looks at Python code to search for things which look "dodgy" such as passwords or diffs

dodgy Dodgy is a very basic tool to run against your codebase to search for "dodgy" looking values. It is a series of simple regular expressions desig

112 Nov 25, 2022

Programmatically edit text files with Python. Useful for source to source transformations.

massedit formerly known as Python Mass Editor Implements a python mass editor to process text files using Python code. The modification(s) is (are) sh

106 Dec 17, 2022

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

PyPika - Python Query Builder Abstract What is PyPika? PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a s

1.9k Jan 4, 2023

Pysolr — Python Solr client

pysolr pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.

626 Dec 1, 2022

High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

3.6k Jan 3, 2023

Official Python low-level client for Elasticsearch

Python Elasticsearch Client Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Py

3.8k Jan 1, 2023

Color text streams with a polished command line interface

colout(1) -- Color Up Arbitrary Command Output Synopsis colout [-h] [-r RESOURCE] colout [-g] [-c] [-l min,max] [-a] [-t] [-T DIR] [-P DIR] [-d COLORM

1.1k Dec 21, 2022

A cross platform package to do curses-like operations, plus higher level APIs and widgets to create text UIs and ASCII art animations

ASCIIMATICS Asciimatics is a package to help people create full-screen text UIs (from interactive forms to ASCII animations) on any platform. It is li

3.2k Jan 9, 2023

Rich is a Python library for rich text and beautiful formatting in the terminal.

Rich 中文 readme • lengua española readme • Läs på svenska Rich is a Python library for rich text and beautiful formatting in the terminal. The Rich API

41.4k Jan 2, 2023

Simple cross-platform colored terminal text in Python

Colorama Makes ANSI escape character sequences (for producing colored terminal text and cursor positioning) work under MS Windows. PyPI for releases |

3k Jan 1, 2023

Full control of form rendering in the templates.

django-floppyforms Full control of form rendering in the templates. Authors: Gregor Müllegger and many many contributors Original creator: Bruno Renié

811 Dec 1, 2022

Full-text multi-table search application for Django. Easy to install and use, with good performance.

django-watson django-watson is a fast multi-model full-text search plugin for Django. It is easy to install and use, and provides high quality search

1.1k Dec 22, 2022

Full featured redis cache backend for Django.

Redis cache backend for Django This is a Jazzband project. By contributing you agree to abide by the Contributor Code of Conduct and follow the guidel

2.5k Jan 3, 2023

Modular search for Django

Haystack Author: Daniel Lindsley Date: 2013/07/28 Haystack provides modular search for Django. It features a unified, familiar API that allows you to

3.4k Jan 8, 2023

🚀 Cookiecutter Template for FastAPI + React Projects. Using PostgreSQL, SQLAlchemy, and Docker

FastAPI + React · A cookiecutter template for bootstrapping a FastAPI and React project using a modern stack. Features FastAPI (Python 3.8) JWT authen

1.4k Jan 2, 2023

Full text search for flask.

flask-msearch Installation To install flask-msearch: pip install flask-msearch # when MSEARCH_BACKEND = "whoosh" pip install whoosh blinker # when MSE

197 Dec 29, 2022

Library to scrape and clean web pages to create massive datasets.

lazynlp A straightforward library that allows you to crawl, clean up, and deduplicate webpages to create massive monolingual datasets. Using this libr

2.1k Jan 6, 2023

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow

704 Jan 6, 2023

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Newspaper3k: Article scraping & curation Inspired by requests for its simplicity and powered by lxml for its speed: "Newspaper is an amazing python li

12.3k Jan 7, 2023

Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!

Google Images Download Python Script for 'searching' and 'downloading' hundreds of Google images to the local hard disk! Documentation Documentation H

8.2k Jan 5, 2023

A tool for extracting plain text from Wikipedia dumps

WikiExtractor WikiExtractor.py is a Python script that extracts and cleans text from a Wikipedia database dump. The tool is written in Python and requ

3.2k Dec 31, 2022

Convert HTML to Markdown-formatted text.

html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to

1.3k Dec 31, 2022

NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力，旨在为飞桨开发者提升文本建模效率，并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 1, 2023

Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

CLIP-GLaSS Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search An in-browser demo is

172 Dec 22, 2022

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

13 Dec 13, 2022

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

5k Jan 4, 2023

Free and open source full-stack enterprise framework for agile development of secure database-driven web-based applications, written and programmable in Python.

Readme web2py is a free open source full-stack framework for rapid development of fast, scalable, secure and portable database-driven web-based applic

2k Dec 31, 2022

Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

432 Dec 16, 2022

Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据，将清华新闻数据、搜狗新闻数据等新闻数据集，以及开源的一些摘要数据进行整理清洗，构建一个较完善的中文摘要数据集。数据集清洗时，仅进行了简单地规则清洗。

785 Dec 29, 2022

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

4.4k Jan 3, 2023

Python Full-text-search Resources

Python full-text-search Libraries

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Beautiful visualizations of how language differs among document types.

Snips Python library to extract meaning from text

A full spaCy pipeline and models for scientific/biomedical documents.

Extract Keywords from sentence or Replace keywords in sentences.

Python implementation of TextRank for phrase extraction and summarization of text documents

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Module for automatic summarization of text documents and HTML pages.

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

🎐 a python library for doing approximate and phonetic matching of strings.

Fixes mojibake and other glitches in Unicode text, after the fact.

Making text a first-class citizen in TensorFlow.

An easier way to build neural search on the cloud

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

Data loaders and abstractions for text and NLP

Library for fast text representation and classification.

Unsupervised text tokenizer for Neural Network-based text generation.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

💫 Industrial-strength Natural Language Processing (NLP) in Python

A Python toolbox for gaining geometric insights into high-dimensional data

Speech recognition module for Python, supporting several engines and APIs, online and offline.

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

text_recognition_toolbox: The reimplementation of a series of classical scene text recognition papers with Pytorch in a uniform way.

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

Darkdump - Search The Deep Web Straight From Your Terminal

Shirt Bot is a discord bot which uses GPT-3 to generate text

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Get list of common stop words in various languages in Python

Facilitating the design, comparison and sharing of deep text matching models.

Multilingual text (NLP) processing toolkit

Text vectorization tool to outperform TFIDF for classification tasks

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

Textpipe: clean and extract metadata from text

Unsupervised text tokenizer focused on computational efficiency

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

DELTA is a deep learning based natural language and speech processing platform.

Text preprocessing, representation and visualization from zero to hero.

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

Python package for performing Entity and Text Matching using Deep Learning.

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Beautiful visualizations of how language differs among document types.

Snips Python library to extract meaning from text

A full spaCy pipeline and models for scientific/biomedical documents.

Extract Keywords from sentence or Replace keywords in sentences.

Python implementation of TextRank for phrase extraction and summarization of text documents

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Module for automatic summarization of text documents and HTML pages.

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

🎐 a python library for doing approximate and phonetic matching of strings.

Fixes mojibake and other glitches in Unicode text, after the fact.

Making text a first-class citizen in TensorFlow.

An easier way to build neural search on the cloud

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

Data loaders and abstractions for text and NLP

Library for fast text representation and classification.

Unsupervised text tokenizer for Neural Network-based text generation.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

💫 Industrial-strength Natural Language Processing (NLP) in Python

A Python toolbox for gaining geometric insights into high-dimensional data

mlpack: a scalable C++ machine learning library --

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

Rich is a Python library for rich text and beautiful formatting in the terminal.

Looks at Python code to search for things which look "dodgy" such as passwords or diffs

Programmatically edit text files with Python. Useful for source to source transformations.

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

Pysolr — Python Solr client

High level Python client for Elasticsearch

Official Python low-level client for Elasticsearch

Color text streams with a polished command line interface

A cross platform package to do curses-like operations, plus higher level APIs and widgets to create text UIs and ASCII art animations

Rich is a Python library for rich text and beautiful formatting in the terminal.

Simple cross-platform colored terminal text in Python

Full control of form rendering in the templates.