Python Text Processing Libraries

Filter By

Python Text Processing

Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(à¸‡'âŒ£')à¸‡")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

3.4k Jan 8, 2023

Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

1.8k Jan 8, 2023

Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

8.8k Jan 8, 2023

Answer some questions and get your brawler csvs ready!

BRAWL-STARS-V11-BRAWLER-MAKER-TOOL Answer some questions and get your brawler csvs ready! HOW TO RUN on android: Install pydroid3 from playstore, and

9 Jan 7, 2023

An experimental Fang Song style Chinese font generated with skeleton-tracing and pix2pix

An experimental Fang Song style Chinese font generated with skeleton-tracing and pix2pix, with glyphs based on cwTeXFangSong. The font is optimised fo

98 Jan 7, 2023

一款高性能敏感词(非法词/脏字)检测过滤组件，附带繁体简体互换，支持全角半角互换，汉字转拼音，模糊搜索等功能。

一款高性能非法词(敏感词)检测组件，附带繁体简体互换，支持全角半角互换，获取拼音首字母，获取拼音字母，拼音模糊搜索等功能。

3.6k Jan 7, 2023

Convert ebooks with few clicks on Telegram!

E-Book Converter Bot A bot that converts e-books to various formats, powered by calibre! It currently supports 34 input formats and 19 output formats.

45 Jan 5, 2023

A non-validating SQL parser module for Python

python-sqlparse - Parse SQL statements sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting S

3.1k Jan 4, 2023

Returns unicode slugs

Python Slugify A Python slugify application that handles unicode. Overview Best attempt to create slugs from unicode strings while keeping it DRY. Not

1.3k Jan 4, 2023

🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙‍♀️

5.6k Jan 3, 2023

汉字转拼音(pypinyin)

汉字拼音转换工具（Python 版）将汉字转为拼音。可以用于汉字注音、排序、检索(Russian translation) 。基于 hotoo/pinyin 开发。 Documentation: http://pypinyin.rtfd.io/ GitHub: https://github.co

4.2k Jan 3, 2023

An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([email protected]). Packaged by Peter Waller [email protected],

1.1k Jan 2, 2023

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

3k Jan 2, 2023

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

1.4k Jan 2, 2023

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

1.2k Jan 1, 2023

Amazing GitHub Template - Sane defaults for your next project!

🚀 Useful README.md, LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, GitHub Issues and Pull Requests and Actions templates to jumpstart your projects.

276 Jan 1, 2023

A generator library for concise, unambiguous and URL-safe UUIDs.

Description shortuuid is a simple python library that generates concise, unambiguous, URL-safe UUIDs. Often, one needs to use non-sequential IDs in pl

1.8k Dec 31, 2022

从flomo导出的笔记中生成词云

flomo-word-cloud 从flomo导出的笔记中生成词云如何使用？将本项目克隆到你的电脑上，使用如下的命令，安装所需python库 pip install -r requirements.txt 在项目里新建一个file文件夹，把所有从flomo导出的html文件放入其中运行main

9 Dec 30, 2022

StealBit1.1 and earlier strings and config extraction scripts

StealBit1.1 and earlier scripts Use strings_decryptor.py to extract RC4 encrypted strings from a StealBit1.1 sample(s). Use config_extractor.py to ext

5 Dec 29, 2022

Widevine KEY Extractor in Python

Widevine Client 3 This was originally written by T3rry7f. This repo is slightly modified version of his repo. This only works on standard Windows! Usa

68 Dec 29, 2022

一个可以可以统计群组用户发言，并且能将聊天内容生成词云的机器人

当前版本 v2.2 更新维护日志更新维护日志有问题请加群组反馈 Telegram 交流反馈群组点击加入演示配置要求内存：1G以上安装方法使用 Docker 安装 Docker官方安装

117 Dec 29, 2022

Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

3.1k Dec 29, 2022

Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

1.7k Dec 27, 2022

BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes.

BaseCrack Decoder For Base Encoding Schemes BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes. This tool ca

383 Dec 27, 2022

PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

19 Dec 26, 2022

Make writing easier!

Handwriter Make writing easier! How to Download and install a handwriting font, or create a font from your handwriting. Use a word processor like Micr

64 Dec 25, 2022

box is a text-based visual programming language inspired by Unreal Engine Blueprint function graphs.

Box is a text-based visual programming language inspired by Unreal Engine blueprint function graphs. $ cat factorial.box ┌─ƒ(Factorial)───┐

104 Dec 24, 2022

Wikipedia Reader for the GNOME Desktop

Wike Wike is a Wikipedia reader for the GNOME Desktop. Provides access to all the content of this online encyclopedia in a native application, with a

126 Dec 24, 2022

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

1.3k Dec 22, 2022

pydantic-i18n is an extension to support an i18n for the pydantic error messages.

pydantic-i18n is an extension to support an i18n for the pydantic error messages

48 Dec 21, 2022

Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

471 Dec 20, 2022

A simple Python module for parsing human names into their individual components

Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff

574 Dec 20, 2022

AnnIE - Annotation Platform, tool for open information extraction annotations using text files.

29 Dec 20, 2022

You can encode and decode base85, ascii85, base64, base32, and base16 with this tool.

8 Dec 20, 2022

Repositori untuk belajar pemrograman Python dalam bahasa Indonesia

Python Repositori ini berisi kumpulan dari berbagai macam contoh struktur data, algoritma dan komputasi matematika yang diimplementasikan dengan mengg

111 Dec 19, 2022

Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their native language.

Find a Doc - Localization Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their n

18 Dec 19, 2022

Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

146 Dec 18, 2022

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022

Python Text Processing Resources

Filter By

Categories