232 Python Word-similarity Libraries

Deep Learning Chinese Word Segment

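Citation: The BiLSTM+CRF model in this project follows the paper http://www.aclweb.org/anthology/N16-1030 , and the IDCNN+CRF model follows https://arxiv.org/abs/1702.02098 . Build: Install the bazel build tool and install tensorflow (this project currently requires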

2.1k Dec 23, 2022

ARU-Net - Deep Learning Chinese Word Segment

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents Contents Introduction Installation Demo Training Introduction This is the

128 Sep 12, 2022

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. Flair is: A powerful NLP library. Flair allo

12.3k Jan 2, 2023

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

tensorlm Generate Shakespeare poems with 4 lines of code. Installation tensorlm is written in / for Python 3.4+ and TensorFlow 1.1+ pip3 install tenso

63 May 22, 2021

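A high-performance sensitive-word (illegal-word/profanity) detection and filtering component, with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and more.

A high-performance illegal-word (sensitive-word) detection component, with Traditional/Simplified Chinese conversion, full-width/half-width conversion, pinyin initial-letter lookup, pinyin lookup, pinyin fuzzy search, and more.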

3.6k Jan 7, 2023

Knowledge Management for Humans using Machine Learning & Tags

HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.

165 Nov 4, 2022

Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your text documents (yes, even PDF's) and images.

166 Jan 7, 2023

Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

228 Feb 17, 2021

Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

11.7k Feb 18, 2021

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

718 Feb 18, 2021
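
For readers unfamiliar with Byte Pair Encoding, the idea behind the tokenizer above can be illustrated with a small pure-Python sketch. This is not YouTokenToMe's implementation or API; the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the tie-breaking rule are assumptions chosen for illustration only. It learns merges by repeatedly joining the most frequent adjacent symbol pair.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    if not pairs:
        return None
    # Ties are broken by the pair itself so the result is deterministic
    # (an arbitrary choice made for this sketch).
    return max(pairs, key=lambda p: (pairs[p], p))


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated corpus."""
    # Each word starts as a tuple of single characters with its frequency.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges


if __name__ == "__main__":
    print(learn_bpe("low low lower", 2))
```

A production tokenizer adds a vocabulary size limit, special tokens, and a far faster pair-counting structure; the sketch only shows the merge loop.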

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.1k Feb 13, 2021

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

1.5k Feb 17, 2021

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

1.9k Feb 18, 2021

🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

1.2k Feb 17, 2021

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

10k Feb 18, 2021

Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

4.8k Feb 18, 2021

A little word cloud generator in Python

Linux macOS Windows PyPI word_cloud A little word cloud generator in Python. Read more about it on the blog post or the website. The code is tested ag

7.9k Feb 17, 2021

Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

230 Nov 16, 2022

Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

11.7k Feb 12, 2021

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

847 Dec 19, 2022

Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

2.7k Jan 8, 2023

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

2k Dec 27, 2022

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

1.9k Feb 3, 2021

🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

1.5k Dec 25, 2022

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

12.3k Dec 31, 2022

Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabu

6.4k Jan 1, 2023

A little word cloud generator in Python

Linux macOS Windows PyPI word_cloud A little word cloud generator in Python. Read more about it on the blog post or the website. The code is tested ag

9.2k Dec 30, 2022

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022
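
As background for the entry above, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch of the standard dynamic-programming approach, not the C extension's code or API. The `similarity` helper and its normalization by the longer string's length are assumptions made for illustration and may differ from the ratio the module computes.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    # Make `b` the shorter string so the stored row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("flaw", "lawn"))
```

The pure-Python version runs in time proportional to the product of the two string lengths, which is the cost a C extension is meant to reduce for large workloads.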

Python Word-similarity Resources

Python word-similarity Libraries

Deep Learning Chinese Word Segment

ARU-Net - Deep Learning Chinese Word Segment

A very simple framework for state-of-the-art Natural Language Processing (NLP)

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

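A high-performance sensitive-word (illegal-word/profanity) detection and filtering component, with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and more.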

Knowledge Management for Humans using Machine Learning & Tags

Knowledge Management for Humans using Machine Learning & Tags

Yet another Python binding for fastText

Topic Modelling for Humans

Unsupervised text tokenizer focused on computational efficiency

Text preprocessing, representation and visualization from zero to hero.

Beautiful visualizations of how language differs among document types.

Basic Utilities for PyTorch Natural Language Processing (NLP)

🦆 Contextually-keyed word vectors

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Unsupervised text tokenizer for Neural Network-based text generation.

A little word cloud generator in Python

Yet another Python binding for fastText

Topic Modelling for Humans

Unsupervised text tokenizer focused on computational efficiency

Text preprocessing, representation and visualization from zero to hero.

Beautiful visualizations of how language differs among document types.

Basic Utilities for PyTorch Natural Language Processing (NLP)

🦆 Contextually-keyed word vectors

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Unsupervised text tokenizer for Neural Network-based text generation.

A little word cloud generator in Python

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Basic Utilities for PyTorch Natural Language Processing (NLP)

Topic Modelling for Humans

pkuseg: The pkuseg toolkit for multi-domain Chinese word segmentation

Topic Modelling for Humans

