Text preprocessing, representation and visualization from zero to hero.

Jonathan Besomi

Last update: Jan 8, 2023

Related tags

Text Data & NLP nlp machine-learning text-mining word-embeddings text-clustering text-visualization text-representation text-preprocessing nlp-pipeline texthero

Overview


From zero to hero

Texthero is a Python toolkit for working with text-based datasets quickly and effortlessly. It is simple to learn and designed to be used on top of Pandas. Texthero has the same expressiveness and power as Pandas and is extensively documented. It is modern and conceived for programmers of the 2020s with little or no knowledge of linguistics.

You can think of Texthero as a tool to help you understand and work with text-based datasets. Given a tabular dataset, it's easy to grasp the main concepts. Given a text dataset, it's harder to get quick insights into the underlying data. With Texthero, preprocessing text data, mapping it into vectors, and visualizing the resulting vector space takes just a couple of lines.

Texthero includes tools for:

Text preprocessing: it offers out-of-the-box solutions and is also flexible enough for custom solutions.
Natural Language Processing: keyphrase and keyword extraction, and named entity recognition.
Text representation: TF-IDF, term frequency, and custom word-embeddings (wip)
Vector space analysis: clustering (K-means, Meanshift, DBSCAN and Hierarchical), topic modeling (wip) and interpretation.
Text visualization: vector space visualization, place localization on maps (wip).

Texthero is free, open-source and well documented (and that's what we love most by the way!).

We hope you will find as much pleasure in working with Texthero as we had during its development.

Hablas español? क्या आप हिंदी बोलते हैं? 日本語が話せるのか？

Texthero has been developed for the whole NLP community. We know how hard it is to deal with different NLP tools (NLTK, SpaCy, Gensim, TextBlob, Sklearn): that's why we developed Texthero, to simplify things.

Now, the next main milestone is to provide multilingual support, and for this big step we need help from all of you. ¿Hablas español? Sie sprechen Deutsch? 你会说中文？日本語が話せるのか？ Fala português? Parli Italiano? Вы говорите по-русски? If so, or if you speak another language not mentioned here, you can help us develop multilingual support! Even if you haven't contributed before or have just started with NLP, contact us or open a GitHub issue; there is always a first time :) We promise you will learn a lot and, who knows, it might help you find your next job as an NLP developer!

To improve the Python toolkit and provide an even better experience, your help and feedback are crucial. If you have any problem or suggestion, please open a GitHub issue; we will be glad to support you.

Beta version

Texthero's community is growing fast. Texthero, though, is still in beta; a faster and better version will be released soon, and it will bring some major changes.

For instance, to give more granular control over the pipeline, from the next version on all preprocessing functions will require already-tokenized text as their argument. This will be a major change.

Once the stable version (Texthero 2.0) is released, backward compatibility will be respected. Until then, backward compatibility will be maintained, but less strictly.

If you want to be part of this fast-growing movement, do not hesitate to contribute: CONTRIBUTING!

Installation

Install texthero via pip:

pip install texthero

☝️ Under the hood, Texthero makes use of multiple NLP and machine learning toolkits such as Gensim, NLTK, SpaCy and scikit-learn. You don't need to install them separately; pip will take care of that.

For faster performance, make sure you have spaCy version >= 2.2 installed. Also, make sure you have a recent version of Python; the newer, the better.
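
To see which versions are actually installed, a check along these lines may help. It is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper name and not part of Texthero.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version and the packages mentioned above.
python_version = sys.version.split()[0]
print("python", python_version)
for name in ("texthero", "spacy"):
    print(name, installed_version(name) or "not installed")
```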

Getting started

The best way to learn Texthero is through the Getting Started docs.

If you are an advanced Python user, help(texthero) should do the job.

Examples

1. Text cleaning, TF-IDF representation and Visualization

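import texthero as hero
import pandas as pd

# Load the BBC Sport dataset (the columns used below are 'text' and 'topic').
df = pd.read_csv(
    "https://github.com/jbesomi/texthero/raw/master/dataset/bbcsport.csv"
)

# Clean the text, compute TF-IDF vectors, then reduce them with PCA.
df['pca'] = (
    df['text']
    .pipe(hero.clean)
    .pipe(hero.tfidf)
    .pipe(hero.pca)
)
# Plot the PCA projection, colored by topic.
hero.scatterplot(df, 'pca', color='topic', title="PCA BBC Sport news")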

2. Text preprocessing, TF-IDF, K-means and Visualization

import texthero as hero
import pandas as pd

df = pd.read_csv(
    "https://github.com/jbesomi/texthero/raw/master/dataset/bbcsport.csv"
)

df['tfidf'] = (
    df['text']
    .pipe(hero.clean)
    .pipe(hero.tfidf)
)

df['kmeans_labels'] = (
    df['tfidf']
    .pipe(hero.kmeans, n_clusters=5)
    .astype(str)
)

df['pca'] = df['tfidf'].pipe(hero.pca)

hero.scatterplot(df, 'pca', color='kmeans_labels', title="K-means BBC Sport news")

3. Simple pipeline for text cleaning

>>> import texthero as hero
>>> import pandas as pd
>>> text = "This sèntencé    (123 /) needs to [OK!] be cleaned!   "
>>> s = pd.Series(text)
>>> s
0    This sèntencé    (123 /) needs to [OK!] be cleane...
dtype: object

Remove all digits:

>>> s = hero.remove_digits(s)
>>> s
0    This sèntencé    (  /) needs to [OK!] be cleaned!
dtype: object

By default, remove_digits removes only blocks of digits: the digits in the string "hello123" will not be removed. To remove all digits, set only_blocks to False.
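
The difference between the two modes can be illustrated with plain regular expressions. This is a sketch of the behavior described above, not Texthero's actual implementation; `strip_digits` is a hypothetical name, and replacing each match with a single space is an assumption based on the output shown earlier.

```python
import re


def strip_digits(text, only_blocks=True):
    """Replace digits with a space.

    only_blocks=True touches only standalone runs of digits (delimited by
    word boundaries); only_blocks=False touches every run of digits.
    """
    pattern = r"\b\d+\b" if only_blocks else r"\d+"
    return re.sub(pattern, " ", text)


# "123" is glued to "hello", so block mode leaves it alone; "456" is a block.
print(strip_digits("hello123 456").split())                     # ['hello123']
print(strip_digits("hello123 456", only_blocks=False).split())  # ['hello']
```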

Remove all types of brackets and their content.

>>> s = hero.remove_brackets(s)
>>> s 
0    This sèntencé    needs to  be cleaned!
dtype: object

Remove diacritics.

>>> s = hero.remove_diacritics(s)
>>> s 
0    This sentence    needs to  be cleaned!
dtype: object

Remove punctuation.

>>> s = hero.remove_punctuation(s)
>>> s 
0    This sentence    needs to  be cleaned
dtype: object

Remove extra white-spaces.

>>> s = hero.remove_whitespace(s)
>>> s 
0    This sentence needs to be cleaned
dtype: object

Sometimes we also want to get rid of stop-words.

>>> s = hero.remove_stopwords(s)
>>> s
0    This sentence needs cleaned
dtype: object
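
The steps above are applied one after another, which is the idea behind a cleaning pipeline. The sketch below shows that idea in plain Python on a single string. It is an analogy only and does not use Texthero's API; every function name in it is hypothetical.

```python
import re
from functools import reduce


def drop_digit_blocks(text):
    # Replace standalone runs of digits with a space.
    return re.sub(r"\b\d+\b", " ", text)


def drop_bracketed(text):
    # Remove round and square brackets together with their content.
    return re.sub(r"\([^()]*\)|\[[^\[\]]*\]", "", text)


def collapse_whitespace(text):
    # Squeeze runs of whitespace and trim both ends.
    return " ".join(text.split())


def run_pipeline(text, steps):
    """Apply each step to the output of the previous one, in order."""
    return reduce(lambda acc, step: step(acc), steps, text)


steps = [drop_digit_blocks, drop_bracketed, collapse_whitespace]
cleaned = run_pipeline("This text    (123 /) needs to [OK] be cleaned   ", steps)
print(cleaned)  # This text needs to be cleaned
```

Keeping the steps in a list makes it easy to reorder them or drop one without touching the others.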

API

Texthero is composed of four modules: preprocessing.py, nlp.py, representation.py and visualization.py.

1. Preprocessing

Scope: prepare text data for further analysis.

Full documentation: preprocessing

2. NLP

Scope: provide classic natural language processing tools such as named_entity and noun_phrases.

Full documentation: nlp

3. Representation

Scope: map text data into vectors and do dimensionality reduction.

Supported representation algorithms:

Term frequency (count)
Term frequency-inverse document frequency (tfidf)
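
For readers new to these representations, the sketch below computes a textbook TF-IDF weight, tf(t, d) * log(N / df(t)), over already-tokenized documents using only the standard library. It illustrates the concept and is not Texthero's implementation, which may use a different weighting or normalization; the function name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["football", "match", "goal"], ["tennis", "match", "serve"]]
weights = tfidf_weights(docs)
# "match" occurs in every document, so its weight is 0;
# "football" occurs in only one, so its weight is positive.
print(weights[0]["match"], weights[0]["football"])
```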

Supported clustering algorithms:

K-means (kmeans)
Density-Based Spatial Clustering of Applications with Noise (dbscan)
Meanshift (meanshift)

Supported dimensionality reduction algorithms:

Principal component analysis (pca)
t-distributed stochastic neighbor embedding (tsne)
Non-negative matrix factorization (nmf)

Full documentation: representation

4. Visualization

Scope: summarize the main facts regarding the text data and visualize it. This module is opinionated. It's handy for anyone who needs a quick way to visualize text data on screen, for instance during exploratory data analysis (EDA) of text.

Supported functions:

Text scatterplot (scatterplot)
Most common words (top_words)

Full documentation: visualization

FAQ

Why Texthero

Sometimes we just want things done, right? Texthero helps with that. It makes things easier and gives developers more time to focus on their custom requirements. We believe that cleaning text should take just a minute. The same goes for finding the most important part of a text, and for representing it.

In a very pragmatic way, Texthero has just one goal: saving the developer time. Working with text data can be a pain, and in most cases a default pipeline is good enough to start with. There is always time to come back and improve previous work.

Contributions

"Texthero has been developed by a member of the NLP community for the whole NLP-community"

Texthero is for all of us NLP developers, and it can continue to exist only thanks to the precious contributions of the community.

Your level of expertise in Python and NLP does not matter; anyone can help, and everyone is more than welcome to contribute!

Are you an NLP expert?

Open an issue and tell us what you like and dislike about Texthero and what we can do better!

Are you good at creating websites?

The website will soon be moved from Docusaurus to Sphinx: read the open issue there. Good news: the website will look the same as it does now :) Average news: we need to do some web development to adapt this Sphinx template to our needs. Can you help us?

Are you good at writing?

This is probably the most important piece currently missing from Texthero: more tutorials and more "Getting Started" guides.

If you are good at writing, you can help us! Why don't you start by adding a FAQ page to the website or explaining how to create a custom pipeline? Need help? We are here for you.

Are you good at Python?

There are a lot of open issues for the technically minded. Which one will you choose?

If you have any other questions or inquiries, drop me a line at jonathanbesomi__AT__gmail.com

Contributors (in chronological order)

License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Comments

Change representation_series to DataFrame
All functions that previously dealt with representation series now handle only DataFrames instead. 🚀

rm all functions like flatten, as they are not needed anymore

adopted docstrings and tests

-> further stuff to do:

add those examples into the tutorials, readme, getting started

enhancement
opened by mk2510 20
Can we avoid having a cell with a list?
As we know, it's really not recommended to store a list in a Pandas cell. TokenSeries and VectorSeries, two of the core ideas of (current) Texthero, actually do this. Can it be avoided?

Need to discuss:

Alternatives using sub-columns (it's still MultiIndex). Understand how complex and flexible this solution is. In 99% of cases, the standard Pandas/Texthero user does not really know how to work with MultiIndex ...

Can we just use RepresentationSeries? Probably not, as we cannot merge it into a DataFrame with a single index. Are there alternatives other than data alignment with reindex (too complicated)?

@mk2510 @henrifroese
discussion
opened by jbesomi 18
Add Lemmatization

Lemmatization can be thought of as a more advanced form of the stemming that we already have in the preprocessing module. You can read about it e.g. here. Implementation should be done with spaCy.

ToDo

Implement a function hero.lemmatize(s: TokenSeries) (or maybe rather TextSeries?). Using spaCy this should be fairly straightforward. It should go into the NLP module and probably look very similar to the other spaCy-based functions there.

Just comment below if you want to work on this and/or have any questions. I think this is a good first issue for new contributors.
enhancement good first issue

opened by henrifroese 15
RepresentationSeries: count, term_frequency and tfidf
Implement full support for Representation Series in "Vectorization" functions of representation module

add appropriate tests

add function_check_is_valid_representation

We already wrote & tested all the code for Representation Series in the whole module, but we want to split this up into separate PRs so it's easier to review etc. As soon as this is merged, we'll open the other PRs.

Roadmap for Representation Series implementation:

This PR

Implement Representation Series in rest of representation module over 2-3 more PRs

Write tutorial for Representation Series

Incorporate Representation Series into README and getting-started

Release to PyPI

enhancement
opened by henrifroese 15

replace `tokenize_with_phrases` with `phrases` and added tests

This PR will replace tokenize_with_phrases with phrases. I added unit tests as well for phrases.

This is the result of running ./tests.sh:

..............................................................................................................................................................
----------------------------------------------------------------------
Ran 158 tests in 9.798s

OK

This is the result of running ./format.sh:

All done! ✨ 🍰 ✨
6 files left unchanged.
All done! ✨ 🍰 ✨
6 files left unchanged.

opened by cedricconol 12

Update documentation docstrings etc
So this is quite a big PR that will finish the first part of #85 . We went through all docstrings and added examples/tests, added other arguments, and fixed some stuff along the way. We also updated the README.md and the getting-started.md

Besides the docstrings updates, some small code changes are:

more parameters for the representation functions

change to scatterplot to support 3d visualization and return figure correctly

I just went through some other issues and think that additionally this fixes

parts of #100 and #98

all of #99

After this, in line with #85 , a new version should be deployed / published.
opened by henrifroese 11
Update docstring for hero.wordcloud

After the discussion on #78

We should add something like:

"To reduce blur in the images, width and height should have the same size, i.e the image should be squared"
documentation good first issue

opened by vidyap-xgboost 11
Fix NaNs (Closes #86)

Implement dealing with np.nan, closes #86

Every function in the library now handles NaNs correctly.

Implemented through decorator @handle_nans in new file _helper.py.

Tests added in test_nan.py

As we went through the whole library anyways, argument "input" was renamed to "s" in some functions to be in line with the others.
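
The general idea of handling NaNs with a decorator can be sketched in plain Python. The code below is an illustration only and is not the @handle_nans decorator from _helper.py: it works on lists of values instead of Pandas Series, and the names `skip_nans` and `lowercase` are hypothetical.

```python
import math
from functools import wraps


def is_nan(value):
    # NaN is a float that is not equal to itself; math.isnan checks that.
    return isinstance(value, float) and math.isnan(value)


def skip_nans(func):
    """Apply func to every non-NaN value and pass NaN values through."""
    @wraps(func)
    def wrapper(values):
        return [value if is_nan(value) else func(value) for value in values]
    return wrapper


@skip_nans
def lowercase(text):
    return text.lower()


result = lowercase(["Hello", float("nan"), "World"])
print(result)  # ['hello', nan, 'world']
```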

opened by henrifroese 10
Added the function to POS tag

@richramalho

Added the function to POS tag.

I saw the suggestions in PR #57 and read the CONTRIBUTING.md file.

Any suggestions please tell me. Thank you!

opened by ghost 9
Improve remove_diacritics function. Fixes #71

The remove_diacritics function produced transliterated output for e.g. the Urdu alphabet.

Through the unicodedata package, diacritics are now safely filtered out.
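
One common way to filter diacritics with the standard unicodedata module is to decompose each character and drop the combining marks. The sketch below shows that technique; it is an assumption about the general approach and not necessarily the exact code merged in this PR, and `strip_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFKD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Base letters are kept as they are; nothing is transliterated.
print(strip_diacritics("This sèntencé needs cleaning"))  # This sentence needs cleaning
```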

opened by henrifroese 9
HeroTypes in Representation; DataFrame in _types
switch DocumentTermDF in for RepresentationSeries in _types.py

add functionality for decorator @InputSeries to handle several allowed input types

Add typing decorator/hints to representation.py

add tests for DocumentTermDF type in test_types.py

NOTE: only so many commits/lines as this builds on #156
enhancement
opened by henrifroese 8
Is there any function to find how the weights are calculated for each word to represent a sentence?

Is there any function or way to get the weights each word is given while calculating each component.

I would want to see something like this? This will be really helpful as it will help me with interpretability in getting to know what weight was given to each word. Source : https://www.displayr.com/principal-component-analysis-of-text-data/

opened by cassin-edwin 0
installation error: Could not build wheels for spacy, which is required to install pyproject.toml-based projects

Hi, I ran into an error when I tried to install Texthero. I already have all the required packages, such as spacy, gensim, etc. But when the installation proceeded to building the wheel for gensim, this error showed up:

Failed to build gensim spacy ERROR: Could not build wheels for spacy, which is required to install pyproject.toml-based projects

Could you point me in the right direction to fix this? Is the problem with pip, or are spacy/gensim not pyproject.toml-based?

Thanks.

opened by Wes-Wwang 1
Import error

KeyError: "[E002] Can't find factory for 'tok2vec'. This usually happens when spaCy calls nlp.create_pipe with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to Language.factories['tok2vec'] or remove it from the model meta and add it via nlp.add_pipe instead.

I tried in my virtual environment and on Kaggle; it doesn't work in either.

opened by RAravindDS 2
Deprecated arguments on kmeans function call

When I tried to use kmeans I noticed that a couple of deprecated arguments were used to make the function call: precompute_distances and n_jobs. After I removed them it worked fine. Tried both on version 1.1.0 and 1.0.9. I would make a PR about it but the code on the main branch looks different from the one installed with pip install.

opened by svthiago 5

`remove_punctuation()` is not removing "\"

>>> import texthero as hero
>>> import pandas as pd
>>> import string
>>>
>>> s = pd.Series(rf"{string.punctuation}")
>>> hero.remove_punctuation(s)
0     \ 
dtype: object

opened by batmanscode 0
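
The issue does not state the cause. One plausible explanation, offered here as an assumption, is a pattern that interpolates string.punctuation unescaped into a regex character class: the backslash then escapes the "]" that follows it and is never matched itself. The sketch below reproduces that effect with the standard library only and shows that escaping the characters with re.escape avoids it. It does not use Texthero's code.

```python
import re
import string

# Unescaped: inside the class, the sequence \] is read as a literal "]",
# so the backslash itself is not part of the set.
naive = re.compile(rf"[{string.punctuation}]+")

# Escaped: re.escape makes every punctuation character a literal.
escaped = re.compile("[" + re.escape(string.punctuation) + "]+")

naive_result = naive.sub(" ", string.punctuation)
escaped_result = escaped.sub(" ", string.punctuation)

print(repr(naive_result))    # the backslash survives
print(repr(escaped_result))  # only whitespace is left
```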

Releases(1.1.0)

1.1.0(Jul 1, 2021)
Fix packages versions (as #206)

Lazy-load stopwords only if needed (as #194)

Fix Pandas FutureWarnings by adding regex=True/False to str.replace()

Source code(tar.gz)
Source code(zip)
texthero-1.1.0-py3-none-any.whl(23.68 KB)
texthero-1.1.0.tar.gz(23.86 KB)
1.0.9(Jul 6, 2020)

Fix Wordcloud issue #33 Fix issue #22
Source code(tar.gz)
Source code(zip)
texthero-1.0.9-py3-none-any.whl(24.46 KB)
texthero-1.0.9.tar.gz(20.16 KB)
1.0.8(Jun 1, 2020)
Version 1.0.8

Main changes:

Added named_entities (https://texthero.org/docs/api/texthero.nlp.named_entities.html)

Added noun_chunks (https://texthero.org/docs/api/texthero.nlp.noun_chunks)

Added remove_urls, replace_urls, remove_html_tags, remove_stopwords and replace_stopwords

Dev changes

Added API with Sphinx + Docusaurus for each function

Improved docstring documentation

Improved test coverage and started using Travis CI for continuous integration

Format code with black rather than yapf

Source code(tar.gz)
Source code(zip)
texthero-1.0.8-py3-none-any.whl(20.81 KB)
texthero-1.0.8.tar.gz(17.02 KB)
1.0.4(Apr 27, 2020)
Releasing V 1.0.4

Methods and their respective functions:

Preprocessing

clean

Representation

do_tfidf, do_count, do_pca, do_nmf, do_kmeans, do_dbscan

Visualization

scatterplot

top_words

Source code(tar.gz)
Source code(zip)
texthero-1.0.4-py3-none-any.whl(10.97 KB)
texthero-1.0.4.tar.gz(6.24 KB)

Owner

Jonathan Besomi

NLP and text mining.

GitHub https://texthero.org

