Various Algorithms for Short Text Mining

Kwan-Yuet "Stephen" Ho

Last update: Dec 6, 2022

Overview

Short Text Mining in Python

Introduction

shorttext is a Python package that facilitates supervised and unsupervised learning for short text categorization. Because short texts contain few words and carry little information on their own, an intermediate representation of the texts and documents is needed before they are passed to any classification algorithm. The package supports several types of such representations, including topic modeling and word-embedding algorithms.
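
The sparseness problem can be made concrete with a minimal sketch that uses only the standard library. This is not shorttext code: the `TOPIC_WEIGHTS` table is a hypothetical, hand-made stand-in for what a trained topic model or word-embedding model would provide. Two short texts on the same subject share no words, so their bag-of-words similarity is zero, while a denser intermediate representation still relates them.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bag_of_words(text):
    """Raw word counts: the sparse representation."""
    return Counter(text.lower().split())


# Hypothetical hand-made weights over two topics: (finance, sports).
TOPIC_WEIGHTS = {
    "stock": (1.0, 0.0), "market": (1.0, 0.0), "rally": (1.0, 0.0),
    "shares": (1.0, 0.0), "equity": (1.0, 0.0), "surge": (1.0, 0.0),
    "football": (0.0, 1.0), "goal": (0.0, 1.0), "match": (0.0, 1.0),
}


def topic_vector(text):
    """Sum the per-word topic weights: a dense intermediate representation."""
    finance = sports = 0.0
    for word in text.lower().split():
        f, s = TOPIC_WEIGHTS.get(word, (0.0, 0.0))
        finance += f
        sports += s
    return {"finance": finance, "sports": sports}


text_a = "stock market rally"
text_b = "shares equity surge"

bow_similarity = cosine(bag_of_words(text_a), bag_of_words(text_b))
topic_similarity = cosine(topic_vector(text_a), topic_vector(text_b))
print(bow_similarity)    # 0.0, because the two texts share no words
print(topic_similarity)  # close to 1, because both texts load on "finance"
```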

Since release 1.5.2, it runs on Python 3.9. Since release 1.5.0, support for Python 3.6 was decommissioned. Since release 1.2.4, it runs on Python 3.8. Since release 1.2.3, support for Python 3.5 was decommissioned. Since release 1.1.7, support for Python 2.7 was decommissioned. Since release 1.0.8, it runs on Python 3.7 with 'TensorFlow' being the backend for keras. Since release 1.0.7, it runs on Python 3.7 as well, but the backend for keras cannot be TensorFlow. Since release 1.0.0, shorttext runs on Python 2.7, 3.5, and 3.6.

Characteristics:

example data provided (including subject keywords and NIH RePORT);
text preprocessing;
pre-trained word-embedding support;
gensim topic models (LDA, LSI, Random Projections) and autoencoder;
topic model representation supported for supervised learning using scikit-learn;
cosine distance classification;
neural network classification (including ConvNet, and C-LSTM);
maximum entropy classification;
metrics of phrase differences, including soft Jaccard score (using Damerau-Levenshtein distance), and Word Mover's distance (WMD);
character-level sequence-to-sequence (seq2seq) learning;
spell correction;
API for word-embedding algorithm for one-time loading; and
sentence encodings and similarities based on BERT.
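
As an illustration of the soft Jaccard score mentioned in the list above, here is a pure-Python sketch. It is one plausible formulation and not necessarily the package's implementation: word similarity is taken as one minus the normalized edit distance, words are matched greedily from the most similar pair down, and the restricted (optimal string alignment) variant of Damerau-Levenshtein distance is used. All function names are illustrative.

```python
from itertools import product


def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(word1, word2):
    """1.0 for identical words, 0.0 for words with nothing in common."""
    longest = max(len(word1), len(word2))
    if longest == 0:
        return 1.0
    return 1.0 - damerau_levenshtein(word1, word2) / longest


def soft_jaccard(tokens1, tokens2):
    """Jaccard score where each matched pair counts by its similarity."""
    if not tokens1 and not tokens2:
        return 1.0
    pairs = sorted(
        ((similarity(t1, t2), i, j)
         for (i, t1), (j, t2) in product(enumerate(tokens1), enumerate(tokens2))),
        reverse=True,
    )
    used1, used2 = set(), set()
    soft_intersection = 0.0
    for sim, i, j in pairs:  # greedy one-to-one matching
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            soft_intersection += sim
    union = len(tokens1) + len(tokens2) - soft_intersection
    return soft_intersection / union


print(soft_jaccard(["recieve", "mail"], ["receive", "mail"]))
```

A misspelling such as "recieve" then lowers the score only slightly instead of counting as a completely different word, which is what an exact-match Jaccard score would do.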

Documentation

Documentation and tutorials for shorttext can be found here: http://shorttext.rtfd.io/.

See the tutorial for how to use the package, and the FAQ.

Installation

To install it, run pip in a console:

$ pip install -U shorttext

or, to install the most recent development version from Github, run

$ pip install -U git+https://github.com/stephenhky/PyShortTextCategorization@master

Developers are advised to make sure that Keras (version 2 or later) is installed. Users are advised to install a backend, TensorFlow (preferred) or Theano, in advance. It is also desirable to have Cython installed beforehand.

See installation guide for more details.
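
After installing, a quick way to confirm what is present is to query the installed distributions. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the list of names checked is an assumption based on the packages mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names=("shorttext", "keras", "tensorflow")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```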

Issues

To report any issues, go to the Issues tab of the Github page and start a thread. Developers are welcome to submit pull requests to fix any errors.

Contributors

If you would like to contribute, feel free to submit pull requests. You can contact me in advance by e-mail or through the Issues page.

Useful Links

Documentation: http://shorttext.readthedocs.io
Github: https://github.com/stephenhky/PyShortTextCategorization
PyPI: https://pypi.org/project/shorttext/
"Package shorttext 1.0.0 released," Medium
"Python Package for Short Text Mining", WordPress
"Document-Term Matrix: Text Mining in R and Python," WordPress
An earlier version of this repository is a demonstration of the following blog post: Short Text Categorization using Deep Neural Networks and Word-Embedding Models

News

07/11/2021: shorttext 1.5.3 released.
07/06/2021: shorttext 1.5.2 released.
04/10/2021: shorttext 1.5.1 released.
04/09/2021: shorttext 1.5.0 released.
02/11/2021: shorttext 1.4.8 released.
01/11/2021: shorttext 1.4.7 released.
01/03/2021: shorttext 1.4.6 released.
12/28/2020: shorttext 1.4.5 released.
12/24/2020: shorttext 1.4.4 released.
11/10/2020: shorttext 1.4.3 released.
10/18/2020: shorttext 1.4.2 released.
09/23/2020: shorttext 1.4.1 released.
09/02/2020: shorttext 1.4.0 released.
07/23/2020: shorttext 1.3.0 released.
06/05/2020: shorttext 1.2.6 released.
05/20/2020: shorttext 1.2.5 released.
05/13/2020: shorttext 1.2.4 released.
04/28/2020: shorttext 1.2.3 released.
04/07/2020: shorttext 1.2.2 released.
03/23/2020: shorttext 1.2.1 released.
03/21/2020: shorttext 1.2.0 released.
12/01/2019: shorttext 1.1.6 released.
09/24/2019: shorttext 1.1.5 released.
07/20/2019: shorttext 1.1.4 released.
07/07/2019: shorttext 1.1.3 released.
06/05/2019: shorttext 1.1.2 released.
04/23/2019: shorttext 1.1.1 released.
03/03/2019: shorttext 1.1.0 released.
02/14/2019: shorttext 1.0.8 released.
01/30/2019: shorttext 1.0.7 released.
01/29/2019: shorttext 1.0.6 released.
01/13/2019: shorttext 1.0.5 released.
10/03/2018: shorttext 1.0.4 released.
08/06/2018: shorttext 1.0.3 released.
07/24/2018: shorttext 1.0.2 released.
07/17/2018: shorttext 1.0.1 released.
07/14/2018: shorttext 1.0.0 released.
06/18/2018: shorttext 0.7.2 released.
05/30/2018: shorttext 0.7.1 released.
05/17/2018: shorttext 0.7.0 released.
02/27/2018: shorttext 0.6.0 released.
01/19/2018: shorttext 0.5.11 released.
01/15/2018: shorttext 0.5.10 released.
12/14/2017: shorttext 0.5.9 released.
11/08/2017: shorttext 0.5.8 released.
10/27/2017: shorttext 0.5.7 released.
10/17/2017: shorttext 0.5.6 released.
09/28/2017: shorttext 0.5.5 released.
09/08/2017: shorttext 0.5.4 released.
09/02/2017: end of GSoC project. (Report)
08/22/2017: shorttext 0.5.1 released.
07/28/2017: shorttext 0.4.1 released.
07/26/2017: shorttext 0.4.0 released.
06/16/2017: shorttext 0.3.8 released.
06/12/2017: shorttext 0.3.7 released.
06/02/2017: shorttext 0.3.6 released.
05/30/2017: GSoC project (Chinmaya Pancholi, with gensim)
05/16/2017: shorttext 0.3.5 released.
04/27/2017: shorttext 0.3.4 released.
04/19/2017: shorttext 0.3.3 released.
03/28/2017: shorttext 0.3.2 released.
03/14/2017: shorttext 0.3.1 released.
02/23/2017: shorttext 0.2.1 released.
12/21/2016: shorttext 0.2.0 released.
11/25/2016: shorttext 0.1.2 released.
11/21/2016: shorttext 0.1.1 released.

Possible Future Updates

Splitting components into separate packages;
More available corpora.

Comments

standalone ?

Hi. I have many questions.... :-)

I'm a beginner in Python. Is there any way to run the code standalone?

For example, I have trained a model on my data, and I'd like to see the scores in the terminal via classifier.score('apple'), where the word 'apple' can be changed.

Thank you, and regards,

opened by chocosando 20
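
One way to approach the question above is a small script that reads the text from the command line and prints the scores. The sketch below is only an outline: `DummyClassifier` is a hypothetical stand-in so that the script runs without a trained model, and it assumes that `score` returns a dictionary mapping class labels to scores. In real use, the dummy would be replaced by a trained classifier loaded from disk.

```python
import sys


class DummyClassifier:
    """Hypothetical stand-in for a trained classifier; replace with a real one."""

    def score(self, text):
        # Fake scores so that the script runs without a trained model.
        if "apple" in text:
            return {"fruit": 0.9, "company": 0.1}
        return {"fruit": 0.5, "company": 0.5}


def format_scores(scores):
    """Return one 'label: score' line per class, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{label}: {value:.4f}" for label, value in ranked]


def main(argv):
    text = " ".join(argv) or "apple"  # fall back to 'apple' when no text is given
    classifier = DummyClassifier()    # assumption: load your trained model here instead
    for line in format_scores(classifier.score(text)):
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `score_text.py` (a hypothetical file name), it could be run as `python score_text.py apple`, with the word changed on each run.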

ImportError: No module named classification_exceptions

import shorttext


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-cb09b3381050> in <module>()
----> 1 import shorttext

/usr/local/lib/python2.7/dist-packages/shorttext/__init__.py in <module>()
      5 sys.path.append(thisdir)
      6 
----> 7 from . import utils
      8 from . import data
      9 from . import classifiers

/usr/local/lib/python2.7/dist-packages/shorttext/utils/__init__.py in <module>()
      4 from . import textpreprocessing
      5 from .wordembed import load_word2vec_model
----> 6 from . import compactmodel_io
      7 
      8 from .textpreprocessing import spacy_tokenize as tokenize

/usr/local/lib/python2.7/dist-packages/shorttext/utils/compactmodel_io.py in <module>()
     13 from functools import partial
     14 
---> 15 import utils.classification_exceptions as e
     16 
     17 def removedir(dir):

ImportError: No module named classification_exceptions

opened by spate141 11

ImportError: dlopen: cannot load any more object with static TLS

Hi, I got the following error when I import shorttext. How can I resolve it?

Using TensorFlow backend.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.7.5 locally
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/shorttext/__init__.py", line 7, in <module>
    from . import utils
  File "/usr/local/lib/python2.7/dist-packages/shorttext/utils/__init__.py", line 3, in <module>
    from . import gensim_corpora
  File "/usr/local/lib/python2.7/dist-packages/shorttext/utils/gensim_corpora.py", line 2, in <module>
    from .textpreprocessing import spacy_tokenize as tokenize
  File "/usr/local/lib/python2.7/dist-packages/shorttext/utils/textpreprocessing.py", line 5, in <module>
    import spacy
  File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 8, in <module>
    from . import en, de, zh, es, it, hu, fr, pt, nl, sv, fi, bn, he
  File "/usr/local/lib/python2.7/dist-packages/spacy/en/__init__.py", line 4, in <module>
    from ..language import Language
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 12, in <module>
    from .syntax.parser import get_templates
ImportError: dlopen: cannot load any more object with static TLS

opened by kenyeung128 8

extend score to take an array of shorttext

Currently, score takes only a single input, so the method is very slow when classifying thousands of examples. Is there a way to generate scores for 10K+ samples at the same time?

opened by rja172 6
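
Short of true batched inference inside the model, a simple workaround for the request above is to avoid scoring the same text twice, since large collections of short texts often contain many duplicates. The helper below is a sketch that assumes only that the classifier exposes a `score(text)` method; `CountingClassifier` is a hypothetical stand-in used to show how many calls are actually made.

```python
def score_many(classifier, texts):
    """Score each distinct text once and return the results in input order."""
    cache = {}
    results = []
    for text in texts:
        if text not in cache:
            cache[text] = classifier.score(text)
        results.append(cache[text])
    return results


class CountingClassifier:
    """Hypothetical classifier that records how often score() is called."""

    def __init__(self):
        self.calls = 0

    def score(self, text):
        self.calls += 1
        return {"length": float(len(text))}


clf = CountingClassifier()
scores = score_many(clf, ["stem cells", "gene therapy", "stem cells"])
print(clf.calls)  # 2: the repeated text was scored only once
```

This does not make a single call faster; it only reduces the number of calls.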

Importing problem (not installation) over google colab

I am experimenting with the library for the first time. The installation was successful and didn't need any extra steps. However, when I started importing the library, I got the following error related to keras:

/usr/local/lib/python3.7/dist-packages/shorttext/generators/bow/AutoEncodingTopicModeling.py in <module>()
      8 from gensim.corpora import Dictionary
      9 from keras import Input
---> 10 from keras.engine import Model
     11 from keras.layers import Dense
     12 from scipy.spatial.distance import cosine

ImportError: cannot import name 'Model' from 'keras.engine' (/usr/local/lib/python3.7/dist-packages/keras/engine/__init__.py)

I tried installing keras separately, but it did not help. Any suggestions would be appreciated.

opened by yomnamahmoud 6

RuntimeWarning: overflow encountered in exp2 topicmodeler.train

Code:

trainclassdict = shorttext.data.nihreports(sample_size=None)
topicmodeler = shorttext.generators.LDAModeler()
topicmodeler.train(trainclassdict, 128)

Error message:

/lib/python2.7/site-packages/gensim/models/ldamodel.py:535: RuntimeWarning: overflow encountered in exp2
  perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words

Then the results are variable for topicmodeler.retrieve_topicvec('stem cell research')

opened by dbonner 6

Remove negation terms from stopwords.txt

I noticed that stopwords.txt includes negation terms such as "no" and "not". These terms reverse the meaning of a word or a sentence, so they should be preserved in the text data. For example, "not a good idea" would become "good idea" after stopword removal. Therefore, I recommend removing negation terms from the stopword list. Thanks!

opened by star1327p 5
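
The effect described in the comment above can be reproduced in a few lines. The stopword set here is a small illustrative subset, not the contents of stopwords.txt, and `remove_stopwords` is a hypothetical helper rather than the package's preprocessor.

```python
NEGATIONS = {"no", "not", "nor", "never"}
STOPWORDS = {"a", "an", "the", "is", "no", "not"}  # illustrative subset only


def remove_stopwords(text, stopwords):
    """Drop every whitespace-separated word that appears in the stopword set."""
    return " ".join(word for word in text.lower().split() if word not in stopwords)


print(remove_stopwords("not a good idea", STOPWORDS))              # good idea
print(remove_stopwords("not a good idea", STOPWORDS - NEGATIONS))  # not good idea
```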

Input to shorttext.generators.LDAModeler()

I was wondering what the format of the input data should be for:

shorttext.generators.LDAModeler()
topicmodeler.train(data, 100)

Can I feed it a pandas column, or should it be in a dictionary format? If a dictionary, what should the keys be? I have a large set of tweets.

opened by malizad 5
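
The variable name trainclassdict used with topicmodeler.train in the earlier report suggests that the training input is a dictionary. Assuming the format is a mapping from class label to a list of short texts (an assumption worth checking against the documentation), labelled records can be grouped with the standard library as sketched below. The `to_class_dict` helper and the sample records are illustrative only.

```python
from collections import defaultdict


def to_class_dict(records):
    """Group (label, text) pairs into {label: [text, ...]}."""
    grouped = defaultdict(list)
    for label, text in records:
        grouped[label].append(text)
    return dict(grouped)


records = [
    ("sports", "great goal last night"),
    ("finance", "stocks rally again"),
    ("sports", "transfer window opens"),
]
data = to_class_dict(records)
print(data)
```

With pandas, the two columns of a data frame could be paired with `zip` and passed to the same helper.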

from shorttext.classifiers import MaxEntClassifier is it regression?

It seems that maxent is a fancy word for regression, or do you have something special in your maxent? See https://www.quora.com/What-is-the-relationship-between-Log-Linear-model-MaxEnt-model-and-Logistic-Regression or https://en.wikipedia.org/wiki/Multinomial_logistic_regression

"Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit, the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model."
opened by Sandy4321 5
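
For reference, the equivalence quoted above comes down to one formula: a maximum entropy classifier assigns each label a linear score and normalizes the scores with a softmax, which is exactly multinomial logistic regression. The sketch below shows the general technique only; it makes no claim about how MaxEntClassifier is implemented, and the weights are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def maxent_probabilities(weights, features):
    """P(label | features) for a log-linear model; weights maps label -> weight vector."""
    labels = sorted(weights)
    scores = [sum(w * x for w, x in zip(weights[label], features)) for label in labels]
    return dict(zip(labels, softmax(scores)))


weights = {"physics": [2.0, 0.0], "theology": [0.0, 2.0]}  # hypothetical weights
probs = maxent_probabilities(weights, [1.0, 0.0])
print(probs)
```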

No Python 3.6 support with SciPy 1.6

SciPy 1.6.0 drops Python 3.6 support (https://scipy.github.io/devdocs/release.1.6.0.html), so maybe scipy<1.6.0 should be specified in the requirements.txt and setup_requirements.txt files.

opened by Dobatymo 4

Data nihreports not available anymore

Some datasets are not available anymore.

For example the following: nihtraindata = shorttext.data.nihreports(sample_size=None)

Error message:

Downloading...
Source:  http://storage.googleapis.com/pyshorttext/nih_grant_public/nih_full.csv.zip
Failure to download file!
(<class 'urllib.error.HTTPError'>, <HTTPError 404: 'Not Found'>, <traceback object at 0x7f09063ed788>)

Python error:

HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Opening the link directly gives the same error.

opened by AlessandroVol23 4
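
When a remote data file may have moved, as in the report above, calling code can catch the HTTP error and fail gracefully instead of surfacing a chained traceback. The sketch below uses only `urllib` from the standard library; `try_download` and its `opener` parameter are hypothetical names, the latter added so that the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def try_download(url, opener=urllib.request.urlopen):
    """Return the response bytes, or None if the server answers with an HTTP error."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        print(f"Download failed for {url}: HTTP {err.code} {err.reason}")
        return None
```

A caller can then check for `None` and fall back to a local copy of the data or report a clear message.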

Releases(1.5.8)

1.5.8(Sep 23, 2022)
Package administration.

1.5.7(Sep 22, 2022)
Removed the requirement to pre-install numpy and Cython.

1.5.6(Aug 30, 2022)
Faster inference for VarNNEmbeddedVecClassifier. (Contributed by Ritesh Agrawal)

1.5.5(May 28, 2022)
Support for Python 3.10.

1.5.4(Dec 15, 2021)
Non-negative stop words.

1.5.3(Jul 11, 2021)
Documentation updated.

1.5.2(Jul 7, 2021)
Resolved bugs regarding keras import;
Support for Python 3.9.

1.5.1(Apr 10, 2021)
Replaced TravisCI with CircleCI in the continuous integration (CI) pipeline.

1.5.0(Apr 9, 2021)
Decommissioned support for Python 3.6;
Removed buggy unit tests.

1.4.8(Feb 11, 2021)
Updated requirements for scipy for Python 3.7 or above.

1.4.7(Jan 11, 2021)
Updated version of transformers in requirement.txt;
Updated BERT encoder for the change of implementation;
Fixed unit tests.

1.4.6(Jan 4, 2021)
Fixed a bug regarding the Python 3.6 requirement for scipy.

1.4.5(Dec 29, 2020)
Fixed bugs from the Python 2 to 3 update, involving filter in shorttext.metrics.embedfuzzy.

1.4.4(Dec 24, 2020)
Fixed bugs regarding SumEmbedVecClassification.py;
Fixed bugs due to the Python 3.6 restriction on scipy.

1.4.3(Nov 10, 2020)
Resolved bugs with transformer-based models on different devices.

1.4.2(Oct 18, 2020)
Documentation requirements and PyUp configs cleaned up.

1.4.1(Sep 23, 2020)
Documentation and code cleaned up.

1.4.0(Sep 2, 2020)
Provided support for BERT-based sentence and token embeddings;
Implemented support for BERTScores.

1.3.0(Jul 23, 2020)
Removed the dependency on PuLP and reimplemented word mover's distance (WMD) using SciPy.

1.2.6(Jun 5, 2020)
Removed Python 2 code (urllib2).

1.2.5(May 21, 2020)
Updated gensim package usage and requirements;
Removed some deprecated functions.

1.2.4(May 13, 2020)
Updated scikit-learn requirement to >=0.23.0;
Direct dependence on joblib;
Support for Python 3.8 added.

1.2.3(Apr 28, 2020)
PyUP scan implemented;
Support for Python 3.5 decommissioned.

1.2.2(Apr 8, 2020)
Removed dependence on PyStemmer, which is replaced by snowballstemmer.

1.2.1(Mar 23, 2020)
Added port number adjustability for word-embedding API;
Removed the Spacy dependency.

1.2.0(Mar 21, 2020)
RESTful API support for word-embedding models.

1.1.6(Dec 2, 2019)
Compatibility with TensorFlow 2.0.0.

1.1.5(Sep 24, 2019)
Decommissioned GCP buckets; using data files stored in AWS S3 buckets.

1.1.4(Jul 20, 2019)
Minor bugs fixed.

1.1.3(Jul 7, 2019)
Updated code for Console code loading;
Updated Travis CI script.

Owner

Kwan-Yuet "Stephen" Ho

quantitative research, machine learning, data science, text mining, physics

GitHub http://shorttext.rtfd.io
